NISTIR 8038 Materials Genome Initiative: Materials Data Charles H. Ward James A. Warren This publication is available free of charge from: http://dx.doi.org/10.6028/NIST.IR.8038


NISTIR 8038

Materials Genome Initiative: Materials Data

Charles H. Ward
Materials and Manufacturing Directorate
Air Force Research Laboratory

James A. Warren
Material Measurement Laboratory
National Institute of Standards and Technology

This publication is available free of charge from: http://dx.doi.org/10.6028/NIST.IR.8038

January 2015

U.S. Department of Commerce
Penny Pritzker, Secretary

National Institute of Standards and Technology
Willie May, Acting Under Secretary of Commerce for Standards and Technology and Acting Director

Workshop Summary

Materials Genome Initiative: Materials Data

15-16 July 2014

Dayton, Ohio

DISCLAIMER: Certain commercial equipment, instruments, software, or materials are identified in this paper in order to specify the procedures of experiments and simulations adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the software, materials, instruments, or equipment identified are necessarily the best available for the purpose.

Introduction

A key objective for the Materials Genome Initiative (MGI) is access to digital materials data. Access may be for an individual, a team, a company, or an entire community of interest. There are several challenges to providing access to digital materials data for which there are presently no universally accepted best practices. Under the MGI, Federal agencies have funded research and development designed to explore, develop, and apply MGI methodologies. Several of these efforts either focus on an infrastructure for sharing data or emphasize data sharing as a key requirement of the overall technical effort.

One objective of starting several MGI-focused programs concurrently was that the approaches, lessons learned, and best practices would be shared between programs in order to accelerate the overall development of a materials innovation infrastructure (MII). This workshop assembled representatives from Federally funded efforts where a critical component of the overall research effort involves sharing of materials data in digital format. The intent of this workshop was for the participants to provide insights into the goals of their effort, details of their approach to sharing digital materials data, challenges they face, and lessons learned. This exchange of ideas will not only enhance the likelihood for success of the individual efforts, but will also speed the community's understanding of how best to proceed in developing an MII.

During the workshop, participants were asked to identify the top three next steps needed to be taken within the materials community and by government agencies in addressing the challenges associated with digital materials data. The following is a list of priorities derived from those inputs for each sector.

Community
• Develop and deploy standards for data and federated/collaborative environments
• Communicate value and need for digital materials data
• Define critical data to be compiled
• Engage community
• Explore and leverage data solutions developed outside materials community
• Train students in these emerging areas
• Establish community norms for publishing materials data

Government
• Lead data strategy/approach development
• Incentivize data deposition
• Facilitate data deposition (policy and technology)
• Support data sharing and deposition (provide resources)
• Support data creation
• Enhance data management competency
• Provide quality datasets for model development/validation

MGI Materials Data Workshop Organizing Committee

Air Force Research Laboratory (AFRL)
Dr. Charles Ward & Mr. Clare Paul

Army Research Laboratory (ARL)
Dr. John Beatty & Mr. Wayne Ziegler

Energy Efficiency and Renewable Energy, Department of Energy (DOE-EERE)
Mr. William Joost

National Aeronautics and Space Administration (NASA)
Dr. Stephen Smith & Dr. Dennis Griffin

National Institute of Standards and Technology (NIST)
Dr. James Warren

Office of Naval Research (ONR)
Dr. William Mullins & Dr. Kenny Lipkowitz

Agenda
MGI Materials Data Workshop

Tec^Edge, 5000 Springfield Street, Dayton, OH

15-16 July 2014

Tuesday

0745 – 0800 Check-in
0800 – 0815 Chuck Ward (AFRL) – Introductory remarks

Discussion Lead: Will Joost (DOE-EERE)
0815 – 0845 Jim Warren (NIST) – Data Infrastructure
0845 – 0915 John Allison, Brian Puchala (Univ. Michigan) – PRISMS/ALMMII
0915 – 0945 Carrie Campbell (NIST) – ChiMaD
0945 – 1015 Discussion
1015 – 1030 Break

Discussion Lead: John Beatty (ARL)
1030 – 1100 Marco Fornari (Central Michigan Univ.) – AFLOWLIB.org
1100 – 1130 Surya Kalidindi (Ga. Tech) – Mosaic of Microstructure/IGERT-CIF21
1130 – 1200 Discussion
1200 – 1230 Lunch

Discussion Lead: Bill Mullins (ONR)
1230 – 1300 Tom Searles (MDMi) – Cast Aluminum
1300 – 1320 Dongwon Shin (Oak Ridge National Laboratory, ORNL) – Cast Aluminum
1320 – 1340 Jake Zindel (Ford Motor Co.) – Cast Aluminum
1340 – 1415 Discussion
1415 – 1430 Break

Discussion Lead: Jim Warren (NIST)
1430 – 1500 Jason Sebastian (QuesTek) – Cast Iron
1500 – 1530 Lou Hector (General Motors) – Advanced Steel
1530 – 1600 Lara Liou (GE Aviation) – Integrated Computational Methods for Composite Materials
1600 – 1700 Discussion
1700 Adjourn for the day

Wednesday

0745 – 0800 Check-in

Discussion Lead: Wayne Ziegler (ARL)
0800 – 0830 Terry Wong (Rocketdyne) – Nickel RS FEP
0830 – 0850 Mike Glavicic (Rolls Royce) – MAI Ti Modeling
0850 – 0910 Rajiv Naik (Pratt & Whitney) – MAI Data Informatics Data Standards Modeling & UQ
0910 – 0940 Discussion
0940 – 0955 Break

Discussion Lead: Dennis Griffin (NASA)
0955 – 1015 Rob Gorham (National Center for Defense Manufacturing and Machining, NCDMM) – America Makes
1015 – 1045 Richard Serwecki (NASA) – MAPTIS
1045 – 1105 Scott Henry (ASM International) – Computational Materials Data Network
1105 – 1135 Discussion
1135 – 1200 Lunch

Discussion Lead: Clare Paul (AFRL)
1200 – 1230 Ben Blaiszik (Univ. of Chicago/ANL) – Globus
1230 – 1300 Mike Groeber (AFRL) – DREAM.3D/SIMPL
1300 – 1320 Sam Chance (iNovex) – Semantics for Deep Interoperability
1320 – 1340 Harlan Shober (RJ Lee) – Big Data and Semantic Technologies
1340 – 1400 Tim Hawes (Decisive Analytics Corp.) – Natural Language Processing
1400 – 1445 Discussion
1445 – 1500 Wrap up
1500 Adjourn

Presentation Summaries

Welcome and Introduction
Chuck Ward, Air Force Research Laboratory

The primary intent of the workshop was to promote communication between the groups funded under the Materials Genome Initiative that have a significant interest in the management of materials data. The greater awareness is expected to enhance the likelihood for success of the individual efforts and speed the community's understanding of how best to proceed in developing an MII.

The Materials Research Society (MRS) and The Minerals, Metals and Materials Society (TMS) conducted a survey of the community in May/June of 2013 to gauge attitudes and needs in materials data. Over 650 people took the survey, with 74% of respondents saying they would be willing to share data if it were encouraged as a condition of funding or publishing. The community showed the most interest in gaining access to databases of some of the most basic and essential materials data, such as thermo-physical properties.

The survey asked participants to evaluate advantages of local versus centralized databases. Centralized repositories were perceived by respondents to offer advantages in cost, an ability to share one's data, provisioning of uniform data standards, and an ability to organize and find data. Local repositories were perceived to provide advantages in ease of data input, time efficiency, and flexibility in accommodating new data types.

The survey also asked participants to provide their thoughts on impediments to and motivators for sharing data. Most of the impediments can be attributed to concerns of ownership of data or external data restrictions, while technical issues such as file size and proprietary formatting were also concerns. Overall, participants viewed positively the ability to contribute data to the community for reuse and critique. The highest motivator was to enhance one's own research visibility. Interestingly, respondents saw adding metadata as a motivator, presumably due to the perceived higher value of data when it is described more robustly.

MGI Efforts in Infrastructure Development at NIST
Jim Warren, National Institute of Standards and Technology

The presentation provided an overview of NIST activities supporting the creation of a materials data infrastructure. The infrastructure could start by defining the interfaces/APIs (Application Programming Interfaces) with which a user is able to interact for the deposit or retrieval of data. Beyond an interface, the following are essential elements for creation of a viable infrastructure:

• The establishment of repositories to store the data
• Tools to enable digital capture of the information, preferably at the time of creation (computation or experiment)
• Tools to enable markup of the data with sufficient metadata to inform someone else how the data was created, as well as various attributes that would help the user judge the quality of the information
• Assignment of a persistent digital identifier (like the DOI (Digital Object Identifier) for journals) so the data can be cited and discovered by others
• The registration of the availability of the data into some sort of "registry" to enable discovery without prior knowledge of the existence of the repository/specific data features

Additionally, creation of new policy and standards, as well as development of tools to evaluate the quality of the data and tools to enable data-driven discovery, will be required to manage and use materials data to the greatest extent possible.

NIST is responding to the OSTP (Office of Science and Technology Policy) memos (Dr. John Holdren, 2 Feb 14) by developing practices that will be of use to the community, including a Common Access Platform allowing access across multiple collections of information, research and development of Persistent Identifier (PID) infrastructure with the Research Data Alliance (RDA), and R&D of Data Type Registries with RDA.

Finally, NIST is collaborating with the National Data Service (based at the National Center for Supercomputing Applications) and is developing a set of best practices for Data Management Plans.
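The infrastructure elements listed in this summary (a repository record, descriptive metadata, a persistent identifier, and a registry for discovery) can be illustrated with a small sketch. The following Python is an assumption-laden toy, not a NIST design: the class names, fields, and the "pid:" identifier scheme are hypothetical, and a real system would mint a DOI or similar identifier through an external service.

```python
from dataclasses import dataclass, field
import uuid


@dataclass
class DatasetRecord:
    """One deposited dataset plus the metadata needed to judge and cite it."""
    title: str
    creator: str
    method: str  # how the data was created (experiment or computation)
    keywords: list = field(default_factory=list)
    # Placeholder identifier (hypothetical scheme); stands in for a DOI-like PID.
    identifier: str = field(default_factory=lambda: "pid:" + uuid.uuid4().hex)


class Registry:
    """Toy registry: discover records without knowing which repository holds them."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        """Announce a record and return its persistent identifier."""
        self._records[record.identifier] = record
        return record.identifier

    def resolve(self, identifier):
        """Look up a record by its identifier (supports citation)."""
        return self._records[identifier]

    def search(self, keyword):
        """Case-insensitive keyword search over registered records."""
        wanted = keyword.lower()
        return [r for r in self._records.values()
                if wanted in (k.lower() for k in r.keywords)]
```

A user would register a record once and later find it either by identifier (`resolve`) or by keyword (`search`), which mirrors the cite-and-discover roles described above.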


The Materials Commons: A Novel Information Repository and Collaboration Platform for the Materials Community
Brian Puchala, Glenn Tarcea, Stravya Tamma, Emmanuelle Marquis, John Allison, University of Michigan

Critical to accelerating the pace of materials science and development is the development of new and improved infrastructure providing a seamless way for materials researchers to share and use materials data and models. To address this need, we are developing the Materials Commons (http://prisms.engin.umich.edu/prisms/materialscommons), an information repository and collaboration platform for the metals community in selected technical emphasis areas. We envision the Materials Commons becoming a continuous, seamless part of the scientific workflow process. Researchers will upload the results of experiments and computations as they are performed, automatically where possible, along with the provenance information describing the experimental and computational processes. By associating this data with the experimental and virtual computational materials samples from which it is obtained, the Materials Commons will build process-structure-property relationships, enabling the construction and validation of constitutive and process models.

The Materials Commons website provides an easy-to-use interface for uploading and downloading data and data provenance, and for searching and sharing data. At its core, the Materials Commons consists of a secure data storage cluster, an application for efficiently uploading and downloading large data sets, and a REST (REpresentational State Transfer) based API to access and extend the capabilities of the repository. The API allows for features such as automated data upload from experiments and computations, seamless integration of computational models, and algorithmic analysis of process-structure-property relationships. The Materials Commons is a central thrust of the Center for PRedictive Integrated Structural Materials Science (PRISMS).
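As an illustration of what automated upload with provenance through a REST-style API might look like, here is a minimal Python sketch. It is not the Materials Commons API: the base URL, the route, the field names, and the values are all hypothetical, and the request is only built, never sent.

```python
import json
import urllib.request


def build_upload_request(base_url, sample_id, results, provenance):
    """Build (but do not send) a POST request carrying data plus its provenance."""
    payload = {
        "sample_id": sample_id,
        "results": results,
        "provenance": provenance,
    }
    body = json.dumps(payload).encode("utf-8")
    # "/samples/<id>/measurements" is a made-up route for illustration only.
    url = f"{base_url}/samples/{sample_id}/measurements"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; none of these describe a real sample or server.
request = build_upload_request(
    "https://repository.example.org/api",
    "sample-001",
    {"hardness_HV": 120.5},
    {"process": "hardness test", "instrument": "example tester",
     "operator": "example user"},
)
```

Attaching the provenance to the same request as the result keeps each measurement tied to the sample and process that produced it, which is the association the summary describes.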

NIST Materials Data Informatics Efforts
Carrie Campbell, National Institute of Standards and Technology

The presentation covered NIST's pilot efforts in creating a data infrastructure focusing on Phase-Based Property Data. Phase-based data are diverse data that include 1-, 2-, 3-, and 4-dimensional data, are semi-structured, and often include incomplete data sets. The figure below provides an overview of NIST's data efforts supporting the MGI.

The NIST Materials Data Repository (https://materialsdata.nist.gov/dspace/xmlui) is a customized DSpace repository for materials that enables sharing of a variety of data types, including text files, images, and video. The repository also provides a persistent identifier for each entry and allows users to choose from a variety of license types. NIST is also working on two data curation tools for phase-based data. The Thermodynamics Research Center (http://trc.nist.gov) is extending the ThermoML markup language and Guided Data Capture software tool to metallic systems, with an initial emphasis on phase equilibria and thermochemical data. The Materials Data Curation System (MDCS) employs user-defined XML-based schemas to curate data. The MDCS allows either web-based data curation or curation using a REST API. The entered data is stored in a non-relational database repository, can be searched using a SPARQL (SPARQL Protocol and RDF Query Language) interface, and can be integrated with a variety of scientific workflow tools.
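To make the idea of schema-driven curation concrete, the sketch below builds an XML record and reports which required fields are missing, which matters for data sets that are often incomplete. It is only an illustration using Python's standard library: the element names and the list of required fields are hypothetical and do not come from the MDCS, and full XML Schema validation would need a dedicated library such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical user-defined template: required child elements of a record.
REQUIRED_FIELDS = ("material", "phase", "property", "value", "units")


def build_record(fields):
    """Turn a dict of field name -> text into an XML <record> element."""
    record = ET.Element("record")
    for name, text in fields.items():
        ET.SubElement(record, name).text = str(text)
    return record


def missing_fields(record):
    """List required fields that are absent or empty in the record."""
    missing = []
    for name in REQUIRED_FIELDS:
        child = record.find(name)
        if child is None or not (child.text or "").strip():
            missing.append(name)
    return missing


# An incomplete entry: no value or units have been supplied yet.
record = build_record({"material": "Al-Cu", "phase": "fcc",
                       "property": "lattice parameter"})
xml_text = ET.tostring(record, encoding="unicode")
```

Reporting gaps instead of rejecting the record outright lets incomplete entries be stored and completed later.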

AFLOW Consortium
Marco Fornari, Central Michigan University

The overall goal of the Materials Genome Initiative (MGI) is to accelerate the discovery, development, and deployment of new materials with ad hoc functionalities. The AFLOW Consortium focuses on the software infrastructures to achieve the MGI goals by using first principles calculations. The emphasis is on the data that must be generated, archived, validated, and post-processed efficiently, both to identify new materials and distill meaningful quantitative relationships within the data. This extends the solution to the inverse problem of chemically replacing rare and costly elements in critical technologies.

The AFLOW Consortium provides access to the repository AFLOWLIB.ORG <http://AFLOWLIB.ORG> that publicly distributes data on energetics and thermodynamic stability for more than 600,000 compounds and band structures for roughly 300,000 materials. The repository includes bulk crystals and alloys. Data were computed with the AFLOW high-throughput framework using density functional theory. The results of the calculations can be queried with the recently developed REST-API that standardizes and labels the data for retrieval purposes. The fields defined in the AFLOW REST-API include a material object identifier (AUID), data cloud location (AURL), ownership and sponsorship keywords, and data-specific labels (COMPOSITION, ENTROPY, EGAP, …). Simple scripting languages can be used to automatically download from the AFLOWLIB repository. The AFLOW approach has been extensively applied to platinum-group alloys, thermoelectric materials, magnetic materials, topological insulators, scintillators, etc. The AFLOW REST-API can be extended to include experimental data. For further details on the database, the reader is directed to http://materials.duke.edu/auro/AUROARTICULA/j.commatsci.2012.02.002.pdf.
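The remark that simple scripts can work with labeled records can be sketched as follows. This Python example does not call the AFLOW REST-API; it assumes, purely for illustration, that each entry arrives as "keyword=value" pairs separated by "|", and the two sample records are made up rather than taken from AFLOWLIB.

```python
def parse_entry(text):
    """Parse 'key=value | key=value' text into a dict (assumed layout)."""
    entry = {}
    for part in text.split("|"):
        key, sep, value = part.strip().partition("=")
        if sep:  # skip fragments without '='
            entry[key.strip().lower()] = value.strip()
    return entry


def select(entries, keyword, predicate):
    """Keep entries whose numeric `keyword` value satisfies `predicate`."""
    kept = []
    for entry in entries:
        try:
            if predicate(float(entry[keyword])):
                kept.append(entry)
        except (KeyError, ValueError):
            continue  # label missing or not numeric
    return kept


# Made-up records for illustration; not real AFLOWLIB data.
raw = [
    "auid=example:0001 | aurl=server.example.org:path/A | Egap=1.2",
    "auid=example:0002 | aurl=server.example.org:path/B | Egap=0.0",
]
entries = [parse_entry(r) for r in raw]
gapped = select(entries, "egap", lambda gap: gap > 0.5)
```

Because every record carries the same labels, a filter such as `select` can be reused for any numeric keyword without knowing how the underlying calculation was run.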

Materials Innovation Infrastructure: Lessons Learned and Future Plans
Surya Kalidindi, Georgia Institute of Technology

A central activity in materials research is the exploration of processing-structure-property (PSP) linkages. Workflow is defined as the sequence of steps employed for establishing PSP linkages of interest to any specific engineering/technology application.

High Throughput Explorations of PSP Linkages: We must treat a hierarchical material as a complex system, which means embracing and exploiting uncertainty quantification and complexity management concepts/tools. It also means we need to design, develop, and validate decision support systems that will leverage the best current understanding (with its uncertainties) and provide objective guidance on future effort investment (combination of experiments and simulations). This means leveraging Data Science to effectively feed the decision support systems described above by developing and deploying high throughput strategies for obtaining high value information from both experiments and models.

What are integrated workflows? Integrated workflows are those that utilize the best combination of experiments and simulations in extracting robust and reliable PSP linkages. This requires engaging and exploiting cross-disciplinary expertise that includes materials science, manufacturing, systems approaches, uncertainty quantification, computational science, and data and information sciences. Through this, one must ensure that the workflows output the critical information needed by design and manufacturing stakeholders in the materials development value chain. Integration demands expertise sharing, which essentially is the core element of a collaboration. Cross-disciplinary integration demands effective collaborations, such as those one might set up using the modern tools of the day, i.e., an e-Collaboration.

Motivators for e-Collaborations: e-Collaborations are central to enhancing the productivity of materials development activities in several ways:
• individual research groups can identify and engage complementary expertise needed for their projects
• organizations can coordinate multiple teams and multiple projects to maximize output/impact
• organizations can identify and seek the right partners that enhance their individual and combined value
e-Collaborations can be open, semi-open, or closed, depending on the needs of the specific organization. Productivity enhancement leading to dramatic reductions in cost and time expended in materials development is the main incentive for e-collaborations.

Main Impediments to e-Collaborations

Lack of tools to identify the "best" collaborators
• Need software tools that quantify more precisely the expertise of individuals/groups/organizations
• Need software tools for allowing rich annotations and e-recording of discussions
• Need analytic tools for identification, extraction, and management of metadata (i.e., high value abstractions from the raw data)
• Need searchable semantic metadata databases (these are lightweight and can be centralized and customized for specific user communities)
• Need software tools for tracking provenance (i.e., who/what/when/how generated, modified, etc.)

Lack of experience with the use of integrated workflows
• Need software tools for e-recording of workflows and development of searchable workflow databases
• Need recommendation schemes that suggest the best workflow(s) based on recorded prior experience; this provides a means to transfer expertise acquired by an organization to other teams and other projects (even in the same organization)
• Leads to possible automation and dramatic enhancement of productivity
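The provenance need named in the list above (who/what/when/how generated, modified, etc.) maps naturally onto a small append-only log. The Python below is a minimal sketch under that reading; the class and field names are hypothetical and are not taken from any tool discussed at the workshop.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceEvent:
    who: str        # person or group responsible
    what: str       # action taken, e.g. "generated" or "modified"
    when: datetime  # time stamp of the action
    how: str        # tool, instrument, or workflow step used


class ProvenanceLog:
    """Append-only history for one dataset."""

    def __init__(self, dataset_id):
        self.dataset_id = dataset_id
        self._events = []

    def record(self, who, what, how, when=None):
        """Append one event; defaults to the current UTC time."""
        event = ProvenanceEvent(who, what, when or datetime.now(timezone.utc), how)
        self._events.append(event)
        return event

    def history(self):
        """Return the events, oldest first, as an immutable tuple."""
        return tuple(self._events)

    def contributors(self):
        """Return the distinct contributors in alphabetical order."""
        return sorted({e.who for e in self._events})
```

Making events immutable and the log append-only keeps the recorded history trustworthy, and the same structure could feed a searchable workflow database of the kind called for in the second list.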

MINED Group's Past Experience (~2005 - ~2008): Funded by the Army Research Office, ARO (David Stepp); engaged computer science (CS) graduates to produce a single-user platform that integrated certain software codes previously produced by our group; software development was subsequently abandoned, but two CS graduates were recruited into PhD programs in Materials Science and Engineering/Mechanical Engineering (MSE/ME), which dramatically altered the direction of research in the group.

Brief chronology of development:
• Senior Design Project (2011-2012): An online portal designed for sharing data files and tools; learned the importance of user interfaces and the need for professional code developers and software architects
• HUBzero Instantiation (2012-2013): Built a hub by customizing HUBzero for hierarchical structural materials; there are several advantages in terms of accessing and executing codes (on the server side), but there are significant barriers to establishing fast and productive cross-disciplinary e-collaborations
• GTRI Integration Platform (2013-2014): Engaged professional code developers and software architects to design a multi-user web portal that allows data and code sharing while tracking provenance and workflows; underestimated the resources needed for such a project
• MATIN Collaboration Environment (2013-2014): Lightweight integrated software shell that allows users to seamlessly interface, through their APIs, with numerous cloud services offered through the web. Examples include GitHub for versioning and archival of codes, Authorea for collaborative editing of documents, figshare for citable publication of research, Plotly for collaborative data analysis and visualization, and Google+ and LinkedIn for e-teaming and networking. MATIN focuses on building an emergent community, Materials & Manufacturing Informatics (MMI). MATIN provisions web services for software integration platforms and innovation infrastructure.

Examples of MATIN:
• Spatial Statistics: https://github.com/tonyfast/SpatialStatisticsFFT
• PyMKS: http://openmaterials.github.io/pymks/index.html
• Example discussion about periodic boundary conditions in running Finite Element (FE or FEM) simulations: https://github.com/sfepy/sfepy/issues/267

In the platform developed, the user or the user's organization can set permissions, at the level of individual datasets/codes, on the degree and type of sharing.

Central Challenge: Activation. The transition from traditional disciplinary workflows to integrated workflows needs to overcome an activation barrier. There is no clear incentive to get ahead of the curve in anticipating the upcoming transformations in how materials development will be pursued in the future. Open (reproducible) science alone is not a sufficiently strong motivator for a materials development community with strong interests in intellectual property development; indeed, it might actually be perceived as a negative. There is a need to explore and refine what incentives work for individual researchers (e.g., increased productivity).

Future Plans: Continue to develop and deploy MATIN at GT and any other organizations/groups that express an interest in working with us. Design, create, and distribute a suite of "mobile first" Apps designed to make data ingest into collaboration environments such as MATIN, from both material characterization equipment and multi-scale models, an enhanced user experience with friendly and natural interfaces. Proactively promote/incentivize data/metadata ingest into publicly shared databases. Work with professional societies to announce and run new competitions/awards for recognizing and rewarding good practices by the members of the broader materials innovation user community (e.g., Best Documented Data Award, Best Documented Tool Award, Best Documented Workflow Award). Develop and deliver new workshops for training the new generation of graduate students, postdocs, and junior faculty in the emerging workflows of Materials & Manufacturing Informatics.

High Performance Cast Aluminum Alloys for Next Generation Passenger Vehicle Engines
Dongwon Shin, Materials Science and Technology Division, Oak Ridge National Laboratory

Oak Ridge National Laboratory (ORNL) teamed with Chrysler and Nemak to develop next-generation high-performance cast aluminum alloys that have improved castability, high-temperature strength, and fatigue performance. At this point, the data management practice is limited to analytical data spreadsheets, other than the CALPHAD computational thermodynamic databases and proprietary property databases for casting simulation. As the project goes on, a more robust data schema will be developed to store various types of data populated from the experiments and calculations.

Besides the data structure design, the speaker discussed the necessity of a new approach to identifying key descriptors from the large dataset that can guide the design of new materials. Alloy design requires balancing extremely complex relationships among a large number of descriptors. Some examples of these descriptors include phase fraction of key intermetallics, lattice parameters of the matrix, and diffusion kinetics of solute atoms. However, optimizing these quantities relies on good physical metallurgy intuition coupled with clever experiments, dogged sleuthing, and a healthy dose of luck. Such descriptor-based predictions of materials properties date back to the beginning of modern solid-state chemistry and physics. As shown in the Darken-Gurry map to predict the solubility of elements in tantalum, intuitive descriptors such as electronegativity and radius could be used for materials design. However, as the chemistry and physics of materials become more complex, intuition-based search may not work anymore. Thus, the use of more advanced statistical analysis and data mining approaches might be required to accelerate the design of complex multi-component/multi-phase alloys.
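The Darken-Gurry idea of screening solutes with two intuitive descriptors can be sketched in a few lines of Python. The sketch assumes the commonly quoted ellipse around the solvent, with semi-axes of 15% of the solvent radius and 0.4 electronegativity units; both tolerances are exposed as parameters, the function names are hypothetical, and no element data are included, so the caller must supply radii and electronegativities from a trusted source.

```python
def darken_gurry_score(solvent_radius, solvent_en, solute_radius, solute_en,
                       radius_tol=0.15, en_tol=0.4):
    """Normalized elliptical distance of a solute from the solvent.

    A score <= 1 places the solute inside the ellipse, where appreciable
    solubility is expected under this heuristic.
    """
    # Radius mismatch, scaled by the allowed fraction of the solvent radius.
    dr = (solute_radius - solvent_radius) / (radius_tol * solvent_radius)
    # Electronegativity mismatch, scaled by the allowed difference.
    de = (solute_en - solvent_en) / en_tol
    return dr * dr + de * de


def inside_ellipse(solvent_radius, solvent_en, solute_radius, solute_en,
                   radius_tol=0.15, en_tol=0.4):
    """True when the solute falls inside the Darken-Gurry ellipse."""
    score = darken_gurry_score(solvent_radius, solvent_en,
                               solute_radius, solute_en, radius_tol, en_tol)
    return score <= 1.0
```

A two-descriptor screen like this is easy to reason about, which is also its limit: with many interacting descriptors, the statistical and data mining approaches mentioned above take over from a hand-drawn ellipse.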

Data Related to Cast Aluminum
Jake Zindel, Ford Motor Company

The main focus of our research is the development of CAE (Computer-aided Engineering) tools to support two significant activities in the company: component design (product development) and manufacturing engineering (high-volume production). A key part of our research is conducting experiments to collect data for the development and validation of the CAE tools.

The data we collect consists of mechanical properties, physical properties, microstructural images, and manufacturing process parameters that form the pedigree of the material. This information can be in the form of Excel files, proprietary-software storage files, screen shots, and written reports.

Our immediate need is to deal with this large amount of existing information, collected over many years, on a "local" level. The first step is to organize many independent collections of information in all their forms. The next issue is the time required to enter all this information into the database. Maintenance may also require a significant amount of effort. Lastly, there will also be a need to manipulate the data contained in the database using Excel, Minitab, MATLAB, etc. Therefore, links or easy exportation of data to these types of applications will be necessary.

The next level would involve dealing with information that has been created or collected by others. From an OEM (Original Equipment Manufacturer) perspective, this would involve convincing suppliers to share processing information that they may consider proprietary. In addition to suppliers, universities and national laboratories are also sources of data. Data from all the sources will require some sort of protocol to determine its quality and resolve inconsistencies.

In the end, it seems likely that considerable budget and time will be required to implement comprehensive information/databases, which could be the largest hurdle of them all.

QuesTek's MGI/ICME Approach to Alloy Optimization and Design (Including Cast Iron)
Jason Sebastian, QuesTek Innovations LLC

Integrated Computational Materials Engineering (ICME) methods involve the holistic application of different computational models across various length scales to the design, development, and rapid qualification of advanced materials. Using its Materials by Design technologies, QuesTek has successfully applied these ICME methods to the design, development, and full qualification of several advanced alloys (including aerospace structural steels, aerospace gear steels, advanced castable titanium alloys, high-strength corrosion-resistant aluminum alloys, and high-performance cast iron).

QuesTek's ICME methods benefit tremendously from the activities of the US Materials Genome Initiative (MGI). Recognizing that the "10 to 20 years" that it takes for alloy design and development is hindering the rapid deployment of higher-performance materials, the MGI is focused on "cutting in half" the time and cost to design and deploy novel advanced materials needed to enhance and sustain US competitiveness.

QuesTek also has ongoing activities on the application of ICME methods to the design and development of materials for the energy sector, including advanced materials that enable vehicle lightweighting. Of particular note is an ongoing effort with a major diesel engine manufacturer on the development of high-performance cast iron alloys for engine block applications. QuesTek's efforts have made extensive use of various MGI-type tools and databases, including 1) density functional theory methods for high-throughput screening of cast iron inoculant stabilities (structure, lattice parameter, stability of various oxide/sulfide phases), 2) liquid iron mobility databases (to help accurately model solidification phenomena), and 3) thermodynamic databases (to examine nano-precipitate stability).

Moving forward, to continue to accelerate the materials development cycle, efforts should remain focused on the development of the fundamental databases (thermodynamic, kinetic, etc.) that enable ICME models. There should also be increased focus on efforts related to the integration of the various computational databases, models, and tools (the "I" in ICME).

Integrated Computational Materials Engineering (ICME) of Generation Three Advanced High Strength Steels
Louis G. Hector Jr., General Motors R&D Center

While traditional material development methods often involve laborious trial-and-error testing, Integrated Computational Materials Engineering (ICME), which falls under the umbrella of the Materials Genome Initiative, offers a means for materials development on a more compressed time scale. Advanced high strength steels (AHSS), which contain multiple phases (e.g., ferrite, bainite, pearlite, austenite, martensite) and may exhibit phase transformation with strain, are fertile ground for ICME. However, prediction of macro-scale constitutive behavior based upon the multi-scale physical, chemical, and mechanical phenomena in these complex microstructures is a formidable challenge for ICME. Unprecedented collaboration between universities, industry, and government labs will be required to address AHSS development, archiving of data, and the implementation of these materials into products that benefit the American consumer. This presentation began with a brief overview of a new DOE/USAMP (US Automotive Materials Partnership)-funded ICME program aimed at the development of Generation Three (Gen 3) AHSS. The lack of commercial Gen 3 steels that meet DOE targets poses significant challenges to this new ICME program. The program is meeting these challenges following four thrusts: (1) making Gen 3 steels to meet DOE targets; (2) passing experimental measurements of key properties to an ICME model that produces microstructure-based constitutive models; (3) material evaluation in forming and vehicle performance simulations; and (4) archiving all relevant data for easy use by others in the program. Although the Gen 3 AHSS ICME program is "mission oriented," it demonstrates an important role for universities in providing the necessary foundation in fundamental materials science and engineering.

Integrated Computational Methods for Composites Materials (ICM2): Lessons Learned from Integration Planning and Feasibility Demonstrations
Lara Liou, GE Aviation

General Electric Aviation and Lockheed Martin Aeronautics have teamed through an Air Force Research Lab program to conduct a demonstration of an Integrated Computational Materials Engineering digital framework that links material processing, property, and structural relationships to account for processability, manufacturability, and system performance, with the goal of demonstrating that the usage of integrated models can contribute to future airframe and engine designs with dramatic reductions in development time and cost. Both engine and airframe Foundational Engineering Problems will be demonstrated through a common digital framework, to be developed and validated on increasingly complex articles. The modeling tools, which cover composite process models, localized mechanical behavior models, and global design models (Convergent Manufacturing Technology's COMPRO CCA, Autodesk's Autodesk Simulation Composite Analysis, and an Abaqus-based University of Michigan micromechanics model), are centrally linked via a commercially available model integration interface (Phoenix Integration's ModelCenter®). Detailed plans of what data will be passed among the various models, as well as how it will be passed, have been completed. Creation of these detailed process maps and simple feasibility demonstrations has raised inherent technical and logistical challenges to integrating commercially available multi-scale models. Robust model integration methods to address these challenges were discussed.

The exercise of drawing up detailed process maps for what to integrate and how to integrate the selected tools and models resulted in key early lessons learned and identification of tools critical to long-term, robust, and sustained application of Integrated Computational Methods for Composites Materials (ICM2). The key lessons learned include:

• Individual models must be relatively mature and validated before an ICM2 problem can be confidently defined.
• FE analysis-based tools within the same digital thread should utilize the same FE solver.
• The team ideally should consist of at least one expert for each tool and its underlying model, as well as an expert in the digital thread and integration interface tools and programming.
• Mesh interpolation between finite element analysis based tools will likely be required, and a commercially available tool for this type of composite mesh interpolation is key for robust and sustained application of ICM2.

Residual Stress Engineering of Rotating Nickel Components
Terry Wong, Aerojet Rocketdyne

The PW-8 project, the Nickel Base Superalloy Residual Stress Foundational Engineering Problem program, aims to develop a multi-disciplinary analysis approach to predict and incorporate bulk residual stress in the design of a component. This will allow for the design of a part that is optimized for a particular application or design intent (such as weight, performance, or reduction of scrap). To accomplish this goal, the PW-8 team has two major tasks: 1) the development of an infrastructure and associated tools needed to predict and incorporate bulk residual stress in the design stages of a part and 2) a demonstration task that will validate the infrastructure and tools developed.

The Data Management task is the foundational element of the larger infrastructure development task. The goal for the data management work is to develop an infrastructure that allows the various team members in PW-8 to share data and to store data in a database. This not only ensures that data generated by this program is never lost but also allows for data to be transferred in a secure manner. In order to meet the goal of a system that securely stores and shares data, three major tasks must be accomplished: 1) validation that there is a secure method to transfer data between team members and that the method of storing data is also secure, 2) development of a database schema, and 3) development of a means to easily upload data into the database and download data from the database.

Of the three, the most difficult task was to create a system in which the transfer and storage of data was secure. Various proposals were made to ensure the security of the data, but many of them were rejected, not because the proposal was actually insecure but because it was not acceptable under each team member's IT security policies. The process of getting each team member's IT department to approve the security of the data transfer and storage took more effort than anticipated. In the end, the team members, along with their respective IT departments, approved the use of TRUcentrix as a tool to transfer data and the use of an ASM International hosted database to store data.

The development of a database schema is a necessary step in creating the database to store all the data generated (via physical tests as well as by models). A big help in creating the database schema was to use the schema generated by Dr. Steven Arnold from NASA's Glenn Research Center. Using NASA's schema eliminated the need to start the database build from scratch and allowed us to focus on the specific needs of storing residual stress data.

The last task involves the development of tools that make the input of data easy. The more difficult it is to enter data into a database, the more likely data will not be entered. Excel-based templates are being developed in order to make data entry as easy as possible.
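The schema, upload, and download tasks described for PW-8 can be illustrated with a small sketch. It assumes a local SQLite database purely for illustration; the table names, column names, and sample values are hypothetical and are not taken from the PW-8 database or from the NASA schema.

```python
import sqlite3

# Hypothetical, minimal schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS specimen (
    specimen_id TEXT PRIMARY KEY,
    alloy       TEXT NOT NULL,
    process     TEXT
);
CREATE TABLE IF NOT EXISTS residual_stress (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    specimen_id TEXT NOT NULL REFERENCES specimen(specimen_id),
    source      TEXT NOT NULL CHECK (source IN ('test', 'model')),
    depth_mm    REAL NOT NULL,
    stress_mpa  REAL NOT NULL
);
"""


def open_database(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def upload(conn, specimen, rows):
    """Insert one specimen and its measurements in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO specimen VALUES (?, ?, ?)", specimen)
        conn.executemany(
            "INSERT INTO residual_stress "
            "(specimen_id, source, depth_mm, stress_mpa) VALUES (?, ?, ?, ?)",
            [(specimen[0], *row) for row in rows],
        )


def download(conn, specimen_id, source):
    """Return (depth_mm, stress_mpa) pairs for one specimen and data source."""
    cur = conn.execute(
        "SELECT depth_mm, stress_mpa FROM residual_stress "
        "WHERE specimen_id = ? AND source = ? ORDER BY depth_mm",
        (specimen_id, source),
    )
    return cur.fetchall()


# Made-up example values, for illustration only.
conn = open_database()
upload(
    conn,
    ("disk-001", "nickel superalloy", "forged"),
    [("test", 0.5, -420.0), ("test", 1.0, -310.0), ("model", 0.5, -395.0)],
)
print(download(conn, "disk-001", "test"))
```

Keeping measured and modeled values in the same table, distinguished by a `source` column, is one possible design choice; it makes test-versus-model comparisons a single query.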

The two big lessons learned from PW-8 in relationship to data management are:

• It is essential to work with each company's IT and IT Security people to develop a system that satisfies the security requirements of each company.
• Database schema development should not be reinvented but should begin with an already established schema.

Perspectives on MAI Programs
Dr. Mike Glavicic, Rolls Royce

The current MAI (AFRL's Metals Affordability Initiative) programs are structured independently from one another, in that no standard methodology exists that describes what data is to be archived at the conclusion of a program. Nor does there exist a standard protocol for how to archive things on a server or some other computer resource at the conclusion of a program. As a result, the current practices of MAI programs:

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program.
• Do not have a formalized requirement for how data is to be packaged and stored at the end of a program. Hence, even in the best of cases, in which all of the data is stored, the data is archived in a simple folder fashion that is not searchable in any manner. Moreover, even if all of the data is present, it is extremely cumbersome to untangle the data for future use using a file folder strategy.

Hence, at the end of the program, both the government and MAI team are left to manage the data and results in an unregimented fashion.
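As a rough illustration of how a plain folder archive could be made searchable, the sketch below walks a directory tree and builds a manifest with a path, size, checksum, and free-text keywords for each file. This is not an MAI protocol; the manifest fields, the function names, and the example files are all assumptions made for illustration.

```python
import hashlib
import json
import os
import tempfile


def build_manifest(root):
    """Walk an archive folder and record path, size, and checksum per file."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            entries.append({
                "path": os.path.relpath(full, root).replace(os.sep, "/"),
                "bytes": os.path.getsize(full),
                "sha256": digest,
                "keywords": [],  # hypothetical field, filled in by the team
            })
    return sorted(entries, key=lambda e: e["path"])


def search(manifest, term):
    """Return paths whose name or keywords contain the search term."""
    term = term.lower()
    return [
        e["path"] for e in manifest
        if term in e["path"].lower()
        or any(term in k.lower() for k in e["keywords"])
    ]


# Usage with a throwaway folder standing in for a program archive.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "raw", "tensile"))
    with open(os.path.join(root, "raw", "tensile", "run01.csv"), "w") as fh:
        fh.write("strain,stress_mpa\n0.001,210\n")
    with open(os.path.join(root, "final_report.txt"), "w") as fh:
        fh.write("placeholder text\n")
    manifest = build_manifest(root)
    manifest[0]["keywords"].append("summary")  # tag final_report.txt
    print(json.dumps(manifest, indent=2))
    hits = search(manifest, "tensile")
```

A manifest of this kind could be stored alongside the archive as a JSON file, so that a later program can at least locate and verify files without opening every folder.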

The lack of a regimented protocol for how to package a program's results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal of integrating the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models, and other information that were required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted for the recreation of data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. Although the RR-10 program was able to overcome these issues and become a successful program, it should be noted that other programs that encounter similar problems will likely not share that success, as RR-10 was very fortunate to have the means to overcome these serious issues, which were a result of the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government-funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features that will be required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs, with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ
Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and Suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals, Inc.) are working together on the PW-11 Team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security and access control, and the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework (GE Forge, Vanderbilt Forge, and Semantic Web technology), and one technology will be selected for the next phase of the program.


Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single database searches to multiple and disparate database searches, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work upfront for implementation, as it needs the development of a common vocabulary and relationships.

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST and develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
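One common way to flow uncertainty through a chain of linked models is Monte Carlo sampling; the source does not say which UQ method PW-11 uses, so the sketch below is only an illustration of the general idea. The process-to-microstructure model is a hypothetical power law, the property model uses the Hall-Petch form, and every coefficient and distribution is made up.

```python
import random
import statistics


def grain_size_um(cooling_rate_c_per_s):
    """Toy process-to-microstructure model (hypothetical power law)."""
    return 50.0 * cooling_rate_c_per_s ** -0.5


def yield_strength_mpa(d_um, sigma0=100.0, k=600.0):
    """Hall-Petch form, sigma_y = sigma0 + k / sqrt(d); illustrative coefficients."""
    return sigma0 + k / d_um ** 0.5


def propagate(n_samples=10000, seed=0):
    """Sample the process input, push each sample through both models,
    and summarize the spread of the predicted property."""
    rng = random.Random(seed)
    strengths = []
    for _ in range(n_samples):
        rate = rng.lognormvariate(0.0, 0.2)             # process variability, median 1 C/s
        d = grain_size_um(rate) * rng.gauss(1.0, 0.05)  # scatter added at the microstructure step
        strengths.append(yield_strength_mpa(d))
    return statistics.mean(strengths), statistics.stdev(strengths)


mean_mpa, std_mpa = propagate()
print(f"yield strength: mean {mean_mpa:.1f} MPa, std {std_mpa:.1f} MPa")
```

Because uncertainty is injected at each step rather than only at the input, the output spread reflects both the process variability and the scatter assumed for the intermediate model.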

ASM's Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project
Scott D. Henry, ASM International

ASM International, the world's largest association of metals-focused materials engineers, scientists, technicians, educators, and students, has launched a series of initiatives under the "Computational Materials Data (CMD) Network" umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design Ltd. The main objectives of SMDDP are to:

• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system.
• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets.
• Develop and carry out a series of test problems that represent relevant use cases for the repository.
• Make data open to the materials data community for use in data analytics, modeling, and educational purposes.
• Actively engage the materials data community and widely disseminate the findings from the project.
• Develop and implement data capture and curation procedures that can serve as models for other data repositories.

The basic philosophy behind the project is that standards often emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated, structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project, launched in late 2013 and planned as an 18-month program, is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future
Ben Blaiszik (1), Kyle Chard (1), Jim Pruyne (1), Rachana Anathakrishnan (1), Steve Tuecke (1, 2), Ian Foster (1, 2)
(1) University of Chicago, Computation Institute; (2) Argonne National Laboratory, MCS Division

During this talk we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer as well as identity, profile, and group management to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully-formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.

Semantics for Deep Interoperability
Samuel Chance, iNovex

Semantic technology is based on mature web standards and is intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented. It started with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web, riding atop the same infrastructure (HTTP). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis on the technology's foundation in formal logic, which enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being evaluated as a solution for a community or enterprise.
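The way "triples" link to each other to form a web of data can be sketched in a few lines of plain Python. This is a teaching sketch, not an RDF implementation: the `ex:` names are a made-up vocabulary rather than a published ontology, and the sample's property value is illustrative.

```python
# Each fact is a (subject, predicate, object) triple. The object of one
# triple can be the subject of another, which is what links facts together.
TRIPLES = {
    ("ex:Al6061", "ex:isA", "ex:AluminumAlloy"),
    ("ex:Al6061", "ex:hasElement", "ex:Mg"),
    ("ex:Al6061", "ex:hasElement", "ex:Si"),
    ("ex:Sample42", "ex:madeOf", "ex:Al6061"),
    ("ex:Sample42", "ex:yieldStrengthMPa", 276),  # illustrative value
}


def match(pattern, triples=TRIPLES):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        (t for t in triples
         if all(p is None or p == v for p, v in zip(pattern, t))),
        key=str,
    )


def elements_of_sample(sample):
    """Follow two links: sample -> material -> elements."""
    found = []
    for _, _, material in match((sample, "ex:madeOf", None)):
        found += [o for _, _, o in match((material, "ex:hasElement", None))]
    return sorted(found)


print(elements_of_sample("ex:Sample42"))
```

No triple states directly which elements the sample contains; the answer comes from following the `ex:madeOf` link to the alloy and then its `ex:hasElement` links, which is the kind of traversal a shared vocabulary makes possible across datasets.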


Big Data and Semantic Technologies
Harlan Shober, RJ Lee Group

The presentation started with a description of entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS). To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing/characterization and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described; it consisted of several elements: contribute, index, search, analyze, and archive. While bearing in mind the needed functionality, a string of universal scientific data management process categories was then presented with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for Resource Description Framework (RDF) were applied to a notional example with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described."

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: How do you process the quantity and variety of data? How do you store it? How do you find the data you care about? And how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
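Tokenization, the first of the NLP steps listed above, can be sketched with a regular expression. This is a minimal illustration rather than the presenter's tooling: the token rules are assumptions chosen so that hyphenated alloy-system names and numbers stay intact, and the example sentence is made up.

```python
import re

# Alternatives are tried in order at each position in the text.
TOKEN = re.compile(r"""
    \d+(?:\.\d+)?(?:[eE][-+]?\d+)?   # numbers such as 6061, 0.2, 1e-3
  | [A-Za-z]+(?:-[A-Za-z0-9]+)*      # words and hyphenated terms such as Al-Mg-Si
  | [^\sA-Za-z0-9]                   # any other single non-space symbol
""", re.VERBOSE)


def tokenize(text):
    """Split text into number, word, and symbol tokens."""
    return TOKEN.findall(text)


def numeric_tokens(tokens):
    """Pick out the tokens that start with a digit."""
    return [t for t in tokens if t[0].isdigit()]


tokens = tokenize("The Al-Mg-Si alloy 6061 was aged at 175 C for 8 h.")
print(tokens)
print(numeric_tokens(tokens))
```

A general-purpose tokenizer that splits on every hyphen would break "Al-Mg-Si" into three unrelated tokens, which is one small example of why domain text may need its own rules before later steps such as tagging or parsing.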

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.


The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.


Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were "rolled up" from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often, fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; a data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management: consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose
- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management; difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most-important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure: evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel
- Data associated with publications should be shared along with the publications: not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)

Government-Led Priorities

1. Lead data strategy/approach development
- Hold meeting like this one in 6-12 months with purpose of developing roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus, e.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability, awareness, and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools making data management as easy as possible
- Promote standards development/common data structures/shared knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating a standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure

4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10 %: repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST): creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors, not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials-related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches supported are limited to too-fundamental single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or alternatively work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community

Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, Inovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp.
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept. of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, Questek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl. Institute of Standards & Tech.
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


Introduction

A key objective for the Materials Genome Initiative (MGI) is access to digital materials data. Access may be for an individual, a team, company, or an entire community of interest. There are several challenges to providing access to digital materials data for which there are presently no universally accepted best practices. Under the MGI, Federal agencies have funded research and development designed to explore, develop, and apply MGI methodologies. Several of these efforts either focus on an infrastructure for sharing data or emphasize data sharing as a key requirement of the overall technical effort.

One objective of starting several MGI-focused programs concurrently was that the approaches, lessons learned, and best practices would be shared between programs in order to accelerate the overall development of a materials innovation infrastructure (MII). This workshop assembled representatives from Federally funded efforts where a critical component of the overall research effort involves sharing of materials data in digital format. The intent of this workshop was for the participants to provide insights into the goals of their effort, details of their approach to sharing digital materials data, challenges they face, and lessons learned. This exchange of ideas will not only enhance the likelihood for success of the individual efforts, but will also speed the community's understanding of how best to proceed in developing an MII.

During the workshop, participants were asked to identify the top three next steps needed to be taken within the materials community and by government agencies in addressing the challenges associated with digital materials data. The following is a list of priorities derived from those inputs for each sector.

Community
• Develop and deploy standards for data and federated/collaborative environments
• Communicate value and need for digital materials data
• Define critical data to be compiled
• Engage Community
• Explore and leverage data solutions developed outside materials community
• Train students in these emerging areas
• Establish community norms for publishing materials data

Government
• Lead data strategy/approach development
• Incentivize data deposition
• Facilitate data deposition (policy and technology)
• Support data sharing and deposition (provide resources)
• Support data creation
• Enhance data management competency
• Provide quality datasets for model development/validation

MGI Materials Data Workshop Organizing Committee

Air Force Research Laboratory (AFRL)
Dr. Charles Ward & Mr. Clare Paul

Army Research Laboratory (ARL)
Dr. John Beatty & Mr. Wayne Ziegler

Energy Efficiency and Renewable Energy, Department of Energy (DOE-EERE)
Mr. William Joost

National Aeronautics and Space Administration (NASA)
Dr. Stephen Smith & Dr. Dennis Griffin

National Institute of Standards and Technology (NIST)
Dr. James Warren

Office of Naval Research (ONR)
Dr. William Mullins & Dr. Kenny Lipkowitz

Agenda
MGI Materials Data Workshop

Tec^Edge, 5000 Springfield Street, Dayton, OH

15 - 16 July 2014

Tuesday

074 – 0800 Check-in
0800 – 0815 Chuck Ward (AFRL) – Introductory remarks

Discussion Lead: Will Joost (DOE-EERE)
0815 – 0845 Jim Warren (NIST) – Data Infrastructure
0845 – 0915 John Allison, Brian Puchala (Univ. Michigan) – PRISMS/ALMMII
0915 – 0945 Carrie Campbell (NIST) – ChiMaD
0945 – 1015 Discussion
1015 – 1030 Break

Discussion Lead: John Beatty (ARL)
1030 – 1100 Marco Fornari (Central Michigan Univ.) – AFLOWLIB.org
1100 – 1130 Surya Kalidindi (Ga Tech) – Mosaic of Microstructure/IGERT-CIF21
1130 – 1200 Discussion
1200 – 1230 Lunch

Discussion Lead: Bill Mullins (ONR)
1230 – 1300 Tom Searles (MDMi) – Cast Aluminum
1300 – 1320 Dongwon Shin (Oak Ridge National Laboratory, ORNL) – Cast Aluminum
1320 – 1340 Jake Zindel (Ford Motor Co.) – Cast Aluminum
1340 – 1415 Discussion
1415 – 1430 Break

Discussion Lead: Jim Warren (NIST)
1430 – 1500 Jason Sebastian (Questek) – Cast Iron
1500 – 1530 Lou Hector (General Motors) – Advanced Steel
1530 – 1600 Lara Liou (GE Aviation) – Integrated Computational Methods for Composite Materials
1600 – 1700 Discussion
1700 Adjourn for the day

Wednesday

074 – 0800 Check-in

Discussion Lead: Wayne Ziegler (ARL)
0800 – 0830 Terry Wong (Rocketdyne) – Nickel RS FEP
0830 – 0850 Mike Glavicic (Rolls Royce) – MAI Ti Modeling
0850 – 0910 Rajiv Naik (Pratt & Whitney) – MAI Data Informatics, Data Standards, Modeling & UQ
0910 – 0940 Discussion
0940 – 0955 Break

Discussion Lead: Dennis Griffin (NASA)
0955 – 1015 Ro Gorham (National Center for Defense Manufacturing and Machining, NCDMM) – America Makes
1015 – 1045 Richard Serwecki (NASA) – MAPTIS
1045 – 1105 Scott Henry (ASM International) – Computational Materials Data Network
1105 – 1135 Discussion
1135 – 1200 Lunch

Discussion Lead: Clare Paul (AFRL)
1200 – 1230 Ben Blaiszik (Univ. of Chicago/ANL) – Globus
1230 – 1300 Mike Groeber (AFRL) – DREAM.3D/SIMPL
1300 – 1320 Sam Chance (iNovex) – Semantics for Deep Interoperability
1320 – 1340 Harlan Shober (RJ Lee) – Big Data and Semantic Technologies
1340 – 1400 Tim Hawes (Decisive Analytics Corp.) – Natural Language Processing
1400 – 1445 Discussion
1445 – 1500 Wrap up
1500 Adjourn

Presentation Summaries

Welcome and Introduction
Chuck Ward, Air Force Research Laboratory

The primary intent of the workshop is to promote communication between the groups being funded under the Materials Genome Initiative that had significant interest in the management of materials data. The greater awareness is expected to enhance the likelihood for success of the individual efforts and speed the community's understanding of how best to proceed in developing an MII.

The Materials Research Society (MRS) and the Minerals, Metals and Materials Society (TMS) conducted a survey of the community in May/June of 2013 to gauge attitudes and needs in materials data. Over 650 people took the survey, with 74 % of respondents saying they would be willing to share data if it were encouraged as a condition of funding or publishing. The community showed the most interest in gaining access to databases of some of the most basic and essential materials data, such as thermo-physical properties.

The survey asked participants to evaluate advantages of local versus centralized databases. Centralized repositories were perceived by respondents to offer advantages in cost, an ability to share one's data, provisioning of uniform data standards, and an ability to organize and find data. Local repositories were perceived to provide advantages in ease of data input, time efficiency, and flexibility in accommodating new data types.

The survey also asked participants to provide their thoughts on impediments to and motivators for sharing data. Most of the impediments can be attributed to concerns of ownership of data or external data restrictions, while technical issues such as file size and proprietary formatting were also concerns. Overall, participants saw the ability to contribute data to the community for reuse and critique positively. The highest motivator was to enhance one's own research visibility. Interestingly, respondents saw adding metadata as a motivator, presumably due to the perceived higher value of data when it is described more robustly.

MGI Efforts in Infrastructure Development at NIST
Jim Warren, National Institute of Standards and Technology

The presentation provided an overview of NIST activities supporting the creation of a materials data infrastructure. The infrastructure could start by defining the interfaces/APIs (Application Programming Interfaces) with which a user is able to interact for the deposit or retrieval of data. Beyond an interface, the following are essential elements for creation of a viable infrastructure:

• The establishment of repositories to store the data
• Tools to enable digital capture of the information, preferably at the time of creation (computation or experiment)
• Tools to enable markup of the data with sufficient metadata to inform someone else how the data was created, as well as various attributes that would help the user judge the quality of the information
• Assignment of a persistent digital identifier (like the DOI (Digital Object Identifier) for journals) so the data can be cited and discovered by others
• The registration of the availability of the data into some sort of "registry" to enable discovery without prior knowledge of the existence of the repository/specific data features
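
The elements listed above can be pictured with a small sketch. The following Python fragment is purely illustrative and does not describe any NIST system: the class names `DataRecord` and `Registry`, the field names, the placeholder URL, and the identifier scheme (a random UUID standing in for a DOI-like persistent identifier) are all assumptions made for the example.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DataRecord:
    """Hypothetical record: data location plus descriptive metadata."""
    title: str
    repository_url: str   # where the data is stored (assumed field)
    metadata: dict        # how the data was created, quality attributes
    # Stand-in for a persistent identifier such as a DOI (assumption:
    # a random UUID is used only so the sketch is self-contained).
    pid: str = field(default_factory=lambda: f"pid:{uuid.uuid4()}")


class Registry:
    """Hypothetical registry: discovery without knowing the repository."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        """Store the record under its persistent identifier."""
        self._records[record.pid] = record
        return record.pid

    def resolve(self, pid):
        """Look a record up by its persistent identifier."""
        return self._records[pid]

    def search(self, key, value):
        """Return records whose metadata has the given key/value pair."""
        return [r for r in self._records.values()
                if r.metadata.get(key) == value]


registry = Registry()
record = DataRecord(
    title="Example tensile test",                         # made-up entry
    repository_url="https://repository.example/item/1",   # placeholder URL
    metadata={"method": "experiment", "material": "Al alloy"},
)
pid = registry.register(record)
found = registry.search("method", "experiment")
```

Keeping the identifier separate from the storage location is the point of the sketch: a citation can carry the identifier while the registry maps it to wherever the data currently lives.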

Additionally, creation of new policy and standards, as well as development of tools to evaluate the quality of the data and tools to enable data-driven discovery, will be required to manage and use materials data to the greatest extent possible.

NIST is responding to the OSTP (Office of Science and Technology Policy) memos (Dr. John Holdren, 2 Feb 14) by developing practices that will be of use to the community, including a Common Access Platform allowing access across multiple collections of information; research and development of Persistent Identifier (PID) infrastructure with the Research Data Alliance (RDA); and R&D on Data Type Registries with RDA.

Finally, NIST is collaborating with the National Data Service (based at the National Center for Supercomputing Applications) and is developing a set of best practices for Data Management Plans.

The Materials Commons: A Novel Information Repository and Collaboration Platform for the Materials Community
Brian Puchala, Glenn Tarcea, Stravya Tamma, Emmanuelle Marquis, John Allison, University of Michigan

Critical to accelerating the pace of materials science and development is the development of new and improved infrastructure providing a seamless way for materials researchers to share and use materials data and models. To address this need, we are developing the Materials Commons (http://prisms.engin.umich.edu/#/prisms/materialscommons), an information repository and collaboration platform for the metals community in selected technical emphasis areas. We envision the Materials Commons becoming a continuous, seamless part of the scientific workflow process. Researchers will upload the results of experiments and computations as they are performed, automatically where possible, along with the provenance information describing the experimental and computational processes. By associating this data with the experimental and virtual computational materials samples from which it is obtained, the Materials Commons will build process-structure-property relationships, enabling the construction and validation of constitutive and process models.

The Materials Commons website provides an easy-to-use interface for uploading and downloading data and data provenance, searching, and sharing data. At its core, the Materials Commons consists of a secure data storage cluster, an application for efficiently uploading and downloading large data sets, and a REST (REpresentational State Transfer) based API to access and extend the capabilities of the repository. The API allows for features such as automated data upload from experiments and computations, seamless integration of computational models, and algorithmic analysis of process-structure-property relationships. The Materials Commons is a central thrust of the Center for PRedictive Integrated Structural Materials Science (PRISMS).
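
As a rough illustration of how provenance can tie data to samples, the sketch below links a sample, the processes applied to it, and the measured data so that process-structure-property combinations can be listed. It is not the Materials Commons data model or API; every class, field, and value here is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str           # e.g. a heat treatment step (hypothetical)
    parameters: dict


@dataclass
class Measurement:
    kind: str           # "structure" or "property"
    name: str
    value: float
    unit: str


@dataclass
class Sample:
    sample_id: str
    history: list = field(default_factory=list)        # ordered Process steps
    measurements: list = field(default_factory=list)

    def apply(self, process):
        """Append a processing step to the sample's provenance."""
        self.history.append(process)

    def record(self, measurement):
        """Attach a measurement to the sample."""
        self.measurements.append(measurement)


def psp_rows(sample):
    """List (process chain, structure, property) combinations for a sample."""
    chain = " -> ".join(p.name for p in sample.history)
    structures = [m for m in sample.measurements if m.kind == "structure"]
    properties = [m for m in sample.measurements if m.kind == "property"]
    return [(chain, s.name, p.name) for s in structures for p in properties]


# Made-up values, for illustration only.
s = Sample("sample-001")
s.apply(Process("solution treatment", {"temperature_C": 500}))
s.apply(Process("aging", {"temperature_C": 200, "time_h": 4}))
s.record(Measurement("structure", "grain size", 45.0, "um"))
s.record(Measurement("property", "yield strength", 250.0, "MPa"))
rows = psp_rows(s)
```

Because the processing history travels with the sample, any later measurement is automatically associated with the steps that produced the material.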

NIST Materials Data Informatics Efforts
Carrie Campbell, National Institute of Standards and Technology

The presentation covered NIST's pilot efforts in creating a data infrastructure focusing on Phase-Based Property Data. Phase-based data are diverse data that include 1, 2, 3, and 4-dimensional data, are semi-structured, and often include incomplete data sets. The figure below provides an overview of NIST's data efforts supporting the MGI.

The NIST Materials Data Repository (https://materialsdata.nist.gov/dspace/xmlui) is a customized DSpace repository for materials that enables sharing of a variety of data types, including text files, images, and video. The repository also provides a persistent identifier for each entry and allows users to choose from a variety of license types. NIST is also working on two data curation tools for phase-based data. The Thermodynamics Research Center (http://trc.nist.gov) is extending the ThermoML markup language and Guided Data Capture software tool to metallic systems, with an initial emphasis on phase equilibria and thermochemical data. The Materials Data Curation System (MDCS) employs user-defined XML-based schemas to curate data. The MDCS allows either web-based data curation or curation using a REST API. The entered data is stored in a non-relational database repository, can be searched using a SPARQL (SPARQL Protocol and RDF Query Language) interface, and can be integrated with a variety of scientific workflow tools.
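
To make the idea of schema-guided curation of incomplete data concrete, here is a minimal sketch using only the Python standard library. It is not the MDCS, and it does not perform real XML Schema validation (the standard library has no XSD validator); the list of required fields stands in for a user-defined schema, and the element names and values are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical "schema": required child elements of a <record>.
REQUIRED_FIELDS = ("material", "phase", "property", "value", "unit")


def build_record(values):
    """Create a <record> element from a dict of field name -> text."""
    root = ET.Element("record")
    for name, text in values.items():
        ET.SubElement(root, name).text = str(text)
    return root


def missing_fields(record):
    """Return required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS
            if not (record.findtext(name) or "").strip()]


# Made-up, incomplete entry: the unit is missing.
entry = build_record({
    "material": "Al-Cu",
    "phase": "fcc",
    "property": "diffusivity",
    "value": 1.0e-14,
})
xml_text = ET.tostring(entry, encoding="unicode")
gaps = missing_fields(entry)
```

A curation tool could store such a record anyway and flag the gaps, which matches the observation that phase-based data sets are often incomplete.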

AFLOW Consortium
Marco Fornari, Central Michigan University

The overall goal of the Materials Genome Initiative (MGI) is to accelerate the discovery, development, and deployment of new materials with ad hoc functionalities. The AFLOW Consortium focuses on the software infrastructures to achieve the MGI goals by using first principles calculations. The emphasis is on the data that must be generated, archived, validated, and post-processed efficiently, both to identify new materials and to distill meaningful quantitative relationships within the data. This extends the solution to the inverse problem of chemically replacing rare and costly elements in critical technologies.

The AFLOW Consortium provides access to the repository AFLOWLIB.ORG <http://AFLOWLIB.ORG> that publicly distributes data on energetics and thermodynamic stability for more than 600,000 compounds and band structures for roughly 300,000 materials. The repository includes bulk crystals and alloys. Data were computed with the AFLOW high-throughput framework using density functional theory. The results of the calculations can be queried with the recently developed REST-API that standardizes and labels the data for retrieval purposes. The fields defined in the AFLOW REST-API include a material object identifier (AUID), data cloud location (AURL), ownership and sponsorship keywords, and data-specific labels (COMPOSITION, ENTROPY, EGAP, …). Simple scripting languages can be used to automatically download from the AFLOWLIB repository. The AFLOW approach has been extensively applied on platinum-group alloys, thermoelectric materials, magnetic materials, topological insulators, scintillators, etc. The AFLOW REST-API can be extended to include experimental data. For further details on the database, the reader is directed to http://materials.duke.edu/auro/AUROARTICULA/j.commatsci.2012.02.002.pdf.
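
A short sketch of the kind of scripting mentioned above follows. It only shows how labeled fields might be turned into a dictionary once an entry has been downloaded; it makes no network call. The `keyword=value | keyword=value` layout, the function name `parse_entry`, and the sample text are assumptions for illustration, and the sample identifiers and numbers are placeholders rather than real database values.

```python
def parse_entry(text, sep="|"):
    """Split 'keyword=value | keyword=value' text into a dict.

    Assumption: each field is 'keyword=value' and fields are separated by
    `sep`. Adjust if the service returns a different layout.
    """
    fields = {}
    for chunk in text.split(sep):
        if "=" not in chunk:
            continue  # skip fragments that are not keyword=value pairs
        key, value = chunk.split("=", 1)
        fields[key.strip().lower()] = value.strip()
    return fields


# Made-up sample text; identifiers and numbers are placeholders.
sample = (
    "auid=example:0000 | aurl=server.example:PATH/TO/ENTRY | "
    "composition=1,3 | egap=0.5"
)
entry = parse_entry(sample)
band_gap = float(entry["egap"])
```

In a real script the text would first be fetched from the entry's location, for example with `urllib.request.urlopen`; the exact URL form is not given here and is therefore left out.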

Materials Innovation Infrastructure: Lessons Learned and Future Plans
Surya Kalidindi, Georgia Institute of Technology

A central activity in materials research is the exploration of processing-structure-property (PSP) linkages. Workflow is defined as the sequence of steps employed for establishing PSP linkages of interest to any specific engineering/technology application.

High Throughput Explorations of PSP Linkages: We must treat a hierarchical material as a complex system, which means embracing and exploiting uncertainty quantification and complexity management concepts/tools. It also means we need to design, develop, and validate decision support systems that will leverage the best current understanding (with its uncertainties) and provide objective guidance on future effort investment (combination of experiments and simulations). This means leveraging Data Science to effectively feed the decision support systems described above by developing and deploying high throughput strategies for obtaining high value information from both experiments and models.

What are integrated workflows? Integrated workflows are those that utilize the best combination of experiments and simulations in extracting robust and reliable PSP linkages. This requires engaging and exploiting cross-disciplinary expertise that includes materials science, manufacturing, systems approaches, uncertainty quantification, computational science, and data and information sciences. Through this, one must ensure that the workflows output the critical information needed by design and manufacturing stakeholders in the materials development value chain. Integration demands expertise sharing, which essentially is the core element of a collaboration. Cross-disciplinary integration demands effective collaborations, such as those one might set up using the modern tools of the day, i.e., an e-Collaboration.

Motivators for e-Collaborations: e-Collaborations are central to enhancing the productivity of materials development activities in several ways:

• individual research groups can identify and engage complementary expertise needed for their projects
• organizations can coordinate multiple teams and multiple projects to maximize output/impact
• organizations can identify and seek the right partners that enhance their individual and combined value

e-Collaborations can be open, semi-open, or closed, depending on the needs of the specific organization. Productivity enhancement leading to dramatic reductions in cost and time expended in materials development is the main incentive for e-collaborations.

Main Impediments to e-Collaborations

Lack of tools to identify the "best" collaborators
• Need software tools that quantify more precisely the expertise of individuals/groups/organizations
• Need software tools for allowing rich annotations and e-recording of discussions. Need analytic tools for identification, extraction, and management of metadata (i.e., high value abstractions from the raw data)
• Need searchable semantic metadata databases (these are lightweight and can be centralized and customized for specific user communities)
• Need software tools for tracking provenance (i.e., who/what/when/how generated, modified, etc.)

Lack of experience with the use of integrated workflows
• Need software tools for e-recording of workflows and development of searchable workflow databases
• Need recommendation schemes that suggest the best workflow(s) based on recorded prior experience; this provides a means to transfer expertise acquired by an organization to other teams and other projects (even in the same organization)
• Leads to possible automation and dramatic enhancement of productivity

MINED Group's Past Experience (~2005 - ~2008): Funded by the Army Research Office, ARO (David Stepp); engaged computer science (CS) graduates to produce a single-user platform that integrated certain software codes previously produced by our group; software development was subsequently abandoned, but two CS graduates were recruited into PhD programs in Materials Science and Engineering/Mechanical Engineering (MSE/ME), which dramatically altered the direction of research in the group. A brief chronology of development:

• Senior Design Project (2011-2012): An online portal designed for sharing data files and tools; learned the importance of user interfaces and the need for professional code developers and software architects
• HUBzero Instantiation (2012-2013): Built a hub by customizing HUBzero for hierarchical structural materials; there are several advantages in terms of accessing and executing codes (on the server side), but there are significant barriers to establishing fast and productive cross-disciplinary e-collaborations

• GTRI Integration Platform (2013-2014): Engaged professional code developers and software architects to design a multi-user web portal that allows data and code sharing while tracking provenance and workflows; underestimated the resources needed for such a project
• MATIN Collaboration Environment (2013-2014): Lightweight integrated software shell that allows users to seamlessly interface with numerous cloud services offered through the web through their APIs. Examples include GitHub for versioning and archival of codes, Authorea for collaborative editing of documents, figshare for citable publication of research, Plotly for collaborative data analysis and visualization, and Google+ and LinkedIn for e-teaming and networking. MATIN focuses on building an emergent community, Materials & Manufacturing Informatics (MMI). MATIN provisions web services for software integration platforms and innovation infrastructure. Examples of MATIN:
  - Spatial Statistics: https://github.com/tonyfast/SpatialStatisticsFFT
  - PyMKS: http://openmaterials.github.io/pymks/index.html
  - Example discussion about periodic boundary conditions in running Finite Element (FE or FEM) simulations: https://github.com/sfepy/sfepy/issues/267

In the platform developed, it is noted that the user or user's organization can set permissions at the level of individual datasets/codes on the degree and type of sharing.

Central Challenge: Activation. The transition from traditional disciplinary workflows to the integrated workflows needs to overcome an activation barrier. There is no clear incentive to get ahead of the curve in anticipating the upcoming transformations in how materials development will be pursued in the future. Open (reproducible) science alone is not a sufficiently strong motivator for a materials development community with strong interests in intellectual property development; indeed, it might actually be perceived as a negative. There is a need to explore and refine what incentives work for individual researchers (e.g., increased productivity).

Future Plans: Continue to develop and deploy MATIN at GT and any other organizations/groups that express an interest in working with us. Design, create, and distribute a suite of "mobile first" Apps designed to make data ingest into collaboration environments such as MATIN, from both material characterization equipment and multi-scale models, an enhanced user experience with friendly and natural interfaces. Proactively promote/incentivize data/metadata ingest into publicly shared databases. Work with professional societies to announce and run new competitions/awards for recognizing and rewarding good practices by the members of the broader materials innovation user community (e.g., Best Documented Data Award, Best Documented Tool Award, Best Documented Workflow Award). Develop and deliver new workshops for training the new generation of graduate students, postdocs, and junior faculty in the emerging workflows of Materials & Manufacturing Informatics.

High Performance Cast Aluminum Alloys for Next Generation Passenger Vehicle EnginesDongwon Shin Materials Science and Technology Division Oak Ridge National Laboratory

12

13

Oak Ridge National Laboratory (ORNL) teamed with Chrysler and Nemak13 to develop next-shy‐generation high-shy‐performance cast aluminum alloys that have13 improved castability high-shy‐temperature

strength and fatigue13 performance At this point the13 data management practice is limited13 to13 analytical data spreadsheets other than13 the CALPHAD computational thermodynamic databases and13 proprietary

property databases for casting simulation As the project goes on more robust data schema will bedeveloped13 to13 store various types of data populated13 from the experiments and13 calculations

Besides the data structure design, the speaker discussed the necessity of a new approach to identifying key descriptors from large datasets that can guide the design of new materials. Alloy design requires balancing extremely complex relationships among a large number of descriptors. Some examples of these descriptors include the phase fraction of key intermetallics, the lattice parameters of the matrix, and the diffusion kinetics of solute atoms. However, optimizing these quantities relies on good physical metallurgy intuition coupled with clever experiments, dogged sleuthing, and a healthy dose of luck. Such descriptor-based predictions of materials properties date back to the beginning of modern solid-state chemistry and physics. As shown in the Darken-Gurry map used to predict the solubility of elements in tantalum, intuitive descriptors such as electronegativity and atomic radius can be used for materials design. However, as the chemistry and physics of materials become more complex, intuition-based search may no longer work. Thus, more advanced statistical analysis and data mining approaches might be required to accelerate the design of complex multi-component, multi-phase alloys.
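As an illustration of the descriptor-based screening described above, the following minimal Python sketch applies a Darken-Gurry style test: a solute is flagged as a likely candidate for solid solubility when it falls inside an ellipse centered on the solvent in radius-electronegativity space. The tolerances (15% in radius, 0.4 units in electronegativity) are the commonly quoted rule-of-thumb values, not numbers from the talk. The `Element` class, the function names, and the numeric inputs (rough values for Ta and Nb, plus a made-up element "X") are assumptions for illustration only and should be replaced with vetted data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    symbol: str
    radius: float             # metallic radius in angstroms (illustrative value)
    electronegativity: float  # Pauling scale (illustrative value)


def darken_gurry_score(solvent, solute, radius_tol=0.15, en_tol=0.4):
    """Normalized elliptical distance from the solvent; <= 1.0 is inside the ellipse."""
    # Radius mismatch, scaled by the allowed fractional difference.
    dr = (solute.radius - solvent.radius) / (radius_tol * solvent.radius)
    # Electronegativity mismatch, scaled by the allowed absolute difference.
    de = (solute.electronegativity - solvent.electronegativity) / en_tol
    return (dr ** 2 + de ** 2) ** 0.5


def likely_soluble(solvent, solute, **tolerances):
    """True when the solute lies inside the Darken-Gurry ellipse of the solvent."""
    return darken_gurry_score(solvent, solute, **tolerances) <= 1.0


# Rough, illustrative descriptor values; "X" is a hypothetical element.
HOST = Element("Ta", radius=1.47, electronegativity=1.5)
NB = Element("Nb", radius=1.47, electronegativity=1.6)
X = Element("X", radius=1.90, electronegativity=0.9)

for candidate in (NB, X):
    score = darken_gurry_score(HOST, candidate)
    print(candidate.symbol, round(score, 2), likely_soluble(HOST, candidate))
```

A two-descriptor ellipse like this is exactly the kind of intuition-based rule that the speaker suggests will not scale to multi-component alloys, where many more descriptors would have to be screened statistically.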

Data Related to Cast Aluminum

Jake Zindel, Ford Motor Company

The main focus of our research is the development of CAE (Computer-aided Engineering) tools to support two significant activities in the company: component design (product development) and manufacturing engineering (high-volume production). A key part of our research is conducting experiments to collect data for the development and validation of the CAE tools.

The data we collect consist of mechanical properties, physical properties, microstructural images, and manufacturing process parameters that form the pedigree of the material. This information can be in the form of Excel files, proprietary-software storage files, screen shots, and written reports. Our immediate need is to deal with this large amount of existing information, collected over many years, on a "local" level. The first step is to organize the many independent collections of information in all their forms. The next issue is the time required to enter all this information into the database. Maintenance may also require a significant amount of effort. Lastly, there will also be a need to manipulate the data contained in the database using Excel, Minitab, MATLAB, etc. Therefore, links or easy exportation of data to these types of applications will be necessary.

The next level would involve dealing with information that has been created or collected by others. From an OEM (Original Equipment Manufacturer) perspective, this would involve convincing suppliers to share processing information that they may consider proprietary. In addition to suppliers, universities and national laboratories are also sources of data. Data from all these sources will require some sort of protocol to determine their quality and resolve inconsistencies.


In the end, it seems likely that considerable budget and time will be required to implement comprehensive information databases, which could be the largest hurdle of them all.

QuesTek's MGI/ICME Approach to Alloy Optimization and Design (Including Cast Iron)

Jason Sebastian, QuesTek Innovations LLC

Integrated Computational Materials Engineering (ICME) methods involve the holistic application of different computational models across various length scales to the design, development, and rapid qualification of advanced materials. Using its Materials by Design technologies, QuesTek has successfully applied these ICME methods to the design, development, and full qualification of several advanced alloys (including aerospace structural steels, aerospace gear steels, advanced castable titanium alloys, high-strength corrosion-resistant aluminum alloys, and high-performance cast iron).

QuesTek's ICME methods benefit tremendously from the activities of the US Materials Genome Initiative (MGI). Recognizing that the "10 to 20 years" that it takes for alloy design and development is hindering the rapid deployment of higher-performance materials, the MGI is focused on "cutting in half" the time and cost to design and deploy the novel advanced materials needed to enhance and sustain US competitiveness.

QuesTek also has ongoing activities on the application of ICME methods to the design and development of materials for the energy sector, including advanced materials that enable vehicle lightweighting. Of particular note is an ongoing effort with a major diesel engine manufacturer on the development of high-performance cast iron alloys for engine block applications. QuesTek's efforts have made extensive use of various MGI-type tools and databases, including 1) density functional theory methods for high-throughput screening of cast iron inoculant stabilities (structure, lattice parameter, and stability of various oxide/sulfide phases), 2) liquid iron mobility databases (to help accurately model solidification phenomena), and 3) thermodynamic databases (to examine nano-precipitate stability).

Moving forward, to continue to accelerate the materials development cycle, efforts should remain focused on the development of the fundamental databases (thermodynamic, kinetic, etc.) that enable ICME models. There should also be increased focus on efforts related to the integration of the various computational databases, models, and tools (the "I" in ICME).

Integrated Computational Materials Engineering (ICME) of Generation Three Advanced High Strength Steels

Louis G. Hector Jr., General Motors R&D Center

While traditional material development methods often involve laborious trial-and-error testing, Integrated Computational Materials Engineering (ICME), which falls under the umbrella of the Materials Genome Initiative, offers a means for materials development on a more compressed time scale. Advanced high strength steels (AHSS), which contain multiple phases (e.g., ferrite, bainite, pearlite, austenite, martensite) and may exhibit phase transformation with strain, are fertile ground for ICME. However, prediction of macro-scale constitutive behavior based upon the multi-scale physical, chemical, and mechanical phenomena in these complex microstructures is a formidable challenge for ICME. Unprecedented collaboration between universities, industry, and government labs will be required to address AHSS development, archiving of data, and the implementation of these materials into products that benefit the American consumer. This presentation began with a brief overview of a new DOE/USAMP (US Automotive Materials Partnership)-funded ICME program aimed at the development of Generation Three (Gen 3) AHSS. The lack of commercial Gen 3 steels that meet DOE targets poses significant challenges to this new ICME program. The program is meeting these challenges through four thrusts: (1) making Gen 3 steels to meet DOE targets; (2) passing experimental measurements of key properties to an ICME model that produces microstructure-based constitutive models; (3) material evaluation in forming and vehicle performance simulations; and (4) archiving all relevant data for easy use by others in the program. Although the Gen 3 AHSS ICME program is "mission oriented," it demonstrates an important role for universities in providing the necessary foundation in fundamental materials science and engineering.

Integrated Computational Methods for Composites Materials (ICM2): Lessons Learned from Integration Planning and Feasibility Demonstrations

Lara Liou, GE Aviation

General Electric Aviation and Lockheed Martin Aeronautics have teamed through an Air Force Research Lab program to conduct a demonstration of an Integrated Computational Materials Engineering digital framework that links material processing, property, and structural relationships to account for processability, manufacturability, and system performance. The goal is to demonstrate that the use of integrated models can contribute to future airframe and engine designs with dramatic reductions in development time and cost. Both engine and airframe Foundational Engineering Problems will be demonstrated through a common digital framework to be developed and validated on increasingly complex articles. The modeling tools, which cover composite process models, localized mechanical behavior models, and global design models (Convergent Manufacturing Technology's COMPRO CCA, Autodesk's Autodesk Simulation Composite Analysis, and an Abaqus-based University of Michigan micromechanics model), are centrally linked via a commercially available model integration interface (Phoenix Integration's ModelCenter®). Detailed plans of what data will be passed among the various models, as well as how it will be passed, have been completed. Creation of these detailed process maps and simple feasibility demonstrations has raised inherent technical and logistical challenges to integrating commercially available multi-scale models. Robust model integration methods to address these challenges were discussed.

The exercise of drawing up detailed process maps for what to integrate and how to integrate the selected tools and models resulted in key early lessons learned and in the identification of tools critical to the long-term, robust, and sustained application of Integrated Computational Methods for Composites Materials (ICM2). The key lessons learned include:

• Individual models must be relatively mature and validated before an ICM2 problem can be confidently defined.

• FE analysis-based tools within the same digital thread should utilize the same FE solver.

• The team ideally should consist of at least one expert for each tool and its underlying model, as well as an expert in the digital thread and integration interface tools and programming.

• Mesh interpolation between finite element analysis-based tools will likely be required, and a commercially available tool for this type of composite mesh interpolation is key for robust and sustained application of ICM2.

Residual Stress Engineering of Rotating Nickel Components

Terry Wong, Aerojet Rocketdyne

The PW-8 project, the Nickel Base Superalloy Residual Stress Foundational Engineering Problem program, aims to develop a multi-disciplinary analysis approach to predict and incorporate bulk residual stress in the design of a component. This will allow for the design of a part that is optimized for a particular application or design intent (such as weight, performance, or reduction of scrap). To accomplish this goal, the PW-8 team has two major tasks: 1) the development of the infrastructure and associated tools needed to predict and incorporate bulk residual stress in the design stages of a part, and 2) a demonstration task that will validate the infrastructure and tools developed.

The Data Management task is the foundational element of the larger infrastructure development task. The goal for the data management work is to develop an infrastructure that allows the various team members in PW-8 to share data and to store data in a database. This not only ensures that data generated by this program is never lost but also allows data to be transferred in a secure manner. In order to meet the goal of a system that securely stores and shares data, three major tasks must be accomplished: 1) validation that there is a secure method to transfer data between team members and that the method of storing data is also secure, 2) development of a database schema, and 3) development of a means to easily upload data into the database and download data from the database.

Of the three, the most difficult task was to create a system in which the transfer and storage of data was secure. Various proposals were made to ensure the security of the data, but many of them were rejected, not because the proposal was actually insecure but because it was not acceptable under each team member's IT security policies. The process of getting each team member's IT department to approve the security of the data transfer and storage took more effort than anticipated. In the end, the team members, along with their respective IT departments, approved the use of TRUcentrix as a tool to transfer data and the use of an ASM International-hosted database to store data.

The development of a database schema is a necessary step in creating the database to store all the data generated (via physical tests as well as by models). A big help in creating the database schema was the schema generated by Dr. Steven Arnold from NASA's Glenn Research Center. Using NASA's schema eliminated the need to start the database build from scratch and allowed us to focus on the specific needs of storing residual stress data.

The last task involves the development of tools that make the input of data easy. The more difficult it is to enter data into a database, the more likely it is that data will not be entered. Excel-based templates are being developed in order to make data entry as easy as possible.

The two big lessons learned from PW-8 in relation to data management are:

• It is essential to work with each company's IT and IT security people to develop a system that satisfies the security requirements of each company.

• Database schema development should not be reinvented but should begin with an already established schema.

Perspectives on MAI Programs

Dr. Mike Glavicic, Rolls-Royce

The current MAI (AFRL's Metals Affordability Initiative) programs are structured independently from one another, in that no standard methodology exists that describes what data is to be archived at the conclusion of a program. Nor does there exist a standard protocol for how to archive things on a server or some other computer resource at the conclusion of a program. As a result, the current practices of MAI programs:

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program.

• Do not have a formalized requirement for how data is to be packaged and stored at the end of a program. Hence, even in the best of cases, in which all of the data is stored, the data is archived in a simple folder fashion that is not searchable in any manner. Moreover, even if all of the data is present, it is extremely cumbersome to untangle the data for future use using a file folder strategy.

Hence, at the end of the program, both the government and the MAI team are left to manage the data and results in an unregimented fashion.

The lack of a regimented protocol for how to package a program's results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal of integrating the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models, and other information required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted to the recreation of data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. Although the RR-10 program was able to overcome these issues and become a successful program, it should be noted that other programs that encounter similar problems are unlikely to share that success, as RR-10 was very fortunate to have the means to overcome these serious issues, which were a result of the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government-funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features that will be required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs, with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ

Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration, data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals Inc.) are working together on the PW-11 team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security, and access control, and the ability to perform system administration without the need for day-to-day intervention by the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework (GE Forge, Vanderbilt Forge, and Semantic Web technology), and one technology will be selected for the next phase of the program.


Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single database searches to multiple and disparate database searches, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work upfront for implementation, as it needs the development of a common vocabulary and relationships.

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST, and develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models, and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
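One common way to flow uncertainty through a chain of linked models is Monte Carlo sampling; the sketch below shows the idea on a two-step process-to-microstructure-to-property chain. It is not the PW-11 method, which the summary does not describe. The linear grain-size model, all coefficients, and the input distributions are hypothetical placeholders; only the Hall-Petch form (strength rising with the inverse square root of grain size) is a standard relation.

```python
import random
import statistics


def grain_size_um(temp_c):
    """Hypothetical process model: grain size grows linearly with temperature."""
    return 10.0 + 0.05 * (temp_c - 900.0)


def yield_strength_mpa(d_um, sigma0=200.0, k=600.0):
    """Hall-Petch form with placeholder coefficients (not fitted to any alloy)."""
    return sigma0 + k / d_um ** 0.5


def propagate(n=5000, temp_mean=950.0, temp_sd=15.0, model_sd=0.5, seed=0):
    """Sample process scatter and model error, then push both through the chain."""
    rng = random.Random(seed)
    strengths = []
    for _ in range(n):
        temp = rng.gauss(temp_mean, temp_sd)  # variability in the process step
        # Add microstructure-model error; clamp to keep the grain size positive.
        d = max(grain_size_um(temp) + rng.gauss(0.0, model_sd), 1e-3)
        strengths.append(yield_strength_mpa(d))
    return statistics.mean(strengths), statistics.stdev(strengths)


mean_mpa, sd_mpa = propagate()
print(f"yield strength: mean {mean_mpa:.1f} MPa, std {sd_mpa:.1f} MPa")
```

Rerunning `propagate` with `temp_sd=0.0` or `model_sd=0.0` gives a rough sense of how much of the output scatter comes from each step, which is the kind of relationship between variabilities the paragraph above refers to.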

ASM's Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project

Scott D. Henry, ASM International

ASM International, the world's largest association of metals-focused materials engineers, scientists, technicians, educators, and students, has launched a series of initiatives under the "Computational Materials Data (CMD) Network" umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design Ltd. The main objectives of SMDDP are to:


• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system.

• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets.

• Develop and carry out a series of test problems that represent relevant use cases for the repository.

• Make data open to the materials data community for use in data analytics, modeling, and educational purposes.

• Actively engage the materials data community and widely disseminate the findings from the project.

• Develop and implement data capture and curation procedures that can serve as models for other data repositories.

The basic philosophy behind the project is that standards often emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated, structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project, launched in late 2013 and planned as an 18-month program, is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future

Ben Blaiszik¹, Kyle Chard¹, Jim Pruyne¹, Rachana Anathakrishnan¹, Steve Tuecke¹,², Ian Foster¹,²

¹University of Chicago Computation Institute, ²Argonne National Laboratory MCS Division

During this talk, we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully-formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.
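The idea of importing metadata from many communities into one searchable index can be sketched with a small inverted index in Python. This is a generic illustration and not the Globus implementation or API; the record layout, the dataset identifiers, the community names, and the functions `build_index` and `search` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical metadata records harvested from two communities.
RECORDS = [
    {"id": "ds-001", "community": "alloys",
     "metadata": {"title": "Al 6061 tensile tests", "technique": "tensile"}},
    {"id": "ds-002", "community": "microscopy",
     "metadata": {"title": "Al 6061 SEM micrographs", "technique": "SEM"}},
    {"id": "ds-003", "community": "alloys",
     "metadata": {"title": "Cast iron fatigue data", "technique": "fatigue"}},
]


def build_index(records):
    """Map each lower-cased metadata term to the set of dataset ids that use it."""
    index = defaultdict(set)
    for record in records:
        for value in record["metadata"].values():
            for term in str(value).lower().split():
                index[term].add(record["id"])
    return index


def search(index, query):
    """Return ids of datasets whose metadata contains every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # .get avoids inserting empty entries into the defaultdict for unknown terms.
    hits = [index.get(term, set()) for term in terms]
    return set.intersection(*hits)


INDEX = build_index(RECORDS)
print(sorted(search(INDEX, "6061")))          # datasets from both communities
print(sorted(search(INDEX, "6061 tensile")))  # narrowed to one dataset
```

Only the lightweight metadata lives in the index; the raw data stays wherever each community stores it, which mirrors the split between cloud-hosted metadata and locally managed data described above.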

Semantics for Deep Interoperability

Samuel Chance, iNovex

Semantic technology is based on mature web standards and is intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when only the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented. It started with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web riding atop the same infrastructure (HTTP). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis on the fact that the technology has a foundation based on formal logic that enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being evaluated as a solution for a community or enterprise.
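To make the notion of linked "triples" concrete, here is a minimal Python sketch in which each fact is a (subject, predicate, object) tuple and a query follows links from one triple to the next. It uses plain tuples rather than an RDF library, and every identifier in it (the `ex:` names, `match`, `elements_in_sample`) is a made-up example rather than part of any standard vocabulary.

```python
# Each fact is one (subject, predicate, object) triple; identifiers are hypothetical.
TRIPLES = {
    ("ex:sample-A1", "ex:madeOf", "ex:alloy-6061"),
    ("ex:sample-A1", "ex:testedBy", "ex:tensile-test-7"),
    ("ex:alloy-6061", "ex:hasElement", "Al"),
    ("ex:alloy-6061", "ex:hasElement", "Mg"),
    ("ex:alloy-6061", "ex:hasElement", "Si"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def elements_in_sample(triples, sample):
    """Follow sample -> alloy -> element links across two triples."""
    elements = set()
    for _, _, alloy in match(triples, subject=sample, predicate="ex:madeOf"):
        for _, _, element in match(triples, subject=alloy, predicate="ex:hasElement"):
            elements.add(element)
    return elements


print(sorted(elements_in_sample(TRIPLES, "ex:sample-A1")))
```

Because the object of one triple is the subject of another, the composition of the sample is never stated directly; the software derives it by walking the graph, which is the sense in which triples form a web of data.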


Big Data and Semantic Technologies

Harlan Shober, RJ Lee Group

The presentation started with a description of the entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS). To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing, characterization, and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described; it consisted of several elements: contribute, index, search, analyze, and archive. Bearing in mind the needed functionality, a string of universal scientific data management process categories was then presented, with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for the Resource Description Framework (RDF) were applied to a notional example, with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described."

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)

Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
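Tokenization, the first of the steps listed above, can be illustrated with a short Python sketch that splits a materials-flavored sentence into tokens and then pairs each number with a unit that follows it. The regular expression, the small `UNITS` set, the helper names, and the sample sentence are assumptions made up for this example; production NLP pipelines use far more robust tokenizers.

```python
import re

# Hyphenated words (e.g. alloy systems), numbers, or any single other symbol.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*|\d+(?:\.\d+)?|[^\sA-Za-z\d]")

# Tiny, hypothetical unit vocabulary for the demonstration.
UNITS = {"C", "K", "h", "min", "MPa", "GPa"}


def tokenize(text):
    """Split text into word, number, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def numeric_facts(tokens):
    """Pair each numeric token with the unit token that immediately follows it."""
    facts = []
    for current, following in zip(tokens, tokens[1:]):
        if current[0].isdigit() and following in UNITS:
            facts.append((float(current), following))
    return facts


sentence = "Al-Mg-Si alloys were aged at 175 C for 8 h."
tokens = tokenize(sentence)
print(tokens)
print(numeric_facts(tokens))
```

Keeping "Al-Mg-Si" as a single token is a deliberate choice here: a general-purpose tokenizer tuned for news text might split it at the hyphens and lose the alloy system as a unit of meaning.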

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.


The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.


Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were 'rolled up' from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often-fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security/control
- Work with instrument/equipment OEMs to get auto ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data, standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management – consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose
- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management; difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most-important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure – evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel
- Data associated with publications should be shared along with the publications. Not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)
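Several inputs under community priority 1 describe a common data package carrying metadata (authorship, provenance) and a name/type/units description of each data type. The sketch below shows one way such a package could be checked and exchanged. It is only an illustration: JSON stands in for the HDF5 file mentioned by participants so that the example needs nothing beyond the standard library, and the required keys and function names are hypothetical, not an agreed standard.

```python
import json

# Hypothetical requirements, loosely following the workshop inputs above.
REQUIRED_METADATA = ("authors", "provenance")
REQUIRED_COLUMN_KEYS = ("name", "type", "units")


def build_package(metadata, columns, rows):
    """Return a data package dict after checking the assumed required keys."""
    missing = [key for key in REQUIRED_METADATA if key not in metadata]
    if missing:
        raise ValueError(f"metadata is missing: {missing}")
    for column in columns:
        absent = [key for key in REQUIRED_COLUMN_KEYS if key not in column]
        if absent:
            raise ValueError(f"column {column.get('name', '?')} is missing: {absent}")
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("each row needs one value per declared column")
    return {
        "metadata": dict(metadata),
        "columns": [dict(column) for column in columns],
        "rows": [list(row) for row in rows],
    }


def save_package(package):
    """Serialize the package to JSON text that other toolsets can read."""
    return json.dumps(package, sort_keys=True)


def load_package(text):
    """Parse JSON text and re-run the same checks on the result."""
    package = json.loads(text)
    return build_package(package["metadata"], package["columns"], package["rows"])
```

Because both writing and reading go through the same checks, independently built toolsets could exchange packages without sharing any other code, which is the point of the federated approach raised above.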

Government-Led Priorities

1. Lead data strategy/approach development
- Hold meeting like this one in 6-12 months with purpose of developing roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus, e.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most-important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools making data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure


4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10%: repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST) – creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors; not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches supported are limited to overly fundamental, single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or alternatively work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community


Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, iNovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan University
Tim Hawes, Decisive Analytics Corp
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, Questek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl Institute of Standards & Tech
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company



Introduction

A key objective for the Materials Genome Initiative (MGI) is access to digital materials data. Access may be for an individual, a team, a company, or an entire community of interest. There are several challenges to providing access to digital materials data for which there are presently no universally accepted best practices. Under the MGI, Federal agencies have funded research and development designed to explore, develop, and apply MGI methodologies. Several of these efforts either focus on an infrastructure for sharing data or emphasize data sharing as a key requirement of the overall technical effort.

One objective of starting several MGI-focused programs concurrently was that the approaches, lessons learned, and best practices would be shared between programs in order to accelerate the overall development of a materials innovation infrastructure (MII). This workshop assembled representatives from Federally funded efforts where a critical component of the overall research effort involves sharing of materials data in digital format. The intent of this workshop was for the participants to provide insights into the goals of their effort, details of their approach to sharing digital materials data, challenges they face, and lessons learned. This exchange of ideas will not only enhance the likelihood for success of the individual efforts, but will also speed the community's understanding of how best to proceed in developing an MII.

During the workshop, participants were asked to identify the top three next steps needed to be taken within the materials community and by government agencies in addressing the challenges associated with digital materials data. The following is a list of priorities derived from those inputs for each sector.

Community
• Develop and deploy standards for data and federated/collaborative environments
• Communicate value and need for digital materials data
• Define critical data to be compiled
• Engage community
• Explore and leverage data solutions developed outside materials community
• Train students in these emerging areas
• Establish community norms for publishing materials data

Government
• Lead data strategy/approach development
• Incentivize data deposition
• Facilitate data deposition (policy and technology)
• Support data sharing and deposition (provide resources)
• Support data creation
• Enhance data management competency
• Provide quality datasets for model development/validation


MGI Materials Data Workshop Organizing Committee

Air Force Research Laboratory (AFRL)

Dr. Charles Ward & Mr. Clare Paul

Army Research Laboratory (ARL)

Dr. John Beatty & Mr. Wayne Ziegler

Energy Efficiency and Renewable Energy, Department of Energy (DOE-EERE)

Mr. William Joost

National Aeronautics and Space Administration (NASA)

Dr. Stephen Smith & Dr. Dennis Griffin

National Institute of Standards and Technology (NIST)

Dr. James Warren

Office of Naval Research (ONR)

Dr. William Mullins & Dr. Kenny Lipkowitz


Agenda

MGI Materials Data Workshop

Tec^Edge, 5000 Springfield Street, Dayton, OH

15 - 16 July 2014

Tuesday

074 – 0800 Check-in

0800 – 0815 Chuck Ward (AFRL) – Introductory remarks

Discussion Lead: Will Joost (DOE-EERE)

0815 – 0845 Jim Warren (NIST) – Data Infrastructure
0845 – 0915 John Allison, Brian Puchala (Univ Michigan) – PRISMS/ALMMII
0915 – 0945 Carrie Campbell (NIST) – ChiMaD
0945 – 1015 Discussion
1015 – 1030 Break

Discussion Lead: John Beatty (ARL)

1030 – 1100 Marco Fornari (Central Michigan Univ) – AFLOWLIB.org
1100 – 1130 Surya Kalidindi (Ga Tech) – Mosaic of Microstructure/IGERT-CIF21
1130 – 1200 Discussion
1200 – 1230 Lunch

Discussion Lead: Bill Mullins (ONR)

1230 – 1300 Tom Searles (MDMi) – Cast Aluminum
1300 – 1320 Dongwon Shin (Oak Ridge National Laboratory, ORNL) – Cast Aluminum
1320 – 1340 Jake Zindel (Ford Motor Co) – Cast Aluminum
1340 – 1415 Discussion
1415 – 1430 Break

Discussion Lead: Jim Warren (NIST)

1430 – 1500 Jason Sebastian (Questek) – Cast Iron
1500 – 1530 Lou Hector (General Motors) – Advanced Steel
1530 – 1600 Lara Liou (GE Aviation) – Integrated Computational Methods for Composite Materials
1600 – 1700 Discussion
1700 Adjourn for the day


Wednesday

074 – 0800 Check-in

Discussion Lead: Wayne Ziegler (ARL)

0800 – 0830 Terry Wong (Rocketdyne) – Nickel RS FEP
0830 – 0850 Mike Glavicic (Rolls Royce) – MAI Ti Modeling
0850 – 0910 Rajiv Naik (Pratt & Whitney) – MAI Data Informatics, Data Standards, Modeling & UQ
0910 – 0940 Discussion
0940 – 0955 Break

Discussion Lead: Dennis Griffin (NASA)

0955 – 1015 Ro Gorham (National Center for Defense Manufacturing and Machining, NCDMM) – America Makes
1015 – 1045 Richard Serwecki (NASA) – MAPTIS
1045 – 1105 Scott Henry (ASM International) – Computational Materials Data Network
1105 – 1135 Discussion
1135 – 1200 Lunch

Discussion Lead: Clare Paul (AFRL)

1200 – 1230 Ben Blaiszik (Univ of Chicago/ANL) – Globus
1230 – 1300 Mike Groeber (AFRL) – DREAM3D/SIMPL
1300 – 1320 Sam Chance (iNovex) – Semantics for Deep Interoperability
1320 – 1340 Harlan Shober (RJ Lee) – Big Data and Semantic Technologies
1340 – 1400 Tim Hawes (Decisive Analytics Corp) – Natural Language Processing
1400 – 1445 Discussion
1445 – 1500 Wrap up
1500 Adjourn


Presentation Summaries

Welcome and Introduction
Chuck Ward, Air Force Research Laboratory

The primary intent of the workshop is to promote communication between the groups being funded under the Materials Genome Initiative that had significant interest in the management of materials data. The greater awareness is expected to enhance the likelihood for success of the individual efforts and speed the community's understanding of how best to proceed in developing an MII.

The Materials Research Society (MRS) and the Minerals, Metals and Materials Society (TMS) conducted a survey of the community in May/June of 2013 to gauge attitudes and needs in materials data. Over 650 people took the survey, with 74% of respondents saying they would be willing to share data if it were encouraged as a condition of funding or publishing. The community showed the most interest in gaining access to databases of some of the most basic and essential materials data, such as thermo-physical properties.

The survey asked participants to evaluate advantages of local versus centralized databases. Centralized repositories were perceived by respondents to offer advantages in cost, an ability to share one's data, provisioning of uniform data standards, and an ability to organize and find data. Local repositories were perceived to provide advantages in ease of data input, time efficiency, and flexibility in accommodating new data types.

The survey also asked participants to provide their thoughts on impediments to and motivators for sharing data. Most of the impediments can be attributed to concerns of ownership of data or external data restrictions, while technical issues such as file size and proprietary formatting were also concerns. Overall, participants saw the ability to contribute data to the community for reuse and critique positively. The highest motivator was to enhance one's own research visibility. Interestingly, respondents saw adding metadata as a motivator, presumably due to the perceived higher value of data when it is described more robustly.


MGI Efforts in Infrastructure Development at NIST
Jim Warren, National Institute of Standards and Technology

The presentation provided an overview of NIST activities supporting the creation of a materials data infrastructure. The infrastructure could start by defining the interfaces/APIs (Application Programming Interfaces) with which a user is able to interact for the deposit or retrieval of data. Beyond an interface, the following are essential elements for creation of a viable infrastructure:

• The establishment of repositories to store the data
• Tools to enable digital capture of the information, preferably at the time of creation (computation or experiment)
• Tools to enable markup of the data with sufficient metadata to inform someone else how the data was created, as well as various attributes that would help the user judge the quality of the information
• Assignment of a persistent digital identifier (like the DOI (Digital Object Identifier) for journals) so the data can be cited and discovered by others
• The registration of the availability of the data into some sort of "registry" to enable discovery without prior knowledge of the existence of the repository/specific data features

Additionally, creation of new policy and standards, as well as development of tools to evaluate the quality of the data and tools to enable data-driven discovery, will be required to manage and use materials data to the greatest extent possible.
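The interplay of metadata, persistent identifiers, and a registry can be sketched in a few lines. This is an illustration only, not NIST's design: the `Record` and `Registry` names, the `pid:` prefix, and the hash-based identifier are all assumptions made for the example (a real system would use a managed identifier service such as DOI).

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Record:
    """A dataset description: where it lives plus descriptive metadata."""
    title: str
    repository: str
    metadata: dict = field(default_factory=dict)


class Registry:
    """In-memory stand-in for a registry that enables discovery across repositories."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        # Hypothetical identifier: a short hash of repository and title,
        # so registering the same dataset twice yields the same identifier.
        key = f"{record.repository}|{record.title}".encode("utf-8")
        pid = "pid:" + hashlib.sha256(key).hexdigest()[:12]
        self._records[pid] = record
        return pid

    def resolve(self, pid):
        """Look up a record by its persistent identifier."""
        return self._records[pid]

    def search(self, **criteria):
        """Return identifiers of records whose metadata match every criterion."""
        return sorted(
            pid
            for pid, record in self._records.items()
            if all(record.metadata.get(k) == v for k, v in criteria.items())
        )
```

A user who knows nothing about the individual repositories can call `search` on the registry and then `resolve` the returned identifiers, which mirrors the discovery role described in the last bullet above.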

NIST is responding to the OSTP (Office of Science and Technology Policy) memos (Dr. John Holdren, 2 Feb 14) by developing practices that will be of use to the community, including a Common Access Platform allowing access across multiple collections of information, research and development of Persistent Identifier (PID) infrastructure with the Research Data Alliance (RDA), and R&D on Data Type Registries with RDA.

Finally, NIST is collaborating with the National Data Service (based at the National Center for Supercomputing Applications) and is developing a set of best practices for Data Management Plans.


The Materials Commons: A Novel Information Repository and Collaboration Platform for the Materials Community
Brian Puchala, Glenn Tarcea, Stravya Tamma, Emmanuelle Marquis, John Allison, University of Michigan

Critical to accelerating the pace of materials science and development is the development of new and improved infrastructure providing a seamless way for materials researchers to share and use materials data and models. To address this need, we are developing the Materials Commons (http://prisms.engin.umich.edu/prisms/materialscommons), an information repository and collaboration platform for the metals community in selected technical emphasis areas. We envision the Materials Commons becoming a continuous, seamless part of the scientific workflow process. Researchers will upload the results of experiments and computations as they are performed, automatically where possible, along with the provenance information describing the experimental and computational processes. By associating this data with the experimental and virtual computational materials samples from which it is obtained, the Materials Commons will build process-structure-property relationships, enabling the construction and validation of constitutive and process models. The Materials Commons website provides an easy-to-use interface for uploading and downloading data and data provenance, and for searching and sharing data. At its core, the Materials Commons consists of a secure data storage cluster, an application for efficiently uploading and downloading large data sets, and a REST (REpresentational State Transfer) based API to access and extend the capabilities of the repository. The API allows for features such as automated data upload from experiments and computations, seamless integration of computational models, and algorithmic analysis of process-structure-property relationships. The Materials Commons is a central thrust of the Center for PRedictive Structural Materials Science (PRISMS).
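The idea of attaching provenance and measurements to a sample, then reading process-structure-property relationships back out, can be sketched as follows. This is not the Materials Commons data model or API: the `Sample` class, the helper functions, and the quantity names are hypothetical, and any values used with it are placeholders rather than real measurements.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """A physical or virtual sample with its processing history and measurements."""
    name: str
    processes: list = field(default_factory=list)     # ordered provenance steps
    measurements: dict = field(default_factory=dict)  # quantity name -> value


def record_process(sample, step, **parameters):
    """Append one processing step (with its parameters) to the sample's provenance."""
    sample.processes.append({"step": step, **parameters})


def record_measurement(sample, quantity, value):
    """Attach a measured or computed quantity to the sample."""
    sample.measurements[quantity] = value


def psp_table(samples, structure_key, property_key):
    """Collect (name, process steps, structure, property) rows for complete samples."""
    rows = []
    for sample in samples:
        if structure_key in sample.measurements and property_key in sample.measurements:
            steps = tuple(process["step"] for process in sample.processes)
            rows.append((
                sample.name,
                steps,
                sample.measurements[structure_key],
                sample.measurements[property_key],
            ))
    return rows
```

Samples that lack either the structure or the property measurement are skipped, so the resulting table only contains rows that could feed a constitutive or process model.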

NIST Materials Data Informatics Efforts
Carrie Campbell, National Institute of Standards and Technology

The presentation covered NIST's pilot efforts in creating a data infrastructure focusing on Phase-Based Property Data. Phase-based data are diverse data that include 1-, 2-, 3-, and 4-dimensional data, are semi-structured, and often include incomplete data sets. The figure below provides an overview of NIST's data efforts supporting the MGI.


The NIST Materials Data Repository (https://materialsdata.nist.gov/dspace/xmlui) is a customized DSpace repository for materials that enables sharing of a variety of data types, including text files, images, and video. The repository also provides a persistent identifier for each entry and allows users to choose from a variety of license types. NIST is also working on two data curation tools for phase-based data. The Thermodynamics Research Center (http://trc.nist.gov) is extending the ThermoML markup language and Guided Data Capture software tool to metallic systems, with an initial emphasis on phase equilibria and thermochemical data. The Materials Data Curation System (MDCS) employs user-defined XML-based schemas to curate data. The MDCS allows data curation either through the web or using a REST API. The entered data is stored in a non-relational database repository, can be searched using a SPARQL (SPARQL Protocol and RDF Query Language) interface, and can be integrated with a variety of scientific workflow tools.
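As a rough illustration of schema-driven curation, the sketch below checks an XML record against a user-defined template before accepting it. It is not the MDCS: a real deployment would validate against an XML Schema (XSD), which the Python standard library cannot do, so a plain dictionary of required element names stands in for the schema here. The element names and the `curate` function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical user-defined template: element name -> converter for its text.
TEMPLATE = {"material": str, "phase": str, "temperature_K": float}


def curate(xml_text, template=TEMPLATE):
    """Parse one XML record and return a typed dict, or raise ValueError."""
    root = ET.fromstring(xml_text)
    record = {}
    for name, convert in template.items():
        element = root.find(name)
        if element is None or element.text is None:
            raise ValueError(f"missing required element: {name}")
        # A non-numeric temperature also raises ValueError via float().
        record[name] = convert(element.text.strip())
    return record
```

Rejecting incomplete records at entry time is one way to keep a semi-structured collection searchable later; a looser template could instead mark optional elements and accept partial data sets.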

AFLOW Consortium
Marco Fornari, Central Michigan University

The overall goal of the Materials Genome Initiative (MGI) is to accelerate the discovery, development, and deployment of new materials with ad hoc functionalities. The AFLOW Consortium focuses on the software infrastructures to achieve the MGI goals by using first principles calculations. The emphasis is on the data that must be generated, archived, validated, and post-processed efficiently, both to identify new materials and to distill meaningful quantitative relationships within the data. This extends the solution to the inverse problem of chemically replacing rare and costly elements in critical technologies.

The AFLOW Consortium provides access to the repository AFLOWLIB.ORG <http://AFLOWLIB.ORG> that publicly distributes data on energetics and thermodynamic stability for more than 600,000 compounds and band structures for roughly 300,000 materials. The repository includes bulk crystals and alloys. Data were computed with the AFLOW high-throughput framework using density functional theory. The results of the calculations can be queried with the recently developed REST-API that standardizes and labels the data for retrieval purposes. The fields defined in the AFLOW REST-API include a material object identifier (AUID), data cloud location (AURL), ownership and sponsorship keywords, and data specific labels (COMPOSITION, ENTROPY, EGAP, …). Simple scripting languages can be used to automatically download from the AFLOWLIB repository. The AFLOW approach has been extensively applied on platinum-group alloys, thermoelectric materials, magnetic materials, topological insulators, scintillators, etc. The AFLOW REST-API can be extended to include experimental data. For further details on the database, the reader is directed to http://materials.duke.edu/auro/AUROARTICULA/j.commatsci.2012.02.002.pdf.
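The kind of scripting mentioned above can be sketched without touching the network. The example below parses labeled records and filters them on one label. The record layout (keyword=value pairs separated by "|") is an assumption made for illustration and should be checked against the AFLOW documentation; the function names are hypothetical, and any entries passed in are expected to be text already downloaded by other means.

```python
def parse_entry(text, separator="|"):
    """Parse 'keyword=value | keyword=value' text into a dict with upper-case keys."""
    entry = {}
    for chunk in text.split(separator):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = chunk.partition("=")
        entry[key.strip().upper()] = value.strip()
    return entry


def select(entries, label, minimum):
    """Return the AUID of every entry whose numeric label is at least `minimum`."""
    return [
        entry["AUID"]
        for entry in entries
        if label in entry and float(entry[label]) >= minimum
    ]
```

Keys are normalized to upper case so that labels can be referred to as they are written in the text (AUID, AURL, EGAP). Values that contain the separator character are not handled by this simple parser.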

Materials Innovation Infrastructure: Lessons Learned and Future Plans
Surya Kalidindi, Georgia Institute of Technology

A central activity in materials research is the exploration of processing-structure-property (PSP) linkages. Workflow is defined as the sequence of steps employed for establishing PSP linkages of interest to any specific engineering/technology application.

High Throughput Explorations of PSP Linkages: We must treat a hierarchical material as a complex system, which means embracing and exploiting uncertainty quantification and complexity management concepts/tools. It also means we need to design, develop, and validate decision support systems that will leverage the best current understanding (with its uncertainties) and provide objective guidance on future effort investment (combination of experiments and simulations). This means leveraging data science to effectively feed the decision support systems described above by developing and deploying high throughput strategies for obtaining high value information from both experiments and models.

What are integrated workflows? Integrated workflows are those that utilize the best combination of experiments and simulations in extracting robust and reliable PSP linkages. This requires engaging and exploiting cross-disciplinary expertise that includes materials science, manufacturing, systems approaches, uncertainty quantification, computational science, and data and information sciences. Through this, one must ensure that the workflows output the critical information needed by design and manufacturing stakeholders in the materials development value chain. Integration demands expertise sharing, which essentially is the core element of a collaboration. Cross-disciplinary integration demands effective collaborations, such as those one might set up using the modern tools of the day, i.e., an e-Collaboration.


Motivators for e-Collaborations: e-Collaborations are central to enhancing the productivity of materials development activities in several ways:
• individual research groups can identify and engage complementary expertise needed for their projects
• organizations can coordinate multiple teams and multiple projects to maximize output/impact
• organizations can identify and seek the right partners that enhance their individual and combined value

e-Collaborations can be open, semi-open, or closed, depending on the needs of the specific organization. Productivity enhancement leading to dramatic reductions in cost and time expended in materials development is the main incentive for e-collaborations.

Main Impediments to e-Collaborations

Lack of tools to identify the "best" collaborators
• Need software tools that quantify more precisely the expertise of individuals, groups, organizations
• Need software tools for allowing rich annotations and e-recording of discussions
• Need analytic tools for identification, extraction, and management of metadata (i.e., high value abstractions from the raw data)
• Need searchable semantic metadata databases (these are lightweight and can be centralized and customized for specific user communities)
• Need software tools for tracking provenance (i.e., who/what/when/how generated, modified, etc.)

Lack of experience with the use of integrated workflows
• Need software tools for e-recording of workflows and development of searchable workflow databases
• Need recommendation schemes that suggest the best workflow(s) based on recorded prior experience; this provides a means to transfer expertise acquired by an organization to other teams and other projects (even in the same organization)
• Leads to possible automation and dramatic enhancement of productivity
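The last group of needs (e-recording of workflows and recommendation from prior experience) can be illustrated with a deliberately simple sketch. The `WorkflowLog` class and its success-rate ranking are assumptions for the example, not a scheme proposed in the presentation; a practical recommender would weigh far more than pass/fail outcomes.

```python
from collections import defaultdict


class WorkflowLog:
    """Records workflow outcomes and suggests the historically best workflow."""

    def __init__(self):
        # (goal, ordered steps) -> list of success flags, one per recorded run
        self._outcomes = defaultdict(list)

    def record(self, goal, steps, succeeded):
        """Store the outcome of one run, successes and failures alike."""
        self._outcomes[(goal, tuple(steps))].append(bool(succeeded))

    def recommend(self, goal):
        """Return the steps with the highest success rate for a goal, or None."""
        candidates = [
            (sum(flags) / len(flags), len(flags), steps)
            for (recorded_goal, steps), flags in self._outcomes.items()
            if recorded_goal == goal
        ]
        if not candidates:
            return None
        # Ties on success rate are broken by the number of recorded runs.
        return list(max(candidates)[2])
```

Recording failures as well as successes matters here: without them every workflow would show a perfect success rate and the log could not separate them.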

MINED Group’s Past Experience (~2005 to ~2008): Funded by the Army Research Office (ARO; David Stepp), the group engaged computer science (CS) graduates to produce a single-user platform that integrated certain software codes previously produced by our group. Software development was subsequently abandoned, but the effort recruited two CS graduates into PhD programs in Materials Science and Engineering/Mechanical Engineering (MSE/ME), which dramatically altered the direction of research in the group. A brief chronology of development:

• Senior Design Project (2011-2012): An online portal designed for sharing data files and tools; learned the importance of user interfaces and the need for professional code developers and software architects.

• HUBzero Instantiation (2012-2013): Built a hub by customizing HUBzero for hierarchical structural materials; there are several advantages in terms of accessing and executing codes (on the server side), but there are significant barriers to establishing fast and productive cross-disciplinary e-collaborations.


• GTRI Integration Platform (2013-2014): Engaged professional code developers and software architects to design a multi-user web portal that allows data and code sharing while tracking provenance and workflows; underestimated the resources needed for such a project.

• MATIN Collaboration Environment (2013-2014): Lightweight integrated software shell that allows users to seamlessly interface with numerous cloud services offered on the web through their APIs. Examples include GitHub for versioning and archival of codes, Authorea for collaborative editing of documents, figshare for citable publication of research, Plotly for collaborative data analysis and visualization, and Google+ and LinkedIn for e-teaming and networking. MATIN focuses on building an emergent community: Materials & Manufacturing Informatics (MMI). MATIN provisions web services for software integration platforms and innovation infrastructure. Examples of MATIN:

• Spatial Statistics: https://github.com/tonyfast/SpatialStatisticsFFT

• PyMKS: http://openmaterials.github.io/pymks/index.html

• Example discussion about periodic boundary conditions in running Finite Element (FE or FEM) simulations: https://github.com/sfepy/sfepy/issues/267

In the platform developed, the user or the user’s organization can set permissions, at the level of individual datasets/codes, on the degree and type of sharing.

Central Challenge: Activation. The transition from traditional disciplinary workflows to integrated workflows needs to overcome an activation barrier. There is no clear incentive to get ahead of the curve in anticipating the upcoming transformations in how materials development will be pursued. Open (reproducible) science alone is not a sufficiently strong motivator for a materials development community with strong interests in intellectual property development; indeed, it might actually be perceived as a negative. There is a need to explore and refine what incentives work for individual researchers (e.g., increased productivity).

Future Plans: Continue to develop and deploy MATIN at GT and any other organizations/groups that express an interest in working with us. Design, create, and distribute a suite of “mobile first” apps designed to make data ingest into collaboration environments such as MATIN, from both material characterization equipment and multi-scale models, an enhanced user experience with friendly and natural interfaces. Proactively promote/incentivize data/metadata ingest into publicly shared databases. Work with professional societies to announce and run new competitions/awards for recognizing and rewarding good practices by the members of the broader materials innovation user community (e.g., Best Documented Data Award, Best Documented Tool Award, Best Documented Workflow Award). Develop and deliver new workshops for training the new generation of graduate students, postdocs, and junior faculty in the emerging workflows of Materials & Manufacturing Informatics.

High Performance Cast Aluminum Alloys for Next Generation Passenger Vehicle Engines
Dongwon Shin, Materials Science and Technology Division, Oak Ridge National Laboratory


Oak Ridge National Laboratory (ORNL) teamed with Chrysler and Nemak to develop next-generation high-performance cast aluminum alloys that have improved castability, high-temperature strength, and fatigue performance. At this point the data management practice is limited to analytical data spreadsheets, other than the CALPHAD computational thermodynamic databases and proprietary property databases for casting simulation. As the project goes on, a more robust data schema will be developed to store the various types of data populated from the experiments and calculations.

Besides the data structure design, the speaker discussed the necessity of a new approach to identifying key descriptors from the large dataset that can guide the design of new materials. Alloy design requires balancing extremely complex relationships among a large number of descriptors. Some examples of these descriptors include the phase fraction of key intermetallics, the lattice parameters of the matrix, and the diffusion kinetics of solute atoms. However, optimizing these quantities relies on good physical metallurgy intuition coupled with clever experiments, dogged sleuthing, and a healthy dose of luck. Such descriptor-based predictions of materials properties date back to the beginning of modern solid-state chemistry and physics. As shown in the Darken-Gurry map used to predict the solubility of elements in tantalum, intuitive descriptors such as electronegativity and radius could be used for materials design. However, as the chemistry and physics of materials become more complex, intuition-based search may no longer work. Thus the use of more advanced statistical analysis and data mining approaches might be required to accelerate the design of complex multi-component, multi-phase alloys.
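
The descriptor-based screening behind a Darken-Gurry map can be sketched as an ellipse test in the (atomic radius, electronegativity) plane. This is a minimal illustration, not the speaker's analysis. The tolerances (about 15% in radius and about 0.4 electronegativity units) are the commonly quoted values and are treated here as adjustable assumptions; the function names are hypothetical, and any inputs must be supplied by the user.

```python
def inside_darken_gurry_ellipse(
    solvent_radius, solvent_en, solute_radius, solute_en,
    radius_tol=0.15, en_tol=0.4,
):
    """Return True if the solute falls inside the ellipse centered on the solvent.

    radius_tol is a fractional radius difference; en_tol is in
    electronegativity units. Both defaults are assumed, commonly quoted values.
    """
    dr = (solute_radius - solvent_radius) / (radius_tol * solvent_radius)
    de = (solute_en - solvent_en) / en_tol
    return dr * dr + de * de <= 1.0


def screen(solvent, candidates):
    """Return the sorted names of candidates that pass the ellipse test.

    solvent is (radius, electronegativity); candidates maps a name to
    the same kind of pair.
    """
    r0, en0 = solvent
    likely = [name for name, (r, en) in candidates.items()
              if inside_darken_gurry_ellipse(r0, en0, r, en)]
    return sorted(likely)
```

A two-descriptor rule like this is easy to read off a plot, which is the point made in the text: once many coupled descriptors matter, a simple geometric test no longer suffices and statistical or data mining methods become attractive.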

Data Related to Cast Aluminum

Jake Zindel, Ford Motor Company

The main focus of our research is the development of CAE (Computer-aided Engineering) tools to support two significant activities in the company: component design (product development) and manufacturing engineering (high-volume production). A key part of our research is conducting experiments to collect data for the development and validation of the CAE tools.

The data we collect consists of mechanical properties, physical properties, microstructural images, and manufacturing process parameters that form the pedigree of the material. This information can be in the form of Excel files, proprietary-software storage files, screen shots, and written reports.

Our immediate need is to deal with this large amount of existing information, collected over many years, on a “local” level. The first step is to organize many independent collections of information in all their forms. The next issue is the time required to enter all this information into the database. Maintenance may also require a significant amount of effort. Lastly, there will also be a need to manipulate the data contained in the database using Excel, Minitab, MATLAB, etc. Therefore links to, or easy export of data to, these types of applications will be necessary.

The next level would involve dealing with information that has been created or collected by others. From an OEM (Original Equipment Manufacturer) perspective, this would involve convincing suppliers to share processing information that they may consider proprietary. In addition to suppliers, universities and national laboratories are also sources of data. Data from all the sources will require some sort of protocol to determine its quality and resolve inconsistencies.


In the end, it seems likely that considerable budget and time will be required to implement comprehensive information databases, which could be the largest hurdle of them all.

QuesTek’s MGI/ICME Approach to Alloy Optimization and Design (Including Cast Iron)
Jason Sebastian, QuesTek Innovations LLC

Integrated Computational Materials Engineering (ICME) methods involve the holistic application of different computational models across various length scales to the design, development, and rapid qualification of advanced materials. Using its Materials by Design technologies, QuesTek has successfully applied these ICME methods to the design, development, and full qualification of several advanced alloys (including aerospace structural steels, aerospace gear steels, advanced castable titanium alloys, high-strength corrosion-resistant aluminum alloys, and high-performance cast iron).

QuesTek’s ICME methods benefit tremendously from the activities of the US Materials Genome Initiative (MGI). Recognizing that the “10 to 20 years” that it takes for alloy design and development is hindering the rapid deployment of higher-performance materials, the MGI is focused on “cutting in half” the time and cost to design and deploy novel advanced materials needed to enhance and sustain US competitiveness.

QuesTek also has ongoing activities on the application of ICME methods to the design and development of materials for the energy sector, including advanced materials that enable vehicle lightweighting. Of particular note is an ongoing effort with a major diesel engine manufacturer on the development of high-performance cast iron alloys for engine block applications. QuesTek’s efforts have made extensive use of various MGI-type tools and databases, including 1) density functional theory methods for high-throughput screening of cast iron inoculant stabilities (structure, lattice parameter, stability of various oxide/sulfide phases), 2) liquid iron mobility databases (to help accurately model solidification phenomena), and 3) thermodynamic databases (to examine nano-precipitate stability).

Moving forward, to continue to accelerate the materials development cycle, efforts should remain focused on the development of the fundamental databases (thermodynamic, kinetic, etc.) that enable ICME models. There should also be increased focus on efforts related to the integration of the various computational databases, models, and tools (the “I” in ICME).

Integrated Computational Materials Engineering (ICME) of Generation Three Advanced High Strength Steels
Louis G. Hector, Jr., General Motors R&D Center

While traditional material development methods often involve laborious trial-and-error testing, Integrated Computational Materials Engineering (ICME), which falls under the umbrella of the Materials Genome Initiative, offers a means for materials development on a more compressed time scale. Advanced high strength steels (AHSS), which contain multiple phases (e.g., ferrite, bainite, pearlite, austenite, martensite) and may exhibit phase transformation with strain, are fertile ground for ICME. However, prediction of macro-scale constitutive behavior based upon the multi-scale physical, chemical, and mechanical phenomena in these complex microstructures is a formidable challenge for ICME. Unprecedented collaboration between universities, industry, and government labs will be required to address AHSS development, archiving of data, and the implementation of these materials into products that benefit the American consumer. This presentation began with a brief overview of a new DOE/USAMP (US Automotive Materials Partnership)-funded ICME program aimed at the development of Generation Three (Gen 3) AHSS. The lack of commercial Gen 3 steels that meet DOE targets poses significant challenges to this new ICME program. The program is meeting these challenges through four thrusts: (1) making Gen 3 steels to meet DOE targets; (2) passing experimental measurements of key properties to an ICME model that produces microstructure-based constitutive models; (3) material evaluation in forming and vehicle performance simulations; (4) archiving all relevant data for easy use by others in the program. Although the Gen 3 AHSS ICME program is “mission oriented,” it demonstrates an important role for universities in providing the necessary foundation in fundamental materials science and engineering.

Integrated Computational Methods for Composites Materials (ICM2): Lessons Learned from Integration Planning and Feasibility Demonstrations
Lara Liou, GE Aviation

General Electric Aviation and Lockheed Martin Aeronautics have teamed through an Air Force Research Lab program to conduct a demonstration of an Integrated Computational Materials Engineering digital framework that links material processing, property, and structural relationships to account for processability, manufacturability, and system performance. The goal is to demonstrate that the use of integrated models can contribute to future airframe and engine designs with dramatic reductions in development time and cost. Both engine and airframe Foundational Engineering Problems will be demonstrated through a common digital framework to be developed and validated on increasingly complex articles. The modeling tools, which cover composite process models, localized mechanical behavior models, and global design models (Convergent Manufacturing Technology’s COMPRO CCA, Autodesk’s Autodesk Simulation Composite Analysis, and an Abaqus-based University of Michigan micromechanics model), are centrally linked via a commercially available model integration interface (Phoenix Integration’s ModelCenter®). Detailed plans of what data will be passed among the various models, as well as how it will be passed, have been completed. Creation of these detailed process maps and simple feasibility demonstrations have raised inherent technical and logistical challenges to integrating commercially available multi-scale models. Robust model integration methods to address these challenges were discussed.

The exercise of drawing up detailed process maps for what to integrate and how to integrate the selected tools and models resulted in key early lessons learned and identification of tools critical to long-term, robust, and sustained application of Integrated Computational Methods for Composites Materials (ICM2). The key lessons learned include:

• Individual models must be relatively mature and validated before an ICM2 problem can be confidently defined.
• FE analysis-based tools within the same digital thread should utilize the same FE solver.
• The team ideally should consist of at least one expert for each tool and its underlying model, as well as an expert in the digital thread and integration interface tools and programming.
• Mesh interpolation between finite element analysis based tools will likely be required, and a commercially available tool for this type of composite mesh interpolation is key for robust and sustained application of ICM2.

Residual Stress Engineering of Rotating Nickel Components
Terry Wong, Aerojet Rocketdyne

The goal of the PW-8 project, the Nickel Base Superalloy Residual Stress Foundational Engineering Problem program, is to develop a multi-disciplinary analysis approach to predict and incorporate bulk residual stress in the design of a component. This will allow for the design of a part that is optimized for a particular application or design intent (such as weight, performance, or reduction of scrap). To accomplish this goal, the PW-8 team has two major tasks: 1) the development of an infrastructure and associated tools needed to predict and incorporate bulk residual stress in the design stages of a part, and 2) a demonstration task that will validate the infrastructure and tools developed.

The Data Management task is the foundational element of the larger infrastructure development task. The goal for the data management work is to develop an infrastructure that allows the various team members in PW-8 to share data and to store data in a database. This not only ensures that data generated by this program is never lost but also allows for data to be transferred in a secure manner. In order to meet the goal of a system that securely stores and shares data, three major tasks must be accomplished: 1) validation that there is a secure method to transfer data between team members and that the method of storing data is also secure, 2) development of a database schema, and 3) development of a means to easily upload data into the database and download data from the database.

Of the three, the most difficult task was to create a system in which the transfer and storage of data was secure. Various proposals were made to ensure the security of the data, but many of them were rejected, not because the proposal was actually insecure but because it was not acceptable under the IT security policies of each team member. The process of getting each team member’s IT department to approve the security of the data transfer and storage took more effort than anticipated. In the end, the team members, along with their respective IT departments, approved the use of TRUcentrix as a tool to transfer data and the use of an ASM International hosted database to store data.

The development of a database schema is a necessary step in creating the database to store all the data generated (via physical tests as well as data generated by models). A big help in creating the database schema was to use the schema generated by Dr. Steven Arnold from NASA’s Glenn Research Center. Using NASA’s schema eliminated the need to start the database build from scratch and allowed us to focus on the specific needs of storing residual stress data.

The last task involves the development of tools that make the input of data easy. The more difficult it is to enter data into a database, the more likely it is that data will not be entered. Excel-based templates are being developed in order to make data entry as easy as possible.

The two big lessons learned from PW-8 in relation to data management are:

• It is essential to work with each company’s IT and IT security people to develop a system that satisfies the security requirements of each company.
• Database schema development should not be reinvented but should begin with an already established schema.

Perspectives on MAI Programs
Dr. Mike Glavicic, Rolls Royce

The current MAI (AFRL’s Metals Affordability Initiative) programs are structured independently from one another, in that no standard methodology exists that describes what data is to be archived at the conclusion of a program. Nor does there exist a standard protocol for how to archive things on a server or some other computer resource at the conclusion of a program. As a result, the current practices of MAI programs:

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program.
• Do not have a formalized requirement for how data is to be packaged and stored at the end of a program. Hence, even in the best of cases, in which all of the data is stored, the data is archived in a simple folder fashion that is not searchable in any manner. Moreover, even if all of the data is present, it is extremely cumbersome to untangle the data for future use using a file folder strategy.

Hence, at the end of the program both the government and the MAI team are left to manage the data and results in an unregimented fashion.

The lack of a regimented protocol for how to package a program’s results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal of integrating the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models, and other information required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted to the recreation of data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. Although the RR-10 program was able to overcome these issues and become a successful program, other programs that encounter similar problems are unlikely to share that success: RR-10 was very fortunate to have the means to overcome these serious issues, which resulted from the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government-funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs, with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ
Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and Suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration, data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals, Inc.) are working together on the PW-11 Team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security and access control, and the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework (GE Forge, Vanderbilt Forge, and Semantic Web technology), and one technology will be selected for the next phase of the program.


Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single database searches to multiple and disparate database searches, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work upfront for implementation, as it needs the development of a common vocabulary and relationships.

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST, and develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
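
Flowing uncertainty through interconnected models can be illustrated with a simple Monte Carlo sketch. It assumes a two-step chain (processing to microstructure, then microstructure to property) and a normally distributed processing input. The two model functions, their coefficients, and the input distribution are hypothetical placeholders, not models from the GE-10, PW-9, or RR-10 programs, and Monte Carlo sampling is only one of several possible UQ approaches.

```python
import random
import statistics


def process_to_microstructure(temperature):
    """Hypothetical linear model: processing temperature -> grain size."""
    return 10.0 + 0.02 * temperature


def microstructure_to_property(grain_size):
    """Hypothetical linear model: grain size -> strength."""
    return 500.0 - 4.0 * grain_size


def propagate(mean_temp, std_temp, n_samples=10000, seed=0):
    """Sample the input, run it through both models, summarize the output.

    Returns the sample mean and sample standard deviation of the property.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same samples
    outputs = []
    for _ in range(n_samples):
        t = rng.gauss(mean_temp, std_temp)
        outputs.append(microstructure_to_property(process_to_microstructure(t)))
    return statistics.mean(outputs), statistics.stdev(outputs)
```

Because both placeholder models are linear, the output spread can be checked analytically: the chain reduces to 460 - 0.08 × temperature, so the output standard deviation should approach 0.08 times the input standard deviation. Real process and property models would generally be nonlinear, which is where sampling earns its keep.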

ASM’s Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project
Scott D. Henry, ASM International

ASM International, the world’s largest association of metals-focused materials engineers, scientists, technicians, educators, and students, has launched a series of initiatives under the “Computational Materials Data (CMD) Network” umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design Ltd. The main objectives of SMDDP are to:


• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system.

• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets.

• Develop and carry out a series of test problems that represent relevant use cases for the repository.

• Make data open to the materials data community for use in data analytics, modeling, and educational purposes.

• Actively engage the materials data community and widely disseminate the findings from the project.

• Develop and implement data capture and curation procedures that can serve as models for other data repositories.

The basic philosophy behind the project is that standards often emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated, structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project, launched in late 2013 and planned as an 18-month program, is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future
Ben Blaiszik1, Kyle Chard1, Jim Pruyne1, Rachana Anathakrishnan1, Steve Tuecke1,2, Ian Foster1,2

1University of Chicago Computation Institute, 2Argonne National Laboratory MCS Division

During this talk we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities is imported into a centralized index to allow for robust searching functionality.

Semantics for Deep Interoperability
Samuel Chance, iNovex

Semantic technology is based on mature web standards and is intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the “higher” elements of the stack.

A brief history of the semantic web was presented. It started with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web, riding atop the same infrastructure (http). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis on the point that the technology has a foundation based on formal logic that enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how “triples” link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being evaluated as a solution for a community or enterprise.
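
The way “triples” link to each other to form a web of data can be sketched with plain (subject, predicate, object) tuples. This is a minimal in-memory illustration, not an RDF library or the presenter's chart; the function names and all identifiers in the sample data are hypothetical.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def follow(triples, start, predicates):
    """Follow a chain of predicates from `start`, returning reachable nodes."""
    frontier = {start}
    for p in predicates:
        # The object of one triple becomes the subject of the next hop.
        frontier = {o for node in frontier for (_, _, o) in match(triples, node, p)}
    return frontier


# Hypothetical identifiers, for illustration only.
triples = [
    ("sample:001", "madeOf", "alloy:A"),
    ("alloy:A", "hasBaseElement", "element:Al"),
    ("sample:001", "testedBy", "lab:X"),
]
```

Calling `follow(triples, "sample:001", ["madeOf", "hasBaseElement"])` walks two links in this sample data, from the sample to its alloy and from the alloy to its base element. The linking works because the object of one statement is reused as the subject of another.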


Big Data and Semantic Technologies
Harlan Shober, RJ Lee Group

The presentation started with a description of the entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS). To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing/characterization and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described. It consisted of several elements: contribute, index, search, analyze, and archive. While bearing in mind the needed functionality, a string of universal scientific data management process categories was then presented with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for the Resource Description Framework (RDF) were applied to a notional example with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described."

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
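As a rough illustration of the first of these steps, tokenization, the sketch below splits a sentence into tokens with a regular expression chosen so that alloy-style names and decimal numbers stay together. The pattern and the example sentence are assumptions made for illustration; they are not from the presentation, and production NLP toolkits use considerably more elaborate rules.

```python
import re

# Hypothetical pattern: runs of letters/digits that may be joined internally
# by hyphens or periods (so "Ti-6Al-4V" and "0.5" each stay one token);
# any other non-space character becomes its own token.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:[-.][A-Za-z0-9]+)*|\S")


def tokenize(text):
    """Split text into a list of word, number, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("Ti-6Al-4V was annealed at 700 C for 0.5 h."))
```

Materials text is a useful stress case for a tokenizer because compositions, units, and numeric values carry most of the meaning, which ties in with the domain-specific challenges the presentation raised.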

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.

The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.

Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were "rolled up" from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often-fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach - allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data, standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management – consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose
- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management; difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems - develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan - workshops to develop roadmap
- Attend more events like the MGI workshop - gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most-important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure – evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel
- Data associated with publications should be shared along with the publications. Not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)

Government-Led Priorities

1. Lead data strategy/approach development
- Hold meeting like this one in 6-12 months with purpose of developing roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus, e.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most-important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tool making data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion - start with basic research
- Demonstrate the return on investments in the materials data infrastructure

4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10% - repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST) – creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors - not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches supported are limited to too fundamental single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or alternatively work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community

Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, iNovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, Questek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl Institute of Standards & Tech
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


Introduction

A key objective for the Materials Genome Initiative (MGI) is access to digital materials data. Access may be for an individual, a team, company, or an entire community of interest. There are several challenges to providing access to digital materials data for which there are presently no universally accepted best practices. Under the MGI, Federal agencies have funded research and development designed to explore, develop, and apply MGI methodologies. Several of these efforts either focus on an infrastructure for sharing data or emphasize data sharing as a key requirement of the overall technical effort.

One objective of starting several MGI-focused programs concurrently was that the approaches, lessons learned, and best practices would be shared between programs in order to accelerate the overall development of a materials innovation infrastructure (MII). This workshop assembled representatives from Federally funded efforts where a critical component of the overall research effort involves sharing of materials data in digital format. The intent of this workshop was for the participants to provide insights into the goals of their effort, details of their approach to sharing digital materials data, challenges they face, and lessons learned. This exchange of ideas will not only enhance the likelihood for success of the individual efforts, but will also speed the community's understanding of how best to proceed in developing an MII.

During the workshop, participants were asked to identify the top three next steps needed to be taken within the materials community and by government agencies in addressing the challenges associated with digital materials data. The following is a list of priorities derived from those inputs for each sector.

Community
• Develop and deploy standards for data and federated/collaborative environments
• Communicate value and need for digital materials data
• Define critical data to be compiled
• Engage community
• Explore and leverage data solutions developed outside materials community
• Train students in these emerging areas
• Establish community norms for publishing materials data

Government
• Lead data strategy/approach development
• Incentivize data deposition
• Facilitate data deposition (policy and technology)
• Support data sharing and deposition (provide resources)
• Support data creation
• Enhance data management competency
• Provide quality datasets for model development/validation

MGI Materials Data Workshop Organizing Committee

Air Force Research Laboratory (AFRL)
Dr. Charles Ward & Mr. Clare Paul

Army Research Laboratory (ARL)
Dr. John Beatty & Mr. Wayne Ziegler

Energy Efficiency and Renewable Energy, Department of Energy (DOE-EERE)
Mr. William Joost

National Aeronautics and Space Administration (NASA)
Dr. Stephen Smith & Dr. Dennis Griffin

National Institute of Standards and Technology (NIST)
Dr. James Warren

Office of Naval Research (ONR)
Dr. William Mullins & Dr. Kenny Lipkowitz

Agenda
MGI Materials Data Workshop
Tec^Edge, 5000 Springfield Street, Dayton, OH
15 - 16 July 2014

Tuesday
074 – 0800 Check-in
0800 – 0815 Chuck Ward (AFRL) – Introductory remarks
Discussion Lead: Will Joost (DOE-EERE)
0815 – 0845 Jim Warren (NIST) – Data Infrastructure
0845 – 0915 John Allison, Brian Puchala (Univ Michigan) – PRISMS/ALMMII
0915 – 0945 Carrie Campbell (NIST) – ChiMaD
0945 – 1015 Discussion
1015 – 1030 Break
Discussion Lead: John Beatty (ARL)
1030 – 1100 Marco Fornari (Central Michigan Univ) – AFLOWLIB.org
1100 – 1130 Surya Kalidindi (Ga Tech) – Mosaic of Microstructure/IGERT-CIF21
1130 – 1200 Discussion
1200 – 1230 Lunch
Discussion Lead: Bill Mullins (ONR)
1230 – 1300 Tom Searles (MDMi) – Cast Aluminum
1300 – 1320 Dongwon Shin (Oak Ridge National Laboratory, ORNL) – Cast Aluminum
1320 – 1340 Jake Zindel (Ford Motor Co) – Cast Aluminum
1340 – 1415 Discussion
1415 – 1430 Break
Discussion Lead: Jim Warren (NIST)
1430 – 1500 Jason Sebastian (Questek) – Cast Iron
1500 – 1530 Lou Hector (General Motors) – Advanced Steel
1530 – 1600 Lara Liou (GE Aviation) – Integrated Computational Methods for Composite Materials
1600 – 1700 Discussion
1700 Adjourn for the day

Wednesday
074 – 0800 Check-in
Discussion Lead: Wayne Ziegler (ARL)
0800 – 0830 Terry Wong (Rocketdyne) – Nickel RS FEP
0830 – 0850 Mike Glavicic (Rolls Royce) – MAI Ti Modeling
0850 – 0910 Rajiv Naik (Pratt & Whitney) – MAI Data Informatics, Data Standards, Modeling & UQ
0910 – 0940 Discussion
0940 – 0955 Break
Discussion Lead: Dennis Griffin (NASA)
0955 – 1015 Ro Gorham (National Center for Defense Manufacturing and Machining, NCDMM) – America Makes
1015 – 1045 Richard Serwecki (NASA) – MAPTIS
1045 – 1105 Scott Henry (ASM International) – Computational Materials Data Network
1105 – 1135 Discussion
1135 – 1200 Lunch
Discussion Lead: Clare Paul (AFRL)
1200 – 1230 Ben Blaiszik (Univ of Chicago/ANL) – Globus
1230 – 1300 Mike Groeber (AFRL) – DREAM3D/SIMPL
1300 – 1320 Sam Chance (iNovex) – Semantics for Deep Interoperability
1320 – 1340 Harlan Shober (RJ Lee) – Big Data and Semantic Technologies
1340 – 1400 Tim Hawes (Decisive Analytics Corp) – Natural Language Processing
1400 – 1445 Discussion
1445 – 1500 Wrap up
1500 Adjourn

Presentation Summaries

Welcome and Introduction
Chuck Ward, Air Force Research Laboratory

The primary intent of the workshop is to promote communication between the groups being funded under the Materials Genome Initiative that had significant interest in the management of materials data. The greater awareness is expected to enhance the likelihood for success of the individual efforts and speed the community's understanding of how best to proceed in developing an MII.

The Materials Research Society (MRS) and the Minerals, Metals and Materials Society (TMS) conducted a survey of the community in May/June of 2013 to gauge attitudes and needs in materials data. Over 650 people took the survey, with 74% of respondents saying they would be willing to share data if it were encouraged as a condition of funding or publishing. The community showed the most interest in gaining access to databases of some of the most basic and essential materials data, such as thermo-physical properties.

The survey asked participants to evaluate advantages of local versus centralized databases. Centralized repositories were perceived by respondents to offer advantages in cost, an ability to share one's data, provisioning of uniform data standards, and an ability to organize and find data. Local repositories were perceived to provide advantages in ease of data input, time efficiency, and flexibility in accommodating new data types.

The survey also asked participants to provide their thoughts on impediments to and motivators for sharing data. Most of the impediments can be attributed to concerns of ownership of data or external data restrictions, while technical issues such as file size and proprietary formatting were also concerns. Overall, participants saw the ability to contribute data to the community for reuse and critique positively. The highest motivator was to enhance one's own research visibility. Interestingly, respondents saw adding metadata as a motivator, presumably due to the perceived higher value of data when it is described more robustly.

MGI Efforts in Infrastructure Development at NIST
Jim Warren, National Institute of Standards and Technology

The presentation provided an overview of NIST activities supporting the creation of a materials data infrastructure. The infrastructure could start by defining the interfaces/APIs (Application Programming Interfaces) with which a user is able to interact for the deposit or retrieval of data. Beyond an interface, the following are essential elements for creation of a viable infrastructure:

• The establishment of repositories to store the data
• Tools to enable digital capture of the information, preferably at the time of creation (computation or experiment)
• Tools to enable markup of the data with sufficient metadata to inform someone else how the data was created, as well as various attributes that would help the user judge the quality of the information
• Assignment of a persistent digital identifier (like the DOI (Digital Object Identifier) for journals) so the data can be cited and discovered by others
• The registration of the availability of the data into some sort of "registry" to enable discovery without prior knowledge of the existence of the repository/specific data features
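The last two elements in this list, persistent identifiers and a registry, can be sketched in a few lines. This is an illustrative toy and not a description of any NIST system: the `Registry` class, its field names, the `example-pid:` prefix, and the repository host name are all hypothetical, and a real deployment would rely on an established identifier scheme such as DOIs.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Record:
    pid: str            # persistent identifier assigned at registration
    repository: str     # where the data actually lives
    title: str
    keywords: frozenset


@dataclass
class Registry:
    records: dict = field(default_factory=dict)

    def register(self, repository, title, keywords):
        """Assign an identifier and record where the dataset can be found."""
        pid = f"example-pid:{uuid.uuid4()}"
        normalized = frozenset(k.lower() for k in keywords)
        self.records[pid] = Record(pid, repository, title, normalized)
        return pid

    def resolve(self, pid):
        """Look up a dataset by its identifier, e.g. when following a citation."""
        return self.records[pid]

    def discover(self, keyword):
        """Find datasets by keyword without knowing which repository holds them."""
        kw = keyword.lower()
        return [r for r in self.records.values() if kw in r.keywords]


if __name__ == "__main__":
    registry = Registry()
    pid = registry.register(
        "repo-a.example.org", "Al-Cu phase equilibria", ["CALPHAD", "aluminum"]
    )
    print(registry.resolve(pid).repository)
    print([r.title for r in registry.discover("calphad")])
```

The separation matters: the identifier stays stable even if the data moves between repositories, and the registry is what allows discovery by someone who has never heard of the hosting repository.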

Additionally, creation of new policy and standards, as well as development of tools to evaluate the quality of the data and tools to enable data-driven discovery, will be required to manage and use materials data to the greatest extent possible.

NIST is responding to the OSTP (Office of Science and Technology Policy) memos (Dr. John Holdren, 2 Feb 14) by developing practices that will be of use to the community, including a Common Access Platform allowing access across multiple collections of information; research and development of Persistent Identifier (PID) infrastructure with the Research Data Alliance (RDA); and R&D on Data Type Registries with RDA.

Finally, NIST is collaborating with the National Data Service (based at the National Center for Supercomputing Applications) and is developing a set of best practices for Data Management Plans.

The Materials Commons: A Novel Information Repository and Collaboration Platform for the Materials Community
Brian Puchala, Glenn Tarcea, Stravya Tamma, Emmanuelle Marquis, John Allison, University of Michigan

Critical to accelerating the pace of materials science and development is the development of new and improved infrastructure providing a seamless way for materials researchers to share and use materials data and models. To address this need we are developing the Materials Commons (httpprismsenginumicheduprismsmaterialscommons), an information repository and collaboration platform for the metals community in selected technical emphasis areas. We envision the Materials Commons becoming a continuous, seamless part of the scientific workflow process. Researchers will upload the results of experiments and computations as they are performed, automatically where possible, along with the provenance information describing the experimental and computational processes. By associating this data with the experimental and virtual computational materials samples from which it is obtained, the Materials Commons will build process-structure-property relationships, enabling the construction and validation of constitutive and process models.

The Materials Commons website provides an easy-to-use interface for uploading and downloading data and data provenance, searching, and sharing data. At its core, the Materials Commons consists of a secure data storage cluster, an application for efficiently uploading and downloading large data sets, and a REST (REpresentational State Transfer) based API to access and extend the capabilities of the repository. The API allows for features such as automated data upload from experiments and computations, seamless integration of computational models, and algorithmic analysis of process-structure-property relationships. The Materials Commons is a central thrust of the Center for PRedictive Integrated Structural Materials Science (PRISMS).
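One way to picture how provenance ties data back to samples and processes is a chain of linked records. The sketch below is a hypothetical data model written for illustration; it is not the Materials Commons schema or API, and every class name, field name, and value in it is invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    name: str
    parent: Optional["Sample"] = None  # sample this one was derived from
    process: Optional[str] = None      # processing step that produced it


@dataclass
class Measurement:
    sample: Sample
    quantity: str
    value: float
    units: str


def processing_history(sample: Optional[Sample]) -> List[str]:
    """Walk parent links and list processing steps, earliest first."""
    steps = []
    while sample is not None:
        if sample.process:
            steps.append(sample.process)
        sample = sample.parent
    return list(reversed(steps))


def psp_row(m: Measurement) -> dict:
    """Pair a measured property with the process history behind it."""
    return {
        "sample": m.sample.name,
        "process": processing_history(m.sample),
        "property": m.quantity,
        "value": m.value,
        "units": m.units,
    }


if __name__ == "__main__":
    cast = Sample("cast-01", process="casting")
    aged = Sample("aged-01", parent=cast, process="aging")
    print(psp_row(Measurement(aged, "hardness", 95.0, "HV")))  # illustrative value
```

Collecting rows of this kind across many samples is the sort of table from which process-structure-property relationships could then be fitted or validated.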

NIST Materials Data Informatics Efforts
Carrie Campbell, National Institute of Standards and Technology

The presentation covered NIST's pilot efforts in creating a data infrastructure focusing on Phase-Based Property Data. Phase-based data are diverse data that include 1-, 2-, 3-, and 4-dimensional data, are semi-structured, and often include incomplete data sets. The figure below provides an overview of NIST's data efforts supporting the MGI.

The NIST Materials Data Repository (https://materialsdata.nist.gov/dspace/xmlui) is a customized DSpace repository for materials that enables sharing of a variety of data types, including text files, images, and video. The repository also provides a persistent identifier for each entry and allows users to choose from a variety of license types. NIST is also working on two data curation tools for phase-based data. The Thermodynamics Research Center (http://trc.nist.gov) is extending the ThermoML markup language and Guided Data Capture software tool to metallic systems, with an initial emphasis on phase equilibria and thermochemical data. The Materials Data Curation System (MDCS) employs user-defined XML-based schemas to curate data. The MDCS allows either web-based data curation or curation using a REST API. The entered data is stored in a non-relational database repository, can be searched using a SPARQL (SPARQL Protocol and RDF Query Language) interface, and can be integrated with a variety of scientific workflow tools.
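To make the idea of curating data against a user-defined XML-based schema more concrete, the sketch below builds a small XML record and checks it against a list of required elements. It is illustrative only: the element names are hypothetical, and the required-field check is a deliberately simplified stand-in for real XML Schema validation, not a reproduction of any MDCS schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical "schema": child elements every <phaseEquilibrium> record needs.
REQUIRED_FIELDS = ("system", "phase", "temperature")


def build_record(system, phase, temperature_k):
    """Create an XML record for one phase-equilibrium data point."""
    root = ET.Element("phaseEquilibrium")
    ET.SubElement(root, "system").text = system
    ET.SubElement(root, "phase").text = phase
    temp = ET.SubElement(root, "temperature", units="K")
    temp.text = str(temperature_k)
    return root


def missing_fields(record):
    """Return the required element names that are absent from the record."""
    return [name for name in REQUIRED_FIELDS if record.find(name) is None]


if __name__ == "__main__":
    record = build_record("Al-Cu", "fcc", 800)  # illustrative values
    print(ET.tostring(record, encoding="unicode"))
    print(missing_fields(record))
```

Checking records at entry time is one way a curation tool can cope with the incomplete data sets mentioned above: a record can be accepted while the gaps are still reported explicitly.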

AFLOW ConsortiumMarco Fornari Michigan13 State University

The overall goal of the Materials Genome Initiative (MGI) is to accelerate the discovery13 development and13 deployment of new materials with ad hoc functionalities The AFLOW Consortium

focuses on the software infrastructures to achieve the MGI goals by using first13 principles calculations The emphasis is on the data13 that must be generated archived validated and post-shy‐processed13 efficiently13 both13 to13 identify new materials and13 distill meaningful quantitative relationships within13 the data This extends the13 solution to the13 inverse13 problem of chemically replacing13 rare13 and costly elements in criticaltechnologies

The AFLOW Consortium provides access to the AFLOWLIB.ORG repository <http://AFLOWLIB.ORG>, which publicly distributes data on energetics and thermodynamic stability for more than 600,000 compounds and band structures for roughly 300,000 materials. The repository includes bulk crystals and alloys. Data were computed with the AFLOW high-throughput framework using density functional theory. The results of the calculations can be queried with the recently developed REST-API, which standardizes and labels the data for retrieval purposes. The fields defined in the AFLOW REST-API include a material object identifier (AUID), data cloud location (AURL), ownership and sponsorship keywords, and data-specific labels (COMPOSITION, ENTROPY, EGAP, …). Simple scripting languages can be used to automatically download data from the AFLOWLIB repository. The AFLOW approach has been extensively applied to platinum-group alloys, thermoelectric materials, magnetic materials, topological insulators, scintillators, etc. The AFLOW REST-API can be extended to include experimental data. For further details on the database, the reader is directed to http://materials.duke.edu/auro/AUROARTICULA/j.commatsci.2012.02.002.pdf
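As a rough illustration of the scripted retrieval mentioned above, the sketch below parses a labeled record into a dictionary keyed by field name. It assumes, purely for illustration, that a record arrives as "keyword=value" pairs separated by "|"; the sample string, its identifier, and its location are hypothetical placeholders rather than real AFLOWLIB entries, and no network request is made.

```python
def parse_record(text):
    """Parse 'keyword=value | keyword=value' text into a dict with lower-case keys."""
    record = {}
    for field in text.split("|"):
        field = field.strip()
        if not field:
            continue  # skip empty fragments, e.g. from a trailing separator
        key, _, value = field.partition("=")
        record[key.strip().lower()] = value.strip()
    return record


# Hypothetical record: the AUID, AURL, and values below are placeholders.
SAMPLE = (
    "auid=aflow:placeholder0001 | aurl=example.org:DEMO/Ag1 | "
    "compound=Ag1 | Egap=0.0"
)

record = parse_record(SAMPLE)
band_gap = float(record["egap"])  # data-specific label converted to a number
print(record["auid"], record["aurl"], band_gap)
```

Normalizing the keywords to lower case keeps lookups independent of how a label such as EGAP happens to be capitalized in a given response.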

Materials Innovation Infrastructure: Lessons Learned and Future Plans
Surya Kalidindi, Georgia Institute of Technology

A central activity in materials research is the exploration of processing-structure-property (PSP) linkages. Workflow is defined as the sequence of steps employed for establishing PSP linkages of interest to any specific engineering/technology application.

High Throughput Explorations of PSP Linkages: We must treat a hierarchical material as a complex system, which means embracing and exploiting uncertainty quantification and complexity management concepts/tools. It also means we need to design, develop, and validate decision support systems that will leverage the best current understanding (with its uncertainties) and provide objective guidance on future effort investment (combination of experiments and simulations). This means leveraging data science to effectively feed the decision support systems described above by developing and deploying high-throughput strategies for obtaining high-value information from both experiments and models.

What are integrated workflows? Integrated workflows are those that utilize the best combination of experiments and simulations in extracting robust and reliable PSP linkages. This requires engaging and exploiting cross-disciplinary expertise that includes materials science, manufacturing, systems approaches, uncertainty quantification, computational science, and data and information sciences. Through this, one must ensure that the workflows output the critical information needed by design and manufacturing stakeholders in the materials development value chain. Integration demands expertise sharing, which is essentially the core element of a collaboration. Cross-disciplinary integration demands effective collaborations, such as those one might set up using the modern tools of the day, i.e., an e-Collaboration.


Motivators for e-Collaborations: e-Collaborations are central to enhancing the productivity of materials development activities in several ways:

• Individual research groups can identify and engage complementary expertise needed for their projects.
• Organizations can coordinate multiple teams and multiple projects to maximize output/impact.
• Organizations can identify and seek the right partners that enhance their individual and combined value.

e-Collaborations can be open, semi-open, or closed, depending on the needs of the specific organization. Productivity enhancement, leading to dramatic reductions in the cost and time expended in materials development, is the main incentive for e-Collaborations.

Main Impediments to e-Collaborations:

Lack of tools to identify the “best” collaborators
• Need software tools that quantify more precisely the expertise of individuals/groups/organizations
• Need software tools for allowing rich annotations and e-recording of discussions
• Need analytic tools for identification, extraction, and management of metadata (i.e., high-value abstractions from the raw data)
• Need searchable semantic metadata databases (these are lightweight and can be centralized and customized for specific user communities)
• Need software tools for tracking provenance (i.e., who/what/when/how generated, modified, etc.)

Lack of experience with the use of integrated workflows
• Need software tools for e-recording of workflows and development of searchable workflow databases
• Need recommendation schemes that suggest the best workflow(s) based on recorded prior experience; this provides a means to transfer expertise acquired by an organization to other teams and other projects (even in the same organization)
• Leads to possible automation and dramatic enhancement of productivity
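The provenance-tracking need listed above (who/what/when/how generated or modified) can be pictured as an append-only event log. The following is a minimal sketch under stated assumptions, not a description of any tool named in this report: the class names, field names, and sample entries are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceEvent:
    who: str     # person or group responsible
    what: str    # identifier of the dataset or code (hypothetical scheme)
    how: str     # instrument, script, or method used
    action: str  # "generated" or "modified"
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ProvenanceLog:
    """Append-only list of events that can be filtered by item identifier."""

    def __init__(self):
        self._events = []

    def record(self, who, what, how, action):
        event = ProvenanceEvent(who=who, what=what, how=how, action=action)
        self._events.append(event)
        return event

    def history(self, what):
        # Events are returned in the order in which they were recorded.
        return [e for e in self._events if e.what == what]


log = ProvenanceLog()
log.record("alice", "dataset-42", "tensile test rig", "generated")
log.record("bob", "dataset-42", "outlier filter script", "modified")
log.record("carol", "code-7", "text editor", "generated")

for event in log.history("dataset-42"):
    print(event.when.isoformat(), event.who, event.action, "via", event.how)
```

Making each event immutable and only ever appending to the log is one simple way to keep the recorded history trustworthy; a production system would also need persistent storage and access control.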

MINED Group's Past Experience (~2005 - ~2008): Funded by the Army Research Office, ARO (David Stepp); engaged computer science (CS) graduates to produce a single-user platform that integrated certain software codes previously produced by our group; software development was subsequently abandoned, but two CS graduates were recruited into PhD programs in Materials Science and Engineering/Mechanical Engineering (MSE/ME), which dramatically altered the direction of research in the group.

Brief chronology of development:
• Senior Design Project (2011-2012): An online portal designed for sharing data files and tools; learned the importance of user interfaces and the need for professional code developers and software architects.
• HUBzero Instantiation (2012-2013): Built a hub by customizing HUBzero for hierarchical structural materials; there are several advantages in terms of accessing and executing codes (on the server side), but there are significant barriers to establishing fast and productive cross-disciplinary e-collaborations.


• GTRI Integration Platform (2013-2014): Engaged professional code developers and software architects to design a multi-user web portal that allows data and code sharing while tracking provenance and workflows; underestimated the resources needed for such a project.
• MATIN Collaboration Environment (2013-2014): Lightweight integrated software shell that allows users to seamlessly interface, through their APIs, with numerous cloud services offered on the web. Examples include GitHub for versioning and archival of codes, Authorea for collaborative editing of documents, figshare for citable publication of research, Plotly for collaborative data analysis and visualization, and Google+ and LinkedIn for e-teaming and networking. MATIN focuses on building an emergent community: Materials & Manufacturing Informatics (MMI). MATIN provisions web services for software integration platforms and innovation infrastructure.

Examples of MATIN:
• Spatial Statistics: https://github.com/tonyfast/SpatialStatisticsFFT
• PyMKS: http://openmaterials.github.io/pymks/index.html
• Example discussion about periodic boundary conditions in running Finite Element (FE or FEM) simulations: https://github.com/sfepy/sfepy/issues/267

In the platform developed, the user or the user's organization can set permissions at the level of individual datasets/codes on the degree and type of sharing.

Central Challenge: Activation. The transition from traditional disciplinary workflows to integrated workflows needs to overcome an activation barrier. There is no clear incentive to get ahead of the curve in anticipating the upcoming transformations in how materials development will be pursued in the future. Open (reproducible) science alone is not a sufficiently strong motivator for a materials development community with strong interests in intellectual property development; indeed, it might actually be perceived as a negative. There is a need to explore and refine what incentives work for individual researchers (e.g., increased productivity).

Future Plans: Continue to develop and deploy MATIN at GT and any other organizations/groups that express an interest in working with us. Design, create, and distribute a suite of “mobile first” apps designed to make data ingest into collaboration environments such as MATIN, from both material characterization equipment and multi-scale models, an enhanced user experience with friendly and natural interfaces. Proactively promote/incentivize data/metadata ingest into publicly shared databases. Work with professional societies to announce and run new competitions/awards for recognizing and rewarding good practices by the members of the broader materials innovation user community (e.g., Best Documented Data Award, Best Documented Tool Award, Best Documented Workflow Award). Develop and deliver new workshops for training the new generation of graduate students, postdocs, and junior faculty in the emerging workflows of Materials & Manufacturing Informatics.

High Performance Cast Aluminum Alloys for Next Generation Passenger Vehicle Engines
Dongwon Shin, Materials Science and Technology Division, Oak Ridge National Laboratory


Oak Ridge National Laboratory (ORNL) teamed with Chrysler and Nemak to develop next-generation high-performance cast aluminum alloys that have improved castability, high-temperature strength, and fatigue performance. At this point, the data management practice is limited to analytical data spreadsheets, other than the CALPHAD computational thermodynamic databases and proprietary property databases for casting simulation. As the project goes on, a more robust data schema will be developed to store the various types of data populated from the experiments and calculations.

Besides the data structure design, the speaker discussed the necessity of a new approach to identifying key descriptors from the large dataset that can guide the design of new materials. Alloy design requires balancing extremely complex relationships among a large number of descriptors. Some examples of these descriptors include the phase fraction of key intermetallics, lattice parameters of the matrix, and diffusion kinetics of solute atoms. However, optimizing these quantities relies on good physical metallurgy intuition coupled with clever experiments, dogged sleuthing, and a healthy dose of luck. Such descriptor-based predictions of materials properties date back to the beginning of modern solid-state chemistry and physics. As shown in the Darken-Gurry map for predicting the solubility of elements in tantalum, intuitive descriptors such as electronegativity and radius could be used for materials design. However, as the chemistry and physics of materials become more complex, intuition-based search may no longer work. Thus, more advanced statistical analysis and data-mining approaches might be required to accelerate the design of complex multi-component, multi-phase alloys.
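To make the descriptor idea concrete, the sketch below applies the commonly quoted form of the Darken-Gurry criterion: a solute is expected to show appreciable solubility when it falls inside an ellipse centered on the host, with semi-axes of about 0.4 electronegativity units and 15 % of the host atomic radius. This is an illustration only and was not part of the presentation; the descriptor values are made-up placeholders, not measured data for tantalum or any other element.

```python
def darken_gurry_favourable(host, solute, d_chi=0.4, d_radius_frac=0.15):
    """Return True if the solute lies inside the Darken-Gurry ellipse of the host.

    host and solute are dicts with 'chi' (electronegativity) and
    'radius' (atomic radius, any consistent length unit).
    """
    dx = (solute["chi"] - host["chi"]) / d_chi
    dr = (solute["radius"] - host["radius"]) / (d_radius_frac * host["radius"])
    return dx * dx + dr * dr <= 1.0


# Placeholder descriptor values chosen only to exercise both outcomes.
host = {"chi": 1.5, "radius": 1.46}
solute_a = {"chi": 1.6, "radius": 1.45}  # close to the host in both descriptors
solute_b = {"chi": 2.5, "radius": 1.10}  # far from the host in both descriptors

print(darken_gurry_favourable(host, solute_a))
print(darken_gurry_favourable(host, solute_b))
```

A two-descriptor rule like this is easy to reason about by eye, which is precisely why it stops scaling once the number of relevant descriptors grows.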

Data Related to Cast Aluminum

Jake Zindel, Ford Motor Company

The main focus of our research is the development of CAE (computer-aided engineering) tools to support two significant activities in the company: component design (product development) and manufacturing engineering (high-volume production). A key part of our research is conducting experiments to collect data for the development and validation of the CAE tools.

The data we collect consist of mechanical properties, physical properties, microstructural images, and manufacturing process parameters that form the pedigree of the material. This information can be in the form of Excel files, proprietary-software storage files, screen shots, and written reports.

Our immediate need is to deal with this large amount of existing information, collected over many years, on a “local” level. The first step is to organize many independent collections of information in all their forms. The next issue is the time required to enter all this information into the database. Maintenance may also require a significant amount of effort. Lastly, there will also be a need to manipulate the data contained in the database using Excel, Minitab, MATLAB, etc. Therefore, links or easy exportation of data to these types of applications will be necessary.

The next level would involve dealing with information that has been created or collected by others. From an OEM (original equipment manufacturer) perspective, this would involve convincing suppliers to share processing information that they may consider proprietary. In addition to suppliers, universities and national laboratories are also sources of data. Data from all the sources will require some sort of protocol to determine its quality and resolve inconsistencies.


In the end, it seems likely that considerable budget and time will be required to implement comprehensive information databases, which could be the largest hurdle of them all.

QuesTek's MGI/ICME Approach to Alloy Optimization and Design (Including Cast Iron)
Jason Sebastian, QuesTek Innovations LLC

Integrated Computational Materials Engineering (ICME) methods involve the holistic application of different computational models across various length scales to the design, development, and rapid qualification of advanced materials. Using its Materials by Design technologies, QuesTek has successfully applied these ICME methods to the design, development, and full qualification of several advanced alloys (including aerospace structural steels, aerospace gear steels, advanced castable titanium alloys, high-strength corrosion-resistant aluminum alloys, and high-performance cast iron).

QuesTek's ICME methods benefit tremendously from the activities of the US Materials Genome Initiative (MGI). Recognizing that the “10 to 20 years” that it takes for alloy design and development is hindering the rapid deployment of higher-performance materials, the MGI is focused on “cutting in half” the time and cost to design and deploy the novel advanced materials needed to enhance and sustain US competitiveness.

QuesTek also has ongoing activities on the application of ICME methods to the design and development of materials for the energy sector, including advanced materials that enable vehicle lightweighting. Of particular note is an ongoing effort with a major diesel engine manufacturer on the development of high-performance cast iron alloys for engine block applications. QuesTek's efforts have made extensive use of various MGI-type tools and databases, including 1) density functional theory methods for high-throughput screening of cast iron inoculant stabilities (structure, lattice parameter, and stability of various oxide/sulfide phases), 2) liquid iron mobility databases (to help accurately model solidification phenomena), and 3) thermodynamic databases (to examine nano-precipitate stability).

Moving forward, to continue to accelerate the materials development cycle, efforts should remain focused on the development of the fundamental databases (thermodynamic, kinetic, etc.) that enable ICME models. There should also be increased focus on efforts related to the integration of the various computational databases, models, and tools (the “I” in ICME).

Integrated Computational Materials Engineering (ICME) of Generation Three Advanced High Strength Steels
Louis G. Hector Jr., General Motors R&D Center

While traditional material development methods often involve laborious trial-and-error testing, Integrated Computational Materials Engineering (ICME), which falls under the umbrella of the Materials Genome Initiative, offers a means for materials development on a more compressed time scale. Advanced high strength steels (AHSS), which contain multiple phases (e.g., ferrite, bainite, pearlite, austenite, martensite) and may exhibit phase transformation with strain, are fertile ground for ICME. However, prediction of macro-scale constitutive behavior based upon the multi-scale physical, chemical, and mechanical phenomena in these complex microstructures is a formidable challenge for ICME. Unprecedented collaboration between universities, industry, and government labs will be required to address AHSS development, archiving of data, and the implementation of these materials into products that benefit the American consumer. This presentation began with a brief overview of a new DOE/USAMP (US Automotive Materials Partnership)-funded ICME program aimed at the development of Generation Three (Gen 3) AHSS. The lack of commercial Gen 3 steels that meet DOE targets poses significant challenges to this new ICME program. The program is meeting these challenges through four thrusts: (1) making Gen 3 steels to meet DOE targets; (2) passing experimental measurements of key properties to an ICME model that produces microstructure-based constitutive models; (3) material evaluation in forming and vehicle performance simulations; and (4) archiving all relevant data for easy use by others in the program. Although the Gen 3 AHSS ICME program is “mission oriented,” it demonstrates an important role for universities in providing the necessary foundation in fundamental materials science and engineering.

Integrated Computational Methods for Composites Materials (ICM2): Lessons Learned from Integration Planning and Feasibility Demonstrations
Lara Liou, GE Aviation

General Electric Aviation and Lockheed Martin Aeronautics have teamed through an Air Force Research Lab program to conduct a demonstration of an Integrated Computational Materials Engineering digital framework that links material processing, property, and structural relationships to account for processability, manufacturability, and system performance. The goal is to demonstrate that the usage of integrated models can contribute to future airframe and engine designs with dramatic reductions in development time and cost. Both engine and airframe Foundational Engineering Problems will be demonstrated through a common digital framework to be developed and validated on increasingly complex articles. The modeling tools, which cover composite process models, localized mechanical behavior models, and global design models (Convergent Manufacturing Technology's COMPRO CCA, Autodesk's Autodesk Simulation Composite Analysis, and an Abaqus-based University of Michigan micromechanics model), are centrally linked via a commercially available model integration interface (Phoenix Integration's ModelCenter®). Detailed plans of what data will be passed among the various models, as well as how it will be passed, have been completed. Creation of these detailed process maps and simple feasibility demonstrations has raised inherent technical and logistical challenges to integrating commercially available multi-scale models. Robust model integration methods to address these challenges were discussed.

The exercise of drawing up detailed process maps for what to integrate and how to integrate the selected tools and models resulted in key early lessons learned and identification of tools critical to long-term, robust, and sustained application of Integrated Computational Methods for Composites Materials (ICM2). The key lessons learned include:

• Individual models must be relatively mature and validated before an ICM2 problem can be confidently defined.
• FE analysis-based tools within the same digital thread should utilize the same FE solver.
• The team ideally should consist of at least one expert for each tool and its underlying model, as well as an expert in the digital thread and integration interface tools and programming.
• Mesh interpolation between finite element analysis-based tools will likely be required, and a commercially available tool for this type of composite mesh interpolation is key for robust and sustained application of ICM2.
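The mesh interpolation lesson can be illustrated with a deliberately simplified case: transferring a nodal field from one one-dimensional mesh to another by piecewise-linear interpolation. This is only a toy sketch; the tools discussed in the talk operate on three-dimensional composite meshes, and the function name and sample values below are hypothetical.

```python
from bisect import bisect_right


def interpolate_field(src_nodes, src_values, dst_nodes):
    """Linearly interpolate nodal values from a source mesh onto a target mesh.

    src_nodes must be sorted, strictly increasing, and contain at least two
    nodes; every target node must lie within the source mesh.
    """
    if len(src_nodes) < 2 or len(src_nodes) != len(src_values):
        raise ValueError("need matching node/value lists with at least two nodes")
    result = []
    for x in dst_nodes:
        if not src_nodes[0] <= x <= src_nodes[-1]:
            raise ValueError(f"target node {x} lies outside the source mesh")
        # Index of the right-hand node of the source element containing x.
        i = min(bisect_right(src_nodes, x), len(src_nodes) - 1)
        x0, x1 = src_nodes[i - 1], src_nodes[i]
        t = (x - x0) / (x1 - x0)
        result.append((1.0 - t) * src_values[i - 1] + t * src_values[i])
    return result


# Hypothetical field on a coarse mesh, transferred to a finer mesh.
coarse_nodes = [0.0, 1.0, 2.0]
coarse_field = [0.0, 10.0, 40.0]
fine_nodes = [0.0, 0.5, 1.0, 1.5, 2.0]

fine_field = interpolate_field(coarse_nodes, coarse_field, fine_nodes)
print(fine_field)
```

Rejecting target nodes outside the source mesh, rather than extrapolating silently, mirrors the kind of explicit hand-off check that the process maps described above are meant to capture.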

Residual Stress Engineering of Rotating Nickel Components
Terry Wong, Aerojet Rocketdyne

The PW-8 project, the Nickel Base Superalloy Residual Stress Foundational Engineering Problem program, aims to develop a multi-disciplinary analysis approach to predict and incorporate bulk residual stress in the design of a component. This will allow for the design of a part that is optimized for a particular application or design intent (such as weight, performance, or reduction of scrap). To accomplish this goal, the PW-8 team has two major tasks: 1) the development of an infrastructure and associated tools needed to predict and incorporate bulk residual stress in the design stages of a part, and 2) a demonstration task that will validate the infrastructure and tools developed.

The Data Management task is the foundational element of the larger infrastructure development task. The goal for the data management work is to develop an infrastructure that allows the various team members in PW-8 to share data and to store data in a database. This not only ensures that data generated by this program are never lost but also allows for data to be transferred in a secure manner. In order to meet the goal of a system that securely stores and shares data, three major tasks must be accomplished: 1) validation that there is a secure method to transfer data between team members and that the method of storing data is also secure; 2) development of a database schema; and 3) development of a means to easily upload data into the database and download data from the database.

Of the three, the most difficult task was to create a system in which the transfer and storage of data was secure. Various proposals were made to ensure the security of the data, but many of them were rejected, not because the proposal was actually insecure but because it was not acceptable under each team member's IT security policies. The process of getting each team member's IT department to approve the security of the data transfer and storage took more effort than anticipated. In the end, the team members, along with their respective IT departments, approved the use of TRUcentrix as a tool to transfer data and the use of an ASM International hosted database to store data.

The development of a database schema is a necessary step in creating the database to store all the data generated (via physical tests as well as by models). A big help in creating the database schema was using the schema generated by Dr. Steven Arnold from NASA's Glenn Research Center. Using NASA's schema eliminated the need to start the database build from scratch and allowed us to focus on the specific needs of storing residual stress data.

The last task involves the development of tools that make the input of data easy. The more difficult it is to enter data into a database, the more likely it is that data will not be entered. Excel-based templates are being developed in order to make data entry as easy as possible.

The two big lessons learned from PW-8 in relation to data management are:

• It is essential to work with each company's IT and IT Security people to develop a system that satisfies the security requirements of each company.
• Database schema development should not be reinvented but should begin with an already established schema.

Perspectives on MAI Programs
Dr. Mike Glavicic, Rolls Royce

The current MAI (AFRL's Metals Affordability Initiative) programs are structured independently of one another, in that no standard methodology exists that describes what data are to be archived at the conclusion of a program. Nor does there exist a standard protocol for how to archive things on a server or some other computer resource at the conclusion of a program. As a result, the current practices of MAI programs:

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program.
• Do not have a formalized requirement for how data are to be packaged and stored at the end of a program. Hence, even in the best of cases, in which all of the data are stored, the data are archived in a simple folder fashion that is not searchable in any manner. Moreover, even if all of the data are present, it is extremely cumbersome to untangle the data for future use using a file folder strategy.

Hence, at the end of the program, both the government and the MAI team are left to manage the data and results in an unregimented fashion.

The lack of a regimented protocol for how to package a program's results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal of integrating the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models, and other information that were required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted to the recreation of data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. Although the RR-10 program was able to overcome these issues and become a successful program, other programs that encounter similar problems are unlikely to fare as well: RR-10 was very fortunate to have the means to overcome these serious issues, which resulted from the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government-funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features that will be required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs, with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ
Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals, Inc.) are working together on the PW-11 Team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security, and access control, along with the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework (GE Forge, Vanderbilt Forge, and Semantic Web technology), and one technology will be selected for the next phase of the program.


Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single-database searches to searches across multiple and disparate databases, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work upfront for implementation, as it needs the development of a common vocabulary and relationships.
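The shift described above can be pictured with subject-predicate-object triples, the basic unit of a graph database. The sketch below is a toy illustration in plain Python, not the technology under evaluation in PW-11: two small, separately maintained triple lists are merged and then queried by relationship rather than by keyword. All identifiers and vocabulary terms (hasProcess, isA, and so on) are hypothetical.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Two disparate sources that happen to share a common vocabulary.
oem_db = [
    ("heat:H123", "hasAlloy", "alloy:DemoAlloy"),
    ("heat:H123", "suppliedBy", "org:SupplierA"),
]
supplier_db = [
    ("heat:H123", "hasProcess", "process:forging"),
    ("heat:H456", "hasProcess", "process:machining"),
    ("process:forging", "isA", "class:HotWorking"),
]

# Merging is simple list concatenation because both sides use the same terms.
graph = oem_db + supplier_db

# Contextual query: which heats went through any hot-working process?
hot_processes = {s for s, _, _ in match(graph, predicate="isA", obj="class:HotWorking")}
heats = sorted(s for s, _, o in match(graph, predicate="hasProcess") if o in hot_processes)
print(heats)
```

The query never mentions the word "forging"; it relies on the recorded relationship between forging and hot working, which is the kind of upfront vocabulary work the paragraph above refers to.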

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST, and develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
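One common way to flow uncertainty through a chain of models is Monte Carlo sampling, sketched below. This is an illustrative technique chosen for the example, not necessarily the method used in PW-11, and both model functions are made-up stand-ins with hypothetical coefficients rather than the GE-10, PW-9, or RR-10 models.

```python
import random
import statistics


def microstructure_model(temperature):
    """Hypothetical process-to-microstructure link: grain size vs. temperature."""
    return 10.0 + 0.05 * (temperature - 900.0)


def property_model(grain_size):
    """Hypothetical microstructure-to-property link of Hall-Petch form."""
    return 300.0 + 600.0 / grain_size ** 0.5


def propagate(mean_temperature, std_temperature, n_samples=10000, seed=0):
    """Sample the process variability and push each sample through both models."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    strengths = []
    for _ in range(n_samples):
        temperature = rng.gauss(mean_temperature, std_temperature)
        grain_size = microstructure_model(temperature)
        strengths.append(property_model(grain_size))
    return statistics.mean(strengths), statistics.stdev(strengths)


mean_strength, std_strength = propagate(900.0, 10.0)
print(f"strength = {mean_strength:.1f} +/- {std_strength:.1f} (arbitrary units)")
```

Because every sample passes through the whole chain, the resulting spread in the property reflects how variability in the processing parameter is amplified or damped at each step.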

ASM's Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project
Scott D. Henry, ASM International

ASM International, the world's largest association of metals-focused materials engineers, scientists, technicians, educators, and students, has launched a series of initiatives under the “Computational Materials Data (CMD) Network” umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design Ltd. The main objectives of SMDDP are to:


• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system.
• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets.
• Develop and carry out a series of test problems that represent relevant use cases for the repository.
• Make data open to the materials data community for use in data analytics, modeling, and educational purposes.
• Actively engage the materials data community and widely disseminate the findings from the project.
• Develop and implement data capture and curation procedures that can serve as models for other data repositories.

The basic philosophy behind the project is that often standards emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated, structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project – launched in late 2013 and planned as an 18-month program – is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future
Ben Blaiszik1, Kyle Chard1, Jim Pruyne1, Rachana Anathakrishnan1, Steve Tuecke1,2, Ian Foster1,2

1 University of Chicago, Computation Institute; 2 Argonne National Laboratory, MCS Division

During this talk, we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully-formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.

Semantics for Deep Interoperability
Samuel Chance, iNovex

Semantic technology is based on mature web standards and intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented, starting with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web, riding atop the same infrastructure (http). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis on the fact that the technology has a foundation based on formal logic that enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being evaluated as a solution for a community or enterprise.
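To make the notion of linked "triples" concrete, the following is a minimal sketch in plain Python rather than a real RDF toolkit. The `ex:` identifiers, the predicate names, and the property value are hypothetical and were not part of the presentation. It only shows how subject-predicate-object statements that share identifiers chain together into a small web of data.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical statements; the object of one triple is the subject of another.
TRIPLES: Set[Triple] = {
    ("ex:Sample42", "ex:madeOf", "ex:Al6061"),
    ("ex:Sample42", "ex:hasProperty", "ex:Measurement7"),
    ("ex:Al6061", "ex:memberOf", "ex:AlMgSiSystem"),
    ("ex:Measurement7", "ex:measuredBy", "ex:TensileTest"),
}


def match(store: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def reachable(store: Set[Triple], start: str) -> Set[str]:
    """Follow subject -> object links to find everything connected to start."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for _, _, obj in match(store, s=node):
            if obj not in seen:
                seen.add(obj)
                frontier.append(obj)
    return seen


if __name__ == "__main__":
    print(match(TRIPLES, s="ex:Sample42"))
    print(sorted(reachable(TRIPLES, "ex:Sample42")))
```

Because the alloy and the measurement are themselves subjects of further statements, a query that starts at the sample can reach the alloy system and the test method without any schema agreed in advance.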


Big Data and Semantic Technologies
Harlan Shober, RJ Lee Group

The presentation started with a description of entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS).

To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing/characterization and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described. It consisted of several elements: contribute, index, search, analyze, and archive. While bearing in mind the needed functionality, a string of universal scientific data management process categories was then presented, with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for Resource Description Framework (RDF) were applied to a notional example with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described".

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
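Of the NLP steps listed above, tokenization is the simplest to illustrate. The sketch below is a hypothetical regular-expression tokenizer, not the presenter's tooling. The pattern is an assumption chosen to keep hyphenated alloy designations such as "Al-Mg-Si" together as single tokens, which a naive split on punctuation would break apart.

```python
import re

# Order matters: hyphenated words first, then numbers, then single punctuation marks.
TOKEN_PATTERN = re.compile(
    r"[A-Za-z]+(?:-[A-Za-z0-9]+)*"   # words, including hyphenated designations
    r"|\d+(?:\.\d+)?"                # integers and decimals
    r"|[^\w\s]"                      # any other single non-space symbol
)


def tokenize(text: str) -> list:
    """Split text into word, number, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    sentence = "Al-Mg-Si alloy 6061 aged at 175 C."
    print(tokenize(sentence))
```

Later stages such as part-of-speech tagging or entity disambiguation would operate on these tokens; they generally need trained models and are outside the scope of this sketch.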

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.


The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.


Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were 'rolled up' from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often-fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; a data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach - allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management – consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose


- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management; difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most-important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure – evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel


- Data associated with publications should be shared along with the publications. Not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)

Government-Led Priorities

1. Lead data strategy/approach development
- Hold meeting like this one in 6-12 months with purpose of developing roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus, e.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most-important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tool making data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure


4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10% - repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST) – creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors, not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches are limited to overly fundamental single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or alternatively work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community


Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, iNovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, Questek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl Institute of Standards & Tech
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


Introduction

A key objective for the Materials Genome Initiative (MGI) is access to digital materials data. Access may be for an individual, a team, a company, or an entire community of interest. There are several challenges to providing access to digital materials data for which there are presently no universally accepted best practices. Under the MGI, Federal agencies have funded research and development designed to explore, develop, and apply MGI methodologies. Several of these efforts either focus on an infrastructure for sharing data or emphasize data sharing as a key requirement of the overall technical effort.

One objective of starting several MGI-focused programs concurrently was that the approaches, lessons learned, and best practices would be shared between programs in order to accelerate the overall development of a materials innovation infrastructure (MII). This workshop assembled representatives from Federally funded efforts where a critical component of the overall research effort involves sharing of materials data in digital format. The intent of this workshop was for the participants to provide insights into the goals of their effort, details of their approach to sharing digital materials data, challenges they face, and lessons learned. This exchange of ideas will not only enhance the likelihood for success of the individual efforts, but will also speed the community's understanding of how best to proceed in developing an MII.

During the workshop, participants were asked to identify the top three next steps needed to be taken within the materials community and by government agencies in addressing the challenges associated with digital materials data. The following is a list of priorities derived from those inputs for each sector.

Community
• Develop and deploy standards for data and federated/collaborative environments
• Communicate value and need for digital materials data
• Define critical data to be compiled
• Engage Community
• Explore and leverage data solutions developed outside materials community
• Train students in these emerging areas
• Establish community norms for publishing materials data

Government
• Lead data strategy/approach development
• Incentivize data deposition
• Facilitate data deposition (policy and technology)
• Support data sharing and deposition (provide resources)
• Support data creation
• Enhance data management competency
• Provide quality datasets for model development/validation


MGI Materials Data Workshop Organizing Committee

Air Force Research Laboratory (AFRL)

Dr. Charles Ward & Mr. Clare Paul

Army Research Laboratory (ARL)

Dr. John Beatty & Mr. Wayne Ziegler

Energy Efficiency and Renewable Energy, Department of Energy (DOE-EERE)

Mr. William Joost

National Aeronautics and Space Administration (NASA)

Dr. Stephen Smith & Dr. Dennis Griffin

National Institute of Standards and Technology (NIST)

Dr. James Warren

Office of Naval Research (ONR)

Dr. William Mullins & Dr. Kenny Lipkowitz


Agenda
MGI Materials Data Workshop
Tec^Edge, 5000 Springfield Street, Dayton, OH
15 - 16 July 2014

Tuesday

074 – 0800 Check-in
0800 – 0815 Chuck Ward (AFRL) – Introductory remarks

Discussion Lead: Will Joost (DOE-EERE)
0815 – 0845 Jim Warren (NIST) – Data Infrastructure
0845 – 0915 John Allison, Brian Puchala (Univ Michigan) – PRISMS/ALMMII
0915 – 0945 Carrie Campbell (NIST) – ChiMaD
0945 – 1015 Discussion
1015 – 1030 Break

Discussion Lead: John Beatty (ARL)
1030 – 1100 Marco Fornari (Central Michigan Univ) – AFLOWLIB.org
1100 – 1130 Surya Kalidindi (Ga Tech) – Mosaic of Microstructure/IGERT-CIF21
1130 – 1200 Discussion
1200 – 1230 Lunch

Discussion Lead: Bill Mullins (ONR)
1230 – 1300 Tom Searles (MDMi) – Cast Aluminum
1300 – 1320 Dongwon Shin (Oak Ridge National Laboratory, ORNL) – Cast Aluminum
1320 – 1340 Jake Zindel (Ford Motor Co) – Cast Aluminum
1340 – 1415 Discussion
1415 – 1430 Break

Discussion Lead: Jim Warren (NIST)
1430 – 1500 Jason Sebastian (Questek) – Cast Iron
1500 – 1530 Lou Hector (General Motors) – Advanced Steel
1530 – 1600 Lara Liou (GE Aviation) – Integrated Computational Methods for Composite Materials
1600 – 1700 Discussion
1700 Adjourn for the day


Wednesday

074 – 0800 Check-in

Discussion Lead: Wayne Ziegler (ARL)
0800 – 0830 Terry Wong (Rocketdyne) – Nickel RS FEP
0830 – 0850 Mike Glavicic (Rolls Royce) – MAI Ti Modeling
0850 – 0910 Rajiv Naik (Pratt & Whitney) – MAI Data Informatics, Data Standards, Modeling & UQ
0910 – 0940 Discussion
0940 – 0955 Break

Discussion Lead: Dennis Griffin (NASA)
0955 – 1015 Ro Gorham (National Center for Defense Manufacturing and Machining, NCDMM) – America Makes
1015 – 1045 Richard Serwecki (NASA) – MAPTIS
1045 – 1105 Scott Henry (ASM International) – Computational Materials Data Network
1105 – 1135 Discussion
1135 – 1200 Lunch

Discussion Lead: Clare Paul (AFRL)
1200 – 1230 Ben Blaiszik (Univ of Chicago/ANL) – Globus
1230 – 1300 Mike Groeber (AFRL) – DREAM3D/SIMPL
1300 – 1320 Sam Chance (iNovex) – Semantics for Deep Interoperability
1320 – 1340 Harlan Shober (RJ Lee) – Big Data and Semantic Technologies
1340 – 1400 Tim Hawes (Decisive Analytics Corp) – Natural Language Processing
1400 – 1445 Discussion
1445 – 1500 Wrap up
1500 Adjourn


Presentation Summaries

Welcome and Introduction
Chuck Ward, Air Force Research Laboratory

The primary intent of the workshop is to promote communication between the groups being funded under the Materials Genome Initiative that had significant interest in the management of materials data. The greater awareness is expected to enhance the likelihood for success of the individual efforts and speed the community's understanding of how best to proceed in developing an MII.

The Materials Research Society (MRS) and the Minerals, Metals and Materials Society (TMS) conducted a survey of the community in May/June of 2013 to gauge attitudes and needs in materials data. Over 650 people took the survey, with 74% of respondents saying they would be willing to share data if it were encouraged as a condition of funding or publishing. The community showed the most interest in gaining access to databases of some of the most basic and essential materials data, such as thermo-physical properties.

The survey asked participants to evaluate advantages of local versus centralized databases. Centralized repositories were perceived by respondents to offer advantages in cost, an ability to share one's data, provisioning of uniform data standards, and an ability to organize and find data. Local repositories were perceived to provide advantages in ease of data input, time efficiency, and flexibility in accommodating new data types.

The survey also asked participants to provide their thoughts on impediments to and motivators for sharing data. Most of the impediments can be attributed to concerns of ownership of data or external data restrictions, while technical issues such as file size and proprietary formatting were also concerns. Overall, participants saw the ability to contribute data to the community for reuse and critique positively. The highest motivator was to enhance one's own research visibility. Interestingly, respondents saw adding metadata as a motivator, presumably due to the perceived higher value of data when it is described more robustly.


MGI Efforts in Infrastructure Development at NIST
Jim Warren, National Institute of Standards and Technology

The presentation provided an overview of NIST activities supporting the creation of a materials data infrastructure. The infrastructure could start by defining the interfaces/APIs (Application Programming Interfaces) with which a user is able to interact for the deposit or retrieval of data. Beyond an interface, the following are essential elements for creation of a viable infrastructure:

• The establishment of repositories to store the data
• Tools to enable digital capture of the information, preferably at the time of creation (computation or experiment)
• Tools to enable markup of the data with sufficient metadata to inform someone else how the data was created, as well as various attributes that would help the user judge the quality of the information
• Assignment of a persistent digital identifier (like the DOI (Digital Object Identifier) for journals) so the data can be cited and discovered by others
• The registration of the availability of the data into some sort of "registry" to enable discovery without prior knowledge of the existence of the repository/specific data features
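The elements listed above (metadata markup, a persistent identifier, and a registry for discovery) can be sketched together in a few lines. This is a hypothetical illustration only, not NIST's design: the field names, the `urn:uuid:` identifier scheme standing in for a DOI-like identifier, the in-memory registry, and the example URL are all assumptions.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Minimal metadata describing where a dataset lives and how it was made."""
    title: str
    repository_url: str   # hypothetical location of the data
    created_by: str
    method: str           # e.g. "experiment" or "computation"
    keywords: tuple = ()

    @property
    def identifier(self) -> str:
        # Deterministic stand-in for a persistent identifier (assumption, not a DOI).
        return "urn:uuid:" + str(uuid.uuid5(uuid.NAMESPACE_URL, self.repository_url))


class Registry:
    """Keyword index so data can be found without knowing the repository."""

    def __init__(self):
        self._by_id = {}
        self._by_keyword = {}

    def register(self, record: DatasetRecord) -> str:
        self._by_id[record.identifier] = record
        for keyword in record.keywords:
            self._by_keyword.setdefault(keyword.lower(), set()).add(record.identifier)
        return record.identifier

    def discover(self, keyword: str) -> list:
        ids = self._by_keyword.get(keyword.lower(), set())
        return [self._by_id[i] for i in sorted(ids)]


if __name__ == "__main__":
    registry = Registry()
    record = DatasetRecord(
        title="Tensile tests, Al 6061",
        repository_url="https://repo.example.org/datasets/al6061-tensile",
        created_by="A. Researcher",
        method="experiment",
        keywords=("Al 6061", "tensile"),
    )
    print(registry.register(record))
    print([r.title for r in registry.discover("tensile")])
```

The registry stores only metadata and identifiers; the data stay in their home repository, which matches the idea of discovery without prior knowledge of where the data are held.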

Additionally, creation of new policy and standards, as well as development of tools to evaluate the quality of the data and tools to enable data-driven discovery, will be required to manage and use materials data to the greatest extent possible.

NIST is responding to the OSTP (Office of Science and Technology Policy) memos (Dr John Holdren, 2 Feb 14) by developing practices that will be of use to the community, including a Common Access Platform allowing access across multiple collections of information, Research and Development of Persistent Identifier (PID) infrastructure with the Research Data Alliance (RDA), and R&D on Data Type Registries with RDA.

Finally NIST is collaborating with the National Data Service (based at the National Center for Supercomputing Applications) and is developing a set of best practices for Data Management Plans


The Materials Commons: A Novel Information Repository and Collaboration Platform for the Materials Community
Brian Puchala, Glenn Tarcea, Stravya Tamma, Emmanuelle Marquis, John Allison, University of Michigan

Critical to accelerating the pace of materials science and development is the development of new and improved infrastructure providing a seamless way for materials researchers to share and use materials data and models. To address this need, we are developing the Materials Commons (http://prisms.engin.umich.edu/prisms/materialscommons), an information repository and collaboration platform for the metals community in selected technical emphasis areas. We envision the Materials Commons becoming a continuous, seamless part of the scientific workflow process. Researchers will upload the results of experiments and computations as they are performed, automatically where possible, along with the provenance information describing the experimental and computational processes. By associating this data with the experimental and virtual computational materials samples from which it is obtained, the Materials Commons will build process-structure-property relationships, enabling the construction and validation of constitutive and process models. The Materials Commons website provides an easy-to-use interface for uploading and downloading data and data provenance, and for searching and sharing data. At its core, the Materials Commons consists of a secure data storage cluster, an application for efficiently uploading and downloading large data sets, and a REST (REpresentational State Transfer) based API to access and extend the capabilities of the repository. The API allows for features such as automated data upload from experiments and computations, seamless integration of computational models, and algorithmic analysis of process-structure-property relationships. The Materials Commons is a central thrust of the Center for PRedictive Integrated Structural Materials Science (PRISMS).
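
To make the idea of automated upload with provenance more concrete, here is a minimal Python sketch of how a client might package results and provenance for a REST-style repository. The base URL, the `/samples/{id}/datasets` route, the payload field names, and the bearer-token header are all assumptions made for illustration; they are not the Materials Commons API. The request is only constructed, never sent.

```python
import json
import urllib.request

# Hypothetical endpoint for illustration; not the real Materials Commons API.
API_BASE = "https://repository.example.org/api"


def build_upload_request(sample_id, process, results, api_key):
    """Build (but do not send) a POST request that attaches results and
    their provenance to a given sample."""
    payload = {
        "sample_id": sample_id,
        "provenance": process,   # how the data was produced
        "results": results,      # the measured or computed values
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/samples/{sample_id}/datasets",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
```

Tying the upload to a sample identifier reflects the point made above: associating data with the sample it came from is what allows process-structure-property relationships to be assembled later.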

NIST Materials Data Informatics Efforts
Carrie Campbell, National Institute of Standards and Technology

The presentation covered NIST's pilot efforts in creating a data infrastructure focusing on Phase-Based Property Data. Phase-based data are diverse: they include 1-, 2-, 3-, and 4-dimensional data, are semi-structured, and often include incomplete data sets. The figure below provides an overview of NIST's data efforts supporting the MGI.

The NIST Materials Data Repository (https://materialsdata.nist.gov/dspace/xmlui) is a customized DSpace repository for materials that enables sharing of a variety of data types, including text files, images, and video. The repository also provides a persistent identifier for each entry and allows users to choose from a variety of license types. NIST is also working on two data curation tools for phase-based data. The Thermodynamics Research Center (http://trc.nist.gov) is extending the ThermoML markup language and Guided Data Capture software tool to metallic systems, with an initial emphasis on phase equilibria and thermochemical data. The Materials Data Curation System (MDCS) employs user-defined XML-based schemas to curate data. The MDCS allows data curation either through a web-based interface or through a REST API. The entered data is stored in a non-relational database repository, can be searched using a SPARQL (SPARQL Protocol and RDF Query Language) interface, and can be integrated with a variety of scientific workflow tools.

AFLOW Consortium
Marco Fornari, Michigan State University

The overall goal of the Materials Genome Initiative (MGI) is to accelerate the discovery, development, and deployment of new materials with ad hoc functionalities. The AFLOW Consortium focuses on the software infrastructures to achieve the MGI goals by using first principles calculations. The emphasis is on the data that must be generated, archived, validated, and post-processed efficiently, both to identify new materials and to distill meaningful quantitative relationships within the data. This extends to the solution of the inverse problem of chemically replacing rare and costly elements in critical technologies.

The AFLOW Consortium provides access to the repository AFLOWLIB.ORG <http://AFLOWLIB.ORG>, which publicly distributes data on energetics and thermodynamic stability for more than 600,000 compounds and band structures for roughly 300,000 materials. The repository includes bulk crystals and alloys. Data were computed with the AFLOW high-throughput framework using density functional theory. The results of the calculations can be queried with the recently developed REST-API, which standardizes and labels the data for retrieval purposes. The fields defined in the AFLOW REST-API include a material object identifier (AUID), data cloud location (AURL), ownership and sponsorship, keywords, and data-specific labels (COMPOSITION, ENTROPY, EGAP, …). Simple scripting languages can be used to automatically download from the AFLOWLIB repository. The AFLOW approach has been extensively applied to platinum-group alloys, thermoelectric materials, magnetic materials, topological insulators, scintillators, etc. The AFLOW REST-API can be extended to include experimental data. For further details on the database, the reader is directed to http://materials.duke.edu/auro/AUROARTICULA/j.commatsci.2012.02.002.pdf.
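
The kind of simple scripting mentioned above might look like the following Python sketch, which parses a labeled entry and filters entries by band gap. The `key=value | key=value` text layout, the lowercase key names, and every value in `SAMPLE_ENTRY` are assumptions made up for this illustration; the actual response format and field values should be taken from the AFLOW REST-API documentation. No network access is performed.

```python
def parse_entry(text):
    """Parse a 'key=value | key=value' entry string into a dict
    (the layout is an assumption for this sketch)."""
    entry = {}
    for item in text.split("|"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        entry[key.strip().lower()] = value.strip()
    return entry


def select_by_gap(entries, max_egap):
    """Return the AUIDs of entries whose band gap (eV) is at most max_egap."""
    return [e["auid"] for e in entries
            if "egap" in e and float(e["egap"]) <= max_egap]


# Made-up entry for illustration only; not real AFLOWLIB data.
SAMPLE_ENTRY = (
    "auid=aflow:0000000000000001 | aurl=repository.example:DATA/AB_1 | "
    "composition=1,1 | egap=0.75"
)
```

Because every entry carries its own identifier (AUID) and location (AURL), a script can keep only the identifiers that pass a filter and fetch the full records later.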

Materials Innovation Infrastructure: Lessons Learned and Future Plans
Surya Kalidindi, Georgia Institute of Technology

A central activity in materials research is the exploration of processing-structure-property (PSP) linkages. Workflow is defined as the sequence of steps employed for establishing PSP linkages of interest to any specific engineering/technology application.

High Throughput Explorations of PSP Linkages: We must treat hierarchical material as a complex system, which means embracing and exploiting uncertainty quantification and complexity management concepts/tools. It also means we need to design, develop, and validate decision support systems that will leverage the best current understanding (with its uncertainties) and provide objective guidance on future effort investment (combination of experiments and simulations). This means leveraging Data Science to effectively feed the decision support systems described above by developing and deploying high throughput strategies for obtaining high value information from both experiments and models.

What are integrated workflows? Integrated workflows are those that utilize the best combination of experiments and simulations in extracting robust and reliable PSP linkages. This requires engaging and exploiting cross-disciplinary expertise that includes materials science, manufacturing, systems approaches, uncertainty quantification, computational science, and data and information sciences. Through this, one must ensure that the workflows output the critical information needed by design and manufacturing stakeholders in the materials development value chain. Integration demands expertise sharing, which essentially is the core element of a collaboration. Cross-disciplinary integration demands effective collaborations, such as those one might set up using the modern tools of the day, i.e., an e-Collaboration.

Motivators for e-Collaborations: e-Collaborations are central to enhancing the productivity of materials development activities in several ways:

• Individual research groups can identify and engage complementary expertise needed for their projects.
• Organizations can coordinate multiple teams and multiple projects to maximize output/impact.
• Organizations can identify and seek the right partners that enhance their individual and combined value.

e-Collaborations can be open, semi-open, or closed, depending on the needs of the specific organization. Productivity enhancement leading to dramatic reductions in cost and time expended in materials development is the main incentive for e-collaborations.

Main Impediments to e-Collaborations

Lack of tools to identify the "best" collaborators:

• Need software tools that quantify more precisely the expertise of individuals/groups/organizations
• Need software tools for allowing rich annotations and e-recording of discussions
• Need analytic tools for identification, extraction, and management of metadata (i.e., high value abstractions from the raw data)
• Need searchable semantic metadata databases (these are lightweight and can be centralized and customized for specific user communities)
• Need software tools for tracking provenance (i.e., who/what/when/how generated, modified, etc.)

Lack of experience with the use of integrated workflows:

• Need software tools for e-recording of workflows and development of searchable workflow databases
• Need recommendation schemes that suggest the best workflow(s) based on recorded prior experience; this provides a means to transfer expertise acquired by an organization to other teams and other projects (even in the same organization)
• Leads to possible automation and dramatic enhancement of productivity

MINED Group's Past Experience (~2005 - ~2008): Funded by the Army Research Office, ARO (David Stepp); engaged computer science (CS) graduates to produce a single-user platform that integrated certain software codes previously produced by our group. Software development was subsequently abandoned, but the effort recruited two CS graduates into PhD programs in Materials Science and Engineering/Mechanical Engineering (MSE/ME), which dramatically altered the direction of research in the group. A brief chronology of development:

• Senior Design Project (2011-2012): An online portal designed for sharing data files and tools; learned the importance of user interfaces and the need for professional code developers and software architects.
• HUBzero Instantiation (2012-2013): Built a hub by customizing HUBzero for hierarchical structural materials; there are several advantages in terms of accessing and executing codes (on the server side), but there are significant barriers to establishing fast and productive cross-disciplinary e-collaborations.
• GTRI Integration Platform (2013-2014): Engaged professional code developers and software architects to design a multi-user web portal that allows data and code sharing while tracking provenance and workflows; underestimated the resources needed for such a project.
• MATIN Collaboration Environment (2013-2014): Lightweight integrated software shell that allows users to seamlessly interface with numerous cloud services offered through the web through their APIs. Examples include GitHub for versioning and archival of codes, Authorea for collaborative editing of documents, figshare for citable publication of research, Plotly for collaborative data analysis and visualization, and Google+ and LinkedIn for e-teaming and networking. MATIN focuses on building an emergent community: Materials & Manufacturing Informatics (MMI). MATIN provisions web services for software integration platforms and innovation infrastructure.

Examples of MATIN:

• Spatial Statistics: https://github.com/tonyfast/SpatialStatisticsFFT
• PyMKS: http://openmaterials.github.io/pymks/index.html
• Example discussion about periodic boundary conditions in running Finite Element (FE or FEM) simulations: https://github.com/sfepy/sfepy/issues/267

In the platform developed, it is noted that the user or user's organization can set permissions at the level of individual datasets/codes on the degree and type of sharing.

Central Challenge: Activation. The transition from traditional disciplinary workflows to the integrated workflows needs to overcome an activation barrier. There is no clear incentive to get ahead of the curve in anticipating the upcoming transformations in how materials development will be pursued in the future. Open (reproducible) science alone is not a sufficiently strong motivator for a materials development community with strong interests in intellectual property development; indeed, it might actually be perceived as a negative. There is a need to explore and refine what incentives work for individual researchers (e.g., increased productivity).

Future Plans: Continue to develop and deploy MATIN at GT and any other organizations/groups that express an interest in working with us. Design, create, and distribute a suite of "mobile first" apps designed to make data ingest into collaboration environments such as MATIN, from both material characterization equipment and multi-scale models, an enhanced user experience with friendly and natural interfaces. Proactively promote/incentivize data/metadata ingest into publicly shared databases. Work with professional societies to announce and run new competitions/awards for recognizing and rewarding good practices by the members of the broader materials innovation user community (e.g., Best Documented Data Award, Best Documented Tool Award, Best Documented Workflow Award). Develop and deliver new workshops for training the new generation of graduate students, postdocs, and junior faculty in the emerging workflows of Materials & Manufacturing Informatics.

High Performance Cast Aluminum Alloys for Next Generation Passenger Vehicle Engines
Dongwon Shin, Materials Science and Technology Division, Oak Ridge National Laboratory

Oak Ridge National Laboratory (ORNL) teamed with Chrysler and Nemak to develop next-generation high-performance cast aluminum alloys that have improved castability, high-temperature strength, and fatigue performance. At this point, the data management practice is limited to analytical data spreadsheets, other than the CALPHAD computational thermodynamic databases and proprietary property databases for casting simulation. As the project goes on, a more robust data schema will be developed to store various types of data populated from the experiments and calculations.

Besides the data structure design, the speaker discussed the necessity of a new approach to identifying key descriptors from the large dataset that can guide the design of new materials. An alloy design requires the balance of extremely complex relationships among a large number of descriptors. Some examples of these descriptors include phase fraction of key intermetallics, lattice parameters of the matrix, and diffusion kinetics of solute atoms. However, optimizing these quantities relies on good physical metallurgy intuition coupled with clever experiments, dogged sleuthing, and a healthy dose of luck. Such descriptor-based predictions of materials properties date back to the beginning of modern solid-state chemistry and physics. As shown in the Darken-Gurry map to predict the solubility of elements in tantalum, intuitive descriptors such as electronegativity and radius could be used for materials design. However, as the chemistry and physics of materials become more complex, intuition-based search may not work anymore. Thus, the use of more advanced statistical analysis and data mining approaches might be required to accelerate the design of complex multi-component/multi-phase alloys.
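
A descriptor-based screen of the Darken-Gurry type can be sketched in a few lines of Python. The sketch below uses the commonly quoted form of the criterion, in which a solute is expected to show appreciable solid solubility when it falls inside an ellipse of about ±15% in atomic radius and ±0.4 units in electronegativity around the solvent. The function name and the tolerance defaults are choices made for this illustration, and the (radius, electronegativity) pairs in the usage note are placeholders rather than tabulated data for tantalum or any other element.

```python
def within_darken_gurry_ellipse(solvent, solute, radius_tol=0.15, en_tol=0.4):
    """Return True if the solute lies inside the Darken-Gurry ellipse
    centered on the solvent.

    solvent, solute: (atomic_radius, electronegativity) tuples.
    radius_tol: allowed fractional radius difference (semi-axis).
    en_tol: allowed electronegativity difference (semi-axis).
    """
    r0, x0 = solvent
    r, x = solute
    # Normalize each difference by its semi-axis, then apply the ellipse test.
    dr = (r - r0) / (radius_tol * r0)
    dx = (x - x0) / en_tol
    return dr * dr + dx * dx <= 1.0
```

For example, with a placeholder solvent at `(1.47, 1.5)`, a candidate at `(1.43, 1.6)` falls inside the ellipse while one at `(1.90, 0.9)` does not. A two-descriptor rule like this is the kind of intuition-based screen that, as noted above, may stop working once many interacting descriptors are involved.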

Data Related to Cast Aluminum
Jake Zindel, Ford Motor Company

The main focus of our research is the development of CAE (Computer-aided Engineering) tools to support two significant activities in the company: component design (product development) and manufacturing engineering (high-volume production). A key part of our research is conducting experiments to collect data for the development and validation of the CAE tools.

The data we collect consists of mechanical properties, physical properties, microstructural images, and manufacturing process parameters that form the pedigree of the material. This information can be in the form of Excel files, proprietary-software storage files, screen shots, and written reports.

Our immediate need is to deal with this large amount of existing information, collected over many years, on a "local" level. The first step is to organize many independent collections of information in all their forms. The next issue is the time required to enter all this information into the database. Maintenance may also require a significant amount of effort. Lastly, there will also be a need to manipulate the data contained in the database using Excel, Minitab, MATLAB, etc. Therefore, links or easy exportation of data to these types of applications will be necessary.

The next level would involve dealing with information that has been created or collected by others. From an OEM (Original Equipment Manufacturer) perspective, this would involve convincing suppliers to share processing information that they may consider proprietary. In addition to suppliers, universities and national laboratories are also sources of data. Data from all the sources will require some sort of protocol to determine its quality and resolve inconsistencies.

In the end, it seems likely that considerable budget and time will be required to implement comprehensive information/databases, which could be the largest hurdle of them all.

QuesTek's MGI/ICME Approach to Alloy Optimization and Design (Including Cast Iron)
Jason Sebastian, QuesTek Innovations LLC

Integrated Computational Materials Engineering (ICME) methods involve the holistic application of different computational models across various length scales to the design, development, and rapid qualification of advanced materials. Using its Materials by Design technologies, QuesTek has successfully applied these ICME methods to the design, development, and full qualification of several advanced alloys (including aerospace structural steels, aerospace gear steels, advanced castable titanium alloys, high-strength corrosion-resistant aluminum alloys, and high-performance cast iron).

QuesTek's ICME methods benefit tremendously from the activities of the US Materials Genome Initiative (MGI). Recognizing that the "10 to 20 years" that it takes for alloy design and development is hindering the rapid deployment of higher-performance materials, the MGI is focused on "cutting in half" the time and cost to design and deploy novel advanced materials needed to enhance and sustain US competitiveness.

QuesTek also has ongoing activities on the application of ICME methods to the design and development of materials for the energy sector, including advanced materials that enable vehicle lightweighting. Of particular note is an ongoing effort with a major diesel engine manufacturer on the development of high-performance cast iron alloys for engine block applications. QuesTek's efforts have made extensive use of various MGI-type tools and databases, including 1) density functional theory methods for high-throughput screening of cast iron inoculant stabilities (structure, lattice parameter, stability of various oxide/sulfide phases), 2) liquid iron mobility databases (to help accurately model solidification phenomena), and 3) thermodynamic databases (to examine nano-precipitate stability).

Moving forward, to continue to accelerate the materials development cycle, efforts should remain focused on the development of the fundamental databases (thermodynamic, kinetic, etc.) that enable ICME models. There should also be increased focus on efforts related to the integration of the various computational databases, models, and tools (the "I" in ICME).

Integrated Computational Materials Engineering (ICME) of Generation Three Advanced High Strength Steels
Louis G. Hector Jr., General Motors R&D Center

While traditional material development methods often involve laborious trial-and-error testing, Integrated Computational Materials Engineering (ICME), which falls under the umbrella of the Materials Genome Initiative, is offering a means for materials development under a more compressed time scale. Advanced high strength steels (AHSS), which contain multiple phases (e.g., ferrite, bainite, pearlite, austenite, martensite) and may exhibit phase transformation with strain, are fertile ground for ICME. However, prediction of macro-scale constitutive behavior based upon the multi-scale physical, chemical, and mechanical phenomena in these complex microstructures is a formidable challenge for ICME. Unprecedented collaboration between universities, industry, and government labs will be required to address AHSS development, archiving of data, and the implementation of these materials into products that benefit the American consumer. This presentation began with a brief overview of a new DOE/USAMP (US Automotive Materials Partnership)-funded ICME program aimed at the development of Generation Three (Gen 3) AHSS. The lack of commercial Gen 3 steels that meet DOE targets is offering significant challenges to this new ICME program. The program is meeting these challenges following four thrusts: (1) making Gen 3 steels to meet DOE targets; (2) passing experimental measurements of key properties to an ICME model that produces microstructure-based constitutive models; (3) material evaluation in forming and vehicle performance simulations; (4) archiving all relevant data for easy use by others in the program. Although the Gen 3 AHSS ICME program is "mission oriented", it demonstrates an important role for universities in providing the necessary foundation in fundamental materials science and engineering.

Integrated Computational Methods for Composites Materials (ICM2): Lessons Learned from Integration Planning and Feasibility Demonstrations
Lara Liou, GE Aviation

General Electric Aviation and Lockheed Martin Aeronautics have teamed through an Air Force Research Lab program to conduct a demonstration of an Integrated Computational Materials Engineering digital framework that links material processing, property, and structural relationships to account for processability, manufacturability, and system performance, with the goal of demonstrating that the usage of integrated models can contribute to future airframe and engine designs with dramatic reductions in development time and cost. Both engine and airframe Foundational Engineering Problems will be demonstrated through a common digital framework, to be developed and validated on increasingly complex articles. The modeling tools, which cover composite process models, localized mechanical behavior models, and global design models (Convergent Manufacturing Technology's COMPRO CCA, Autodesk's Autodesk Simulation Composite Analysis, and an Abaqus-based University of Michigan micromechanics model), are centrally linked via a commercially available model integration interface (Phoenix Integration's ModelCenter®). Detailed plans of what data will be passed among the various models, as well as how it will be passed, have been completed. Creation of these detailed process maps and simple feasibility demonstrations has raised inherent technical and logistical challenges to integrating commercially available multi-scale models. Robust model integration methods to address these challenges were discussed.

The exercise of drawing up detailed process maps for what to integrate and how to integrate the selected tools and models resulted in key early lessons learned and identification of tools critical to long-term, robust, and sustained application of Integrated Computational Methods for Composites Materials (ICM2). The key lessons learned include:

• Individual models must be relatively mature and validated before an ICM2 problem can be confidently defined.
• FE analysis-based tools within the same digital thread should utilize the same FE solver.
• The team ideally should consist of at least one expert for each tool and its underlying model, as well as an expert in the digital thread and integration interface tools and programming.
• Mesh interpolation between finite element analysis based tools will likely be required, and a commercially available tool for this type of composite mesh interpolation is key for robust and sustained application of ICM2.

Residual Stress Engineering of Rotating Nickel Components
Terry Wong, Aerojet Rocketdyne

The PW-8 project, the Nickel Base Superalloy Residual Stress Foundational Engineering Problem program, aims to develop a multi-disciplinary analysis approach to predict and incorporate bulk residual stress in the design of a component. This will allow for the design of a part that is optimized for a particular application or design intent (such as weight, performance, or reduction of scrap). To accomplish this goal, the PW-8 team has two major tasks: 1) the development of an infrastructure and associated tools needed to predict and incorporate bulk residual stress in the design stages of a part and 2) a demonstration task that will validate the infrastructure and tools developed.

The Data Management task is the foundational element of the larger infrastructure development task. The goal for the data management work is to develop an infrastructure that allows the various team members in PW-8 to share data and to store data in a database. This not only ensures that data generated by this program is never lost but also allows for data to be transferred in a secure manner. In order to meet the goal of a system that securely stores and shares data, three major tasks must be accomplished: 1) validation that there is a secure method to transfer data between team members and that the method of storing data is also secure, 2) development of a database schema, and 3) development of a means to easily upload data into the database and download data from the database.

Of the three, the most difficult task was to create a system in which the transfer and storage of data was secure. Various proposals were made to ensure the security of the data, but many of them were rejected, not because the proposal was actually insecure, but because it was not acceptable under each team member's IT security policies. The process of getting each team member's IT department to approve the security of the data transfer and storage took more effort than anticipated. In the end, the team members, along with their respective IT departments, approved the use of TRUcentrix as a tool to transfer data and the use of an ASM International hosted database to store data.

The development of a database schema is a necessary step in creating the database to store all the data generated (via physical tests as well as by models). A big help in creating the database schema was to use the schema generated by Dr. Steven Arnold from NASA's Glenn Research Center. Using NASA's schema eliminated the need to start the database build from scratch and allowed us to focus on the specific needs of storing residual stress data.

The last task involves the development of tools that make the input of data easy. The more difficult it is to enter data into a database, the more likely data will not be entered. Excel-based templates are being developed in order to make data entry as easy as possible.
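
One way to picture template-driven data entry is a loader that checks each row of a filled-in template against the database schema before upload. The Python sketch below assumes the template has been exported to CSV so that only the standard library is needed; the column names in `SCHEMA` and the function `load_template` are hypothetical and are not taken from the PW-8 or NASA schemas.

```python
import csv
import io

# Hypothetical schema for illustration: column name -> type converter.
SCHEMA = {
    "specimen_id": str,
    "depth_mm": float,
    "residual_stress_MPa": float,
    "method": str,
}


def load_template(csv_text):
    """Validate rows from a filled-in template (CSV text).

    Returns (records, errors): converted rows, and messages for rows or
    columns that do not match the schema.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in SCHEMA if c not in (reader.fieldnames or [])]
    if missing:
        return [], ["missing columns: " + ", ".join(missing)]
    records, errors = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            records.append({col: conv(row[col]) for col, conv in SCHEMA.items()})
        except (TypeError, ValueError):
            errors.append(f"line {line_no}: could not convert a value")
    return records, errors
```

Reporting problems per line keeps the burden on the person entering data low, which matters because, as noted above, data that is hard to enter tends not to be entered at all.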

The two big lessons learned from PW-8 in relation to data management are:

• It is essential to work with each company's IT and IT Security people to develop a system that satisfies the security requirements of each company.
• Database schema development should not be reinvented but should begin with an already established schema.

Perspectives on MAI Programs
Dr. Mike Glavicic, Rolls Royce

The current MAI (AFRL's Metals Affordability Initiative) programs are structured independently from one another, in that a standard methodology does not exist that describes what data is to be archived at the conclusion of a program. Nor does there exist a standard protocol for how to archive things on a server or some other computer resource at the conclusion of a program. As a result, the current practices of MAI programs:

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program.
• Do not have a formalized requirement for how data is to be packaged and stored at the end of a program. Hence, even in the best of cases, in which all of the data is stored, the data is archived in a simple folder fashion that is not searchable in any manner. Moreover, even if all of the data is present, it is extremely cumbersome to untangle the data for future use using a file folder strategy.

Hence, at the end of the program, both the government and the MAI team are left to manage the data and results in an unregimented fashion.

The lack of a regimented protocol for how to package a program's results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal of integrating the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models, and other information required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted to the recreation of data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. Although the RR-10 program was able to overcome these issues and become a successful program, it should be noted that other programs that encounter similar problems are unlikely to share that success, as RR-10 was very fortunate to have the means to overcome these serious issues, which were a result of the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features that will be required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ
Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and Suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration, data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals, Inc.) are working together on the PW-11 Team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security, and access control, with the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework – GE Forge, Vanderbilt Forge, and Semantic Web technology – and one technology will be selected for the next phase of the program.


Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single database searches to multiple and disparate database searches, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work upfront for implementation, as it needs the development of a common vocabulary and relationships.

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST, and develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
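The idea of flowing uncertainty through interconnected models can be illustrated with a minimal Monte Carlo sketch. This is not the PW-11 team's method: the two model functions, their constants, and the input distributions below are hypothetical placeholders chosen only to show how variability in a processing parameter carries through to microstructure and then to a mechanical property.

```python
import random
import statistics


def process_to_microstructure(cooling_rate):
    """Hypothetical process model: grain size (um) from cooling rate (K/s)."""
    return 50.0 / (1.0 + 0.1 * cooling_rate)


def microstructure_to_property(grain_size_um, sigma0=200.0, k=300.0):
    """Hall-Petch-style strength (MPa); sigma0 and k are illustrative values."""
    return sigma0 + k / grain_size_um ** 0.5


def summarize(values):
    """Mean and sample standard deviation of a list of samples."""
    return {"mean": statistics.mean(values), "stdev": statistics.stdev(values)}


def propagate(n_samples=5000, seed=0):
    """Sample the uncertain input and carry each sample through both models."""
    rng = random.Random(seed)
    grain_sizes, strengths = [], []
    for _ in range(n_samples):
        # Step 1: variability of the processing parameter (assumed normal).
        cooling_rate = max(rng.gauss(10.0, 2.0), 0.0)
        # Step 2: process model plus assumed multiplicative model scatter.
        grain = process_to_microstructure(cooling_rate) * rng.lognormvariate(0.0, 0.05)
        # Step 3: property model plus assumed additive measurement noise.
        strength = microstructure_to_property(grain) + rng.gauss(0.0, 5.0)
        grain_sizes.append(grain)
        strengths.append(strength)
    return {
        "grain_size_um": summarize(grain_sizes),
        "strength_MPa": summarize(strengths),
    }
```

Calling `propagate()` returns the mean and standard deviation at each stage, so the spread in strength can be compared against the spread in grain size and in the assumed cooling rate.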

ASM’s Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project
Scott D. Henry, ASM International

ASM International – the world’s largest association of metals-focused materials engineers, scientists, technicians, educators, and students – has launched a series of initiatives under the “Computational Materials Data (CMD) Network” umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design, Ltd. The main objectives of SMDDP are to:


• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system.
• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets.
• Develop and carry out a series of test problems that represent relevant use cases for the repository.
• Make data open to the materials data community for use in data analytics, modeling, and educational purposes.
• Actively engage the materials data community and widely disseminate the findings from the project.
• Develop and implement data capture and curation procedures that can serve as models for other data repositories.

The basic philosophy behind the project is that standards often emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated, structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project – launched in late 2013 and planned as an 18-month program – is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future
Ben Blaiszik¹, Kyle Chard¹, Jim Pruyne¹, Rachana Ananthakrishnan¹, Steve Tuecke¹˒², Ian Foster¹˒²
¹University of Chicago, Computation Institute; ²Argonne National Laboratory, MCS Division

During this talk we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully-formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.

Semantics for Deep Interoperability
Samuel Chance, iNovex

Semantic technology is based on mature web standards and intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the “higher” elements of the stack.

A brief history of the semantic web was presented, starting with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web, riding atop the same infrastructure (HTTP). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis that the technology has a foundation based on formal logic that enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how “triples” link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being considered as a solution for a community or enterprise.
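How “triples” link to each other to form a web of data can be sketched in a few lines of plain Python. This is only an illustration of the subject-predicate-object idea, not the presenter's tooling and not a standards-compliant RDF implementation; every identifier prefixed with `ex:` is a hypothetical name invented for the example.

```python
# Each fact is a (subject, predicate, object) triple. All names are hypothetical.
TRIPLES = [
    ("ex:Al6061", "ex:memberOfSystem", "ex:Al-Mg-Si"),
    ("ex:Al6061", "ex:hasTestRecord", "ex:tensileTest42"),
    ("ex:tensileTest42", "ex:measuredProperty", "ex:yieldStrength"),
    ("ex:tensileTest42", "ex:performedBy", "ex:LabA"),
    ("ex:Al6063", "ex:memberOfSystem", "ex:Al-Mg-Si"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def reachable(triples, start):
    """Follow links outward: the object of one triple is the subject of the next."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for _, _, o in match(triples, subject=node):
            if o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen
```

Because the object of one triple can be the subject of another, `reachable(TRIPLES, "ex:Al6061")` walks from the alloy to its test record and on to the lab and property attached to that record, which is the linking behavior the chart described.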


Big Data and Semantic Technologies
Harlan Shober, RJ Lee Group

The presentation started with a description of entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific “Big Data” management, and workflow-driven Laboratory Information Management Systems (LIMS).

To aid attendee understanding, initial context was provided by first defining “Big Data” and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing, characterization, and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described and consisted of several elements: contribute, index, search, analyze, and archive. While bearing in mind the needed functionality, a string of universal scientific data management process categories was then presented with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition; Data Manipulation and Storage; and Discovery and Advisory. W3C standards for Resource Description Framework (RDF) were applied to a notional example with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be “self-described”.

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine “Big Data” to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn’t had previous exposure.
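Tokenization, the first of the steps listed above, can be sketched with a single regular expression. The pattern below is an illustrative assumption rather than anything shown in the presentation; it is written so that hyphenated materials terms such as Al-Mg-Si and alloy designations such as 6061 survive as single tokens.

```python
import re

# Alternatives are tried left to right at each position in the text.
TOKEN_RE = re.compile(
    r"""
    \d+(?:\.\d+)?               # numbers such as 6061 or 0.5
  | [A-Za-z]+(?:-[A-Za-z]+)*    # words, keeping hyphenated terms like Al-Mg-Si
  | [^\sA-Za-z\d]               # any other single non-space symbol
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split text into number, word, and symbol tokens."""
    return TOKEN_RE.findall(text)
```

For example, `tokenize("Al-Mg-Si alloy 6061 aged at 175 C for 8 h.")` keeps `Al-Mg-Si` and `6061` intact and separates the trailing period. Later steps such as part-of-speech tagging and entity disambiguation would operate on these tokens.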

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document’s content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the “genre” problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.


The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.


Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were ‘rolled up’ from workshop participants’ inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often-fail fast) of the community’s experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; a data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto-ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data, standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management – consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose


- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management: difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. “Define critical data” to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a “universal repository” for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most-important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people’s toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure – evaluate feasibility of “unstructured materials data” management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel


- Data associated with publications should be shared along with the publications. Not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)

Government-Led Priorities

1. Lead data strategy/approach development
- Hold meeting like this one in 6-12 months with purpose of developing roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government’s got to decide on a strategy
- What’s the critical nucleus, e.g., getting CALPHAD off the ground?
- Government must establish an official stance on security and infrastructure
- List of most-important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools making data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Push “data management plan” requirements to “data deposit” requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure


4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10%: repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST) – creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors, not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials-related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches supported are limited to too-fundamental single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or alternatively work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community


Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, iNovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, Questek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory


Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl Institute of Standards & Tech
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


MGI Materials Data Workshop Organizing Committee

Air Force Research Laboratory (AFRL)
Dr. Charles Ward & Mr. Clare Paul

Army Research Laboratory (ARL)
Dr. John Beatty & Mr. Wayne Ziegler

Energy Efficiency and Renewable Energy, Department of Energy (DOE-EERE)
Mr. William Joost

National Aeronautics and Space Administration (NASA)
Dr. Stephen Smith & Dr. Dennis Griffin

National Institute of Standards and Technology (NIST)
Dr. James Warren

Office of Naval Research (ONR)
Dr. William Mullins & Dr. Kenny Lipkowitz


Agenda
MGI Materials Data Workshop

Tec^Edge, 5000 Springfield Street, Dayton, OH

15 - 16 July 2014

Tuesday

074 – 0800 Check-in
0800 – 0815 Chuck Ward (AFRL) – Introductory remarks

Discussion Lead: Will Joost (DOE-EERE)

0815 – 0845 Jim Warren (NIST) – Data Infrastructure
0845 – 0915 John Allison, Brian Puchala (Univ. Michigan) – PRISMS/ALMMII
0915 – 0945 Carrie Campbell (NIST) – ChiMaD
0945 – 1015 Discussion
1015 – 1030 Break

Discussion Lead: John Beatty (ARL)

1030 – 1100 Marco Fornari (Central Michigan Univ.) – AFLOWLIB.org
1100 – 1130 Surya Kalidindi (Ga Tech) – Mosaic of Microstructure/IGERT-CIF21
1130 – 1200 Discussion
1200 – 1230 Lunch

Discussion Lead: Bill Mullins (ONR)

1230 – 1300 Tom Searles (MDMi) – Cast Aluminum
1300 – 1320 Dongwon Shin (Oak Ridge National Laboratory, ORNL) – Cast Aluminum
1320 – 1340 Jake Zindel (Ford Motor Co.) – Cast Aluminum
1340 – 1415 Discussion
1415 – 1430 Break

Discussion Lead: Jim Warren (NIST)

1430 – 1500 Jason Sebastian (Questek) – Cast Iron
1500 – 1530 Lou Hector (General Motors) – Advanced Steel
1530 – 1600 Lara Liou (GE Aviation) – Integrated Computational Methods for Composite Materials
1600 – 1700 Discussion
1700 Adjourn for the day


Wednesday

074 – 0800 Check-in

Discussion Lead: Wayne Ziegler (ARL)

0800 – 0830 Terry Wong (Rocketdyne) – Nickel RS FEP
0830 – 0850 Mike Glavicic (Rolls Royce) – MAI Ti Modeling
0850 – 0910 Rajiv Naik (Pratt & Whitney) – MAI Data Informatics, Data Standards, Modeling & UQ
0910 – 0940 Discussion
0940 – 0955 Break

Discussion Lead: Dennis Griffin (NASA)

0955 – 1015 Ro Gorham (National Center for Defense Manufacturing and Machining, NCDMM) – America Makes
1015 – 1045 Richard Serwecki (NASA) – MAPTIS
1045 – 1105 Scott Henry (ASM International) – Computational Materials Data Network
1105 – 1135 Discussion
1135 – 1200 Lunch

Discussion Lead: Clare Paul (AFRL)

1200 – 1230 Ben Blaiszik (Univ. of Chicago/ANL) – Globus
1230 – 1300 Mike Groeber (AFRL) – DREAM.3D/SIMPL
1300 – 1320 Sam Chance (iNovex) – Semantics for Deep Interoperability
1320 – 1340 Harlan Shober (RJ Lee) – Big Data and Semantic Technologies
1340 – 1400 Tim Hawes (Decisive Analytics Corp) – Natural Language Processing
1400 – 1445 Discussion
1445 – 1500 Wrap up
1500 Adjourn


Presentation Summaries

Welcome and Introduction
Chuck Ward, Air Force Research Laboratory

The primary intent of the workshop is to promote communication between the groups being funded under the Materials Genome Initiative that have significant interest in the management of materials data. The greater awareness is expected to enhance the likelihood of success of the individual efforts and speed the community’s understanding of how best to proceed in developing an MII.

The Materials Research Society (MRS) and the Minerals, Metals and Materials Society (TMS) conducted a survey of the community in May/June of 2013 to gauge attitudes and needs in materials data. Over 650 people took the survey, with 74% of respondents saying they would be willing to share data if it were encouraged as a condition of funding or publishing. The community showed the most interest in gaining access to databases of some of the most basic and essential materials data, such as thermo-physical properties.

The survey asked participants to evaluate advantages of local versus centralized databases. Centralized repositories were perceived by respondents to offer advantages in cost, an ability to share one’s data, provisioning of uniform data standards, and an ability to organize and find data. Local repositories were perceived to provide advantages in ease of data input, time efficiency, and flexibility in accommodating new data types.

The survey also asked participants to provide their thoughts on impediments to and motivators for sharing data. Most of the impediments can be attributed to concerns of ownership of data or external data restrictions, while technical issues such as file size and proprietary formatting were also concerns. Overall, participants saw the ability to contribute data to the community for reuse and critique positively. The highest motivator was to enhance one’s own research visibility. Interestingly, respondents saw adding metadata as a motivator, presumably due to the perceived higher value of data when it is described more robustly.


MGI Efforts in Infrastructure Development at NIST
Jim Warren, National Institute of Standards and Technology

The presentation provided an overview of NIST activities supporting the creation13 of a materials data infrastructure The infrastructure could13 start by defining the interfacesAPIs (Application Programing Interfaces) with which user is able13 to interact for13 the deposit13 or13 retrieval of13 data Beyond13 an interface the following are essential elements for creation of viable infrastructure

• The establishment of repositories to store the data.
• Tools to enable digital capture of the information, preferably at the time of creation (computation or experiment).
• Tools to enable markup of the data with sufficient metadata to inform someone else how the data was created, as well as various attributes that would help the user judge the quality of the information.
• Assignment of a persistent digital identifier (like the DOI (Digital Object Identifier) for journals) so the data can be cited and discovered by others.
• The registration of the availability of the data into some sort of "registry" to enable discovery without prior knowledge of the existence of the repository/specific data features.
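The elements listed above can be pictured as a small data model. The following is a minimal sketch only: the class names, fields, and in-memory registry are assumptions made for illustration, not NIST's design, and the identifier is a locally generated placeholder rather than a real DOI.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """A deposited dataset plus the metadata needed to judge and reuse it."""
    title: str
    repository: str   # where the data is stored
    created_by: str   # who generated the data
    method: str       # how it was created (experiment or computation)
    keywords: list[str] = field(default_factory=list)
    # Placeholder identifier; a real system would mint a DOI or similar PID.
    pid: str = field(default_factory=lambda: f"pid:{uuid.uuid4()}")


class Registry:
    """Hypothetical registry: discovery without knowing the repository."""

    def __init__(self) -> None:
        self._records: dict[str, DatasetRecord] = {}

    def register(self, record: DatasetRecord) -> str:
        self._records[record.pid] = record
        return record.pid

    def resolve(self, pid: str) -> DatasetRecord:
        return self._records[pid]

    def search(self, keyword: str) -> list[DatasetRecord]:
        key = keyword.lower()
        return [r for r in self._records.values()
                if key in (k.lower() for k in r.keywords)]


registry = Registry()
record = DatasetRecord(
    title="Example tensile test series",
    repository="example-repository",
    created_by="A. Researcher",
    method="experiment",
    keywords=["tensile", "aluminum"],
)
pid = registry.register(record)
hits = registry.search("Aluminum")  # keyword match is case-insensitive
```

The point of the sketch is the separation of concerns: the record carries the descriptive metadata, the identifier makes it citable, and the registry is the only component a searcher needs to know about.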

Additionally, creation of new policy and standards, as well as development of tools to evaluate the quality of the data and tools to enable data-driven discovery, will be required to manage and use materials data to the greatest extent possible.

NIST is responding to the OSTP (Office of Science and Technology Policy) memos (Dr. John Holdren, 2 Feb 14) by developing practices that will be of use to the community, including a Common Access Platform allowing access across multiple collections of information; research and development of Persistent Identifier (PID) infrastructure with the Research Data Alliance (RDA); and R&D on Data Type Registries with RDA.

Finally, NIST is collaborating with the National Data Service (based at the National Center for Supercomputing Applications) and is developing a set of best practices for Data Management Plans.


The Materials Commons: A Novel Information Repository and Collaboration Platform for the Materials Community
Brian Puchala, Glenn Tarcea, Stravya Tamma, Emmanuelle Marquis, John Allison, University of Michigan

Critical to accelerating the pace of materials science and development is the development of new and improved infrastructure providing a seamless way for materials researchers to share and use materials data and models. To address this need, we are developing the Materials Commons (httpprismsenginumicheduprismsmaterialscommons), an information repository and collaboration platform for the metals community in selected technical emphasis areas. We envision the Materials Commons becoming a continuous, seamless part of the scientific workflow process. Researchers will upload the results of experiments and computations as they are performed, automatically where possible, along with the provenance information describing the experimental and computational processes. By associating this data with the experimental and virtual computational materials samples from which it is obtained, the Materials Commons will build process-structure-property relationships, enabling the construction and validation of constitutive and process models. The Materials Commons website provides an easy-to-use interface for uploading and downloading data and data provenance, and for searching and sharing data. At its core, the Materials Commons consists of a secure data storage cluster, an application for efficiently uploading and downloading large data sets, and a REST (REpresentational State Transfer) based API to access and extend the capabilities of the repository. The API allows for features such as automated data upload from experiments and computations, seamless integration of computational models, and algorithmic analysis of process-structure-property relationships. The Materials Commons is a central thrust of the Center for PRedictive Integrated Structural Materials Science (PRISMS).
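To make the idea of associating data with samples and their provenance concrete, here is a minimal sketch. It does not use the Materials Commons API; the `Process` and `Sample` classes and the `psp_rows` helper are hypothetical names invented for this illustration, and the numbers are placeholders rather than measurements.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """One processing or computation step, with its parameters."""
    name: str
    parameters: dict[str, float]


@dataclass
class Sample:
    """A physical or virtual sample with its provenance and measured data."""
    sample_id: str
    history: list[Process] = field(default_factory=list)
    measurements: dict[str, float] = field(default_factory=dict)


def psp_rows(samples, parameter, quantity):
    """Pair one process parameter with one measured quantity across samples."""
    rows = []
    for sample in samples:
        if quantity not in sample.measurements:
            continue  # nothing measured yet for this sample
        for step in sample.history:
            if parameter in step.parameters:
                rows.append((step.parameters[parameter],
                             sample.measurements[quantity]))
    return rows


# Illustrative values only; they are not measurements from any real sample.
s1 = Sample("S1", [Process("anneal", {"temperature_C": 300.0})], {"grain_size_um": 12.0})
s2 = Sample("S2", [Process("anneal", {"temperature_C": 400.0})], {"grain_size_um": 20.0})
s3 = Sample("S3", [Process("anneal", {"temperature_C": 500.0})])  # not yet measured

table = psp_rows([s1, s2, s3], "temperature_C", "grain_size_um")
```

Because every measurement hangs off a sample that also carries its processing history, a process-versus-property table falls out of a simple traversal, which is the kind of relationship the text describes building.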

NIST Materials Data Informatics Efforts
Carrie Campbell, National Institute of Standards and Technology

The presentation covered NIST's pilot efforts in creating a data infrastructure focusing on Phase-Based Property Data. Phase-based data are diverse data that include 1-, 2-, 3-, and 4-dimensional data, are semi-structured, and often include incomplete data sets. The figure below provides an overview of NIST's data efforts supporting the MGI.


The NIST Materials Data Repository (https://materialsdata.nist.gov/dspace/xmlui) is a customized DSpace repository for materials that enables sharing of a variety of data types, including text files, images, and video. The repository also provides a persistent identifier for each entry and allows users to choose from a variety of license types. NIST is also working on two data curation tools for phase-based data. The Thermodynamics Research Center (http://trc.nist.gov) is extending the ThermoML markup language and Guided Data Capture software tool to metallic systems, with an initial emphasis on phase equilibria and thermochemical data. The Materials Data Curation System (MDCS) employs user-defined XML-based schemas to curate data. The MDCS allows data curation either through a web-based interface or through a REST API. The entered data is stored in a non-relational database repository, can be searched using a SPARQL (SPARQL Protocol and RDF Query Language) interface, and can be integrated with a variety of scientific workflow tools.
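Curation against a user-defined XML template can be sketched as follows. This is not the MDCS implementation: the `TEMPLATE` dictionary is a hypothetical stand-in for a real XML Schema, the element names are invented, and the check uses only Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical user-defined template: required child elements and their types.
TEMPLATE = {"material": str, "phase": str, "temperature_K": float}


def build_record(values: dict) -> ET.Element:
    """Create a <record> element with one child per template field."""
    root = ET.Element("record")
    for name in TEMPLATE:
        ET.SubElement(root, name).text = str(values[name])
    return root


def validate(root: ET.Element) -> list[str]:
    """Return a list of problems; an empty list means the record fits."""
    problems = []
    for name, kind in TEMPLATE.items():
        node = root.find(name)
        if node is None or node.text is None:
            problems.append(f"missing <{name}>")
            continue
        try:
            kind(node.text)  # check that the text converts to the expected type
        except ValueError:
            problems.append(f"<{name}> is not a valid {kind.__name__}")
    return problems


good = build_record({"material": "Al-Cu", "phase": "fcc", "temperature_K": 800})
xml_text = ET.tostring(good, encoding="unicode")
bad = ET.fromstring(
    "<record><material>Al-Cu</material><temperature_K>hot</temperature_K></record>"
)
```

A production system would validate against a full XML Schema (XSD) rather than a dictionary of types; the sketch only shows why a declared template lets incomplete or malformed entries be caught at entry time.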

AFLOW Consortium
Marco Fornari, Michigan State University

The overall goal of the Materials Genome Initiative (MGI) is to accelerate the discovery, development, and deployment of new materials with ad hoc functionalities. The AFLOW Consortium focuses on the software infrastructures to achieve the MGI goals by using first principles calculations. The emphasis is on the data that must be generated, archived, validated, and post-processed efficiently, both to identify new materials and to distill meaningful quantitative relationships within the data. This extends the solution to the inverse problem of chemically replacing rare and costly elements in critical technologies.

The AFLOW Consortium provides access to the repository AFLOWLIB.ORG <http://AFLOWLIB.ORG> that publicly distributes data on energetics and thermodynamic stability for more than 600,000 compounds and band structures for roughly 300,000 materials. The repository includes bulk crystals and alloys. Data were computed with the AFLOW high-throughput framework using density functional theory. The results of the calculations can be queried with the recently developed REST-API, which standardizes and labels the data for retrieval purposes. The fields defined in the AFLOW REST-API include a material object identifier (AUID), data cloud location (AURL), ownership and sponsorship keywords, and data-specific labels (COMPOSITION, ENTROPY, EGAP, …). Simple scripting languages can be used to automatically download from the AFLOWLIB repository. The AFLOW approach has been extensively applied to platinum-group alloys, thermoelectric materials, magnetic materials, topological insulators, scintillators, etc. The AFLOW REST-API can be extended to include experimental data. For further details on the database, the reader is directed to http://materials.duke.edu/auro/AUROARTICULA/j.commatsci.2012.02.002.pdf
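As an illustration of how labeled fields make scripted retrieval simple, the sketch below parses entries written as `LABEL=value` pairs and filters them by a numeric label. The pipe-delimited layout is an assumption made for this example, no network request is issued, and the identifiers and numbers are invented placeholders rather than AFLOWLIB data.

```python
def parse_entry(text):
    """Split 'LABEL=value | LABEL=value' text into a dictionary keyed by label."""
    fields = {}
    for chunk in text.split("|"):
        if "=" not in chunk:
            continue
        label, value = chunk.split("=", 1)
        fields[label.strip().upper()] = value.strip()
    return fields


def filter_by_minimum(entries, label, minimum):
    """Return the AUIDs of entries whose numeric label is at least `minimum`."""
    selected = []
    for entry in entries:
        try:
            value = float(entry[label])
        except (KeyError, ValueError):
            continue  # label missing or not numeric
        if value >= minimum:
            selected.append(entry["AUID"])
    return selected


# Placeholder responses; identifiers and numbers are invented for illustration.
raw = [
    "AUID=example:0001 | AURL=example.org:LIB/entry1 | EGAP=1.2",
    "AUID=example:0002 | AURL=example.org:LIB/entry2 | EGAP=0.0",
    "AUID=example:0003 | AURL=example.org:LIB/entry3",
]
entries = [parse_entry(line) for line in raw]
wide_gap = filter_by_minimum(entries, "EGAP", 1.0)
```

In practice the text for each entry would come from the repository itself; the filtering step is unchanged once the entries are in dictionary form.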

Materials Innovation Infrastructure: Lessons Learned and Future Plans
Surya Kalidindi, Georgia Institute of Technology

A central activity in materials research is the exploration of processing-structure-property (PSP) linkages. Workflow is defined as the sequence of steps employed for establishing PSP linkages of interest to any specific engineering/technology application.

High Throughput Explorations of PSP Linkages: We must treat hierarchical material as a complex system, which means embracing and exploiting uncertainty quantification and complexity management concepts/tools. It also means we need to design, develop, and validate decision support systems that will leverage the best current understanding (with its uncertainties) and provide objective guidance on future effort investment (combination of experiments and simulations). This means leveraging Data Science to effectively feed the decision support systems described above by developing and deploying high throughput strategies for obtaining high value information from both experiments and models.

What are integrated workflows? Integrated workflows are those that utilize the best combination of experiments and simulations in extracting robust and reliable PSP linkages. This requires engaging and exploiting cross-disciplinary expertise that includes materials science, manufacturing, systems approaches, uncertainty quantification, computational science, and data and information sciences. Through this, one must ensure that the workflows output the critical information needed by design and manufacturing stakeholders in the materials development value chain. Integration demands expertise sharing, which essentially is the core element of a collaboration. Cross-disciplinary integration demands effective collaborations, such as those one might set up using the modern tools of the day, i.e., an e-Collaboration.


Motivators for e-Collaborations: e-Collaborations are central to enhancing the productivity of materials development activities in several ways:

• Individual research groups can identify and engage complementary expertise needed for their projects.
• Organizations can coordinate multiple teams and multiple projects to maximize output/impact.
• Organizations can identify and seek the right partners that enhance their individual and combined value.

e-Collaborations can be open, semi-open, or closed, depending on the needs of the specific organization. Productivity enhancement leading to dramatic reductions in cost and time expended in materials development is the main incentive for e-collaborations.

Main Impediments to e-Collaborations

Lack of tools to identify the "best" collaborators:

• Need software tools that quantify more precisely the expertise of individuals/groups/organizations.
• Need software tools for allowing rich annotations and e-recording of discussions.
• Need analytic tools for identification, extraction, and management of metadata (i.e., high value abstractions from the raw data).
• Need searchable semantic metadata databases (these are lightweight and can be centralized and customized for specific user communities).
• Need software tools for tracking provenance (i.e., who/what/when/how generated, modified, etc.).

Lack of experience with the use of integrated workflows:

• Need software tools for e-recording of workflows and development of searchable workflow databases.
• Need recommendation schemes that suggest the best workflow(s) based on recorded prior experience; this provides a means to transfer expertise acquired by an organization to other teams and other projects (even in the same organization).
• Leads to possible automation and dramatic enhancement of productivity.
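The who/what/when/how provenance tracking called for above can be sketched as an append-only event log. This is a minimal illustration under assumed names (`ProvenanceEvent`, `ProvenanceLog`); it is not taken from any of the platforms discussed, and the users, items, and dates are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceEvent:
    who: str        # person or instrument responsible
    what: str       # identifier of the dataset or code
    when: datetime  # timestamp of the action
    how: str        # action taken, e.g. "generated" or "modified"


class ProvenanceLog:
    def __init__(self):
        self._events = []

    def record(self, who, what, how, when=None):
        """Append one event; the timestamp defaults to the current UTC time."""
        event = ProvenanceEvent(who, what, when or datetime.now(timezone.utc), how)
        self._events.append(event)
        return event

    def history(self, what):
        """All events for one item, oldest first."""
        return sorted((e for e in self._events if e.what == what),
                      key=lambda e: e.when)

    def contributors(self, what):
        """Distinct people who touched an item, in order of first involvement."""
        seen = []
        for event in self.history(what):
            if event.who not in seen:
                seen.append(event.who)
        return seen


log = ProvenanceLog()
log.record("user_b", "dataset-1", "modified", datetime(2014, 7, 16, tzinfo=timezone.utc))
log.record("user_a", "dataset-1", "generated", datetime(2014, 7, 15, tzinfo=timezone.utc))
log.record("user_a", "dataset-2", "generated", datetime(2014, 7, 15, tzinfo=timezone.utc))
```

Keeping events immutable and ordering them by timestamp at query time means the log can be written out of order, for example when instruments upload in batches, without losing the true sequence.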

MINED Group's Past Experience (~2005 - ~2008): Funded by the Army Research Office, ARO (David Stepp); engaged computer science (CS) graduates to produce a single-user platform that integrated certain software codes previously produced by our group; software development was subsequently abandoned, but two CS graduates were recruited into PhD programs in Materials Science and Engineering/Mechanical Engineering (MSE/ME), which dramatically altered the direction of research in the group.

A brief chronology of development:

• Senior Design Project (2011-2012): An online portal designed for sharing data files and tools; learned the importance of user interfaces and the need for professional code developers and software architects.
• HUBzero Instantiation (2012-2013): Built a hub by customizing HUBzero for hierarchical structural materials; there are several advantages in terms of accessing and executing codes (on the server side), but there are significant barriers to establishing fast and productive cross-disciplinary e-collaborations.


• GTRI Integration Platform (2013-2014): Engaged professional code developers and software architects to design a multi-user web portal that allows data and code sharing while tracking provenance and workflows; underestimated the resources needed for such a project.
• MATIN Collaboration Environment (2013-2014): Lightweight integrated software shell that allows users to seamlessly interface with numerous cloud services offered through the web through their APIs. Examples include GitHub for versioning and archival of codes, Authorea for collaborative editing of documents, figshare for citable publication of research, Plotly for collaborative data analysis and visualization, and Google+ and LinkedIn for e-teaming and networking. MATIN focuses on building an emergent community, Materials & Manufacturing Informatics (MMI). MATIN provisions web services for software integration platforms and innovation infrastructure.

Examples of MATIN:

• Spatial Statistics: https://github.com/tonyfast/SpatialStatisticsFFT
• PyMKS: http://openmaterials.github.io/pymks/index.html
• Example discussion about periodic boundary conditions in running Finite Element (FE or FEM) simulations: https://github.com/sfepy/sfepy/issues/267

In the platform developed, it is noted that the user or user's organization can set permissions at the level of individual datasets/codes on the degree and type of sharing.

Central Challenge: Activation. The transition from traditional disciplinary workflows to the integrated workflows needs to overcome an activation barrier. There is no clear incentive to get ahead of the curve in anticipating the upcoming transformations in how materials development will be pursued in the future. Open (reproducible) science alone is not a sufficiently strong motivator for a materials development community with strong interests in intellectual property development; indeed, it might actually be perceived as a negative. There is a need to explore and refine what incentives work for individual researchers (e.g., increased productivity).

Future Plans: Continue to develop and deploy MATIN at GT and any other organizations/groups that express an interest in working with us. Design, create, and distribute a suite of "mobile first" Apps designed to make data ingest into collaboration environments such as MATIN, from both material characterization equipment and multi-scale models, an enhanced user experience with friendly and natural interfaces. Proactively promote/incentivize data/metadata ingest into publicly shared databases. Work with professional societies to announce and run new competitions/awards for recognizing and rewarding good practices by the members of the broader materials innovation user community (e.g., Best Documented Data Award, Best Documented Tool Award, Best Documented Workflow Award). Develop and deliver new workshops for training the new generation of graduate students, postdocs, and junior faculty in the emerging workflows of Materials & Manufacturing Informatics.

High Performance Cast Aluminum Alloys for Next Generation Passenger Vehicle Engines
Dongwon Shin, Materials Science and Technology Division, Oak Ridge National Laboratory


Oak Ridge National Laboratory (ORNL) teamed with Chrysler and Nemak to develop next-generation high-performance cast aluminum alloys that have improved castability, high-temperature strength, and fatigue performance. At this point, the data management practice is limited to analytical data spreadsheets, other than the CALPHAD computational thermodynamic databases and proprietary property databases for casting simulation. As the project goes on, a more robust data schema will be developed to store various types of data populated from the experiments and calculations.

Besides the data structure design, the speaker discussed the necessity of a new approach to identifying key descriptors from the large dataset that can guide the design of new materials. An alloy design requires the balance of extremely complex relationships among a large number of descriptors. Some examples of these descriptors include phase fraction of key intermetallics, lattice parameters of the matrix, and diffusion kinetics of solute atoms. However, optimizing these quantities relies on good physical metallurgy intuition coupled with clever experiments, dogged sleuthing, and a healthy dose of luck. Such descriptor-based predictions of materials properties date back to the beginning of modern solid-state chemistry and physics. As shown in the Darken-Gurry map to predict the solubility of elements in tantalum, intuitive descriptors such as electronegativity and radius could be used for materials design. However, as the chemistry and physics of materials become more complex, intuition-based search may not work anymore. Thus, the use of more advanced statistical analysis and data mining approaches might be required to accelerate the design of complex multi-component, multi-phase alloys.
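A Darken-Gurry style check can be written in a few lines. The sketch below assumes the commonly quoted form of the rule, an ellipse of about ±0.4 electronegativity units and ±15 % in atomic radius centered on the solvent; both tolerances are exposed as parameters because the exact values are a convention, not something stated in this report. The input numbers are illustrative placeholders, not reference data for tantalum or any other element.

```python
def within_darken_gurry_ellipse(solvent_en, solvent_radius,
                                solute_en, solute_radius,
                                en_tol=0.4, radius_tol=0.15):
    """Return True if the solute falls inside the ellipse around the solvent.

    en_tol is the electronegativity half-axis; radius_tol is the radius
    half-axis expressed as a fraction of the solvent radius.
    """
    d_en = (solute_en - solvent_en) / en_tol
    d_radius = (solute_radius - solvent_radius) / (radius_tol * solvent_radius)
    return d_en ** 2 + d_radius ** 2 <= 1.0


# Illustrative descriptor values only (electronegativity, radius in angstroms).
solvent = (1.5, 1.47)
close_solute = (1.6, 1.43)   # similar electronegativity and size
far_solute = (2.2, 1.30)     # very different electronegativity

likely_soluble = within_darken_gurry_ellipse(*solvent, *close_solute)
unlikely_soluble = within_darken_gurry_ellipse(*solvent, *far_solute)
```

Two descriptors fit on a map that a person can read by eye; the argument in the text is that once the descriptor list grows well beyond two, this kind of visual screening has to give way to statistical and data mining methods.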

Data Related to Cast Aluminum

Jake Zindel, Ford Motor Company

The main focus of our research is the development of CAE (Computer-aided Engineering) tools to support two significant activities in the company: component design (product development) and manufacturing engineering (high-volume production). A key part of our research is conducting experiments to collect data for the development and validation of the CAE tools.

The data we collect consists of mechanical properties, physical properties, microstructural images, and manufacturing process parameters that form the pedigree of the material. This information can be in the form of Excel files, proprietary-software storage files, screen shots, and written reports.

Our immediate need is to deal with this large amount of existing information, collected over many years, on a "local" level. The first step is to organize many independent collections of information in all their forms. The next issue is the time required to enter all this information into the database. Maintenance may also require a significant amount of effort. Lastly, there will also be a need to manipulate the data contained in the database using Excel, Minitab, MATLAB, etc. Therefore, links or easy exportation of data to these types of applications will be necessary.
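Exporting database contents to a format that spreadsheet and analysis tools can open is straightforward once the data is in a queryable store. The sketch below uses an in-memory SQLite database and CSV output from Python's standard library; the table name, columns, and values are hypothetical and are not Ford's schema or data.

```python
import csv
import io
import sqlite3

# Hypothetical table of tensile results; values are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tensile (alloy TEXT, temperature_C REAL, uts_MPa REAL)")
conn.executemany(
    "INSERT INTO tensile VALUES (?, ?, ?)",
    [("A", 25.0, 300.0), ("A", 150.0, 250.0)],
)


def export_csv(connection, query, out):
    """Write the result of `query` to `out` as CSV, header row first."""
    cursor = connection.execute(query)
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


buffer = io.StringIO()
export_csv(conn, "SELECT * FROM tensile ORDER BY temperature_C", buffer)
lines = buffer.getvalue().splitlines()
```

CSV is used here because Excel, Minitab, and MATLAB can all import it; writing to a file instead of a string buffer only requires passing an open file object as `out`.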

The next level would involve dealing with information that has been created or collected by others. From an OEM (Original Equipment Manufacturer) perspective, this would involve convincing suppliers to share processing information that they may consider proprietary. In addition to suppliers, universities and national laboratories are also sources of data. Data from all the sources will require some sort of protocol to determine its quality and resolve inconsistencies.


In the end, it seems likely that considerable budget and time will be required to implement comprehensive information/databases, which could be the largest hurdle of them all.

QuesTek's MGI/ICME Approach to Alloy Optimization and Design (Including Cast Iron)
Jason Sebastian, QuesTek Innovations LLC

Integrated Computational Materials Engineering (ICME) methods involve the holistic application of different computational models across various length scales to the design, development, and rapid qualification of advanced materials. Using its Materials by Design technologies, QuesTek has successfully applied these ICME methods to the design, development, and full qualification of several advanced alloys (including aerospace structural steels, aerospace gear steels, advanced castable titanium alloys, high-strength corrosion-resistant aluminum alloys, and high-performance cast iron).

QuesTek's ICME methods benefit tremendously from the activities of the US Materials Genome Initiative (MGI). Recognizing that the "10 to 20 years" that it takes for alloy design and development is hindering the rapid deployment of higher-performance materials, the MGI is focused on "cutting in half" the time and cost to design and deploy novel advanced materials needed to enhance and sustain US competitiveness.

QuesTek also has ongoing activities on the application of ICME methods to the design and development of materials for the energy sector, including advanced materials that enable vehicle lightweighting. Of particular note is an ongoing effort with a major diesel engine manufacturer on the development of high-performance cast iron alloys for engine block applications. QuesTek's efforts have made extensive use of various MGI-type tools and databases, including 1) density functional theory methods for high-throughput screening of cast iron inoculant stabilities (structure, lattice parameter, stability of various oxide/sulfide phases), 2) liquid iron mobility databases (to help accurately model solidification phenomena), and 3) thermodynamic databases (to examine nano-precipitate stability).

Moving forward, to continue to accelerate the materials development cycle, efforts should remain focused on the development of the fundamental databases (thermodynamic, kinetic, etc.) that enable ICME models. There should also be increased focus on efforts related to the integration of the various computational databases, models, and tools (the "I" in ICME).

Integrated Computational Materials Engineering (ICME) of Generation Three Advanced High Strength Steels
Louis G. Hector Jr., General Motors R&D Center

While traditional material development methods often involve laborious trial-and-error testing, Integrated Computational Materials Engineering (ICME), which falls under the umbrella of the Materials Genome Initiative, is offering a means for materials development under a more compressed time scale. Advanced high strength steels (AHSS), which contain multiple phases (e.g., ferrite, bainite, pearlite, austenite, martensite) and may exhibit phase transformation with strain, are fertile ground for ICME. However, prediction of macro-scale constitutive behavior based upon the multi-scale physical, chemical, and mechanical phenomena in these complex microstructures is a formidable challenge for ICME. Unprecedented collaboration between universities, industry, and government labs will be required to address AHSS development, archiving of data, and the implementation of these materials into products that benefit the American consumer. This presentation began with a brief overview of a new DOE/USAMP (US Automotive Materials Partnership)-funded ICME program aimed at the development of Generation Three (Gen 3) AHSS. The lack of commercial Gen 3 steels that meet DOE targets is offering significant challenges to this new ICME program. The program is meeting these challenges following four thrusts: (1) making Gen 3 steels to meet DOE targets; (2) passing experimental measurements of key properties to an ICME model that produces microstructure-based constitutive models; (3) material evaluation in forming and vehicle performance simulations; and (4) archiving all relevant data for easy use by others in the program. Although the Gen 3 AHSS ICME program is "mission oriented," it demonstrates an important role for universities in providing the necessary foundation in fundamental materials science and engineering.

Integrated Computational Methods for Composites Materials (ICM2): Lessons Learned from Integration Planning and Feasibility Demonstrations
Lara Liou, GE Aviation

General Electric Aviation and Lockheed Martin Aeronautics have teamed through an Air Force Research Lab program to conduct a demonstration of an Integrated Computational Materials Engineering digital framework that links material processing, property, and structural relationships to account for processability, manufacturability, and system performance, with the goal of demonstrating that the usage of integrated models can contribute to future airframe and engine designs with dramatic reductions in development time and cost. Both engine and airframe Foundational Engineering Problems will be demonstrated through a common digital framework to be developed and validated on increasingly complex articles. The modeling tools that cover composite process models, localized mechanical behavior models, and global design models (Convergent Manufacturing Technology's COMPRO CCA, Autodesk's Autodesk Simulation Composite Analysis, and an Abaqus-based University of Michigan micromechanics model) are centrally linked via a commercially available model integration interface (Phoenix Integration's ModelCenter®). Detailed plans of what data will be passed among the various models, as well as how it will be passed, have been completed. Creation of these detailed process maps and simple feasibility demonstrations have raised inherent technical and logistical challenges to integrating commercially available multi-scale models. Robust model integration methods to address these challenges were discussed.

The exercise of drawing up detailed process maps for what to integrate and how to integrate the selected tools and models resulted in key early lessons learned and identification of tools critical to long-term, robust, and sustained application of Integrated Computational Methods for Composites Materials (ICM2). The key lessons learned include:

• Individual models must be relatively mature and validated before an ICM2 problem can be confidently defined.
• FE analysis-based tools within the same digital thread should utilize the same FE solver.
• The team ideally should consist of at least one expert for each tool and its underlying model, as well as an expert in the digital thread and integration interface tools and programming.
• Mesh interpolation between finite element analysis based tools will likely be required, and a commercially available tool for this type of composite mesh interpolation is key for robust and sustained application of ICM2.

Residual Stress Engineering of Rotating Nickel Components
Terry Wong, Aerojet Rocketdyne

The PW-8 project, the Nickel Base Superalloy Residual Stress Foundational Engineering Problem program, aims to develop a multi-disciplinary analysis approach to predict and incorporate bulk residual stress in the design of a component. This will allow for the design of a part that is optimized for a particular application or design intent (such as weight, performance, or reduction of scrap). To accomplish this goal, the PW-8 team has two major tasks: 1) the development of an infrastructure and associated tools needed to predict and incorporate bulk residual stress in the design stages of a part, and 2) a demonstration task that will validate the infrastructure and tools developed.

The Data Management task is the foundational element of the larger infrastructure development task. The goal for the data management work is to develop an infrastructure that allows the various team members in PW-8 to share data and to store data in a database. This not only ensures that data generated by this program is never lost but also allows for data to be transferred in a secure manner. In order to meet the goal of a system that securely stores and shares data, three major tasks must be accomplished: 1) validation that there is a secure method to transfer data between team members and that the method of storing data is also secure, 2) development of a database schema, and 3) development of a means to easily upload data into the database and download data from the database.

Of the three, the most difficult task was to create a system in which the transfer and storage of data was secure. Various proposals were made to ensure the security of the data, but many of them were rejected, not because the proposal was actually unsecure but because it was not acceptable under each of the team members' IT security policies. The process to get each team member's IT department to approve the security of the data transfer and storage took more effort than anticipated. In the end, the team members, along with their respective IT departments, approved the use of TRUcentrix as a tool to transfer data and the use of an ASM International hosted database to store data.

The development of a database schema is a necessary step in creating the database to store all the data generated (via physical tests as well as data generated by models). A big help in creating the database schema was to use the schema generated by Dr. Steven Arnold from NASA's Glenn Research Center. Using NASA's schema eliminated the need to start the database build from scratch and allowed us to focus on the specific needs of storing residual stress data.

The last task involves the development of tools that make the input of data easy. The more difficult it is to enter data into a database, the more likely it is that data will not be entered. Excel-based templates are being developed in order to make data entry as easy as possible.
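Template-driven data entry can be sketched as a check that runs before anything reaches the database. The example below reads a CSV export of a template (standing in for an Excel sheet, which the Python standard library cannot read directly); the column names are hypothetical and are not the PW-8 or NASA schema, and the values are placeholders.

```python
import csv
import io

# Hypothetical template columns for residual stress measurements.
REQUIRED = ("part_id", "depth_mm", "stress_MPa", "method")
NUMERIC = ("depth_mm", "stress_MPa")


def read_template(text):
    """Return (accepted_rows, rejected_line_numbers) for a filled-in template."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"template is missing columns: {missing}")
    accepted, rejected = [], []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        try:
            for column in NUMERIC:
                row[column] = float(row[column])
        except (TypeError, ValueError):
            rejected.append(line_no)  # blank or non-numeric entry
            continue
        accepted.append(row)
    return accepted, rejected


filled_in = (
    "part_id,depth_mm,stress_MPa,method\n"
    "D1,0.5,-210,hole drilling\n"
    "D1,1.0,,hole drilling\n"
)
rows, bad_lines = read_template(filled_in)
```

Reporting the rejected line numbers back to the person who filled in the template keeps data entry easy while still preventing incomplete rows from being stored.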

The two big lessons learned from PW-8 in relation to data management are:

• It is essential to work with each company's IT and IT Security people to develop a system that satisfies the security requirements of each company.
• Database schema development should not be reinvented but should begin with an already established schema.

Perspectives on MAI Programs
Dr. Mike Glavicic, Rolls Royce

The current MAI (AFRL's Metals Affordability Initiative) programs are structured independently from one another, in that a standard methodology does not exist that describes what data is to be archived at the conclusion of a program. Nor does there exist a standard protocol for how to archive things on a server or some other computer resource at the conclusion of a program. As a result, the current practices of MAI programs:

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program.
• Do not have a formalized requirement for how data is to be packaged and stored at the end of a program. Hence, even in the best of cases in which all of the data is stored, the data is archived in simple folder fashion that is not searchable in any manner. Moreover, even if all of the data is present, it is extremely cumbersome to untangle the data for future use using a file folder strategy.

Hence, at the end of the program, both the government and the MAI team are left to manage the data and results in an unregimented fashion.
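One small step from a plain folder archive toward something searchable is a manifest written at program close-out. The sketch below is an assumption about what such a manifest might contain (relative path, size, checksum); the function names, the program label, and the example files are hypothetical and do not describe any MAI protocol.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(root: Path, program: str) -> dict:
    """List every file under `root` with its size and SHA-256 checksum."""
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        entries.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return {"program": program, "files": entries}


def find(manifest: dict, text: str) -> list[str]:
    """Case-insensitive search of the archived file paths."""
    needle = text.lower()
    return [e["path"] for e in manifest["files"] if needle in e["path"].lower()]


# Build a throwaway archive so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "reports").mkdir()
    (root / "reports" / "final_report.txt").write_text("summary")
    (root / "raw_data.csv").write_text("a,b")
    manifest = build_manifest(root, "EXAMPLE-1")

report_paths = find(manifest, "report")
```

A manifest like this does not replace a database, but it records what was delivered, lets a later program verify that nothing has been lost or altered, and gives a search entry point that a bare folder tree lacks.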

The lack of a regimented protocol for how to package a program's results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal of integrating the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models, and other information that were required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted for the recreation of data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. Although the RR-10 program was able to overcome these issues and become a successful program, it should be noted that this success will likely not extend to other programs that encounter similar problems, as RR-10 was very fortunate to have the means to overcome these serious issues, which were a result of the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government-funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs, with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ
Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and Suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals, Inc.) are working together on the PW-11 Team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security and access control, and the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework: GE Forge, Vanderbilt Forge, and Semantic Web technology. One technology will be selected for the next phase of the program.


Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single database searches to multiple and disparate database searches, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work upfront for implementation, as it needs the development of a common vocabulary and relationships.

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST and develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
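The idea of flowing uncertainty through interconnected models can be illustrated with a minimal Monte Carlo sketch in Python. Everything in it is an assumption made for illustration: the two model functions, their coefficients, and the input distributions are hypothetical stand-ins, not the PW-11 models or methods, which this summary does not describe.

```python
import random
import statistics


def microstructure_model(cooling_rate):
    """Hypothetical process -> microstructure link.

    Returns a grain size (micrometers) that shrinks as the cooling rate (K/s) rises.
    """
    return 50.0 / (1.0 + 0.1 * cooling_rate)


def property_model(grain_size):
    """Hypothetical Hall-Petch-style microstructure -> property link.

    Returns a yield strength (MPa); the coefficients are made up.
    """
    return 200.0 + 600.0 / grain_size ** 0.5


def propagate(n_samples=20000, seed=0):
    """Sample the process input and push each sample through both models."""
    rng = random.Random(seed)
    grain_sizes = []
    strengths = []
    for _ in range(n_samples):
        # Step 1: variability of the processing parameter (assumed normal).
        cooling_rate = max(rng.gauss(10.0, 1.0), 0.0)
        # Step 2: microstructure model with an assumed multiplicative error of roughly 5%.
        grain_size = microstructure_model(cooling_rate) * rng.lognormvariate(0.0, 0.05)
        # Step 3: property model with an assumed additive scatter of 5 MPa.
        strength = property_model(grain_size) + rng.gauss(0.0, 5.0)
        grain_sizes.append(grain_size)
        strengths.append(strength)
    return {
        "grain_size_mean": statistics.mean(grain_sizes),
        "grain_size_stdev": statistics.stdev(grain_sizes),
        "strength_mean": statistics.mean(strengths),
        "strength_stdev": statistics.stdev(strengths),
    }


if __name__ == "__main__":
    for name, value in propagate().items():
        print(f"{name}: {value:.2f}")
```

Rerunning the sketch with one source of scatter switched off (for example, a zero standard deviation on the cooling rate) gives a rough sense of how much each modeling step contributes to the spread in the final property.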

ASM's Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project
Scott D. Henry, ASM International

ASM International, the world's largest association of metals-focused materials engineers, scientists, technicians, educators, and students, has launched a series of initiatives under the "Computational Materials Data (CMD) Network" umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design Ltd. The main objectives of SMDDP are to:


• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system.

• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets.

• Develop and carry out a series of test problems that represent relevant use cases for the repository.

• Make data open to the materials data community for use in data analytics, modeling, and educational purposes.

• Actively engage the materials data community and widely disseminate the findings from the project.

• Develop and implement data capture and curation procedures that can serve as models for other data repositories.

The basic philosophy behind the project is that standards often emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated, structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project, launched in late 2013 and planned as an 18-month program, is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future
Ben Blaiszik1, Kyle Chard1, Jim Pruyne1, Rachana Anathakrishnan1, Steve Tuecke1,2, Ian Foster1,2

1University of Chicago, Computation Institute; 2Argonne National Laboratory, MCS Division

During this talk, we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully-formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.

Semantics for Deep Interoperability
Samuel Chance, iNovex

Semantic technology is based on mature web standards and intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented, starting with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web, riding atop the same infrastructure (HTTP). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis on the point that the technology has a foundation based on formal logic that enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being considered as a solution for a community or enterprise.
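How "triples" link to form a web of data can be sketched in a few lines of Python. This is a toy in-memory structure written for illustration, not a real RDF library or anything shown in the talk; the `TripleStore` class, the `ex:` identifiers, and the stored value are all hypothetical.

```python
from collections import defaultdict


class TripleStore:
    """Tiny in-memory store of (subject, predicate, object) statements."""

    def __init__(self):
        self._by_subject = defaultdict(list)

    def add(self, subject, predicate, obj):
        self._by_subject[subject].append((predicate, obj))

    def objects(self, subject, predicate):
        """Return every object linked to `subject` by `predicate`."""
        return [o for p, o in self._by_subject.get(subject, []) if p == predicate]

    def follow(self, start, *predicates):
        """Walk a chain of predicates, hopping from triple to triple."""
        frontier = [start]
        for predicate in predicates:
            frontier = [o for node in frontier for o in self.objects(node, predicate)]
        return frontier


store = TripleStore()
# All identifiers and values below are hypothetical examples.
store.add("ex:sample42", "ex:madeOf", "ex:Al6061")
store.add("ex:Al6061", "ex:memberOfSystem", "ex:Al-Mg-Si")
store.add("ex:sample42", "ex:hasMeasurement", "ex:tensileTest7")
store.add("ex:tensileTest7", "ex:yieldStrengthMPa", 270)

# The object of one triple is the subject of another, which is what links them.
print(store.follow("ex:sample42", "ex:madeOf", "ex:memberOfSystem"))
print(store.follow("ex:sample42", "ex:hasMeasurement", "ex:yieldStrengthMPa"))
```

Because each statement stands alone, new facts about a sample or alloy can be added without changing a schema, which is the flexibility the semantic approach trades against the upfront work of agreeing on a vocabulary.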


Big Data and Semantic Technologies
Harlan Shober, RJ Lee Group

The presentation started with a description of entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS).

To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing, characterization, and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described and consisted of several elements: contribute, index, search, analyze, and archive. While bearing in mind the needed functionality, a string of universal scientific data management process categories was then presented with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for Resource Description Framework (RDF) were applied to a notional example with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described".

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
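As a concrete illustration of the first step listed above, tokenization, the following Python sketch splits a sentence into tokens while keeping decimal numbers and hyphenated designations such as Al-Mg-Si intact. The regular expression and the `tokenize` helper are assumptions chosen for illustration; this summary does not say which tools or rules were shown in the talk.

```python
import re

# Order matters: numbers are tried first, then words, then single symbols.
TOKEN_PATTERN = re.compile(
    r"""
    \d+(?:\.\d+)?                  # integers and decimals such as 6061 or 8.5
    | [A-Za-z]+(?:-[A-Za-z]+)*     # words and hyphenated compounds such as Al-Mg-Si
    | [^\sA-Za-z\d]                # any other single non-space symbol
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split text into a flat list of string tokens."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("Al-Mg-Si alloy 6061 aged at 175 C for 8.5 h."))
```

A general-purpose tokenizer might split "Al-Mg-Si" or "8.5" into several pieces, which hints at why numerically driven materials text needs domain-specific handling in the later NLP steps.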

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.

The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.

Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were 'rolled up' from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often, fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data, standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management: consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose


- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management; difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most-important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure: evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel
- Data associated with publications should be shared along with the publications. Not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)

Government-Led Priorities

1. Lead data strategy/approach development
- Hold meeting like this one in 6-12 months with purpose of developing roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus, e.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most-important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools making data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating a standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure


4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10%: repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST): creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors, not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials-related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches supported are limited to too fundamental single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or alternatively work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community


Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, iNovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, Questek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl Institute of Standards & Tech
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


MGI Materials Data Workshop Organizing Committee

Air Force Research Laboratory (AFRL)

Dr. Charles Ward & Mr. Clare Paul

Army Research Laboratory (ARL)

Dr. John Beatty & Mr. Wayne Ziegler

Energy Efficiency and Renewable Energy, Department of Energy (DOE-EERE)

Mr. William Joost

National Aeronautics and Space Administration (NASA)

Dr. Stephen Smith & Dr. Dennis Griffin

National Institute of Standards and Technology (NIST)

Dr. James Warren

Office of Naval Research (ONR)

Dr. William Mullins & Dr. Kenny Lipkowitz

Agenda

MGI Materials Data Workshop
Tec^Edge, 5000 Springfield Street, Dayton, OH
15 - 16 July 2014

Tuesday

074 – 0800 Check-in
0800 – 0815 Chuck Ward (AFRL) – Introductory remarks

Discussion Lead: Will Joost (DOE-EERE)
0815 – 0845 Jim Warren (NIST) – Data Infrastructure
0845 – 0915 John Allison, Brian Puchala (Univ. Michigan) – PRISMS/ALMMII
0915 – 0945 Carrie Campbell (NIST) – ChiMaD
0945 – 1015 Discussion
1015 – 1030 Break

Discussion Lead: John Beatty (ARL)
1030 – 1100 Marco Fornari (Central Michigan Univ.) – AFLOWLIB.org
1100 – 1130 Surya Kalidindi (Ga. Tech) – Mosaic of Microstructure/IGERT-CIF21
1130 – 1200 Discussion
1200 – 1230 Lunch

Discussion Lead: Bill Mullins (ONR)
1230 – 1300 Tom Searles (MDMi) – Cast Aluminum
1300 – 1320 Dongwon Shin (Oak Ridge National Laboratory, ORNL) – Cast Aluminum
1320 – 1340 Jake Zindel (Ford Motor Co.) – Cast Aluminum
1340 – 1415 Discussion
1415 – 1430 Break

Discussion Lead: Jim Warren (NIST)
1430 – 1500 Jason Sebastian (Questek) – Cast Iron
1500 – 1530 Lou Hector (General Motors) – Advanced Steel
1530 – 1600 Lara Liou (GE Aviation) – Integrated Computational Methods for Composite Materials
1600 – 1700 Discussion
1700 Adjourn for the day

Wednesday

074 – 0800 Check-in

Discussion Lead: Wayne Ziegler (ARL)
0800 – 0830 Terry Wong (Rocketdyne) – Nickel RS FEP
0830 – 0850 Mike Glavicic (Rolls Royce) – MAI Ti Modeling
0850 – 0910 Rajiv Naik (Pratt & Whitney) – MAI Data Informatics, Data Standards, Modeling & UQ
0910 – 0940 Discussion
0940 – 0955 Break

Discussion Lead: Dennis Griffin (NASA)
0955 – 1015 Ro Gorham (National Center for Defense Manufacturing and Machining, NCDMM) – America Makes
1015 – 1045 Richard Serwecki (NASA) – MAPTIS
1045 – 1105 Scott Henry (ASM International) – Computational Materials Data Network
1105 – 1135 Discussion
1135 – 1200 Lunch

Discussion Lead: Clare Paul (AFRL)
1200 – 1230 Ben Blaiszik (Univ. of Chicago/ANL) – Globus
1230 – 1300 Mike Groeber (AFRL) – DREAM3D/SIMPL
1300 – 1320 Sam Chance (iNovex) – Semantics for Deep Interoperability
1320 – 1340 Harlan Shober (RJ Lee) – Big Data and Semantic Technologies
1340 – 1400 Tim Hawes (Decisive Analytics Corp) – Natural Language Processing
1400 – 1445 Discussion
1445 – 1500 Wrap up
1500 Adjourn


Presentation Summaries

Welcome and Introduction
Chuck Ward, Air Force Research Laboratory

The primary intent of the workshop is to promote communication between the groups funded under the Materials Genome Initiative that have a significant interest in the management of materials data. The greater awareness is expected to enhance the likelihood of success of the individual efforts and speed the community's understanding of how best to proceed in developing an MII.

The Materials Research Society (MRS) and the Minerals, Metals and Materials Society (TMS) conducted a survey of the community in May/June of 2013 to gauge attitudes and needs in materials data. Over 650 people took the survey, with 74% of respondents saying they would be willing to share data if it were encouraged as a condition of funding or publishing. The community showed the most interest in gaining access to databases of some of the most basic and essential materials data, such as thermo-physical properties.

The survey asked participants to evaluate advantages of local versus centralized databasesCentralized13 repositories were perceived13 by respondents to13 offer advantages in13 cost an13 ability to13 share

onersquos data provisioning of uniform data standards and13 an13 ability to13 organize and13 find13 data Localrepositories were perceived to provide advantages in ease of data input time efficiency and flexibility inaccommodating new data13 types

The survey also asked participants to provide their thoughts on impediments to and motivators for13 sharing data Most13 of13 the impediments can be attributed13 to13 concerns of ownership13 of data orexternal data13 restrictions while technical issues such as file size and proprietary formatting were13 also

concerns Overall participants saw the ability to contribute data to the community for reuse and

critique positively The highest motivator was13 to enhance onersquos13 own research visibility Interestingly

respondents saw adding metadata as a motivator presumably due to13 the perceived13 higher value13 of data13 when it is13 described more robustly


MGI Efforts in Infrastructure Development at NIST

Jim Warren, National Institute of Standards and Technology

The presentation provided an overview of NIST activities supporting the creation of a materials data infrastructure. The infrastructure could start by defining the interfaces/APIs (Application Programming Interfaces) with which a user is able to interact for the deposit or retrieval of data. Beyond an interface, the following are essential elements for creation of a viable infrastructure:

• The establishment of repositories to store the data
• Tools to enable digital capture of the information, preferably at the time of creation (computation or experiment)
• Tools to enable markup of the data with sufficient metadata to inform someone else how the data was created, as well as various attributes that would help the user judge the quality of the information
• Assignment of a persistent digital identifier (like the DOI (Digital Object Identifier) for journals) so the data can be cited and discovered by others
• The registration of the availability of the data into some sort of “registry” to enable discovery without prior knowledge of the existence of the repository/specific data features
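The elements listed above (repository, metadata markup, persistent identifier, registry) can be pictured with a minimal Python sketch. Everything here is an illustrative assumption rather than a NIST interface: the class names `DatasetRecord` and `Registry`, the `pid:` prefix, and the metadata keys are all hypothetical.

```python
from dataclasses import dataclass, field
import uuid


@dataclass
class DatasetRecord:
    """One deposited dataset plus the metadata needed to judge and cite it."""
    title: str
    repository: str   # where the data physically lives
    method: str       # "computation" or "experiment"
    metadata: dict = field(default_factory=dict)
    # Stand-in for a persistent identifier such as a DOI (hypothetical scheme).
    pid: str = field(default_factory=lambda: "pid:" + uuid.uuid4().hex)


class Registry:
    """Toy registry: lets users discover data without knowing the repository."""

    def __init__(self):
        self._records = {}

    def register(self, record: DatasetRecord) -> str:
        """Announce a record and return its persistent identifier."""
        self._records[record.pid] = record
        return record.pid

    def resolve(self, pid: str) -> DatasetRecord:
        """Look a record up by identifier, as a citation would."""
        return self._records[pid]

    def search(self, **criteria):
        """Return records whose metadata matches every key/value given."""
        return [r for r in self._records.values()
                if all(r.metadata.get(k) == v for k, v in criteria.items())]
```

A depositor would call `register()` once per dataset; a reader could then call `search(system="Al-Cu")` without knowing which repository holds the data, which is the discovery role the registry bullet describes.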

Additionally, creation of new policy and standards, as well as development of tools to evaluate the quality of the data and tools to enable data-driven discovery, will be required to manage and use materials data to the greatest extent possible.

NIST is responding to the OSTP (Office of Science and Technology Policy) memos (Dr John Holdren 2 Feb 14) by developing practices that will be of use to the community, including a Common Access Platform allowing access across multiple collections of information, Research and Development of Persistent Identifier (PID) infrastructure with the Research Data Alliance (RDA), and R&D of Data Type Registries with RDA.

Finally, NIST is collaborating with the National Data Service (based at the National Center for Supercomputing Applications) and is developing a set of best practices for Data Management Plans.


The Materials Commons: A Novel Information Repository and Collaboration Platform for the Materials Community

Brian Puchala, Glenn Tarcea, Stravya Tamma, Emmanuelle Marquis, John Allison, University of Michigan

Critical to accelerating the pace of materials science and development is the development of new and improved infrastructure providing a seamless way for materials researchers to share and use materials data and models. To address this need, we are developing the Materials Commons (http://prisms.engin.umich.edu/prisms/materialscommons), an information repository and collaboration platform for the metals community in selected technical emphasis areas. We envision the Materials Commons becoming a continuous, seamless part of the scientific workflow process. Researchers will upload the results of experiments and computations as they are performed, automatically where possible, along with the provenance information describing the experimental and computational processes. By associating this data with the experimental and virtual computational materials samples from which it is obtained, the Materials Commons will build process-structure-property relationships, enabling the construction and validation of constitutive and process models. The Materials Commons website provides an easy-to-use interface for uploading and downloading data and data provenance, and for searching and sharing data. At its core, the Materials Commons consists of a secure data storage cluster, an application for efficiently uploading and downloading large data sets, and a REST (REpresentational State Transfer) based API to access and extend the capabilities of the repository. The API allows for features such as automated data upload from experiments and computations, seamless integration of computational models, and algorithmic analysis of process-structure-property relationships. The Materials Commons is a central thrust of the Center for PRedictive Integrated Structural Materials Science (PRISMS).
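The idea of attaching provenance to samples so that process-structure-property relationships can be assembled later can be sketched as follows. This is not the Materials Commons data model or API; `ProcessStep`, `Sample`, `psp_row`, and the field names are hypothetical, and the numbers in the usage note are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessStep:
    """One experimental or computational step applied to a sample."""
    name: str
    parameters: dict


@dataclass
class Sample:
    """A physical or virtual sample with its provenance and measurements."""
    sample_id: str
    history: list = field(default_factory=list)        # ordered ProcessStep objects
    measurements: dict = field(default_factory=dict)   # structure and property data

    def apply(self, step: ProcessStep) -> "Sample":
        """Record a step at the time it is performed."""
        self.history.append(step)
        return self


def psp_row(sample: Sample) -> dict:
    """Flatten one sample into a single process-structure-property row."""
    row = {"sample": sample.sample_id}
    for index, step in enumerate(sample.history):
        for key, value in step.parameters.items():
            row[f"process.{index}.{step.name}.{key}"] = value
    row.update(sample.measurements)
    return row
```

For example, a sample annealed at a placeholder 500 °C and then measured would yield a row with keys such as `process.0.anneal.temperature_C` next to its structure and property entries; rows of this kind are the input a constitutive or process model would be fitted against.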

NIST Materials Data Informatics Efforts

Carrie Campbell, National Institute of Standards and Technology

The presentation covered NIST’s pilot efforts in creating a data infrastructure focusing on Phase-Based Property Data. Phase-based data are diverse data that include 1-, 2-, 3-, and 4-dimensional data, are semi-structured, and often include incomplete data sets. The figure below provides an overview of NIST’s data efforts supporting the MGI.


The NIST Materials Data Repository (https://materialsdata.nist.gov/dspace/xmlui) is a customized DSpace repository for materials that enables sharing of a variety of data types, including text files, images, and video. The repository also provides a persistent identifier for each entry and allows users to choose from a variety of license types. NIST is also working on two data curation tools for phase-based data. The Thermodynamic Research Center (http://trc.nist.gov) is extending the ThermoML markup language and Guided Data Capture software tool to metallic systems, with an initial emphasis on phase equilibria and thermochemical data. The Materials Data Curation System (MDCS) employs user-defined XML-based schemas to curate data. The MDCS allows data curation either through a web-based interface or through a REST API. The entered data is stored in a non-relational database, can be searched using a SPARQL (SPARQL Protocol and RDF Query Language) interface, and can be integrated with a variety of scientific workflow tools.
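As a rough illustration of schema-driven curation, the sketch below builds an XML record and checks it against a list of required elements. It is not the MDCS schema format or API: the element names, the `REQUIRED` list (a stand-in for a real XML Schema), and the numeric values are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical "schema": child elements every <diffusionRecord> must contain.
REQUIRED = ["material", "temperature", "coefficient"]


def build_record(material, temperature_k, coefficient):
    """Create an XML record carrying units as attributes."""
    root = ET.Element("diffusionRecord")
    ET.SubElement(root, "material").text = material
    temp = ET.SubElement(root, "temperature", unit="K")
    temp.text = str(temperature_k)
    coeff = ET.SubElement(root, "coefficient", unit="m2/s")
    coeff.text = str(coefficient)
    return root


def missing_fields(root):
    """Return required elements that are absent or empty in the record."""
    missing = []
    for name in REQUIRED:
        node = root.find(name)
        if node is None or not (node.text or "").strip():
            missing.append(name)
    return missing
```

A curation front end, whether web form or REST endpoint, could reject a submission whenever `missing_fields()` returns a non-empty list. A production system would validate against a full XML Schema rather than a list of names.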

AFLOW Consortium

Marco Fornari, Michigan State University

The overall goal of the Materials Genome Initiative (MGI) is to accelerate the discovery, development, and deployment of new materials with ad hoc functionalities. The AFLOW Consortium focuses on the software infrastructures to achieve the MGI goals by using first principles calculations. The emphasis is on the data that must be generated, archived, validated, and post-processed efficiently, both to identify new materials and to distill meaningful quantitative relationships within the data. This extends the solution to the inverse problem of chemically replacing rare and costly elements in critical technologies.

The AFLOW Consortium provides access to the repository AFLOWLIB.ORG <http://AFLOWLIB.ORG>, which publicly distributes data on energetics and thermodynamic stability for more than 600,000 compounds and band structures for roughly 300,000 materials. The repository includes bulk crystals and alloys. Data were computed with the AFLOW high-throughput framework using density functional theory. The results of the calculations can be queried with the recently developed REST-API that standardizes and labels the data for retrieval purposes. The fields defined in the AFLOW REST-API include a material object identifier (AUID), data cloud location (AURL), ownership and sponsorship, keywords, and data-specific labels (COMPOSITION, ENTROPY, EGAP, …). Simple scripting languages can be used to automatically download from the AFLOWLIB repository. The AFLOW approach has been extensively applied to platinum-group alloys, thermoelectric materials, magnetic materials, topological insulators, scintillators, etc. The AFLOW REST-API can be extended to include experimental data. For further details on the database, the reader is directed to http://materials.duke.edu/auro/AUROARTICULA/j.commatsci.2012.02.002.pdf
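The scripted retrieval mentioned above might look like the following sketch. It makes no network call and does not claim to reproduce the AFLOW REST-API: the `key=value | key=value` record layout, the `?keyword` URL form, and the sample entry (including its identifier and values) are assumptions chosen for illustration only.

```python
def parse_entry(text):
    """Split 'key=value | key=value' text into a dict with upper-case keys.

    The record layout is an assumption made for this sketch.
    """
    entry = {}
    for chunk in text.split("|"):
        key, sep, value = chunk.strip().partition("=")
        if sep:  # skip fragments that carry no '=' sign
            entry[key.strip().upper()] = value.strip()
    return entry


def keyword_url(entry_url, keyword):
    """Build a per-keyword query URL (URL layout is an assumption)."""
    return entry_url.rstrip("/") + "/?" + keyword


# Hypothetical record text, not real AFLOWLIB output.
SAMPLE = "auid=aflow:0000 | aurl=example.org:ENTRY/1 | composition=1,1 | Egap=0.0"
```

With a layout like this, a few lines of any scripting language can turn each downloaded entry into a dictionary keyed by labels such as AUID, AURL, or EGAP, and then filter or tabulate entries in bulk.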

Materials Innovation Infrastructure: Lessons Learned and Future Plans

Surya Kalidindi, Georgia Institute of Technology

A central activity in materials research is the exploration of processing-structure-property (PSP) linkages. Workflow is defined as the sequence of steps employed for establishing PSP linkages of interest to any specific engineering/technology application.

High Throughput Explorations of PSP Linkages: We must treat a hierarchical material as a complex system, which means embracing and exploiting uncertainty quantification and complexity management concepts/tools. It also means we need to design, develop, and validate decision support systems that will leverage the best current understanding (with its uncertainties) and provide objective guidance on future effort investment (combination of experiments and simulations). This means leveraging Data Science to effectively feed the decision support systems described above by developing and deploying high throughput strategies for obtaining high value information from both experiments and models.

What are integrated workflows? Integrated workflows are those that utilize the best combination of experiments and simulations in extracting robust and reliable PSP linkages. This requires engaging and exploiting cross-disciplinary expertise that includes materials science, manufacturing, systems approaches, uncertainty quantification, computational science, and data and information sciences. Through this, one must ensure that the workflows output the critical information needed by design and manufacturing stakeholders in the materials development value chain. Integration demands expertise sharing, which essentially is the core element of a collaboration. Cross-disciplinary integration demands effective collaborations, such as those one might set up using the modern tools of the day, i.e., an e-Collaboration.


Motivators for e-Collaborations: e-Collaborations are central to enhancing the productivity of materials development activities in several ways:

• individual research groups can identify and engage complementary expertise needed for their projects
• organizations can coordinate multiple teams and multiple projects to maximize output/impact
• organizations can identify and seek the right partners that enhance their individual and combined value

e-Collaborations can be open, semi-open, or closed, depending on the needs of the specific organization. Productivity enhancement, leading to dramatic reductions in cost and time expended in materials development, is the main incentive for e-collaborations.

Main Impediments to e-Collaborations:

Lack of tools to identify the “best” collaborators
• Need software tools that quantify more precisely the expertise of individuals/groups/organizations
• Need software tools for allowing rich annotations and e-recording of discussions
• Need analytic tools for identification, extraction, and management of metadata (i.e., high value abstractions from the raw data)
• Need searchable semantic metadata databases (these are lightweight and can be centralized and customized for specific user communities)
• Need software tools for tracking provenance (i.e., who/what/when/how generated, modified, etc.)

Lack of experience with the use of integrated workflows
• Need software tools for e-recording of workflows and development of searchable workflow databases
• Need recommendation schemes that suggest the best workflow(s) based on recorded prior experience; this provides a means to transfer expertise acquired by an organization to other teams and other projects (even in the same organization)
• Leads to possible automation and dramatic enhancement of productivity

MINED Group’s Past Experience (~2005 - ~2008): Funded by the Army Research Office, ARO (David Stepp); engaged computer science (CS) graduates to produce a single-user platform that integrated certain software codes previously produced by our group. Software development was subsequently abandoned, but the effort recruited two CS graduates into PhD programs in Materials Science and Engineering/Mechanical Engineering (MSE/ME), which dramatically altered the direction of research in the group.

Brief chronology of development:
• Senior Design Project (2011-2012): An online portal designed for sharing data files and tools; learned the importance of user interfaces and the need for professional code developers and software architects
• HUBzero Instantiation (2012-2013): Built a hub by customizing HUBzero for hierarchical structural materials; there are several advantages in terms of accessing and executing codes (on the server side), but there are significant barriers to establishing fast and productive cross-disciplinary e-collaborations


• GTRI Integration Platform (2013-2014): Engaged professional code developers and software architects to design a multi-user web portal that allows data and code sharing while tracking provenance and workflows; underestimated the resources needed for such a project
• MATIN Collaboration Environment (2013-2014): Lightweight integrated software shell that allows users to seamlessly interface with numerous cloud services offered through the web through their APIs. Examples include GitHub for versioning and archival of codes, Authorea for collaborative editing of documents, figshare for citable publication of research, Plotly for collaborative data analysis and visualization, and Google+ and LinkedIn for e-teaming and networking. MATIN focuses on building an emergent community, Materials & Manufacturing Informatics (MMI). MATIN provisions web services for software integration platforms and innovation infrastructure. Examples of MATIN:
  • Spatial Statistics: https://github.com/tonyfast/SpatialStatisticsFFT
  • PyMKS: http://openmaterials.github.io/pymks/index.html
  • Example discussion about periodic boundary conditions in running Finite Element (FE or FEM) simulations: https://github.com/sfepy/sfepy/issues/267

In the platform developed, the user or the user’s organization can set permissions at the level of individual datasets/codes on the degree and type of sharing.

Central Challenge: Activation. The transition from traditional disciplinary workflows to integrated workflows needs to overcome an activation barrier. There is no clear incentive to get ahead of the curve in anticipating the upcoming transformations in how materials development will be pursued in the future. Open (reproducible) science alone is not a sufficiently strong motivator for a materials development community with strong interests in intellectual property development; indeed, it might actually be perceived as a negative. There is a need to explore and refine what incentives work for individual researchers (e.g., increased productivity).

Future Plans: Continue to develop and deploy MATIN at GT and any other organizations/groups that express an interest in working with us. Design, create, and distribute a suite of “mobile first” Apps designed to make data ingest into collaboration environments such as MATIN, from both material characterization equipment and multi-scale models, an enhanced user experience with friendly and natural interfaces. Proactively promote/incentivize data/metadata ingest into publicly shared databases. Work with professional societies to announce and run new competitions/awards for recognizing and rewarding good practices by the members of the broader materials innovation user community (e.g., Best Documented Data Award, Best Documented Tool Award, Best Documented Workflow Award). Develop and deliver new workshops for training the new generation of graduate students, postdocs, and junior faculty in the emerging workflows of Materials & Manufacturing Informatics.

High Performance Cast Aluminum Alloys for Next Generation Passenger Vehicle Engines

Dongwon Shin, Materials Science and Technology Division, Oak Ridge National Laboratory


Oak Ridge National Laboratory (ORNL) teamed with Chrysler and Nemak to develop next-generation high-performance cast aluminum alloys that have improved castability, high-temperature strength, and fatigue performance. At this point, the data management practice is limited to analytical data spreadsheets, other than the CALPHAD computational thermodynamic databases and proprietary property databases for casting simulation. As the project goes on, a more robust data schema will be developed to store various types of data populated from the experiments and calculations.

Besides the data structure design, the speaker discussed the necessity of a new approach to identifying key descriptors from the large dataset that can guide the design of new materials. Alloy design requires the balance of extremely complex relationships among a large number of descriptors. Some examples of these descriptors include the phase fraction of key intermetallics, lattice parameters of the matrix, and diffusion kinetics of solute atoms. However, optimizing these quantities relies on good physical metallurgy intuition coupled with clever experiments, dogged sleuthing, and a healthy dose of luck. Such descriptor-based predictions of materials properties date back to the beginning of modern solid-state chemistry and physics. As shown in the Darken-Gurry map to predict the solubility of elements in tantalum, intuitive descriptors such as electronegativity and radius could be used for materials design. However, as the chemistry and physics of materials become more complex, an intuition-based search may not work anymore. Thus, the use of more advanced statistical analysis and data mining approaches might be required to accelerate the design of complex multi-component/multi-phase alloys.
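A Darken-Gurry map can be reduced to a short test: a solute is expected to show appreciable solubility when it falls inside an ellipse centered on the solvent in the (radius, electronegativity) plane. The sketch below uses the commonly quoted semi-axes of ±15% in radius and ±0.4 electronegativity units as defaults; these thresholds are an assumption here, not values taken from the presentation, and the numbers in the usage note are placeholders rather than reference data for any element.

```python
def inside_darken_gurry(solvent, solute, d_en=0.4, d_r_frac=0.15):
    """Return True when the solute lies inside the Darken-Gurry ellipse.

    solvent, solute: (electronegativity, atomic_radius) tuples, with radii
    in any consistent unit. d_en and d_r_frac are the ellipse semi-axes
    (assumed, commonly quoted values).
    """
    en0, r0 = solvent
    en, r = solute
    x = (r - r0) / (d_r_frac * r0)   # normalized size mismatch
    y = (en - en0) / d_en            # normalized electronegativity mismatch
    return x * x + y * y <= 1.0
```

For a placeholder solvent at (1.5, 1.40), a solute at (1.6, 1.45) falls inside the ellipse, while one at (1.0, 1.45) falls outside because the electronegativity difference alone exceeds 0.4. A two-descriptor rule of this kind is the sort of intuition-based search that the paragraph above expects to break down for multi-component alloys.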

Data Related to Cast Aluminum

Jake Zindel, Ford Motor Company

The main focus of our research is the development of CAE (Computer-aided Engineering) tools to support two significant activities in the company: component design (product development) and manufacturing engineering (high-volume production). A key part of our research is conducting experiments to collect data for the development and validation of the CAE tools.

The data we collect consists of mechanical properties, physical properties, microstructural images, and manufacturing process parameters that form the pedigree of the material. This information can be in the form of Excel files, proprietary-software storage files, screen shots, and written reports.

Our immediate need is to deal with this large amount of existing information collected over many years on a “local” level. The first step is to organize many independent collections of information in all their forms. The next issue is the time required to enter all this information into the database. Maintenance may also require a significant amount of effort. Lastly, there will also be a need to manipulate the data contained in the database using Excel, Minitab, MATLAB, etc. Therefore, links or easy exportation of data to these types of applications will be necessary.

The next level would involve dealing with information that has been created or collected by others. From an OEM (Original Equipment Manufacturer) perspective, this would involve convincing suppliers to share processing information that they may consider proprietary. In addition to suppliers, universities and national laboratories are also sources of data. Data from all the sources will require some sort of protocol to determine its quality and resolve inconsistencies.


In the end, it seems likely that considerable budget and time will be required to implement comprehensive information databases, which could be the largest hurdle of them all.

QuesTek’s MGI/ICME Approach to Alloy Optimization and Design (Including Cast Iron)

Jason Sebastian, QuesTek Innovations LLC

Integrated Computational Materials Engineering (ICME) methods involve the holistic application of different computational models across various length scales to the design, development, and rapid qualification of advanced materials. Using its Materials by Design technologies, QuesTek has successfully applied these ICME methods to the design, development, and full qualification of several advanced alloys (including aerospace structural steels, aerospace gear steels, advanced castable titanium alloys, high-strength corrosion-resistant aluminum alloys, and high-performance cast iron).

QuesTek’s ICME methods benefit tremendously from the activities of the US Materials Genome Initiative (MGI). Recognizing that the “10 to 20 years” that it takes for alloy design and development is hindering the rapid deployment of higher-performance materials, the MGI is focused on “cutting in half” the time and cost to design and deploy novel advanced materials needed to enhance and sustain US competitiveness.

QuesTek also has ongoing activities on the application of ICME methods to the design and development of materials for the energy sector, including advanced materials that enable vehicle lightweighting. Of particular note is an ongoing effort with a major diesel engine manufacturer on the development of high-performance cast iron alloys for engine block applications. QuesTek’s efforts have made extensive use of various MGI-type tools and databases, including 1) density functional theory methods for high-throughput screening of cast iron inoculant stabilities (structure, lattice parameter, stability of various oxide/sulfide phases), 2) liquid iron mobility databases (to help accurately model solidification phenomena), and 3) thermodynamic databases (to examine nano-precipitate stability).

Moving forward, to continue to accelerate the materials development cycle, efforts should remain focused on the development of the fundamental databases (thermodynamic, kinetic, etc.) that enable ICME models. There should also be increased focus on efforts related to the integration of the various computational databases, models, and tools (the “I” in ICME).

Integrated Computational Materials Engineering (ICME) of Generation Three Advanced High Strength Steels

Louis G. Hector Jr., General Motors R&D Center

While traditional material development methods often involve laborious trial-and-error testing, Integrated Computational Materials Engineering (ICME), which falls under the umbrella of the Materials Genome Initiative, is offering a means for materials development under a more compressed time scale. Advanced high strength steels (AHSS), which contain multiple phases (e.g., ferrite, bainite, pearlite, austenite, martensite) and may exhibit phase transformation with strain, are fertile ground for ICME. However, prediction of macro-scale constitutive behavior based upon the multi-scale physical, chemical, and mechanical phenomena in these complex microstructures is a formidable challenge for ICME. Unprecedented collaboration between universities, industry, and government labs will be required to address AHSS development, archiving of data, and the implementation of these materials into products that benefit the American consumer. This presentation began with a brief overview of a new DOE/USAMP (US Automotive Materials Partnership)-funded ICME program aimed at the development of Generation Three (Gen 3) AHSS. The lack of commercial Gen 3 steels that meet DOE targets poses significant challenges to this new ICME program. The program is meeting these challenges by following four thrusts: (1) making Gen 3 steels to meet DOE targets; (2) passing experimental measurements of key properties to an ICME model that produces microstructure-based constitutive models; (3) material evaluation in forming and vehicle performance simulations; (4) archiving all relevant data for easy use by others in the program. Although the Gen 3 AHSS ICME program is “mission oriented,” it demonstrates an important role for universities in providing the necessary foundation in fundamental materials science and engineering.

Integrated Computational Methods for Composites Materials (ICM2): Lessons Learned from Integration Planning and Feasibility Demonstrations

Lara Liou, GE Aviation

General Electric Aviation and Lockheed Martin Aeronautics have teamed through an Air Force Research Lab program to conduct a demonstration of an Integrated Computational Materials Engineering digital framework that links material processing, property, and structural relationships to account for processability, manufacturability, and system performance, with the goal of demonstrating that the usage of integrated models can contribute to future airframe and engine designs with dramatic reductions in development time and cost. Both engine and airframe Foundational Engineering Problems will be demonstrated through a common digital framework to be developed and validated on increasingly complex articles. The modeling tools, which cover composite process models, localized mechanical behavior models, and global design models (Convergent Manufacturing Technology’s COMPRO CCA, Autodesk’s Autodesk Simulation Composite Analysis, and an Abaqus based University of Michigan micromechanics model), are centrally linked via a commercially available model integration interface (Phoenix Integration’s ModelCenter®). Detailed plans of what data will be passed among the various models, as well as how it will be passed, have been completed. Creation of these detailed process maps and simple feasibility demonstrations have raised inherent technical and logistical challenges to integrating commercially available multi-scale models. Robust model integration methods to address these challenges were discussed.

The exercise of drawing up detailed process maps for what to integrate and how to integrate the selected tools and models resulted in key early lessons learned and identification of tools critical to long-term, robust, and sustained application of Integrated Computational Methods for Composites Materials (ICM2). The key lessons learned include:

• Individual models must be relatively mature and validated before an ICM2 problem can be confidently defined
• FE analysis-based tools within the same digital thread should utilize the same FE solver
• The team ideally should consist of at least one expert for each tool and its underlying model, as well as an expert in the digital thread and integration interface tools and programming


• Mesh interpolation between finite element analysis based tools will likely be required, and a commercially available tool for this type of composite mesh interpolation is key for robust and sustained application of ICM2

Residual Stress Engineering of Rotating Nickel Components

Terry Wong, Aerojet Rocketdyne

The PW-8 project, the Nickel Base Superalloy Residual Stress Foundational Engineering Problem program, aims to develop a multi-disciplinary analysis approach to predict and incorporate bulk residual stress in the design of a component. This will allow for the design of a part that is optimized for a particular application or design intent (such as weight, performance, or reduction of scrap). To accomplish this goal, the PW-8 team has two major tasks: 1) the development of an infrastructure and associated tools needed to predict and incorporate bulk residual stress in the design stages of a part, and 2) a demonstration task that will validate the infrastructure and tools developed.

The Data Management task is the foundational element of the larger infrastructure development task. The goal for the data management work is to develop an infrastructure that allows the various team members in PW-8 to share data and to store data in a database. This not only ensures that data generated by this program is never lost but also allows for data to be transferred in a secure manner. In order to meet the goal of a system that securely stores and shares data, three major tasks must be accomplished: 1) validation that there is a secure method to transfer data between team members and that the method of storing data is also secure, 2) development of a database schema, and 3) development of a means to easily upload data into the database and download data from the database.

Of the three, the most difficult task was to create a system in which the transfer and storage of data was secure. Various proposals were made to ensure the security of the data, but many of them were rejected, not because they were actually insecure but because they were not acceptable under each team member’s IT security policies. The process of getting each team member’s IT department to approve the security of the data transfer and storage took more effort than anticipated. In the end, the team members, along with their respective IT departments, approved the use of TRUcentrix as a tool to transfer data and the use of an ASM International hosted database to store data.

The development of a database schema is a necessary step in creating the database to store all the data generated (via physical tests as well as by models). A big help in creating the database schema was to use the schema generated by Dr Steven Arnold from NASA’s Glenn Research Center. Using NASA’s schema eliminated the need to start the database build from scratch and allowed us to focus on the specific needs of storing residual stress data.

The last task involves the development of tools that make the input of data easy. The more difficult it is to enter data into a database, the more likely it is that data will not be entered. Excel-based templates are being developed in order to make data entry as easy as possible.
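Template-based entry into a schema-backed database can be sketched with Python’s standard `csv` and `sqlite3` modules. This is not the PW-8, NASA, or ASM schema: the table name, column names, and units are hypothetical, and a CSV export stands in for the Excel template.

```python
import csv
import io
import sqlite3

# Hypothetical schema for residual stress data from tests and models.
SCHEMA = """
CREATE TABLE IF NOT EXISTS residual_stress (
    specimen_id TEXT NOT NULL,
    source      TEXT NOT NULL CHECK (source IN ('test', 'model')),
    depth_mm    REAL NOT NULL,
    stress_mpa  REAL NOT NULL
)
"""


def load_template(conn, csv_text):
    """Insert the rows of a filled-in template; return how many were added."""
    conn.execute(SCHEMA)
    rows = [
        (r["specimen_id"], r["source"], float(r["depth_mm"]), float(r["stress_mpa"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    with conn:  # commit on success, roll back if any row violates the schema
        conn.executemany("INSERT INTO residual_stress VALUES (?, ?, ?, ?)", rows)
    return len(rows)
```

Because the template columns map one-to-one onto the table, the person entering data only fills in a spreadsheet. The `CHECK` constraint and the `float()` conversions reject malformed rows at upload time rather than leaving them to be found later.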

The two big lessons learned from PW-8 in relationship to data management are:

• It is essential to work with each company’s IT and IT Security people to develop a system that satisfies the security requirements of each company


• Database schema development should not start from scratch but should begin with an already established schema

Perspectives on MAI Programs

Dr Mike Glavicic, Rolls-Royce

The current formulation of MAI (AFRLrsquos Metals Affordability Initiative) programs are structured13 independently from one another in that a standard methodology does not exist that describes whatdata is to13 be archived13 at the conclusion13 of a program Nor does there exist a standard13 protocol of howto archive things on a server13 or some other computer resource at the conclusion13 of a program As aresult the current13 practices of13 MAI programs

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program.

• Do not have a formalized requirement for how data is to be packaged and stored at the end of a program. Hence, even in the best of cases, in which all of the data is stored, the data is archived in simple folder fashion that is not searchable in any manner. Moreover, even if all of the data is present, it is extremely cumbersome to untangle the data for future use using a file folder strategy.

Hence, at the end of the program, both the government and the MAI team are left to manage the data and results in an unregimented fashion.

The lack of a regimented protocol for how to package a program's results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal of integrating the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models and other information required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted to the recreation of data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. The RR-10 program overcame these issues and became a successful program, but other programs that encounter similar problems will likely not fare as well: RR-10 was very fortunate to have the means to overcome these serious issues, which were a result of the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government-funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs, with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ
Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and Suppliers, and (iii) UQ methods to capture variability in process, microstructure and property data. Models and data from earlier MAI programs such as GE-10, PW-9 and RR-10 will be used to develop and implement OEM-Supplier integration data standards, data structures and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa and RTI International Metals, Inc.) are working together on the PW-11 Team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security and access control, and the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework (GE Forge, Vanderbilt Forge and Semantic Web technology), and one technology will be selected for the next phase of the program.


Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single database searches to multiple and disparate database searches, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work up front for implementation, as it needs the development of a common vocabulary and relationships.

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN and NIST, and develop standards and vocabularies for production certification data that can include chemistry, microstructure and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models, and build databases for the capture of the data from the GE-10, PW-9 and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9 and RR-10 programs involve several interconnected models related to processing, microstructure and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure and the mechanical performance.
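
As a rough illustration of what flowing uncertainty through interconnected models can mean, the sketch below propagates scatter in one processing parameter through two chained models by Monte Carlo sampling. It is not the PW-11 method: the two model functions, their coefficients, the scatter levels and the input distribution are all made-up assumptions chosen only to show the mechanics.

```python
import random
import statistics


def microstructure_model(cooling_rate, rng):
    """Hypothetical processing -> microstructure model (assumed form).

    Returns a grain size in micrometres that shrinks as the cooling rate
    (K/s) rises, with 5% multiplicative model scatter.
    """
    nominal = 100.0 / cooling_rate ** 0.5
    return nominal * rng.gauss(1.0, 0.05)


def property_model(grain_size, rng):
    """Hypothetical microstructure -> property model (Hall-Petch-like form).

    Returns a strength in MPa with 2% multiplicative model scatter.
    """
    nominal = 200.0 + 600.0 / grain_size ** 0.5
    return nominal * rng.gauss(1.0, 0.02)


def propagate(n_samples=5000, seed=0):
    """Sample the process input and push every sample through both models."""
    rng = random.Random(seed)
    samples = {"cooling_rate": [], "grain_size": [], "strength": []}
    for _ in range(n_samples):
        # Assumed process variability: mean 10 K/s, standard deviation 1 K/s.
        rate = max(rng.gauss(10.0, 1.0), 0.1)
        grain = microstructure_model(rate, rng)
        strength = property_model(grain, rng)
        samples["cooling_rate"].append(rate)
        samples["grain_size"].append(grain)
        samples["strength"].append(strength)
    # Summarise each stage as (mean, standard deviation).
    return {
        name: (statistics.mean(values), statistics.stdev(values))
        for name, values in samples.items()
    }


if __name__ == "__main__":
    for stage, (mean, spread) in propagate().items():
        print(f"{stage}: mean={mean:.2f}, stdev={spread:.2f}")
```

Comparing the spread at each stage shows how much of the variability in the final property comes from the process input and how much is added by each model along the way.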

ASM's Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project
Scott D. Henry, ASM International

ASM International, the world's largest association of metals-focused materials engineers, scientists, technicians, educators and students, has launched a series of initiatives under the "Computational Materials Data (CMD) Network" umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design Ltd. The main objectives of SMDDP are to:


• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system.

• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets.

• Develop and carry out a series of test problems that represent relevant use cases for the repository.

• Make data open to the materials data community for use in data analytics, modeling and educational purposes.

• Actively engage the materials data community and widely disseminate the findings from the project.

• Develop and implement data capture and curation procedures that can serve as models for other data repositories.

The basic philosophy behind the project is that standards often emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated, structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project, launched in late 2013 and planned as an 18-month program, is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future
Ben Blaiszik¹, Kyle Chard¹, Jim Pruyne¹, Rachana Anathakrishnan¹, Steve Tuecke¹˒², Ian Foster¹˒²

¹University of Chicago, Computation Institute; ²Argonne National Laboratory, MCS Division

During this talk, we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure and robust file transfer, as well as identity, profile and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully formed metadata log are stored on campus, institutional and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.
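
The idea of importing metadata from many communities into one searchable index can be sketched in a few lines. The example below is a generic inverted index and does not use or represent any Globus API; the record identifiers, field names and values are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical metadata records, keyed by dataset identifier.
RECORDS = {
    "ds-001": {
        "community": "Light Alloys",
        "collection": "Al 6061 tensile",
        "material": "aluminum 6061",
    },
    "ds-002": {
        "community": "Light Alloys",
        "collection": "Ti-6Al-4V fatigue",
        "material": "titanium",
    },
}


def build_index(records):
    """Map each lowercase word found in any metadata value to dataset ids."""
    index = defaultdict(set)
    for record_id, metadata in records.items():
        for value in metadata.values():
            for token in str(value).lower().split():
                index[token].add(record_id)
    return index


def search(index, query):
    """Return the ids of datasets whose metadata contains every query word."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return hits


INDEX = build_index(RECORDS)

if __name__ == "__main__":
    print(search(INDEX, "aluminum 6061"))
```

Because only the lightweight metadata is indexed centrally, the raw data can stay on the institutional resources where it was published.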

Semantics for Deep Interoperability
Samuel Chance, iNovex

Semantic technology is based on mature web standards and is intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented. It started with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web, riding atop the same infrastructure (HTTP). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis on the fact that the technology has a foundation in formal logic that enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being evaluated as a solution for a community or enterprise.
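
To make the notion of "triples" linking into a web of data more concrete, here is a minimal sketch using plain Python tuples rather than a real RDF library. Every identifier and value in it (the `ex:` names, the sample, the measurement, the number) is invented for illustration and is not taken from the presentation.

```python
# Each triple is (subject, predicate, object). The object of one triple can
# be the subject of another, which is what links the statements into a graph.
TRIPLES = [
    ("ex:Sample42", "ex:madeOf", "ex:Al6061"),
    ("ex:Al6061", "ex:memberOf", "ex:AlMgSiSystem"),
    ("ex:Sample42", "ex:hasMeasurement", "ex:Tensile7"),
    ("ex:Tensile7", "ex:yieldStrengthMPa", "276"),  # illustrative value only
]


def match(triples, s=None, p=None, o=None):
    """Return all triples that agree with the given parts (None = wildcard)."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def follow(triples, start, predicates):
    """Walk from a start node along a chain of predicates; return end nodes."""
    frontier = {start}
    for predicate in predicates:
        frontier = {
            obj for (subj, pred, obj) in triples
            if subj in frontier and pred == predicate
        }
    return frontier


if __name__ == "__main__":
    # Which alloy system does the material of Sample42 belong to?
    print(follow(TRIPLES, "ex:Sample42", ["ex:madeOf", "ex:memberOf"]))
```

The second query crosses two statements that were asserted independently, which is the sense in which separate triples form a single web of data.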


Big Data and Semantic Technologies
Harlan Shober, RJ Lee Group

The presentation started with a description of the entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management and workflow-driven Laboratory Information Management Systems (LIMS). To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing, characterization and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described. It consisted of several elements: contribute, index, search, analyze and archive. Bearing in mind the needed functionality, a string of universal scientific data management process categories was then presented, with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for the Resource Description Framework (RDF) were applied to a notional example, with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described".

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials. The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
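
Tokenization, the first of the steps listed above, can be illustrated with a short regular-expression sketch. This is not the speaker's tooling; the pattern below is an assumption chosen to show one materials-specific wrinkle, namely keeping hyphenated alloy designations such as Al-Mg-Si together as single tokens.

```python
import re

# Alternatives are tried left to right at each position in the text.
TOKEN_PATTERN = re.compile(
    r"\d+(?:\.\d+)?"                   # numbers such as 6061 or 0.2
    r"|[A-Za-z]+(?:-[A-Za-z0-9]+)*"    # words and hyphenated terms (Al-Mg-Si)
    r"|[^\sA-Za-z0-9]"                 # any other single non-space symbol
)


def tokenize(text):
    """Split text into number, word/designation, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("The Al-Mg-Si alloy 6061 yields at 276 MPa."))
```

Later stages such as part-of-speech tagging and entity disambiguation operate on the token list, so choices made here (for example, whether an alloy designation is one token or three) carry through the rest of the pipeline.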

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations, prevalence of tables, graphs and images (more of a computer vision problem), and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning and distant supervision approaches.


The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.


Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were 'rolled up' from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority; they provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often, fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; a data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto-ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data; standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management: consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can actually be useful beyond the archival purpose
- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management: difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside the materials community
- Semantic web infrastructure: evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel
- Data associated with publications should be shared along with the publications: not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)

Government-Led Priorities

1. Lead data strategy/approach development
- Hold a meeting like this one in 6-12 months with the purpose of developing a roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus, e.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools making data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating a standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming a consortium for data mining, data warehousing and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure


4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10%: repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST): creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors, not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials-related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches supported are limited to too fundamental single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas, or alternatively work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community


Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, iNovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan University
Tim Hawes, Decisive Analytics Corp
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls-Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, Questek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl Institute of Standards & Tech
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


Agenda
MGI Materials Data Workshop

Tec^Edge, 5000 Springfield Street, Dayton, OH

15 - 16 July 2014

Tuesday

0745 – 0800 Check-in
0800 – 0815 Chuck Ward (AFRL) – Introductory remarks

Discussion Lead: Will Joost (DOE-EERE)
0815 – 0845 Jim Warren (NIST) – Data Infrastructure
0845 – 0915 John Allison, Brian Puchala (Univ Michigan) – PRISMS/ALMMII
0915 – 0945 Carrie Campbell (NIST) – ChiMaD
0945 – 1015 Discussion
1015 – 1030 Break

Discussion Lead: John Beatty (ARL)
1030 – 1100 Marco Fornari (Central Michigan Univ) – AFLOWLIB.org
1100 – 1130 Surya Kalidindi (Ga Tech) – Mosaic of Microstructure/IGERT-CIF21
1130 – 1200 Discussion
1200 – 1230 Lunch

Discussion Lead: Bill Mullins (ONR)
1230 – 1300 Tom Searles (MDMi) – Cast Aluminum
1300 – 1320 Dongwon Shin (Oak Ridge National Laboratory, ORNL) – Cast Aluminum
1320 – 1340 Jake Zindel (Ford Motor Co) – Cast Aluminum
1340 – 1415 Discussion
1415 – 1430 Break

Discussion Lead: Jim Warren (NIST)
1430 – 1500 Jason Sebastian (Questek) – Cast Iron
1500 – 1530 Lou Hector (General Motors) – Advanced Steel
1530 – 1600 Lara Liou (GE Aviation) – Integrated Computational Methods for Composite Materials
1600 – 1700 Discussion
1700 Adjourn for the day


Wednesday

0745 – 0800 Check-in

Discussion Lead: Wayne Ziegler (ARL)
0800 – 0830 Terry Wong (Rocketdyne) – Nickel RS FEP
0830 – 0850 Mike Glavicic (Rolls-Royce) – MAI Ti Modeling
0850 – 0910 Rajiv Naik (Pratt & Whitney) – MAI Data Informatics, Data Standards, Modeling & UQ
0910 – 0940 Discussion
0940 – 0955 Break

Discussion Lead: Dennis Griffin (NASA)
0955 – 1015 Ro Gorham (National Center for Defense Manufacturing and Machining, NCDMM) – America Makes
1015 – 1045 Richard Serwecki (NASA) – MAPTIS
1045 – 1105 Scott Henry (ASM International) – Computational Materials Data Network
1105 – 1135 Discussion
1135 – 1200 Lunch

Discussion Lead: Clare Paul (AFRL)
1200 – 1230 Ben Blaiszik (Univ of Chicago/ANL) – Globus
1230 – 1300 Mike Groeber (AFRL) – DREAM3D/SIMPL
1300 – 1320 Sam Chance (iNovex) – Semantics for Deep Interoperability
1320 – 1340 Harlan Shober (RJ Lee) – Big Data and Semantic Technologies
1340 – 1400 Tim Hawes (Decisive Analytics Corp) – Natural Language Processing
1400 – 1445 Discussion
1445 – 1500 Wrap up
1500 Adjourn

Presentation Summaries

Welcome and Introduction

Chuck Ward, Air Force Research Laboratory

The primary intent of the workshop is to promote communication among the groups funded under the Materials Genome Initiative that have a significant interest in the management of materials data. The greater awareness is expected to enhance the likelihood of success of the individual efforts and speed the community's understanding of how best to proceed in developing an MII.

The Materials Research Society (MRS) and the Minerals, Metals and Materials Society (TMS) conducted a survey of the community in May/June of 2013 to gauge attitudes and needs in materials data. Over 650 people took the survey, with 74 % of respondents saying they would be willing to share data if it were encouraged as a condition of funding or publishing. The community showed the most interest in gaining access to databases of some of the most basic and essential materials data, such as thermo-physical properties.

The survey asked participants to evaluate advantages of local versus centralized databases. Centralized repositories were perceived by respondents to offer advantages in cost, an ability to share one's data, provisioning of uniform data standards, and an ability to organize and find data. Local repositories were perceived to provide advantages in ease of data input, time efficiency, and flexibility in accommodating new data types.

The survey also asked participants to provide their thoughts on impediments to and motivators for sharing data. Most of the impediments can be attributed to concerns of ownership of data or external data restrictions, while technical issues such as file size and proprietary formatting were also concerns. Overall, participants viewed positively the ability to contribute data to the community for reuse and critique. The highest motivator was to enhance one's own research visibility. Interestingly, respondents saw adding metadata as a motivator, presumably due to the perceived higher value of data when it is described more robustly.

MGI Efforts in Infrastructure Development at NIST

Jim Warren, National Institute of Standards and Technology

The presentation provided an overview of NIST activities supporting the creation of a materials data infrastructure. The infrastructure could start by defining the interfaces/APIs (Application Programming Interfaces) with which a user is able to interact for the deposit or retrieval of data. Beyond an interface, the following are essential elements for creation of a viable infrastructure:

• The establishment of repositories to store the data

• Tools to enable digital capture of the information, preferably at the time of creation (computation or experiment)

• Tools to enable markup of the data with sufficient metadata to inform someone else how the data was created, as well as various attributes that would help the user judge the quality of the information

• Assignment of a persistent digital identifier (like the DOI (Digital Object Identifier) for journals) so the data can be cited and discovered by others

• The registration of the availability of the data into some sort of "registry" to enable discovery without prior knowledge of the existence of the repository/specific data features
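
The elements listed above (metadata markup, a persistent identifier, and a registry) can be pictured with a small sketch. The following Python is only an illustration under stated assumptions: the `DatasetRecord` fields, the `Registry` class, and the identifier strings are hypothetical and do not describe any NIST system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical minimal description of one deposited dataset."""
    identifier: str   # persistent identifier (a DOI-like string)
    repository: str   # where the data are stored
    title: str
    created_by: str   # metadata: who or what produced the data
    method: str       # metadata: "experiment" or "computation"
    keywords: List[str] = field(default_factory=list)


class Registry:
    """In-memory stand-in for a registry that makes datasets discoverable."""

    def __init__(self) -> None:
        self._records: Dict[str, DatasetRecord] = {}

    def register(self, record: DatasetRecord) -> None:
        # A persistent identifier must point at exactly one record.
        if record.identifier in self._records:
            raise ValueError(f"identifier already registered: {record.identifier}")
        self._records[record.identifier] = record

    def resolve(self, identifier: str) -> Optional[DatasetRecord]:
        # Look up a record by identifier; None if it is unknown.
        return self._records.get(identifier)

    def search(self, keyword: str) -> List[DatasetRecord]:
        # Discover records without knowing which repository holds them.
        needle = keyword.lower()
        return [r for r in self._records.values()
                if any(needle == k.lower() for k in r.keywords)]


# Hypothetical entries, for illustration only.
registry = Registry()
registry.register(DatasetRecord(
    identifier="example:0001", repository="repo-a",
    title="Phase equilibria measurements", created_by="lab-1",
    method="experiment", keywords=["phase diagram", "aluminum"]))
registry.register(DatasetRecord(
    identifier="example:0002", repository="repo-b",
    title="Computed formation energies", created_by="group-2",
    method="computation", keywords=["CALPHAD", "aluminum"]))

hits = registry.search("calphad")
```

The point of the sketch is the separation of roles: the repository holds the data, the identifier makes a record citable, and the registry answers discovery queries across repositories.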

Additionally, creation of new policy and standards, as well as development of tools to evaluate the quality of the data and tools to enable data-driven discovery, will be required to manage and use materials data to the greatest extent possible.

NIST is responding to the OSTP (Office of Science and Technology Policy) memos (Dr. John Holdren, 2 Feb 14) by developing practices that will be of use to the community, including a Common Access Platform allowing access across multiple collections of information; research and development of Persistent Identifier (PID) infrastructure with the Research Data Alliance (RDA); and R&D of Data Type Registries with RDA.

Finally, NIST is collaborating with the National Data Service (based at the National Center for Supercomputing Applications) and is developing a set of best practices for Data Management Plans.

The Materials Commons: A Novel Information Repository and Collaboration Platform for the Materials Community

Brian Puchala, Glenn Tarcea, Stravya Tamma, Emmanuelle Marquis, John Allison, University of Michigan

Critical to accelerating the pace of materials science and development is the development of new and improved infrastructure providing a seamless way for materials researchers to share and use materials data and models. To address this need, we are developing the Materials Commons (http://prisms.engin.umich.edu/prisms/materialscommons), an information repository and collaboration platform for the metals community in selected technical emphasis areas. We envision the Materials Commons becoming a continuous, seamless part of the scientific workflow process. Researchers will upload the results of experiments and computations as they are performed, automatically where possible, along with the provenance information describing the experimental and computational processes. By associating this data with the experimental and virtual computational materials samples from which it is obtained, the Materials Commons will build process-structure-property relationships, enabling the construction and validation of constitutive and process models. The Materials Commons website provides an easy-to-use interface for uploading and downloading data and data provenance, and for searching and sharing data. At its core, the Materials Commons consists of a secure data storage cluster, an application for efficiently uploading and downloading large data sets, and a REST (REpresentational State Transfer) based API to access and extend the capabilities of the repository. The API allows for features such as automated data upload from experiments and computations, seamless integration of computational models, and algorithmic analysis of process-structure-property relationships. The Materials Commons is a central thrust of the Center for PRedictive Integrated Structural Materials Science (PRISMS).

NIST Materials Data Informatics Efforts

Carrie Campbell, National Institute of Standards and Technology

The presentation covered NIST's pilot efforts in creating a data infrastructure focusing on Phase-Based Property Data. Phase-based data are diverse data that include 1-, 2-, 3-, and 4-dimensional data, are semi-structured, and often include incomplete data sets. The figure below provides an overview of NIST's data efforts supporting the MGI.

The NIST Materials Data Repository (https://materialsdata.nist.gov/dspace/xmlui) is a customized DSpace repository for materials that enables sharing of a variety of data types, including text files, images, and video. The repository also provides a persistent identifier for each entry and allows users to choose from a variety of license types. NIST is also working on two data curation tools for phase-based data. The Thermodynamics Research Center (http://trc.nist.gov) is extending the ThermoML markup language and the Guided Data Capture software tool to metallic systems, with an initial emphasis on phase equilibria and thermochemical data. The Materials Data Curation System (MDCS) employs user-defined XML-based schemas to curate data. The MDCS allows data curation either through a web interface or through a REST API. The entered data is stored in a non-relational database repository, can be searched using a SPARQL (SPARQL Protocol and RDF Query Language) interface, and can be integrated with a variety of scientific workflow tools.

AFLOW Consortium

Marco Fornari, Central Michigan University

The overall goal of the Materials Genome Initiative (MGI) is to accelerate the discovery, development, and deployment of new materials with ad hoc functionalities. The AFLOW Consortium focuses on the software infrastructures to achieve the MGI goals by using first principles calculations. The emphasis is on the data that must be generated, archived, validated, and post-processed efficiently, both to identify new materials and to distill meaningful quantitative relationships within the data. This extends the solution to the inverse problem of chemically replacing rare and costly elements in critical technologies.

The AFLOW Consortium provides access to the repository AFLOWLIB.ORG <http://AFLOWLIB.ORG> that publicly distributes data on energetics and thermodynamic stability for more than 600,000 compounds and band structures for roughly 300,000 materials. The repository includes bulk crystals and alloys. Data were computed with the AFLOW high-throughput framework using density functional theory. The results of the calculations can be queried with the recently developed REST-API that standardizes and labels the data for retrieval purposes. The fields defined in the AFLOW REST-API include a material object identifier (AUID), data cloud location (AURL), ownership and sponsorship keywords, and data specific labels (COMPOSITION, ENTROPY, EGAP, …). Simple scripting languages can be used to automatically download from the AFLOWLIB repository. The AFLOW approach has been extensively applied to platinum-group alloys, thermoelectric materials, magnetic materials, topological insulators, scintillators, etc. The AFLOW REST-API can be extended to include experimental data. For further details on the database, the reader is directed to http://materials.duke.edu/auro/AUROARTICULA/j.commatsci.2012.02.002.pdf
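
To make the idea of labeled fields concrete, here is a minimal Python sketch of how a script might turn one labeled record into a dictionary. It rests on assumptions: the `key=value | key=value` text layout and the sample values are hypothetical, chosen only for illustration, and the sketch performs no network access; the actual response format should be taken from the AFLOWLIB documentation.

```python
from typing import Dict


def parse_entry(text: str) -> Dict[str, str]:
    """Split a 'key=value | key=value' string into a dictionary.

    The delimiter layout is an assumption made for this sketch.
    """
    fields: Dict[str, str] = {}
    for chunk in text.split("|"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate empty segments such as a trailing delimiter
        key, sep, value = chunk.partition("=")
        if not sep:
            raise ValueError(f"field without '=': {chunk!r}")
        fields[key.strip().lower()] = value.strip()
    return fields


# Made-up record for illustration only; these are not real AFLOWLIB values.
sample = "auid=example:0000 | aurl=example.org:DATA/ENTRY | composition=1,3 | egap=0.0"
entry = parse_entry(sample)

# Labeled fields can then be converted to the types a script needs.
band_gap = float(entry["egap"])
composition = [int(n) for n in entry["composition"].split(",")]
```

Because every value carries its label, a script can pick out only the fields it needs (here the hypothetical `egap` and `composition`) without depending on their position in the record.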

Materials Innovation Infrastructure: Lessons Learned and Future Plans

Surya Kalidindi, Georgia Institute of Technology

A central activity in materials research is the exploration of processing-structure-property (PSP) linkages. Workflow is defined as the sequence of steps employed for establishing PSP linkages of interest to any specific engineering/technology application.

High Throughput Explorations of PSP Linkages: We must treat a hierarchical material as a complex system, which means embracing and exploiting uncertainty quantification and complexity management concepts/tools. It also means we need to design, develop, and validate decision support systems that will leverage the best current understanding (with its uncertainties) and provide objective guidance on future effort investment (combination of experiments and simulations). This means leveraging Data Science to effectively feed the decision support systems described above by developing and deploying high throughput strategies for obtaining high value information from both experiments and models.

What are integrated workflows? Integrated workflows are those that utilize the best combination of experiments and simulations in extracting robust and reliable PSP linkages. This requires engaging and exploiting cross-disciplinary expertise that includes materials science, manufacturing, systems approaches, uncertainty quantification, computational science, and data and information sciences. Through this, one must ensure that the workflows output the critical information needed by design and manufacturing stakeholders in the materials development value chain. Integration demands expertise sharing, which essentially is the core element of a collaboration. Cross-disciplinary integration demands effective collaborations, such as those one might set up using the modern tools of the day, i.e., an e-Collaboration.

Motivators for e-Collaborations: e-Collaborations are central to enhancing the productivity of materials development activities in several ways:

• individual research groups can identify and engage complementary expertise needed for their projects

• organizations can coordinate multiple teams and multiple projects to maximize output/impact

• organizations can identify and seek the right partners that enhance their individual and combined value

e-Collaborations can be open, semi-open, or closed depending on the needs of the specific organization. Productivity enhancement leading to dramatic reductions in cost and time expended in materials development is the main incentive for e-collaborations.

Main Impediments to e-Collaborations:

Lack of tools to identify the "best" collaborators

• Need software tools that quantify more precisely the expertise of individuals/groups/organizations

• Need software tools for allowing rich annotations and e-recording of discussions

• Need analytic tools for identification, extraction, and management of metadata (i.e., high value abstractions from the raw data)

• Need searchable semantic metadata databases (these are lightweight and can be centralized and customized for specific user communities)

• Need software tools for tracking provenance (i.e., who/what/when/how generated, modified, etc.)

Lack of experience with the use of integrated workflows

• Need software tools for e-recording of workflows and development of searchable workflow databases

• Need recommendation schemes that suggest the best workflow(s) based on recorded prior experience; this provides a means to transfer expertise acquired by an organization to other teams and other projects (even in the same organization)

• Leads to possible automation and dramatic enhancement of productivity
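
As one way to picture the provenance-tracking need named above (who/what/when/how generated or modified), the following Python sketch keeps an append-only log of events and reconstructs the history of a derived item. It is an illustration under assumptions: `ProvenanceEvent`, `ProvenanceLog`, and the example actors are hypothetical names, not part of MATIN or any tool discussed at the workshop.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    who: str                      # person or tool responsible
    what: str                     # e.g. "generated" or "modified"
    when: datetime                # time the event was recorded
    how: str                      # short description of the procedure
    parent: Optional[int] = None  # index of the event this one derives from


class ProvenanceLog:
    """Append-only list of events; each event may point to its parent."""

    def __init__(self) -> None:
        self._events: List[ProvenanceEvent] = []

    def record(self, who: str, what: str, how: str,
               parent: Optional[int] = None,
               when: Optional[datetime] = None) -> int:
        if parent is not None and not 0 <= parent < len(self._events):
            raise IndexError("parent event does not exist")
        stamp = when or datetime.now(timezone.utc)
        self._events.append(ProvenanceEvent(who, what, stamp, how, parent))
        return len(self._events) - 1  # index serves as the event handle

    def lineage(self, index: int) -> List[ProvenanceEvent]:
        """Return the chain of events from the origin up to `index`."""
        chain: List[ProvenanceEvent] = []
        current: Optional[int] = index
        while current is not None:
            event = self._events[current]
            chain.append(event)
            current = event.parent
        return list(reversed(chain))


# Hypothetical usage: a raw measurement followed by a cleaning step.
log = ProvenanceLog()
raw = log.record("researcher_a", "generated", "tensile test")
clean = log.record("researcher_b", "modified", "outlier removal", parent=raw)
history = log.lineage(clean)
```

Storing a parent pointer with each event is the design choice that makes the log searchable as a workflow: the full sequence of steps behind any dataset can be recovered by walking the pointers.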

MINED Group's Past Experience (~2005 - ~2008): Funded by the Army Research Office, ARO (David Stepp). Engaged computer science (CS) graduates to produce a single-user platform that integrated certain software codes previously produced by our group. Software development was subsequently abandoned, but the effort recruited two CS graduates into PhD programs in Materials Science and Engineering/Mechanical Engineering (MSE/ME), which dramatically altered the direction of research in the group.

A brief chronology of development:

• Senior Design Project (2011-2012): An online portal designed for sharing data files and tools; learned the importance of user interfaces and the need for professional code developers and software architects

• HUBzero Instantiation (2012-2013): Built a hub by customizing HUBzero for hierarchical structural materials; there are several advantages in terms of accessing and executing codes (on the server side), but there are significant barriers to establishing fast and productive cross-disciplinary e-collaborations

• GTRI Integration Platform (2013-2014): Engaged professional code developers and software architects to design a multi-user web portal that allows data and code sharing while tracking provenance and workflows; underestimated the resources needed for such a project

• MATIN Collaboration Environment (2013-2014): Lightweight integrated software shell that allows users to seamlessly interface, through their APIs, with numerous cloud services offered through the web. Examples include GitHub for versioning and archival of codes, Authorea for collaborative editing of documents, figshare for citable publication of research, Plotly for collaborative data analysis and visualization, and Google+ and LinkedIn for e-teaming and networking. MATIN focuses on building an emergent community: Materials & Manufacturing Informatics (MMI). MATIN provisions web services for software integration platforms and innovation infrastructure. Examples of MATIN:

  • Spatial Statistics: https://github.com/tonyfast/SpatialStatisticsFFT

  • PyMKS: http://openmaterials.github.io/pymks/index.html

  • Example discussion about periodic boundary conditions in running Finite Element (FE or FEM) simulations: https://github.com/sfepy/sfepy/issues/267

In the platform developed, the user or the user's organization can set permissions, at the level of individual datasets/codes, on the degree and type of sharing.

Central Challenge: Activation. The transition from traditional disciplinary workflows to the integrated workflows needs to overcome an activation barrier. There is no clear incentive to get ahead of the curve in anticipating the upcoming transformations in how materials development will be pursued in the future. Open (reproducible) science alone is not a sufficiently strong motivator for a materials development community with strong interests in intellectual property development; indeed, it might actually be perceived as a negative. There is a need to explore and refine what incentives work for individual researchers (e.g., increased productivity).

Future Plans: Continue to develop and deploy MATIN at GT and any other organizations/groups that express an interest in working with us. Design, create, and distribute a suite of "mobile first" Apps designed to make data ingest into collaboration environments such as MATIN, from both material characterization equipment and multi-scale models, an enhanced user experience with friendly and natural interfaces. Proactively promote/incentivize data/metadata ingest into publicly shared databases. Work with professional societies to announce and run new competitions/awards for recognizing and rewarding good practices by the members of the broader materials innovation user community (e.g., Best Documented Data Award, Best Documented Tool Award, Best Documented Workflow Award). Develop and deliver new workshops for training the new generation of graduate students, postdocs, and junior faculty in the emerging workflows of Materials & Manufacturing Informatics.

High Performance Cast Aluminum Alloys for Next Generation Passenger Vehicle Engines

Dongwon Shin, Materials Science and Technology Division, Oak Ridge National Laboratory

Oak Ridge National Laboratory (ORNL) teamed with Chrysler and Nemak to develop next-generation high-performance cast aluminum alloys that have improved castability, high-temperature strength, and fatigue performance. At this point, the data management practice is limited to analytical data spreadsheets, other than the CALPHAD computational thermodynamic databases and proprietary property databases for casting simulation. As the project goes on, a more robust data schema will be developed to store various types of data populated from the experiments and calculations.

Besides the data structure design, the speaker discussed the necessity of a new approach to identifying key descriptors from the large dataset that can guide the design of new materials. An alloy design requires the balance of extremely complex relationships among a large number of descriptors. Some examples of these descriptors include phase fraction of key intermetallics, lattice parameters of the matrix, and diffusion kinetics of solute atoms. However, optimizing these quantities relies on good physical metallurgy intuition coupled with clever experiments, dogged sleuthing, and a healthy dose of luck. Such descriptor-based predictions of materials properties date back to the beginning of modern solid-state chemistry and physics. As shown in the Darken-Gurry map to predict the solubility of elements in tantalum, intuitive descriptors such as electronegativity and radius could be used for materials design. However, as the chemistry and physics of materials become more complex, intuition-based search may not work anymore. Thus, the use of more advanced statistical analysis and data mining approaches might be required to accelerate the design of complex multi-component, multi-phase alloys.
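
A Darken-Gurry style screen can be sketched in a few lines. The Python below is an illustration, not the speaker's method: it assumes the commonly quoted tolerances of about 15 % in atomic radius and about 0.4 units in electronegativity, treats them as adjustable parameters, and uses placeholder descriptor values rather than measured element data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    radius: float             # atomic radius, any consistent unit
    electronegativity: float  # any consistent scale


def inside_darken_gurry_ellipse(solvent: Element, solute: Element,
                                radius_tol: float = 0.15,
                                en_tol: float = 0.4) -> bool:
    """Return True if the solute falls inside the ellipse centered on the solvent.

    radius_tol is a fractional radius difference; en_tol is an absolute
    electronegativity difference. Both defaults are assumptions of this sketch.
    """
    dr = (solute.radius - solvent.radius) / (radius_tol * solvent.radius)
    de = (solute.electronegativity - solvent.electronegativity) / en_tol
    return dr * dr + de * de <= 1.0


# Placeholder values for illustration only; not data for real elements.
solvent = Element("solvent", radius=1.00, electronegativity=1.5)
close = Element("solute_close", radius=1.05, electronegativity=1.6)
far = Element("solute_far", radius=1.30, electronegativity=1.5)

candidates = [e.name for e in (close, far)
              if inside_darken_gurry_ellipse(solvent, e)]
```

With only two descriptors the screen is a simple geometric test. The paragraph's point is that once many interacting descriptors matter, a hand-drawn boundary of this kind no longer suffices and statistical or data mining methods are needed.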

Data Related to Cast Aluminum

Jake Zindel, Ford Motor Company

The main focus of our research is the development of CAE (Computer-aided Engineering) tools to support two significant activities in the company: component design (product development) and manufacturing engineering (high-volume production). A key part of our research is conducting experiments to collect data for the development and validation of the CAE tools.

The data we collect consists of mechanical properties, physical properties, microstructural images, and manufacturing process parameters that form the pedigree of the material. This information can be in the form of Excel files, proprietary-software storage files, screen shots, and written reports.

Our immediate need is to deal with this large amount of existing information collected over many years on a "local" level. The first step is to organize many independent collections of information in all their forms. The next issue is the time required to enter all this information into the database. Maintenance may also require a significant amount of effort. Lastly, there will also be a need to manipulate the data contained in the database using Excel, Minitab, MATLAB, etc. Therefore, links or easy exportation of data to these types of applications will be necessary.

The next level would involve dealing with information that has been created or collected by others. From an OEM (Original Equipment Manufacturer) perspective, this would involve convincing suppliers to share processing information that they may consider proprietary. In addition to suppliers, universities and national laboratories are also sources of data. Data from all the sources will require some sort of protocol to determine its quality and resolve inconsistencies.

In the end, it seems likely that considerable budget and time will be required to implement comprehensive information/databases, which could be the largest hurdle of them all.

QuesTek's MGI/ICME Approach to Alloy Optimization and Design (Including Cast Iron)

Jason Sebastian, QuesTek Innovations LLC

Integrated Computational Materials Engineering (ICME) methods involve the holistic application of different computational models across various length scales to the design, development, and rapid qualification of advanced materials. Using its Materials by Design technologies, QuesTek has successfully applied these ICME methods to the design, development, and full qualification of several advanced alloys (including aerospace structural steels, aerospace gear steels, advanced castable titanium alloys, high-strength corrosion-resistant aluminum alloys, and high-performance cast iron).

QuesTek's ICME methods benefit tremendously from the activities of the US Materials Genome Initiative (MGI). Recognizing that the "10 to 20 years" that it takes for alloy design and development is hindering the rapid deployment of higher-performance materials, the MGI is focused on "cutting in half" the time and cost to design and deploy novel advanced materials needed to enhance and sustain US competitiveness.

QuesTek also has ongoing activities on the application of ICME methods to the design and development of materials for the energy sector, including advanced materials that enable vehicle lightweighting. Of particular note is an ongoing effort with a major diesel engine manufacturer on the development of high-performance cast iron alloys for engine block applications. QuesTek's efforts have made extensive use of various MGI-type tools and databases, including 1) density functional theory methods for high-throughput screening of cast iron inoculant stabilities (structure, lattice parameter, stability of various oxide/sulfide phases), 2) liquid iron mobility databases (to help accurately model solidification phenomena), and 3) thermodynamic databases (to examine nano-precipitate stability).

Moving forward, to continue to accelerate the materials development cycle, efforts should remain focused on the development of the fundamental databases (thermodynamic, kinetic, etc.) that enable ICME models. There should also be increased focus on efforts related to the integration of the various computational databases, models, and tools (the "I" in ICME).

Integrated Computational Materials Engineering (ICME) of Generation Three Advanced High Strength Steels

Louis G. Hector, Jr., General Motors R&D Center

While traditional material development methods often involve laborious trial-and-error testing, Integrated Computational Materials Engineering (ICME), which falls under the umbrella of the Materials Genome Initiative, is offering a means for materials development under a more compressed time scale. Advanced high strength steels (AHSS), which contain multiple phases (e.g., ferrite, bainite, pearlite, austenite, martensite) and may exhibit phase transformation with strain, are fertile ground for ICME. However, prediction of macro-scale constitutive behavior based upon the multi-scale physical, chemical, and mechanical phenomena in these complex microstructures is a formidable challenge for ICME. Unprecedented collaboration between universities, industry, and government labs will be required to address AHSS development, archiving of data, and the implementation of these materials into products that benefit the American consumer. This presentation began with a brief overview of a new DOE/USAMP (US Automotive Materials Partnership)-funded ICME program aimed at the development of Generation Three (Gen 3) AHSS. The lack of commercial Gen 3 steels that meet DOE targets is offering significant challenges to this new ICME program. The program is meeting these challenges following four thrusts: (1) making Gen 3 steels to meet DOE targets; (2) passing experimental measurements of key properties to an ICME model that produces microstructure-based constitutive models; (3) material evaluation in forming and vehicle performance simulations; (4) archiving all relevant data for easy use by others in the program. Although the Gen 3 AHSS ICME program is "mission oriented", it demonstrates an important role for universities in providing the necessary foundation in fundamental materials science and engineering.

Integrated Computational Methods for Composites Materials (ICM2): Lessons Learned from Integration Planning and Feasibility Demonstrations

Lara Liou, GE Aviation

General Electric Aviation and Lockheed Martin Aeronautics have teamed through an Air Force Research Lab program to conduct a demonstration of an Integrated Computational Materials Engineering digital framework that links material processing, property, and structural relationships to account for processability, manufacturability, and system performance, with the goal of demonstrating that the usage of integrated models can contribute to future airframe and engine designs with dramatic reductions in development time and cost. Both engine and airframe Foundational Engineering Problems will be demonstrated through a common digital framework to be developed and validated on increasingly complex articles. The modeling tools that cover composite process models, localized mechanical behavior models, and global design models (Convergent Manufacturing Technology's COMPRO CCA, Autodesk's Autodesk Simulation Composite Analysis, and an Abaqus based University of Michigan micromechanics model) are centrally linked via a commercially available model integration interface (Phoenix Integration's ModelCenter®). Detailed plans of what data will be passed among the various models, as well as how it will be passed, have been completed. Creation of these detailed process maps and simple feasibility demonstrations has raised inherent technical and logistical challenges to integrating commercially available multi-scale models. Robust model integration methods to address these challenges were discussed.

The exercise of drawing up detailed process maps for what to integrate and how to integrate the selected tools and models resulted in key early lessons learned and identification of tools critical to long-term robust and sustained application of Integrated Computational Methods for Composites Materials (ICM2). The key lessons learned include:

• Individual models must be relatively mature and validated before an ICM2 problem can be confidently defined

• FE analysis-based tools within the same digital thread should utilize the same FE solver

• The team ideally should consist of at least one expert for each tool and its underlying model, as well as an expert in the digital thread and integration interface tools and programming

• Mesh interpolation between finite element analysis based tools will likely be required, and a commercially available tool for this type of composite mesh interpolation is key for robust and sustained application of ICM2

Residual Stress Engineering of Rotating Nickel Components

Terry Wong, Aerojet Rocketdyne

The PW-8 project, the Nickel Base Superalloy Residual Stress Foundational Engineering Problem program, aims to develop a multi-disciplinary analysis approach to predict and incorporate bulk residual stress in the design of a component. This will allow for the design of a part that is optimized for a particular application or design intent (such as weight, performance, or reduction of scrap). To accomplish this goal, the PW-8 team has two major tasks: 1) the development of an infrastructure and associated tools needed to predict and incorporate bulk residual stress in the design stages of a part, and 2) a demonstration task that will validate the infrastructure and tools developed.

The Data Management task is the foundational element of the larger infrastructure development task. The goal for the data management work is to develop an infrastructure that allows the various team members in PW-8 to share data and to store data in a database. This not only ensures that data generated by this program is never lost but also allows for data to be transferred in a secure manner. In order to meet the goal of a system that securely stores and shares data, three major tasks must be accomplished: 1) validation that there is a secure method to transfer data between team members and that the method of storing data is also secure, 2) development of a database schema, and 3) development of a means to easily upload data into the database and download data from the database.

Of the three, the most difficult task was to create a system in which the transfer and storage of data was secure. Various proposals were made to ensure the security of the data, but many of them were rejected, not because the proposal was actually insecure but because it was not acceptable under each team member's IT security policies. The process of getting each team member's IT department to approve the security of the data transfer and storage took more effort than anticipated. In the end, the team members, along with their respective IT departments, approved the use of TRUcentrix as a tool to transfer data and the use of an ASM International hosted database to store data.

The development of a database schema is a necessary step in creating the database to store all the data generated (via physical tests as well as data generated by models). A big help in creating the database schema was to use the schema generated by Dr. Steven Arnold from NASA's Glenn Research Center. Using NASA's schema eliminated the need to start the database build from scratch and allowed us to focus on the specific needs of storing residual stress data.

The last task involves the development of tools that make the input of data easy. The more difficult it is to enter data into a database, the more likely it is that data will not be entered. Excel-based templates are being developed to make data entry as easy as possible.

The two big lessons learned from PW-8 in relation to data management are:

• It is essential to work with each company's IT and IT security people to develop a system that satisfies the security requirements of each company.

• Database schema development should not be reinvented but should begin with an already established schema.

Perspectives on MAI Programs
Dr. Mike Glavicic, Rolls-Royce

The current MAI (AFRL's Metals Affordability Initiative) programs are structured independently from one another, in that no standard methodology exists that describes what data is to be archived at the conclusion of a program. Nor does a standard protocol exist for how to archive things on a server or some other computer resource at the conclusion of a program. As a result, the current practices of MAI programs:

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program.

• Do not have a formalized requirement for how data is to be packaged and stored at the end of a program. Hence, even in the best of cases, in which all of the data is stored, the data is archived in a simple folder fashion that is not searchable in any manner. Moreover, even if all of the data is present, it is extremely cumbersome to untangle the data for future use using a file folder strategy.

Hence, at the end of the program, both the government and the MAI team are left to manage the data and results in an unregimented fashion.

The lack of a regimented protocol for how to package a program's results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal of integrating the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models, and other information required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted to recreate data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. Although the RR-10 program was able to overcome these issues and become a successful program, other programs that encounter similar problems will likely not enjoy the same success: RR-10 was very fortunate to have the means to overcome these serious issues, which were a result of the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government-funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs, with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ
Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and Suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration, data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals, Inc.) are working together on the PW-11 Team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security and access control, and the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework (GE Forge, Vanderbilt Forge, and Semantic Web technology), and one technology will be selected for the next phase of the program.


Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single database searches to multiple and disparate database searches, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work up front for implementation, as it needs the development of a common vocabulary and relationships.

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST, and will develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
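One common way to flow uncertainty through a chain of linked models is Monte Carlo sampling. The summary does not say which UQ method PW-11 uses, so the following is only a minimal sketch under stated assumptions: the two model functions, their constants, the 5% scatter term, and the input distribution are all hypothetical placeholders, with a Hall-Petch-type relation standing in for the microstructure-to-property link.

```python
import random
import statistics


def microstructure_model(cooling_rate):
    """Hypothetical processing-to-microstructure link: grain size (um) from cooling rate (K/s)."""
    return 50.0 / (cooling_rate ** 0.5)


def property_model(grain_size, sigma0=200.0, k=600.0):
    """Hall-Petch-type link: yield strength (MPa) from grain size (um); constants are placeholders."""
    return sigma0 + k / (grain_size ** 0.5)


def propagate(n_samples=10000, mean_rate=10.0, sd_rate=1.0, seed=0):
    """Sample the uncertain input and push every sample through both models."""
    rng = random.Random(seed)
    grain_sizes, strengths = [], []
    for _ in range(n_samples):
        # Uncertainty in the processing parameter (clamped to stay physical).
        rate = max(rng.gauss(mean_rate, sd_rate), 0.1)
        # Uncertainty added at the microstructure step (assumed 5% relative scatter).
        grain = max(microstructure_model(rate) * rng.gauss(1.0, 0.05), 1e-3)
        grain_sizes.append(grain)
        strengths.append(property_model(grain))
    return grain_sizes, strengths


def summarize(values):
    """Mean and sample standard deviation of a list of model outputs."""
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    grains, strengths = propagate()
    print("grain size (um): mean, sd =", summarize(grains))
    print("yield strength (MPa): mean, sd =", summarize(strengths))
```

Comparing the spread of the sampled grain sizes with the spread of the sampled strengths gives a first look at how variability in processing carries through to mechanical performance.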

ASM's Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project
Scott D. Henry, ASM International

ASM International, the world's largest association of metals-focused materials engineers, scientists, technicians, educators, and students, has launched a series of initiatives under the "Computational Materials Data (CMD) Network" umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design Ltd. The main objectives of SMDDP are to:


• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system.

• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets.

• Develop and carry out a series of test problems that represent relevant use cases for the repository.

• Make data open to the materials data community for use in data analytics, modeling, and educational purposes.

• Actively engage the materials data community and widely disseminate the findings from the project.

• Develop and implement data capture and curation procedures that can serve as models for other data repositories.

The basic philosophy behind the project is that standards often emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated, structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project, launched in late 2013 and planned as an 18-month program, is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future
Ben Blaiszik1, Kyle Chard1, Jim Pruyne1, Rachana Ananthakrishnan1, Steve Tuecke1,2, Ian Foster1,2

1University of Chicago Computation Institute, 2Argonne National Laboratory MCS Division

During this talk, we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.

Semantics for Deep Interoperability
Samuel Chance, iNovex

Semantic technology is based on mature web standards and is intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented, starting with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web, riding atop the same infrastructure (http). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis that the technology has a foundation based on formal logic that enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being evaluated as a solution for a community or enterprise.
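The idea of triples linking into a web of data can be illustrated with plain Python tuples. This is a minimal sketch rather than the presenter's example: it uses no RDF library, and every identifier (the `ex:` names, the predicates, and the numeric value) is a hypothetical placeholder.

```python
# Each triple is (subject, predicate, object). All identifiers are hypothetical,
# and the numeric value is a placeholder, not a measurement.
TRIPLES = [
    ("ex:Sample42", "ex:madeOf", "ex:Al6061"),
    ("ex:Al6061", "ex:belongsToSystem", "ex:Al-Mg-Si"),
    ("ex:Sample42", "ex:hasMeasurement", "ex:Test7"),
    ("ex:Test7", "ex:yieldStrengthMPa", 276),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def reachable(triples, start):
    """Follow object-to-subject links to collect every node reachable from start."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for _, _, obj in match(triples, s=node):
            if obj not in seen:
                seen.add(obj)
                frontier.append(obj)
    return seen
```

Because the object of one triple can be the subject of another, a query that starts at the sample can reach the alloy system and the test value without any fixed table schema, which is the linking behavior the chart described.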


Big Data and Semantic Technologies
Harlan Shober, RJ Lee Group

The presentation started with a description of the entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS). To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing/characterization and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described; it consisted of several elements: contribute, index, search, analyze, and archive. With the needed functionality in mind, a string of universal scientific data management process categories was then presented, with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for Resource Description Framework (RDF) were applied to a notional example with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described."

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
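Tokenization, the first of the NLP steps listed above, can be sketched with the Python standard library alone. This is an illustrative assumption about how materials text might be split, not the tooling discussed in the talk; the pattern and the function name `tokenize` are hypothetical.

```python
import re

# Illustrative token pattern (an assumption, not a standard):
TOKEN_PATTERN = re.compile(
    r"\d+(?:\.\d+)?"               # numbers such as 6061 or 175.5
    r"|[A-Za-z]+(?:-[A-Za-z]+)*"   # words and hyphenated compounds such as Al-Mg-Si
    r"|[^\sA-Za-z\d]"              # any other single non-space symbol
)


def tokenize(text):
    """Split text into number, word, and symbol tokens."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("Al-Mg-Si alloy 6061 was aged at 175.5 C."))
```

Keeping hyphenated alloy systems and decimal numbers as single tokens matters in this domain, because later steps such as part-of-speech tagging and entity disambiguation operate on whatever units the tokenizer produces.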

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.


The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.


Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were 'rolled up' from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often-fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; a data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data, standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management: consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose


- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management; difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most-important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure: evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel


- Data associated with publications should be shared along with the publications. Not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)
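Several of the community inputs above ask for a common data package that carries authorship, provenance, and the name/type/units of each quantity. The participants suggested an HDF5 file; the sketch below deliberately swaps in JSON from the Python standard library as a stand-in so that it stays dependency-free. The package layout, the function names, and all example values are hypothetical assumptions, not a proposed standard.

```python
import json

# Keys every field description is assumed to carry (name/type/units, as the inputs request).
REQUIRED_FIELD_KEYS = {"name", "type", "units"}


def build_package(author, provenance, fields, records):
    """Bundle records with the metadata needed to interpret them."""
    return {
        "metadata": {"author": author, "provenance": provenance, "fields": fields},
        "records": records,
    }


def validate_package(package):
    """Return a list of problems; an empty list means the package is self-consistent."""
    problems = []
    fields = package["metadata"]["fields"]
    for field in fields:
        missing = REQUIRED_FIELD_KEYS - set(field)
        if missing:
            problems.append(f"field {field.get('name', '?')!r} lacks {sorted(missing)}")
    declared = {field["name"] for field in fields if "name" in field}
    for index, record in enumerate(package["records"]):
        undeclared = set(record) - declared
        if undeclared:
            problems.append(f"record {index} uses undeclared fields {sorted(undeclared)}")
    return problems


# Hypothetical example content; the values are placeholders, not measurements.
package = build_package(
    author="A. Researcher",
    provenance="tensile test, lab notebook p. 12",
    fields=[
        {"name": "grain_size", "type": "float", "units": "um"},
        {"name": "yield_strength", "type": "float", "units": "MPa"},
    ],
    records=[{"grain_size": 15.0, "yield_strength": 350.0}],
)

# Any tool in a federation can round-trip the package through plain text.
restored = json.loads(json.dumps(package))
```

In a federated setting each organization could keep its own toolset and only agree on this package layout and validation step, which is the read/write contract the inputs describe.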

Government-Led Priorities

1. Lead data strategy/approach development
- Hold meeting like this one in 6-12 months with purpose of developing roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus, e.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most-important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools that make data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating a standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure


4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10%: repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST): creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors, not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials-related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches supported are limited to overly fundamental single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or alternatively work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community


Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, iNovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls-Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, Questek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl Institute of Standards & Tech
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


Wednesday

074 - 0800 Check-in

Discussion Lead: Wayne Ziegler (ARL)
080 - 0830 Terry Wong (Rocketdyne) - Nickel RS FEP
083 - 0850 Mike Glavicic (Rolls-Royce) - MAI Ti Modeling
085 - 0910 Rajiv Naik (Pratt & Whitney) - MAI Data Informatics, Data Standards, Modeling & UQ
091 - 0940 Discussion
094 - 0955 Break

Discussion Lead: Dennis Griffin (NASA)
095 - 1015 Ro Gorham (National Center for Defense Manufacturing and Machining, NCDMM) - America Makes
101 - 1045 Richard Serwecki (NASA) - MAPTIS
104 - 1105 Scott Henry (ASM International) - Computational Materials Data Network
110 - 1135 Discussion
113 - 1200 Lunch

Discussion Lead: Clare Paul (AFRL)
120 - 1230 Ben Blaiszik (Univ. of Chicago/ANL) - Globus
123 - 1300 Mike Groeber (AFRL) - DREAM3D/SIMPL
130 - 1320 Sam Chance (iNovex) - Semantics for Deep Interoperability
132 - 1340 Harlan Shober (RJ Lee) - Big Data and Semantic Technologies
134 - 1400 Tim Hawes (Decisive Analytics Corp) - Natural Language Processing
140 - 1445 Discussion
144 - 1500 Wrap up
1500 Adjourn


Presentation Summaries

Welcome and Introduction
Chuck Ward, Air Force Research Laboratory

The primary intent of the workshop is to promote communication between the groups being funded under the Materials Genome Initiative that had significant interest in the management of materials data. The greater awareness is expected to enhance the likelihood for success of the individual efforts and speed the community's understanding of how best to proceed in developing an MII.

The Materials Research Society (MRS) and the Minerals, Metals and Materials Society (TMS) conducted a survey of the community in May/June of 2013 to gauge attitudes and needs in materials data. Over 650 people took the survey, with 74% of respondents saying they would be willing to share data if it were encouraged as a condition of funding or publishing. The community showed the most interest in gaining access to databases of some of the most basic and essential materials data, such as thermo-physical properties.

The survey asked participants to evaluate advantages of local versus centralized databases. Centralized repositories were perceived by respondents to offer advantages in cost, an ability to share one's data, provisioning of uniform data standards, and an ability to organize and find data. Local repositories were perceived to provide advantages in ease of data input, time efficiency, and flexibility in accommodating new data types.

The survey also asked participants to provide their thoughts on impediments to and motivators for sharing data. Most of the impediments can be attributed to concerns about ownership of data or external data restrictions, while technical issues such as file size and proprietary formatting were also concerns. Overall, participants viewed positively the ability to contribute data to the community for reuse and critique. The highest motivator was to enhance one's own research visibility. Interestingly, respondents saw adding metadata as a motivator, presumably due to the perceived higher value of data when it is described more robustly.


MGI Efforts in Infrastructure Development at NIST
Jim Warren, National Institute of Standards and Technology

The presentation provided an overview of NIST activities supporting the creation of a materials data infrastructure. The infrastructure could start by defining the interfaces/APIs (Application Programming Interfaces) with which a user is able to interact for the deposit or retrieval of data. Beyond an interface, the following are essential elements for creation of a viable infrastructure:

• The establishment of repositories to store the data
• Tools to enable digital capture of the information, preferably at the time of creation (computation or experiment)
• Tools to enable markup of the data with sufficient metadata to inform someone else how the data was created, as well as various attributes that would help the user judge the quality of the information
• Assignment of a persistent digital identifier (like the DOI (Digital Object Identifier) for journals) so the data can be cited and discovered by others
• The registration of the availability of the data into some sort of "registry" to enable discovery without prior knowledge of the existence of the repository or specific data features
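As a rough illustration of how the last three elements (metadata markup, a persistent identifier, and a registry) fit together, the following is a minimal Python sketch. The class names, field names, and the `pid:` identifier scheme are hypothetical and are not drawn from any NIST system; a production infrastructure would mint a DOI or handle rather than a random identifier.

```python
from dataclasses import dataclass, field
import uuid


@dataclass
class DatasetRecord:
    """Hypothetical metadata describing one deposited dataset."""
    title: str
    creator: str
    method: str            # how the data was created (experiment or computation)
    repository_url: str    # where the data itself is stored
    keywords: list = field(default_factory=list)
    # Stand-in for a persistent identifier; a real system would mint a DOI or handle.
    pid: str = field(default_factory=lambda: "pid:" + uuid.uuid4().hex)


class Registry:
    """Hypothetical in-memory registry mapping identifiers to records."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        """Announce a dataset so it can be found without knowing its repository."""
        self._records[record.pid] = record
        return record.pid

    def resolve(self, pid):
        """Look up a record by its persistent identifier."""
        return self._records[pid]

    def search(self, keyword):
        """Return every record tagged with the given keyword (case-insensitive)."""
        wanted = keyword.lower()
        return [r for r in self._records.values()
                if wanted in (k.lower() for k in r.keywords)]
```

A depositor would call `register` once per dataset; a later user can then call `search("diffusion")` and follow `repository_url` without prior knowledge of which repository holds the data.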

Additionally, creation of new policy and standards, as well as development of tools to evaluate the quality of the data and tools to enable data-driven discovery, will be required to manage and use materials data to the greatest extent possible.

NIST is responding to the OSTP (Office of Science and Technology Policy) memos (Dr John Holdren 2 Feb 14) by developing practices that will be of use to the community, including a Common Access Platform allowing access across multiple collections of information, research and development of Persistent Identifier (PID) infrastructure with the Research Data Alliance (RDA), and R&D of Data Type Registries with RDA.

Finally, NIST is collaborating with the National Data Service (based at the National Center for Supercomputing Applications) and is developing a set of best practices for Data Management Plans.


The Materials Commons: A Novel Information Repository and Collaboration Platform for the Materials Community
Brian Puchala, Glenn Tarcea, Stravya Tamma, Emmanuelle Marquis, John Allison, University of Michigan

Critical to accelerating the pace of materials science and development is the development of new and improved infrastructure providing a seamless way for materials researchers to share and use materials data and models. To address this need, we are developing the Materials Commons (http://prisms.engin.umich.edu/prisms/materialscommons), an information repository and collaboration platform for the metals community in selected technical emphasis areas. We envision the Materials Commons becoming a continuous, seamless part of the scientific workflow process. Researchers will upload the results of experiments and computations as they are performed, automatically where possible, along with the provenance information describing the experimental and computational processes. By associating this data with the experimental and virtual computational materials samples from which it is obtained, the Materials Commons will build process-structure-property relationships, enabling the construction and validation of constitutive and process models. The Materials Commons website provides an easy-to-use interface for uploading and downloading data and data provenance, and for searching and sharing data. At its core, the Materials Commons consists of a secure data storage cluster, an application for efficiently uploading and downloading large data sets, and a REST (REpresentational State Transfer) based API to access and extend the capabilities of the repository. The API allows for features such as automated data upload from experiments and computations, seamless integration of computational models, and algorithmic analysis of process-structure-property relationships. The Materials Commons is a central thrust of the Center for PRedictive Structural Materials Science (PRISMS).

NIST Materials Data Informatics Efforts
Carrie Campbell, National Institute of Standards and Technology

The presentation covered NIST's pilot efforts in creating a data infrastructure focusing on Phase-Based Property Data. Phase-based data are diverse data that include 1-, 2-, 3-, and 4-dimensional data, are semi-structured, and often include incomplete data sets. The figure below provides an overview of NIST's data efforts supporting the MGI.


The NIST Materials Data Repository (https://materialsdata.nist.gov/dspace/xmlui) is a customized DSpace repository for materials that enables sharing of a variety of data types, including text files, images, and video. The repository also provides a persistent identifier for each entry and allows users to choose from a variety of license types. NIST is also working on two data curation tools for phase-based data. The Thermodynamic Research Center (http://trc.nist.gov) is extending the ThermoML markup language and the Guided Data Capture software tool to metallic systems, with an initial emphasis on phase equilibria and thermochemical data. The Materials Data Curation System (MDCS) employs user-defined XML-based schemas to curate data. The MDCS allows data curation either through a web-based interface or through a REST API. The entered data is stored in a non-relational database repository, can be searched using a SPARQL (SPARQL Protocol and RDF Query Language) interface, and can be integrated with a variety of scientific workflow tools.

AFLOW Consortium
Marco Fornari, Michigan State University

The overall goal of the Materials Genome Initiative (MGI) is to accelerate the discovery, development, and deployment of new materials with ad hoc functionalities. The AFLOW Consortium focuses on the software infrastructures to achieve the MGI goals by using first principles calculations. The emphasis is on the data that must be generated, archived, validated, and post-processed efficiently, both to identify new materials and to distill meaningful quantitative relationships within the data. This extends the solution to the inverse problem of chemically replacing rare and costly elements in critical technologies.

The AFLOW Consortium provides access to the repository AFLOWLIB.ORG <http://AFLOWLIB.ORG>, which publicly distributes data on energetics and thermodynamic stability for more than 600,000 compounds and band structures for roughly 300,000 materials. The repository includes bulk crystals and alloys. Data were computed with the AFLOW high-throughput framework using density functional theory. The results of the calculations can be queried with the recently developed REST-API, which standardizes and labels the data for retrieval purposes. The fields defined in the AFLOW REST-API include a material object identifier (AUID), data cloud location (AURL), ownership and sponsorship keywords, and data-specific labels (COMPOSITION, ENTROPY, EGAP, …). Simple scripting languages can be used to automatically download from the AFLOWLIB repository. The AFLOW approach has been extensively applied to platinum-group alloys, thermoelectric materials, magnetic materials, topological insulators, scintillators, etc. The AFLOW REST-API can be extended to include experimental data. For further details on the database, the reader is directed to http://materials.duke.edu/auro/AUROARTICULA/j.commatsci.2012.02.002.pdf
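To make the idea of scripted retrieval concrete, the following minimal Python sketch parses labeled entries and filters them on one field. Only the field names AUID, AURL, COMPOSITION, and EGAP come from the summary above; the pipe-separated `keyword=value` layout, the sample strings, and the function names are assumptions made for illustration and should be checked against the AFLOW REST-API documentation before use. No network access is performed.

```python
def parse_entry(text, sep="|"):
    """Parse 'keyword=value' pairs separated by `sep` into a dict (assumed layout)."""
    entry = {}
    for chunk in text.split(sep):
        chunk = chunk.strip()
        if not chunk or "=" not in chunk:
            continue  # skip empty or malformed fragments
        key, value = chunk.split("=", 1)
        entry[key.strip().lower()] = value.strip()
    return entry


def select(entries, keyword, predicate):
    """Return the entries whose numeric `keyword` field satisfies `predicate`."""
    hits = []
    for entry in entries:
        try:
            if predicate(float(entry[keyword])):
                hits.append(entry)
        except (KeyError, ValueError):
            continue  # field missing or not numeric
    return hits


# Invented sample responses, for illustration only.
samples = [
    "auid=example:0001 | aurl=example.org:DATA/A1 | composition=1,3 | egap=0.0",
    "auid=example:0002 | aurl=example.org:DATA/B2 | composition=2,2 | egap=1.7",
]
entries = [parse_entry(s) for s in samples]
gapped = select(entries, "egap", lambda gap: gap > 0.5)
```

Because every entry carries the same labeled fields, the same two functions can screen any number of downloaded entries on any numeric label.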

Materials Innovation Infrastructure: Lessons Learned and Future Plans
Surya Kalidindi, Georgia Institute of Technology

A central activity in materials research is the exploration of processing-structure-property (PSP) linkages. Workflow is defined as the sequence of steps employed for establishing PSP linkages of interest to any specific engineering/technology application.

High Throughput Explorations of PSP Linkages: We must treat a hierarchical material as a complex system, which means embracing and exploiting uncertainty quantification and complexity management concepts/tools. It also means we need to design, develop, and validate decision support systems that will leverage the best current understanding (with its uncertainties) and provide objective guidance on future effort investment (combination of experiments and simulations). This means leveraging Data Science to effectively feed the decision support systems described above by developing and deploying high-throughput strategies for obtaining high-value information from both experiments and models.

What are integrated workflows? Integrated workflows are those that utilize the best combination of experiments and simulations in extracting robust and reliable PSP linkages. This requires engaging and exploiting cross-disciplinary expertise that includes materials science, manufacturing, systems approaches, uncertainty quantification, computational science, and data and information sciences. Through this, one must ensure that the workflows output the critical information needed by design and manufacturing stakeholders in the materials development value chain. Integration demands expertise sharing, which essentially is the core element of a collaboration. Cross-disciplinary integration demands effective collaborations, such as those one might set up using the modern tools of the day, i.e., an e-Collaboration.


Motivators for e-Collaborations: e-Collaborations are central to enhancing the productivity of materials development activities in several ways:

• Individual research groups can identify and engage complementary expertise needed for their projects
• Organizations can coordinate multiple teams and multiple projects to maximize output/impact
• Organizations can identify and seek the right partners that enhance their individual and combined value

e-Collaborations can be open, semi-open, or closed, depending on the needs of the specific organization. Productivity enhancement leading to dramatic reductions in cost and time expended in materials development is the main incentive for e-Collaborations.

Main Impediments to e-Collaborations

Lack of tools to identify the "best" collaborators
• Need software tools that quantify more precisely the expertise of individuals/groups/organizations
• Need software tools for allowing rich annotations and e-recording of discussions
• Need analytic tools for identification, extraction, and management of metadata (i.e., high-value abstractions from the raw data)
• Need searchable semantic metadata databases (these are lightweight and can be centralized and customized for specific user communities)
• Need software tools for tracking provenance (i.e., who/what/when/how generated, modified, etc.)

Lack of experience with the use of integrated workflows
• Need software tools for e-recording of workflows and development of searchable workflow databases
• Need recommendation schemes that suggest the best workflow(s) based on recorded prior experience; this provides a means to transfer expertise acquired by an organization to other teams and other projects (even in the same organization)
• Leads to possible automation and dramatic enhancement of productivity
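The provenance need listed above (who/what/when/how) can be sketched as a simple append-only history attached to each dataset. This is a minimal Python illustration under stated assumptions: the class names and fields are hypothetical and do not describe MATIN or any other platform mentioned in this summary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceEvent:
    """One immutable entry in a dataset's history (hypothetical structure)."""
    who: str    # person or group responsible
    what: str   # action taken, e.g. "generated" or "modified"
    how: str    # tool, instrument, or code used
    when: str   # ISO 8601 timestamp


@dataclass
class TrackedDataset:
    """A dataset name plus an append-only list of provenance events."""
    name: str
    history: list = field(default_factory=list)

    def log(self, who, what, how, when=None):
        """Append an event; the timestamp defaults to the current UTC time."""
        stamp = when or datetime.now(timezone.utc).isoformat()
        self.history.append(ProvenanceEvent(who, what, how, stamp))

    def contributors(self):
        """Return contributors in order of first appearance, without repeats."""
        seen = []
        for event in self.history:
            if event.who not in seen:
                seen.append(event.who)
        return seen
```

Recording events at the moment each step is performed, rather than reconstructing them afterwards, is what would make such a history usable as a searchable workflow record.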

MINED Group's Past Experience (~2005 - ~2008): Funded by the Army Research Office, ARO (David Stepp), we engaged computer science (CS) graduates to produce a single-user platform that integrated certain software codes previously produced by our group. The software development was subsequently abandoned, but the effort recruited two CS graduates into PhD programs in Materials Science and Engineering/Mechanical Engineering (MSE/ME), which dramatically altered the direction of research in the group. A brief chronology of development:

• Senior Design Project (2011-2012): An online portal designed for sharing data files and tools; learned the importance of user interfaces and the need for professional code developers and software architects
• HUBzero Instantiation (2012-2013): Built a hub by customizing HUBzero for hierarchical structural materials; there are several advantages in terms of accessing and executing codes (on the server side), but there are significant barriers to establishing fast and productive cross-disciplinary e-collaborations


• GTRI Integration Platform (2013-2014): Engaged professional code developers and software architects to design a multi-user web portal that allows data and code sharing while tracking provenance and workflows; underestimated the resources needed for such a project
• MATIN Collaboration Environment (2013-2014): Lightweight integrated software shell that allows users to seamlessly interface, through their APIs, with numerous cloud services offered through the web. Examples include GitHub for versioning and archival of codes, Authorea for collaborative editing of documents, figshare for citable publication of research, Plotly for collaborative data analysis and visualization, and Google+ and LinkedIn for e-teaming and networking. MATIN focuses on building an emergent community: Materials & Manufacturing Informatics (MMI). MATIN provisions web services for software integration platforms and innovation infrastructure. Examples of MATIN:
  • Spatial Statistics: https://github.com/tonyfast/SpatialStatisticsFFT
  • PyMKS: http://openmaterials.github.io/pymks/index.html
  • Example discussion about periodic boundary conditions in running Finite Element (FE or FEM) simulations: https://github.com/sfepy/sfepy/issues/267

In the platform developed, the user or the user's organization can set permissions at the level of individual datasets/codes on the degree and type of sharing.

Central Challenge: Activation. The transition from traditional disciplinary workflows to integrated workflows needs to overcome an activation barrier. There is no clear incentive to get ahead of the curve in anticipating the upcoming transformations in how materials development will be pursued in the future. Open (reproducible) science alone is not a sufficiently strong motivator for a materials development community with strong interests in intellectual property development; indeed, it might actually be perceived as a negative. There is a need to explore and refine what incentives work for individual researchers (e.g., increased productivity).

Future Plans: Continue to develop and deploy MATIN at GT and any other organizations/groups that express an interest in working with us. Design, create, and distribute a suite of "mobile first" apps designed to make data ingest into collaboration environments such as MATIN, from both material characterization equipment and multi-scale models, an enhanced user experience with friendly and natural interfaces. Proactively promote/incentivize data/metadata ingest into publicly shared databases. Work with professional societies to announce and run new competitions/awards for recognizing and rewarding good practices by the members of the broader materials innovation user community (e.g., Best Documented Data Award, Best Documented Tool Award, Best Documented Workflow Award). Develop and deliver new workshops for training the new generation of graduate students, postdocs, and junior faculty in the emerging workflows of Materials & Manufacturing Informatics.

High Performance Cast Aluminum Alloys for Next Generation Passenger Vehicle Engines
Dongwon Shin, Materials Science and Technology Division, Oak Ridge National Laboratory


Oak Ridge National Laboratory (ORNL) teamed with Chrysler and Nemak to develop next-generation high-performance cast aluminum alloys that have improved castability, high-temperature strength, and fatigue performance. At this point, the data management practice is limited to analytical data spreadsheets, other than the CALPHAD computational thermodynamic databases and proprietary property databases for casting simulation. As the project goes on, a more robust data schema will be developed to store various types of data populated from the experiments and calculations.

Besides the data structure design, the speaker discussed the necessity of a new approach to identifying key descriptors from the large dataset that can guide the design of new materials. Alloy design requires the balance of extremely complex relationships among a large number of descriptors. Some examples of these descriptors include the phase fraction of key intermetallics, lattice parameters of the matrix, and diffusion kinetics of solute atoms. However, optimizing these quantities relies on good physical metallurgy intuition coupled with clever experiments, dogged sleuthing, and a healthy dose of luck. Such descriptor-based predictions of materials properties date back to the beginning of modern solid-state chemistry and physics. As shown by the Darken-Gurry map used to predict the solubility of elements in tantalum, intuitive descriptors such as electronegativity and radius could be used for materials design. However, as the chemistry and physics of materials become more complex, intuition-based search may not work anymore. Thus, the use of more advanced statistical analysis and data mining approaches might be required to accelerate the design of complex multi-component, multi-phase alloys.

Data Related to Cast Aluminum

Jake Zindel, Ford Motor Company

The main focus of our research is the development of CAE (Computer-aided Engineering) tools to support two significant activities in the company: component design (product development) and manufacturing engineering (high-volume production). A key part of our research is conducting experiments to collect data for the development and validation of the CAE tools.

The data we collect consists of mechanical properties, physical properties, microstructural images, and manufacturing process parameters that form the pedigree of the material. This information can be in the form of Excel files, proprietary-software storage files, screen shots, and written reports. Our immediate need is to deal with this large amount of existing information, collected over many years, on a "local" level. The first step is to organize many independent collections of information in all their forms. The next issue is the time required to enter all this information into the database. Maintenance may also require a significant amount of effort. Lastly, there will also be a need to manipulate the data contained in the database using Excel, Minitab, MATLAB, etc. Therefore, links or easy exportation of data to these types of applications will be necessary.

The next level would involve dealing with information that has been created or collected by others. From an OEM (Original Equipment Manufacturer) perspective, this would involve convincing suppliers to share processing information that they may consider proprietary. In addition to suppliers, universities and national laboratories are also sources of data. Data from all the sources will require some sort of protocol to determine its quality and resolve inconsistencies.


In the end, it seems likely that considerable budget and time will be required to implement comprehensive information/databases, which could be the largest hurdle of them all.

QuesTek's MGI/ICME Approach to Alloy Optimization and Design (Including Cast Iron)
Jason Sebastian, QuesTek Innovations LLC

Integrated Computational Materials Engineering (ICME) methods involve the holistic application of different computational models across various length scales to the design, development, and rapid qualification of advanced materials. Using its Materials by Design technologies, QuesTek has successfully applied these ICME methods to the design, development, and full qualification of several advanced alloys (including aerospace structural steels, aerospace gear steels, advanced castable titanium alloys, high-strength corrosion-resistant aluminum alloys, and high-performance cast iron).

QuesTek's ICME methods benefit tremendously from the activities of the US Materials Genome Initiative (MGI). Recognizing that the "10 to 20 years" that it takes for alloy design and development is hindering the rapid deployment of higher-performance materials, the MGI is focused on "cutting in half" the time and cost to design and deploy novel advanced materials needed to enhance and sustain US competitiveness.

QuesTek also has ongoing activities on the application of ICME methods to the design and development of materials for the energy sector, including advanced materials that enable vehicle lightweighting. Of particular note is an ongoing effort with a major diesel engine manufacturer on the development of high-performance cast iron alloys for engine block applications. QuesTek's efforts have made extensive use of various MGI-type tools and databases, including 1) density functional theory methods for high-throughput screening of cast iron inoculant stabilities (structure, lattice parameter, stability of various oxide/sulfide phases), 2) liquid iron mobility databases (to help accurately model solidification phenomena), and 3) thermodynamic databases (to examine nano-precipitate stability).

Moving forward, to continue to accelerate the materials development cycle, efforts should remain focused on the development of the fundamental databases (thermodynamic, kinetic, etc.) that enable ICME models. There should also be increased focus on efforts related to the integration of the various computational databases, models, and tools (the "I" in ICME).

Integrated Computational Materials Engineering (ICME) of Generation Three Advanced High Strength Steels
Louis G. Hector Jr., General Motors R&D Center

While traditional material development methods often involve laborious trial-and-error testing, Integrated Computational Materials Engineering (ICME), which falls under the umbrella of the Materials Genome Initiative, is offering a means for materials development under a more compressed time scale. Advanced high strength steels (AHSS), which contain multiple phases (e.g., ferrite, bainite, pearlite, austenite, martensite) and may exhibit phase transformation with strain, are fertile ground for ICME. However, prediction of macro-scale constitutive behavior based upon the multi-scale physical, chemical, and mechanical phenomena in these complex microstructures is a formidable challenge for ICME. Unprecedented collaboration between universities, industry, and government labs will be required to address AHSS development, archiving of data, and the implementation of these materials into products that benefit the American consumer. This presentation began with a brief overview of a new DOE/USAMP (US Automotive Materials Partnership)-funded ICME program aimed at the development of Generation Three (Gen 3) AHSS. The lack of commercial Gen 3 steels that meet DOE targets is offering significant challenges to this new ICME program. The program is meeting these challenges following four thrusts: (1) making Gen 3 steels to meet DOE targets; (2) passing experimental measurements of key properties to an ICME model that produces microstructure-based constitutive models; (3) material evaluation in forming and vehicle performance simulations; (4) archiving all relevant data for easy use by others in the program. Although the Gen 3 AHSS ICME program is "mission oriented", it demonstrates an important role for universities in providing the necessary foundation in fundamental materials science and engineering.

Integrated Computational Methods for Composites Materials (ICM2): Lessons Learned from Integration Planning and Feasibility Demonstrations
Lara Liou, GE Aviation

General Electric Aviation and Lockheed Martin Aeronautics have teamed through an Air Force Research Lab program to conduct a demonstration of an Integrated Computational Materials Engineering digital framework that links material processing, property, and structural relationships to account for processability, manufacturability, and system performance. The goal is to demonstrate that the usage of integrated models can contribute to future airframe and engine designs with dramatic reductions in development time and cost. Both engine and airframe Foundational Engineering Problems will be demonstrated through a common digital framework to be developed and validated on increasingly complex articles. The modeling tools, which cover composite process models, localized mechanical behavior models, and global design models (Convergent Manufacturing Technology's COMPRO CCA, Autodesk's Autodesk Simulation Composite Analysis, and an Abaqus-based University of Michigan micromechanics model), are centrally linked via a commercially available model integration interface (Phoenix Integration's ModelCenter®). Detailed plans of what data will be passed among the various models, as well as how it will be passed, have been completed. Creation of these detailed process maps and simple feasibility demonstrations have raised inherent technical and logistical challenges to integrating commercially available multi-scale models. Robust model integration methods to address these challenges were discussed.

The exercise of drawing up detailed process maps for what to integrate and how to integrate the selected tools and models resulted in key early lessons learned and identification of tools critical to long-term, robust, and sustained application of Integrated Computational Methods for Composites Materials (ICM2). The key lessons learned include:

• Individual models must be relatively mature and validated before an ICM2 problem can be confidently defined
• FE analysis-based tools within the same digital thread should utilize the same FE solver
• The team ideally should consist of at least one expert for each tool and its underlying model, as well as an expert in the digital thread and integration interface tools and programming
• Mesh interpolation between finite element analysis based tools will likely be required, and a commercially available tool for this type of composite mesh interpolation is key for robust and sustained application of ICM2

Residual Stress Engineering of Rotating Nickel Components
Terry Wong, Aerojet Rocketdyne

The PW-8 project, the Nickel Base Superalloy Residual Stress Foundational Engineering Problem program, aims to develop a multi-disciplinary analysis approach to predict and incorporate bulk residual stress in the design of a component. This will allow for the design of a part that is optimized for a particular application or design intent (such as weight, performance, or reduction of scrap). To accomplish this goal, the PW-8 team has two major tasks: 1) the development of an infrastructure and associated tools needed to predict and incorporate bulk residual stress in the design stages of a part, and 2) a demonstration task that will validate the infrastructure and tools developed.

The Data Management task is the foundational element of the larger infrastructure development task. The goal for the data management work is to develop an infrastructure that allows the various team members in PW-8 to share data and to store data in a database. This not only ensures that data generated by this program is never lost but also allows for data to be transferred in a secure manner. In order to meet the goal of a system that securely stores and shares data, three major tasks must be accomplished: 1) validation that there is a secure method to transfer data between team members and that the method of storing data is also secure, 2) development of a database schema, and 3) development of a means to easily upload data into the database and download data from the database.

Of the three, the most difficult task was to create a system in which the transfer and storage of data was secure. Various proposals were made to ensure the security of the data, but many of them were rejected, not because the proposal was actually insecure but because it was not acceptable under each team member's IT security policies. The process of getting each team member's IT department to approve the security of the data transfer and storage took more effort than anticipated. In the end, the team members, along with their respective IT departments, approved the use of TRUcentrix as a tool to transfer data and the use of an ASM International hosted database to store data.

The development of a database schema is a necessary step in creating the database to store all the data generated (via physical tests as well as data generated by models). A big help in creating the database schema was to use the schema generated by Dr. Steven Arnold from NASA's Glenn Research Center. Using NASA's schema eliminated the need to start the database build from scratch and allowed us to focus on the specific needs of storing residual stress data.

The last task involves the development of tools that make the input of data easy. The more difficult it is to enter data into a database, the more likely it is that data will not be entered. Excel-based templates are being developed to make data entry as easy as possible.

The two big lessons learned from PW-8 in relation to data management are:

• It is essential to work with each company's IT and IT security people to develop a system that satisfies the security requirements of each company.

• Database schema development should not be reinvented but should begin with an already established schema.

Perspectives on MAI Programs
Dr. Mike Glavicic, Rolls-Royce

The current MAI (AFRL's Metals Affordability Initiative) programs are structured independently of one another, in that no standard methodology exists that describes what data is to be archived at the conclusion of a program. Nor does there exist a standard protocol for how to archive things on a server or some other computer resource at the conclusion of a program. As a result, the current practices of MAI programs:

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program.

• Do not have a formalized requirement for how data is to be packaged and stored at the end of a program. Hence, even in the best of cases, in which all of the data is stored, the data is archived in a simple folder fashion that is not searchable in any manner. Moreover, even if all of the data is present, it is extremely cumbersome to untangle the data for future use using a file folder strategy.

Hence, at the end of the program, both the government and the MAI team are left to manage the data and results in an unregimented fashion.

The lack of a regimented protocol for how to package a program's results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal of integrating the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models, and other information required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted to the recreation of data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. Although the RR-10 program was able to overcome these issues and become a successful program, other programs that encounter similar problems will likely not enjoy the same success, as RR-10 was very fortunate to have the means to overcome these serious issues, which resulted from the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government-funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs, with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ
Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals Inc.) are working together on the PW-11 team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security and access control, and the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework: GE Forge, Vanderbilt Forge, and Semantic Web technology. One technology will be selected for the next phase of the program.


Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single-database searches to searches across multiple and disparate databases, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work up front for implementation, as it needs the development of a common vocabulary and relationships.

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST and develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
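
The idea of flowing uncertainty through a chain of interconnected models can be illustrated with a simple Monte Carlo sketch. Everything in it is an assumption made for illustration: the two model functions, their coefficients, and the input distributions are hypothetical stand-ins and do not come from the PW-11, GE-10, PW-9, or RR-10 programs, and Monte Carlo sampling is only one of several possible UQ approaches.

```python
import random
import statistics


def microstructure_model(cooling_rate):
    """Hypothetical process-to-microstructure link: grain size (um) from cooling rate (K/s)."""
    return 50.0 / cooling_rate ** 0.5


def property_model(grain_size):
    """Hypothetical Hall-Petch-style link: yield strength (MPa) from grain size (um)."""
    return 200.0 + 600.0 / grain_size ** 0.5


def propagate(n_samples=10000, seed=0):
    """Sample the uncertain inputs and push each sample through the model chain."""
    rng = random.Random(seed)
    grain_sizes, strengths = [], []
    for _ in range(n_samples):
        # Assumed variability of the processing parameter (clipped to stay positive).
        cooling_rate = max(rng.gauss(10.0, 1.5), 0.1)
        # Assumed multiplicative model error at each step of the chain.
        grain_size = max(microstructure_model(cooling_rate) * rng.gauss(1.0, 0.05), 1e-6)
        strength = property_model(grain_size) * rng.gauss(1.0, 0.02)
        grain_sizes.append(grain_size)
        strengths.append(strength)
    return grain_sizes, strengths


if __name__ == "__main__":
    sizes, strengths = propagate()
    print(f"grain size: mean {statistics.mean(sizes):.1f} um, stdev {statistics.stdev(sizes):.1f} um")
    print(f"yield strength: mean {statistics.mean(strengths):.1f} MPa, stdev {statistics.stdev(strengths):.1f} MPa")
```

Comparing the spread of the sampled grain sizes with the spread of the sampled strengths gives a rough view of how process variability carries through to performance under these assumed models.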

ASM's Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project
Scott D. Henry, ASM International

ASM International, the world's largest association of metals-focused materials engineers, scientists, technicians, educators, and students, has launched a series of initiatives under the "Computational Materials Data (CMD) Network" umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design Ltd. The main objectives of SMDDP are to:


• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system.

• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets.

• Develop and carry out a series of test problems that represent relevant use cases for the repository.

• Make data open to the materials data community for use in data analytics, modeling, and educational purposes.

• Actively engage the materials data community and widely disseminate the findings from the project.

• Develop and implement data capture and curation procedures that can serve as models for other data repositories.

The basic philosophy behind the project is that standards often emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated, structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project, launched in late 2013 and planned as an 18-month program, is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future
Ben Blaiszik (1), Kyle Chard (1), Jim Pruyne (1), Rachana Anathakrishnan (1), Steve Tuecke (1,2), Ian Foster (1,2)

(1) University of Chicago Computation Institute; (2) Argonne National Laboratory, MCS Division

This talk discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.

Semantics for Deep Interoperability
Samuel Chance, iNovex

Semantic technology is based on mature web standards and is intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented, starting with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web, riding atop the same infrastructure (HTTP). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis on the fact that the technology has a foundation based on formal logic that enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon the adoption of semantic technologies and things to consider when they are being evaluated as a solution for a community or enterprise.
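
The way "triples" link to each other to form a web of data, as described above, can be sketched in a few lines of plain Python. This is only an illustration and not the presenter's material: the `ex:` identifiers, the predicates, and the numeric value are all made up, and a real deployment would more likely use an RDF library and a shared vocabulary.

```python
from collections import defaultdict

# Each triple is (subject, predicate, object). All identifiers are hypothetical.
TRIPLES = [
    ("ex:Sample42", "ex:madeOf", "ex:Al6061"),
    ("ex:Al6061", "ex:belongsToSystem", "ex:Al-Mg-Si"),
    ("ex:Sample42", "ex:hasMeasurement", "ex:Measurement7"),
    ("ex:Measurement7", "ex:property", "ex:YieldStrength"),
    ("ex:Measurement7", "ex:valueMPa", 250),  # illustrative number, not real data
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def reachable(start, triples=TRIPLES):
    """Follow links from `start`: an object that is also a subject extends the web."""
    by_subject = defaultdict(list)
    for s, _, o in triples:
        by_subject[s].append(o)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for obj in by_subject.get(node, []):
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


if __name__ == "__main__":
    print(objects("ex:Sample42", "ex:madeOf"))
    print(sorted(map(str, reachable("ex:Sample42"))))
```

Because the object of one triple can be the subject of another, a query that starts at the sample can reach the alloy system without any table join being defined in advance.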


Big Data and Semantic Technologies
Harlan Shober, RJ Lee Group

The presentation started with a description of the entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS).

To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing, characterization, and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described; it consisted of several elements: contribute, index, search, analyze, and archive. With the needed functionality in mind, a series of universal scientific data management process categories was then presented, with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for the Resource Description Framework (RDF) were applied to a notional example, with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described."

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
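
The first two steps named above, tokenization and part-of-speech tagging, can be sketched with the Python standard library alone. This is a toy illustration rather than the approach shown in the talk: the regular expression, the tiny lexicon, and the tag names are assumptions, and a production system would use a trained tagger. The pattern deliberately keeps hyphenated materials designations such as "6061-T6" or "Al-Mg-Si" together as single tokens.

```python
import re

# Decimal numbers first, then alphanumeric runs (optionally hyphen-joined),
# then any single remaining non-space character as punctuation.
TOKEN_PATTERN = re.compile(r"\d+\.\d+|[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*|[^\sA-Za-z0-9]")

# Hypothetical, deliberately tiny lexicon for the lookup tagger below.
TOY_LEXICON = {"the": "DET", "at": "ADP", "was": "VERB", "aged": "VERB", "alloy": "NOUN"}


def tokenize(text):
    """Split text into word, number, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def tag(tokens):
    """Assign a coarse tag to each token by simple rules and dictionary lookup."""
    tagged = []
    for tok in tokens:
        if re.fullmatch(r"\d+(?:\.\d+)?", tok):
            tagged.append((tok, "NUM"))
        elif not tok[0].isalnum():
            tagged.append((tok, "PUNCT"))
        else:
            tagged.append((tok, TOY_LEXICON.get(tok.lower(), "UNK")))
    return tagged


if __name__ == "__main__":
    sentence = "The 6061-T6 alloy was aged at 175 C."
    for token, label in tag(tokenize(sentence)):
        print(f"{token}\t{label}")
```

Tokens the lexicon does not know, such as the alloy designation, come back as "UNK", which hints at why domain-specific vocabularies matter for materials text.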

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.


The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.


Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were "rolled up" from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often-fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; a data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce a draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto-ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data; standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management: consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose


- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management; it is difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most-important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure: evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on a standardization panel


- Data associated with publications should be shared along with the publications: not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)

Government-Led Priorities

1. Lead data strategy/approach development
- Hold a meeting like this one in 6-12 months with the purpose of developing a roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus, e.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most-important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools making data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating a standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming a consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure


4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10% (repositories & collaboration platforms)
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST): creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors, not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials-related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches supported are limited to too-fundamental single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or alternatively work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community


Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, Inovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, Questek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl Institute of Standards & Tech
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


Presentation Summaries

Welcome and Introduction
Chuck Ward, Air Force Research Laboratory

The primary intent of the workshop is to promote communication between the groups being funded under the Materials Genome Initiative that have significant interest in the management of materials data. The greater awareness is expected to enhance the likelihood of success of the individual efforts and speed the community's understanding of how best to proceed in developing an MII.

The Materials Research Society (MRS) and The Minerals, Metals & Materials Society (TMS) conducted a survey of the community in May/June of 2013 to gauge attitudes and needs in materials data. Over 650 people took the survey, with 74% of respondents saying they would be willing to share data if it were encouraged as a condition of funding or publishing. The community showed the most interest in gaining access to databases of some of the most basic and essential materials data, such as thermo-physical properties.

The survey asked participants to evaluate advantages of local versus centralized databases. Centralized repositories were perceived by respondents to offer advantages in cost, an ability to share one's data, provisioning of uniform data standards, and an ability to organize and find data. Local repositories were perceived to provide advantages in ease of data input, time efficiency, and flexibility in accommodating new data types.

The survey also asked participants to provide their thoughts on impediments to and motivators for sharing data. Most of the impediments can be attributed to concerns about ownership of data or external data restrictions, while technical issues such as file size and proprietary formatting were also concerns. Overall, participants viewed the ability to contribute data to the community for reuse and critique positively. The highest motivator was to enhance one's own research visibility. Interestingly, respondents saw adding metadata as a motivator, presumably due to the perceived higher value of data when it is described more robustly.


MGI Efforts in Infrastructure Development at NIST
Jim Warren, National Institute of Standards and Technology

The presentation provided an overview of NIST activities supporting the creation of a materials data infrastructure. The infrastructure could start by defining the interfaces/APIs (Application Programming Interfaces) with which a user is able to interact for the deposit or retrieval of data. Beyond an interface, the following are essential elements for creation of a viable infrastructure:

• The establishment of repositories to store the data
• Tools to enable digital capture of the information, preferably at the time of creation (computation or experiment)
• Tools to enable markup of the data with sufficient metadata to inform someone else how the data was created, as well as various attributes that would help the user judge the quality of the information
• Assignment of a persistent digital identifier (like the DOI (Digital Object Identifier) for journals) so the data can be cited and discovered by others
• The registration of the availability of the data into some sort of “registry” to enable discovery without prior knowledge of the existence of the repository/specific data features
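As a rough illustration of how three of these elements fit together (metadata markup, a persistent identifier, and a registry for discovery), the following Python sketch may help. It is not NIST's design: the class names, the `example-pid:` prefix, and the hash-based identifier are all assumptions made for the example, and a real persistent identifier would be minted by a service such as a DOI registrar.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DataRecord:
    """One deposited dataset plus the metadata needed to judge and reuse it."""
    title: str
    creator: str
    method: str      # how the data were created (experiment or computation)
    payload: bytes   # the raw data itself
    keywords: list = field(default_factory=list)

    def identifier(self) -> str:
        """Hypothetical identifier derived from a hash of the content."""
        digest = hashlib.sha256(self.payload).hexdigest()[:12]
        return f"example-pid:{digest}"


class Registry:
    """Toy registry: maps keywords to identifiers so that data can be found
    without knowing in advance which repository holds it."""

    def __init__(self):
        self._by_keyword = {}
        self._records = {}

    def register(self, record: DataRecord) -> str:
        pid = record.identifier()
        self._records[pid] = record
        for kw in record.keywords:
            self._by_keyword.setdefault(kw.lower(), set()).add(pid)
        return pid

    def find(self, keyword: str) -> list:
        """Return the identifiers registered under a keyword (case-insensitive)."""
        return sorted(self._by_keyword.get(keyword.lower(), set()))

    def resolve(self, pid: str) -> DataRecord:
        return self._records[pid]


# Made-up record for illustration only.
registry = Registry()
record = DataRecord(
    title="Example thermal conductivity scan",
    creator="Hypothetical Lab",
    method="experiment",
    payload=b"T_K,k_W_per_mK\n300,1.0\n",
    keywords=["thermal conductivity", "example"],
)
pid = registry.register(record)
print(pid, registry.find("Example"))
```

Deriving the identifier from the content is only a convenience for the sketch; it makes the identifier reproducible but does not provide the resolution service that a DOI-like system offers.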

Additionally, creation of new policy and standards, as well as development of tools to evaluate the quality of the data and tools to enable data-driven discovery, will be required to manage and use materials data to the greatest extent possible.

NIST is responding to the OSTP (Office of Science and Technology Policy) memos (Dr John Holdren 2 Feb 14) by developing practices that will be of use to the community, including a Common Access Platform allowing access across multiple collections of information, research and development of Persistent Identifier (PID) infrastructure with the Research Data Alliance (RDA), and R&D on Data Type Registries with RDA.

Finally, NIST is collaborating with the National Data Service (based at the National Center for Supercomputing Applications) and is developing a set of best practices for Data Management Plans.


The Materials Commons: A Novel Information Repository and Collaboration Platform for the Materials Community
Brian Puchala, Glenn Tarcea, Stravya Tamma, Emmanuelle Marquis, John Allison, University of Michigan

Critical to accelerating the pace of materials science and development is the development of new and improved infrastructure providing a seamless way for materials researchers to share and use materials data and models. To address this need, we are developing the Materials Commons (httpprismsenginumicheduprismsmaterialscommons), an information repository and collaboration platform for the metals community in selected technical emphasis areas. We envision the Materials Commons becoming a continuous, seamless part of the scientific workflow process. Researchers will upload the results of experiments and computations as they are performed, automatically where possible, along with the provenance information describing the experimental and computational processes. By associating this data with the experimental and virtual computational materials samples from which it is obtained, the Materials Commons will build process-structure-property relationships, enabling the construction and validation of constitutive and process models. The Materials Commons website provides an easy-to-use interface for uploading and downloading data and data provenance, and for searching and sharing data. At its core, the Materials Commons consists of a secure data storage cluster, an application for efficiently uploading and downloading large data sets, and a REST (REpresentational State Transfer) based API to access and extend the capabilities of the repository. The API allows for features such as automated data upload from experiments and computations, seamless integration of computational models, and algorithmic analysis of process-structure-property relationships. The Materials Commons is a central thrust of the Center for PRedictive Integrated Structural Materials Science (PRISMS).
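The idea of associating data with the samples and processes from which it was obtained can be sketched as a small provenance graph. The Python below is a hypothetical illustration, not the Materials Commons data model or API: the `ProcessStep` class, the `lineage` function, and the sample names are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessStep:
    """One experimental or computational step and what it consumed and produced."""
    name: str
    inputs: list = field(default_factory=list)    # samples or datasets used
    outputs: list = field(default_factory=list)   # samples or datasets created


def lineage(target: str, steps: list) -> list:
    """Return the names of the steps that led to `target`, earliest first."""
    produced_by = {out: step for step in steps for out in step.outputs}
    ordered, seen = [], set()

    def visit(item: str) -> None:
        step = produced_by.get(item)
        if step is None or step.name in seen:
            return
        seen.add(step.name)
        for parent in step.inputs:
            visit(parent)
        ordered.append(step.name)   # appended only after all of its ancestors

    visit(target)
    return ordered


# Made-up processing history for illustration only.
steps = [
    ProcessStep("casting", inputs=["melt"], outputs=["ingot"]),
    ProcessStep("heat_treatment", inputs=["ingot"], outputs=["aged_sample"]),
    ProcessStep("tensile_test", inputs=["aged_sample"], outputs=["stress_strain_curve"]),
    ProcessStep("microscopy", inputs=["aged_sample"], outputs=["micrograph"]),
]
print(lineage("stress_strain_curve", steps))
# prints ['casting', 'heat_treatment', 'tensile_test']
```

With the processing history recorded this way, a property measurement (the stress-strain curve) and a structure measurement (the micrograph) can both be traced back to the same heat treatment, which is the kind of link a process-structure-property relationship needs.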

NIST Materials Data Informatics Efforts
Carrie Campbell, National Institute of Standards and Technology

The presentation covered NIST's pilot efforts in creating a data infrastructure focusing on Phase-Based Property Data. Phase-based data are diverse data that include 1, 2, 3, and 4-dimensional data, are semi-structured, and often include incomplete data sets. The figure below provides an overview of NIST's data efforts supporting the MGI.


The NIST Materials Data Repository (https://materialsdata.nist.gov/dspace/xmlui) is a customized DSpace repository for materials that enables sharing of a variety of data types, including text files, images, and video. The repository also provides a persistent identifier for each entry and allows users to choose from a variety of license types. NIST is also working on two data curation tools for phase-based data. The Thermodynamics Research Center (http://trc.nist.gov) is extending the ThermoML markup language and the Guided Data Capture software tool to metallic systems, with an initial emphasis on phase equilibria and thermochemical data. The Materials Data Curation System (MDCS) employs user-defined XML-based schemas to curate data. The MDCS allows data curation either through a web-based interface or through a REST API. The entered data is stored in a non-relational database repository, can be searched using a SPARQL (SPARQL Protocol and RDF Query Language) interface, and can be integrated with a variety of scientific workflow tools.
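To make the notion of curating data against a user-defined XML template more concrete, a minimal Python sketch follows. It does not use the MDCS software or its schemas: the `TEMPLATE` dictionary, the field names, and the helper functions are hypothetical, and a plain dictionary stands in for a real XML Schema document.

```python
import xml.etree.ElementTree as ET

# Hypothetical user-defined template: element name -> type of its text content.
TEMPLATE = {
    "material": str,
    "phase": str,
    "temperature_K": float,
    "diffusivity_m2_per_s": float,
}


def build_record(values: dict) -> ET.Element:
    """Create an XML record, rejecting missing, unknown, or badly typed fields."""
    missing = set(TEMPLATE) - set(values)
    unknown = set(values) - set(TEMPLATE)
    if missing or unknown:
        raise ValueError(f"missing={sorted(missing)} unknown={sorted(unknown)}")
    root = ET.Element("record")
    for name, kind in TEMPLATE.items():
        child = ET.SubElement(root, name)
        child.text = str(kind(values[name]))   # kind() raises if conversion fails
    return root


def find_records(records: list, field_name: str, text: str) -> list:
    """Return the records whose <field_name> element has exactly the given text."""
    return [r for r in records if r.findtext(field_name) == text]


# Illustrative values only; not measured data.
record = build_record({
    "material": "Ni-Al",
    "phase": "fcc",
    "temperature_K": 1273,
    "diffusivity_m2_per_s": 1e-14,
})
print(ET.tostring(record, encoding="unicode"))
```

Rejecting a record at entry time, rather than cleaning it up later, is the design choice being illustrated; a production system would validate against a full schema and query the stored records with a proper query language instead of the simple text match used here.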

AFLOW Consortium
Marco Fornari, Michigan State University

The overall goal of the Materials Genome Initiative (MGI) is to accelerate the discovery, development, and deployment of new materials with ad hoc functionalities. The AFLOW Consortium focuses on the software infrastructures to achieve the MGI goals by using first principles calculations. The emphasis is on the data that must be generated, archived, validated, and post-processed efficiently, both to identify new materials and distill meaningful quantitative relationships within the data. This extends the solution to the inverse problem of chemically replacing rare and costly elements in critical technologies.

The AFLOW Consortium provides access to the repository AFLOWLIB.ORG <http://AFLOWLIB.ORG> that publicly distributes data on energetics and thermodynamic stability for more than 600,000 compounds and band structures for roughly 300,000 materials. The repository includes bulk crystals and alloys. Data were computed with the AFLOW high-throughput framework using density functional theory. The results of the calculations can be queried with the recently developed REST-API that standardizes and labels the data for retrieval purposes. The fields defined in the AFLOW REST-API include a material object identifier (AUID), data cloud location (AURL), ownership and sponsorship keywords, and data specific labels (COMPOSITION, ENTROPY, EGAP, …). Simple scripting languages can be used to automatically download from the AFLOWLIB repository. The AFLOW approach has been extensively applied to platinum-group alloys, thermoelectric materials, magnetic materials, topological insulators, scintillators, etc. The AFLOW REST-API can be extended to include experimental data. For further details on the database, the reader is directed to http://materials.duke.edu/auro/AUROARTICULA/j.commatsci.2012.02.002.pdf
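The kind of simple scripting mentioned above might look like the following Python sketch, which parses labeled entries and filters them by band gap. Only the keyword names (AUID, AURL, COMPOSITION, EGAP) come from the text; the `keyword=value | keyword=value` layout is an assumption made for this example, the sample strings are made up rather than taken from AFLOWLIB, and no network call is made.

```python
def parse_entry(text: str) -> dict:
    """Split 'keyword=value | keyword=value' text into a dictionary."""
    entry = {}
    for chunk in text.split("|"):
        if "=" not in chunk:
            continue                     # skip fragments that carry no label
        key, value = chunk.split("=", 1)
        entry[key.strip().lower()] = value.strip()
    return entry


def band_gap_filter(entries: list, minimum_ev: float) -> list:
    """Return the AUIDs of entries whose EGAP value is at least `minimum_ev`."""
    return [
        e["auid"]
        for e in entries
        if "egap" in e and float(e["egap"]) >= minimum_ev
    ]


# Made-up responses for illustration; these are not real AFLOWLIB entries.
raw = [
    "auid=example:0001 | aurl=example.org:entry/0001 | composition=1,1 | egap=1.2",
    "auid=example:0002 | aurl=example.org:entry/0002 | composition=2,1 | egap=0.0",
]
entries = [parse_entry(r) for r in raw]
print(band_gap_filter(entries, 0.5))
# prints ['example:0001']
```

Because every value carries a standard label, a script like this does not need to know how each calculation was stored, which is the benefit of standardizing and labeling the data for retrieval.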

Materials Innovation Infrastructure: Lessons Learned and Future Plans
Surya Kalidindi, Georgia Institute of Technology

A central activity in materials research is the exploration of processing-structure-property (PSP) linkages. Workflow is defined as the sequence of steps employed for establishing PSP linkages of interest to any specific engineering/technology application.

High Throughput Explorations of PSP Linkages: We must treat hierarchical material as a complex system, which means embracing and exploiting uncertainty quantification and complexity management concepts/tools. It also means we need to design, develop, and validate decision support systems that will leverage the best current understanding (with its uncertainties) and provide objective guidance on future effort investment (combination of experiments and simulations). This means leveraging Data Science to effectively feed the decision support systems described above by developing and deploying high throughput strategies for obtaining high value information from both experiments and models.

What are integrated workflows? Integrated workflows are those that utilize the best combination of experiments and simulations in extracting robust and reliable PSP linkages. This requires engaging and exploiting cross-disciplinary expertise that includes materials science, manufacturing, systems approaches, uncertainty quantification, computational science, and data and information sciences. Through this, one must ensure that the workflows output the critical information needed by design and manufacturing stakeholders in the materials development value chain. Integration demands expertise sharing, which essentially is the core element of a collaboration. Cross-disciplinary integration demands effective collaborations, such as those one might set up using the modern tools of the day, i.e., an e-Collaboration.


Motivators for e-Collaborations: e-Collaborations are central to enhancing the productivity of materials development activities in several ways:

• individual research groups can identify and engage complementary expertise needed for their projects
• organizations can coordinate multiple teams and multiple projects to maximize output/impact
• organizations can identify and seek the right partners that enhance their individual and combined value

e-Collaborations can be open, semi-open, or closed depending on the needs of the specific organization. Productivity enhancement leading to dramatic reductions in cost and time expended in materials development is the main incentive for e-collaborations.

Main Impediments to e-Collaborations:

Lack of tools to identify the “best” collaborators
• Need software tools that quantify more precisely the expertise of individuals, groups, organizations
• Need software tools for allowing rich annotations and e-recording of discussions
• Need analytic tools for identification, extraction, and management of metadata (i.e., high value abstractions from the raw data)
• Need searchable semantic metadata databases (these are lightweight and can be centralized and customized for specific user communities)
• Need software tools for tracking provenance (i.e., who/what/when/how generated, modified, etc.)

Lack of experience with the use of integrated workflows
• Need software tools for e-recording of workflows and development of searchable workflow databases
• Need recommendation schemes that suggest the best workflow(s) based on recorded prior experience; this provides a means to transfer expertise acquired by an organization to other teams and other projects (even in the same organization)
• Leads to possible automation and dramatic enhancement of productivity

MINED Group's Past Experience (~2005 - ~2008): Funded by the Army Research Office, ARO (David Stepp). Engaged computer science (CS) graduates to produce a single-user platform that integrated certain software codes previously produced by our group. Software development was subsequently abandoned, but the effort recruited two CS graduates into PhD programs in Materials Science and Engineering/Mechanical Engineering (MSE/ME), which dramatically altered the direction of research in the group. A brief chronology of development:

• Senior Design Project (2011-2012): An online portal designed for sharing data files and tools; learned the importance of user interfaces and the need for professional code developers and software architects
• HUBzero Instantiation (2012-2013): Built a hub by customizing HUBzero for hierarchical structural materials; there are several advantages in terms of accessing and executing codes (on the server side), but there are significant barriers to establishing fast and productive cross-disciplinary e-collaborations


• GTRI Integration Platform (2013-2014): Engaged professional code developers and software architects to design a multi-user web portal that allows data and code sharing while tracking provenance and workflows; underestimated the resources needed for such a project
• MATIN Collaboration Environment (2013-2014): Lightweight integrated software shell that allows users to seamlessly interface with numerous cloud services offered through the web through their APIs. Examples include GitHub for versioning and archival of codes, Authorea for collaborative editing of documents, figshare for citable publication of research, Plotly for collaborative data analysis and visualization, and Google+ and LinkedIn for e-teaming and networking. MATIN focuses on building an emergent community: Materials & Manufacturing Informatics (MMI). MATIN provisions web services for software integration platforms and innovation infrastructure. Examples of MATIN:
  • Spatial Statistics: https://github.com/tonyfast/SpatialStatisticsFFT
  • PyMKS: http://openmaterials.github.io/pymks/index.html
  • Example discussion about periodic boundary conditions in running Finite Element (FE or FEM) simulations: https://github.com/sfepy/sfepy/issues/267

In the platform developed, the user or the user's organization can set permissions at the level of individual datasets/codes on the degree and type of sharing.

Central Challenge: Activation. The transition from traditional disciplinary workflows to integrated workflows needs to overcome an activation barrier. There is no clear incentive to get ahead of the curve in anticipating the upcoming transformations in how materials development will be pursued in the future. Open (reproducible) science alone is not a sufficiently strong motivator for a materials development community with strong interests in intellectual property development; indeed, it might actually be perceived as a negative. There is a need to explore and refine what incentives work for individual researchers (e.g., increased productivity).

Future Plans: Continue to develop and deploy MATIN at GT and any other organizations/groups that express an interest in working with us. Design, create, and distribute a suite of “mobile first” apps designed to make data ingest into collaboration environments such as MATIN, from both material characterization equipment and multi-scale models, an enhanced user experience with friendly and natural interfaces. Proactively promote/incentivize data/metadata ingest into publicly shared databases. Work with professional societies to announce and run new competitions/awards for recognizing and rewarding good practices by the members of the broader materials innovation user community (e.g., Best Documented Data Award, Best Documented Tool Award, Best Documented Workflow Award). Develop and deliver new workshops for training the new generation of graduate students, postdocs, and junior faculty in the emerging workflows of Materials & Manufacturing Informatics.

High Performance Cast Aluminum Alloys for Next Generation Passenger Vehicle Engines
Dongwon Shin, Materials Science and Technology Division, Oak Ridge National Laboratory


Oak Ridge National Laboratory (ORNL) teamed with Chrysler and Nemak to develop next-generation high-performance cast aluminum alloys that have improved castability, high-temperature strength, and fatigue performance. At this point, the data management practice is limited to analytical data spreadsheets, other than the CALPHAD computational thermodynamic databases and proprietary property databases for casting simulation. As the project goes on, a more robust data schema will be developed to store various types of data populated from the experiments and calculations.

Besides the data structure design, the speaker discussed the necessity of a new approach to identifying key descriptors from the large dataset that can guide the design of new materials. An alloy design requires the balance of extremely complex relationships among a large number of descriptors. Some examples of these descriptors include phase fraction of key intermetallics, lattice parameters of the matrix, and diffusion kinetics of solute atoms. However, optimizing these quantities relies on good physical metallurgy intuition coupled with clever experiments, dogged sleuthing, and a healthy dose of luck. Such descriptor-based predictions of materials properties date back to the beginning of modern solid-state chemistry and physics. As shown in the Darken-Gurry map to predict the solubility of elements in tantalum, intuitive descriptors such as electronegativity and radius could be used for materials design. However, as the chemistry and physics of materials become more complex, intuition-based search may not work anymore. Thus, the use of more advanced statistical analysis and data mining approaches might be required to accelerate the design of complex multi-component, multi-phase alloys.
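A descriptor-based screen of the Darken-Gurry type can be written in a few lines. The Python sketch below assumes the commonly quoted form of the criterion, in which solutes lying inside an ellipse centered on the solvent (about ±15 % in atomic radius and ±0.4 units in electronegativity) are candidates for extensive solid solubility; these tolerances are an assumption of the example rather than values from the presentation, and the radii and electronegativities used are placeholders, not vetted property data.

```python
def inside_darken_gurry_ellipse(solvent_radius, solvent_en, solute_radius, solute_en,
                                radius_tol=0.15, en_tol=0.4):
    """Return True if the solute falls inside the ellipse centered on the solvent.

    radius_tol is a fractional size difference; en_tol is in electronegativity units.
    """
    dr = (solute_radius - solvent_radius) / (radius_tol * solvent_radius)
    de = (solute_en - solvent_en) / en_tol
    return dr * dr + de * de <= 1.0


# Placeholder descriptor values (radius, electronegativity) for illustration only.
solvent = (1.46, 1.5)
candidates = {
    "X": (1.45, 1.6),   # close in both size and electronegativity
    "Y": (1.25, 1.9),   # differs noticeably in both descriptors
    "Z": (1.80, 1.5),   # same electronegativity but much larger
}
for name, (radius, en) in candidates.items():
    print(name, inside_darken_gurry_ellipse(*solvent, radius, en))
```

A two-descriptor test of this kind is easy to reason about by eye, which is exactly why it stops scaling once dozens of descriptors interact; at that point the statistical and data mining approaches mentioned above take over.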

Data Related to Cast Aluminum
Jake Zindel, Ford Motor Company

The main focus of our research is the development of CAE (Computer-aided Engineering) tools to support two significant activities in the company: component design (product development) and manufacturing engineering (high-volume production). A key part of our research is conducting experiments to collect data for the development and validation of the CAE tools.

The data we collect consists of mechanical properties, physical properties, microstructural images, and manufacturing process parameters that form the pedigree of the material. This information can be in the form of Excel files, proprietary-software storage files, screen shots, and written reports.

Our immediate need is to deal with this large amount of existing information collected over many years on a “local” level. The first step is to organize many independent collections of information in all their forms. The next issue is the time required to enter all this information into the database. Maintenance may also require a significant amount of effort. Lastly, there will also be a need to manipulate the data contained in the database using Excel, Minitab, MATLAB, etc. Therefore, links or easy exportation of data to these types of applications will be necessary.

The next level would involve dealing with information that has been created or collected by others. From an OEM (Original Equipment Manufacturer) perspective, this would involve convincing suppliers to share processing information that they may consider proprietary. In addition to suppliers, universities and national laboratories are also sources of data. Data from all the sources will require some sort of protocol to determine its quality and resolve inconsistencies.


In the end, it seems likely that considerable budget and time will be required to implement comprehensive information databases, which could be the largest hurdle of them all.

QuesTek's MGI/ICME Approach to Alloy Optimization and Design (Including Cast Iron)
Jason Sebastian, QuesTek Innovations LLC

Integrated Computational Materials Engineering (ICME) methods involve the holistic application of different computational models across various length scales to the design, development, and rapid qualification of advanced materials. Using its Materials by Design technologies, QuesTek has successfully applied these ICME methods to the design, development, and full qualification of several advanced alloys (including aerospace structural steels, aerospace gear steels, advanced castable titanium alloys, high-strength corrosion-resistant aluminum alloys, and high-performance cast iron).

QuesTek's ICME methods benefit tremendously from the activities of the US Materials Genome Initiative (MGI). Recognizing that the “10 to 20 years” that it takes for alloy design and development is hindering the rapid deployment of higher-performance materials, the MGI is focused on “cutting in half” the time and cost to design and deploy novel advanced materials needed to enhance and sustain US competitiveness.

QuesTek also has ongoing activities on the application of ICME methods to the design and development of materials for the energy sector, including advanced materials that enable vehicle lightweighting. Of particular note is an ongoing effort with a major diesel engine manufacturer on the development of high-performance cast iron alloys for engine block applications. QuesTek's efforts have made extensive use of various MGI-type tools and databases, including 1) density functional theory methods for high-throughput screening of cast iron inoculant stabilities (structure, lattice parameter, stability of various oxide/sulfide phases), 2) liquid iron mobility databases (to help accurately model solidification phenomena), and 3) thermodynamic databases (to examine nano-precipitate stability).

Moving forward, to continue to accelerate the materials development cycle, efforts should remain focused on the development of the fundamental databases (thermodynamic, kinetic, etc.) that enable ICME models. There should also be increased focus on efforts related to the integration of the various computational databases, models, and tools (the “I” in ICME).

Integrated Computational Materials Engineering (ICME) of Generation Three Advanced High Strength Steels
Louis G. Hector Jr., General Motors R&D Center

While traditional material development methods often involve laborious trial-and-error testing, Integrated Computational Materials Engineering (ICME), which falls under the umbrella of the Materials Genome Initiative, is offering a means for materials development under a more compressed time scale. Advanced high strength steels (AHSS), which contain multiple phases (e.g., ferrite, bainite, pearlite, austenite, martensite) and may exhibit phase transformation with strain, are fertile ground for ICME. However, prediction of macro-scale constitutive behavior based upon the multi-scale physical, chemical, and mechanical phenomena in these complex microstructures is a formidable challenge for ICME. Unprecedented collaboration between universities, industry, and government labs will be required to address AHSS development, archiving of data, and the implementation of these materials into products that benefit the American consumer. This presentation began with a brief overview of a new DOE/USAMP (US Automotive Materials Partnership)-funded ICME program aimed at the development of Generation Three (Gen 3) AHSS. The lack of commercial Gen 3 steels that meet DOE targets is offering significant challenges to this new ICME program. The program is meeting these challenges following four thrusts: (1) making Gen 3 steels to meet DOE targets; (2) passing experimental measurements of key properties to an ICME model that produces microstructure-based constitutive models; (3) material evaluation in forming and vehicle performance simulations; (4) archiving all relevant data for easy use by others in the program. Although the Gen 3 AHSS ICME program is “mission oriented,” it demonstrates an important role for universities in providing the necessary foundation in fundamental materials science and engineering.

Integrated Computational Methods for Composites Materials (ICM2): Lessons Learned from Integration Planning and Feasibility Demonstrations
Lara Liou, GE Aviation

General Electric Aviation and Lockheed Martin Aeronautics have teamed through an Air Force Research Lab program to conduct a demonstration of an Integrated Computational Materials Engineering digital framework that links material processing, property, and structural relationships to account for processability, manufacturability, and system performance. The goal is to demonstrate that the usage of integrated models can contribute to future airframe and engine designs with dramatic reductions in development time and cost. Both engine and airframe Foundational Engineering Problems will be demonstrated through a common digital framework to be developed and validated on increasingly complex articles. The modeling tools, which cover composite process models, localized mechanical behavior models, and global design models (Convergent Manufacturing Technology's COMPRO CCA, Autodesk's Autodesk Simulation Composite Analysis, and an Abaqus-based University of Michigan micromechanics model), are centrally linked via a commercially available model integration interface (Phoenix Integration's ModelCenter®). Detailed plans of what data will be passed among the various models, as well as how it will be passed, have been completed. Creation of these detailed process maps and simple feasibility demonstrations has raised inherent technical and logistical challenges to integrating commercially available multi-scale models. Robust model integration methods to address these challenges were discussed.

The exercise of drawing up detailed process maps for what to integrate and how to integrate the selected tools and models resulted in key early lessons learned and identification of tools critical to long-term, robust, and sustained application of Integrated Computational Methods for Composites Materials (ICM2). The key lessons learned include:

• Individual models must be relatively mature and validated before an ICM2 problem can be confidently defined
• FE analysis-based tools within the same digital thread should utilize the same FE solver
• The team ideally should consist of at least one expert for each tool and its underlying model, as well as an expert in the digital thread and integration interface tools and programming
• Mesh interpolation between finite element analysis based tools will likely be required, and a commercially available tool for this type of composite mesh interpolation is key for robust and sustained application of ICM2

Residual Stress Engineering of Rotating Nickel Components
Terry Wong, Aerojet Rocketdyne

The PW-8 project, the Nickel Base Superalloy Residual Stress Foundational Engineering Problem program, aims to develop a multi-disciplinary analysis approach to predict and incorporate bulk residual stress in the design of a component. This will allow for the design of a part that is optimized for a particular application or design intent (such as weight, performance, or reduction of scrap). To accomplish this goal, the PW-8 team has two major tasks: 1) the development of an infrastructure and associated tools needed to predict and incorporate bulk residual stress in the design stages of a part and 2) a demonstration task that will validate the infrastructure and tools developed.

The Data Management task is the foundational element of the larger infrastructure development task. The goal for the data management work is to develop an infrastructure that allows the various team members in PW-8 to share data and to store data in a database. This not only ensures that data generated by this program is never lost but also allows for data to be transferred in a secure manner. In order to meet the goal of a system that securely stores and shares data, three major tasks must be accomplished: 1) validation that there is a secure method to transfer data between team members and that the method of storing data is also secure, 2) development of a database schema, and 3) development of a means to easily upload data into the database and download data from the database.

Of the three, the most difficult task was to create a system in which the transfer and storage of data was secure. Various proposals were made to ensure the security of the data, but many of them were rejected, not because the proposal was actually insecure but because it was not acceptable under each team member's IT security policies. The process of getting each team member's IT department to approve the security of the data transfer and storage took more effort than anticipated. In the end, the team members, along with their respective IT departments, approved the use of TRUcentrix as a tool to transfer data and the use of an ASM International hosted database to store data.

The development of a database schema is a necessary step in creating the database to store all the data generated (via physical tests as well as data generated by models). A big help in creating the database schema was to use the schema generated by Dr. Steven Arnold from NASA's Glenn Research Center. Using NASA's schema eliminated the need to start the database build from scratch and allowed us to focus on the specific needs of storing residual stress data.

The last task involves the development of tools that make the input of data easy. The more difficult it is to enter data into a database, the more likely data will not be entered. Excel-based templates are being developed in order to make data entry as easy as possible.

The two big lessons learned from PW-8 in relation to data management are:

• It is essential to work with each company's IT and IT Security people to develop a system that satisfies the security requirements of each company
• Database schema development should not be reinvented but should begin with an already established schema

Perspectives on MAI Programs
Dr. Mike Glavicic, Rolls Royce

The current formulation of MAI (AFRLrsquos Metals Affordability Initiative) programs are structured13 independently from one another in that a standard methodology does not exist that describes whatdata is to13 be archived13 at the conclusion13 of a program Nor does there exist a standard13 protocol of howto archive things on a server13 or some other computer resource at the conclusion13 of a program As aresult the current13 practices of13 MAI programs

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program.

• Do not have a formalized requirement for how data is to be packaged and stored at the end of a program. Hence, even in the best of cases, in which all of the data is stored, the data is archived in simple folder fashion that is not searchable in any manner. Moreover, even if all of the data is present, it is extremely cumbersome to untangle the data for future use using a file folder strategy.

Hence, at the end of the program, both the government and the MAI team are left to manage the data and results in an unregimented fashion.

The lack of a regimented protocol for how to package a program's results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal of integrating the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models, and other information that were required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted to the recreation of data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. Although the RR-10 program was able to overcome these issues and become a successful program, the same outcome is unlikely for other programs that encounter similar problems: RR-10 was very fortunate to have the means to overcome these serious issues, which were a result of the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government-funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs, with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ
Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals Inc.) are working together on the PW-11 Team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security and access control, and the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework: GE Forge, Vanderbilt Forge, and Semantic Web technology. One technology will be selected for the next phase of the program.


Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single database searches to multiple and disparate database searches, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work up front for implementation, as it needs the development of a common vocabulary and relationships.

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST, and develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
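As an illustration only, flowing uncertainty through a chain of interconnected models can be sketched with Monte Carlo sampling. Nothing below comes from the PW-11 program: the power-law grain size model, the coefficients, and the noise levels are all assumptions chosen to make the idea concrete, and the Hall-Petch form is used merely as a familiar stand-in for a microstructure-to-property model.

```python
import random
import statistics


def grain_size_um(cooling_rate, noise):
    # Hypothetical process -> microstructure model (power law, made-up coefficients).
    return 50.0 * cooling_rate ** -0.3 * noise


def yield_strength_mpa(grain_size, noise):
    # Hall-Petch form for microstructure -> property; sigma_0 and k are illustrative.
    sigma_0, k = 100.0, 600.0  # MPa, MPa * um^0.5
    return (sigma_0 + k / grain_size ** 0.5) * noise


def propagate(n_samples=5000, seed=0):
    """Sample the uncertain inputs and push each sample through both models."""
    rng = random.Random(seed)
    sizes, strengths = [], []
    for _ in range(n_samples):
        # Uncertain processing parameter (assumed normal, clamped to stay positive).
        rate = max(rng.gauss(10.0, 1.0), 0.1)
        # Each model step contributes its own (assumed lognormal) model uncertainty.
        d = grain_size_um(rate, rng.lognormvariate(0.0, 0.05))
        s = yield_strength_mpa(d, rng.lognormvariate(0.0, 0.02))
        sizes.append(d)
        strengths.append(s)
    return sizes, strengths


sizes, strengths = propagate()
print("grain size  mean/stdev:", statistics.mean(sizes), statistics.stdev(sizes))
print("strength    mean/stdev:", statistics.mean(strengths), statistics.stdev(strengths))
```

The spread of the output samples reflects both the input variability and the uncertainty added at each modeling step, which is the kind of relationship the UQ effort aims to evaluate.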

ASM's Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project
Scott D. Henry, ASM International

ASM International, the world's largest association of metals-focused materials engineers, scientists, technicians, educators, and students, has launched a series of initiatives under the "Computational Materials Data (CMD) Network" umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design Ltd. The main objectives of SMDDP are to:


• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system.

• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets.

• Develop and carry out a series of test problems that represent relevant use cases for the repository.

• Make data open to the materials data community for use in data analytics, modeling, and educational purposes.

• Actively engage the materials data community and widely disseminate the findings from the project.

• Develop and implement data capture and curation procedures that can serve as models for other data repositories.

The basic philosophy behind the project is that standards often emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project, launched in late 2013 and planned as an 18-month program, is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future
Ben Blaiszik1, Kyle Chard1, Jim Pruyne1, Rachana Anathakrishnan1, Steve Tuecke1,2, Ian Foster1,2

1University of Chicago Computation Institute, 2Argonne National Laboratory MCS Division

During this talk we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.

Semantics for Deep Interoperability
Samuel Chance, iNovex

Semantic technology is based on mature web standards and is intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented, starting with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web, riding atop the same infrastructure (HTTP). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis that the technology has a foundation based on formal logic that enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being evaluated as a solution for a community or enterprise.
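To make the notion of linked "triples" concrete, the following is a minimal sketch in plain Python rather than a real RDF toolkit. The `ex:` identifiers, the `match` and `reachable` helpers, and the sample facts are all hypothetical and were not part of the presentation.

```python
# Each fact is a (subject, predicate, object) triple; all identifiers are hypothetical.
triples = {
    ("ex:Sample42", "ex:madeOf", "ex:Al6061"),
    ("ex:Al6061", "ex:memberOf", "ex:AlMgSiSystem"),
    ("ex:Sample42", "ex:hasMeasurement", "ex:Meas7"),
    ("ex:Meas7", "ex:property", "ex:YieldStrength"),
}


def match(pattern, store):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [t for t in store
            if all(p is None or p == v for p, v in zip(pattern, t))]


def reachable(start, store):
    """Follow links outward: the object of one triple is the subject of the next."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for _, _, obj in match((node, None, None), store):
            if obj not in seen:
                seen.add(obj)
                frontier.append(obj)
    return seen


# Which samples are made of something?
print(match((None, "ex:madeOf", None), triples))
# Everything connected to Sample42 through chains of triples.
print(sorted(reachable("ex:Sample42", triples)))
```

Because the object of one triple can be the subject of another, separately recorded facts join into a graph that software can traverse without a fixed table schema.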


Big Data and Semantic Technologies
Harlan Shober, RJ Lee Group

The presentation started with a description of entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS).

To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing, characterization, and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described; it consisted of several elements: contribute, index, search, analyze, and archive. While bearing in mind the needed functionality, a string of universal scientific data management process categories was then presented, with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition; Data Manipulation and Storage; and Discovery and Advisory. W3C standards for the Resource Description Framework (RDF) were applied to a notional example with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described."

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
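As a rough illustration of the first of these steps, tokenization, the sketch below splits a materials-flavored sentence with a regular expression. The pattern, the `tokenize` and `is_number` names, and the example sentence are assumptions made for illustration; production NLP toolkits use considerably more elaborate rules.

```python
import re

# Numbers (with optional decimals), words that may contain internal hyphens
# such as alloy system names, or any single punctuation character.
TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+(?:-[A-Za-z0-9]+)*|[^\sA-Za-z0-9]")
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def tokenize(text):
    """Split text into word, number, and punctuation tokens."""
    return TOKEN.findall(text)


def is_number(token):
    """Flag numeric tokens, which later steps might pair with units."""
    return NUMBER.fullmatch(token) is not None


tokens = tokenize("Al-Mg-Si alloy aged at 175 C for 8.5 h.")
print(tokens)
print([t for t in tokens if is_number(t)])
```

Keeping hyphenated names and decimal values intact at this stage matters for materials text, where a naive whitespace or punctuation split would break apart alloy designations and numeric quantities.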

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.


The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.


Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were 'rolled up' from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often, fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; a data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data, standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management: consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can actually be useful beyond the archival purpose


- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management: difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most-important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure: evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel


- Data associated with publications should be shared along with the publications. Not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)

Government-Led Priorities

1. Lead data strategy/approach development
- Hold a meeting like this one in 6-12 months with the purpose of developing a roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus, e.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most-important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools that make data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating a standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming a consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure


4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10--repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST): creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors, not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials-related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches supported are limited to too-fundamental single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or, alternatively, work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community


Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, Inovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, Questek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Lab
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl Institute of Standards & Tech
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


MGI Efforts in Infrastructure Development at NIST
Jim Warren, National Institute of Standards and Technology

The presentation provided an overview of NIST activities supporting the creation of a materials data infrastructure. The infrastructure could start by defining the interfaces/APIs (Application Programming Interfaces) with which a user is able to interact for the deposit or retrieval of data. Beyond an interface, the following are essential elements for creation of a viable infrastructure:

• The establishment of repositories to store the data

• Tools to enable digital capture of the information, preferably at the time of creation (computation or experiment)

• Tools to enable markup of the data with sufficient metadata to inform someone else how the data was created, as well as various attributes that would help the user judge the quality of the information

• Assignment of a persistent digital identifier (like the DOI (Digital Object Identifier) for journals) so the data can be cited and discovered by others

• The registration of the availability of the data into some sort of "registry" to enable discovery without prior knowledge of the existence of the repository/specific data features
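The elements listed above can be sketched, very loosely, as a metadata record that carries a persistent identifier and is entered into a searchable registry. This is not NIST's design: the `DatasetRecord` and `Registry` classes, their field names, the `pid:` prefix, and the example URL are all hypothetical, and a random UUID merely stands in for a DOI-like identifier.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    title: str
    creator: str
    method: str           # how the data was created (experiment or computation)
    repository_url: str   # where the data lives (hypothetical address)
    keywords: list = field(default_factory=list)
    # Stand-in for a persistent identifier; a real system would mint a DOI or similar.
    identifier: str = field(default_factory=lambda: f"pid:{uuid.uuid4()}")


class Registry:
    """Index of records, so data can be found without knowing its repository."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        self._records[record.identifier] = record
        return record.identifier

    def resolve(self, identifier):
        return self._records[identifier]

    def search(self, keyword):
        wanted = keyword.lower()
        return [r for r in self._records.values()
                if wanted in (k.lower() for k in r.keywords)]


registry = Registry()
record = DatasetRecord(
    title="Tensile tests, aged plate",
    creator="Example Lab",
    method="experiment: uniaxial tension",
    repository_url="https://repository.example.org/datasets/1",
    keywords=["Al 6061", "tensile"],
)
pid = registry.register(record)
print(pid, [r.title for r in registry.search("al 6061")])
```

Separating the identifier and the registry entry from the repository itself is what lets a dataset be cited and discovered even by users who have never heard of the repository that hosts it.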

Additionally, creation of new policy and standards, as well as development of tools to evaluate the quality of the data and tools to enable data-driven discovery, will be required to manage and use materials data to the greatest extent possible.

NIST is responding to the OSTP (Office of Science and Technology Policy) memos (Dr. John Holdren, 2 Feb 14) by developing practices that will be of use to the community, including a Common Access Platform allowing access across multiple collections of information, research and development of Persistent Identifier (PID) infrastructure with the Research Data Alliance (RDA), and R&D on Data Type Registries with RDA.

Finally, NIST is collaborating with the National Data Service (based at the National Center for Supercomputing Applications) and is developing a set of best practices for Data Management Plans.


The Materials Commons: A Novel Information Repository and Collaboration Platform for the Materials Community
Brian Puchala, Glenn Tarcea, Stravya Tamma, Emmanuelle Marquis, John Allison, University of Michigan

Critical to accelerating the pace of materials science and development is the development of new and improved infrastructure providing a seamless way for materials researchers to share and use materials data and models. To address this need, we are developing the Materials Commons (httpprismsenginumicheduprismsmaterialscommons), an information repository and collaboration platform for the metals community in selected technical emphasis areas. We envision the Materials Commons becoming a continuous, seamless part of the scientific workflow process. Researchers will upload the results of experiments and computations as they are performed, automatically where possible, along with the provenance information describing the experimental and computational processes. By associating this data with the experimental and virtual computational materials samples from which it is obtained, the Materials Commons will build process-structure-property relationships, enabling the construction and validation of constitutive and process models. The Materials Commons website provides an easy-to-use interface for uploading and downloading data and data provenance, and for searching and sharing data. At its core, the Materials Commons consists of a secure data storage cluster, an application for efficiently uploading and downloading large data sets, and a REST (REpresentational State Transfer) based API to access and extend the capabilities of the repository. The API allows for features such as automated data upload from experiments and computations, seamless integration of computational models, and algorithmic analysis of process-structure-property relationships. The Materials Commons is a central thrust of the Center for PRedictive Integrated Structural Materials Science (PRISMS).

NIST Materials Data Informatics Efforts

Carrie Campbell, National Institute of Standards and Technology

The presentation covered NIST's pilot efforts in creating a data infrastructure focusing on Phase-Based Property Data. Phase-based data are diverse data that include 1-, 2-, 3-, and 4-dimensional data, are semi-structured, and often include incomplete data sets. The figure below provides an overview of NIST's data efforts supporting the MGI.


The NIST Materials Data Repository (https://materialsdata.nist.gov/dspace/xmlui) is a customized DSpace repository for materials that enables sharing of a variety of data types, including text files, images, and video. The repository also provides a persistent identifier for each entry and allows users to choose from a variety of license types. NIST is also working on two data curation tools for phase-based data. The Thermodynamics Research Center (http://trc.nist.gov) is extending the ThermoML markup language and Guided Data Capture software tool to metallic systems, with an initial emphasis on phase equilibria and thermochemical data. The Materials Data Curation System (MDCS) employs user-defined XML-based schemas to curate data. The MDCS allows either web-based data curation or curation through a REST API. The entered data is stored in a non-relational database repository, can be searched using a SPARQL (SPARQL Protocol and RDF Query Language) interface, and can be integrated with a variety of scientific workflow tools.
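The idea of curating data against a user-defined XML-based schema can be sketched with Python's standard library. The element names and the required-field list below are hypothetical and do not reflect the actual MDCS schemas or its REST API; a production system would more likely validate against a full XML Schema (XSD) document.

```python
import xml.etree.ElementTree as ET

# Hypothetical "schema": element paths that every curated record must contain.
REQUIRED_PATHS = ["material/name", "phase", "property/value", "property/units"]


def missing_fields(xml_text):
    """Return the required element paths that are absent or empty in a record."""
    root = ET.fromstring(xml_text)
    missing = []
    for path in REQUIRED_PATHS:
        element = root.find(path)
        if element is None or not (element.text or "").strip():
            missing.append(path)
    return missing


# Placeholder record for illustration; the values are not real measurements.
record = """
<record>
  <material><name>Al-Cu alloy</name></material>
  <phase>fcc</phase>
  <property><value>1.2</value></property>
</record>
"""
problems = missing_fields(record)
```

Here the record lacks units, so the check reports that one path; rejecting or flagging such records at entry time is what keeps semi-structured, often incomplete phase-based data usable later.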

AFLOW Consortium

Marco Fornari, Michigan State University

The overall goal of the Materials Genome Initiative (MGI) is to accelerate the discovery, development, and deployment of new materials with ad hoc functionalities. The AFLOW Consortium focuses on the software infrastructures to achieve the MGI goals by using first principles calculations. The emphasis is on the data that must be generated, archived, validated, and post-processed efficiently, both to identify new materials and to distill meaningful quantitative relationships within the data. This extends the solution to the inverse problem of chemically replacing rare and costly elements in critical technologies.

The AFLOW Consortium provides access to the repository AFLOWLIB.ORG <http://AFLOWLIB.ORG> that publicly distributes data on energetics and thermodynamic stability for more than 600,000 compounds and band structures for roughly 300,000 materials. The repository includes bulk crystals and alloys. Data were computed with the AFLOW high-throughput framework using density functional theory. The results of the calculations can be queried with the recently developed REST-API that standardizes and labels the data for retrieval purposes. The fields defined in the AFLOW REST-API include a material object identifier (AUID), data cloud location (AURL), ownership and sponsorship, keywords, and data-specific labels (COMPOSITION, ENTROPY, EGAP, …). Simple scripting languages can be used to automatically download from the AFLOWLIB repository. The AFLOW approach has been extensively applied to platinum-group alloys, thermoelectric materials, magnetic materials, topological insulators, scintillators, etc. The AFLOW REST-API can be extended to include experimental data. For further details on the database, the reader is directed to http://materials.duke.edu/auro/AUROARTICULA/j.commatsci.2012.02.002.pdf
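Because the REST-API labels each datum with a keyword, a short script can turn a response into a dictionary. The sketch below assumes a plain-text response made of `keyword=value` pairs separated by `|`; that layout, the sample string, and its values are illustrative assumptions, not data retrieved from AFLOWLIB. No network call is made.

```python
def parse_entry(text):
    """Parse 'keyword=value | keyword=value' text into a dict keyed by upper-case label."""
    entry = {}
    for chunk in text.split("|"):
        chunk = chunk.strip()
        if not chunk or "=" not in chunk:
            continue  # skip empty or malformed pieces
        key, value = chunk.split("=", 1)
        entry[key.strip().upper()] = value.strip()
    return entry


# Placeholder response; the identifiers and numbers are made up for illustration.
sample = "auid=example:0001 | aurl=example.org:DATA/entry | composition=1,3 | egap=0.5"
entry = parse_entry(sample)
band_gap = float(entry["EGAP"])
```

Keying the dictionary on the standardized labels (AUID, AURL, EGAP, and so on) is what lets the same few lines work across every entry in a labeled repository.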

Materials Innovation Infrastructure: Lessons Learned and Future Plans

Surya Kalidindi, Georgia Institute of Technology

A central activity in materials research is the exploration of processing-structure-property (PSP) linkages. Workflow is defined as the sequence of steps employed for establishing PSP linkages of interest to any specific engineering/technology application.

High Throughput Explorations of PSP Linkages: We must treat a hierarchical material as a complex system, which means embracing and exploiting uncertainty quantification and complexity management concepts/tools. It also means we need to design, develop, and validate decision support systems that will leverage the best current understanding (with its uncertainties) and provide objective guidance on future effort investment (combination of experiments and simulations). This means leveraging Data Science to effectively feed the decision support systems described above by developing and deploying high throughput strategies for obtaining high value information from both experiments and models.

What are integrated workflows? Integrated workflows are those that utilize the best combination of experiments and simulations in extracting robust and reliable PSP linkages. This requires engaging and exploiting cross-disciplinary expertise that includes materials science, manufacturing, systems approaches, uncertainty quantification, computational science, and data and information sciences. Through this, one must ensure that the workflows output the critical information needed by design and manufacturing stakeholders in the materials development value chain. Integration demands expertise sharing, which essentially is the core element of a collaboration. Cross-disciplinary integration demands effective collaborations, such as those one might set up using the modern tools of the day, i.e., an e-Collaboration.


Motivators for e-Collaborations: e-Collaborations are central to enhancing the productivity of materials development activities in several ways:

• individual research groups can identify and engage complementary expertise needed for their projects
• organizations can coordinate multiple teams and multiple projects to maximize output/impact
• organizations can identify and seek the right partners that enhance their individual and combined value

e-Collaborations can be open, semi-open, or closed depending on the needs of the specific organization. Productivity enhancement leading to dramatic reductions in cost and time expended in materials development is the main incentive for e-collaborations.

Main Impediments to e-Collaborations

Lack of tools to identify the "best" collaborators

• Need software tools that quantify more precisely the expertise of individuals/groups/organizations
• Need software tools for allowing rich annotations and e-recording of discussions
• Need analytic tools for identification, extraction, and management of metadata (i.e., high value abstractions from the raw data)
• Need searchable semantic metadata databases (these are lightweight and can be centralized and customized for specific user communities)
• Need software tools for tracking provenance (i.e., who/what/when/how generated, modified, etc.)

Lack of experience with the use of integrated workflows

• Need software tools for e-recording of workflows and development of searchable workflow databases
• Need recommendation schemes that suggest the best workflow(s) based on recorded prior experience; this provides a means to transfer expertise acquired by an organization to other teams and other projects (even in the same organization)
• Leads to possible automation and dramatic enhancement of productivity
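The provenance requirement above (who/what/when/how generated or modified) amounts to an append-only log of events per dataset. A minimal sketch follows; the `ProvenanceEvent` fields, the `ProvenanceLog` class, and the example entries are hypothetical and are not taken from any of the platforms discussed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceEvent:
    who: str        # person or group responsible
    what: str       # dataset or code affected
    how: str        # tool or workflow step used
    action: str     # "generated" or "modified"
    when: datetime  # timestamp of the event


class ProvenanceLog:
    """Append-only record of events, searchable by dataset."""

    def __init__(self):
        self._events = []

    def record(self, who, what, how, action, when=None):
        event = ProvenanceEvent(who, what, how, action,
                                when or datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def history(self, what):
        """Return all events for one dataset in chronological order."""
        return sorted((e for e in self._events if e.what == what),
                      key=lambda e: e.when)


log = ProvenanceLog()
log.record("group-a", "dataset-1", "microscopy", "generated",
           datetime(2014, 1, 1, tzinfo=timezone.utc))
log.record("group-b", "dataset-1", "segmentation script", "modified",
           datetime(2014, 2, 1, tzinfo=timezone.utc))
steps = [e.action for e in log.history("dataset-1")]
```

A recorded history of this kind is also the raw material for the workflow databases and recommendation schemes mentioned in the second group of bullets.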

MINED Group's Past Experience (~2005 - ~2008): Funded by the Army Research Office, ARO (David Stepp); engaged computer science (CS) graduates to produce a single-user platform that integrated certain software codes previously produced by our group; software development was subsequently abandoned, but two CS graduates were recruited into PhD programs in Materials Science and Engineering/Mechanical Engineering (MSE/ME), which dramatically altered the direction of research in the group.

Brief chronology of development:

• Senior Design Project (2011-2012): An online portal designed for sharing data files and tools; learned the importance of user interfaces and the need for professional code developers and software architects
• HUBzero Instantiation (2012-2013): Built hub by customizing HUBzero for hierarchical structural materials; there are several advantages in terms of accessing and executing codes (on the server side), but there are significant barriers to establishing fast and productive cross-disciplinary e-collaborations


• GTRI Integration Platform (2013-2014): Engaged professional code developers and software architects to design a multi-user web portal that allows data and code sharing while tracking provenance and workflows; underestimated the resources needed for such a project
• MATIN Collaboration Environment (2013-2014): Lightweight integrated software shell that allows users to seamlessly interface with numerous cloud services offered through the web through their APIs. Examples include GitHub for versioning and archival of codes, Authorea for collaborative editing of documents, figshare for citable publication of research, Plotly for collaborative data analysis and visualization, and Google+ and LinkedIn for e-teaming and networking. MATIN focuses on building an emergent community: Materials & Manufacturing Informatics (MMI). MATIN provisions web services for software integration platforms and innovation infrastructure.

Examples of MATIN:

• Spatial Statistics: https://github.com/tonyfast/SpatialStatisticsFFT
• PyMKS: http://openmaterials.github.io/pymks/index.html
• Example discussion about periodic boundary conditions in running Finite Element (FE or FEM) simulations: https://github.com/sfepy/sfepy/issues/267

In the platform developed, it is noted that the user or user's organization can set permissions at the level of individual datasets/codes on the degree and type of sharing.

Central Challenge: Activation. The transition from traditional disciplinary workflows to the integrated workflows needs to overcome an activation barrier. There is no clear incentive to get ahead of the curve in anticipating the upcoming transformations in how materials development will be pursued in the future. Open (reproducible) science alone is not a sufficiently strong motivator for a materials development community with strong interests in intellectual property development; indeed, it might actually be perceived as a negative. There is a need to explore and refine what incentives work for individual researchers (e.g., increased productivity).

Future Plans: Continue to develop and deploy MATIN at GT and any other organizations/groups that express an interest in working with us. Design, create, and distribute a suite of "mobile first" Apps designed to make data ingest into collaboration environments such as MATIN, from both material characterization equipment and multi-scale models, an enhanced user experience with friendly and natural interfaces. Proactively promote/incentivize data/metadata ingest into publicly shared databases. Work with professional societies to announce and run new competitions/awards for recognizing and rewarding good practices by the members of the broader materials innovation user community (e.g., Best Documented Data Award, Best Documented Tool Award, Best Documented Workflow Award). Develop and deliver new workshops for training the new generation of graduate students, postdocs, and junior faculty in the emerging workflows of Materials & Manufacturing Informatics.

High Performance Cast Aluminum Alloys for Next Generation Passenger Vehicle Engines

Dongwon Shin, Materials Science and Technology Division, Oak Ridge National Laboratory


Oak Ridge National Laboratory (ORNL) teamed with Chrysler and Nemak to develop next-generation high-performance cast aluminum alloys that have improved castability, high-temperature strength, and fatigue performance. At this point, the data management practice is limited to analytical data spreadsheets, other than the CALPHAD computational thermodynamic databases and proprietary property databases for casting simulation. As the project goes on, a more robust data schema will be developed to store various types of data populated from the experiments and calculations.

Besides the data structure design, the speaker discussed the necessity of a new approach to identifying key descriptors from the large dataset that can guide the design of new materials. An alloy design requires the balance of extremely complex relationships among a large number of descriptors. Some examples of these descriptors include phase fraction of key intermetallics, lattice parameters of the matrix, and diffusion kinetics of solute atoms. However, optimizing these quantities relies on good physical metallurgy intuition coupled with clever experiments, dogged sleuthing, and a healthy dose of luck. Such descriptor-based predictions of materials properties date back to the beginning of modern solid-state chemistry and physics. As shown in the Darken-Gurry map to predict the solubility of elements in tantalum, intuitive descriptors such as electronegativity and radius could be used for materials design. However, as the chemistry and physics of materials become more complex, intuition-based search may not work anymore. Thus, the use of more advanced statistical analysis and data mining approaches might be required to accelerate the design of complex multi-component/multi-phase alloys.

Data Related to Cast Aluminum

Jake Zindel, Ford Motor Company

The main focus of our research is the development of CAE (Computer-aided Engineering) tools to support two significant activities in the company: component design (product development) and manufacturing engineering (high-volume production). A key part of our research is conducting experiments to collect data for the development and validation of the CAE tools.

The data we collect consists of mechanical properties, physical properties, microstructural images, and manufacturing process parameters that form the pedigree of the material. This information can be in the form of Excel files, proprietary-software storage files, screen shots, and written reports.

Our immediate need is to deal with this large amount of existing information collected over many years on a "local" level. The first step is to organize many independent collections of information in all their forms. The next issue is the time required to enter all this information into the database. Maintenance may also require a significant amount of effort. Lastly, there will also be a need to manipulate the data contained in the database using Excel, Minitab, MATLAB, etc. Therefore, links or easy exportation of data to these types of applications will be necessary.

The next level would involve dealing with information that has been created or collected by others. From an OEM (Original Equipment Manufacturer) perspective, this would involve convincing suppliers to share processing information that they may consider proprietary. In addition to suppliers, universities and national laboratories are also sources of data. Data from all the sources will require some sort of protocol to determine its quality and resolve inconsistencies.


In the end, it seems likely that a considerable budget and time will be required to implement comprehensive information databases, which could be the largest hurdle of them all.

QuesTek's MGI/ICME Approach to Alloy Optimization and Design (Including Cast Iron)

Jason Sebastian, QuesTek Innovations LLC

Integrated Computational Materials Engineering (ICME) methods involve the holistic application of different computational models across various length scales to the design, development, and rapid qualification of advanced materials. Using its Materials by Design technologies, QuesTek has successfully applied these ICME methods to the design, development, and full qualification of several advanced alloys (including aerospace structural steels, aerospace gear steels, advanced castable titanium alloys, high-strength corrosion-resistant aluminum alloys, and high-performance cast iron).

QuesTek's ICME methods benefit tremendously from the activities of the US Materials Genome Initiative (MGI). Recognizing that the "10 to 20 years" that it takes for alloy design and development is hindering the rapid deployment of higher-performance materials, the MGI is focused on "cutting in half" the time and cost to design and deploy novel advanced materials needed to enhance and sustain US competitiveness.

QuesTek also has ongoing activities on the application of ICME methods to the design and development of materials for the energy sector, including advanced materials that enable vehicle lightweighting. Of particular note is an ongoing effort with a major diesel engine manufacturer on the development of high-performance cast iron alloys for engine block applications. QuesTek's efforts have made extensive use of various MGI-type tools and databases, including 1) density functional theory methods for high-throughput screening of cast iron inoculant stabilities (structure, lattice parameter, stability of various oxide/sulfide phases), 2) liquid iron mobility databases (to help accurately model solidification phenomena), and 3) thermodynamic databases (to examine nano-precipitate stability).

Moving forward, to continue to accelerate the materials development cycle, efforts should remain focused on the development of the fundamental databases (thermodynamic, kinetic, etc.) that enable ICME models. There should also be increased focus on efforts related to the integration of the various computational databases, models, and tools (the "I" in ICME).

Integrated Computational Materials Engineering (ICME) of Generation Three Advanced High Strength Steels

Louis G. Hector Jr., General Motors R&D Center

While traditional material development methods often involve laborious trial-and-error testing, Integrated Computational Materials Engineering (ICME), which falls under the umbrella of the Materials Genome Initiative, is offering a means for materials development under a more compressed time scale. Advanced high strength steels (AHSS), which contain multiple phases (e.g., ferrite, bainite, pearlite, austenite, martensite) and may exhibit phase transformation with strain, are fertile ground for ICME. However, prediction of macro-scale constitutive behavior based upon the multi-scale physical, chemical, and mechanical phenomena in these complex microstructures is a formidable challenge for ICME. Unprecedented collaboration between universities, industry, and government labs will be required to address AHSS development, archiving of data, and the implementation of these materials into products that benefit the American consumer. This presentation began with a brief overview of a new DOE/USAMP (US Automotive Materials Partnership)-funded ICME program aimed at the development of Generation Three (Gen 3) AHSS. The lack of commercial Gen 3 steels that meet DOE targets is offering significant challenges to this new ICME program. The program is meeting these challenges following four thrusts: (1) making Gen 3 steels to meet DOE targets; (2) passing experimental measurements of key properties to an ICME model that produces microstructure-based constitutive models; (3) material evaluation in forming and vehicle performance simulations; (4) archiving all relevant data for easy use by others in the program. Although the Gen 3 AHSS ICME program is "mission oriented," it demonstrates an important role for universities in providing the necessary foundation in fundamental materials science and engineering.

Integrated Computational Methods for Composites Materials (ICM2): Lessons Learned from Integration Planning and Feasibility Demonstrations

Lara Liou, GE Aviation

General Electric Aviation and Lockheed Martin Aeronautics have teamed through an Air Force Research Lab program to conduct a demonstration of an Integrated Computational Materials Engineering digital framework that links material processing, property, and structural relationships to account for processability, manufacturability, and system performance, with the goal of demonstrating that the usage of integrated models can contribute to future airframe and engine designs with dramatic reductions in development time and cost. Both engine and airframe Foundational Engineering Problems will be demonstrated through a common digital framework to be developed and validated on increasingly complex articles. The modeling tools that cover composite process models, localized mechanical behavior models, and global design models (Convergent Manufacturing Technology's COMPRO CCA, Autodesk's Autodesk Simulation Composite Analysis, and an Abaqus based University of Michigan micromechanics model) are centrally linked via a commercially available model integration interface (Phoenix Integration's ModelCenter®). Detailed plans of what data will be passed among the various models, as well as how it will be passed, have been completed. Creation of these detailed process maps and simple feasibility demonstrations has raised inherent technical and logistical challenges to integrating commercially available multi-scale models. Robust model integration methods to address these challenges were discussed.

The exercise of drawing up detailed process maps for what to integrate and how to integrate the selected tools and models resulted in key early lessons learned and identification of tools critical to long-term, robust, and sustained application of Integrated Computational Methods for Composites Materials (ICM2). The key lessons learned include:

• Individual models must be relatively mature and validated before an ICM2 problem can be confidently defined
• FE analysis-based tools within the same digital thread should utilize the same FE solver
• The team ideally should consist of at least one expert for each tool and its underlying model, as well as an expert in the digital thread and integration interface tools and programming


• Mesh interpolation between finite element analysis based tools will likely be required, and a commercially available tool for this type of composite mesh interpolation is key for robust and sustained application of ICM2

Residual Stress Engineering of Rotating Nickel Components

Terry Wong, Aerojet Rocketdyne

The PW-8 project, the Nickel Base Superalloy Residual Stress Foundational Engineering Problem program, aims to develop a multi-disciplinary analysis approach to predict and incorporate bulk residual stress in the design of a component. This will allow for the design of a part that is optimized for a particular application or design intent (such as weight, performance, or reduction of scrap). To accomplish this goal, the PW-8 team has two major tasks: 1) the development of an infrastructure and associated tools needed to predict and incorporate bulk residual stress in the design stages of a part, and 2) a demonstration task that will validate the infrastructures and tools developed.

The Data Management task is the foundational element of the larger infrastructure development task. The goal for the data management work is to develop an infrastructure that allows the various team members in PW-8 to share data and to store data in a database. This not only ensures that data generated by this program is never lost but also allows for data to be transferred in a secure manner. In order to meet the goal of a system that securely stores and shares data, three major tasks must be accomplished: 1) validation that there is a secure method to transfer data between team members and that the method of storing data is also secure, 2) development of a database schema, and 3) development of a means to easily upload data into the database and download data from the database.

Of the three, the most difficult task was to create a system in which the transfer and storage of data was secure. Various proposals were made to ensure the security of the data, but many of them were rejected, not because the proposal was actually insecure but because it was not acceptable under each team member's IT security policies. The process of getting each team member's IT department to approve the security of the data transfer and storage took more effort than anticipated. In the end, the team members, along with their respective IT departments, approved the use of TRUcentrix as a tool to transfer data and the use of an ASM International hosted database to store data.

The development of a database schema is a necessary step in creating the database to store all the data generated (via physical tests as well as data generated by models). A big help in creating the database schema was to use the schema generated by Dr. Steven Arnold from NASA's Glenn Research Center. Using NASA's schema eliminated the need to start the database build from scratch and allowed us to focus on the specific needs of storing residual stress data.

The last task involves the development of tools that make the input of data easy. The more difficult it is to enter data into a database, the more likely data will not be entered. Excel-based templates are being developed in order to make data entry as easy as possible.
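One reason templates ease data entry is that they fix the column names, so an upload can be checked automatically before it reaches the database. The sketch below assumes the Excel template is exported to CSV; the column names are hypothetical and are not the PW-8 schema.

```python
import csv
import io

# Hypothetical template columns; numeric columns must parse as floats.
REQUIRED_COLUMNS = ["specimen_id", "location_mm", "residual_stress_mpa"]
NUMERIC_COLUMNS = ["location_mm", "residual_stress_mpa"]


def check_template(csv_text):
    """Return a list of human-readable problems found in a filled-in template."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if problems:
        return problems
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column in NUMERIC_COLUMNS:
            try:
                float(row[column])
            except (TypeError, ValueError):
                problems.append(f"row {row_number}: {column} is not a number")
    return problems


# Placeholder rows for illustration only; not measured data.
filled = "specimen_id,location_mm,residual_stress_mpa\nS1,0.5,-120\nS2,1.0,n/a\n"
issues = check_template(filled)
```

Reporting problems by spreadsheet row number keeps the feedback in terms the person filling in the template already uses.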

The two big lessons learned from PW-8 in relationship to data management are:

• It is essential to work with each company's IT and IT Security people to develop a system that satisfies the security requirements of each company


• Database schema development should not be reinvented but should begin with an already established schema

Perspectives on MAI Programs

Dr. Mike Glavicic, Rolls Royce

The current MAI (AFRL's Metals Affordability Initiative) programs are structured independently from one another, in that a standard methodology does not exist that describes what data is to be archived at the conclusion of a program. Nor does there exist a standard protocol for how to archive things on a server or some other computer resource at the conclusion of a program. As a result, the current practices of MAI programs:

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program
• Do not have a formalized requirement of how data is to be packaged and stored at the end of a program. Hence, even in the best of cases in which all of the data is stored, the data is archived in a simple folder fashion that is not searchable in any manner. Moreover, even if all of the data is present, it is extremely cumbersome to untangle the data for future use using a file folder strategy

Hence, at the end of the program, both the government and the MAI team are left to manage the data and results in an unregimented fashion.
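The gap between a plain folder archive and a searchable one can be narrowed with a manifest that records, for every file, which program produced it and what kind of data it holds. A minimal sketch follows, assuming a hypothetical `<program>/<data type>/<file>` folder layout; neither the layout nor the names come from the MAI programs.

```python
import tempfile
from pathlib import Path


def build_manifest(archive_root):
    """Walk the archive and return one searchable entry per file."""
    root = Path(archive_root)
    manifest = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            parts = path.relative_to(root).parts
            manifest.append({
                "program": parts[0] if len(parts) > 1 else "",
                "data_type": parts[1] if len(parts) > 2 else "",
                "file": path.name,
                "path": path.relative_to(root).as_posix(),
            })
    return manifest


def search(manifest, **criteria):
    """Return entries whose fields equal every given criterion."""
    return [e for e in manifest
            if all(e.get(k) == v for k, v in criteria.items())]


# Demonstration on a throwaway directory with placeholder names.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["program-a/raw/test1.csv", "program-a/reports/final.txt",
                "program-b/raw/test2.csv"]:
        target = Path(tmp, rel)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("placeholder")
    manifest = build_manifest(tmp)
    raw_files = [e["file"] for e in search(manifest, data_type="raw")]
```

A manifest only helps if every program follows the same layout, which is the reason a common packaging protocol matters more than any particular tool.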

The lack of a regimented protocol for how to package a program's results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal of integrating the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models, and other information that were required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted to the recreation of data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. Although the RR-10 program was able to overcome these issues and become a successful program, other programs that encounter similar problems will likely not enjoy the same success, as RR-10 was very fortunate to have the means to overcome these serious issues that were a result of the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features that will be required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ
Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and Suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals, Inc.) are working together on the PW-11 Team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security, and access control, along with the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework (GE Forge, Vanderbilt Forge, and Semantic Web technology), and one technology will be selected for the next phase of the program.


Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single-database searches to searches across multiple, disparate databases, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work upfront for implementation, as it needs the development of a common vocabulary and relationships.

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST, and develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models, and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
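As a rough illustration of what "flowing" uncertainty through interconnected models can look like, the following sketch uses plain Monte Carlo sampling over a two-model chain (process to microstructure, microstructure to property). It is not the PW-11 method: the model forms, coefficients, scatter levels, and the names `microstructure_model`, `property_model`, `propagate`, `summarize`, and `pearson` are all hypothetical and chosen only to make the idea concrete.

```python
import random
import statistics


def microstructure_model(cooling_rate, rng):
    """Hypothetical process -> structure link: grain size (um) vs. cooling rate (K/s)."""
    grain_size = 50.0 / (cooling_rate ** 0.5)
    return grain_size * rng.gauss(1.0, 0.05)  # assumed 5% model scatter


def property_model(grain_size, rng):
    """Hypothetical structure -> property link: Hall-Petch-style yield strength (MPa)."""
    strength = 200.0 + 600.0 / (grain_size ** 0.5)
    return strength + rng.gauss(0.0, 10.0)  # assumed 10 MPa model scatter


def propagate(n_samples=5000, seed=0):
    """Sample the process input and push each sample through both models in turn."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        cooling_rate = max(rng.gauss(10.0, 1.5), 0.1)  # assumed process variability
        grain_size = microstructure_model(cooling_rate, rng)
        strength = property_model(grain_size, rng)
        rows.append((cooling_rate, grain_size, strength))
    return rows


def summarize(rows):
    """Return (mean, standard deviation) for each stage of the chain."""
    names = ("cooling_rate", "grain_size", "yield_strength")
    return {name: (statistics.mean(col), statistics.stdev(col))
            for name, col in zip(names, zip(*rows))}


def pearson(xs, ys):
    """Pearson correlation, used to relate variability at one stage to another."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    rows = propagate()
    for name, (mean, std) in summarize(rows).items():
        print(f"{name}: mean={mean:.2f}, std={std:.2f}")
    r = pearson([row[0] for row in rows], [row[2] for row in rows])
    print(f"correlation(cooling_rate, yield_strength) = {r:.2f}")
```

Sampling is only one option; it is used here because it needs no assumptions about the linearity of the chained models, at the cost of many model evaluations.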

ASM's Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project
Scott D. Henry, ASM International

ASM International, the world's largest association of metals-focused materials engineers, scientists, technicians, educators, and students, has launched a series of initiatives under the "Computational Materials Data (CMD) Network" umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design Ltd. The main objectives of SMDDP are to:


• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system.

• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets.

• Develop and carry out a series of test problems that represent relevant use cases for the repository.

• Make data open to the materials data community for use in data analytics, modeling, and educational purposes.

• Actively engage the materials data community and widely disseminate the findings from the project.

• Develop and implement data capture and curation procedures that can serve as models for other data repositories.

The basic philosophy behind the project is that standards often emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project, launched in late 2013 and planned as an 18-month program, is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future
Ben Blaiszik(1), Kyle Chard(1), Jim Pruyne(1), Rachana Anathakrishnan(1), Steve Tuecke(1,2), Ian Foster(1,2)

(1) University of Chicago, Computation Institute; (2) Argonne National Laboratory, MCS Division

During this talk we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.

Semantics for Deep Interoperability
Samuel Chance, iNovex

Semantic technology is based on mature web standards and is intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented, starting with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web riding atop the same infrastructure (HTTP). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis on the fact that the technology has a foundation in formal logic that enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon the adoption of semantic technologies and things to consider when they are being evaluated as a solution for a community or enterprise.
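The way "triples" link to each other to form a web of data can be sketched in a few lines. This is a minimal, library-free illustration rather than an RDF implementation: every identifier (the `ex:` names, the predicates, and the numeric value) is made up for the example, and the helper names `build_index` and `reachable` are hypothetical.

```python
from collections import defaultdict

# Each fact is a (subject, predicate, object) triple.
# All identifiers and values below are invented for illustration.
TRIPLES = [
    ("ex:Sample42", "ex:madeOf", "ex:Al6061"),
    ("ex:Al6061", "ex:memberOf", "ex:AlMgSiSystem"),
    ("ex:Sample42", "ex:measuredBy", "ex:TensileTest7"),
    ("ex:TensileTest7", "ex:yieldStrengthMPa", "276"),
]


def build_index(triples):
    """Group outgoing (predicate, object) pairs by subject."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def reachable(index, start):
    """Follow links from `start`; the object of one triple may be the subject of another."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(obj for _, obj in index.get(node, []))
    return seen


if __name__ == "__main__":
    index = build_index(TRIPLES)
    for node in sorted(reachable(index, "ex:Sample42")):
        print(node)
```

Because the object of one triple doubles as the subject of another, a single starting node leads to everything connected to it, which is the property that lets independently published data sets join into one graph when they share a vocabulary.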


Big Data and Semantic Technologies
Harlan Shober, RJ Lee Group

The presentation started with a description of the entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS).

To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing, characterization, and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described; it consisted of several elements: contribute, index, search, analyze, and archive. With the needed functionality in mind, a string of universal scientific data management process categories was then presented, with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for the Resource Description Framework (RDF) were applied to a notional example with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described."

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
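To make the first of these steps concrete, the sketch below shows a very small rule-based tokenizer followed by a coarse token labeler. It is an assumption-laden toy, not a method from the presentation: the regular expression, the `UNITS` set, and the function names `tokenize` and `label` are all hypothetical, and real NLP pipelines use trained models for the later steps (part-of-speech tagging, parsing, role labeling).

```python
import re

# Numbers (with optional decimal part), words that may contain internal hyphens
# (so alloy-system names such as "Al-Mg-Si" stay whole), or single punctuation marks.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+(?:-[A-Za-z0-9]+)*|[^\sA-Za-z0-9]")

# Assumed, deliberately tiny unit vocabulary for illustration only.
UNITS = {"MPa", "GPa", "K", "nm", "um", "mm"}


def tokenize(text):
    """Split text into number, word, and punctuation tokens."""
    return TOKEN_RE.findall(text)


def label(token):
    """Assign a coarse label; a stand-in for real tagging, not a replacement for it."""
    if re.fullmatch(r"\d+(?:\.\d+)?", token):
        return "NUM"
    if token in UNITS:
        return "UNIT"
    if re.fullmatch(r"[A-Za-z]+(?:-[A-Za-z0-9]+)*", token):
        return "WORD"
    return "PUNCT"


if __name__ == "__main__":
    sentence = "Al-Mg-Si alloys reach 276 MPa."
    for token in tokenize(sentence):
        print(token, label(token))
```

Even this toy hints at why materials text is awkward for general-purpose tools: hyphenated compositions, numbers, and units carry most of the meaning and need domain-aware handling.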

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations, prevalence of tables, graphs, and images (more of a computer vision problem), and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.


The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.


Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were "rolled up" from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often-fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; a data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto-ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data, standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management: consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose


- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management; difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most-important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure: evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel


- Data associated with publications should be shared along with the publications. Not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)

Government-Led Priorities

1. Lead data strategy/approach development
- Hold a meeting like this one in 6-12 months with the purpose of developing a roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus, e.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most-important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools that make data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating a standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming a consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure


4. Support data sharing and deposition (provide resources)
- Increase % of MGI funds dedicated to MII development to 10%: repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST): creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors; not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials-related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches supported are limited to overly fundamental single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or alternatively work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community


Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, Inovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp.
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept. of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, Questek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl. Institute of Standards & Tech.
Jeff Williams, GE Aviation
Terry Wong, Rockedyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


The Materials Commons: A Novel Information Repository and Collaboration Platform for the Materials Community
Brian Puchala, Glenn Tarcea, Stravya Tamma, Emmanuelle Marquis, John Allison, University of Michigan

Critical to accelerating the pace of materials science and development is the development of new and improved infrastructure providing a seamless way for materials researchers to share and use materials data and models. To address this need, we are developing the Materials Commons (http://prisms.engin.umich.edu/prisms/materialscommons), an information repository and collaboration platform for the metals community in selected technical emphasis areas. We envision the Materials Commons becoming a continuous, seamless part of the scientific workflow process. Researchers will upload the results of experiments and computations as they are performed, automatically where possible, along with the provenance information describing the experimental and computational processes. By associating this data with the experimental and virtual computational materials samples from which it is obtained, the Materials Commons will build process-structure-property relationships, enabling the construction and validation of constitutive and process models. The Materials Commons website provides an easy-to-use interface for uploading and downloading data and data provenance, and for searching and sharing data. At its core, the Materials Commons consists of a secure data storage cluster, an application for efficiently uploading and downloading large data sets, and a REST (REpresentational State Transfer) based API to access and extend the capabilities of the repository. The API allows for features such as automated data upload from experiments and computations, seamless integration of computational models, and algorithmic analysis of process-structure-property relationships. The Materials Commons is a central thrust of the Center for PRedictive Integrated Structural Materials Science (PRISMS).

NIST Materials Data Informatics Efforts
Carrie Campbell, National Institute of Standards and Technology

The presentation covered NIST's pilot efforts in creating a data infrastructure focusing on Phase-Based Property Data. Phase-based data are diverse data that include 1-, 2-, 3-, and 4-dimensional data, are semi-structured, and often include incomplete data sets. The figure below provides an overview of NIST's data efforts supporting the MGI.


The NIST Materials Data Repository (https://materialsdata.nist.gov/dspace/xmlui) is a customized DSpace repository for materials that enables sharing of a variety of data types, including text files, images, and video. The repository also provides a persistent identifier for each entry and allows users to choose from a variety of license types. NIST is also working on two data curation tools for phase-based data. The Thermodynamics Research Center (http://trc.nist.gov) is extending the ThermoML markup language and the Guided Data Capture software tool to metallic systems, with an initial emphasis on phase equilibria and thermochemical data. The Materials Data Curation System (MDCS) employs user-defined XML-based schemas to curate data. The MDCS allows data curation either through a web-based interface or through a REST API. The entered data is stored in a non-relational database repository, can be searched using a SPARQL (SPARQL Protocol and RDF Query Language) interface, and can be integrated with a variety of scientific workflow tools.

AFLOW Consortium
Marco Fornari, Michigan State University

The overall goal of the Materials Genome Initiative (MGI) is to accelerate the discovery, development, and deployment of new materials with ad hoc functionalities. The AFLOW Consortium focuses on the software infrastructures to achieve the MGI goals by using first-principles calculations. The emphasis is on the data that must be generated, archived, validated, and post-processed efficiently, both to identify new materials and to distill meaningful quantitative relationships within the data. This extends the solution to the inverse problem of chemically replacing rare and costly elements in critical technologies.

The AFLOW Consortium provides access to the repository AFLOWLIB.ORG <http://AFLOWLIB.ORG>, which publicly distributes data on energetics and thermodynamic stability for more than 600,000 compounds and band structures for roughly 300,000 materials. The repository includes bulk crystals and alloys. Data were computed with the AFLOW high-throughput framework using density functional theory. The results of the calculations can be queried with the recently developed REST-API, which standardizes and labels the data for retrieval purposes. The fields defined in the AFLOW REST-API include a material object identifier (AUID), data cloud location (AURL), ownership and sponsorship keywords, and data-specific labels (COMPOSITION, ENTROPY, EGAP, …). Simple scripting languages can be used to automatically download from the AFLOWLIB repository. The AFLOW approach has been extensively applied to platinum-group alloys, thermoelectric materials, magnetic materials, topological insulators, scintillators, etc. The AFLOW REST-API can be extended to include experimental data. For further details on the database, the reader is directed to http://materials.duke.edu/auro/AUROARTICULA/j.commatsci.2012.02.002.pdf

Materials Innovation Infrastructure: Lessons Learned and Future Plans
Surya Kalidindi, Georgia Institute of Technology

A central activity in materials research is the exploration of processing-structure-property (PSP) linkages. Workflow is defined as the sequence of steps employed for establishing PSP linkages of interest to any specific engineering/technology application.

High Throughput Explorations of PSP Linkages: We must treat a hierarchical material as a complex system, which means embracing and exploiting uncertainty quantification and complexity management concepts/tools. It also means we need to design, develop, and validate decision support systems that will leverage the best current understanding (with its uncertainties) and provide objective guidance on future effort investment (combination of experiments and simulations). This means leveraging data science to effectively feed the decision support systems described above by developing and deploying high-throughput strategies for obtaining high-value information from both experiments and models.

What are integrated workflows? Integrated workflows are those that utilize the best combination of experiments and simulations in extracting robust and reliable PSP linkages. This requires engaging and exploiting cross-disciplinary expertise that includes materials science, manufacturing, systems approaches, uncertainty quantification, computational science, and data and information sciences. Through this, one must ensure that the workflows output the critical information needed by design and manufacturing stakeholders in the materials development value chain. Integration demands expertise sharing, which essentially is the core element of a collaboration. Cross-disciplinary integration demands effective collaborations, such as those one might set up using the modern tools of the day, i.e., an e-Collaboration.

Motivators for e-Collaborations: e-Collaborations are central to enhancing the productivity of materials development activities in several ways:

• individual research groups can identify and engage complementary expertise needed for their projects

• organizations can coordinate multiple teams and multiple projects to maximize output/impact

• organizations can identify and seek the right partners that enhance their individual and combined value

e-Collaborations can be open, semi-open, or closed, depending on the needs of the specific organization. Productivity enhancement leading to dramatic reductions in cost and time expended in materials development is the main incentive for e-Collaborations.

Main Impediments to e-Collaborations:

Lack of tools to identify the "best" collaborators

• Need software tools that quantify more precisely the expertise of individuals, groups, and organizations

• Need software tools for allowing rich annotations and e-recording of discussions

• Need analytic tools for identification, extraction, and management of metadata (i.e., high-value abstractions from the raw data)

• Need searchable semantic metadata databases (these are lightweight and can be centralized and customized for specific user communities)

• Need software tools for tracking provenance (i.e., who/what/when/how generated, modified, etc.)

Lack of experience with the use of integrated workflows

• Need software tools for e-recording of workflows and development of searchable workflow databases

• Need recommendation schemes that suggest the best workflow(s) based on recorded prior experience; this provides a means to transfer expertise acquired by an organization to other teams and other projects (even in the same organization)

• Leads to possible automation and dramatic enhancement of productivity
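The provenance need listed above (who/what/when/how) can be made concrete with a small data structure. The following is a minimal sketch, not a description of any tool discussed at the workshop: the class names ProvenanceEvent and Dataset, the field names, and the example entries are all hypothetical.

```python
"""Sketch: recording who/what/when/how for a dataset (hypothetical classes)."""
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceEvent:
    who: str        # person or group responsible
    what: str       # action, e.g. "generated" or "modified"
    how: str        # tool or method used
    when: datetime  # timestamp of the action


@dataclass
class Dataset:
    name: str
    history: list = field(default_factory=list)

    def record(self, who, what, how, when=None):
        """Append one provenance event; defaults to the current UTC time."""
        event = ProvenanceEvent(who, what, how, when or datetime.now(timezone.utc))
        self.history.append(event)
        return event

    def contributors(self):
        """Return contributors in order of first appearance."""
        seen = []
        for event in self.history:
            if event.who not in seen:
                seen.append(event.who)
        return seen


if __name__ == "__main__":
    ds = Dataset("example-microstructure-set")
    ds.record("alice", "generated", "FE simulation")
    ds.record("bob", "modified", "re-meshed")
    print(ds.contributors())
```

Keeping the history as an append-only list of immutable events is one simple design choice; it makes the record searchable and avoids silently rewriting earlier entries.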

MINED Group's Past Experience: (~2005 - ~2008) Funded by the Army Research Office, ARO (David Stepp); engaged computer science (CS) graduates to produce a single-user platform that integrated certain software codes previously produced by our group; software development was subsequently abandoned, but two CS graduates were recruited into PhD programs in Materials Science and Engineering/Mechanical Engineering (MSE/ME), which dramatically altered the direction of research in the group. A brief chronology of development:

• Senior Design Project (2011-2012): An online portal designed for sharing data files and tools; learned the importance of user interfaces and the need for professional code developers and software architects

• HUBzero Instantiation (2012-2013): Built a hub by customizing HUBzero for hierarchical structural materials; there are several advantages in terms of accessing and executing codes (on the server side), but there are significant barriers to establishing fast and productive cross-disciplinary e-Collaborations

• GTRI Integration Platform (2013-2014): Engaged professional code developers and software architects to design a multi-user web portal that allows data and code sharing while tracking provenance and workflows; underestimated the resources needed for such a project

• MATIN Collaboration Environment (2013-2014): Lightweight integrated software shell that allows users to seamlessly interface with numerous cloud services offered through the web through their APIs. Examples include GitHub for versioning and archival of codes, Authorea for collaborative editing of documents, figshare for citable publication of research, Plotly for collaborative data analysis and visualization, and Google+ and LinkedIn for e-teaming and networking. MATIN focuses on building an emergent community: Materials & Manufacturing Informatics (MMI). MATIN provisions web services for software integration platforms and innovation infrastructure. Examples of MATIN:

• Spatial Statistics: https://github.com/tonyfast/SpatialStatisticsFFT

• PyMKS: http://openmaterials.github.io/pymks/index.html

• Example discussion about periodic boundary conditions in running Finite Element (FE or FEM) simulations: https://github.com/sfepy/sfepy/issues/267

In the platform developed, the user or the user's organization can set permissions, at the level of individual datasets/codes, on the degree and type of sharing.

Central Challenge: Activation. The transition from traditional disciplinary workflows to integrated workflows needs to overcome an activation barrier. There is no clear incentive to get ahead of the curve in anticipating the upcoming transformations in how materials development will be pursued in the future. Open (reproducible) science alone is not a sufficiently strong motivator for a materials development community with strong interests in intellectual property development; indeed, it might actually be perceived as a negative. There is a need to explore and refine what incentives work for individual researchers (e.g., increased productivity).

Future Plans: Continue to develop and deploy MATIN at GT and any other organizations/groups that express an interest in working with us. Design, create, and distribute a suite of "mobile first" apps designed to make data ingest into collaboration environments such as MATIN, from both material characterization equipment and multi-scale models, an enhanced user experience with friendly and natural interfaces. Proactively promote/incentivize data/metadata ingest into publicly shared databases. Work with professional societies to announce and run new competitions/awards for recognizing and rewarding good practices by the members of the broader materials innovation user community (e.g., Best Documented Data Award, Best Documented Tool Award, Best Documented Workflow Award). Develop and deliver new workshops for training the new generation of graduate students, postdocs, and junior faculty in the emerging workflows of Materials & Manufacturing Informatics.

High Performance Cast Aluminum Alloys for Next Generation Passenger Vehicle Engines

Dongwon Shin, Materials Science and Technology Division, Oak Ridge National Laboratory

Oak Ridge National Laboratory (ORNL) teamed with Chrysler and Nemak to develop next-generation high-performance cast aluminum alloys that have improved castability, high-temperature strength, and fatigue performance. At this point, the data management practice is limited to analytical data spreadsheets, other than the CALPHAD computational thermodynamic databases and proprietary property databases for casting simulation. As the project goes on, a more robust data schema will be developed to store various types of data populated from the experiments and calculations.

Besides the data structure design, the speaker discussed the necessity of a new approach to identifying key descriptors from the large dataset that can guide the design of new materials. An alloy design requires the balance of extremely complex relationships among a large number of descriptors. Some examples of these descriptors include phase fraction of key intermetallics, lattice parameters of the matrix, and diffusion kinetics of solute atoms. However, optimizing these quantities relies on good physical metallurgy intuition coupled with clever experiments, dogged sleuthing, and a healthy dose of luck. Such descriptor-based predictions of materials properties date back to the beginning of modern solid-state chemistry and physics. As shown in the Darken-Gurry map to predict the solubility of elements in tantalum, intuitive descriptors such as electronegativity and radius could be used for materials design. However, as the chemistry and physics of materials become more complex, intuition-based search may not work anymore. Thus, the use of more advanced statistical analysis and data mining approaches might be required to accelerate the design of complex multi-component/multi-phase alloys.
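A descriptor-based screen of the Darken-Gurry type can be sketched in a few lines. The version below assumes the commonly quoted form of the construction, an ellipse of ±15% in atomic radius and ±0.4 in electronegativity centered on the solvent; these tolerances, the function name, and the numerical inputs in the usage lines are assumptions for illustration and are not taken from the talk.

```python
"""Sketch: Darken-Gurry style two-descriptor screen (assumed tolerances)."""


def inside_darken_gurry_ellipse(solvent_radius, solvent_en,
                                solute_radius, solute_en,
                                radius_tol=0.15, en_tol=0.4):
    """Return True if the solute lies inside the ellipse around the solvent.

    radius_tol is a fractional size difference; en_tol is an absolute
    electronegativity difference. Both defaults are assumptions.
    """
    dr = (solute_radius - solvent_radius) / (radius_tol * solvent_radius)
    de = (solute_en - solvent_en) / en_tol
    return dr * dr + de * de <= 1.0


if __name__ == "__main__":
    # Illustrative (not measured) descriptor values: radius in angstroms,
    # electronegativity on an arbitrary common scale.
    print(inside_darken_gurry_ellipse(1.00, 1.5, 1.05, 1.6))  # close in both descriptors
    print(inside_darken_gurry_ellipse(1.00, 1.5, 1.30, 1.5))  # size mismatch too large
```

A two-descriptor test like this is exactly the kind of intuition-based rule that becomes inadequate when many coupled descriptors matter, which motivates the statistical and data mining approaches mentioned above.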

Data Related to Cast Aluminum

Jake Zindel, Ford Motor Company

The main focus of our research is the development of CAE (Computer-aided Engineering) tools to support two significant activities in the company: component design (product development) and manufacturing engineering (high-volume production). A key part of our research is conducting experiments to collect data for the development and validation of the CAE tools.

The data we collect consists of mechanical properties, physical properties, microstructural images, and manufacturing process parameters that form the pedigree of the material. This information can be in the form of Excel files, proprietary-software storage files, screen shots, and written reports.

Our immediate need is to deal with this large amount of existing information, collected over many years, on a "local" level. The first step is to organize many independent collections of information in all their forms. The next issue is the time required to enter all this information into the database. Maintenance may also require a significant amount of effort. Lastly, there will also be a need to manipulate the data contained in the database using Excel, Minitab, MATLAB, etc. Therefore, links or easy exportation of data to these types of applications will be necessary.

The next level would involve dealing with information that has been created or collected by others. From an OEM (Original Equipment Manufacturer) perspective, this would involve convincing suppliers to share processing information that they may consider proprietary. In addition to suppliers, universities and national laboratories are also sources of data. Data from all the sources will require some sort of protocol to determine its quality and resolve inconsistencies.

In the end, it seems likely that considerable budget and time will be required to implement comprehensive information databases, which could be the largest hurdle of them all.

QuesTek's MGI/ICME Approach to Alloy Optimization and Design (Including Cast Iron)

Jason Sebastian, QuesTek Innovations LLC

Integrated Computational Materials Engineering (ICME) methods involve the holistic application of different computational models across various length scales to the design, development, and rapid qualification of advanced materials. Using its Materials by Design technologies, QuesTek has successfully applied these ICME methods to the design, development, and full qualification of several advanced alloys (including aerospace structural steels, aerospace gear steels, advanced castable titanium alloys, high-strength corrosion-resistant aluminum alloys, and high-performance cast iron).

QuesTek's ICME methods benefit tremendously from the activities of the US Materials Genome Initiative (MGI). Recognizing that the "10 to 20 years" that it takes for alloy design and development is hindering the rapid deployment of higher-performance materials, the MGI is focused on "cutting in half" the time and cost to design and deploy novel advanced materials needed to enhance and sustain US competitiveness.

QuesTek also has ongoing activities on the application of ICME methods to the design and development of materials for the energy sector, including advanced materials that enable vehicle lightweighting. Of particular note is an ongoing effort with a major diesel engine manufacturer on the development of high-performance cast iron alloys for engine block applications. QuesTek's efforts have made extensive use of various MGI-type tools and databases, including 1) density functional theory methods for high-throughput screening of cast iron inoculant stabilities (structure, lattice parameter, stability of various oxide/sulfide phases), 2) liquid iron mobility databases (to help accurately model solidification phenomena), and 3) thermodynamic databases (to examine nano-precipitate stability).

Moving forward, to continue to accelerate the materials development cycle, efforts should remain focused on the development of the fundamental databases (thermodynamic, kinetic, etc.) that enable ICME models. There should also be increased focus on efforts related to the integration of the various computational databases, models, and tools (the "I" in ICME).

Integrated Computational Materials Engineering (ICME) of Generation Three Advanced High Strength Steels

Louis G. Hector, Jr., General Motors R&D Center

While traditional material development methods often involve laborious trial-and-error testing, Integrated Computational Materials Engineering (ICME), which falls under the umbrella of the Materials Genome Initiative, is offering a means for materials development under a more compressed time scale. Advanced high strength steels (AHSS), which contain multiple phases (e.g., ferrite, bainite, pearlite, austenite, martensite) and may exhibit phase transformation with strain, are fertile ground for ICME. However, prediction of macro-scale constitutive behavior based upon the multi-scale physical, chemical, and mechanical phenomena in these complex microstructures is a formidable challenge for ICME. Unprecedented collaboration between universities, industry, and government labs will be required to address AHSS development, archiving of data, and the implementation of these materials into products that benefit the American consumer.

This presentation began with a brief overview of a new DOE/USAMP (US Automotive Materials Partnership)-funded ICME program aimed at the development of Generation Three (Gen 3) AHSS. The lack of commercial Gen 3 steels that meet DOE targets is offering significant challenges to this new ICME program. The program is meeting these challenges following four thrusts: (1) making Gen 3 steels to meet DOE targets; (2) passing experimental measurements of key properties to an ICME model that produces microstructure-based constitutive models; (3) material evaluation in forming and vehicle performance simulations; (4) archiving all relevant data for easy use by others in the program. Although the Gen 3 AHSS ICME program is "mission oriented," it demonstrates an important role for universities in providing the necessary foundation in fundamental materials science and engineering.

Integrated Computational Methods for Composites Materials (ICM2): Lessons Learned from Integration Planning and Feasibility Demonstrations

Lara Liou, GE Aviation

General Electric Aviation and Lockheed Martin Aeronautics have teamed through an Air Force Research Lab program to conduct a demonstration of an Integrated Computational Materials Engineering digital framework that links material processing, property, and structural relationships to account for processability, manufacturability, and system performance, with the goal of demonstrating that the usage of integrated models can contribute to future airframe and engine designs with dramatic reductions in development time and cost. Both engine and airframe Foundational Engineering Problems will be demonstrated through a common digital framework to be developed and validated on increasingly complex articles. The modeling tools that cover composite process models, localized mechanical behavior models, and global design models (Convergent Manufacturing Technology's COMPRO CCA, Autodesk's Autodesk Simulation Composite Analysis, and an Abaqus-based University of Michigan micromechanics model) are centrally linked via a commercially available model integration interface (Phoenix Integration's ModelCenter®). Detailed plans of what data will be passed among the various models, as well as how it will be passed, have been completed. Creation of these detailed process maps and simple feasibility demonstrations have raised inherent technical and logistical challenges to integrating commercially available multi-scale models. Robust model integration methods to address these challenges were discussed.

The exercise of drawing up detailed process maps for what to integrate and how to integrate the selected tools and models resulted in key early lessons learned and identification of tools critical to long-term, robust, and sustained application of Integrated Computational Methods for Composites Materials (ICM2). The key lessons learned include:

• Individual models must be relatively mature and validated before an ICM2 problem can be confidently defined

• FE analysis-based tools within the same digital thread should utilize the same FE solver

• The team ideally should consist of at least one expert for each tool and its underlying model, as well as an expert in the digital thread and integration interface tools and programming

• Mesh interpolation between finite element analysis based tools will likely be required, and a commercially available tool for this type of composite mesh interpolation is key for robust and sustained application of ICM2

Residual Stress Engineering of Rotating Nickel Components

Terry Wong, Aerojet Rocketdyne

The PW-8 project, the Nickel Base Superalloy Residual Stress Foundational Engineering Problem program, aims to develop a multi-disciplinary analysis approach to predict and incorporate bulk residual stress in the design of a component. This will allow for design of a part that is optimized for a particular application or design intent (such as weight, performance, or reduction of scrap). To accomplish this goal, the PW-8 team has two major tasks: 1) the development of an infrastructure and associated tools needed to predict and incorporate bulk residual stress in the design stages of a part and 2) a demonstration task that will validate the infrastructure and tools developed.

The Data Management task is the foundational element of the larger infrastructure development task. The goal for the data management work is to develop an infrastructure that allows the various team members in PW-8 to share data and to store data in a database. This not only ensures that data generated by this program is never lost but also allows for data to be transferred in a secure manner. In order to meet the goal of a system that securely stores and shares data, three major tasks must be accomplished: 1) validation that there is a secure method to transfer data between team members and that the method of storing data is also secure, 2) development of a database schema, and 3) development of a means to easily upload data into the database and download data from the database.

Of the three, the most difficult task was to create a system in which the transfer and storage of data was secure. Various proposals were made to ensure the security of the data, but many of them were rejected, not because the proposal was actually insecure but because it was not acceptable under each team member's IT security policies. The process of getting each team member's IT department to approve the security of the data transfer and storage took more effort than anticipated. In the end, the team members, along with their respective IT departments, approved the use of TRUcentrix as a tool to transfer data and the use of an ASM International hosted database to store data.

The development of a database schema is a necessary step in creating the database to store all the data generated (via physical tests as well as data generated by models). A big help in creating the database schema was to use the schema generated by Dr. Steven Arnold from NASA's Glenn Research Center. Using NASA's schema eliminated the need to start the database build from scratch and allowed us to focus on the specific needs of storing residual stress data.

The last task involves the development of tools that make the input of data easy. The more difficult it is to enter data into a database, the more likely data will not be entered. Excel-based templates are being developed in order to make data entry as easy as possible.
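One way a template lowers the barrier to data entry is by letting rows be checked automatically before upload. The sketch below validates comma-separated text exported from a spreadsheet template. The column names, the REQUIRED mapping, and the function validate_rows are hypothetical and do not describe the PW-8 templates or the NASA schema.

```python
"""Sketch: checking template rows before upload (hypothetical columns)."""
import csv
import io

# Hypothetical required columns and the type each value must convert to.
REQUIRED = {"specimen_id": str, "depth_mm": float, "residual_stress_MPa": float}


def validate_rows(csv_text):
    """Return (valid_rows, errors) for CSV text with a header line."""
    valid, errors = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return [], ["missing columns: " + ", ".join(missing)]
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            valid.append({name: cast(row[name]) for name, cast in REQUIRED.items()})
        except (TypeError, ValueError):
            errors.append(f"line {line_no}: non-numeric value")
    return valid, errors


if __name__ == "__main__":
    text = "specimen_id,depth_mm,residual_stress_MPa\nS1,0.5,-120\nS2,abc,10\n"
    rows, problems = validate_rows(text)
    print(rows)
    print(problems)
```

Reporting the offending line number rather than rejecting the whole file keeps the correction effort small for the person entering the data.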

The two big lessons learned from PW-8 in relation to data management are:

• It is essential to work with each company's IT and IT Security people to develop a system that satisfies the security requirements of each company.

• Database schema development should not be reinvented but should begin with an already established schema.

Perspectives on MAI Programs

Dr. Mike Glavicic, Rolls Royce

MAI (AFRL's Metals Affordability Initiative) programs are currently structured independently from one another, in that no standard methodology exists that describes what data is to be archived at the conclusion of a program. Nor does there exist a standard protocol for how to archive things on a server or some other computer resource at the conclusion of a program. As a result, the current practices of MAI programs:

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program

• Do not have a formalized requirement for how data is to be packaged and stored at the end of a program. Hence, even in the best of cases, in which all of the data is stored, the data is archived in a simple folder fashion that is not searchable in any manner. Moreover, even if all of the data is present, it is extremely cumbersome to untangle the data for future use using a file folder strategy

Hence, at the end of the program, both the government and the MAI team are left to manage the data and results in an unregimented fashion.

The lack of a regimented protocol for how to package a program's results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal of integrating the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models, and other information that was required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted for the recreation of data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. Although the RR-10 program was able to overcome these issues and become a successful program, it should be noted that other programs that encounter similar problems are unlikely to share this success, as RR-10 was very fortunate to have the means to overcome these serious issues that were a result of the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features that will be required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ

Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and Suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration, data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals, Inc.) are working together on the PW-11 Team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security and access control, and the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework: GE Forge, Vanderbilt Forge, and Semantic Web technology. One technology will be selected for the next phase of the program.

Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single database searches to multiple and disparate database searches, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work upfront for implementation, as it needs the development of a common vocabulary and relationships.
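The contrast between a fixed table and a graph of relationships can be illustrated with a toy store of subject-predicate-object statements. This is a minimal sketch of the graph idea only; it is not an RDF implementation and does not represent any of the technologies under evaluation. The class TripleStore and the identifiers (heat_42, madeOf, suppliedBy, SupplierA) are hypothetical.

```python
"""Sketch: a toy subject-predicate-object store with wildcard matching."""


class TripleStore:
    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        """Store one statement; duplicates are ignored."""
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


if __name__ == "__main__":
    store = TripleStore()
    store.add("heat_42", "madeOf", "Ti-6Al-4V")
    store.add("heat_42", "suppliedBy", "SupplierA")
    store.add("heat_43", "madeOf", "Ti-6Al-4V")
    # Which heats are made of this alloy?
    print(store.match(predicate="madeOf", obj="Ti-6Al-4V"))
```

New kinds of relationships can be added without changing a schema, but queries are only meaningful if all parties use the same predicate names, which is the shared vocabulary effort noted above.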

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST and develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
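Flowing uncertainty through a chain of models can be sketched with simple Monte Carlo sampling. The example below is an illustration under stated assumptions, not the PW-11 method: the two model functions are toy stand-ins (an inverse square-root grain size law and a Hall-Petch-like strength law with made-up coefficients), and the input is assumed to be normally distributed.

```python
"""Sketch: Monte Carlo propagation through a toy processing ->
microstructure -> property chain (all models and numbers hypothetical)."""
import math
import random
import statistics


def microstructure_model(cooling_rate):
    """Toy link: grain size (um) decreases with cooling rate."""
    return 100.0 / math.sqrt(cooling_rate)


def property_model(grain_size):
    """Toy Hall-Petch-like link: strength (MPa) rises as grains get finer."""
    return 200.0 + 300.0 / math.sqrt(grain_size)


def propagate(mean_rate, std_rate, n_samples=2000, seed=0):
    """Return (mean, standard deviation) of strength for a scattered input."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same samples
    strengths = []
    for _ in range(n_samples):
        rate = max(rng.gauss(mean_rate, std_rate), 1e-6)  # keep the rate positive
        strengths.append(property_model(microstructure_model(rate)))
    return statistics.mean(strengths), statistics.pstdev(strengths)


if __name__ == "__main__":
    mean_strength, spread = propagate(mean_rate=4.0, std_rate=0.5)
    print(f"strength = {mean_strength:.1f} MPa, spread = {spread:.1f} MPa")
```

Sampling treats each model as a black box, so the same loop works when the toy functions are replaced by real process, microstructure, and property models, at the cost of many model evaluations.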

ASM's Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project

Scott D. Henry, ASM International

ASM International, the world's largest association of metals-focused materials engineers, scientists, technicians, educators, and students, has launched a series of initiatives under the "Computational Materials Data (CMD) Network" umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design Ltd. The main objectives of SMDDP are to:

• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system

• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets

• Develop and carry out a series of test problems that represent relevant use cases for the repository

• Make data open to the materials data community for use in data analytics, modeling, and educational purposes

• Actively engage the materials data community and widely disseminate the findings from the project

• Develop and implement data capture and curation procedures that can serve as models for other data repositories

The basic philosophy behind the project is that standards often emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated, structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project – launched in late 2013 and planned as an 18-month program – is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future

Ben Blaiszik1, Kyle Chard1, Jim Pruyne1, Rachana Ananthakrishnan1, Steve Tuecke1,2, Ian Foster1,2

1University of Chicago Computation Institute; 2Argonne National Laboratory MCS Division

During this talk, we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully-formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.

Semantics for Deep Interoperability

Samuel Chance, iNovex

Semantic technology is based on mature web standards and is intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented. It started with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web, riding atop the same infrastructure (http). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis on the fact that the technology has a foundation based on formal logic that enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being evaluated as a solution for a community or enterprise.
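To make the notion of linked "triples" concrete, the following is a minimal sketch in plain Python. It is not taken from the presentation: the `ex:` identifiers are hypothetical names invented for illustration, and a production system would use an RDF library and a published vocabulary rather than bare tuples.

```python
# Each triple is (subject, predicate, object). The "ex:" names are
# hypothetical and do not come from any published vocabulary.
TRIPLES = [
    ("ex:Sample42", "ex:madeOf", "ex:Al6061"),
    ("ex:Al6061", "ex:belongsTo", "ex:AlMgSiSystem"),
    ("ex:Sample42", "ex:measuredBy", "ex:TensileTest7"),
    ("ex:TensileTest7", "ex:reports", "ex:YieldStrength"),
]

def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def reachable(triples, start):
    """Follow object-to-subject links to collect every node reachable from start."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for _, _, obj in match(triples, s=node):
            if obj not in seen:
                seen.add(obj)
                frontier.append(obj)
    return seen

if __name__ == "__main__":
    print(match(TRIPLES, p="ex:madeOf"))
    print(sorted(reachable(TRIPLES, "ex:Sample42")))
```

Because the object of one triple can be the subject of another, simple statements chain into a graph that software can traverse without a human interpreting each link.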


Big Data and Semantic Technologies

Harlan Shober, RJ Lee Group

The presentation started with a description of entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS). To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing, characterization, and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described. It consisted of several elements: contribute, index, search, analyze, and archive. With the needed functionality in mind, a string of universal scientific data management process categories was then presented with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for Resource Description Framework (RDF) were applied to a notional example with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described."

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)

Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
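As a small illustration of the first step named above, tokenization, the sketch below splits a materials-style sentence into tokens with a regular expression and marks which tokens are numbers. The pattern and the two-label tagging are assumptions chosen for brevity; they are not the presenter's tooling, and real pipelines use trained tokenizers and taggers.

```python
import re

# Illustrative token pattern: decimal numbers, words (optionally hyphenated),
# and any other single non-space character such as punctuation.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+(?:-[A-Za-z]+)*|[^\sA-Za-z\d]")
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")

def tokenize(text):
    """Split text into number, word, and punctuation tokens."""
    return TOKEN_RE.findall(text)

def tag_numbers(tokens):
    """Label each token NUM if it is a number, otherwise TOK."""
    return [(tok, "NUM" if NUMBER_RE.fullmatch(tok) else "TOK") for tok in tokens]

if __name__ == "__main__":
    sentence = "Al 6061 aged at 175.5 C for 8 h."
    print(tag_numbers(tokenize(sentence)))
```

Even this toy version hints at the domain challenge raised in the talk: alloy designations, temperatures, and times are all numbers, so telling them apart requires context that simple patterns do not capture.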

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.


The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.


Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were 'rolled up' from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often-fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; a data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data; standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management – consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose


- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management; difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most-important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure – evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel


- Data associated with publications should be shared along with the publications. Not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)

Government-Led Priorities

1. Lead data strategy/approach development
- Hold a meeting like this one in 6-12 months with the purpose of developing a roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus? E.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most-important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools making data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating a standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming a consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure


4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10 -- repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST) – creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors, not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials-related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current support for HT approaches is limited to overly fundamental single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or, alternatively, work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community


Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, Inovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, Questek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl Institute of Standards & Tech
Jeff Williams, GE Aviation
Terry Wong, Rockedyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


NIST Materials Data Repository (https://materialsdata.nist.gov/dspace/xmlui) is a customized DSpace repository for materials that enables sharing of a variety of data types, including text files, images, and video. The repository also provides a persistent identifier for each entry and allows users to choose from a variety of license types. NIST is also working on two data curation tools for phase-based data. The Thermodynamic Research Center (http://trc.nist.gov) is extending the ThermoML markup language and Guided Data Capture software tool to metallic systems, with an initial emphasis on phase equilibria and thermochemical data. The Materials Data Curation System (MDCS) employs user-defined XML-based schemas to curate data. The MDCS allows data curation either through a web-based interface or through a REST API. The entered data is stored in a non-relational database repository, can be searched using a SPARQL (SPARQL Protocol and RDF Query Language) interface, and can be integrated with a variety of scientific workflow tools.

AFLOW Consortium

Marco Fornari, Michigan State University

The overall goal of the Materials Genome Initiative (MGI) is to accelerate the discovery, development, and deployment of new materials with ad hoc functionalities. The AFLOW Consortium focuses on the software infrastructures to achieve the MGI goals by using first principles calculations. The emphasis is on the data that must be generated, archived, validated, and post-processed efficiently, both to identify new materials and to distill meaningful quantitative relationships within the data. This extends the solution to the inverse problem of chemically replacing rare and costly elements in critical technologies.

The AFLOW Consortium provides access to the repository AFLOWLIB.ORG <http://AFLOWLIB.ORG> that publicly distributes data on energetics and thermodynamic stability for more than 600,000 compounds and band structures for roughly 300,000 materials. The repository includes bulk crystals and alloys. Data were computed with the AFLOW high-throughput framework using density functional theory. The results of the calculations can be queried with the recently developed REST-API that standardizes and labels the data for retrieval purposes. The fields defined in the AFLOW REST-API include a material object identifier (AUID), data cloud location (AURL), ownership and sponsorship keywords, and data-specific labels (COMPOSITION, ENTROPY, EGAP, …). Simple scripting languages can be used to automatically download from the AFLOWLIB repository. The AFLOW approach has been extensively applied to Platinum-group alloys, thermoelectric materials, magnetic materials, topological insulators, scintillators, etc. The AFLOW REST-API can be extended to include experimental data. For further details on the database, the reader is directed to http://materials.duke.edu/auro/AUROARTICULA/j.commatsci.2012.02.002.pdf
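The remark that simple scripts can work with labeled entries can be sketched as follows. This example makes no network calls and does not reproduce the actual AFLOW REST-API: it assumes, purely for illustration, that an entry has already been retrieved as a text string of `keyword=value` pairs separated by `|`, and the sample records and their values are invented placeholders.

```python
def parse_record(text):
    """Split a 'key=value | key=value' string into a dict with upper-cased keys."""
    record = {}
    for field in text.split("|"):
        field = field.strip()
        if not field or "=" not in field:
            continue
        key, value = field.split("=", 1)
        record[key.strip().upper()] = value.strip()
    return record

def select(records, min_egap):
    """Return the AUIDs of records whose EGAP value is at least min_egap."""
    hits = []
    for rec in records:
        try:
            if float(rec["EGAP"]) >= min_egap:
                hits.append(rec["AUID"])
        except (KeyError, ValueError):
            continue  # skip entries without a usable EGAP or AUID
    return hits

# Invented placeholder entries (assumed format, not real repository data).
SAMPLE = [
    "auid=example:0001 | aurl=example.org:DATA/A | composition=1,1 | Egap=0.0",
    "auid=example:0002 | aurl=example.org:DATA/B | composition=2,3 | Egap=1.7",
]

if __name__ == "__main__":
    records = [parse_record(line) for line in SAMPLE]
    print(select(records, min_egap=1.0))
```

Keying every value by a standardized label is what makes this kind of filtering trivial. The same loop would work for any other labeled quantity once the real response format is confirmed against the repository documentation.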

Materials Innovation Infrastructure: Lessons Learned and Future Plans

Surya Kalidindi, Georgia Institute of Technology

A central activity in materials research is the exploration of processing-structure-property (PSP) linkages. Workflow is defined as the sequence of steps employed for establishing PSP linkages of interest to any specific engineering/technology application.

High Throughput Explorations of PSP Linkages: We must treat the hierarchical material as a complex system, which means embracing and exploiting uncertainty quantification and complexity management concepts/tools. It also means we need to design, develop, and validate decision support systems that will leverage the best current understanding (with its uncertainties) and provide objective guidance on future effort investment (combination of experiments and simulations). This means leveraging Data Science to effectively feed the decision support systems described above by developing and deploying high throughput strategies for obtaining high value information from both experiments and models.

What are integrated workflows? Integrated workflows are those that utilize the best combination of experiments and simulations in extracting robust and reliable PSP linkages. This requires engaging and exploiting cross-disciplinary expertise that includes materials science, manufacturing, systems approaches, uncertainty quantification, computational science, and data and information sciences. Through this, one must ensure that the workflows output the critical information needed by design and manufacturing stakeholders in the materials development value chain. Integration demands expertise sharing, which essentially is the core element of a collaboration. Cross-disciplinary integration demands effective collaborations, such as those one might set up using the modern tools of the day, i.e., an e-Collaboration.


Motivators for e-Collaborations: e-Collaborations are central to enhancing the productivity of materials development activities in several ways:

• individual research groups can identify and engage complementary expertise needed for their projects
• organizations can coordinate multiple teams and multiple projects to maximize output/impact
• organizations can identify and seek the right partners that enhance their individual and combined value

e-Collaborations can be open, semi-open, or closed, depending on the needs of the specific organization. Productivity enhancement leading to dramatic reductions in cost and time expended in materials development is the main incentive for e-collaborations.

Main Impediments to e-Collaborations

Lack of tools to identify the "best" collaborators
• Need software tools that quantify more precisely the expertise of individuals/groups/organizations
• Need software tools for allowing rich annotations and e-recording of discussions. Need analytic tools for identification, extraction, and management of metadata (i.e., high value abstractions from the raw data)
• Need searchable semantic metadata databases (these are lightweight and can be centralized and customized for specific user communities)
• Need software tools for tracking provenance (i.e., who/what/when/how generated, modified, etc.)

Lack of experience with the use of integrated workflows
• Need software tools for e-recording of workflows and development of searchable workflow databases
• Need recommendation schemes that suggest the best workflow(s) based on recorded prior experience; this provides a means to transfer expertise acquired by an organization to other teams and other projects (even in the same organization)
• Leads to possible automation and dramatic enhancement of productivity
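The provenance need listed above (who/what/when/how) can be sketched as a small data structure. This is a hypothetical illustration only: the class and field names are invented here and do not describe MATIN or any other tool mentioned in the talk.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass(frozen=True)
class ProvenanceEvent:
    who: str        # person or system responsible
    what: str       # e.g. "generated" or "modified"
    how: str        # method, tool, or procedure used
    when: datetime  # timestamp of the event

@dataclass
class Dataset:
    name: str
    history: List[ProvenanceEvent] = field(default_factory=list)

    def record(self, who: str, what: str, how: str,
               when: Optional[datetime] = None) -> ProvenanceEvent:
        """Append an event, stamping it with the current UTC time if none is given."""
        event = ProvenanceEvent(who, what, how, when or datetime.now(timezone.utc))
        self.history.append(event)
        return event

    def contributors(self) -> List[str]:
        """Unique contributors in order of first appearance."""
        seen: List[str] = []
        for event in self.history:
            if event.who not in seen:
                seen.append(event.who)
        return seen

if __name__ == "__main__":
    ds = Dataset("tensile_run")
    ds.record("alice", "generated", "tensile test")
    ds.record("bob", "modified", "unit conversion")
    print(ds.contributors())
```

Keeping the history append-only, with immutable events, means a later reader can reconstruct how a dataset reached its current state, which is the property that searchable workflow databases would build on.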

MINED Group's Past Experience (~2005 - ~2008): Funded by the Army Research Office, ARO (David Stepp), the group engaged computer science (CS) graduates to produce a single-user platform that integrated certain software codes previously produced by our group. The software development was subsequently abandoned, but the effort recruited two CS graduates into PhD programs in Materials Science and Engineering/Mechanical Engineering (MSE/ME), which dramatically altered the direction of research in the group. A brief chronology of development:

• Senior Design Project (2011-2012): An online portal designed for sharing data files and tools; learned the importance of user interfaces and the need for professional code developers and software architects
• HUBzero Instantiation (2012-2013): Built a hub by customizing HUBzero for hierarchical structural materials; there are several advantages in terms of accessing and executing codes (on the server side), but there are significant barriers to establishing fast and productive cross-disciplinary e-collaborations


• GTRI Integration Platform (2013-2014): Engaged professional code developers and software architects to design a multi-user web portal that allows data and code sharing while tracking provenance and workflows; underestimated the resources needed for such a project
• MATIN Collaboration Environment (2013-2014): Lightweight integrated software shell that allows users to seamlessly interface with numerous cloud services offered through the web through their APIs. Examples include GitHub for versioning and archival of codes, Authorea for collaborative editing of documents, figshare for citable publication of research, Plotly for collaborative data analysis and visualization, and Google+ and LinkedIn for e-teaming and networking. MATIN focuses on building an emergent community – Materials & Manufacturing Informatics (MMI). MATIN provisions web services for software integration platforms and innovation infrastructure. Examples of MATIN:

• Spatial Statistics: https://github.com/tonyfast/SpatialStatisticsFFT
• PyMKS: http://openmaterials.github.io/pymks/index.html
• Example discussion about periodic boundary conditions in running Finite Element (FE or FEM) simulations: https://github.com/sfepy/sfepy/issues/267

In the platform developed, it is noted that the user or user's organization can set permissions at the level of individual datasets/codes on the degree and type of sharing.

Central Challenge: Activation. The transition from traditional disciplinary workflows to the integrated workflows needs to overcome an activation barrier. There is no clear incentive to get ahead of the curve in anticipating the upcoming transformations in how materials development will be pursued in the future. Open (reproducible) science alone is not a sufficiently strong motivator for a materials development community with strong interests in intellectual property development; indeed, it might actually be perceived as a negative. There is a need to explore and refine what incentives work for individual researchers (e.g., increased productivity).

Future Plans: Continue to develop and deploy MATIN at GT and any other organizations/groups that express an interest in working with us. Design, create, and distribute a suite of "mobile first" Apps designed to make data ingest into collaboration environments such as MATIN, from both material characterization equipment and multi-scale models, an enhanced user experience with friendly and natural interfaces. Proactively promote/incentivize data/metadata ingest into publicly shared databases. Work with professional societies to announce and run new competitions/awards for recognizing and rewarding good practices by the members of the broader materials innovation user community (e.g., Best Documented Data Award, Best Documented Tool Award, Best Documented Workflow Award). Develop and deliver new workshops for training the new generation of graduate students, postdocs, and junior faculty in the emerging workflows of Materials & Manufacturing Informatics.

High Performance Cast Aluminum Alloys for Next Generation Passenger Vehicle Engines

Dongwon Shin, Materials Science and Technology Division, Oak Ridge National Laboratory


Oak Ridge National Laboratory (ORNL) teamed with Chrysler and Nemak to develop next-generation high-performance cast aluminum alloys that have improved castability, high-temperature strength, and fatigue performance. At this point, the data management practice is limited to analytical data spreadsheets, other than the CALPHAD computational thermodynamic databases and proprietary property databases for casting simulation. As the project goes on, a more robust data schema will be developed to store various types of data populated from the experiments and calculations.

Besides the data structure design, the speaker discussed the necessity of a new approach to identifying key descriptors from the large dataset that can guide the design of new materials. Alloy design requires balancing extremely complex relationships among a large number of descriptors. Some examples of these descriptors include the phase fraction of key intermetallics, lattice parameters of the matrix, and diffusion kinetics of solute atoms. However, optimizing these quantities relies on good physical metallurgy intuition coupled with clever experiments, dogged sleuthing, and a healthy dose of luck. Such descriptor-based predictions of materials properties date back to the beginning of modern solid-state chemistry and physics. As shown in the Darken-Gurry map to predict the solubility of elements in tantalum, intuitive descriptors such as electronegativity and radius could be used for materials design. However, as the chemistry and physics of materials become more complex, intuition-based search may not work anymore. Thus, the use of more advanced statistical analysis and data mining approaches might be required to accelerate the design of complex multi-component, multi-phase alloys.

Data Related to Cast Aluminum

Jake Zindel, Ford Motor Company

The main focus of our research is the development of CAE (computer-aided engineering) tools to support two significant activities in the company: component design (product development) and manufacturing engineering (high-volume production). A key part of our research is conducting experiments to collect data for the development and validation of the CAE tools.

The data we collect consist of mechanical properties, physical properties, microstructural images, and manufacturing process parameters that form the pedigree of the material. This information can be in the form of Excel files, proprietary-software storage files, screen shots, and written reports.

Our immediate need is to deal with this large amount of existing information collected over many years on a "local" level. The first step is to organize many independent collections of information in all their forms. The next issue is the time required to enter all this information into the database. Maintenance may also require a significant amount of effort. Lastly, there will also be a need to manipulate the data contained in the database using Excel, Minitab, MATLAB, etc. Therefore, links or easy exportation of data to these types of applications will be necessary.

The next level would involve dealing with information that has been created or collected by others. From an OEM (original equipment manufacturer) perspective, this would involve convincing suppliers to share processing information that they may consider proprietary. In addition to suppliers, universities and national laboratories are also sources of data. Data from all the sources will require some sort of protocol to determine its quality and resolve inconsistencies.


In the end, it seems likely that considerable budget and time will be required to implement comprehensive information databases, which could be the largest hurdle of them all.

QuesTek's MGI/ICME Approach to Alloy Optimization and Design (Including Cast Iron)

Jason Sebastian, QuesTek Innovations LLC

Integrated Computational Materials Engineering (ICME) methods involve the holistic application of different computational models across various length scales to the design, development, and rapid qualification of advanced materials. Using its Materials by Design technologies, QuesTek has successfully applied these ICME methods to the design, development, and full qualification of several advanced alloys (including aerospace structural steels, aerospace gear steels, advanced castable titanium alloys, high-strength corrosion-resistant aluminum alloys, and high-performance cast iron).

QuesTek's ICME methods benefit tremendously from the activities of the US Materials Genome Initiative (MGI). Recognizing that the "10 to 20 years" that it takes for alloy design and development is hindering the rapid deployment of higher-performance materials, the MGI is focused on "cutting in half" the time and cost to design and deploy novel advanced materials needed to enhance and sustain US competitiveness.

QuesTek also has ongoing activities on the application of ICME methods to the design and development of materials for the energy sector, including advanced materials that enable vehicle lightweighting. Of particular note is an ongoing effort with a major diesel engine manufacturer on the development of high-performance cast iron alloys for engine block applications. QuesTek's efforts have made extensive use of various MGI-type tools and databases, including 1) density functional theory methods for high-throughput screening of cast iron inoculant stabilities (structure, lattice parameter, stability of various oxide/sulfide phases), 2) liquid iron mobility databases (to help accurately model solidification phenomena), and 3) thermodynamic databases (to examine nano-precipitate stability).

Moving forward, to continue to accelerate the materials development cycle, efforts should remain focused on the development of the fundamental databases (thermodynamic, kinetic, etc.) that enable ICME models. There should also be increased focus on efforts related to the integration of the various computational databases, models, and tools (the "I" in ICME).

Integrated Computational Materials Engineering (ICME) of Generation Three Advanced High Strength Steels

Louis G. Hector Jr., General Motors R&D Center

While traditional material development methods often involve laborious trial-and-error testing, Integrated Computational Materials Engineering (ICME), which falls under the umbrella of the Materials Genome Initiative, is offering a means for materials development under a more compressed time scale. Advanced high strength steels (AHSS), which contain multiple phases (e.g., ferrite, bainite, pearlite, austenite, martensite) and may exhibit phase transformation with strain, are fertile ground for ICME. However, prediction of macro-scale constitutive behavior based upon the multi-scale physical, chemical, and mechanical phenomena in these complex microstructures is a formidable challenge for ICME. Unprecedented collaboration between universities, industry, and government labs will be required to address AHSS development, archiving of data, and the implementation of these materials into products that benefit the American consumer.

This presentation began with a brief overview of a new DOE/USAMP (US Automotive Materials Partnership)-funded ICME program aimed at the development of Generation Three (Gen 3) AHSS. The lack of commercial Gen 3 steels that meet DOE targets poses significant challenges to this new ICME program. The program is meeting these challenges through four thrusts: (1) making Gen 3 steels to meet DOE targets; (2) passing experimental measurements of key properties to an ICME model that produces microstructure-based constitutive models; (3) material evaluation in forming and vehicle performance simulations; and (4) archiving all relevant data for easy use by others in the program. Although the Gen 3 AHSS ICME program is "mission oriented," it demonstrates an important role for universities in providing the necessary foundation in fundamental materials science and engineering.

Integrated Computational Methods for Composites Materials (ICM2): Lessons Learned from Integration Planning and Feasibility Demonstrations

Lara Liou, GE Aviation

General Electric Aviation and Lockheed Martin Aeronautics have teamed through an Air Force Research Lab program to conduct a demonstration of an Integrated Computational Materials Engineering digital framework that links material processing, property, and structural relationships to account for processability, manufacturability, and system performance. The goal is to demonstrate that the usage of integrated models can contribute to future airframe and engine designs with dramatic reductions in development time and cost. Both engine and airframe Foundational Engineering Problems will be demonstrated through a common digital framework to be developed and validated on increasingly complex articles. The modeling tools that cover composite process models, localized mechanical behavior models, and global design models (Convergent Manufacturing Technology's COMPRO CCA, Autodesk's Autodesk Simulation Composite Analysis, and an Abaqus-based University of Michigan micromechanics model) are centrally linked via a commercially available model integration interface (Phoenix Integration's ModelCenter®). Detailed plans of what data will be passed among the various models, as well as how it will be passed, have been completed. Creation of these detailed process maps and simple feasibility demonstrations has raised inherent technical and logistical challenges to integrating commercially available multi-scale models. Robust model integration methods to address these challenges were discussed.

The exercise of drawing up detailed process maps for what to integrate and how to integrate the selected tools and models resulted in key early lessons learned and identification of tools critical to long-term, robust, and sustained application of Integrated Computational Methods for Composites Materials (ICM2). The key lessons learned include:

• Individual models must be relatively mature and validated before an ICM2 problem can be confidently defined.

• FE analysis-based tools within the same digital thread should utilize the same FE solver.

• The team ideally should consist of at least one expert for each tool and its underlying model, as well as an expert in the digital thread and integration interface tools and programming.

• Mesh interpolation between finite element analysis-based tools will likely be required, and a commercially available tool for this type of composite mesh interpolation is key for robust and sustained application of ICM2.

Residual Stress Engineering of Rotating Nickel Components

Terry Wong, Aerojet Rocketdyne

The goal of the PW-8 project, the Nickel Base Superalloy Residual Stress Foundational Engineering Problem program, is to develop a multi-disciplinary analysis approach to predict and incorporate bulk residual stress in the design of a component. This will allow for the design of a part that is optimized for a particular application or design intent (such as weight, performance, or reduction of scrap). To accomplish this goal, the PW-8 team has two major tasks: 1) the development of an infrastructure and associated tools needed to predict and incorporate bulk residual stress in the design stages of a part, and 2) a demonstration task that will validate the infrastructure and tools developed.

The Data Management task is the foundational element of the larger infrastructure development task. The goal for the data management work is to develop an infrastructure that allows the various team members in PW-8 to share data and to store data in a database. This not only ensures that data generated by this program is never lost but also allows for data to be transferred in a secure manner. In order to meet the goal of a system that securely stores and shares data, three major tasks must be accomplished: 1) validation that there is a secure method to transfer data between team members and that the method of storing data is also secure, 2) development of a database schema, and 3) development of a means to easily upload data into the database and download data from the database.

Of the three, the most difficult task was to create a system in which the transfer and storage of data was secure. Various proposals were made to ensure the security of the data, but many of them were rejected, not because the proposal was actually insecure but because it was not acceptable under each team member's IT security policies. Getting each team member's IT department to approve the security of the data transfer and storage took more effort than anticipated. In the end, the team members, along with their respective IT departments, approved the use of TRUcentrix as a tool to transfer data and the use of an ASM International hosted database to store data.

The development of a database schema is a necessary step in creating the database to store all the data generated (via physical tests as well as by models). A big help in creating the database schema was to use the schema generated by Dr. Steven Arnold from NASA's Glenn Research Center. Using NASA's schema eliminated the need to start the database build from scratch and allowed us to focus on the specific needs of storing residual stress data.

The last task involves the development of tools that make the input of data easy. The more difficult it is to enter data into a database, the more likely it is that data will not be entered. Excel-based templates are being developed in order to make data entry as easy as possible.

The two big lessons learned from PW-8 in relation to data management are:

• It is essential to work with each company's IT and IT security people to develop a system that satisfies the security requirements of each company.

• Database schema development should not be reinvented but should begin with an already established schema.

Perspectives on MAI Programs

Dr. Mike Glavicic, Rolls-Royce

The current MAI (AFRL's Metals Affordability Initiative) programs are structured independently from one another, in that no standard methodology exists that describes what data is to be archived at the conclusion of a program. Nor does there exist a standard protocol for how to archive things on a server or some other computer resource at the conclusion of a program. As a result, the current practices of MAI programs:

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program.

• Do not have a formalized requirement for how data is to be packaged and stored at the end of a program. Hence, even in the best of cases, in which all of the data is stored, the data is archived in simple folder fashion that is not searchable in any manner. Moreover, even if all of the data is present, it is extremely cumbersome to untangle the data for future use using a file folder strategy.

Hence, at the end of the program, both the government and the MAI team are left to manage the data and results in an unregimented fashion.

The lack of a regimented protocol for how to package a program's results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal of integrating the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models, and other information required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted to recreating data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. Although the RR-10 program was able to overcome these issues and become a successful program, other programs that encounter similar problems will likely not share that success, as RR-10 was very fortunate to have the means to overcome these serious issues, which were a result of the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government-funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features that will be required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs, with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ

Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration, data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals Inc.) are working together on the PW-11 Team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security, and access control, along with the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework (GE Forge, Vanderbilt Forge, and Semantic Web technology), and one technology will be selected for the next phase of the program.


Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single database searches to multiple and disparate database searches, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work up front for implementation, as it needs the development of a common vocabulary and relationships.

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST and develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
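One common way to flow uncertainty through a chain of interconnected models is Monte Carlo sampling. The sketch below is illustrative only: the two model functions, their constants, the chosen input (a cooling rate), and all names are assumptions made up for this example and are not taken from the PW-11, GE-10, PW-9, or RR-10 programs. It samples a processing parameter, passes each sample through a processing-to-microstructure step and a microstructure-to-property step (each with its own scatter), and then summarizes the variability at every stage and the correlation between stages.

```python
import random
import statistics


def microstructure_model(cooling_rate, rng):
    """Hypothetical processing -> microstructure step (grain size, micrometers)."""
    nominal = 50.0 / (1.0 + 0.1 * cooling_rate)
    # Multiplicative scatter stands in for uncertainty in this model step.
    return nominal * rng.lognormvariate(0.0, 0.05)


def property_model(grain_size, rng):
    """Hypothetical microstructure -> property step (Hall-Petch-like form, MPa)."""
    nominal = 100.0 + 400.0 / grain_size ** 0.5
    # Additive scatter stands in for uncertainty in this model step.
    return nominal + rng.gauss(0.0, 5.0)


def propagate(n_samples=5000, seed=0):
    """Sample the input and push every sample through the model chain."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        cooling_rate = max(0.0, rng.gauss(10.0, 1.5))  # assumed input variability
        grain_size = microstructure_model(cooling_rate, rng)
        strength = property_model(grain_size, rng)
        rows.append((cooling_rate, grain_size, strength))
    return rows


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den


def summarize(rows):
    """Return mean/stdev per stage and correlations between stages."""
    cooling, grain, strength = (list(col) for col in zip(*rows))
    return {
        "cooling_rate": (statistics.mean(cooling), statistics.stdev(cooling)),
        "grain_size": (statistics.mean(grain), statistics.stdev(grain)),
        "strength": (statistics.mean(strength), statistics.stdev(strength)),
        "corr_cooling_grain": pearson(cooling, grain),
        "corr_cooling_strength": pearson(cooling, strength),
    }


if __name__ == "__main__":
    for key, value in summarize(propagate()).items():
        print(key, value)
```

In a real workflow the two toy functions would be replaced by calibrated process, microstructure, and property models, and the input distributions would come from measured production data.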

ASM's Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project

Scott D. Henry, ASM International

ASM International, the world's largest association of metals-focused materials engineers, scientists, technicians, educators, and students, has launched a series of initiatives under the "Computational Materials Data (CMD) Network" umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design Ltd. The main objectives of SMDDP are to:


• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system.

• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets.

• Develop and carry out a series of test problems that represent relevant use cases for the repository.

• Make data open to the materials data community for use in data analytics, modeling, and educational purposes.

• Actively engage the materials data community and widely disseminate the findings from the project.

• Develop and implement data capture and curation procedures that can serve as models for other data repositories.

The basic philosophy behind the project is that standards often emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated, structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project, launched in late 2013 and planned as an 18-month program, is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future

Ben Blaiszik¹, Kyle Chard¹, Jim Pruyne¹, Rachana Ananthakrishnan¹, Steve Tuecke¹,², Ian Foster¹,²

¹University of Chicago Computation Institute; ²Argonne National Laboratory MCS Division

During this talk, we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.

Semantics for Deep Interoperability

Samuel Chance, iNovex

Semantic technology is based on mature web standards and intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented, starting with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web riding atop the same infrastructure (HTTP). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis that the technology has a foundation based on formal logic that enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being evaluated as a solution for a community or enterprise.
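To make the idea of "triples" linking into a web of data concrete, here is a minimal sketch in plain Python. It does not use any RDF library or the W3C serialization formats; the `ex:` identifiers and the facts themselves are hypothetical placeholders invented for illustration. Each triple is a (subject, predicate, object) tuple, and because the object of one triple can be the subject of another, following those links walks the graph.

```python
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical facts; the "ex:" names are placeholders, not a real vocabulary.
TRIPLES: List[Triple] = [
    ("ex:Sample42", "ex:madeOf", "ex:Al6061"),
    ("ex:Al6061", "ex:belongsTo", "ex:AlMgSiSystem"),
    ("ex:Sample42", "ex:testedBy", "ex:TensileTest7"),
    ("ex:TensileTest7", "ex:reports", "ex:YieldStrength"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching a pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def reachable(triples: List[Triple], start: str) -> Set[str]:
    """Collect every node reachable from `start` by following subject -> object links."""
    seen: Set[str] = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for _, _, obj in match(triples, s=node):
            if obj not in seen:
                seen.add(obj)
                frontier.append(obj)
    return seen


if __name__ == "__main__":
    print(list(match(TRIPLES, p="ex:madeOf")))
    print(sorted(reachable(TRIPLES, "ex:Sample42")))
```

The pattern query answers "which triples use this predicate?", while `reachable` shows how separately stated facts about a sample, an alloy, and a test become connected once they share identifiers.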


Big Data and Semantic Technologies

Harlan Shober, RJ Lee Group

The presentation started with a description of the entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS). To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing/characterization and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described and consisted of several elements: contribute, index, search, analyze, and archive. With the needed functionality in mind, a string of universal scientific data management process categories was then presented with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition; Data Manipulation and Storage; and Discovery and Advisory. W3C standards for Resource Description Framework (RDF) were applied to a notional example with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described."

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)

Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
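As a small illustration of the first step in that list, tokenization, the sketch below splits a sentence into tokens with a regular expression. It is a toy example written for this summary, not the presenter's method; the pattern and the function name `tokenize` are assumptions, and the rules (keep decimal numbers whole, keep hyphenated terms such as alloy system names together, split off punctuation) are only one reasonable choice for materials text. The later steps named above (tagging, parsing, and so on) are not covered.

```python
import re

# Assumed token rules for materials text; real NLP toolkits use richer models.
TOKEN_RE = re.compile(
    r"""
      \d+(?:\.\d+)?               # numbers such as 6061 or 0.2
    | [A-Za-z]+(?:-[A-Za-z]+)*    # words, keeping internal hyphens (Al-Mg-Si)
    | [^\sA-Za-z\d]               # any other single non-space symbol
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split text into number, word, and symbol tokens, dropping whitespace."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize("Al-Mg-Si alloy 6061 yields at 0.2% strain."))
```

Keeping "Al-Mg-Si" as one token matters because downstream steps such as entity recognition would otherwise see three unrelated element symbols.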

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.


The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.


Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were 'rolled up' from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often-fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security/control
- Work with instrument/equipment OEMs to get auto ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data, standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management – consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose
- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management; difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan; workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most-important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure – evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel
- Data associated with publications should be shared along with the publications. Not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)

Government-Led Priorities

1. Lead data strategy/approach development
- Hold meeting like this one in 6-12 months with purpose of developing roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus (e.g., getting CALPHAD off the ground)?
- Government must establish an official stance on security and infrastructure
- List of most-important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools that make data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating a standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure


4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10% (repositories & collab platforms)
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST) – creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors; not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches are limited to too-fundamental, single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or, alternatively, work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community


Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, Inovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept. of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, QuesTek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl. Institute of Standards & Tech.
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


and sponsorship keywords and data-specific labels (COMPOSITION, ENTROPY, EGAP, …). Simple scripting languages can be used to automatically download from the AFLOWLIB repository. The AFLOW approach has been extensively applied on platinum-group alloys, thermoelectric materials, magnetic materials, topological insulators, scintillators, etc. The AFLOW REST-API can be extended to include experimental data. For further details on the database, the reader is directed to http://materials.duke.edu/auro/AUROARTICULA/j.commatsci.2012.02.002.pdf

Materials Innovation Infrastructure: Lessons Learned and Future Plans
Surya Kalidindi, Georgia Institute of Technology

A central activity in materials research is the exploration of processing-structure-property (PSP) linkages. Workflow is defined as the sequence of steps employed for establishing PSP linkages of interest to any specific engineering/technology application.

High Throughput Explorations of PSP Linkages: We must treat a hierarchical material as a complex system, which means embracing and exploiting uncertainty quantification and complexity management concepts/tools. It also means we need to design, develop, and validate decision support systems that will leverage the best current understanding (with its uncertainties) and provide objective guidance on future effort investment (combination of experiments and simulations). This means leveraging Data Science to effectively feed the decision support systems described above by developing and deploying high throughput strategies for obtaining high value information from both experiments and models.

What are integrated workflows? Integrated workflows are those that utilize the best combination of experiments and simulations in extracting robust and reliable PSP linkages. This requires engaging and exploiting cross-disciplinary expertise that includes materials science, manufacturing, systems approaches, uncertainty quantification, computational science, and data and information sciences. Through this, one must ensure that the workflows output the critical information needed by design and manufacturing stakeholders in the materials development value chain. Integration demands expertise sharing, which essentially is the core element of a collaboration. Cross-disciplinary integration demands effective collaborations, such as those one might set up using the modern tools of the day, i.e., an e-Collaboration.


Motivators for e-Collaborations: e-Collaborations are central to enhancing the productivity of materials development activities in several ways:
• individual research groups can identify and engage complementary expertise needed for their projects
• organizations can coordinate multiple teams and multiple projects to maximize output/impact
• organizations can identify and seek the right partners that enhance their individual and combined value
e-Collaborations can be open, semi-open, or closed, depending on the needs of the specific organization. Productivity enhancement leading to dramatic reductions in cost and time expended in materials development is the main incentive for e-collaborations.

Main Impediments to e-Collaborations

Lack of tools to identify the "best" collaborators:
• Need software tools that quantify more precisely the expertise of individuals/groups/organizations
• Need software tools for allowing rich annotations and e-recording of discussions
• Need analytic tools for identification, extraction, and management of metadata (i.e., high value abstractions from the raw data)
• Need searchable semantic metadata databases (these are lightweight and can be centralized and customized for specific user communities)
• Need software tools for tracking provenance (i.e., who/what/when/how generated, modified, etc.)

Lack of experience with the use of integrated workflows:
• Need software tools for e-recording of workflows and development of searchable workflow databases
• Need recommendation schemes that suggest the best workflow(s) based on recorded prior experience; this provides means to transfer expertise acquired by an organization to other teams and other projects (even in the same organization)
• Leads to possible automation and dramatic enhancement of productivity
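The provenance need listed above (who/what/when/how generated or modified) can be illustrated with a small record structure. The following is a minimal sketch under stated assumptions: the class name, field names, and the use of a SHA-256 content hash to tie each event to a specific version of the data are all hypothetical choices for illustration, not a description of any tool discussed at the workshop.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class ProvenanceEvent:
    """One who/what/when/how entry for a dataset (hypothetical layout)."""
    who: str             # person or instrument responsible
    what: str            # e.g. "generated" or "modified"
    how: str             # free-text description of the method or tool
    when: str            # ISO 8601 timestamp
    content_sha256: str  # fingerprint of the data after this event


def record_event(log, who, what, how, content, when=None):
    """Append a provenance event for the given bytes to the log and return it."""
    if when is None:
        when = datetime.now(timezone.utc).isoformat()
    event = ProvenanceEvent(
        who=who,
        what=what,
        how=how,
        when=when,
        content_sha256=hashlib.sha256(content).hexdigest(),
    )
    log.append(event)
    return event


def to_json(log):
    """Serialize the log so it can travel alongside the data as metadata."""
    return json.dumps([asdict(event) for event in log], indent=2)
```

Because each entry is plain text, a log like this could be indexed in the kind of lightweight, searchable metadata database the list describes.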

MINED Group's Past Experience (~2005 - ~2008): Funded by the Army Research Office, ARO (David Stepp); engaged computer science (CS) graduates to produce a single user platform that integrated certain software codes previously produced by our group; software development was subsequently abandoned, but the effort recruited two CS graduates into PhD programs in Materials Science and Engineering/Mechanical Engineering (MSE/ME), which dramatically altered the direction of research in the group. A brief chronology of development:
• Senior Design Project (2011-2012): An online portal designed for sharing data files and tools; learned the importance of user interfaces and the need for professional code developers and software architects
• HUBzero Instantiation (2012-2013): Built hub by customizing HUBzero for hierarchical structural materials; there are several advantages in terms of accessing and executing codes (on the server side), but there are significant barriers to establishing fast and productive cross-disciplinary e-collaborations
• GTRI Integration Platform (2013-2014): Engaged professional code developers and software architects to design a multi-user web portal that allows data and code sharing while tracking provenance and workflows; underestimated the resources needed for such a project
• MATIN Collaboration Environment (2013-2014): Lightweight integrated software shell that allows users to seamlessly interface with numerous cloud services offered through the web through their APIs. Examples include GitHub for versioning and archival of codes, Authorea for collaborative editing of documents, figshare for citable publication of research, Plotly for collaborative data analysis and visualization, and Google+ and LinkedIn for e-teaming and networking. MATIN focuses on building an emergent community: Materials & Manufacturing Informatics (MMI). MATIN provisions web services for software integration platforms and innovation infrastructure. Examples of MATIN:
• Spatial Statistics: https://github.com/tonyfast/SpatialStatisticsFFT
• PyMKS: http://openmaterials.github.io/pymks/index.html
• Example discussion about periodic boundary conditions in running Finite Element (FE or FEM) simulations: https://github.com/sfepy/sfepy/issues/267

In the platform developed, it is noted that the user or user's organization can set permissions at the level of individual datasets/codes on the degree and type of sharing.

Central Challenge: Activation. The transition from traditional disciplinary workflows to the integrated workflows needs to overcome an activation barrier. There is no clear incentive to get ahead of the curve in anticipating the upcoming transformations in how materials development will be pursued in the future. Open (reproducible) science alone is not a sufficiently strong motivator for a materials development community with strong interests in intellectual property development; indeed, it might actually be perceived as a negative. There is a need to explore and refine what incentives work for individual researchers (e.g., increased productivity).

Future Plans: Continue to develop and deploy MATIN at GT and any other organizations/groups that express an interest in working with us. Design, create, and distribute a suite of "mobile first" apps designed to make data ingest into collaboration environments such as MATIN, from both material characterization equipment and multi-scale models, an enhanced user experience with friendly and natural interfaces. Proactively promote/incentivize data/metadata ingest into publicly shared databases. Work with professional societies to announce and run new competitions/awards for recognizing and rewarding good practices by the members of the broader materials innovation user community (e.g., Best Documented Data Award, Best Documented Tool Award, Best Documented Workflow Award). Develop and deliver new workshops for training the new generation of graduate students, postdocs, and junior faculty in the emerging workflows of Materials & Manufacturing Informatics.

High Performance Cast Aluminum Alloys for Next Generation Passenger Vehicle Engines
Dongwon Shin, Materials Science and Technology Division, Oak Ridge National Laboratory


Oak Ridge National Laboratory (ORNL) teamed with Chrysler and Nemak to develop next-generation high-performance cast aluminum alloys that have improved castability, high-temperature strength, and fatigue performance. At this point, the data management practice is limited to analytical data spreadsheets, other than the CALPHAD computational thermodynamic databases and proprietary property databases for casting simulation. As the project goes on, a more robust data schema will be developed to store various types of data populated from the experiments and calculations.

Besides the data structure design, the speaker discussed the necessity of a new approach to identifying key descriptors from the large dataset that can guide the design of new materials. An alloy design requires the balance of extremely complex relationships among a large number of descriptors. Some examples of these descriptors include phase fraction of key intermetallics, lattice parameters of the matrix, and diffusion kinetics of solute atoms. However, optimizing these quantities relies on good physical metallurgy intuition coupled with clever experiments, dogged sleuthing, and a healthy dose of luck. Such descriptor-based predictions of materials properties date back to the beginning of modern solid-state chemistry and physics. As shown in the Darken-Gurry map to predict the solubility of elements in tantalum, intuitive descriptors such as electronegativity and radius could be used for materials design. However, as the chemistry and physics of materials become more complex, intuition-based search may not work anymore. Thus, the use of more advanced statistical analysis and data mining approaches might be required to accelerate the design of complex multi-component/multi-phase alloys.
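The descriptor idea behind a Darken-Gurry map can be sketched numerically. The version below assumes the commonly quoted rule of thumb that a solute is expected to show appreciable solubility when it falls inside an ellipse of roughly ±15% in atomic radius and ±0.4 units in electronegativity centered on the solvent. These tolerances, the function name, and the argument names are assumptions for illustration only; they are not values or methods reported in the presentation, and no real element data are included.

```python
def inside_darken_gurry_ellipse(
    solvent_radius,
    solvent_electronegativity,
    solute_radius,
    solute_electronegativity,
    radius_tolerance=0.15,           # assumed: +/-15% of the solvent radius
    electronegativity_tolerance=0.4, # assumed: +/-0.4 electronegativity units
):
    """Return True if the solute lies inside the ellipse centered on the solvent."""
    # Normalize each descriptor difference by its tolerance (the semi-axis).
    radius_term = (solute_radius - solvent_radius) / (radius_tolerance * solvent_radius)
    en_term = (solute_electronegativity - solvent_electronegativity) / electronegativity_tolerance
    # Standard ellipse test: the point is inside when the normalized sum is <= 1.
    return radius_term ** 2 + en_term ** 2 <= 1.0
```

A two-descriptor test like this is easy to reason about, which is exactly why it stops being sufficient once many interacting descriptors are involved, as the paragraph above notes.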

Data Related to Cast Aluminum
Jake Zindel, Ford Motor Company

The main focus of our research is the development of CAE (Computer-aided Engineering) tools to support two significant activities in the company: component design (product development) and manufacturing engineering (high-volume production). A key part of our research is conducting experiments to collect data for the development and validation of the CAE tools.

The data we collect consist of mechanical properties, physical properties, microstructural images, and manufacturing process parameters that form the pedigree of the material. This information can be in the form of Excel files, proprietary-software storage files, screen shots, and written reports.

Our immediate need is to deal with this large amount of existing information collected over many years on a "local" level. The first step is to organize many independent collections of information in all their forms. The next issue is the time required to enter all this information into the database. Maintenance may also require a significant amount of effort. Lastly, there will also be a need to manipulate the data contained in the database using Excel, Minitab, MATLAB, etc. Therefore, links or easy exportation of data to these types of applications will be necessary.

The next level would involve dealing with information that has been created or collected by others. From an OEM (Original Equipment Manufacturer) perspective, this would involve convincing suppliers to share processing information that they may consider proprietary. In addition to suppliers, universities and national laboratories are also sources of data. Data from all the sources will require some sort of protocol to determine its quality and resolve inconsistencies.

In the end, it seems likely that considerable budget and time will be required to implement comprehensive information databases, which could be the largest hurdle of them all.

QuesTek's MGI/ICME Approach to Alloy Optimization and Design (Including Cast Iron)
Jason Sebastian, QuesTek Innovations LLC

Integrated Computational Materials Engineering (ICME) methods involve the holistic application of different computational models across various length scales to the design, development, and rapid qualification of advanced materials. Using its Materials by Design technologies, QuesTek has successfully applied these ICME methods to the design, development, and full qualification of several advanced alloys (including aerospace structural steels, aerospace gear steels, advanced castable titanium alloys, high-strength corrosion-resistant aluminum alloys, and high-performance cast iron).

QuesTek's ICME methods benefit tremendously from the activities of the US Materials Genome Initiative (MGI). Recognizing that the "10 to 20 years" that it takes for alloy design and development is hindering the rapid deployment of higher-performance materials, the MGI is focused on "cutting in half" the time and cost to design and deploy novel advanced materials needed to enhance and sustain US competitiveness.

QuesTek also has ongoing activities on the application of ICME methods to the design and development of materials for the energy sector, including advanced materials that enable vehicle lightweighting. Of particular note is an ongoing effort with a major diesel engine manufacturer on the development of high-performance cast iron alloys for engine block applications. QuesTek's efforts have made extensive use of various MGI-type tools and databases, including 1) density functional theory methods for high-throughput screening of cast iron inoculant stabilities (structure, lattice parameter, stability of various oxide/sulfide phases), 2) liquid iron mobility databases (to help accurately model solidification phenomena), and 3) thermodynamic databases (to examine nano-precipitate stability).

Moving forward, to continue to accelerate the materials development cycle, efforts should remain focused on the development of the fundamental databases (thermodynamic, kinetic, etc.) that enable ICME models. There should also be increased focus on efforts related to the integration of the various computational databases, models, and tools (the "I" in ICME).

Integrated Computational Materials Engineering (ICME) of Generation Three Advanced High Strength Steels
Louis G. Hector, Jr., General Motors R&D Center

While traditional material development methods often involve laborious trial-and-error testing, Integrated Computational Materials Engineering (ICME), which falls under the umbrella of the Materials Genome Initiative, is offering a means for materials development under a more compressed time scale. Advanced high strength steels (AHSS), which contain multiple phases (e.g., ferrite, bainite, pearlite, austenite, martensite) and may exhibit phase transformation with strain, are fertile ground for ICME. However, prediction of macro-scale constitutive behavior based upon the multi-scale physical, chemical, and mechanical phenomena in these complex microstructures is a formidable challenge for ICME. Unprecedented collaboration between universities, industry, and government labs will be required to address AHSS development, archiving of data, and the implementation of these materials into products that benefit the American consumer. This presentation began with a brief overview of a new DOE/USAMP (US Automotive Materials Partnership)-funded ICME program aimed at the development of Generation Three (Gen 3) AHSS. The lack of commercial Gen 3 steels that meet DOE targets is offering significant challenges to this new ICME program. The program is meeting these challenges following four thrusts: (1) making Gen 3 steels to meet DOE targets; (2) passing experimental measurements of key properties to an ICME model that produces microstructure-based constitutive models; (3) material evaluation in forming and vehicle performance simulations; (4) archiving all relevant data for easy use by others in the program. Although the Gen 3 AHSS ICME program is "mission oriented," it demonstrates an important role for universities in providing the necessary foundation in fundamental materials science and engineering.

Integrated Computational Methods for Composites Materials (ICM2): Lessons Learned from Integration Planning and Feasibility Demonstrations
Lara Liou, GE Aviation

General Electric Aviation and Lockheed Martin Aeronautics have teamed through an Air Force Research Lab program to conduct a demonstration of an Integrated Computational Materials Engineering digital framework that links material processing, property, and structural relationships to account for processability, manufacturability, and system performance, with the goal of demonstrating that the usage of integrated models can contribute to future airframe and engine designs with dramatic reductions in development time and cost. Both engine and airframe Foundational Engineering Problems will be demonstrated through a common digital framework to be developed and validated on increasingly complex articles. The modeling tools that cover composite process models, localized mechanical behavior models, and global design models (Convergent Manufacturing Technology's COMPRO CCA, Autodesk's Autodesk Simulation Composite Analysis, and an Abaqus-based University of Michigan micromechanics model) are centrally linked via a commercially available model integration interface (Phoenix Integration's ModelCenter®). Detailed plans of what data will be passed among the various models, as well as how it will be passed, have been completed. Creation of these detailed process maps and simple feasibility demonstrations has raised inherent technical and logistical challenges to integrating commercially available multi-scale models. Robust model integration methods to address these challenges were discussed.

The exercise of drawing up detailed process maps for what to integrate and how to integrate the selected tools and models resulted in key early lessons learned and identification of tools critical to long-term, robust, and sustained application of Integrated Computational Methods for Composites Materials (ICM2). The key lessons learned include:
• Individual models must be relatively mature and validated before an ICM2 problem can be confidently defined
• FE analysis-based tools within the same digital thread should utilize the same FE solver
• The team ideally should consist of at least one expert for each tool and its underlying model, as well as an expert in the digital thread and integration interface tools and programming
• Mesh interpolation between finite element analysis based tools will likely be required, and a commercially available tool for this type of composite mesh interpolation is key for robust and sustained application of ICM2

Residual Stress Engineering of Rotating Nickel Components
Terry Wong, Aerojet Rocketdyne

The PW-8 project, the Nickel Base Superalloy Residual Stress Foundational Engineering Problem program, aims to develop a multi-disciplinary analysis approach to predict and incorporate bulk residual stress in the design of a component. This will allow for the design of a part that is optimized for a particular application or design intent (such as weight, performance, or reduction of scrap). To accomplish this goal, the PW-8 team has two major tasks: 1) the development of an infrastructure and associated tools needed to predict and incorporate bulk residual stress in the design stages of a part, and 2) a demonstration task that will validate the infrastructure and tools developed.

The Data Management task is the foundational element of the larger infrastructure development task. The goal for the data management work is to develop an infrastructure that allows the various team members in PW-8 to share data and to store data in a database. This not only ensures that data generated by this program is never lost but also allows for data to be transferred in a secure manner. In order to meet the goal of a system that securely stores and shares data, three major tasks must be accomplished: 1) validation that there is a secure method to transfer data between team members and that the method of storing data is also secure, 2) development of a database schema, and 3) development of a means to easily upload data into the database and download data from the database.

Of the three, the most difficult task was to create a system in which the transfer and storage of data was secure. Various proposals were made to ensure the security of the data, but many of them were rejected, not because the proposal was actually insecure but because it was not acceptable under each team member's IT security policies. The process of getting each team member's IT department to approve the security of the data transfer and storage took more effort than anticipated. In the end, the team members, along with their respective IT departments, approved the use of TRUcentrix as a tool to transfer data and the use of an ASM International hosted database to store data.

The development of a database schema is a necessary step in creating the database to store all the data generated (via physical tests as well as data generated by models). A big help in creating the database schema was to use the schema generated by Dr. Steven Arnold from NASA's Glenn Research Center. Using NASA's schema eliminated the need to start the database build from scratch and allowed us to focus on the specific needs of storing residual stress data.

The last task involves the development of tools that make the input of data easy. The more difficult it is to enter data into a database, the more likely data will not be entered. Excel-based templates are being developed in order to make data entry as easy as possible.

The two big lessons learned from PW-8 in relation to data management are:

• It is essential to work with each company's IT and IT Security people to develop a system that satisfies the security requirements of each company.
• Database schema development should not be reinvented but should begin with an already established schema.

Perspectives on MAI Programs
Dr. Mike Glavicic, Rolls Royce

The current MAI (AFRL's Metals Affordability Initiative) programs are structured independently from one another, in that a standard methodology does not exist that describes what data is to be archived at the conclusion of a program. Nor does there exist a standard protocol for how to archive things on a server or some other computer resource at the conclusion of a program. As a result, the current practices of MAI programs:

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program.
• Do not have a formalized requirement for how data is to be packaged and stored at the end of a program. Hence, even in the best of cases, in which all of the data is stored, the data is archived in simple folder fashion that is not searchable in any manner. Moreover, even if all of the data is present, it is extremely cumbersome to untangle the data for future use using a file folder strategy.

Hence, at the end of the program, both the government and the MAI team are left to manage the data and results in an unregimented fashion.

The lack of a regimented protocol for how to package a program's results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal of integrating the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models, and other information required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted to the recreation of data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. Although the RR-10 program overcame these issues and became a successful program, other programs that encounter similar problems are unlikely to fare as well, as RR-10 was very fortunate to have the means to overcome these serious issues, which resulted from the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features that will be required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs, with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ
Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and Suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals, Inc.) are working together on the PW-11 Team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security and access control, and the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework (GE Forge, Vanderbilt Forge, and Semantic Web technology), and one technology will be selected for the next phase of the program.


Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single database searches to multiple and disparate database searches, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work upfront for implementation, as it needs the development of a common vocabulary and relationships.

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST, and develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
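The idea of flowing uncertainty through interconnected models can be illustrated with a minimal Monte Carlo sketch. This is not the PW-11 method, which the summary does not describe; the two model functions, their coefficients, and the input distribution below are hypothetical placeholders chosen only to show how variability in a processing parameter might be carried through a processing-to-microstructure model and then a microstructure-to-property model.

```python
import random
import statistics


def microstructure_model(cooling_rate):
    """Hypothetical processing -> microstructure link (illustrative only):
    grain size in micrometers decreases as cooling rate (K/s) increases."""
    return 50.0 / (1.0 + 0.1 * cooling_rate)


def property_model(grain_size):
    """Hypothetical microstructure -> property link (illustrative only):
    a Hall-Petch-style strength in MPa that rises as grain size falls."""
    return 200.0 + 600.0 / grain_size ** 0.5


def propagate(mean_rate, std_rate, n_samples=10_000, seed=0):
    """Sample the processing parameter, push each sample through both
    models in sequence, and summarize the spread at every stage."""
    rng = random.Random(seed)
    grain_sizes, strengths = [], []
    for _ in range(n_samples):
        # Assumed Gaussian input variability, clipped at zero.
        rate = max(rng.gauss(mean_rate, std_rate), 0.0)
        grain = microstructure_model(rate)
        grain_sizes.append(grain)
        strengths.append(property_model(grain))
    return {
        "grain_size_mean": statistics.fmean(grain_sizes),
        "grain_size_std": statistics.stdev(grain_sizes),
        "strength_mean": statistics.fmean(strengths),
        "strength_std": statistics.stdev(strengths),
    }


if __name__ == "__main__":
    for key, value in propagate(mean_rate=10.0, std_rate=2.0).items():
        print(f"{key}: {value:.3f}")
```

Comparing the output spreads for different input spreads gives a rough picture of how sensitive the final property is to processing variability. A real study would replace the placeholder functions with validated models and measured distributions.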

ASM's Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project
Scott D. Henry, ASM International

ASM International, the world's largest association of metals-focused materials engineers, scientists, technicians, educators, and students, has launched a series of initiatives under the "Computational Materials Data (CMD) Network" umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design Ltd. The main objectives of SMDDP are to:

• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system.
• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets.
• Develop and carry out a series of test problems that represent relevant use cases for the repository.
• Make data open to the materials data community for use in data analytics, modeling, and educational purposes.
• Actively engage the materials data community and widely disseminate the findings from the project.
• Develop and implement data capture and curation procedures that can serve as models for other data repositories.

The basic philosophy behind the project is that standards often emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project, launched in late 2013 and planned as an 18-month program, is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future
Ben Blaiszik1, Kyle Chard1, Jim Pruyne1, Rachana Ananthakrishnan1, Steve Tuecke1,2, Ian Foster1,2
1University of Chicago Computation Institute, 2Argonne National Laboratory MCS Division

During this talk, we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully-formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.

Semantics for Deep Interoperability
Samuel Chance, iNovex

Semantic technology is based on mature web standards and intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented, starting with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web, riding atop the same infrastructure (http). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis that the technology has a foundation based on formal logic that enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being evaluated as a solution for a community or enterprise.
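How "triples" link into a web of data can be sketched in a few lines. The example below is an illustration only, not material from the presentation: it uses plain Python tuples rather than an RDF library, and every identifier (the alloy:, sample:, meas:, and mat: names) is a hypothetical label invented for the sketch. Each triple is a (subject, predicate, object) statement; because the object of one triple can be the subject of another, following the links from one node reaches related facts stated elsewhere.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) statements; all names are
# invented for illustration and do not come from a real vocabulary.
TRIPLES = [
    ("alloy:Al6061", "rdf:type", "mat:AluminumAlloy"),
    ("alloy:Al6061", "mat:hasSystem", "sys:Al-Mg-Si"),
    ("sample:S1", "mat:madeOf", "alloy:Al6061"),
    ("sample:S1", "mat:hasMeasurement", "meas:M1"),
    ("meas:M1", "mat:property", "mat:YieldStrength"),
]


def build_index(triples):
    """Group outgoing (predicate, object) links by subject."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def reachable(index, start):
    """Return every node that can be reached from `start` by following
    triples from subject to object (a depth-first walk of the graph)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, obj in index.get(node, []):
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


if __name__ == "__main__":
    index = build_index(TRIPLES)
    for node in sorted(reachable(index, "sample:S1")):
        print(node)
```

Starting from the sample, the walk reaches the alloy, its class, its alloy system, and the measured property, even though no single statement connects them all. That chaining is the sense in which separate triples form a web.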


Big Data and Semantic Technologies
Harlan Shober, RJ Lee Group

The presentation started with a description of entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS). To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing, characterization, and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grows daily but cannot be readily accessed.

After the context was established, the needed functionality was described and consisted of several elements: contribute, index, search, analyze, and archive. While bearing in mind the needed functionality, a string of universal scientific data management process categories was then presented with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for Resource Description Framework (RDF) were applied to a notional example with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described".

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
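Tokenization, the first of the NLP steps listed above, can be shown with a minimal sketch. It is not the approach presented at the workshop, which the summary does not detail. The regular expression is an assumption chosen for illustration: it keeps hyphenated terms such as alloy system names together, keeps numbers whole, and emits each remaining punctuation mark as its own token.

```python
import re

# Assumed token rules, for illustration only:
#   1. letters, optionally joined by hyphens (keeps "Al-Mg-Si" whole)
#   2. integers or decimals (keeps "6061" or "3.5" whole)
#   3. any other single non-space character (punctuation)
TOKEN_PATTERN = re.compile(
    r"[A-Za-z]+(?:-[A-Za-z]+)*|\d+(?:\.\d+)?|[^\sA-Za-z\d]"
)


def tokenize(text):
    """Split text into word, number, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    sentence = (
        "The team has chosen to work with 6061 aluminum "
        "and the related Al-Mg-Si system."
    )
    print(tokenize(sentence))
```

Later steps such as part-of-speech tagging and parsing operate on these tokens. The challenges noted for materials text (inline equations, tables, numerically driven information) would call for richer rules than this sketch provides.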

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations, prevalence of tables, graphs, and images (more of a computer vision problem), and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.


The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.


Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were 'rolled up' from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often-fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; a data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management: consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose
- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management: difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most-important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure: evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel
- Data associated with publications should be shared along with the publications. Not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)

Government-Led Priorities

1. Lead data strategy/approach development
- Hold a meeting like this one in 6-12 months with the purpose of developing a roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus, e.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most-important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools making data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating a standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming a consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure


4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10 (repositories & collab platforms)
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST): creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors, not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches supported are limited to too-fundamental single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or, alternatively, work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community


Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, Inovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, QuesTek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory


Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl Institute of Standards & Tech
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


Motivators for e-Collaborations: e-Collaborations are central to enhancing the productivity of materials development activities in several ways:

• individual research groups can identify and engage complementary expertise needed for their projects
• organizations can coordinate multiple teams and multiple projects to maximize output/impact
• organizations can identify and seek the right partners that enhance their individual and combined value

e-Collaborations can be open, semi-open, or closed depending on the needs of the specific organization. Productivity enhancement leading to dramatic reductions in cost and time expended in materials development is the main incentive for e-collaborations.

Main Impediments to e-Collaborations

Lack of tools to identify the "best" collaborators:

• Need software tools that quantify more precisely the expertise of individuals/groups/organizations
• Need software tools for allowing rich annotations and e-recording of discussions
• Need analytic tools for identification, extraction, and management of metadata (i.e., high-value abstractions from the raw data)
• Need searchable semantic metadata databases (these are lightweight and can be centralized and customized for specific user communities)
• Need software tools for tracking provenance (i.e., who/what/when/how generated, modified, etc.)

Lack of experience with the use of integrated workflows:

• Need software tools for e-recording of workflows and development of searchable workflow databases
• Need recommendation schemes that suggest the best workflow(s) based on recorded prior experience; this provides a means to transfer expertise acquired by an organization to other teams and other projects (even in the same organization)
• Leads to possible automation and dramatic enhancement of productivity
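
The who/what/when/how provenance items listed above can be illustrated with a small record structure. The following is a minimal sketch, not a description of any tool discussed at the workshop; the class name `ProvenanceRecord`, its fields, the helper `record_step`, and the sample entries are hypothetical names chosen for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    """One hypothetical provenance entry: who did what, when, and how."""
    who: str       # person or group that performed the step
    what: str      # dataset or artifact identifier
    when: str      # ISO 8601 timestamp (UTC)
    how: str       # tool, instrument, or code used
    action: str    # "generated" or "modified"
    checksum: str  # SHA-256 of the data, so later edits are detectable


def record_step(who, what, how, action, data: bytes, when=None):
    """Build a provenance record for a piece of data."""
    if when is None:
        when = datetime.now(timezone.utc).isoformat()
    return ProvenanceRecord(who, what, when, how, action,
                            hashlib.sha256(data).hexdigest())


# A small, searchable log (all entries are made-up examples).
log = [
    record_step("group-a", "tensile-001", "load frame export", "generated", b"raw curve"),
    record_step("group-b", "tensile-001", "smoothing script", "modified", b"smoothed curve"),
]

# Query the history of one dataset and serialize it as JSON lines.
history = [r for r in log if r.what == "tensile-001"]
lines = [json.dumps(asdict(r), sort_keys=True) for r in history]
```

Storing a checksum with each step is one possible design choice: it lets a later reader confirm that the data they hold is the version the record describes.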

MINED Group's Past Experience (~2005 - ~2008): Funded by the Army Research Office, ARO (David Stepp); engaged computer science (CS) graduates to produce a single-user platform that integrated certain software codes previously produced by our group. Software development was subsequently abandoned, but the effort recruited two CS graduates into PhD programs in Materials Science and Engineering/Mechanical Engineering (MSE/ME), which dramatically altered the direction of research in the group.

Brief chronology of development:

• Senior Design Project (2011-2012): An online portal designed for sharing data files and tools; learned the importance of user interfaces and the need for professional code developers and software architects
• HUBzero Instantiation (2012-2013): Built a hub by customizing HUBzero for hierarchical structural materials; there are several advantages in terms of accessing and executing codes (on the server side), but there are significant barriers to establishing fast and productive cross-disciplinary e-collaborations


• GTRI Integration Platform (2013-2014): Engaged professional code developers and software architects to design a multi-user web portal that allows data and code sharing while tracking provenance and workflows; underestimated the resources needed for such a project
• MATIN Collaboration Environment (2013-2014): Lightweight integrated software shell that allows users to seamlessly interface with numerous cloud services offered on the web through their APIs. Examples include GitHub for versioning and archival of codes, Authorea for collaborative editing of documents, figshare for citable publication of research, Plotly for collaborative data analysis and visualization, and Google+ and LinkedIn for e-teaming and networking. MATIN focuses on building an emergent community, Materials & Manufacturing Informatics (MMI). MATIN provisions web services for software integration platforms and innovation infrastructure.

Examples of MATIN:

• Spatial Statistics: https://github.com/tonyfast/SpatialStatisticsFFT
• PyMKS: http://openmaterials.github.io/pymks/index.html
• Example discussion about periodic boundary conditions in running Finite Element (FE or FEM) simulations: https://github.com/sfepy/sfepy/issues/267

In the platform developed, the user or the user's organization can set permissions at the level of individual datasets/codes on the degree and type of sharing.

Central Challenge: Activation. The transition from traditional disciplinary workflows to integrated workflows needs to overcome an activation barrier. There is no clear incentive to get ahead of the curve in anticipating the upcoming transformations in how materials development will be pursued in the future. Open (reproducible) science alone is not a sufficiently strong motivator for a materials development community with strong interests in intellectual property development; indeed, it might actually be perceived as a negative. There is a need to explore and refine what incentives work for individual researchers (e.g., increased productivity).

Future Plans: Continue to develop and deploy MATIN at GT and at any other organizations/groups that express an interest in working with us. Design, create, and distribute a suite of "mobile first" apps designed to make data ingest into collaboration environments such as MATIN, from both material characterization equipment and multi-scale models, an enhanced user experience with friendly and natural interfaces. Proactively promote/incentivize data/metadata ingest into publicly shared databases. Work with professional societies to announce and run new competitions/awards for recognizing and rewarding good practices by the members of the broader materials innovation user community (e.g., Best Documented Data Award, Best Documented Tool Award, Best Documented Workflow Award). Develop and deliver new workshops for training the new generation of graduate students, postdocs, and junior faculty in the emerging workflows of Materials & Manufacturing Informatics.

High Performance Cast Aluminum Alloys for Next Generation Passenger Vehicle Engines
Dongwon Shin, Materials Science and Technology Division, Oak Ridge National Laboratory


Oak Ridge National Laboratory (ORNL) teamed with Chrysler and Nemak to develop next-generation high-performance cast aluminum alloys that have improved castability, high-temperature strength, and fatigue performance. At this point, the data management practice is limited to analytical data spreadsheets, other than the CALPHAD computational thermodynamic databases and proprietary property databases for casting simulation. As the project goes on, a more robust data schema will be developed to store various types of data populated from the experiments and calculations.

Besides the data structure design, the speaker discussed the necessity of a new approach to identifying key descriptors from the large dataset that can guide the design of new materials. Alloy design requires balancing extremely complex relationships among a large number of descriptors. Some examples of these descriptors include the phase fraction of key intermetallics, lattice parameters of the matrix, and diffusion kinetics of solute atoms. However, optimizing these quantities relies on good physical metallurgy intuition coupled with clever experiments, dogged sleuthing, and a healthy dose of luck. Such descriptor-based predictions of materials properties date back to the beginning of modern solid-state chemistry and physics. As shown in the Darken-Gurry map used to predict the solubility of elements in tantalum, intuitive descriptors such as electronegativity and radius could be used for materials design. However, as the chemistry and physics of materials become more complex, intuition-based search may no longer work. Thus, more advanced statistical analysis and data mining approaches might be required to accelerate the design of complex multi-component, multi-phase alloys.
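
The Darken-Gurry idea mentioned above can be sketched as a simple two-descriptor test. The sketch below assumes the commonly quoted form of the criterion, an ellipse of ±0.4 electronegativity units and ±15% of the solvent radius centred on the solvent. The function name `inside_darken_gurry_ellipse` and all numerical descriptor values are illustrative placeholders, not data from the talk.

```python
def inside_darken_gurry_ellipse(solvent, solute, en_tol=0.4, radius_tol=0.15):
    """Return True if the solute falls inside the ellipse centred on the solvent.

    solvent, solute: (electronegativity, atomic_radius) tuples in the same units.
    en_tol: semi-axis in electronegativity units (assumed 0.4).
    radius_tol: semi-axis as a fraction of the solvent radius (assumed 15%).
    """
    en0, r0 = solvent
    en1, r1 = solute
    d_en = (en1 - en0) / en_tol
    d_r = (r1 - r0) / (radius_tol * r0)
    return d_en ** 2 + d_r ** 2 <= 1.0


# Placeholder descriptor values for illustration only (not reference data).
solvent = (1.5, 1.46)
candidates = {"X": (1.6, 1.46), "Y": (2.5, 0.90)}
predicted_soluble = {name: inside_darken_gurry_ellipse(solvent, d)
                     for name, d in candidates.items()}
```

With only two descriptors the test is a single inequality; the point made in the talk is that realistic alloy design involves many more descriptors, where such hand-drawn boundaries stop being practical.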

Data Related to Cast Aluminum

Jake Zindel, Ford Motor Company

The main focus of our research is the development of CAE (Computer-aided Engineering) tools to support two significant activities in the company: component design (product development) and manufacturing engineering (high-volume production). A key part of our research is conducting experiments to collect data for the development and validation of the CAE tools.

The data we collect consist of mechanical properties, physical properties, microstructural images, and manufacturing process parameters that form the pedigree of the material. This information can be in the form of Excel files, proprietary-software storage files, screen shots, and written reports.

Our immediate need is to deal with this large amount of existing information, collected over many years, on a "local" level. The first step is to organize many independent collections of information in all their forms. The next issue is the time required to enter all this information into the database. Maintenance may also require a significant amount of effort. Lastly, there will also be a need to manipulate the data contained in the database using Excel, Minitab, MATLAB, etc. Therefore, links or easy exportation of data to these types of applications will be necessary.

The next level would involve dealing with information that has been created or collected by others. From an OEM (Original Equipment Manufacturer) perspective, this would involve convincing suppliers to share processing information that they may consider proprietary. In addition to suppliers, universities and national laboratories are also sources of data. Data from all the sources will require some sort of protocol to determine its quality and resolve inconsistencies.


In the end, it seems likely that considerable budget and time will be required to implement comprehensive information databases, which could be the largest hurdle of them all.

QuesTek's MGI/ICME Approach to Alloy Optimization and Design (Including Cast Iron)
Jason Sebastian, QuesTek Innovations LLC

Integrated Computational Materials Engineering (ICME) methods involve the holistic application of different computational models across various length scales to the design, development, and rapid qualification of advanced materials. Using its Materials by Design technologies, QuesTek has successfully applied these ICME methods to the design, development, and full qualification of several advanced alloys (including aerospace structural steels, aerospace gear steels, advanced castable titanium alloys, high-strength corrosion-resistant aluminum alloys, and high-performance cast iron).

QuesTek's ICME methods benefit tremendously from the activities of the US Materials Genome Initiative (MGI). Recognizing that the "10 to 20 years" that it takes for alloy design and development is hindering the rapid deployment of higher-performance materials, the MGI is focused on "cutting in half" the time and cost to design and deploy novel advanced materials needed to enhance and sustain US competitiveness.

QuesTek also has ongoing activities on the application of ICME methods to the design and development of materials for the energy sector, including advanced materials that enable vehicle lightweighting. Of particular note is an ongoing effort with a major diesel engine manufacturer on the development of high-performance cast iron alloys for engine block applications. QuesTek's efforts have made extensive use of various MGI-type tools and databases, including 1) density functional theory methods for high-throughput screening of cast iron inoculant stabilities (structure, lattice parameter, stability of various oxide/sulfide phases), 2) liquid iron mobility databases (to help accurately model solidification phenomena), and 3) thermodynamic databases (to examine nano-precipitate stability).

Moving forward, to continue to accelerate the materials development cycle, efforts should remain focused on the development of the fundamental databases (thermodynamic, kinetic, etc.) that enable ICME models. There should also be increased focus on efforts related to the integration of the various computational databases, models, and tools (the "I" in ICME).

Integrated Computational Materials Engineering (ICME) of Generation Three Advanced High Strength Steels
Louis G Hector Jr, General Motors R&D Center

While traditional material development methods often involve laborious trial-and-error testing, Integrated Computational Materials Engineering (ICME), which falls under the umbrella of the Materials Genome Initiative, is offering a means for materials development under a more compressed time scale. Advanced high strength steels (AHSS), which contain multiple phases (e.g., ferrite, bainite, pearlite, austenite, martensite) and may exhibit phase transformation with strain, are fertile ground for ICME. However, prediction of macro-scale constitutive behavior based upon the multi-scale physical, chemical, and mechanical phenomena in these complex microstructures is a formidable challenge for ICME. Unprecedented collaboration between universities, industry, and government labs will be required to address AHSS development, archiving of data, and the implementation of these materials into products that benefit the American consumer. This presentation began with a brief overview of a new DOE/USAMP (US Automotive Materials Partnership)-funded ICME program aimed at the development of Generation Three (Gen 3) AHSS. The lack of commercial Gen 3 steels that meet DOE targets is offering significant challenges to this new ICME program. The program is meeting these challenges following four thrusts: (1) making Gen 3 steels to meet DOE targets; (2) passing experimental measurements of key properties to an ICME model that produces microstructure-based constitutive models; (3) material evaluation in forming and vehicle performance simulations; (4) archiving all relevant data for easy use by others in the program. Although the Gen 3 AHSS ICME program is "mission oriented," it demonstrates an important role for universities in providing the necessary foundation in fundamental materials science and engineering.

Integrated Computational Methods for Composites Materials (ICM2): Lessons Learned from Integration Planning and Feasibility Demonstrations
Lara Liou, GE Aviation

General Electric Aviation and Lockheed Martin Aeronautics have teamed through an Air Force Research Lab program to conduct a demonstration of an Integrated Computational Materials Engineering digital framework that links material processing, property, and structural relationships to account for processability, manufacturability, and system performance. The goal is to demonstrate that the usage of integrated models can contribute to future airframe and engine designs with dramatic reductions in development time and cost. Both engine and airframe Foundational Engineering Problems will be demonstrated through a common digital framework, to be developed and validated on increasingly complex articles. The modeling tools, which cover composite process models, localized mechanical behavior models, and global design models (Convergent Manufacturing Technology's COMPRO CCA, Autodesk's Autodesk Simulation Composite Analysis, and an Abaqus-based University of Michigan micromechanics model), are centrally linked via a commercially available model integration interface (Phoenix Integration's ModelCenter®). Detailed plans of what data will be passed among the various models, as well as how it will be passed, have been completed. Creation of these detailed process maps and simple feasibility demonstrations has raised inherent technical and logistical challenges to integrating commercially available multi-scale models. Robust model integration methods to address these challenges were discussed.

The exercise of drawing up detailed process maps for what to integrate and how to integrate the selected tools and models resulted in key early lessons learned and identification of tools critical to long-term, robust, and sustained application of Integrated Computational Methods for Composites Materials (ICM2). The key lessons learned include:

• Individual models must be relatively mature and validated before an ICM2 problem can be confidently defined
• FE analysis-based tools within the same digital thread should utilize the same FE solver
• The team ideally should consist of at least one expert for each tool and its underlying model, as well as an expert in the digital thread and integration interface tools and programming
• Mesh interpolation between finite element analysis-based tools will likely be required, and a commercially available tool for this type of composite mesh interpolation is key for robust and sustained application of ICM2

Residual Stress Engineering of Rotating Nickel Components
Terry Wong, Aerojet Rocketdyne

The PW-8 project, the Nickel Base Superalloy Residual Stress Foundational Engineering Problem program, aims to develop a multi-disciplinary analysis approach to predict and incorporate bulk residual stress in the design of a component. This will allow for the design of a part that is optimized for a particular application or design intent (such as weight, performance, or reduction of scrap). To accomplish this goal, the PW-8 team has two major tasks: 1) the development of an infrastructure and associated tools needed to predict and incorporate bulk residual stress in the design stages of a part, and 2) a demonstration task that will validate the infrastructure and tools developed.

The Data Management task is the foundational element of the larger infrastructure development task. The goal for the data management work is to develop an infrastructure that allows the various team members in PW-8 to share data and to store data in a database. This not only ensures that data generated by this program is never lost but also allows for data to be transferred in a secure manner. In order to meet the goal of a system that securely stores and shares data, three major tasks must be accomplished: 1) validation that there is a secure method to transfer data between team members and that the method of storing data is also secure, 2) development of a database schema, and 3) development of means to easily upload data into the database and download data from the database.

Of the three, the most difficult task was to create a system in which the transfer and storage of data was secure. Various proposals were made to ensure the security of the data, but many of them were rejected, not because the proposal was actually insecure but because it was not acceptable under each team member's IT security policies. The process of getting each team member's IT department to approve the security of the data transfer and storage took more effort than anticipated. In the end, the team members, along with their respective IT departments, approved the use of TRUcentrix as a tool to transfer data and the use of an ASM International-hosted database to store data.

The development of a database schema is a necessary step in creating the database to store all the data generated (via physical tests as well as by models). A big help in creating the database schema was to use the schema generated by Dr. Steven Arnold from NASA's Glenn Research Center. Using NASA's schema eliminated the need to start the database build from scratch and allowed us to focus on the specific needs of storing residual stress data.

The last task involves the development of tools that make the input of data easy. The more difficult it is to enter data into a database, the more likely it is that data will not be entered. Excel-based templates are being developed in order to make data entry as easy as possible.
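
One way to keep data entry easy while still enforcing a schema is to validate a filled-in template before upload. The following is a minimal sketch under stated assumptions: it uses a CSV export as a stand-in for an Excel template, and the column names in `REQUIRED_COLUMNS`, the function `validate_template`, and the sample rows are hypothetical, not the PW-8 or NASA schema.

```python
import csv
import io

# Hypothetical required columns and their expected types (an assumption).
REQUIRED_COLUMNS = {"specimen_id": str, "depth_mm": float, "residual_stress_MPa": float}


def validate_template(text):
    """Return (valid_rows, errors) for CSV text exported from a template."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [], [f"missing columns: {', '.join(missing)}"]
    valid, errors = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            # Convert every required field; a bad value rejects the whole row.
            valid.append({c: t(row[c]) for c, t in REQUIRED_COLUMNS.items()})
        except (TypeError, ValueError):
            errors.append(f"line {line_no}: could not parse {row}")
    return valid, errors


# Made-up sample: the second data row has a non-numeric depth.
sample = "specimen_id,depth_mm,residual_stress_MPa\nA1,0.5,-320\nA2,abc,-110\n"
rows, problems = validate_template(sample)
```

Reporting the offending line number back to the person filling in the template keeps the correction effort small, which is in the spirit of the lesson above that hard-to-enter data tends not to be entered.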

The two big lessons learned from PW-8 in relation to data management are:

• It is essential to work with each company's IT and IT Security people to develop a system that satisfies the security requirements of each company
• Database schema development should not be reinvented but should begin with an already established schema

Perspectives on MAI Programs
Dr. Mike Glavicic, Rolls Royce

The current MAI (AFRL's Metals Affordability Initiative) programs are structured independently from one another, in that no standard methodology exists that describes what data is to be archived at the conclusion of a program. Nor does there exist a standard protocol for how to archive things on a server or some other computer resource at the conclusion of a program. As a result, the current practices of MAI programs:

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program
• Do not have a formalized requirement for how data is to be packaged and stored at the end of a program. Hence, even in the best of cases, in which all of the data is stored, the data is archived in a simple folder fashion that is not searchable in any manner. Moreover, even if all of the data is present, it is extremely cumbersome to untangle the data for future use using a file folder strategy

Hence, at the end of the program, both the government and the MAI team are left to manage the data and results in an unregimented fashion.

The lack of a regimented protocol for how to package a program's results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal of integrating the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models, and other information that were required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted to the recreation of data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. Although the RR-10 program overcame these issues and became a successful program, other programs that encounter similar problems are unlikely to share that success, as RR-10 was very fortunate to have the means to overcome these serious issues, which were a result of the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government-funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs, with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ
Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and Suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals Inc) are working together on the PW-11 Team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security and access control, and the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework (GE Forge, Vanderbilt Forge, and Semantic Web technology), and one technology will be selected for the next phase of the program.


Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single database searches to multiple and disparate database searches, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work upfront for implementation, as it needs the development of a common vocabulary and relationships.
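
The contrast with fixed-schema tables can be illustrated with a toy triple store. This is only a sketch: it uses plain Python tuples rather than any Semantic Web library, and the vocabulary (`hasAlloy`, `suppliedBy`, and so on) and all sample facts are hypothetical, standing in for the common vocabulary that the text says must be developed.

```python
# Each fact is a (subject, predicate, object) triple; new kinds of facts
# can be added without changing a table schema. All facts are made up.
triples = {
    ("billet-17", "hasAlloy", "Ti-6Al-4V"),
    ("billet-17", "suppliedBy", "supplier-A"),
    ("billet-17", "hasTestResult", "tensile-042"),
    ("tensile-042", "yieldStrengthMPa", 880),
    ("billet-23", "hasAlloy", "Ti-6Al-4V"),
    ("billet-23", "suppliedBy", "supplier-B"),
}


def match(pattern, store):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in store
            if all(p is None or p == v for p, v in zip(pattern, t))]


def suppliers_of(alloy, store):
    """Follow two relationships: alloy -> billets -> suppliers."""
    billets = {s for s, _, _ in match((None, "hasAlloy", alloy), store)}
    return sorted(o for s, _, o in match((None, "suppliedBy", None), store)
                  if s in billets)


result = suppliers_of("Ti-6Al-4V", triples)
```

The query walks relationships rather than matching keywords, which is the sense in which such searches are contextual. The upfront cost noted above shows up here as the need for every party to agree on what predicates such as `suppliedBy` mean.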

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST, and develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
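
Flowing uncertainty through a chain of models can be sketched with Monte Carlo sampling. The example below is an assumption-laden illustration, not the PW-11 method: the two model functions, their coefficients, and the input distribution are invented placeholders that only show how variability in a processing parameter propagates to a property.

```python
import random
import statistics


def microstructure_model(temperature_c):
    """Hypothetical placeholder: grain size (um) grows with temperature."""
    return 5.0 + 0.02 * (temperature_c - 900.0)


def property_model(grain_size_um):
    """Hypothetical Hall-Petch-like placeholder: strength (MPa) vs grain size."""
    return 600.0 + 400.0 / grain_size_um ** 0.5


def propagate(n_samples=10_000, mean_t=950.0, sd_t=10.0, seed=0):
    """Sample the processing parameter and push each sample through both models."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    strengths = []
    for _ in range(n_samples):
        t = rng.gauss(mean_t, sd_t)
        strengths.append(property_model(microstructure_model(t)))
    return statistics.mean(strengths), statistics.stdev(strengths)


mean_strength, sd_strength = propagate()
```

Comparing `sd_strength` with the spread of the input temperature gives a first look at how strongly the property responds to process variability; a real study would also sample uncertainty in the models themselves at each step.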

ASM's Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project
Scott D Henry, ASM International

ASM International, the world's largest association of metals-focused materials engineers, scientists, technicians, educators, and students, has launched a series of initiatives under the "Computational Materials Data (CMD) Network" umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data, to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design Ltd. The main objectives of SMDDP are to:


• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system
• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets
• Develop and carry out a series of test problems that represent relevant use cases for the repository
• Make data open to the materials data community for use in data analytics, modeling, and educational purposes
• Actively engage the materials data community and widely disseminate the findings from the project
• Develop and implement data capture and curation procedures that can serve as models for other data repositories

The basic philosophy behind the project is that standards often emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated, structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project – launched in late 2013 and planned as an 18-month program – is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future

Ben Blaiszik¹, Kyle Chard¹, Jim Pruyne¹, Rachana Ananthakrishnan¹, Steve Tuecke¹,², Ian Foster¹,²

¹University of Chicago, Computation Institute; ²Argonne National Laboratory, MCS Division

During this talk, we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully-formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.

Semantics for Deep Interoperability

Samuel Chance, iNovex

Semantic technology is based on mature web standards and is intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented. It started with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web riding atop the same infrastructure (HTTP). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis on the fact that the technology has a foundation based on formal logic that enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being evaluated as a solution for a community or enterprise.
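The idea of triples linking into a web of data can be sketched with plain Python tuples. This is a minimal illustration, not the chart from the talk and not an RDF implementation: every identifier and value below (the `ex:` names, the sample, the measurement) is a hypothetical placeholder. Each triple is a (subject, predicate, object) statement, and the web emerges because the object of one triple can be the subject of another.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) statements; all names are made up.
TRIPLES = [
    ("ex:Sample42", "ex:madeOf", "ex:Al6061"),
    ("ex:Al6061", "ex:belongsToSystem", "ex:Al-Mg-Si"),
    ("ex:Sample42", "ex:hasMeasurement", "ex:Tensile7"),
    ("ex:Tensile7", "ex:hasValue", "ex:PlaceholderValue"),
]


def build_index(triples):
    """Group triples by subject so outgoing links can be followed quickly."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def reachable(index, start):
    """Return every node that can be reached from `start` by following triples."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _predicate, obj in index.get(node, []):
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


if __name__ == "__main__":
    index = build_index(TRIPLES)
    for node in sorted(reachable(index, "ex:Sample42")):
        print(node)
```

Starting from the sample, a program can reach the alloy, its alloy system, and the measurement without a human interpreting any of the statements, which is the machine-readability point made above.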


Big Data and Semantic Technologies

Harlan Shober, RJ Lee Group

The presentation started with a description of entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS). To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing, characterization, and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described. It consisted of several elements: contribute, index, search, analyze, and archive. With the needed functionality in mind, a string of universal scientific data management process categories was then presented with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for Resource Description Framework (RDF) were applied to a notional example with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described."

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)

Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
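Tokenization, the first of the steps listed above, can be illustrated with a small regular-expression sketch. This is an assumption-laden toy rather than the approach presented in the talk: the pattern and the sample sentence are hypothetical, and the pattern was chosen only to show why materials text needs care, since hyphenated alloy-system names and decimal numbers should survive as single tokens.

```python
import re

# Hypothetical token pattern, tried in order:
#   1. numbers with an optional decimal part (e.g. "175.5")
#   2. words, keeping hyphenated compounds together (e.g. "Al-Mg-Si")
#   3. any other single non-space character (punctuation)
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+(?:-[A-Za-z]+)*|[^\sA-Za-z\d]")


def tokenize(text):
    """Split text into a list of tokens using TOKEN_PATTERN."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("Al-Mg-Si alloys age at 175.5 C."))
```

Later stages such as part-of-speech tagging and entity disambiguation would operate on a token list like this one.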

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.


The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.


Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were "rolled up" from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and, often, more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often, fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; a data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto-ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data, standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management – consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose


- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management: it is difficult to establish ROI and realize the benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most-important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure – evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel


- Data associated with publications should be shared along with the publications: not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)

Government-Led Priorities

1. Lead data strategy/approach development
- Hold a meeting like this one in 6-12 months with the purpose of developing a roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus? E.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most-important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools making data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating a standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming a consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure


4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10% – repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST) – creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors; not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials-related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches/support are limited to too-fundamental single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or alternatively work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community


Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, iNovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp.
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls-Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept. of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, QuesTek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl. Institute of Standards & Tech.
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


• GTRI Integration Platform (2013-2014): Engaged professional code developers and software architects to design a multi-user web portal that allows data and code sharing while tracking provenance and workflows; underestimated the resources needed for such a project.
• MATIN Collaboration Environment (2013-2014): Lightweight integrated software shell that allows users to seamlessly interface, through their APIs, with numerous cloud services offered through the web. Examples include GitHub for versioning and archival of codes, Authorea for collaborative editing of documents, figshare for citable publication of research, Plotly for collaborative data analysis and visualization, and Google+ and LinkedIn for e-teaming and networking. MATIN focuses on building an emergent community – Materials & Manufacturing Informatics (MMI). MATIN provisions web services for software integration platforms and innovation infrastructure. Examples of MATIN:
  • Spatial Statistics: https://github.com/tonyfast/SpatialStatisticsFFT
  • PyMKS: http://openmaterials.github.io/pymks/index.html
  • Example discussion about periodic boundary conditions in running Finite Element (FE or FEM) simulations: https://github.com/sfepy/sfepy/issues/267

In the platform developed, it is noted that the user or user's organization can set permissions at the level of individual datasets/codes on the degree and type of sharing.

Central Challenge: Activation. The transition from traditional disciplinary workflows to the integrated workflows needs to overcome an activation barrier. There is no clear incentive to get ahead of the curve in anticipating the upcoming transformations in how materials development will be pursued in the future. Open (reproducible) science alone is not a sufficiently strong motivator for a materials development community with strong interests in intellectual property development; indeed, it might actually be perceived as a negative. There is a need to explore and refine what incentives work for individual researchers (e.g., increased productivity).

Future Plans: Continue to develop and deploy MATIN at GT and any other organizations/groups that express an interest in working with us. Design, create, and distribute a suite of "mobile first" Apps designed to make data ingest into collaboration environments such as MATIN, from both material characterization equipment and multi-scale models, an enhanced user experience with friendly and natural interfaces. Proactively promote/incentivize data/metadata ingest into publicly shared databases. Work with professional societies to announce and run new competitions/awards for recognizing and rewarding good practices by the members of the broader materials innovation user community (e.g., Best Documented Data Award, Best Documented Tool Award, Best Documented Workflow Award). Develop and deliver new workshops for training the new generation of graduate students, postdocs, and junior faculty in the emerging workflows of Materials & Manufacturing Informatics.

High Performance Cast Aluminum Alloys for Next Generation Passenger Vehicle Engines

Dongwon Shin, Materials Science and Technology Division, Oak Ridge National Laboratory


Oak Ridge National Laboratory (ORNL) teamed with Chrysler and Nemak to develop next-generation high-performance cast aluminum alloys that have improved castability, high-temperature strength, and fatigue performance. At this point, the data management practice is limited to analytical data spreadsheets, other than the CALPHAD computational thermodynamic databases and proprietary property databases for casting simulation. As the project goes on, a more robust data schema will be developed to store various types of data populated from the experiments and calculations.

Besides the data structure design, the speaker discussed the necessity of a new approach to identifying key descriptors from the large dataset that can guide the design of new materials. An alloy design requires the balance of extremely complex relationships among a large number of descriptors. Some examples of these descriptors include phase fraction of key intermetallics, lattice parameters of the matrix, and diffusion kinetics of solute atoms. However, optimizing these quantities relies on good physical metallurgy intuition coupled with clever experiments, dogged sleuthing, and a healthy dose of luck. Such descriptor-based predictions of materials properties date back to the beginning of modern solid-state chemistry and physics. As shown in the Darken-Gurry map to predict the solubility of elements in tantalum, intuitive descriptors such as electronegativity and radius could be used for materials design. However, as the chemistry and physics of materials become more complex, an intuition-based search may not work anymore. Thus, the use of more advanced statistical analysis and data mining approaches might be required to accelerate the design of complex multi-component, multi-phase alloys.
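A Darken-Gurry style check is simple enough to sketch in a few lines. The version below assumes the commonly quoted form of the criterion, an ellipse centered on the solvent with semi-axes of 15% of the solvent radius and 0.4 electronegativity units; the element entries are hypothetical placeholders in arbitrary units, not measured data for tantalum or any real solute, and the function name is an illustrative choice.

```python
def inside_darken_gurry_ellipse(solvent, solute, radius_tol=0.15, en_tol=0.4):
    """Return True if the solute falls inside the ellipse around the solvent.

    radius_tol is a fraction of the solvent radius; en_tol is an absolute
    electronegativity difference. Both defaults are the commonly quoted values.
    """
    scaled_radius_diff = (solute["radius"] - solvent["radius"]) / (radius_tol * solvent["radius"])
    scaled_en_diff = (solute["electronegativity"] - solvent["electronegativity"]) / en_tol
    return scaled_radius_diff ** 2 + scaled_en_diff ** 2 <= 1.0


# Hypothetical placeholder descriptors (arbitrary units), for illustration only.
SOLVENT = {"radius": 1.00, "electronegativity": 1.5}
CANDIDATES = {
    "solute_a": {"radius": 1.05, "electronegativity": 1.6},  # close to the solvent
    "solute_b": {"radius": 1.30, "electronegativity": 2.2},  # far from the solvent
}

if __name__ == "__main__":
    for name, descriptors in CANDIDATES.items():
        verdict = inside_darken_gurry_ellipse(SOLVENT, descriptors)
        print(f"{name}: predicted extensive solubility = {verdict}")
```

With only two descriptors the decision boundary can be drawn by hand; the point made in the talk is that this stops being practical once many coupled descriptors are involved, which is where statistical and data mining methods come in.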

Data Related to Cast Aluminum

Jake Zindel, Ford Motor Company

The main focus of our research is the development of CAE (Computer-aided Engineering) tools to support two significant activities in the company: component design (product development) and manufacturing engineering (high-volume production). A key part of our research is conducting experiments to collect data for the development and validation of the CAE tools.

The data we collect consists of mechanical properties, physical properties, microstructural images, and manufacturing process parameters that form the pedigree of the material. This information can be in the form of Excel files, proprietary-software storage files, screen shots, and written reports.

Our immediate need is to deal with this large amount of existing information collected over many years on a "local" level. The first step is to organize many independent collections of information in all their forms. The next issue is the time required to enter all this information into the database. Maintenance may also require a significant amount of effort. Lastly, there will also be a need to manipulate the data contained in the database using Excel, Minitab, MATLAB, etc. Therefore, links or easy exportation of data to these types of applications will be necessary.

The next level would involve dealing with information that has been created or collected by others. From an OEM (Original Equipment Manufacturer) perspective, this would involve convincing suppliers to share processing information that they may consider proprietary. In addition to suppliers, universities and national laboratories are also sources of data. Data from all the sources will require some sort of protocol to determine its quality and resolve inconsistencies.


In the end, it seems likely that considerable budget and time will be required to implement comprehensive information databases, which could be the largest hurdle of them all.

QuesTek's MGI/ICME Approach to Alloy Optimization and Design (Including Cast Iron)

Jason Sebastian, QuesTek Innovations LLC

Integrated Computational Materials Engineering (ICME) methods involve the holistic application of different computational models across various length scales to the design, development, and rapid qualification of advanced materials. Using its Materials by Design technologies, QuesTek has successfully applied these ICME methods to the design, development, and full qualification of several advanced alloys (including aerospace structural steels, aerospace gear steels, advanced castable titanium alloys, high-strength corrosion-resistant aluminum alloys, and high-performance cast iron).

QuesTek's ICME methods benefit tremendously from the activities of the U.S. Materials Genome Initiative (MGI). Recognizing that the "10 to 20 years" that it takes for alloy design and development is hindering the rapid deployment of higher-performance materials, the MGI is focused on "cutting in half" the time and cost to design and deploy novel advanced materials needed to enhance and sustain U.S. competitiveness.

QuesTek also has ongoing activities on the application of ICME methods to the design and development of materials for the energy sector, including advanced materials that enable vehicle lightweighting. Of particular note is an ongoing effort with a major diesel engine manufacturer on the development of high-performance cast iron alloys for engine block applications. QuesTek's efforts have made extensive use of various MGI-type tools and databases, including: 1) density functional theory methods for high-throughput screening of cast iron inoculant stabilities (structure, lattice parameter, stability of various oxide/sulfide phases); 2) liquid iron mobility databases (to help accurately model solidification phenomena); and 3) thermodynamic databases (to examine nano-precipitate stability).

Moving forward, to continue to accelerate the materials development cycle, efforts should remain focused on the development of the fundamental databases (thermodynamic, kinetic, etc.) that enable ICME models. There should also be increased focus on efforts related to the integration of the various computational databases, models, and tools (the "I" in ICME).

Integrated Computational Materials Engineering (ICME) of Generation Three Advanced High Strength Steels

Louis G. Hector Jr., General Motors R&D Center

While traditional material development methods often involve laborious trial-and-error testing, Integrated Computational Materials Engineering (ICME), which falls under the umbrella of the Materials Genome Initiative, is offering a means for materials development under a more compressed time scale. Advanced high strength steels (AHSS), which contain multiple phases (e.g., ferrite, bainite, pearlite, austenite, martensite) and may exhibit phase transformation with strain, are fertile ground for ICME. However, prediction of macro-scale constitutive behavior based upon the multi-scale physical, chemical, and mechanical phenomena in these complex microstructures is a formidable challenge for ICME. Unprecedented collaboration between universities, industry, and government labs will be required to address AHSS development, archiving of data, and the implementation of these materials into products that benefit the American consumer.

This presentation began with a brief overview of a new DOE/USAMP (U.S. Automotive Materials Partnership)-funded ICME program aimed at the development of Generation Three (Gen 3) AHSS. The lack of commercial Gen 3 steels that meet DOE targets is offering significant challenges to this new ICME program. The program is meeting these challenges following four thrusts: (1) making Gen 3 steels to meet DOE targets; (2) passing experimental measurements of key properties to an ICME model that produces microstructure-based constitutive models; (3) material evaluation in forming and vehicle performance simulations; and (4) archiving all relevant data for easy use by others in the program. Although the Gen 3 AHSS ICME program is "mission oriented," it demonstrates an important role for universities in providing the necessary foundation in fundamental materials science and engineering.

Integrated Computational Methods for Composites Materials (ICM2): Lessons Learned from Integration Planning and Feasibility Demonstrations
Lara Liou, GE Aviation

General Electric Aviation and Lockheed Martin Aeronautics have teamed through an Air Force Research Lab program to conduct a demonstration of an Integrated Computational Materials Engineering digital framework that links material processing, property, and structural relationships to account for processability, manufacturability, and system performance, with the goal of demonstrating that the usage of integrated models can contribute to future airframe and engine designs with dramatic reductions in development time and cost. Both engine and airframe Foundational Engineering Problems will be demonstrated through a common digital framework to be developed and validated on increasingly complex articles. The modeling tools, which cover composite process models, localized mechanical behavior models, and global design models (Convergent Manufacturing Technology's COMPRO CCA, Autodesk's Autodesk Simulation Composite Analysis, and an Abaqus-based University of Michigan micromechanics model), are centrally linked via a commercially available model integration interface (Phoenix Integration's ModelCenter®). Detailed plans of what data will be passed among the various models, as well as how it will be passed, have been completed. Creation of these detailed process maps and simple feasibility demonstrations has raised inherent technical and logistical challenges to integrating commercially available multi-scale models. Robust model integration methods to address these challenges were discussed.

The exercise of drawing up detailed process maps for what to integrate and how to integrate the selected tools and models resulted in key early lessons learned and identification of tools critical to long-term, robust, and sustained application of Integrated Computational Methods for Composites Materials (ICM2). The key lessons learned include:

• Individual models must be relatively mature and validated before an ICM2 problem can be confidently defined.

• FE analysis-based tools within the same digital thread should utilize the same FE solver.

• The team ideally should consist of at least one expert for each tool and its underlying model, as well as an expert in the digital thread and integration interface tools and programming.

• Mesh interpolation between finite element analysis-based tools will likely be required, and a commercially available tool for this type of composite mesh interpolation is key for robust and sustained application of ICM2.

Residual Stress Engineering of Rotating Nickel Components
Terry Wong, Aerojet Rocketdyne

The goal for the PW-8 project, the Nickel Base Superalloy Residual Stress Foundational Engineering Problem program, is to develop a multi-disciplinary analysis approach to predict and incorporate bulk residual stress in the design of a component. This will allow for the design of a part that is optimized for a particular application or design intent (such as weight, performance, or reduction of scrap). To accomplish this goal, the PW-8 team has two major tasks: 1) the development of an infrastructure and associated tools needed to predict and incorporate bulk residual stress in the design stages of a part, and 2) a demonstration task that will validate the infrastructure and tools developed.

The Data Management task is the foundational element of the larger infrastructure development task. The goal for the data management work is to develop an infrastructure that allows the various team members in PW-8 to share data and to store data in a database. This not only ensures that data generated by this program is never lost but also allows for data to be transferred in a secure manner. In order to meet the goal of a system that securely stores and shares data, three major tasks must be accomplished: 1) validation that there is a secure method to transfer data between team members and that the method of storing data is also secure, 2) development of a database schema, and 3) development of a means to easily upload data into the database and download data from the database.

Of the three, the most difficult task was to create a system in which the transfer and storage of data was secure. Various proposals were made to ensure the security of the data, but many of them were rejected, not because the proposal was actually insecure but because it was not acceptable under each team member's IT security policies. The process of getting each team member's IT department to approve the security of the data transfer and storage took more effort than anticipated. In the end, the team members, along with their respective IT departments, approved the use of TRUcentrix as a tool to transfer data and the use of an ASM International-hosted database to store data.

The development of a database schema is a necessary step in creating the database to store all the data generated (via physical tests as well as by models). A big help in creating the database schema was to use the schema generated by Dr. Steven Arnold from NASA's Glenn Research Center. Using NASA's schema eliminated the need to start the database build from scratch and allowed us to focus on the specific needs of storing residual stress data.

The last task involves the development of tools that make the input of data easy. The more difficult it is to enter data into a database, the more likely it is that data will not be entered. Excel-based templates are being developed in order to make data entry as easy as possible.

The two big lessons learned from PW-8 in relation to data management are:

• It is essential to work with each company's IT and IT security people to develop a system that satisfies the security requirements of each company.

• Database schema development should not be reinvented but should begin with an already established schema.

Perspectives on MAI Programs
Dr. Mike Glavicic, Rolls Royce

The current MAI (AFRL's Metals Affordability Initiative) programs are structured independently from one another, in that no standard methodology exists that describes what data is to be archived at the conclusion of a program. Nor does there exist a standard protocol for how to archive things on a server or some other computer resource at the conclusion of a program. As a result, the current practices of MAI programs:

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program.

• Do not have a formalized requirement for how data is to be packaged and stored at the end of a program. Hence, even in the best of cases, in which all of the data is stored, the data is archived in a simple folder fashion that is not searchable in any manner. Moreover, even if all of the data is present, it is extremely cumbersome to untangle the data for future use using a file folder strategy.

Hence, at the end of the program, both the government and the MAI team are left to manage the data and results in an unregimented fashion.

The lack of a regimented protocol for how to package a program's results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal of integrating the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models, and other information that were required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted for the recreation of data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. Although the RR-10 program was able to overcome these issues and become a successful program, other programs that encounter similar problems will likely not enjoy the same success, as RR-10 was very fortunate to have the means to overcome these serious issues, which were a result of the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government-funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features that will be required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs, with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics: Data Standards, Modeling & UQ
Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics: Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and Suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration, data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals, Inc.) are working together on the PW-11 Team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security and access control, and the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework (GE Forge, Vanderbilt Forge, and Semantic Web technology), and one technology will be selected for the next phase of the program.


Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single database searches to multiple and disparate database searches, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work up front for implementation, as it needs the development of a common vocabulary and relationships.

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST and develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
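As a rough illustration of what "flowing" uncertainty through interconnected models can look like, the sketch below runs a simple Monte Carlo propagation through a two-link chain (processing to microstructure to property). It is not the PW-11 method: the functions `microstructure_model` and `property_model`, their coefficients, and the assumed input distributions are all hypothetical placeholders chosen only to make the example runnable.

```python
import random
import statistics


def microstructure_model(cooling_rate):
    """Hypothetical processing -> microstructure link.

    Returns a grain size (um) that shrinks as cooling rate (K/s) increases.
    """
    return 50.0 / (1.0 + cooling_rate) ** 0.5


def property_model(grain_size):
    """Hypothetical microstructure -> property link (Hall-Petch-like form).

    Returns a yield strength (MPa) that rises as grain size (um) falls.
    """
    return 200.0 + 600.0 / grain_size ** 0.5


def propagate(n_samples=5000, seed=0):
    """Sample uncertainty at each step and push it through the model chain."""
    rng = random.Random(seed)
    grains, strengths = [], []
    for _ in range(n_samples):
        # Step 1: uncertain processing parameter (assumed normal, clipped > 0).
        cooling_rate = max(0.1, rng.gauss(10.0, 2.0))
        # Step 2: microstructure model with an assumed 5 % multiplicative scatter.
        scatter = max(0.5, rng.gauss(1.0, 0.05))
        grain = microstructure_model(cooling_rate) * scatter
        # Step 3: property model with an assumed additive 10 MPa scatter.
        strength = property_model(grain) + rng.gauss(0.0, 10.0)
        grains.append(grain)
        strengths.append(strength)
    return grains, strengths


if __name__ == "__main__":
    grains, strengths = propagate()
    print("grain size  mean/stdev:", statistics.mean(grains), statistics.stdev(grains))
    print("strength    mean/stdev:", statistics.mean(strengths), statistics.stdev(strengths))
```

Comparing the spread of the sampled grain sizes with the spread of the resulting strengths gives a first, crude picture of how variability in a processing parameter shows up in mechanical performance under these assumptions.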

ASM's Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project
Scott D. Henry, ASM International

ASM International, the world's largest association of metals-focused materials engineers, scientists, technicians, educators, and students, has launched a series of initiatives under the "Computational Materials Data (CMD) Network" umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design, Ltd. The main objectives of SMDDP are to:


• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system.

• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets.

• Develop and carry out a series of test problems that represent relevant use cases for the repository.

• Make data open to the materials data community for use in data analytics, modeling, and educational purposes.

• Actively engage the materials data community and widely disseminate the findings from the project.

• Develop and implement data capture and curation procedures that can serve as models for other data repositories.

The basic philosophy behind the project is that standards often emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated, structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project, launched in late 2013 and planned as an 18-month program, is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future
Ben Blaiszik1, Kyle Chard1, Jim Pruyne1, Rachana Anathakrishnan1, Steve Tuecke1,2, Ian Foster1,2
1University of Chicago Computation Institute; 2Argonne National Laboratory MCS Division

During this talk, we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.

Semantics for Deep Interoperability
Samuel Chance, iNovex

Semantic technology is based on mature web standards and is intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented, starting with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web, riding atop the same infrastructure (HTTP). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis on the fact that the technology has a foundation based on formal logic that enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being evaluated as a solution for a community or enterprise.
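To make the idea of "triples" linking into a web of data more concrete, here is a minimal sketch using plain Python tuples rather than any RDF library. Every identifier in it (`sample:A1`, `madeOf`, `meas:M7`, and so on) and the numeric value are invented for illustration and do not come from the presentation or from any published vocabulary.

```python
from collections import defaultdict

# Each triple is (subject, predicate, object). All names and the value below
# are hypothetical, illustrative entries, not measured or published data.
TRIPLES = [
    ("sample:A1", "madeOf", "alloy:Al6061"),
    ("alloy:Al6061", "belongsTo", "system:Al-Mg-Si"),
    ("sample:A1", "hasMeasurement", "meas:M7"),
    ("meas:M7", "property", "yieldStrength"),
    ("meas:M7", "valueMPa", 250),
]


def build_index(triples):
    """Group (predicate, object) pairs by subject so links can be followed."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def reachable(index, start):
    """Return everything reachable from `start` by following triples.

    The object of one triple can be the subject of another, which is what
    turns a flat list of statements into a connected graph.
    """
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, obj in index.get(node, []):
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


if __name__ == "__main__":
    idx = build_index(TRIPLES)
    print(sorted(map(str, reachable(idx, "sample:A1"))))
```

Starting from a single sample, following links reaches both the alloy system and the measurement details, even though no single triple states that connection directly.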


Big Data and Semantic Technologies
Harlan Shober, RJ Lee Group

The presentation started with a description of entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS). To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing, characterization, and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described and consisted of several elements: contribute, index, search, analyze, and archive. While bearing in mind the needed functionality, a string of universal scientific data management process categories was then presented with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for Resource Description Framework (RDF) were applied to a notional example with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described."

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example, how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
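For readers new to these terms, tokenization (the first step listed above) simply means splitting text into units that later stages can tag and parse. The following is a minimal regular-expression sketch, not the approach used in the presentation; the pattern and the example sentence are assumptions chosen to show why materials text (hyphenated alloy systems, numeric designations, percentages) needs some care.

```python
import re

# Order matters: numbers first, then (possibly hyphenated) words, then any
# remaining single non-space symbol such as '%' or '.'.
TOKEN_PATTERN = re.compile(
    r"\d+(?:\.\d+)?"               # numbers such as 6061 or 0.2
    r"|[A-Za-z]+(?:-[A-Za-z]+)*"   # words, incl. hyphenated ones like Al-Mg-Si
    r"|[^\sA-Za-z\d]"              # any other single non-whitespace character
)


def tokenize(text):
    """Split text into number, word, and symbol tokens (whitespace dropped)."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("Al-Mg-Si alloy 6061 yields at 0.2% strain."))
```

A production pipeline would normally rely on an established NLP toolkit; the point here is only that even this first step involves domain-specific choices, such as keeping "Al-Mg-Si" as one token.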

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.


The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.


Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were "rolled up" from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often-fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; a data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce a draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data; standards for microstructure data

- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management: consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose
- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management; difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure: evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel
- Data associated with publications should be shared along with the publications. Not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)

Government-Led Priorities

1. Lead data strategy/approach development
- Hold a meeting like this one in 6-12 months with the purpose of developing a roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus, e.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools making data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating a standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming a consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure

4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10% (repositories & collaboration platforms)
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for a universal data repository (e.g., at NIST): creation and maintenance
- Fund or establish data repositories with a common/uniform data collection format
- Create/identify a secure data sharing site (like AMRDEC) for government contractors; not a repository
- AFRL take the lead on making a database that can be shared
- Infrastructure for materials-related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. The HT approaches currently supported are limited to overly fundamental, single-phase DFT calculations.

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or, alternatively, work closer with groups like NIST. We cannot allow other organizations to define our needs for us.

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community

Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, iNovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, QuesTek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl Institute of Standards & Tech
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


Oak Ridge National Laboratory (ORNL) teamed with Chrysler and Nemak to develop next-generation high-performance cast aluminum alloys that have improved castability, high-temperature strength, and fatigue performance. At this point, the data management practice is limited to analytical data spreadsheets, other than the CALPHAD computational thermodynamic databases and proprietary property databases for casting simulation. As the project goes on, a more robust data schema will be developed to store the various types of data populated from the experiments and calculations.

Besides the data structure design, the speaker discussed the necessity of a new approach to identifying key descriptors from the large dataset that can guide the design of new materials. Alloy design requires the balance of extremely complex relationships among a large number of descriptors. Some examples of these descriptors include the phase fraction of key intermetallics, lattice parameters of the matrix, and diffusion kinetics of solute atoms. However, optimizing these quantities relies on good physical metallurgy intuition coupled with clever experiments, dogged sleuthing, and a healthy dose of luck. Such descriptor-based predictions of materials properties date back to the beginning of modern solid-state chemistry and physics. As shown in the Darken-Gurry map used to predict the solubility of elements in tantalum, intuitive descriptors such as electronegativity and radius could be used for materials design. However, as the chemistry and physics of materials become more complex, intuition-based search may not work anymore. Thus, more advanced statistical analysis and data mining approaches might be required to accelerate the design of complex multi-component, multi-phase alloys.
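As a rough illustration of a descriptor-based screen of the Darken-Gurry type, the sketch below tests whether a solute falls inside an ellipse centered on the solvent in electronegativity-radius space. It assumes the commonly quoted half-widths of 0.4 electronegativity units and 15% of the solvent radius. The element names and numbers are hypothetical placeholders, not data from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    electronegativity: float  # dimensionless
    radius: float             # angstrom


def darken_gurry_score(solvent, solute, en_halfwidth=0.4, radius_fraction=0.15):
    """Normalized squared distance from the solvent; <= 1 means inside the ellipse.

    The half-widths are assumptions (commonly quoted values), not workshop data.
    """
    d_en = (solute.electronegativity - solvent.electronegativity) / en_halfwidth
    d_r = (solute.radius - solvent.radius) / (radius_fraction * solvent.radius)
    return d_en ** 2 + d_r ** 2


def likely_soluble(solvent, solute):
    """Descriptor-based guess only; real solubility depends on far more."""
    return darken_gurry_score(solvent, solute) <= 1.0


# Placeholder values chosen only to exercise the functions.
solvent = Element("solvent", 1.5, 1.46)
solute_a = Element("solute_A", 1.6, 1.46)
solute_b = Element("solute_B", 2.2, 1.80)

for candidate in (solute_a, solute_b):
    print(candidate.name, likely_soluble(solvent, candidate))
```

A two-descriptor rule like this is exactly the kind of intuition-based search the speaker expects to break down for multi-component alloys, where many more descriptors interact.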

Data Related to Cast Aluminum

Jake Zindel, Ford Motor Company

The main focus of our research is the development of CAE (Computer-aided Engineering) tools to support two significant activities in the company: component design (product development) and manufacturing engineering (high-volume production). A key part of our research is conducting experiments to collect data for the development and validation of the CAE tools.

The data we collect consists of mechanical properties, physical properties, microstructural images, and manufacturing process parameters that form the pedigree of the material. This information can be in the form of Excel files, proprietary-software storage files, screen shots, and written reports.

Our immediate need is to deal with this large amount of existing information, collected over many years, on a "local" level. The first step is to organize many independent collections of information in all their forms. The next issue is the time required to enter all this information into the database. Maintenance may also require a significant amount of effort. Lastly, there will also be a need to manipulate the data contained in the database using Excel, Minitab, MATLAB, etc. Therefore, links or easy exportation of data to these types of applications will be necessary.

The next level would involve dealing with information that has been created or collected by others. From an OEM (Original Equipment Manufacturer) perspective, this would involve convincing suppliers to share processing information that they may consider proprietary. In addition to suppliers, universities and national laboratories are also sources of data. Data from all the sources will require some sort of protocol to determine its quality and resolve inconsistencies.

In the end, it seems likely that considerable budget and time will be required to implement comprehensive information databases, which could be the largest hurdle of them all.

QuesTek's MGI/ICME Approach to Alloy Optimization and Design (Including Cast Iron)

Jason Sebastian, QuesTek Innovations LLC

Integrated Computational Materials Engineering (ICME) methods involve the holistic application of different computational models across various length scales to the design, development, and rapid qualification of advanced materials. Using its Materials by Design technologies, QuesTek has successfully applied these ICME methods to the design, development, and full qualification of several advanced alloys (including aerospace structural steels, aerospace gear steels, advanced castable titanium alloys, high-strength corrosion-resistant aluminum alloys, and high-performance cast iron).

QuesTek's ICME methods benefit tremendously from the activities of the US Materials Genome Initiative (MGI). Recognizing that the "10 to 20 years" that it takes for alloy design and development is hindering the rapid deployment of higher-performance materials, the MGI is focused on "cutting in half" the time and cost to design and deploy novel advanced materials needed to enhance and sustain US competitiveness.

QuesTek also has ongoing activities on the application of ICME methods to the design and development of materials for the energy sector, including advanced materials that enable vehicle lightweighting. Of particular note is an ongoing effort with a major diesel engine manufacturer on the development of high-performance cast iron alloys for engine block applications. QuesTek's efforts have made extensive use of various MGI-type tools and databases, including 1) density functional theory methods for high-throughput screening of cast iron inoculant stabilities (structure, lattice parameter, stability of various oxide/sulfide phases), 2) liquid iron mobility databases (to help accurately model solidification phenomena), and 3) thermodynamic databases (to examine nano-precipitate stability).

Moving forward, to continue to accelerate the materials development cycle, efforts should remain focused on the development of the fundamental databases (thermodynamic, kinetic, etc.) that enable ICME models. There should also be increased focus on efforts related to the integration of the various computational databases, models, and tools (the "I" in ICME).

Integrated Computational Materials Engineering (ICME) of Generation Three Advanced High Strength Steels

Louis G. Hector, Jr., General Motors R&D Center

While traditional material development methods often involve laborious trial-and-error testing, Integrated Computational Materials Engineering (ICME), which falls under the umbrella of the Materials Genome Initiative, is offering a means for materials development under a more compressed time scale. Advanced high strength steels (AHSS), which contain multiple phases (e.g., ferrite, bainite, pearlite, austenite, martensite) and may exhibit phase transformation with strain, are fertile ground for ICME. However, prediction of macro-scale constitutive behavior based upon the multi-scale physical, chemical, and mechanical phenomena in these complex microstructures is a formidable challenge for ICME. Unprecedented collaboration between universities, industry, and government labs will be required to address AHSS development, archiving of data, and the implementation of these materials into products that benefit the American consumer.

This presentation began with a brief overview of a new DOE/USAMP (US Automotive Materials Partnership)-funded ICME program aimed at the development of Generation Three (Gen 3) AHSS. The lack of commercial Gen 3 steels that meet DOE targets is offering significant challenges to this new ICME program. The program is meeting these challenges following four thrusts: (1) making Gen 3 steels to meet DOE targets; (2) passing experimental measurements of key properties to an ICME model that produces microstructure-based constitutive models; (3) material evaluation in forming and vehicle performance simulations; (4) archiving all relevant data for easy use by others in the program. Although the Gen 3 AHSS ICME program is "mission oriented," it demonstrates an important role for universities in providing the necessary foundation in fundamental materials science and engineering.

Integrated Computational Methods for Composites Materials (ICM2): Lessons Learned from Integration Planning and Feasibility Demonstrations

Lara Liou, GE Aviation

General Electric Aviation and Lockheed Martin Aeronautics have teamed through an Air Force Research Lab program to conduct a demonstration of an Integrated Computational Materials Engineering digital framework that links material processing, property, and structural relationships to account for processability, manufacturability, and system performance, with the goal of demonstrating that the usage of integrated models can contribute to future airframe and engine designs with dramatic reductions in development time and cost. Both engine and airframe Foundational Engineering Problems will be demonstrated through a common digital framework to be developed and validated on increasingly complex articles. The modeling tools that cover composite process models, localized mechanical behavior models, and global design models (Convergent Manufacturing Technology's COMPRO CCA, Autodesk's Autodesk Simulation Composite Analysis, and an Abaqus-based University of Michigan micromechanics model) are centrally linked via a commercially available model integration interface (Phoenix Integration's ModelCenter®). Detailed plans of what data will be passed among the various models, as well as how it will be passed, have been completed. Creation of these detailed process maps and simple feasibility demonstrations have raised inherent technical and logistical challenges to integrating commercially available multi-scale models. Robust model integration methods to address these challenges were discussed.

The exercise of drawing up detailed process maps for what to integrate and how to integrate the selected tools and models resulted in key early lessons learned and identification of tools critical to long-term, robust, and sustained application of Integrated Computational Methods for Composites Materials (ICM2). The key lessons learned include:

• Individual models must be relatively mature and validated before an ICM2 problem can be confidently defined.
• FE analysis-based tools within the same digital thread should utilize the same FE solver.
• The team ideally should consist of at least one expert for each tool and its underlying model, as well as an expert in the digital thread and integration interface tools and programming.
• Mesh interpolation between finite element analysis based tools will likely be required, and a commercially available tool for this type of composite mesh interpolation is key for robust and sustained application of ICM2.

Residual Stress Engineering of Rotating Nickel Components

Terry Wong, Aerojet Rocketdyne

The goal of the PW-8 project, the Nickel Base Superalloy Residual Stress Foundational Engineering Problem program, is to develop a multi-disciplinary analysis approach to predict and incorporate bulk residual stress in the design of a component. This will allow for the design of a part that is optimized for a particular application or design intent (such as weight, performance, or reduction of scrap). To accomplish this goal, the PW-8 team has two major tasks: 1) the development of an infrastructure and associated tools needed to predict and incorporate bulk residual stress in the design stages of a part, and 2) a demonstration task that will validate the infrastructure and tools developed.

The Data Management task is the foundational element of the larger infrastructure development task. The goal for the data management work is to develop an infrastructure that allows the various team members in PW-8 to share data and to store data in a database. This not only ensures that data generated by this program is never lost but also allows for data to be transferred in a secure manner. In order to meet the goal of a system that securely stores and shares data, three major tasks must be accomplished: 1) validation that there is a secure method to transfer data between team members and that the method of storing data is also secure, 2) development of a database schema, and 3) development of a means to easily upload data into the database and download data from the database.

Of the three, the most difficult task was to create a system in which the transfer and storage of data was secure. Various proposals were made to ensure the security of the data, but many of them were rejected, not because the proposal was actually insecure but because it was not acceptable under each team member's IT security policies. The process of getting each team member's IT department to approve the security of the data transfer and storage took more effort than anticipated. In the end, the team members, along with their respective IT departments, approved the use of TRUcentrix as a tool to transfer data and the use of an ASM International hosted database to store data.

The development of a database schema is a necessary step in creating the database to store all the data generated (via physical tests as well as data generated by models). A big help in creating the database schema was to use the schema generated by Dr. Steven Arnold from NASA's Glenn Research Center. Using NASA's schema eliminated the need to start the database build from scratch and allowed us to focus on the specific needs of storing residual stress data.

The last task involves the development of tools that make the input of data easy. The more difficult it is to enter data into a database, the more likely data will not be entered. Excel-based templates are being developed in order to make data entry as easy as possible.
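To make the template idea concrete, the sketch below checks a filled-in template before upload. It assumes the template has been exported from Excel as CSV text. The column names are hypothetical, since the actual PW-8 templates are not described here.

```python
import csv
import io

# Hypothetical template columns, chosen for illustration only.
REQUIRED_COLUMNS = ("specimen_id", "depth_mm", "residual_stress_mpa")


def validate_template(csv_text):
    """Return (clean_rows, errors) for a template exported as CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [], ["missing column: " + c for c in missing]

    rows, errors = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        specimen = (row["specimen_id"] or "").strip()
        if not specimen:
            errors.append(f"line {line_no}: empty specimen_id")
            continue
        try:
            depth = float(row["depth_mm"])
            stress = float(row["residual_stress_mpa"])
        except (TypeError, ValueError):
            # TypeError covers short rows, where DictReader fills in None.
            errors.append(f"line {line_no}: non-numeric measurement")
            continue
        rows.append({"specimen_id": specimen,
                     "depth_mm": depth,
                     "residual_stress_mpa": stress})
    return rows, errors


# Made-up sample: one good row, one bad number, one missing identifier.
SAMPLE = ("specimen_id,depth_mm,residual_stress_mpa\n"
          "D1,0.5,-310\n"
          "D1,abc,-290\n"
          ",1.5,-250\n")

clean_rows, problems = validate_template(SAMPLE)
print(len(clean_rows), problems)
```

Reporting the offending line number keeps the correction step short for the person entering data, in keeping with the aim of making data entry as easy as possible.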

The two big lessons learned from PW-8 in relationship to data management are:

• It is essential to work with each company's IT and IT Security people to develop a system that satisfies the security requirements of each company.
• Database schema development should not be reinvented but should begin with an already established schema.

Perspectives on MAI Programs

Dr. Mike Glavicic, Rolls Royce

The current MAI (AFRL's Metals Affordability Initiative) programs are structured independently from one another, in that no standard methodology exists that describes what data is to be archived at the conclusion of a program. Nor does there exist a standard protocol for how to archive things on a server or some other computer resource at the conclusion of a program. As a result, the current practices of MAI programs:

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program.
• Do not have a formalized requirement for how data is to be packaged and stored at the end of a program. Hence, even in the best of cases, in which all of the data is stored, the data is archived in a simple folder fashion that is not searchable in any manner. Moreover, even if all of the data is present, it is extremely cumbersome to untangle the data for future use using a file folder strategy.

Hence, at the end of the program, both the government and the MAI team are left to manage the data and results in an unregimented fashion.

The lack of a regimented protocol for how to package a program's results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal of integrating the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models, and other information required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted to the recreation of data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. Although the RR-10 program was able to overcome these issues and become a successful program, other programs that encounter similar problems will likely not fare as well: RR-10 was very fortunate to have the means to overcome these serious issues, which were a result of the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government-funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features that will be required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ

Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and Suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration, data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals, Inc.) are working together on the PW-11 Team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security, and access control, and the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework (GE Forge, Vanderbilt Forge, and Semantic Web technology), and one technology will be selected for the next phase of the program.

Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single database searches to multiple and disparate database searches, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work upfront for implementation, as it needs the development of a common vocabulary and relationships.

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST and develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
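One simple way to flow uncertainty through interconnected models is Monte Carlo sampling. The sketch below is illustrative only and is not the PW-11 method: both models and every constant in them are hypothetical stand-ins, and the Hall-Petch form is borrowed as a familiar example of a microstructure-to-property relation.

```python
import random
import statistics


def microstructure_model(cooling_rate):
    """Hypothetical process model: grain size (um) from cooling rate (K/s)."""
    return 100.0 * cooling_rate ** -0.5


def property_model(grain_size_um):
    """Hall-Petch-type relation; sigma0 and k are placeholder constants."""
    sigma0, k = 200.0, 600.0  # MPa, MPa*um^0.5
    return sigma0 + k / grain_size_um ** 0.5


def propagate(mean_rate, std_rate, n=10000, seed=0):
    """Sample the process input and push each sample through both models."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    strengths = []
    for _ in range(n):
        # Clamp so the power law never sees a non-positive rate.
        rate = max(rng.gauss(mean_rate, std_rate), 1e-6)
        strengths.append(property_model(microstructure_model(rate)))
    return statistics.mean(strengths), statistics.stdev(strengths)


mean_strength, std_strength = propagate(mean_rate=10.0, std_rate=1.0)
print(f"strength: {mean_strength:.1f} MPa, spread: {std_strength:.2f} MPa")
```

Sampling the input once and reusing each sample through the whole chain preserves the correlation between steps, which is what relates variability in the processing parameters to variability in the final property.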

ASM's Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project

Scott D. Henry, ASM International

ASM International, the world's largest association of metals-focused materials engineers, scientists, technicians, educators, and students, has launched a series of initiatives under the "Computational Materials Data (CMD) Network" umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design Ltd. The main objectives of SMDDP are to:

• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system.
• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets.
• Develop and carry out a series of test problems that represent relevant use cases for the repository.
• Make data open to the materials data community for use in data analytics, modeling, and educational purposes.
• Actively engage the materials data community and widely disseminate the findings from the project.
• Develop and implement data capture and curation procedures that can serve as models for other data repositories.

The basic philosophy behind the project is that standards often emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project, launched in late 2013 and planned as an 18-month program, is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future

Ben Blaiszik(1), Kyle Chard(1), Jim Pruyne(1), Rachana Anathakrishnan(1), Steve Tuecke(1,2), Ian Foster(1,2)

(1) University of Chicago Computation Institute; (2) Argonne National Laboratory, MCS Division

During this talk, we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully-formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.

Semantics for Deep Interoperability
Samuel Chance, iNovex

Semantic technology is based on mature web standards and is intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented. It started with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web, riding atop the same infrastructure (HTTP). Even in this state the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis that the technology has a foundation based on formal logic that enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being considered as a solution for a community or enterprise.
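
To make the idea of linked "triples" concrete, the following is a minimal sketch in plain Python, not taken from the presentation. Each triple is a (subject, predicate, object) tuple, and the object of one triple can be the subject of another, which is what forms the web of data. All identifiers (`sample:A1`, `madeOf`, and so on) are hypothetical placeholders, and a real deployment would more likely use an RDF library and a controlled vocabulary.

```python
from collections import defaultdict

# Each triple is (subject, predicate, object). All names are hypothetical.
TRIPLES = [
    ("sample:A1", "madeOf", "alloy:Ti-6Al-4V"),
    ("sample:A1", "measuredBy", "test:T42"),
    ("test:T42", "reports", "property:yield_strength"),
    ("alloy:Ti-6Al-4V", "hasBaseElement", "element:Ti"),
]


def build_graph(triples):
    """Group triples by subject so that links can be followed."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def reachable(graph, start):
    """Return every node reachable from start by following triples."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, obj in graph.get(node, []):
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen
```

Calling `reachable(build_graph(TRIPLES), "sample:A1")` walks from the sample to its alloy, its test, and onward to the base element and reported property, without any of those links being stored on the sample itself.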


Big Data and Semantic Technologies
Harlan Shober, RJ Lee Group

The presentation started with a description of entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS). To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing, characterization, and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described; it consisted of several elements: contribute, index, search, analyze, and archive. Bearing in mind the needed functionality, a string of universal scientific data management process categories was then presented, with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for the Resource Description Framework (RDF) were applied to a notional example with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described."

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
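
As an illustration of the first step in that list, tokenization, here is a minimal regular-expression sketch in Python. It is not the speaker's method; the pattern is an assumption chosen so that hyphenated alloy designations stay together as a single token, which matters for materials text. Production NLP toolkits use considerably more elaborate rules.

```python
import re

# Assumed pattern: numbers (with optional decimals), words that may carry
# hyphenated alphanumeric suffixes (e.g. alloy names), or single punctuation marks.
TOKEN_PATTERN = re.compile(
    r"\d+(?:\.\d+)?"                  # 880, 4.5
    r"|[A-Za-z]+(?:-[A-Za-z0-9]+)*"   # yields, Ti-6Al-4V
    r"|[^\sA-Za-z0-9]"                # any other non-space character
)


def tokenize(text):
    """Split text into a list of tokens using TOKEN_PATTERN."""
    return TOKEN_PATTERN.findall(text)
```

For example, `tokenize("Ti-6Al-4V yields at 880 MPa.")` keeps `Ti-6Al-4V` as one token and separates the number, the unit, and the final period. Later steps such as part-of-speech tagging operate on this token list.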

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.


The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.


Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were "rolled up" from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often, fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; a data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto-ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data, standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management: consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose


- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management: difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most-important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure: evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel


- Data associated with publications should be shared along with the publications. Not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)

Government-Led Priorities

1. Lead data strategy/approach development
- Hold meeting like this one in 6-12 months with purpose of developing roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus? E.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most-important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools that make data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating a standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure


4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10 -- repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST): creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors; not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials-related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches supported are limited to too-fundamental single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or alternatively work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community


Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, iNovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, QuesTek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl Institute of Standards & Tech
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


In the end, it seems likely that considerable budget and time will be required to implement comprehensive information databases, which could be the largest hurdle of them all.

QuesTek's MGI/ICME Approach to Alloy Optimization and Design (Including Cast Iron)
Jason Sebastian, QuesTek Innovations LLC

Integrated Computational Materials Engineering (ICME) methods involve the holistic application of different computational models across various length scales to the design, development, and rapid qualification of advanced materials. Using its Materials by Design technologies, QuesTek has successfully applied these ICME methods to the design, development, and full qualification of several advanced alloys (including aerospace structural steels, aerospace gear steels, advanced castable titanium alloys, high-strength corrosion-resistant aluminum alloys, and high-performance cast iron).

QuesTek's ICME methods benefit tremendously from the activities of the US Materials Genome Initiative (MGI). Recognizing that the "10 to 20 years" that it takes for alloy design and development is hindering the rapid deployment of higher-performance materials, the MGI is focused on "cutting in half" the time and cost to design and deploy novel advanced materials needed to enhance and sustain US competitiveness.

QuesTek also has ongoing activities on the application of ICME methods to the design and development of materials for the energy sector, including advanced materials that enable vehicle lightweighting. Of particular note is an ongoing effort with a major diesel engine manufacturer on the development of high-performance cast iron alloys for engine block applications. QuesTek's efforts have made extensive use of various MGI-type tools and databases, including 1) density functional theory methods for high-throughput screening of cast iron inoculant stabilities (structure, lattice parameter, stability of various oxide/sulfide phases), 2) liquid iron mobility databases (to help accurately model solidification phenomena), and 3) thermodynamic databases (to examine nano-precipitate stability).

Moving forward, to continue to accelerate the materials development cycle, efforts should remain focused on the development of the fundamental databases (thermodynamic, kinetic, etc.) that enable ICME models. There should also be increased focus on efforts related to the integration of the various computational databases, models, and tools (the "I" in ICME).

Integrated Computational Materials Engineering (ICME) of Generation Three Advanced High Strength Steels
Louis G. Hector Jr., General Motors R&D Center

While traditional material development methods often involve laborious trial-and-error testing, Integrated Computational Materials Engineering (ICME), which falls under the umbrella of the Materials Genome Initiative, is offering a means for materials development under a more compressed time scale. Advanced high strength steels (AHSS), which contain multiple phases (e.g., ferrite, bainite, pearlite, austenite, martensite) and may exhibit phase transformation with strain, are fertile ground for ICME. However, prediction of macro-scale constitutive behavior based upon the multi-scale physical, chemical, and mechanical phenomena in these complex microstructures is a formidable challenge for ICME. Unprecedented collaboration between universities, industry, and government labs will be required to address AHSS development, archiving of data, and the implementation of these materials into products that benefit the American consumer.

This presentation began with a brief overview of a new DOE/USAMP (US Automotive Materials Partnership)-funded ICME program aimed at the development of Generation Three (Gen 3) AHSS. The lack of commercial Gen 3 steels that meet DOE targets is offering significant challenges to this new ICME program. The program is meeting these challenges following four thrusts: (1) making Gen 3 steels to meet DOE targets; (2) passing experimental measurements of key properties to an ICME model that produces microstructure-based constitutive models; (3) material evaluation in forming and vehicle performance simulations; (4) archiving all relevant data for easy use by others in the program. Although the Gen 3 AHSS ICME program is "mission oriented," it demonstrates an important role for universities in providing the necessary foundation in fundamental materials science and engineering.

Integrated Computational Methods for Composites Materials (ICM2): Lessons Learned from Integration Planning and Feasibility Demonstrations
Lara Liou, GE Aviation

General Electric Aviation and Lockheed Martin Aeronautics have teamed through an Air Force Research Lab program to conduct a demonstration of an Integrated Computational Materials Engineering digital framework that links material processing, property, and structural relationships to account for processability, manufacturability, and system performance, with the goal of demonstrating that the usage of integrated models can contribute to future airframe and engine designs with dramatic reductions in development time and cost. Both engine and airframe Foundational Engineering Problems will be demonstrated through a common digital framework to be developed and validated on increasingly complex articles. The modeling tools, which cover composite process models, localized mechanical behavior models, and global design models (Convergent Manufacturing Technology's COMPRO CCA, Autodesk's Autodesk Simulation Composite Analysis, and an Abaqus-based University of Michigan micromechanics model), are centrally linked via a commercially available model integration interface (Phoenix Integration's ModelCenter®). Detailed plans of what data will be passed among the various models, as well as how it will be passed, have been completed. Creation of these detailed process maps and simple feasibility demonstrations has raised inherent technical and logistical challenges to integrating commercially available multi-scale models. Robust model integration methods to address these challenges were discussed.

The exercise of drawing up detailed process maps for what to integrate and how to integrate the selected tools and models resulted in key early lessons learned and identification of tools critical to long-term, robust, and sustained application of Integrated Computational Methods for Composites Materials (ICM2). The key lessons learned include:

• Individual models must be relatively mature and validated before an ICM2 problem can be confidently defined
• FE analysis-based tools within the same digital thread should utilize the same FE solver
• The team ideally should consist of at least one expert for each tool and its underlying model, as well as an expert in the digital thread and integration interface tools and programming


• Mesh interpolation between finite element analysis-based tools will likely be required, and a commercially available tool for this type of composite mesh interpolation is key for robust and sustained application of ICM2

Residual Stress Engineering of Rotating Nickel Components
Terry Wong, Aerojet Rocketdyne

The PW-8 project, the Nickel Base Superalloy Residual Stress Foundational Engineering Problem program, aims to develop a multi-disciplinary analysis approach to predict and incorporate bulk residual stress in the design of a component. This will allow for design of a part that is optimized for a particular application or design intent (such as weight, performance, or reduction of scrap). To accomplish this goal, the PW-8 team has two major tasks: 1) the development of an infrastructure and associated tools needed to predict and incorporate bulk residual stress in the design stages of a part, and 2) a demonstration task that will validate the infrastructure and tools developed.

The Data Management task is the foundational element of the larger infrastructure development task. The goal for the data management work is to develop an infrastructure that allows the various team members in PW-8 to share data and to store data in a database. This not only ensures that data generated by this program is never lost but also allows for data to be transferred in a secure manner. In order to meet the goal of a system that securely stores and shares data, three major tasks must be accomplished: 1) validation that there is a secure method to transfer data between team members and that the method of storing data is also secure, 2) development of a database schema, and 3) development of a means to easily upload data into the database and download data from the database.

Of the three, the most difficult task was to create a system in which the transfer and storage of data was secure. Various proposals were made to ensure the security of the data, but many of them were rejected, not because the proposal was actually unsecure but because it was not acceptable under each of the team members' IT security policies. The process to get each team member's IT department to approve the security of the data transfer and storage took more effort than anticipated. In the end, the team members, along with their respective IT departments, approved the use of TRUcentrix as a tool to transfer data and the use of an ASM International hosted database to store data.

The development of a database schema is a necessary step in creating the database to store all the data generated (via physical tests as well as data generated by models). A big help in creating the database schema was to use the schema generated by Dr. Steven Arnold from NASA's Glenn Research Center. Using NASA's schema eliminated the need to start the database build from scratch and allowed us to focus on the specific needs of storing residual stress data.

The last task involves the development of tools that make the input of data easy. The more difficult it is to enter data into a database, the more likely data will not be entered. Excel-based templates are being developed in order to make data entry as easy as possible.
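
One way a template-driven upload step could work is sketched below in Python. This is an illustration only, not the PW-8 tooling: it assumes the template is exported to CSV, and the column names (`specimen_id`, `depth_mm`, `stress_mpa`) are hypothetical stand-ins for whatever an established schema defines. The point is that checking rows against the schema at entry time keeps data entry easy while catching errors before they reach the database.

```python
import csv
import io

# Hypothetical template columns and the type each value must convert to.
REQUIRED_COLUMNS = {"specimen_id": str, "depth_mm": float, "stress_mpa": float}


def load_template(text):
    """Parse CSV text exported from a template; return (records, errors)."""
    records, errors = [], []
    reader = csv.DictReader(io.StringIO(text))
    missing = set(REQUIRED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        return records, [f"missing columns: {sorted(missing)}"]
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            records.append(
                {name: cast(row[name]) for name, cast in REQUIRED_COLUMNS.items()}
            )
        except (TypeError, ValueError):
            # TypeError covers short rows (missing cells are None).
            errors.append(f"line {line_no}: could not parse a numeric value")
    return records, errors
```

Rows that pass are returned as typed records ready for upload; rows that fail are reported by line number so the person entering data can correct the template rather than the database.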

The two big lessons learned from PW-8 in relationship to data management are:

• It is essential to work with each company's IT and IT Security people to develop a system that satisfies the security requirements of each company
• Database schema development should not be reinvented but should begin with an already established schema

Perspectives on MAI Programs
Dr. Mike Glavicic, Rolls Royce

The current MAI (AFRL's Metals Affordability Initiative) programs are structured independently from one another, in that a standard methodology does not exist that describes what data is to be archived at the conclusion of a program. Nor does there exist a standard protocol for how to archive things on a server or some other computer resource at the conclusion of a program. As a result, the current practices of MAI programs:

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program
• Do not have a formalized requirement for how data is to be packaged and stored at the end of a program. Hence, even in the best of cases, in which all of the data is stored, the data is archived in a simple folder fashion that is not searchable in any manner. Moreover, even if all of the data is present, it is extremely cumbersome to untangle the data for future use using a file folder strategy

Hence, at the end of the program both the government and the MAI team are left to manage the data and results in an unregimented fashion.
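
To illustrate the gap between a plain folder archive and a searchable one, the Python sketch below builds a small keyword index from a manifest that travels with the archived files. It is a hypothetical example, not an MAI protocol: the file paths and tags are invented, and only the program names (RR-10, PW-9) come from this summary.

```python
from collections import defaultdict

# Hypothetical manifest: one entry per archived file, with free-form tags.
MANIFEST = [
    {"path": "rr10/models/grain_growth.inp", "program": "RR-10",
     "tags": ["model", "microstructure"]},
    {"path": "rr10/tests/tensile_0042.csv", "program": "RR-10",
     "tags": ["test", "tensile"]},
    {"path": "pw9/tests/tensile_0007.csv", "program": "PW-9",
     "tags": ["test", "tensile"]},
]


def build_index(manifest):
    """Map each lower-cased program name and tag to the set of matching paths."""
    index = defaultdict(set)
    for entry in manifest:
        for key in [entry["program"], *entry["tags"]]:
            index[key.lower()].add(entry["path"])
    return index


def search(index, *terms):
    """Return the sorted paths that match every term (case-insensitive)."""
    if not terms:
        return []
    matches = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*matches))
```

With such a manifest, `search(build_index(MANIFEST), "tensile", "RR-10")` finds the RR-10 tensile data directly, whereas a bare folder tree would require someone who already knows where the files live.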

The lack of a regimented protocol for how to package a program's results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal to integrate the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models, and other information that was required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted for the recreation of data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. Although the RR-10 program was able to overcome these issues and become a successful program, other programs that encounter similar problems will likely not share that success, as RR-10 was very fortunate to have the means to overcome these serious issues that were a result of the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government-funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features that will be required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs, with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ
Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and Suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals Inc.) are working together on the PW-11 Team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security, and access control, and the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework (GE Forge, Vanderbilt Forge, and Semantic Web technology), and one technology will be selected for the next phase of the program.


Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single database searches to multiple and disparate database searches, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work upfront for implementation, as it needs the development of a common vocabulary and relationships.
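To illustrate why a common vocabulary is the upfront cost of this kind of interoperability, the following minimal Python sketch maps two record layouts (one OEM, one supplier) onto a shared set of terms. All field names, terms, and values are hypothetical and are not taken from the PW-11 program.

```python
# Hypothetical shared vocabulary: each partner's local field name is mapped
# to one agreed-upon term. None of these names come from PW-11.
SHARED_VOCABULARY = {
    "oem": {"uts_mpa": "ultimate_tensile_strength_MPa", "heat_no": "heat_id"},
    "supplier": {"TensileStrength": "ultimate_tensile_strength_MPa", "HeatLot": "heat_id"},
}


def to_shared_terms(record, source):
    """Rename a partner's local fields to the shared vocabulary terms."""
    mapping = SHARED_VOCABULARY[source]
    unknown = sorted(set(record) - set(mapping))
    if unknown:
        # Fields without an agreed term cannot be exchanged unambiguously.
        raise KeyError(f"no shared term for fields: {unknown}")
    return {mapping[name]: value for name, value in record.items()}


oem_record = {"uts_mpa": 1240.0, "heat_no": "H-001"}
supplier_record = {"TensileStrength": 1240.0, "HeatLot": "H-001"}

# Once both records use the shared terms they can be compared directly.
assert to_shared_terms(oem_record, "oem") == to_shared_terms(supplier_record, "supplier")
```

The mapping table is where the upfront work sits: every new field a partner wants to exchange needs an agreed term before it can pass through.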

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST, and develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
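One common way to flow uncertainty through chained models is Monte Carlo sampling; the summary does not say which UQ method PW-11 uses, so the sketch below is only an illustration. Both models, their coefficients, and the input distribution are invented stand-ins, not the MAI models.

```python
import random
import statistics


# Hypothetical stand-in models; the real MAI models are far more detailed.
def grain_size_um(cooling_rate_c_per_s):
    """Toy process model: faster cooling gives a finer grain size."""
    return 50.0 / (1.0 + cooling_rate_c_per_s)


def yield_strength_mpa(grain_um):
    """Toy property model with a Hall-Petch-like form (coefficients invented)."""
    return 200.0 + 600.0 / grain_um ** 0.5


def propagate(n_samples=10_000, seed=0):
    """Sample the uncertain process input and push each sample through both models."""
    rng = random.Random(seed)
    strengths = []
    for _ in range(n_samples):
        # Assumed input variability: cooling rate ~ Normal(mean 4, sd 0.5) C/s,
        # clipped so the toy model stays in a physically sensible range.
        rate = max(0.1, rng.gauss(4.0, 0.5))
        strengths.append(yield_strength_mpa(grain_size_um(rate)))
    return statistics.mean(strengths), statistics.stdev(strengths)


mean_ys, sd_ys = propagate()
print(f"yield strength: mean={mean_ys:.1f} MPa, sd={sd_ys:.1f} MPa")
```

The spread of the output relative to the spread of the input is the kind of relationship between process variability and mechanical performance that the paragraph above describes.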

ASM's Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project
Scott D. Henry, ASM International

ASM International, the world's largest association of metals-focused materials engineers, scientists, technicians, educators, and students, has launched a series of initiatives under the "Computational Materials Data (CMD) Network" umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design Ltd. The main objectives of SMDDP are to:


• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system.
• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets.
• Develop and carry out a series of test problems that represent relevant use cases for the repository.
• Make data open to the materials data community for use in data analytics, modeling, and educational purposes.
• Actively engage the materials data community and widely disseminate the findings from the project.
• Develop and implement data capture and curation procedures that can serve as models for other data repositories.

The basic philosophy behind the project is that often standards emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated, structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project, launched in late 2013 and planned as an 18-month program, is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future
Ben Blaiszik (1), Kyle Chard (1), Jim Pruyne (1), Rachana Anathakrishnan (1), Steve Tuecke (1, 2), Ian Foster (1, 2)
(1) University of Chicago, Computation Institute; (2) Argonne National Laboratory, MCS Division

During this talk, we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully-formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.
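The idea of importing metadata from many collections into one searchable index can be sketched with a generic inverted index. This is not the Globus API or its data model; the records, field names, and functions below are hypothetical and serve only to show the concept.

```python
from collections import defaultdict

# Hypothetical metadata records grouped by community and collection.
# This is NOT the Globus API, just a generic inverted index.
DATASETS = [
    {"id": "ds1", "community": "materials", "collection": "al-6061",
     "keywords": ["aluminum", "tensile"]},
    {"id": "ds2", "community": "materials", "collection": "ni-superalloy",
     "keywords": ["nickel", "residual stress"]},
    {"id": "ds3", "community": "composites", "collection": "cure-models",
     "keywords": ["tensile", "cure"]},
]


def build_index(datasets):
    """Map each keyword to the set of dataset ids that carry it."""
    index = defaultdict(set)
    for record in datasets:
        for keyword in record["keywords"]:
            index[keyword.lower()].add(record["id"])
    return index


def search(index, *keywords):
    """Return ids of datasets tagged with every given keyword."""
    hits = [index.get(k.lower(), set()) for k in keywords]
    return sorted(set.intersection(*hits)) if hits else []


index = build_index(DATASETS)
print(search(index, "tensile"))
```

Only the lightweight metadata lives in the index; the raw data can stay wherever each collection stores it.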

Semantics for Deep Interoperability
Samuel Chance, iNovex

Semantic technology is based on mature web standards and is intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented. It started with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web, riding atop the same infrastructure (HTTP). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis on the point that the technology has a foundation based on formal logic that enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being evaluated as a solution for a community or enterprise.
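The way "triples" link to each other to form a web of data can be shown with a few lines of Python. The sketch below uses plain tuples rather than an RDF library, and every identifier in it is hypothetical.

```python
# Each fact is a (subject, predicate, object) triple. All identifiers are hypothetical.
TRIPLES = {
    ("sample42", "madeOf", "Al6061"),
    ("Al6061", "isA", "AluminumAlloy"),
    ("sample42", "testedBy", "tensileTest7"),
    ("tensileTest7", "measured", "yieldStrength"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def reachable(triples, start):
    """Follow links from `start`: an object of one triple may be the subject of another."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for _, _, obj in match(triples, subject=node):
            if obj not in seen:
                seen.add(obj)
                frontier.append(obj)
    return sorted(seen)


print(match(TRIPLES, predicate="isA"))
print(reachable(TRIPLES, "sample42"))
```

Because the object of one triple can be the subject of another, a query can walk from a sample to its alloy class or to the property a test measured without any table joins being defined in advance.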


Big Data and Semantic Technologies
Harlan Shober, RJ Lee Group

The presentation started with a description of entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS).

To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing, characterization, and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described; it consisted of several elements: contribute, index, search, analyze, and archive. With the needed functionality in mind, a string of universal scientific data management process categories was then presented, with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for Resource Description Framework (RDF) were applied to a notional example, with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described."

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
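As a small illustration of the first step named above, tokenization, the following Python sketch splits a sentence into tokens and then pulls out number-unit pairs. It is a regular-expression toy, not the presenter's tooling; the unit list and the example sentence are assumptions made for this sketch.

```python
import re

# A number followed by a unit, e.g. "350 MPa" or "12.5 um". The unit list is
# a small hypothetical sample, not a complete vocabulary.
MEASUREMENT = re.compile(r"(\d+(?:\.\d+)?)\s*(MPa|GPa|um|nm|K)\b")
TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+(?:-[A-Za-z]+)*|\S")


def tokenize(text):
    """Split text into numbers, (hyphenated) words, and single punctuation marks."""
    return TOKEN.findall(text)


def extract_measurements(text):
    """Return (value, unit) pairs found in the text."""
    return [(float(value), unit) for value, unit in MEASUREMENT.findall(text)]


sentence = "The Al-Mg-Si alloy reached a yield strength of 276 MPa at 293 K."
print(tokenize(sentence))
print(extract_measurements(sentence))
```

Keeping "Al-Mg-Si" as one token is a domain-specific choice; a general-purpose tokenizer might split it at the hyphens, which hints at why materials text needs adapted tools.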

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.


The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.


Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were "rolled up" from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often-fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; a data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce a draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto-ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data, standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management: consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose
- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management: difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most-important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure: evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel
- Data associated with publications should be shared along with the publications. Not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)
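The common data package suggested under Community-Led Priority 1 (data plus metadata on authorship, provenance, and the name/type/units of each quantity) might look roughly like the sketch below. Participants mentioned HDF5 as a possible container; JSON is used here only to keep the example dependency-free, and the class and field names are hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Column:
    """One recorded quantity, described by name, type, and units."""
    name: str
    dtype: str
    units: str


@dataclass
class DataPackage:
    """Hypothetical package layout: metadata travels together with the data."""
    authors: list
    provenance: str
    columns: list
    rows: list = field(default_factory=list)

    def to_json(self):
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text):
        raw = json.loads(text)
        raw["columns"] = [Column(**c) for c in raw["columns"]]
        return cls(**raw)


package = DataPackage(
    authors=["A. Researcher"],
    provenance="tensile test, hypothetical lab record 001",
    columns=[Column("strain", "float", "mm/mm"), Column("stress", "float", "MPa")],
    rows=[[0.001, 70.0], [0.002, 140.0]],
)

# Any tool that can read the shared format recovers the same package.
restored = DataPackage.from_json(package.to_json())
assert restored == package
```

A federated approach in this spirit would let each organization keep its own tools, provided they can read and write this one agreed format.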

Government-Led Priorities

1. Lead data strategy/approach development
- Hold a meeting like this one in 6-12 months with the purpose of developing a roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus, e.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most-important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools that make data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating a standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming a consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure


4. Support data sharing and deposition (provide resources)
- Increase % of MGI funds dedicated to MII development to 10%: repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST): creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors, not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials-related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches supported are limited to too-fundamental single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or, alternatively, work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community


Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, Inovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, Questek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl Institute of Standards & Tech
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


address AHSS development, archiving of data, and the implementation of these materials into products that benefit the American consumer. This presentation began with a brief overview of a new DOE/USAMP (US Automotive Materials Partnership)-funded ICME program aimed at the development of Generation Three (Gen 3) AHSS. The lack of commercial Gen 3 steels that meet DOE targets is offering significant challenges to this new ICME program. The program is meeting these challenges following four thrusts: (1) making Gen 3 steels to meet DOE targets; (2) passing experimental measurements of key properties to an ICME model that produces microstructure-based constitutive models; (3) material evaluation in forming and vehicle performance simulations; and (4) archiving all relevant data for easy use by others in the program. Although the Gen 3 AHSS ICME program is "mission oriented," it demonstrates an important role for universities in providing the necessary foundation in fundamental materials science and engineering.

Integrated Computational Methods for Composites Materials (ICM2): Lessons Learned from Integration Planning and Feasibility Demonstrations
Lara Liou, GE Aviation

General Electric Aviation and Lockheed Martin Aeronautics have teamed through an Air Force Research Lab program to conduct a demonstration of an Integrated Computational Materials Engineering digital framework that links material processing, property, and structural relationships to account for processability, manufacturability, and system performance, with the goal of demonstrating that the usage of integrated models can contribute to future airframe and engine designs with dramatic reductions in development time and cost. Both engine and airframe Foundational Engineering Problems will be demonstrated through a common digital framework to be developed and validated on increasingly complex articles. The modeling tools that cover composite process models, localized mechanical behavior models, and global design models (Convergent Manufacturing Technology's COMPRO CCA, Autodesk's Autodesk Simulation Composite Analysis, and an Abaqus-based University of Michigan micromechanics model) are centrally linked via a commercially available model integration interface (Phoenix Integration's ModelCenter®). Detailed plans of what data will be passed among the various models, as well as how it will be passed, have been completed. Creation of these detailed process maps and simple feasibility demonstrations have raised inherent technical and logistical challenges to integrating commercially available multi-scale models. Robust model integration methods to address these challenges were discussed.
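A process map of "what data is passed among which models" can be written down as a dependency graph and ordered automatically. The sketch below uses Python's standard `graphlib` module; it does not represent ModelCenter or any of the named tools, and the model names and handoff descriptions are generic placeholders.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical process map: each model lists the models whose output it needs.
# The names are generic placeholders, not the tools used in the program.
PROCESS_MAP = {
    "cure_process_model": set(),
    "local_mechanics_model": {"cure_process_model"},
    "global_design_model": {"cure_process_model", "local_mechanics_model"},
}

# Hypothetical description of what is handed from one model to the next.
HANDOFFS = {
    ("cure_process_model", "local_mechanics_model"): "residual strains",
    ("cure_process_model", "global_design_model"): "part geometry after cure",
    ("local_mechanics_model", "global_design_model"): "ply stiffness and strength",
}


def execution_order(process_map):
    """Return an order in which every model runs after the models it depends on."""
    return list(TopologicalSorter(process_map).static_order())


order = execution_order(PROCESS_MAP)
for model in order:
    inputs = [f"{data} from {src}" for (src, dst), data in HANDOFFS.items() if dst == model]
    print(model, "<-", inputs or "no upstream inputs")
```

Writing the map down explicitly also exposes circular handoffs early, since `TopologicalSorter` raises `CycleError` when no valid order exists.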

The exercise of drawing up detailed process maps for what to integrate and how to integrate the selected tools and models resulted in key early lessons learned and identification of tools critical to long-term, robust, and sustained application of Integrated Computational Methods for Composites Materials (ICM2). The key lessons learned include:

• Individual models must be relatively mature and validated before an ICM2 problem can be confidently defined.
• FE analysis-based tools within the same digital thread should utilize the same FE solver.
• The team ideally should consist of at least one expert for each tool and its underlying model, as well as an expert in the digital thread and integration interface tools and programming.
• Mesh interpolation between finite element analysis-based tools will likely be required, and a commercially available tool for this type of composite mesh interpolation is key for robust and sustained application of ICM2.
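The mesh interpolation named in the last lesson amounts to mapping a field computed on one tool's mesh onto another tool's nodes. The sketch below shows the idea in one dimension with linear interpolation; real composite meshes are three-dimensional and need far more care, and the node positions and values here are invented.

```python
from bisect import bisect_right


def interpolate_to_mesh(src_nodes, src_values, dst_nodes):
    """Linearly interpolate nodal values from a source mesh onto target nodes.

    Both meshes are 1D here, and src_nodes must be sorted in increasing order.
    Target nodes outside the source range take the nearest end value.
    """
    result = []
    for x in dst_nodes:
        if x <= src_nodes[0]:
            result.append(src_values[0])
        elif x >= src_nodes[-1]:
            result.append(src_values[-1])
        else:
            hi = bisect_right(src_nodes, x)  # first source node strictly right of x
            lo = hi - 1
            weight = (x - src_nodes[lo]) / (src_nodes[hi] - src_nodes[lo])
            result.append(src_values[lo] + weight * (src_values[hi] - src_values[lo]))
    return result


# Hypothetical example: a coarse process-model mesh mapped to a finer structural mesh.
coarse_nodes = [0.0, 1.0, 2.0]
coarse_temperature = [100.0, 200.0, 150.0]
fine_nodes = [0.0, 0.5, 1.0, 1.5, 2.0]
print(interpolate_to_mesh(coarse_nodes, coarse_temperature, fine_nodes))
```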

Residual Stress Engineering of Rotating Nickel Components
Terry Wong, Aerojet Rocketdyne

The goal of the PW-8 project, the Nickel Base Superalloy Residual Stress Foundational Engineering Problem program, is to develop a multi-disciplinary analysis approach to predict and incorporate bulk residual stress in the design of a component. This will allow for the design of a part that is optimized for a particular application or design intent (such as weight, performance, or reduction of scrap). To accomplish this goal, the PW-8 team has two major tasks: 1) the development of an infrastructure and associated tools needed to predict and incorporate bulk residual stress in the design stages of a part, and 2) a demonstration task that will validate the infrastructure and tools developed.

The Data Management task is the foundational element of the larger infrastructure development task. The goal for the data management work is to develop an infrastructure that allows the various team members in PW-8 to share data and to store data in a database. This not only ensures that data generated by this program is never lost but also allows for data to be transferred in a secure manner. In order to meet the goal of a system that securely stores and shares data, three major tasks must be accomplished: 1) validation that there is a secure method to transfer data between team members and that the method of storing data is also secure, 2) development of a database schema, and 3) development of means to easily upload data into the database and download data from the database.

Of the three, the most difficult task was to create a system in which the transfer and storage of data was secure. Various proposals were made to ensure the security of the data, but many of them were rejected, not because the proposal was actually insecure but because it was not acceptable under each team member’s IT security policies. The process of getting each team member’s IT department to approve the security of the data transfer and storage took more effort than anticipated. In the end, the team members, along with their respective IT departments, approved the use of TRUcentrix as a tool to transfer data and the use of an ASM International hosted database to store data.

The development of a database schema is a necessary step in creating the database to store all the data generated (via physical tests as well as by models). A big help in creating the database schema was the schema generated by Dr. Steven Arnold from NASA’s Glenn Research Center. Using NASA’s schema eliminated the need to start the database build from scratch and allowed us to focus on the specific needs of storing residual stress data.

The last task involves the development of tools that make the input of data easy. The more difficult it is to enter data into a database, the more likely it is that data will not be entered. Excel-based templates are being developed in order to make data entry as easy as possible.

The two big lessons learned from PW-8 in relation to data management are:

• It is essential to work with each company’s IT and IT security people to develop a system that satisfies the security requirements of each company.

• Database schema development should not be reinvented but should begin with an already established schema.

Perspectives on MAI Programs
Dr. Mike Glavicic, Rolls-Royce

The current MAI (AFRL’s Metals Affordability Initiative) programs are structured independently from one another, in that no standard methodology exists that describes what data is to be archived at the conclusion of a program. Nor does there exist a standard protocol for how to archive things on a server or some other computer resource at the conclusion of a program. As a result, the current practices of MAI programs:

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program.

• Do not have a formalized requirement for how data is to be packaged and stored at the end of a program. Hence, even in the best of cases, in which all of the data is stored, the data is archived in a simple folder fashion that is not searchable in any manner. Moreover, even if all of the data is present, it is extremely cumbersome to untangle the data for future use using a file folder strategy.

Hence, at the end of the program, both the government and the MAI team are left to manage the data and results in an unregimented fashion.

The lack of a regimented protocol for how to package a program’s results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal of integrating the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models, and other information required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted to the recreation of data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. Although the RR-10 program was able to overcome these issues and become a successful program, other programs that encounter similar problems are unlikely to share that success, as RR-10 was very fortunate to have the means to overcome these serious issues, which were a result of the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government-funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs, with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ
Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and Suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration, data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals Inc.) are working together on the PW-11 Team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security, and access control, along with the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework: GE Forge, Vanderbilt Forge, and Semantic Web technology. One technology will be selected for the next phase of the program.


Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single-database searches to searches across multiple and disparate databases, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work up front for implementation, as it needs the development of a common vocabulary and relationships.

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST, and will develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
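The idea of flowing uncertainty through a chain of linked models can be illustrated with a minimal Monte Carlo sketch. This is not the PW-11 method, which the summary does not describe. The two model functions, their coefficients, and the input distributions below are hypothetical placeholders chosen only to show the mechanics: sample the uncertain processing parameter, pass each sample through the process-to-microstructure and microstructure-to-property links, and inspect the spread at each stage.

```python
import random
import statistics


def microstructure_model(cooling_rate):
    """Hypothetical link: grain size (um) decreases as cooling rate (K/s) increases."""
    return 50.0 / cooling_rate ** 0.5


def property_model(grain_size):
    """Hypothetical Hall-Petch-style link: strength (MPa) from grain size (um)."""
    return 300.0 + 600.0 / grain_size ** 0.5


def propagate(n_samples=10_000, seed=0):
    """Sample the process input and flow each sample through both models."""
    rng = random.Random(seed)
    grain_sizes, strengths = [], []
    for _ in range(n_samples):
        # Uncertain processing parameter (assumed normal, clamped to stay positive).
        cooling_rate = max(rng.gauss(10.0, 1.5), 0.1)
        # Assumed scatter added at each step so every link contributes uncertainty.
        grain = max(microstructure_model(cooling_rate) * rng.gauss(1.0, 0.05), 1e-3)
        strength = property_model(grain) + rng.gauss(0.0, 10.0)
        grain_sizes.append(grain)
        strengths.append(strength)
    return grain_sizes, strengths


def summarize(values):
    """Return (mean, sample standard deviation) of a list of samples."""
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    grains, strengths = propagate()
    print("grain size (um): mean=%.2f, sd=%.2f" % summarize(grains))
    print("strength (MPa):  mean=%.1f, sd=%.1f" % summarize(strengths))
```

Sampling was chosen here because it needs no derivatives and works with black-box models; a real effort might instead use analytical propagation or surrogate models where each model run is expensive.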

ASM’s Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project
Scott D. Henry, ASM International

ASM International, the world’s largest association of metals-focused materials engineers, scientists, technicians, educators, and students, has launched a series of initiatives under the “Computational Materials Data (CMD) Network” umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design Ltd. The main objectives of SMDDP are to:


• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system.

• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets.

• Develop and carry out a series of test problems that represent relevant use cases for the repository.

• Make data open to the materials data community for use in data analytics, modeling, and educational purposes.

• Actively engage the materials data community and widely disseminate the findings from the project.

• Develop and implement data capture and curation procedures that can serve as models for other data repositories.

The basic philosophy behind the project is that standards often emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project, launched in late 2013 and planned as an 18-month program, is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future
Ben Blaiszik1, Kyle Chard1, Jim Pruyne1, Rachana Ananthakrishnan1, Steve Tuecke1,2, Ian Foster1,2
1 University of Chicago, Computation Institute; 2 Argonne National Laboratory, MCS Division

During this talk, we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.

Semantics for Deep Interoperability
Samuel Chance, iNovex

Semantic technology is based on mature web standards and is intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the “higher” elements of the stack.

A brief history of the semantic web was presented. It started with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web, riding atop the same infrastructure (HTTP). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis on the fact that the technology has a foundation based on formal logic that enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how “triples” link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being evaluated as a solution for a community or enterprise.
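As a rough illustration of how triples chain into a web of data, the sketch below stores (subject, predicate, object) statements as plain Python tuples and follows the links from one node to everything connected to it. It does not use any RDF library or real vocabulary. All identifiers and the numeric value are hypothetical, chosen only to show that the object of one triple can be the subject of another.

```python
from collections import defaultdict

# Each triple is (subject, predicate, object). The identifiers and the value
# below are hypothetical examples, not taken from any real vocabulary or dataset.
TRIPLES = [
    ("sample:A1", "madeOf", "alloy:Al6061"),
    ("alloy:Al6061", "belongsTo", "system:Al-Mg-Si"),
    ("sample:A1", "testedBy", "test:Tensile-001"),
    ("test:Tensile-001", "yieldStrengthMPa", "276"),  # illustrative value only
]


def build_graph(triples):
    """Index triples by subject so outgoing links can be followed."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def reachable(graph, start):
    """Return every node reachable from `start` by following triples."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _predicate, obj in graph.get(node, []):
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    for node in sorted(reachable(graph, "sample:A1")):
        print(node)
```

Starting from the sample, the traversal reaches the alloy, its alloy system, the test, and the test result, even though no single triple states all of those facts together.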


Big Data and Semantic Technologies
Harlan Shober, RJ Lee Group

The presentation started with a description of entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific “Big Data” management, and workflow-driven Laboratory Information Management Systems (LIMS).

To aid attendee understanding, initial context was provided by first defining “Big Data” and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing, characterization, and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grows daily but cannot be readily accessed.

After the context was established, the needed functionality was described; it consisted of several elements: contribute, index, search, analyze, and archive. While bearing in mind the needed functionality, a string of universal scientific data management process categories was then presented, with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for the Resource Description Framework (RDF) were applied to a notional example with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be “self-described.”

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine “Big Data” to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success. Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn’t had previous exposure.
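Tokenization, the first NLP step listed above, can be shown with a small regular-expression sketch. This is an illustrative assumption, not the approach used by the presenter. The pattern deliberately keeps hyphenated terms such as alloy-system names together and separates numbers from symbols, which hints at why numerically driven materials text needs domain-aware handling.

```python
import re

# Order matters: try numbers first, then (possibly hyphenated) words,
# then fall back to any single non-space symbol.
TOKEN_PATTERN = re.compile(
    r"""
    \d+(?:\.\d+)?                 # numbers such as 6061 or 0.2
    | [A-Za-z]+(?:-[A-Za-z]+)*    # words, keeping hyphenated terms like Al-Mg-Si whole
    | [^\sA-Za-z\d]               # any other single non-space character
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split text into number, word, and symbol tokens."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("Al-Mg-Si alloy 6061 was tested at a 0.2% offset."))
```

Later stages such as part-of-speech tagging or entity disambiguation would consume these tokens; they are omitted here because they generally require trained models rather than a few lines of code.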

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document’s content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the “genre” problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.

The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.


Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were ‘rolled up’ from workshop participants’ inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Design and deploy suitable software environments capable of facilitating e-recording and acceleration (fail often, fail fast) of the community’s experience (both successes and failures) in exploring the integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; a data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce a draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto-ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data, standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management: consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can actually be useful beyond the archival purpose
- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management; it is difficult to establish ROI and realize the benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. “Define critical data” to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a “universal repository” for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people’s toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure: evaluate feasibility of “unstructured materials data” management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel
- Data associated with publications should be shared along with the publications: not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)
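Several of the community inputs above (a common data package, a draft metadata guideline covering authorship and provenance, and sharing the name/type/units of each data item) can be pictured as a small self-describing record. The sketch below is only one possible shape, serialized as JSON rather than HDF5 to stay within the Python standard library. The class names, field names, and checks are hypothetical and do not represent an agreed community format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DataField:
    """One measured or computed quantity: its name, type, and units."""
    name: str
    dtype: str
    units: str


@dataclass
class DataPackage:
    """Hypothetical minimal package: authorship, provenance, and field descriptions."""
    title: str
    authors: list
    provenance: str
    fields: list = field(default_factory=list)

    def missing_metadata(self):
        """List the metadata items a curator would still need to request."""
        problems = []
        if not self.authors:
            problems.append("authors")
        if not self.provenance:
            problems.append("provenance")
        for item in self.fields:
            if not item.units:
                problems.append(f"units for {item.name}")
        return problems

    def to_json(self):
        """Serialize the package description so other tools can read it."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


if __name__ == "__main__":
    package = DataPackage(
        title="Example tensile dataset",
        authors=["A. Researcher"],
        provenance="Room-temperature tensile tests, hypothetical lab",
        fields=[
            DataField("yield_strength", "float", "MPa"),
            DataField("grain_size", "float", ""),  # units deliberately omitted
        ],
    )
    print(package.missing_metadata())
    print(package.to_json())
```

A federated approach as described in the inputs would let each organization build its own tools, provided they can read and write whatever common description the community settles on.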

Government-Led Priorities

1. Lead data strategy/approach development
- Hold a meeting like this one in 6-12 months with the purpose of developing a roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government’s got to decide on a strategy
- What’s the critical nucleus, e.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools making data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating a standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming a consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Push “data management plan” requirements to “data deposit” requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure


4. Support data sharing and deposition (provide resources)
- Increase % of MGI funds dedicated to MII development to 10%: repositories & collaboration platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST): creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors; not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials-related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Currently supported HT approaches are limited to overly fundamental single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or, alternatively, work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community


Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, iNovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp.
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls-Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept. of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, QuesTek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl. Institute of Standards & Tech.
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company

30

Page 20: Materials Genome Initiative: materials data · universallyaccepted best practices. Under the MGI, Federal agencies have funded research and development designedtoexplore, develop,

13

• Mesh interpolation between finite-element-analysis-based tools will likely be required, and a commercially available tool for this type of composite mesh interpolation is key for robust and sustained application of ICM2.

Residual Stress Engineering of Rotating Nickel Components
Terry Wong, Aerojet Rocketdyne

The PW-8 project, the Nickel Base Superalloy Residual Stress Foundational Engineering Problem program, aims to develop a multidisciplinary analysis approach to predict bulk residual stress and incorporate it in the design of a component. This will allow for the design of a part that is optimized for a particular application or design intent (such as weight, performance, or reduction of scrap). To accomplish this goal, the PW-8 team has two major tasks: 1) the development of the infrastructure and associated tools needed to predict and incorporate bulk residual stress in the design stages of a part, and 2) a demonstration task that will validate the infrastructure and tools developed.

The Data Management task is the foundational element of the larger infrastructure development task. The goal for the data management work is to develop an infrastructure that allows the various team members in PW-8 to share data and to store data in a database. This not only ensures that data generated by this program is never lost but also allows data to be transferred in a secure manner. To meet the goal of a system that securely stores and shares data, three major tasks must be accomplished: 1) validation that there is a secure method to transfer data between team members and that the method of storing data is also secure, 2) development of a database schema, and 3) development of a means to easily upload data into the database and download data from the database.

Of the three, the most difficult task was to create a system in which the transfer and storage of data was secure. Various proposals were made to ensure the security of the data, but many of them were rejected, not because the proposal was actually insecure but because it was not acceptable under each team member's IT security policies. The process of getting each team member's IT department to approve the security of the data transfer and storage took more effort than anticipated. In the end, the team members, along with their respective IT departments, approved the use of TRUcentrix as a tool to transfer data and the use of an ASM International hosted database to store data.

The development of a database schema is a necessary step in creating the database to store all the data generated (via physical tests as well as by models). A big help in creating the database schema was the schema generated by Dr. Steven Arnold from NASA's Glenn Research Center. Using NASA's schema eliminated the need to start the database build from scratch and allowed us to focus on the specific needs of storing residual stress data.

The last task involves the development of tools that make the input of data easy. The more difficult it is to enter data into a database, the more likely it is that data will not be entered. Excel-based templates are being developed to make data entry as easy as possible.
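To make the schema and upload tasks concrete, the following is a minimal sketch of a small database schema paired with a template-style upload check. Everything in it is an assumption made for illustration: the table and column names (specimen, residual_stress, depth_mm, stress_mpa, source), the validation rules, and the sample rows are hypothetical and are not the PW-8 schema, the NASA schema, or any ASM-hosted database.

```python
import sqlite3

# Hypothetical schema: one table of specimens, one of residual stress readings.
SCHEMA = """
CREATE TABLE specimen (
    specimen_id TEXT PRIMARY KEY,
    alloy       TEXT NOT NULL
);
CREATE TABLE residual_stress (
    specimen_id TEXT NOT NULL REFERENCES specimen(specimen_id),
    depth_mm    REAL NOT NULL CHECK (depth_mm >= 0),
    stress_mpa  REAL NOT NULL,
    source      TEXT NOT NULL CHECK (source IN ('test', 'model'))
);
"""

REQUIRED = ("specimen_id", "depth_mm", "stress_mpa", "source")


def validate_row(row):
    """Return a list of problems found in one template row (empty if clean)."""
    problems = [f"missing {name}" for name in REQUIRED if row.get(name) in (None, "")]
    if problems:
        return problems
    try:
        if float(row["depth_mm"]) < 0:
            problems.append("depth_mm must be non-negative")
        float(row["stress_mpa"])
    except ValueError:
        problems.append("depth_mm and stress_mpa must be numeric")
    if row["source"] not in ("test", "model"):
        problems.append("source must be 'test' or 'model'")
    return problems


def upload(conn, rows):
    """Insert valid rows; return (number inserted, rejected rows with reasons)."""
    inserted, rejected = 0, []
    for row in rows:
        problems = validate_row(row)
        if problems:
            rejected.append((row, problems))
            continue
        conn.execute(
            "INSERT INTO residual_stress VALUES (?, ?, ?, ?)",
            (row["specimen_id"], float(row["depth_mm"]),
             float(row["stress_mpa"]), row["source"]),
        )
        inserted += 1
    conn.commit()
    return inserted, rejected


# Made-up rows standing in for lines of a filled-in data-entry template.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO specimen VALUES ('S-001', 'nickel superalloy')")
rows = [
    {"specimen_id": "S-001", "depth_mm": "0.5", "stress_mpa": "-310", "source": "test"},
    {"specimen_id": "S-001", "depth_mm": "", "stress_mpa": "120", "source": "model"},
]
inserted, rejected = upload(conn, rows)
print(inserted, len(rejected))
```

Rejecting a row with a stated reason, rather than silently dropping it, is one way a template-driven workflow might keep data entry easy while still protecting the database.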

The two big lessons learned from PW-8 in relation to data management are:

• It is essential to work with each company's IT and IT security people to develop a system that satisfies the security requirements of each company.

• Database schema development should not be reinvented but should begin with an already established schema.

Perspectives on MAI Programs
Dr. Mike Glavicic, Rolls-Royce

The current MAI (AFRL's Metals Affordability Initiative) programs are structured independently from one another, in that no standard methodology exists that describes what data is to be archived at the conclusion of a program. Nor does a standard protocol exist for how to archive things on a server or some other computer resource at the conclusion of a program. As a result, the current practices of MAI programs:

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program.

• Do not have a formalized requirement for how data is to be packaged and stored at the end of a program. Hence, even in the best of cases, in which all of the data is stored, the data is archived in a simple folder fashion that is not searchable in any manner. Moreover, even if all of the data is present, it is extremely cumbersome to untangle the data for future use using a file folder strategy.

Hence, at the end of the program, both the government and the MAI team are left to manage the data and results in an unregimented fashion.

The lack of a regimented protocol for how to package a program's results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal of integrating the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models, and other information required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted to the re-creation of data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. Although the RR-10 program was able to overcome these issues and become a successful program, other programs that encounter similar problems will likely not fare as well: RR-10 was very fortunate to have the means to overcome these serious issues, which were a result of the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government-funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs, with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ
Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals, Inc.) are working together on the PW-11 team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security, and access control, along with the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework (GE Forge, Vanderbilt Forge, and Semantic Web technology), and one technology will be selected for the next phase of the program.

Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single database searches to multiple and disparate database searches, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work up front for implementation, as it needs the development of a common vocabulary and relationships.

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST and develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
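The idea of flowing uncertainty through a chain of models can be sketched with a simple Monte Carlo loop. This is only an illustration under stated assumptions, not the PW-11 method: the power-law grain size model, every constant, and the input distributions below are made up, and the only borrowed physics is the general Hall-Petch form (yield strength = sigma0 + k / sqrt(d)).

```python
import random
import statistics


def grain_size_um(cooling_rate_c_per_s):
    """Toy processing-to-microstructure model (hypothetical power law)."""
    return 50.0 * cooling_rate_c_per_s ** -0.5


def yield_strength_mpa(d_um, sigma0=150.0, k=600.0):
    """Hall-Petch form: sigma_y = sigma0 + k / sqrt(d). Constants are made up."""
    return sigma0 + k / d_um ** 0.5


def propagate(n_samples=10_000, seed=0):
    """Sample the process input and push each sample through both models."""
    rng = random.Random(seed)
    sizes, strengths = [], []
    for _ in range(n_samples):
        # Process variability: cooling rate ~ Normal(10, 1) C/s, kept positive.
        rate = max(rng.gauss(10.0, 1.0), 0.1)
        # Uncertainty added at the microstructure step (about 5 % scatter).
        d = grain_size_um(rate) * rng.lognormvariate(0.0, 0.05)
        sizes.append(d)
        strengths.append(yield_strength_mpa(d))
    return sizes, strengths


sizes, strengths = propagate()
print(f"grain size: mean={statistics.mean(sizes):.2f} um, "
      f"sd={statistics.stdev(sizes):.2f} um")
print(f"yield strength: mean={statistics.mean(strengths):.1f} MPa, "
      f"sd={statistics.stdev(strengths):.1f} MPa")
```

Because each sample carries its own processing input, microstructure, and property value, the same run can be used to compare the spread at each stage and to see how much of the property scatter traces back to the process input.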

ASM's Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project
Scott D. Henry, ASM International

ASM International, the world's largest association of metals-focused materials engineers, scientists, technicians, educators, and students, has launched a series of initiatives under the "Computational Materials Data (CMD) Network" umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design Ltd. The main objectives of SMDDP are to:

• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system.

• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets.

• Develop and carry out a series of test problems that represent relevant use cases for the repository.

• Make data open to the materials data community for use in data analytics, modeling, and educational purposes.

• Actively engage the materials data community and widely disseminate the findings from the project.

• Develop and implement data capture and curation procedures that can serve as models for other data repositories.

The basic philosophy behind the project is that standards often emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated, structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project, launched in late 2013 and planned as an 18-month program, is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future
Ben Blaiszik1, Kyle Chard1, Jim Pruyne1, Rachana Anathakrishnan1, Steve Tuecke1,2, Ian Foster1,2

1University of Chicago Computation Institute, 2Argonne National Laboratory MCS Division

During this talk, we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.

Semantics for Deep Interoperability
Samuel Chance, iNovex

Semantic technology is based on mature web standards and is intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented. It started with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can coexist on a semantic web riding atop the same infrastructure (HTTP). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis on the technology's foundation in formal logic, which enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon the adoption of semantic technologies and things to consider when they are being evaluated as a solution for a community or enterprise.
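The way "triples" link to each other can be sketched in a few lines. This is a toy in-memory store written for illustration, not an RDF library and not anything shown in the talk; the class, its methods, and every identifier with the "ex:" prefix are hypothetical.

```python
from collections import defaultdict


class TripleStore:
    """Tiny in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()
        self._by_subject = defaultdict(set)

    def add(self, s, p, o):
        self._triples.add((s, p, o))
        self._by_subject[s].add((p, o))

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )

    def reachable(self, start):
        """Follow links outward: the object of one triple may be the
        subject of another, which is what turns triples into a graph."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(o for _, o in self._by_subject.get(node, ()))
        return seen - {start}


store = TripleStore()
store.add("ex:Specimen42", "ex:madeOf", "ex:Al6061")
store.add("ex:Al6061", "ex:memberOf", "ex:AlMgSiSystem")
store.add("ex:Specimen42", "ex:hasMeasurement", "ex:Tensile7")
store.add("ex:Tensile7", "ex:performedBy", "ex:LabA")

print(store.match(p="ex:madeOf"))
print(sorted(store.reachable("ex:Specimen42")))
```

Note that the specimen is never linked directly to the alloy system; that connection appears only because the object of one statement is the subject of another, which is the sense in which individual triples form a web of data.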


Big Data and Semantic Technologies
Harlan Shober, RJ Lee Group

The presentation started with a description of entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS). To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing, characterization, and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described and consisted of several elements: contribute, index, search, analyze, and archive. While bearing in mind the needed functionality, a string of universal scientific data management process categories was then presented with the intent of applying the universal description to the materials design and development domain.

Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for Resource Description Framework (RDF) were applied to a notional example with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described."

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
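Tokenization, the first of the steps listed above, can be illustrated with a short regular-expression sketch. It is an assumption-laden toy rather than the approach presented in the talk: the pattern, the function names, and the three coarse labels are hypothetical, and the labels are not part-of-speech tags, which in practice come from trained models.

```python
import re

# Order matters: try numbers first, then (possibly hyphenated) words,
# then fall back to any single non-space character.
TOKEN = re.compile(
    r"""
    \d+(?:\.\d+)?               # number, e.g. 6061 or 175.5
  | [A-Za-z]+(?:-[A-Za-z]+)*    # word, keeping compounds such as Al-Mg-Si
  | \S                          # any other single non-space character
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split text into number, word, and symbol tokens."""
    return TOKEN.findall(text)


def kind(token):
    """Coarse label for a token: NUM, WORD, or SYM."""
    if token[0].isdigit():
        return "NUM"
    if token[0].isalpha():
        return "WORD"
    return "SYM"


sentence = "Al-Mg-Si alloy 6061 was aged at 175.5 C."
tokens = tokenize(sentence)
print([(t, kind(t)) for t in tokens])
```

Keeping "Al-Mg-Si" as one token and "175.5" as one number hints at why materials text needs domain-aware handling: a general-purpose splitter may break alloy systems and numeric values apart before any later step sees them.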

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.

The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.

Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were 'rolled up' from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often-fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; a data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto-ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data, standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management: consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose
- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management: difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure: evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel
- Data associated with publications should be shared along with the publications. Not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)
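Several of the community-led priorities above (a common data package, metadata covering authorship and provenance, and sharing the name/type/units of each data type) can be pictured as a small self-describing record. The sketch below is hypothetical throughout: the class names, field names, and sample values are invented for illustration, and it uses JSON from the Python standard library only to stay self-contained, whereas the participants' suggestion was an HDF5-like package.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class FieldSpec:
    """One data column: its name, data type, and units."""
    name: str
    dtype: str
    units: str


@dataclass
class DataPackage:
    """Descriptive metadata that travels with a dataset."""
    title: str
    authors: list
    provenance: str
    fields: list = field(default_factory=list)

    def to_json(self):
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text):
        raw = json.loads(text)
        raw["fields"] = [FieldSpec(**f) for f in raw["fields"]]
        return cls(**raw)

    def missing_metadata(self):
        """List the required descriptors that are empty."""
        gaps = [k for k in ("title", "authors", "provenance") if not getattr(self, k)]
        gaps += [f"units for {f.name}" for f in self.fields if not f.units]
        return gaps


# Made-up example record; the second field deliberately lacks units.
pkg = DataPackage(
    title="Grain size vs. cooling rate (illustrative)",
    authors=["A. Researcher"],
    provenance="optical micrographs, linear intercept method",
    fields=[
        FieldSpec("cooling_rate", "float", "K/s"),
        FieldSpec("grain_size", "float", ""),
    ],
)
print(pkg.missing_metadata())
restored = DataPackage.from_json(pkg.to_json())
print(restored == pkg)
```

A check such as missing_metadata is one way a repository might enforce a draft metadata guideline at deposit time, and a plain-text serialization lets independently built toolsets read and write the same package in a federated setting.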

Government-Led Priorities

1. Lead data strategy/approach development
- Hold meeting like this one in 6-12 months with purpose of developing roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus, e.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools making data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating a standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure

4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10 -- repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development, collection of fundamental genomic data
- Support thermodynamics/kinetics database development for lightweight alloy systems
- Funding for universal data repository (e.g., at NIST): creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors, not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials-related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches supported are limited to too-fundamental, single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or alternatively work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community

Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, iNovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, Questek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl Institute of Standards & Tech
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


• Database schema development should not be reinvented but should begin with already established schema

Perspectives on MAI Programs
Dr. Mike Glavicic, Rolls Royce

The current MAI (AFRL's Metals Affordability Initiative) programs are structured independently of one another: no standard methodology describes what data is to be archived at the conclusion of a program, nor does a standard protocol exist for how to archive it on a server or some other computer resource. As a result, the current practices of MAI programs:

• Do not address IP (intellectual property) and licensing issues in a systematic fashion, which leads to issues in the commercialization of ICME software that is generated under a program.

• Do not have a formalized requirement for how data is to be packaged and stored at the end of a program. Hence, even in the best of cases, in which all of the data is stored, the data is archived in simple folder fashion that is not searchable in any manner. Moreover, even if all of the data is present, it is extremely cumbersome to untangle the data for future use using a file folder strategy.

Hence, at the end of the program, both the government and the MAI team are left to manage the data and results in an unregimented fashion.

The lack of a regimented protocol for packaging a program's results (raw data and reports) makes it extremely difficult for new government programs to leverage the data from previous programs. Case in point: the RR-10 program was developed with the singular goal of integrating the models developed in the LAD-2 program into a commercial software platform. From the outset of the program, data, models, and other information required to attain the goals of the RR-10 program were either difficult to track down, lost, or not provided to the RR-10 team due to contractual disputes. As a result, funding that was earmarked for specific technical work at the conception of the program had to be diverted to the recreation of data that existed in a previous program, and the progress of the program was stunted by this unforeseen burden. Although the RR-10 program overcame these issues and became a successful program, other programs that encounter similar problems will likely not share that success: RR-10 was very fortunate to have the means to overcome these serious issues, which were a result of the lack of a data management plan.

Furthermore, if the MGI is to progressively develop materials models that are leveraged and improved in future government-funded programs, a policy on how to store and share the data from disconnected programs is required. The PW-11 program is the first step in this process, as it will have the data from three MAI programs (RR-10, GE-10, PW-9) uploaded into a database that will be searchable and will enable the reuse of the data collected in these programs. However, even if this program is a success, there will undoubtedly be other features required in the platform developed, and hence future modification will be required. Most importantly, there does not currently exist a server that has been set up and funded for the archiving of all future programs, with a protocol that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ
Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and Suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration, data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals, Inc.) are working together on the PW-11 Team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security and access control, and the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework (GE Forge, Vanderbilt Forge, and Semantic Web technology), and one technology will be selected for the next phase of the program.


Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single-database searches to searches across multiple and disparate databases, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work up front for implementation, as it needs the development of a common vocabulary and relationships.

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST and develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
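The idea of flowing uncertainty through interconnected models can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration: the two toy models (a cooling-rate-to-grain-size relation and a Hall-Petch-type strength relation), their coefficients, and the scatter values are hypothetical and are not taken from the PW-11, GE-10, PW-9, or RR-10 programs.

```python
import random
import statistics

def grain_size_um(cooling_rate, rng):
    """Hypothetical process -> microstructure model (faster cooling, finer grains)."""
    nominal = 100.0 / (1.0 + cooling_rate)
    # 5% multiplicative scatter stands in for model uncertainty (assumed value).
    return max(nominal * rng.gauss(1.0, 0.05), 0.1)

def yield_strength_mpa(grain_size, rng):
    """Hypothetical microstructure -> property model with a Hall-Petch form."""
    sigma0, k = 50.0, 200.0  # illustrative coefficients, not measured data
    return (sigma0 + k / grain_size ** 0.5) * rng.gauss(1.0, 0.02)

def propagate(rate_mean, rate_sd, n=5000, seed=0):
    """Sample the process parameter and push each sample through both models."""
    rng = random.Random(seed)
    sizes, strengths = [], []
    for _ in range(n):
        rate = max(rng.gauss(rate_mean, rate_sd), 0.0)
        size = grain_size_um(rate, rng)
        sizes.append(size)
        strengths.append(yield_strength_mpa(size, rng))
    return sizes, strengths

# Same chain of models, two different levels of process variability.
narrow_sizes, narrow_strengths = propagate(rate_mean=10.0, rate_sd=0.1)
wide_sizes, wide_strengths = propagate(rate_mean=10.0, rate_sd=3.0)
for label, values in (("narrow", narrow_strengths), ("wide", wide_strengths)):
    print(label, round(statistics.mean(values), 1), round(statistics.stdev(values), 1))
```

Comparing the output scatter for a narrow and a wide spread in the processing parameter gives a rough sense of how much of the property variability traces back to the process rather than to the models themselves.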

ASM's Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project
Scott D. Henry, ASM International

ASM International, the world's largest association of metals-focused materials engineers, scientists, technicians, educators, and students, has launched a series of initiatives under the "Computational Materials Data (CMD) Network" umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design Ltd. The main objectives of SMDDP are to:


• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system.
• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets.
• Develop and carry out a series of test problems that represent relevant use cases for the repository.
• Make data open to the materials data community for use in data analytics, modeling, and educational purposes.
• Actively engage the materials data community and widely disseminate the findings from the project.
• Develop and implement data capture and curation procedures that can serve as models for other data repositories.

The basic philosophy behind the project is that standards often emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project, launched in late 2013 and planned as an 18-month program, is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future
Ben Blaiszik1, Kyle Chard1, Jim Pruyne1, Rachana Anathakrishnan1, Steve Tuecke1,2, Ian Foster1,2
1University of Chicago Computation Institute; 2Argonne National Laboratory MCS Division

During this talk, we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.

Semantics for Deep Interoperability
Samuel Chance, iNovex

Semantic technology is based on mature web standards and is intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented, starting with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web, riding atop the same infrastructure (HTTP). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis on the fact that the technology has a foundation based on formal logic that enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being evaluated as a solution for a community or enterprise.
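The way "triples" link into a web of data, as described in this talk, can be sketched in a few lines. This is only an illustration in plain Python tuples, not an RDF library and not anything shown at the workshop; the `ex:` identifiers, the predicates, and the strength value are all made up.

```python
# Each triple is (subject, predicate, object). All names are hypothetical.
triples = [
    ("ex:Sample42", "ex:madeOf", "ex:Al6061"),
    ("ex:Al6061", "ex:memberOf", "ex:AlMgSiSystem"),
    ("ex:Sample42", "ex:measuredBy", "ex:TensileTest7"),
    ("ex:TensileTest7", "ex:yieldStrengthMPa", "250"),  # illustrative value
]

def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def reachable(start):
    """Follow links outward from one node to everything connected to it."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for s, _, o in triples:
            if s == node and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen

print(objects("ex:Sample42", "ex:madeOf"))
print(sorted(reachable("ex:Sample42")))
```

Because the object of one triple can be the subject of another, a query that starts at the sample reaches the alloy system and the test result without any table schema being defined in advance.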


Big Data and Semantic Technologies
Harlan Shober, RJ Lee Group

The presentation started with a description of the entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS). To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing/characterization and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described; it consisted of several elements: contribute, index, search, analyze, and archive. While bearing in mind the needed functionality, a string of universal scientific data management process categories was then presented, with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for Resource Description Framework (RDF) were applied to a notional example with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described."

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success. Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
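Tokenization, the first of the NLP steps listed above, can be sketched with the standard library alone. The pattern below is an assumption chosen for illustration (it keeps hyphenated names such as alloy systems and decimal numbers together); it is not the tokenizer discussed in the presentation, and the example sentence is invented.

```python
import re

# Alternatives are tried in order: decimal numbers, then (hyphenated) words,
# then any single remaining non-space symbol such as punctuation.
TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+(?:-[A-Za-z]+)*|[^\sA-Za-z\d]")

def tokenize(text):
    """Split text into tokens, keeping 'Al-Mg-Si' and '175.5' in one piece."""
    return TOKEN.findall(text)

tokens = tokenize("Al-Mg-Si alloys were aged at 175.5 C for 8 h.")
print(tokens)
```

Later steps such as part-of-speech tagging or entity disambiguation would consume this token list; numbers and units survive as separate tokens, which matters for the numerically driven text common in materials papers.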

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.

The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.

Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were 'rolled up' from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often, fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; a data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto-ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data; standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management: consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose


- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management; it is difficult to establish ROI and realize the benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure: evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel


- Data associated with publications should be shared along with the publications: not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)

Government-Led Priorities

1. Lead data strategy/approach development
- Hold a meeting like this one in 6-12 months with the purpose of developing a roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus? E.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools making data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating a standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use


13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13

that must be adhered to at the conclusion of a funded program. Without such a protocol and the existence of a server to be used in all future programs, the goals set forth by MGI will be unnecessarily difficult and more expensive to attain.

MAI Data Informatics, Data Standards, Modeling & UQ
Dr. Rajiv Naik, Pratt & Whitney

The MAI Data Informatics, Data Standards, Modeling & Uncertainty Quantification (UQ) (PW-11) effort will enable the pervasive implementation of ICME by developing (i) a web-based OEM-Supplier collaboration framework for production certification and process modeling data, (ii) data standards and protocols for interoperability between OEMs and Suppliers, and (iii) UQ methods to capture variability in process, microstructure, and property data. Models and data from earlier MAI programs such as GE-10, PW-9, and RR-10 will be used to develop and implement OEM-Supplier integration data standards, data structures, and UQ models. Several OEMs and suppliers (Pratt & Whitney, General Electric, Rolls-Royce, TIMET, Alcoa, and RTI International Metals, Inc.) are working together on the PW-11 Team.

The OEM-Supplier collaboration tool will enable data exchange related to mechanical testing, microstructural characteristics, processing, and process modeling. The data standards developed by the PW-11 team will provide an industry standard for such OEM-Supplier collaboration. The collaboration framework would provide a platform to manage OEM-Supplier workflows, data and information security and access control, and the ability to perform system administration without the need for day-to-day intervention of the IT department. The capture of such supplier data over the collaboration framework will enable the OEMs to enhance process and property models, data analytics, and UQ modeling. Three potential technologies will be evaluated for the OEM-Supplier Collaboration Framework (GE Forge, Vanderbilt Forge, and Semantic Web technology), and one technology will be selected for the next phase of the program.


Semantic Web technology is a paradigm shift from highly structured databases to more flexible graph databases, from single database searches to multiple and disparate database searches, and from keyword searches to contextual and meaningful (semantic) searches. It can provide flexibility of data structures and enhanced OEM-Supplier interoperability. However, it requires more work upfront for implementation, as it needs the development of a common vocabulary and relationships.

The data standards and data structures effort will leverage work at national and international standards organizations such as ASM, ASTM, CEN, and NIST and develop standards and vocabularies for production certification data that can include chemistry, microstructure, and mechanical properties. This effort will also develop means for the capture of data needed for the UQ and ICME models and build databases for the capture of the data from the GE-10, PW-9, and RR-10 MAI programs.

The ICME models being developed under the GE-10, PW-9, and RR-10 programs involve several interconnected models related to processing, microstructure, and mechanical properties. The UQ effort under PW-11 will involve considering the uncertainty at each step of the modeling process and flowing it through the interconnected models to evaluate the relationships between the variabilities of the processing parameters, the microstructure, and the mechanical performance.
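The idea of flowing uncertainty through interconnected models can be illustrated with a small Monte Carlo sketch. Everything specific in it is an assumption made for illustration: the two model functions (a cooling-rate-to-grain-size relation and a Hall-Petch-type strength relation), their coefficients, and the scatter assigned to each step are hypothetical placeholders and are not taken from the PW-11, GE-10, PW-9, or RR-10 programs.

```python
import random
import statistics


def grain_size_um(cooling_rate: float) -> float:
    """Hypothetical process-to-microstructure model: faster cooling, finer grains."""
    return 100.0 * cooling_rate ** -0.5


def yield_strength_mpa(d_um: float) -> float:
    """Hall-Petch-type relation with illustrative (not measured) coefficients."""
    return 50.0 + 300.0 / d_um ** 0.5


def propagate(n_samples: int = 10_000, seed: int = 0) -> dict:
    """Sample the uncertainty at each step and push it through the model chain."""
    rng = random.Random(seed)
    grains, strengths = [], []
    for _ in range(n_samples):
        # Step 1: variability of the processing parameter (cooling rate).
        rate = max(rng.gauss(10.0, 1.0), 0.1)
        # Step 2: microstructure model plus assumed multiplicative model scatter.
        d = grain_size_um(rate) * rng.lognormvariate(0.0, 0.05)
        # Step 3: property model plus assumed additive scatter (MPa).
        s = yield_strength_mpa(d) + rng.gauss(0.0, 5.0)
        grains.append(d)
        strengths.append(s)
    return {
        "grain_mean": statistics.mean(grains),
        "grain_std": statistics.stdev(grains),
        "strength_mean": statistics.mean(strengths),
        "strength_std": statistics.stdev(strengths),
    }


if __name__ == "__main__":
    for key, value in propagate().items():
        print(f"{key}: {value:.2f}")
```

Comparing the spread of the sampled strength with the spread assigned to each individual step gives a rough sense of which source of variability dominates the final property, which is the kind of relationship the UQ effort aims to evaluate.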

ASM's Computational Materials Data Network & the NIST-ASM Structural Materials Data Demonstration Project
Scott D. Henry, ASM International

ASM International, the world's largest association of metals-focused materials engineers, scientists, technicians, educators, and students, has launched a series of initiatives under the "Computational Materials Data (CMD) Network" umbrella. The objective of the CMD Network is to serve as a center for information collection and dissemination for materials data to support integrated computational materials engineering (ICME) and to help realize the goals of the US Materials Genome Initiative.

The first major project for the CMD Network is the NIST-ASM Structural Materials Data Demonstration Project (SMDDP), a cooperative research project funded by NIST with participation from the Kent State Center for Materials Informatics and Granta Design, Ltd. The main objectives of SMDDP are to:

• Establish well-pedigreed and curated demonstration datasets for non-proprietary metallic structural materials data over multiple length scales. The team has chosen to work with 6061 aluminum and the related Al-Mg-Si system.

• Work with NIST and the materials data community to develop materials data schema and ontologies for the demonstration datasets, cognizant of broader interests and datasets.

• Develop and carry out a series of test problems that represent relevant use cases for the repository.

• Make data open to the materials data community for use in data analytics, modeling, and educational purposes.

• Actively engage the materials data community and widely disseminate the findings from the project.

• Develop and implement data capture and curation procedures that can serve as models for other data repositories.

The basic philosophy behind the project is that standards often emerge through trial and error, and examples are needed to spur this process.

The project is using the NIST-KSU DSpace repository for data files. The curated, structured database is being developed using GRANTA MI software and is hosted on ASM servers.

The project, launched in late 2013 and planned as an 18-month program, is being conducted in phases. The first phase involved developing a starter database and gaining experience in data curation. The second phase (currently underway) involves building out more comprehensive data sets that will be suitable for carrying out test problems related to processing-structure-property relationships in Al 6061. The third phase will involve making further refinements in response to testing. The complete data set will be opened up online to the materials community as a whole.

Globus Scientific Data Services: Current and Future
Ben Blaiszik [1], Kyle Chard [1], Jim Pruyne [1], Rachana Anathakrishnan [1], Steve Tuecke [1,2], Ian Foster [1,2]
[1] University of Chicago Computation Institute; [2] Argonne National Laboratory MCS Division

During this talk, we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully-formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.
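The arrangement described above (datasets grouped into collections, collection-level access policies, and a centralized metadata index used for search) can be sketched generically. This is not the Globus API: the class names, fields, and the simple keyword index below are hypothetical and only illustrate the general pattern.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical collection belonging to a community."""
    name: str
    community: str
    allowed_users: set = field(default_factory=set)  # empty set means public
    datasets: dict = field(default_factory=dict)     # dataset id -> metadata dict


class CentralIndex:
    """Toy centralized index built from the metadata of every collection."""

    def __init__(self):
        self._terms = defaultdict(set)  # term -> {(collection name, dataset id)}
        self._collections = {}

    def ingest(self, collection: Collection) -> None:
        """Import a collection's metadata; the raw data stays where it is hosted."""
        self._collections[collection.name] = collection
        for ds_id, meta in collection.datasets.items():
            for value in meta.values():
                for term in str(value).lower().split():
                    self._terms[term].add((collection.name, ds_id))

    def search(self, query: str, user: str) -> list:
        """Return datasets matching all query terms that the user may access."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self._terms.get(t, set()) for t in terms))
        visible = []
        for coll_name, ds_id in sorted(hits):
            coll = self._collections[coll_name]
            # Collection-level policy: public, or user explicitly allowed.
            if not coll.allowed_users or user in coll.allowed_users:
                visible.append((coll_name, ds_id))
        return visible


public = Collection("al6061-tensile", "alloys",
                    datasets={"ds1": {"material": "Al 6061", "test": "tensile"}})
restricted = Collection("oem-forgings", "alloys", allowed_users={"alice"},
                        datasets={"ds2": {"material": "Al 6061", "test": "fatigue"}})
index = CentralIndex()
index.ingest(public)
index.ingest(restricted)
```

In this sketch the search itself runs only against metadata, and the access check is applied per collection, mirroring the split between centrally indexed metadata and locally managed data.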

Semantics for Deep Interoperability
Samuel Chance, iNovex

Semantic technology is based on mature web standards and is intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented. It started with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web, riding atop the same infrastructure (http). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, with emphasis on the technology's foundation in formal logic, which enables software to understand information and reduces the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being considered as a solution for a community or enterprise.
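The way "triples" link to each other to form a web of data can be shown with a minimal sketch. A triple is a (subject, predicate, object) statement; when the object of one triple is the subject of another, the statements chain into a graph. The identifiers below (the "ex:" names and the sample facts) are made up for illustration, and plain Python tuples stand in for a real RDF store.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical statements; the object of one triple can be the subject of another.
TRIPLES: List[Triple] = [
    ("ex:Sample42", "ex:madeOf", "ex:Al6061"),
    ("ex:Sample42", "ex:grainSizeMicrons", "35"),
    ("ex:Al6061", "ex:memberOfSystem", "ex:Al-Mg-Si"),
    ("ex:Al-Mg-Si", "ex:baseElement", "ex:Aluminum"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def reachable(triples: List[Triple], start: str) -> Set[str]:
    """Follow object-to-subject links to collect everything connected to start."""
    seen: Set[str] = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for _, _, obj in match(triples, s=node):
            if obj not in seen:
                seen.add(obj)
                frontier.append(obj)
    return seen
```

Starting from the sample, a program can reach the alloy, its alloy system, and the base element without a human interpreting any of the statements, which is the machine-readability the talk emphasized.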


Big Data and Semantic Technologies
Harlan Shober, RJ Lee Group

The presentation started with a description of entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS). To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing, characterization, and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grows daily but cannot be readily accessed.

After the context was established, the needed functionality was described; it consisted of several elements: contribute, index, search, analyze, and archive. While bearing in mind the needed functionality, a string of universal scientific data management process categories was then presented, with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for Resource Description Framework (RDF) were applied to a notional example, with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described."

In the remainder of the presentation, Postulate functionality was further described, and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.
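Tokenization, the first NLP step listed above, already shows why numerically driven materials text is awkward for generic tools: splitting on whitespace and punctuation would break "Al-Mg-Si" into three words and separate "276" from "MPa". The sketch below is a hypothetical rule-based tokenizer, not the approach presented in the talk; the short unit list and the example sentence are assumptions chosen for illustration.

```python
import re

# Alternatives are tried in order, so quantities win over bare numbers.
TOKEN_RE = re.compile(
    r"""
      \d+(?:\.\d+)?\s?(?:MPa|GPa|nm|K)\b   # number followed by a known unit
    | [A-Za-z]+(?:-[A-Za-z]+)*             # word or hyphenated designation
    | \d+(?:\.\d+)?                        # bare number, e.g. an alloy grade
    | [^\sA-Za-z0-9]                       # any other single symbol
    """,
    re.VERBOSE,
)


def tokenize(text: str) -> list:
    """Split text into tokens, keeping quantities and alloy systems whole."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize("The Al-Mg-Si alloy 6061 reached 276 MPa after aging at 433 K."))
```

Keeping a value and its unit in one token makes later steps, such as entity tagging of property measurements, simpler, at the cost of maintaining the unit list by hand.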


The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.


Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were 'rolled up' from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often-fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; a data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce a draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data, standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management: consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose


- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management: difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most-important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure: evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel


- Data associated with publications should be shared along with the publications. Not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)
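Community priority 1 calls for a common data package carrying metadata (authorship, provenance) and the name/type/units of each data item, readable and writable by independently built toolsets. A minimal sketch of such a package follows. The participants suggested something like an HDF5 file; this sketch swaps in JSON from the Python standard library to stay dependency-free, and the class names, fields, and sample values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Column:
    """One data item with the name/type/units the priority list asks for."""
    name: str
    dtype: str
    units: str
    values: list


@dataclass
class DataPackage:
    """Hypothetical self-describing package: metadata plus data columns."""
    title: str
    authors: list
    provenance: str
    columns: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize so that any tool able to read JSON can consume the package."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "DataPackage":
        """Rebuild the package, restoring Column objects from plain dicts."""
        raw = json.loads(text)
        raw["columns"] = [Column(**c) for c in raw["columns"]]
        return cls(**raw)


# Made-up example values, for illustration only.
package = DataPackage(
    title="Grain size vs. cooling rate (illustrative)",
    authors=["A. Researcher"],
    provenance="Measured from micrographs; values here are invented",
    columns=[
        Column("cooling_rate", "float", "K/s", [1.0, 10.0]),
        Column("grain_size", "float", "micrometers", [80.0, 35.0]),
    ],
)
```

Because the units and provenance travel with the values, an organization can build its own toolset and still exchange packages with others, which is the federated approach the participants described.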

Government-Led Priorities

1. Lead data strategy/approach development
- Hold a meeting like this one in 6-12 months with the purpose of developing a roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus, e.g., getting CALPHAD off the ground?
- Government must establish an official stance on security and infrastructure
- List of most-important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools making data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating a standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming a consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure


4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10 -- repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST): creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors, not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials-related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches supported are limited to too fundamental, single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or, alternatively, work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community


Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, iNovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp.
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept. of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, Questek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl. Institute of Standards & Tech.
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company

Page 23: Materials Genome Initiative: materials data · universallyaccepted best practices. Under the MGI, Federal agencies have funded research and development designedtoexplore, develop,

13

Semantic Web technology is paradigm shift from highly structured13 databases to13 more flexible

graph databases from single database searches to13 multiple and13 disparate13 database13 searches and from

keyword searches to contextual and meaningful (semantic) searches It can provide flexibility13 of data

structures and enhanced OEM-shy‐Supplier13 interoperability However it requires more work upfront13 for13 implementation as it needs the development of a common vocabulary and relationships

The data13 standards and data13 structures effort will leverage work at national and13 international standards13 organizations such13 as ASM ASTM CEN and13 NIST and13 develop13 standards and vocabularies forproduction13 certification data that can include chemistry microstructure and mechanical properties This effort will also develop means for the capture of data needed13 for the UQ and13 ICME models andbuild13 databases for13 the capture of13 the data from the GE-shy‐10 PW-shy‐9 and RR-shy‐10 MAI programs

The ICME13 models being developed under the GE-shy‐10 PW-shy‐9 and RR-shy‐10 programs involve severalinterconnected models related to processing microstructure and mechanical properties The UQ effort under PW-shy‐11 will involve considering the uncertainty at each13 step13 of the modeling process and flowing it through the interconnected models to evaluate the relationships between13 the variabilities of13 the

processing parameters the microstructure and13 the mechanical performance

ASMrsquos Computational Materials Data Network amp the NIST-shy‐ASM Structural Materials Data

Demonstration ProjectScott D Henry ASM International

ASM International ndash the worldrsquos largest13 association of13 metals-shy‐focused materials engineers scientists technicians educators and students13 ndash has launched13 a series of initiatives under the

ldquoComputational Materials Data (CMD)13 Networkrdquo umbrella The objective13 of the13 CMD Network is to serve13 as center for information collection and dissemination for materials data13 to support integrated

computational materials13 engineering (ICME) and to help realize the goals13 of the US Materials13 Genome

InitiativeThe first major project for the CMD Network is the NIST-shy‐ASM Structural Materials Data

Demonstration Project (SMDDP) a cooperative research project funded by NIST with participation from

the Kent13 State Center13 for13 Materials Informatics and Granta Design Ltd13 The main13 objectives of SMDDP

are13 to

19

13

bull Establish well-shy‐pedigreed13 and13 curated13 demonstration13 datasets for non-shy‐proprietary metallic structural materials13 data over multiple length scales The team has13 chosen to work with 6061

aluminum and the13 related Al-shy‐Mg-shy‐Si system

bull Work with NIST and the materials data community to develop materials data schema and

ontologies for the demonstration13 datasets cognizant of broader interests and13 datasets

bull Develop and carry out a series of test problems that represent relevant use cases for the13 repository

bull Make data open to the materials data community for use in data analytics modeling and

educational purposes

bull Actively engage the materials data community and13 widely disseminate the findings from the

project

bull Develop and implement data13 capture13 and curation procedures that can serve13 as models for otherdata repositories

The basic philosophy behind the project is that often standards13 emerge through trial and error and examples are13 needed to spur this process

The project is using the NIST-shy‐KSU DSpace13 repository for data13 files The13 curated structured

database is being developed13 using GRANTA13 MI software and13 is hosted13 o ASM serversThe project ndashlaunched in late 2013 and planned as an 18-shy‐month program13 ndash is being conducted in

phases The13 first phase13 involved developing starter database13 and gaining experience13 in data13 curation The second phase (currently underway) involves building out more comprehensive data13 sets that will be

suitable for carrying out test problems13 related to processing-shy‐structure-shy‐property relationships in13 Al 6061 The third phase will involve making further refinements in response to testing The complete data13 set will be opened up online to the materials community as a whole

Globus13 Scientific13 Data Services13 Current and FutureBen13 Blaiszik1 Kyle Chard1 Jim Pruyne1 Rachana Anathakrishnan1 Steve Tuecke12 Ian Foster12

1University of Chicago Computation Institute 2Argonne National Laboratory MCS Division

During this talk we discussed a variety of currently available Globus services. Via these Globus services, developers and IT administrators can offer simple, secure, and robust file transfer, as well as identity, profile, and group management, to their user communities. Globus allows end users to create and manage a unique identity that can be linked to external identities for authentication. End users can also transfer files across wide area networks faster and more easily, whatever their location.

The talk also covered prototype services related to data publication, metadata tagging, and data search. Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while raw published data and a fully-formed metadata log are stored on campus, institutional, and group resources that are managed and operated by campus administrators. Published datasets are organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.
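The publication model described above (datasets grouped into collections within communities, with metadata harvested into a central index) can be illustrated with a small sketch. This is not the Globus API. The class and field names (`Dataset`, `MetadataIndex`, `publish`, `search`) are hypothetical and were chosen only to show how a centralized keyword index over per-collection metadata might work.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """A published dataset; all field names are illustrative assumptions."""
    identifier: str
    community: str
    collection: str
    keywords: frozenset = field(default_factory=frozenset)


class MetadataIndex:
    """Toy centralized index mapping lowercase keywords to dataset identifiers."""

    def __init__(self):
        self._by_keyword = defaultdict(set)
        self._datasets = {}

    def publish(self, dataset):
        # Only metadata is indexed; the raw data would stay on local storage.
        self._datasets[dataset.identifier] = dataset
        for keyword in dataset.keywords:
            self._by_keyword[keyword.lower()].add(dataset.identifier)

    def search(self, *keywords):
        # Return datasets that match every keyword, sorted by identifier.
        if not keywords:
            return []
        hits = set.intersection(
            *(self._by_keyword.get(k.lower(), set()) for k in keywords)
        )
        return [self._datasets[i] for i in sorted(hits)]


index = MetadataIndex()
index.publish(Dataset("ds-001", "materials", "al-6061", frozenset({"Al 6061", "tensile"})))
index.publish(Dataset("ds-002", "materials", "al-6061", frozenset({"Al 6061", "microstructure"})))
index.publish(Dataset("ds-003", "materials", "ti-alloys", frozenset({"tensile"})))

tensile_6061 = [d.identifier for d in index.search("al 6061", "tensile")]
print(tensile_6061)
```

Keeping only keywords and identifiers in the index mirrors the split described in the talk, where metadata is centralized but the raw data remains on locally administered resources.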

Semantics for Deep Interoperability
Samuel Chance, iNovex

Semantic technology is based on mature web standards and intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented. It started with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web riding atop the same infrastructure (HTTP). Even in this state the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented, emphasizing that the technology has a foundation based on formal logic that enables software to understand information and reduce the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being evaluated as a solution for a community or enterprise.
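To make the idea of linked triples concrete, here is a minimal sketch in plain Python. It is an illustration only: the identifiers (`ex:Al6061`, `ex:hasBaseElement`, and so on) are invented for this example and do not come from the presentation or from any published vocabulary. A real deployment would more likely use an RDF library and shared ontologies.

```python
# Each fact is a (subject, predicate, object) triple. All identifiers are
# hypothetical examples, not terms from an actual materials ontology.
TRIPLES = {
    ("ex:Al6061", "rdf:type", "ex:Alloy"),
    ("ex:Al6061", "ex:hasBaseElement", "ex:Aluminum"),
    ("ex:Al6061", "ex:belongsToSystem", "ex:Al-Mg-Si"),
    ("ex:Al-Mg-Si", "ex:hasComponent", "ex:Magnesium"),
    ("ex:Al-Mg-Si", "ex:hasComponent", "ex:Silicon"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    )


def reachable(triples, start):
    """Follow links from `start`: the object of one triple can be the
    subject of another, which is what joins triples into a web of data."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for _, _, o in match(triples, subject=node):
            if o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


components = [o for _, _, o in match(TRIPLES, "ex:Al-Mg-Si", "ex:hasComponent")]
linked = reachable(TRIPLES, "ex:Al6061")
print(components)
print(sorted(linked))
```

Because the object `ex:Al-Mg-Si` of one triple is also the subject of others, a query that starts at the alloy can reach the components of its alloy system without any schema joining the two sets of facts in advance.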


Big Data and Semantic Technologies
Harlan Shober, RJ Lee Group

The presentation started with a description of entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS).

To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing/characterization and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described. It consisted of several elements: contribute, index, search, analyze, and archive. While bearing in mind the needed functionality, a string of universal scientific data management process categories was then presented with the intent of applying the universal description to the materials design and development domain.


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for the Resource Description Framework (RDF) were applied to a notional example with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described".

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: How do you process the quantity and variety of data? How do you store it? How do you find the data you care about? And how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success. Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
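As a rough illustration of the first step in that list, tokenization, the sketch below splits a sentence into tokens with a regular expression. The pattern is an assumption made for this example, not something shown in the talk. It keeps hyphenated terms (such as the name of an alloy system) and decimal numbers together, which hints at why materials text benefits from domain-aware tooling. Later stages such as part-of-speech tagging and parsing would normally rely on a trained NLP library and are not shown.

```python
import re

# Hypothetical pattern: decimal numbers first, then words that may contain
# internal hyphens (e.g. "Al-Mg-Si"), then any single non-space symbol.
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+(?:-[A-Za-z]+)*|\S")


def tokenize(text):
    """Split text into number, word, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("The Al-Mg-Si alloy was aged at 175.5 C for 8 h.")
print(tokens)
```

A naive split on whitespace and punctuation would break "Al-Mg-Si" into three unrelated tokens and "175.5" into two numbers, so even this first stage already involves domain-specific choices.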

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.


The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.


Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were 'rolled up' from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often, fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target...
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security control
- Work with instrument/equipment OEMs to get auto-ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data; standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management: consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose


- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management; difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most-important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure: evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel
- Data associated with publications should be shared along with the publications. Not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)
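Several of the inputs under community priority 1 ask for a shared description of the data being collected (name, type, and units) together with metadata such as authorship and provenance. A minimal sketch of such a descriptor follows. The field names and the JSON layout are assumptions made for illustration. The workshop did not define a format, and an actual data package might instead be an HDF5 file, as one participant input suggests.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Quantity:
    """One measured or computed quantity: name, type, and units."""
    name: str
    dtype: str
    units: str


@dataclass
class DataPackageDescriptor:
    """Hypothetical metadata record accompanying a shared data package."""
    title: str
    authors: list
    provenance: str
    quantities: list = field(default_factory=list)

    def to_json(self):
        # asdict() recursively converts nested dataclasses to plain dicts.
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text):
        raw = json.loads(text)
        raw["quantities"] = [Quantity(**q) for q in raw["quantities"]]
        return cls(**raw)


descriptor = DataPackageDescriptor(
    title="Example tensile data set",
    authors=["A. Researcher"],
    provenance="Placeholder: instrument, procedure, and date would go here",
    quantities=[
        Quantity("yield_strength", "float", "MPa"),
        Quantity("cooling_rate", "float", "K/s"),
    ],
)

# Any tool that can read and write this record could take part in a
# federated setup, regardless of how it stores data internally.
round_tripped = DataPackageDescriptor.from_json(descriptor.to_json())
print(round_tripped == descriptor)
```

A plain-text serialization was chosen here only because it needs no third-party library; the same fields could be stored as attributes inside a binary container.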

Government-Led Priorities

1. Lead data strategy/approach development
- Hold meeting like this one in 6-12 months with purpose of developing roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus? E.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most-important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools that make data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating a standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure


4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10 -- repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST): creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors, not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials-related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches supported are limited to too-fundamental single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or alternatively work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community


Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, Inovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, Questek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl Institute of Standards & Tech
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


Page 24: Materials Genome Initiative: materials data · universallyaccepted best practices. Under the MGI, Federal agencies have funded research and development designedtoexplore, develop,

13

bull Establish well-shy‐pedigreed13 and13 curated13 demonstration13 datasets for non-shy‐proprietary metallic structural materials13 data over multiple length scales The team has13 chosen to work with 6061

aluminum and the13 related Al-shy‐Mg-shy‐Si system

bull Work with NIST and the materials data community to develop materials data schema and

ontologies for the demonstration13 datasets cognizant of broader interests and13 datasets

bull Develop and carry out a series of test problems that represent relevant use cases for the13 repository

bull Make data open to the materials data community for use in data analytics modeling and

educational purposes

bull Actively engage the materials data community and13 widely disseminate the findings from the

project

bull Develop and implement data13 capture13 and curation procedures that can serve13 as models for otherdata repositories

The basic philosophy behind the project is that often standards13 emerge through trial and error and examples are13 needed to spur this process

The project is using the NIST-shy‐KSU DSpace13 repository for data13 files The13 curated structured

database is being developed13 using GRANTA13 MI software and13 is hosted13 o ASM serversThe project ndashlaunched in late 2013 and planned as an 18-shy‐month program13 ndash is being conducted in

phases The13 first phase13 involved developing starter database13 and gaining experience13 in data13 curation The second phase (currently underway) involves building out more comprehensive data13 sets that will be

suitable for carrying out test problems13 related to processing-shy‐structure-shy‐property relationships in13 Al 6061 The third phase will involve making further refinements in response to testing The complete data13 set will be opened up online to the materials community as a whole

Globus13 Scientific13 Data Services13 Current and FutureBen13 Blaiszik1 Kyle Chard1 Jim Pruyne1 Rachana Anathakrishnan1 Steve Tuecke12 Ian Foster12

1University of Chicago Computation Institute 2Argonne National Laboratory MCS Division

During this talk we discussed a variety of currently available Globus services13 Via these Globusservices developers13 and IT administrators13 can offer simple secure and robust file transfer as13 well as13 identity profile and group management to their user communities13 Globus allows end users to create

and13 manage a unique identity that can13 be linked13 to13 external identities for authentication End13 users can13 also transfer files across wide13 area13 networks faster and more13 easily whatever their location

The talk also covered prototype services related to data13 publication metadata tagging and13 data

search Globus13 publishing capabilities13 are delivered through a hosted service Metadata is13 stored in the

cloud while raw published data and a fully-shy‐formed metadata log are stored on campus institutionaland group resources that13 are managed and operated by campus administrators Published datasets are

20

13

organized13 by communities and13 their member collections Globus users can13 create and13 manage theirown13 communities and13 collections through13 the data publication13 service13 with collection level policies regarding user13 access Additionally metadata from these communities are imported into a centralized

index to allow for robust searching functionality

Semantics13 for Deep13 InteroperabilitySamuel Chance iNovex

Semantic technology is based on mature13 web standards and intended to be13 used as global framework for13 machine-shy‐to-shy‐machine communication Interoperability is enabled even when the more

basic elements of the semantic technology stack are used13 in13 conjunction with a controlled vocabularyKnowledge13 representation is enabled as one13 begins to utilize13 the13 ldquohigherrdquo elements of the13 stack

brief history of the semantic web13 was presented13 and13 started13 with13 a depiction13 of the web13 that most people see today documents and web pages presented to the user13 through a web application It13 then depicted a transition to an intermediate state where documents and data can co-shy‐exist on asemantic13 web riding atop the same infrastructure (http) Even in this13 state the data can13 be directly

interpreted by a machine13 The history concluded with a depiction of data being extracted from adocument to13 become part of the web13 of data

The basics of semantic technology were presented and emphasized that the technology has afoundation based13 o formal logic that enables software13 to understand information and reduce13 the13 need

for13 a human to interpret13 the meaning This section of13 the presentation concluded with a chart13 depicting

how ldquotriplesrdquo link to13 each13 other to13 form a web13 of dataThe final section touched upon adoption of13 and things to consider13 when semantic technologies

are13 being considered as solution for community or enterprise

21

13

Big Data13 and Semantic TechnologiesHarlan Shober RJ Lee Group

The presentation started with a description of entities that13 comprise laboratoryinformatics with the central elements being a federated architecture scientific ldquoBig Datardquo management13 and workflow-shy‐driven Laboratory Information Management13 Systems (LIMS)To aid attendee understanding initial context13 was provided by first13 defining for ldquoBig Datardquo and

Semantic Technologies Context13 was further bolstered through a description of a problem that13 included laboratory testingcharacterization and fabrication processes The challenges for atypical laboratory include the use of many systems and the creation of large amounts of data13 that13 grows daily but13 cannot13 be readily accessed

After the context13 was established the needed functionality was described and consistedof several elements contribute index search analyze and archive While bearing in mind theneeded functionality a string of universal scientific data13 management13 process categories werethen presented with the intent13 of applying the universal description to the materials design anddevelopment13 domain

22

13

Though semantic technologies are applicable over the entire lifecycle of the data13 management13 process three categories were specifically identified as primary candidates fortheir application Data13 Acquisition Data13 Manipulation and Storage and Discovery andAdvisory W3C standards for Resource Description Framework (RDF) were applied to a notionalexample with the intent13 of completing a thread that13 gave the attendee a sense of how thistechnology could be applied

The presentation continued with a description of Postulate a specific solutiondeveloped by RJ Lee Group that13 could be used as part13 of a suite of tools to enable the captureof data13 and the near-shy‐frictionless tagging of data13 with additional observations provenanceinformation or links to additional data Using tools like Postulate repositories could be moreeasily shared with others in a federated manner These technologies would enable thedevelopment13 of indices that13 increase the precision of search results and ease analytics since thedata13 would be ldquoself-shy‐describedrdquo

In the remainder of the presentation Postulate functionality was further described andsome of the challenges presented earlier were addressed For example how do you processthe quantity and variety of data how do you store it how do you find the data13 you care aboutand how can the data13 be presented to you in a meaningful way

Natural13 Language Processing (NLP)Tim Hawes Decisive Analytics

The presentation provided the attendees with a description of NLP13 and how it can be13 used It started with examples13 of how it can be used to mine ldquoBig Datardquo to predict credit card fraud or assess13 user purchase histories for product development NLP is also13 being used13 in13 the field13 of biology and13 appears to have13 enjoyed some13 level of success

Since13 the13 biology and materials communities both publish large13 volume13 of documents (abstracts papers presentations etc) many of13 the NLP approaches developed for13 biology may be

applicable13 to materials The next section of the presentation focused on describing various aspects of NLP tokenization

part-shy‐of-shy‐speech tagging disambiguation parsing entity disambiguation and role13 labeling This was helpful in13 removing the mystery surrounding NLP for those13 who hadnrsquot had previous exposure

Typical products from NLP13 would be useful to the materials development community Examples include providing the gist of a documentrsquos content providing answers to natural13 language questions and knowledge13 construction

Specific NLP13 challenges for the13 materials domain were13 identified inline13 use13 of equations prevalence of tables graphs and13 images (more of a computer vision13 problem) and13 numerically driven13 information13 Additionally the ldquogenrerdquo problem was discussed as13 well as13 a way this might be addressed

using raw data (versus learning data) and13 semi-shy‐supervised learning active learning and distant supervision approaches

23

13

The presentation concluded by reiterating how NLP13 can be used and the value of using it describing some its limitations and13 identifying tool development needed13 for the materials domain

24

13

Community13 and13 Government Priorities in13 Addressing13 Materials Data

The following numbered13 priorities for13 both community and government13 actions were lsquorolled uprsquofrom workshop participantsrsquo inputs and were reviewed at13 the workshop The raw inputs provided by

participants are captured13 under each13 numbered13 priority and provide context to13 the priority and13 often13 more specific actions

Community-shy‐Led13 Priorities1 Develop13 and deploy13 standards13 for data and federatedcollaborative environments-shy‐ Promote13 standards for data13 and collaboration environments-shy‐ Designing and deploying suitable software environments that are capable of facilitating e-shy‐

recording and accelerating (fail often-shy‐fail fast)13 of13 the communityrsquos experience (both successes and

failures)13 in exploring integrated workflows needed to realize the vision of ICME and MGI-shy‐ Data13 standards (program managementtechnical datametadata)-shy‐ Community must start developing a common13 data format data package could13 look like an13

HDF5 file and be shared in a variety of systems -shy‐ Begin13 definition13 of standardized13 infrastructure

-shy‐ Produce13 draft guideline for the metadata including information regarding authorship provenance targethellip

-shy‐ Share13 the13 data13 types being collected with the13 community Include13 nametypeunits of data

-shy‐ Community must adopt a federated13 approach13 -shy‐ allow organizations to autonomously create their13 own13 toolsets but be able to13 readwrite to13 the common13 data package mentioned13 in13 point 1

-shy‐ Good documentation on repositories are important-shy‐ Address networking of repositories (lightweight metadata databaseindex) -shy‐ Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy

(SEM)13 and transmission electron microscopy (TEM) -shy‐ Materials process model (thermodynamics) integration with ICMSE (Integrated Computational

Materials Science and Engineering) data flow

-shy‐ Data sharing protocol across users (government13 OEM13 developers) with appropriate securitycontrol

-shy‐ Work with instrumentequipment OEMs to get auto ingest capabilities and open architectures-shy‐ Work together to define common vocabularies for interoperability-shy‐ Standard tests for ICME13 model data13 standards for microstructure13 data

-shy‐ Standards for ICME13 model data13 storage13 and workflows 2 Communicate the value and need13 for digital materials13 data-shy‐ Brand13 management ndash consistent messaging of goals13 and beliefs -shy‐ Share13 success stories -shy‐ Value (ROI13 Return on Investment) o data capture (globally locally)-shy‐ Give incentive ($$$) for case studies to provide evidence that a shared data repository can be

actually useful beyond the archival purpose

25

13

-shy‐ Share13 ROIs for MII (the13 Materials Innovation Infrastructure)-shy‐ ICME projects due to their iterative nature take longer and are more expensive to realize benefits

in industry implementation13 Multi-shy‐year ICME programs are becoming13 a hard sell to management difficult to13 establish ROI and13 realize benefits claimed

3 Engage the community-shy‐ Write grant proposals-shy‐ Collaboration13 (or e-shy‐collaboration) science that will help us13 identify13 the specific13 incentives13 and

impediments for establishing mutually beneficial highly productive sharing of expertise between13 disciplinary experts in13 materials manufacturing and13 data sciences

-shy‐ Continue to13 work o hard13 problemsmdashdevelop13 open13 source codes discover new descriptors data

mining uncertainty analysis-shy‐ Organize as a community (governmentindustryacademia nationalinternational) -shy‐ Follow data13 recommendations in MGI strategic planmdashworkshops to develop roadmap-shy‐ Attend13 more events like the MGI workshop13 mdash gaining13 lessons learned from other organizations is

extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside the materials community
- Semantic web infrastructure – evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel
- Data associated with publications should be shared along with the publications: not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)
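The last item asks that derived values travel together with the source data they came from. One way to picture such a shared record is a small self-describing package that lists each quantity with its name, type, and units, next to pointers to the source files. The sketch below uses only Python's standard library; the class names, field names, file names, and values are hypothetical placeholders chosen for illustration, not a schema proposed at the workshop.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Quantity:
    """One reported quantity: name, type, and units travel with the values."""
    name: str
    dtype: str
    units: str
    values: list


@dataclass
class DataPackage:
    """Hypothetical publication data package (illustrative field names)."""
    authors: list
    provenance: str
    quantities: list = field(default_factory=list)
    source_files: list = field(default_factory=list)

    def to_json(self):
        # asdict() recurses into the nested Quantity dataclasses.
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    def missing_units(self):
        # Names of quantities that were recorded without units.
        return [q.name for q in self.quantities if not q.units]


# Made-up example values, for illustration only.
package = DataPackage(
    authors=["A. Researcher"],
    provenance="hypothetical annealing study, lab notebook entry 12",
    quantities=[
        Quantity("cooling_rate", "float", "K/s", [1.0, 10.0, 100.0]),
        Quantity("average_grain_size", "float", "um", [42.0, 17.5, 6.3]),
    ],
    source_files=["micrograph_001.tif", "thermocouple_log.csv"],
)

print(package.to_json())
```

A plain JSON record is used here only because it needs no extra libraries; the same fields could equally be stored as attributes in an HDF5 file or any other container the community settles on.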

Government-Led Priorities

1. Lead data strategy/approach development
- Hold a meeting like this one in 6-12 months with the purpose of developing a roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus? E.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools that make data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating a standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming a consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure

4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10 - repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST) – creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify a secure data sharing site (like AMRDEC) for government contractors; not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Currently supported HT approaches are limited to overly fundamental single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or, alternatively, work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community

Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, iNovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, Questek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl Institute of Standards & Tech
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


organized by communities and their member collections. Globus users can create and manage their own communities and collections through the data publication service, with collection-level policies regarding user access. Additionally, metadata from these communities are imported into a centralized index to allow for robust searching functionality.

Semantics for Deep Interoperability
Samuel Chance, iNovex

Semantic technology is based on mature web standards and intended to be used as a global framework for machine-to-machine communication. Interoperability is enabled even when the more basic elements of the semantic technology stack are used in conjunction with a controlled vocabulary. Knowledge representation is enabled as one begins to utilize the "higher" elements of the stack.

A brief history of the semantic web was presented and started with a depiction of the web that most people see today: documents and web pages presented to the user through a web application. It then depicted a transition to an intermediate state where documents and data can co-exist on a semantic web riding atop the same infrastructure (HTTP). Even in this state, the data can be directly interpreted by a machine. The history concluded with a depiction of data being extracted from a document to become part of the web of data.

The basics of semantic technology were presented and emphasized that the technology has a foundation based on formal logic that enables software to understand information and reduce the need for a human to interpret the meaning. This section of the presentation concluded with a chart depicting how "triples" link to each other to form a web of data.

The final section touched upon adoption of semantic technologies and things to consider when they are being evaluated as a solution for a community or enterprise.
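The idea of triples linking into a web of data can be illustrated with a minimal sketch. Each triple is a (subject, predicate, object) statement, and one statement's object can be another statement's subject, which is what chains them into a graph. The example below is plain Python rather than an RDF toolkit, and every identifier in it (the `ex:` names, the sample, the micrograph) is invented for illustration; it is not taken from the presentation.

```python
from collections import defaultdict

# Each triple is (subject, predicate, object). All identifiers are made up.
TRIPLES = [
    ("ex:sample42", "ex:madeOf", "ex:Ti-6Al-4V"),
    ("ex:sample42", "ex:measuredBy", "ex:micrograph7"),
    ("ex:micrograph7", "ex:instrument", "ex:SEM"),
    ("ex:Ti-6Al-4V", "ex:alloyClass", "ex:titaniumAlloy"),
]


def build_index(triples):
    """Group (predicate, object) pairs under their subject."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def reachable(index, start):
    """Return every node that can be reached from start by following triples."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for _predicate, obj in index.get(node, []):
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


index = build_index(TRIPLES)
print(sorted(reachable(index, "ex:sample42")))
```

Starting from the sample, the traversal reaches the alloy, its class, the micrograph, and the instrument, even though no single statement mentions all of them. That linking across statements is the property the chart was depicting.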

Big Data and Semantic Technologies
Harlan Shober, RJ Lee Group

The presentation started with a description of entities that comprise laboratory informatics, with the central elements being a federated architecture, scientific "Big Data" management, and workflow-driven Laboratory Information Management Systems (LIMS). To aid attendee understanding, initial context was provided by first defining "Big Data" and Semantic Technologies. Context was further bolstered through a description of a problem that included laboratory testing/characterization and fabrication processes. The challenges for a typical laboratory include the use of many systems and the creation of large amounts of data that grow daily but cannot be readily accessed.

After the context was established, the needed functionality was described and consisted of several elements: contribute, index, search, analyze, and archive. While bearing in mind the needed functionality, a string of universal scientific data management process categories was then presented with the intent of applying the universal description to the materials design and development domain.

Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for Resource Description Framework (RDF) were applied to a notional example with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described."

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success. Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
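Tokenization, the first of these steps, can be sketched in a few lines. The example below is a toy illustration using only Python's standard `re` module, not the speaker's tooling: the token pattern, the sample sentence, and the small unit list are assumptions made here. It shows why materials text needs some care, since an alloy designation such as Ti-6Al-4V should stay one token while numbers and units are split apart.

```python
import re

# Hypothetical token pattern for illustration; real NLP toolkits use richer rules.
TOKEN_RE = re.compile(
    r"""
    \d+(?:\.\d+)?                   # numbers such as 950 or 0.2
  | [A-Za-z]+(?:-\d*[A-Za-z]+)*     # words, including hyphenated names like Ti-6Al-4V
  | [^\sA-Za-z\d]                   # any other single symbol or punctuation mark
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split text into number, word, and symbol tokens."""
    return TOKEN_RE.findall(text)


def tag_quantities(tokens, units=("C", "K", "h", "min", "MPa")):
    """Pair each numeric token with the unit token that directly follows it."""
    pairs = []
    for i, tok in enumerate(tokens[:-1]):
        if tok[0].isdigit() and tokens[i + 1] in units:
            pairs.append((float(tok), tokens[i + 1]))
    return pairs


sentence = "The Ti-6Al-4V sample was annealed at 950 C for 2 h."
tokens = tokenize(sentence)
print(tokens)
print(tag_quantities(tokens))
```

The second function is a crude stand-in for the later tagging stages: it only recognizes a number followed by a unit from a fixed list, whereas part-of-speech tagging and entity disambiguation in practice rely on trained models.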

Typical products from NLP would be useful to the materials development community. Examples include providing the gist of a document's content, providing answers to natural language questions, and knowledge construction.

Specific NLP challenges for the materials domain were identified: inline use of equations; prevalence of tables, graphs, and images (more of a computer vision problem); and numerically driven information. Additionally, the "genre" problem was discussed, as well as a way this might be addressed using raw data (versus learning data) and semi-supervised learning, active learning, and distant supervision approaches.

The presentation concluded by reiterating how NLP can be used and the value of using it, describing some of its limitations, and identifying tool development needed for the materials domain.

Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were 'rolled up' from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often, fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; a data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target…
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security/control
- Work with instrument/equipment OEMs to get auto-ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data; standards for microstructure data

Page 26: Materials Genome Initiative: materials data · universallyaccepted best practices. Under the MGI, Federal agencies have funded research and development designedtoexplore, develop,

13

Big Data13 and Semantic TechnologiesHarlan Shober RJ Lee Group

The presentation started with a description of entities that13 comprise laboratoryinformatics with the central elements being a federated architecture scientific ldquoBig Datardquo management13 and workflow-shy‐driven Laboratory Information Management13 Systems (LIMS)To aid attendee understanding initial context13 was provided by first13 defining for ldquoBig Datardquo and

Semantic Technologies Context13 was further bolstered through a description of a problem that13 included laboratory testingcharacterization and fabrication processes The challenges for atypical laboratory include the use of many systems and the creation of large amounts of data13 that13 grows daily but13 cannot13 be readily accessed

After the context13 was established the needed functionality was described and consistedof several elements contribute index search analyze and archive While bearing in mind theneeded functionality a string of universal scientific data13 management13 process categories werethen presented with the intent13 of applying the universal description to the materials design anddevelopment13 domain

22

13

Though semantic technologies are applicable over the entire lifecycle of the data13 management13 process three categories were specifically identified as primary candidates fortheir application Data13 Acquisition Data13 Manipulation and Storage and Discovery andAdvisory W3C standards for Resource Description Framework (RDF) were applied to a notionalexample with the intent13 of completing a thread that13 gave the attendee a sense of how thistechnology could be applied

The presentation continued with a description of Postulate a specific solutiondeveloped by RJ Lee Group that13 could be used as part13 of a suite of tools to enable the captureof data13 and the near-shy‐frictionless tagging of data13 with additional observations provenanceinformation or links to additional data Using tools like Postulate repositories could be moreeasily shared with others in a federated manner These technologies would enable thedevelopment13 of indices that13 increase the precision of search results and ease analytics since thedata13 would be ldquoself-shy‐describedrdquo

In the remainder of the presentation Postulate functionality was further described andsome of the challenges presented earlier were addressed For example how do you processthe quantity and variety of data how do you store it how do you find the data13 you care aboutand how can the data13 be presented to you in a meaningful way

Natural13 Language Processing (NLP)Tim Hawes Decisive Analytics

The presentation provided the attendees with a description of NLP13 and how it can be13 used It started with examples13 of how it can be used to mine ldquoBig Datardquo to predict credit card fraud or assess13 user purchase histories for product development NLP is also13 being used13 in13 the field13 of biology and13 appears to have13 enjoyed some13 level of success

Since13 the13 biology and materials communities both publish large13 volume13 of documents (abstracts papers presentations etc) many of13 the NLP approaches developed for13 biology may be

applicable13 to materials The next section of the presentation focused on describing various aspects of NLP tokenization

part-shy‐of-shy‐speech tagging disambiguation parsing entity disambiguation and role13 labeling This was helpful in13 removing the mystery surrounding NLP for those13 who hadnrsquot had previous exposure

Typical products from NLP13 would be useful to the materials development community Examples include providing the gist of a documentrsquos content providing answers to natural13 language questions and knowledge13 construction

Specific NLP13 challenges for the13 materials domain were13 identified inline13 use13 of equations prevalence of tables graphs and13 images (more of a computer vision13 problem) and13 numerically driven13 information13 Additionally the ldquogenrerdquo problem was discussed as13 well as13 a way this might be addressed

using raw data (versus learning data) and13 semi-shy‐supervised learning active learning and distant supervision approaches

23

13

The presentation concluded by reiterating how NLP13 can be used and the value of using it describing some its limitations and13 identifying tool development needed13 for the materials domain

24

13

Community13 and13 Government Priorities in13 Addressing13 Materials Data

The following numbered13 priorities for13 both community and government13 actions were lsquorolled uprsquofrom workshop participantsrsquo inputs and were reviewed at13 the workshop The raw inputs provided by

participants are captured13 under each13 numbered13 priority and provide context to13 the priority and13 often13 more specific actions

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often-fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format; data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target...
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security/control
- Work with instrument/equipment OEMs to get auto-ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data, standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management: consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) of data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose
- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management; difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure: evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel
- Data associated with publications should be shared along with the publications. Not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)
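Several inputs under community priority 1 call for a common data package with draft metadata (authorship, provenance) and a declared name/type/units for each data type. The sketch below shows one way such a package might be structured. It is only an illustration: the class names, field names, and sample values are hypothetical and do not come from the workshop, and JSON is used here only to keep the example dependency-free, whereas participants suggested the package could look like an HDF5 file.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DataField:
    """One collected quantity, declared by name, type, and units."""
    name: str
    dtype: str
    units: str


@dataclass
class DataPackage:
    """Hypothetical common data package: metadata plus a list of records."""
    author: str
    provenance: str
    fields: list = field(default_factory=list)
    records: list = field(default_factory=list)

    def add_record(self, **values):
        # Reject records whose keys do not match the declared fields.
        expected = {f.name for f in self.fields}
        if set(values) != expected:
            raise ValueError(
                f"record keys {sorted(values)} != fields {sorted(expected)}"
            )
        self.records.append(values)

    def to_json(self):
        # asdict() converts nested DataField objects to plain dicts.
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text):
        raw = json.loads(text)
        raw["fields"] = [DataField(**f) for f in raw["fields"]]
        return cls(**raw)


# Illustrative values only; not workshop data.
pkg = DataPackage(
    author="example author",
    provenance="example furnace run, illustrative only",
    fields=[
        DataField("cooling_rate", "float", "K/s"),
        DataField("grain_size", "float", "um"),
    ],
)
pkg.add_record(cooling_rate=1.0, grain_size=50.0)
restored = DataPackage.from_json(pkg.to_json())
```

Because the field declarations travel with the records, any organization's toolset could read or write the package without sharing code, which is the point of the federated approach described in the inputs.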

Government-Led Priorities

1. Lead data strategy/approach development
- Hold meeting like this one in 6-12 months with purpose of developing roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus? E.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tool making data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the W3C Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/meta-data
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure

4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10 - repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development, collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST): creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors; not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials-related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches supported are limited to too-fundamental single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or, alternatively, work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community

Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, Inovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept. of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, Questek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl. Institute of Standards & Tech.
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company


Though semantic technologies are applicable over the entire lifecycle of the data management process, three categories were specifically identified as primary candidates for their application: Data Acquisition, Data Manipulation and Storage, and Discovery and Advisory. W3C standards for Resource Description Framework (RDF) were applied to a notional example with the intent of completing a thread that gave the attendee a sense of how this technology could be applied.
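The notional example from the presentation is not reproduced in this summary. As a rough idea of what RDF-style description looks like, the sketch below writes a few statements as subject-predicate-object triples and serializes them in N-Triples syntax. The `http://example.org/materials/` namespace, the property names, and the sample values are all invented for illustration, and the literal escaping is deliberately simplified (backslashes and quotes only).

```python
EX = "http://example.org/materials/"  # hypothetical namespace


def iri(local_name):
    """Wrap a local name in the example namespace as an N-Triples IRI."""
    return f"<{EX}{local_name}>"


def literal(value):
    """Render a plain string literal, escaping backslashes and quotes only."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Each statement is a (subject, predicate, object) triple.
triples = [
    (iri("sample42"), iri("hasAlloy"), iri("Ti-6Al-4V")),
    (iri("sample42"), iri("measuredBy"), iri("SEM")),
    (iri("sample42"), iri("averageGrainSizeMicrons"), literal("12.5")),
]


def to_ntriples(rows):
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


def objects_of(rows, subject, predicate):
    """Return every object stated for a given subject and predicate."""
    return [o for s, p, o in rows if s == subject and p == predicate]


print(to_ntriples(triples))
```

Because every statement names its own subject and property, data tagged this way carries its description with it, which is what makes later indexing and search more precise.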

The presentation continued with a description of Postulate, a specific solution developed by RJ Lee Group that could be used as part of a suite of tools to enable the capture of data and the near-frictionless tagging of data with additional observations, provenance information, or links to additional data. Using tools like Postulate, repositories could be more easily shared with others in a federated manner. These technologies would enable the development of indices that increase the precision of search results and ease analytics, since the data would be "self-described."

In the remainder of the presentation, Postulate functionality was further described and some of the challenges presented earlier were addressed. For example: how do you process the quantity and variety of data, how do you store it, how do you find the data you care about, and how can the data be presented to you in a meaningful way?

Natural Language Processing (NLP)
Tim Hawes, Decisive Analytics

The presentation provided the attendees with a description of NLP and how it can be used. It started with examples of how it can be used to mine "Big Data" to predict credit card fraud or assess user purchase histories for product development. NLP is also being used in the field of biology and appears to have enjoyed some level of success.

Since the biology and materials communities both publish a large volume of documents (abstracts, papers, presentations, etc.), many of the NLP approaches developed for biology may be applicable to materials.

The next section of the presentation focused on describing various aspects of NLP: tokenization, part-of-speech tagging, disambiguation, parsing, entity disambiguation, and role labeling. This was helpful in removing the mystery surrounding NLP for those who hadn't had previous exposure.
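To make the first of these steps concrete, here is a minimal tokenization sketch using only a regular expression. It is a toy under stated assumptions, not the presenter's method: the pattern and the sample sentence are invented for illustration, and real systems treat chemical formulas, units, and inline equations with far more care.

```python
import re

# Alternatives, tried in order: a number with optional decimals; a word that
# may carry hyphen-joined parts (e.g. an alloy name); any single non-space
# symbol such as punctuation.
TOKEN_RE = re.compile(
    r"\d+(?:\.\d+)?|[A-Za-z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*|\S"
)


def tokenize(text):
    """Split text into number, word, and punctuation tokens."""
    return TOKEN_RE.findall(text)


sentence = "Ti-6Al-4V was annealed at 700 C for 2.5 h."
print(tokenize(sentence))
# ['Ti-6Al-4V', 'was', 'annealed', 'at', '700', 'C', 'for', '2.5', 'h', '.']
```

Keeping the alloy designation as one token, rather than splitting at each hyphen, is the kind of domain-specific choice that the later steps (tagging, entity disambiguation) depend on.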



standard approach to creating an enterprise toolset -shy‐ Provide13 motivation for standard data13 collection ensure13 it includes pedigreeprovenancemeta-shy‐data

-shy‐ Facilitator for materials data13 standards (good and trustworthy data13 vs bad data) -shy‐ Facilitator for forming consortium for data13 mining data13 warehousing and sharing

-shy‐ Facilitate13 industry standards for ICME13 model validation and maturity levels-shy‐ Facilitate13 standards for model-shy‐based13 process and13 supplier qualification

3 Incentivize data deposition-shy‐ Provide13 motivation for13 standard data collection ensure it13 includes13 pedigreeprovenancemeta-shy‐data-shy‐ Push ldquodata13 management planrdquo requirements to ldquodata13 depositrdquo requirements -shy‐ Make data sharing a requirement -shy‐ Contractual requirements that preserve data for future use

-shy‐ Demand data submission as a deliverable that will be in the contract before paying the funds -shy‐ Require that any data generated13 by federal grants is provided13 to13 the government (or a designated13

repository)13 within 3 years after completionmdashstart with basic13 research

-shy‐ Demonstrate the return on investments in the materials data infrastructure

27

13

4 Support13 data sharing13 and deposition13 (provide resources)-shy‐ Increase of MGI13 funds dedicated to MII13 development to 10-shy‐-shy‐repositories amp collab platforms -shy‐ Commit to13 long-shy‐term support for data repositories amp collaborative platforms-shy‐ Funding for development collection of fundamental genomic data

-shy‐ Support thermodynamicskinetics database development for light-shy‐weight alloy systems-shy‐ Funding for universal data13 repository (eg at NIST) ndash creation and maintenance

-shy‐ Fund or establish data13 repositories with commonuniform data13 collection format -shy‐ Createidentify secure data sharing site (like AMRDEC)13 for13 government13 contractorsmdashnot a

repository

-shy‐ AFRL take lead13 o making a database that can13 be13 shared

-shy‐ Infrastructure for materials related semantic web applications5 Support13 data creation-shy‐ Support experimental measurement effort to provide13 quality data for13 database development -shy‐ Support more13 high-shy‐throughput13 (HT)13 studies13 for applied programs13 to populate data Current HT

approaches support are13 limited to too fundamental single-shy‐phase DFT calculations 6 Enhance data management13 competency-shy‐ Government needs to increase in-shy‐house expertise in13 these areas or alternatively work closer with13

groups like NIST We cannot allow other organizations to define our needs for us7 Provide quality datasets13 for model13 developmentvalidation-shy‐ Need for open model benchmark cases with data to be available to the community

28

13

Attendees

John Allison University of MichiganJohn Beatty Army Research Laboratory

Ben13 Blaiszik University of ChicagoANL Carrie Campbell NIST

Sam Chance13 InovexJennifer13 Fielding Air Force Research LabMarco Fornari Central Michigan

Tim Hawes Decisive Analytics Corp

Lou Hector General MotorsAndrea Helbach Air Force Research LabKris Hill ITI-shy‐GlobalMike Glavicic Rolls Royce

Ro Gorham NCDMM

Dennis Griffin NASA

Mike Groeber Air Force Research LabScott Henry ASM InternationalMatt Jacobsen Air Force Research LabWill Joost Dept of Energy-shy‐EERE

Surya13 Kalidindi Georgia Tech

Kevin Kendig Air Force Research LabBen13 Leever Air Force Research LabLara Liou GE Aviation

Kenny Lipkowitz Office of Naval Research

Bill Mullins Office of Naval Research

Benji Maruyama Air Force Research LabRajiv Naik Pratt amp WhitneyJim Neumann Honeywell Ruth13 Pachter Air Force Research LabClare Paul Air Force Research LabAmra Peles United13 Technologies Research13 CenterAndy Powell GE Aviation

Brian13 Puchala University MichiganAjit Roy Air Force Research LabAyman13 Salem Materials ResourcesTom Searles Materials Data Management Jason Sebastian QuestekRichard13 Serwecki NASA

Dongwon Shin Oak Ridge13 National Labo

29

13

Harlan Shober RJ LeeJeff13 Sheehy NASA

Jeff13 Simmons Air Force Research LabVeera Sundararaghavan University of MichiganSesh Tamirisakandala RTI Glenn Tarcea University of MichiganTJ Turner Air Force Research LabVasisht Venkatesh Pratt amp WhitneyChuck Ward Air Force Research LabJim Warren Natl Institute of Standards amp TechJeff13 Williams GE Aviation

Terry Wong Rockedyne

Wayne Ziegler Army Research13 Laboratory

Jake Zindel Ford Motor Company

30


Community and Government Priorities in Addressing Materials Data

The following numbered priorities for both community and government actions were 'rolled up' from workshop participants' inputs and were reviewed at the workshop. The raw inputs provided by participants are captured under each numbered priority and provide context to the priority and often more specific actions.

Community-Led Priorities

1. Develop and deploy standards for data and federated/collaborative environments
- Promote standards for data and collaboration environments
- Designing and deploying suitable software environments that are capable of facilitating e-recording and accelerating (fail often-fail fast) of the community's experience (both successes and failures) in exploring integrated workflows needed to realize the vision of ICME and MGI
- Data standards (program management/technical data/metadata)
- Community must start developing a common data format. A data package could look like an HDF5 file and be shared in a variety of systems
- Begin definition of standardized infrastructure
- Produce draft guideline for the metadata, including information regarding authorship, provenance, target...
- Share the data types being collected with the community. Include name/type/units of data
- Community must adopt a federated approach: allow organizations to autonomously create their own toolsets but be able to read/write to the common data package mentioned in point 1
- Good documentation on repositories is important
- Address networking of repositories (lightweight metadata database/index)
- Digital heterostructure microstructure mapping algorithms for both scanning electron microscopy (SEM) and transmission electron microscopy (TEM)
- Materials process model (thermodynamics) integration with ICMSE (Integrated Computational Materials Science and Engineering) data flow
- Data sharing protocol across users (government, OEM, developers) with appropriate security/control
- Work with instrument/equipment OEMs to get auto ingest capabilities and open architectures
- Work together to define common vocabularies for interoperability
- Standard tests for ICME model data, standards for microstructure data
- Standards for ICME model data storage and workflows

2. Communicate the value and need for digital materials data
- Brand management: consistent messaging of goals and beliefs
- Share success stories
- Value (ROI, Return on Investment) on data capture (globally, locally)
- Give incentive ($$$) for case studies to provide evidence that a shared data repository can be actually useful beyond the archival purpose
- Share ROIs for MII (the Materials Innovation Infrastructure)
- ICME projects, due to their iterative nature, take longer and are more expensive to realize benefits in industry implementation. Multi-year ICME programs are becoming a hard sell to management; it is difficult to establish ROI and realize benefits claimed

3. Engage the community
- Write grant proposals
- Collaboration (or e-collaboration) science that will help us identify the specific incentives and impediments for establishing mutually beneficial, highly productive sharing of expertise between disciplinary experts in materials, manufacturing, and data sciences
- Continue to work on hard problems: develop open source codes, discover new descriptors, data mining, uncertainty analysis
- Organize as a community (government/industry/academia, national/international)
- Follow data recommendations in MGI strategic plan: workshops to develop roadmap
- Attend more events like the MGI workshop; gaining lessons learned from other organizations is extremely valuable

4. "Define critical data" to be compiled
- Consensus on what fundamental genomic data is most important to compile (thermodynamics, kinetics, properties [strength], high-throughput DFT)
- Agree to the concept of a "universal repository" for all genomic data that everyone will contribute to and use (cf. everyone maintaining and hosting their own databases); this includes issues of data security and sharing, etc.
- List of most-important ICME/MGI materials development activities from an industry/commercial perspective (alloys, composites, piezoelectrics, etc.)
- Some ongoing repository efforts seem redundant. We should avoid unnecessary competition and try not to step on other people's toes

5. Explore and leverage data solutions developed outside materials community
- Semantic web infrastructure: evaluate feasibility of "unstructured materials data" management (technical path/cost/timeframe)
- Keep communicating outside the community about solutions
- Focus on available tools, open ones when possible

6. Train students
- Sponsor students who can think in terms of IT and MSE
- Training workshops (through professional societies) for training the new generation of materials students and researchers in the use of readily (already) accessible data science tools and cloud services to make the best use of this emerging infrastructure in enhancing their own individual day-to-day productivity

7. Establish community norms for publishing materials data
- Involve publishers by requiring data when publishing. Begin by making just the data available first and then work on standardization panel
- Data associated with publications should be shared along with the publications. Not only the metadata (e.g., average grain size vs. cooling rate) but also the source of the data (e.g., micrographs used to generate grain size, temperature/time charts used to calculate cooling rates, etc.)
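Several of the community inputs above point toward a shared data package whose metadata covers authorship, provenance, and the name/type/units of each data item. The following is a minimal sketch of what such a record might look like, written in Python with only the standard library. The class names, field names, example values, and the choice of JSON are all illustrative assumptions; the workshop did not propose this schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DataItem:
    """One measured or computed quantity: name, type, and units (hypothetical schema)."""
    name: str
    dtype: str
    units: str


@dataclass
class DataPackage:
    """Hypothetical metadata header for a shared materials data package."""
    authors: list
    provenance: str  # free-text note on how and where the data were produced
    items: list = field(default_factory=list)

    def missing_fields(self):
        """Return the names of required metadata entries that are empty."""
        missing = []
        if not self.authors:
            missing.append("authors")
        if not self.provenance:
            missing.append("provenance")
        for i, item in enumerate(self.items):
            for key in ("name", "dtype", "units"):
                if not getattr(item, key):
                    missing.append(f"items[{i}].{key}")
        return missing

    def to_json(self):
        """Serialize to JSON so independently built toolsets can read the header."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def from_json(text):
    """Rebuild a DataPackage from its JSON form."""
    raw = json.loads(text)
    items = [DataItem(**entry) for entry in raw["items"]]
    return DataPackage(authors=raw["authors"], provenance=raw["provenance"], items=items)


# Example values are made up for illustration only.
package = DataPackage(
    authors=["A. Researcher"],
    provenance="optical micrographs from a hypothetical heat-treatment study",
    items=[
        DataItem(name="average_grain_size", dtype="float", units="micrometer"),
        DataItem(name="cooling_rate", dtype="float", units="K/s"),
    ],
)
round_trip = from_json(package.to_json())
```

In a federated arrangement of the kind the participants describe, each organization could keep its own tools and agree only on a header like this one. The same fields could equally be stored as attributes on an HDF5 file, which is the container one input suggests; JSON is used here only to keep the sketch dependency-free.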

Government-Led Priorities

1. Lead data strategy/approach development
- Hold meeting like this one in 6-12 months with purpose of developing roadmap for MII
- Define a technical roadmap for the materials data infrastructure
- Government's got to decide on a strategy
- What's the critical nucleus? E.g., getting CALPHAD off the ground
- Government must establish an official stance on security and infrastructure
- List of most-important ICME/MGI materials development activities from a government perspective (alloys, composites, piezoelectrics, etc.)
- Needs assessment for professional workforce development
- Work with industry to make sure the research-to-engineering gap is filled. What is the practical use of the projects?
- Provide the materials community organizing function
- Business case for productivity improvement with data readily available
- Facilitator as honest broker of capability awareness and data sharing
- Interdisciplinary coordinated effort to avoid/minimize duplication and redundancy

2. Facilitate data deposition (policy & technology)
- Setting expectations and guidelines for best practices and standards where relevant
- Invest in tools that make data management as easy as possible
- Promote standards development/common data structures/share knowledge representations
- Learn about the WBC Semantic Web capabilities and limitations
- Government should foster community development for both data package refinement and creating standard approach to creating an enterprise toolset
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Facilitator for materials data standards (good and trustworthy data vs. bad data)
- Facilitator for forming consortium for data mining, data warehousing, and sharing
- Facilitate industry standards for ICME model validation and maturity levels
- Facilitate standards for model-based process and supplier qualification

3. Incentivize data deposition
- Provide motivation for standard data collection; ensure it includes pedigree/provenance/metadata
- Push "data management plan" requirements to "data deposit" requirements
- Make data sharing a requirement
- Contractual requirements that preserve data for future use
- Demand data submission as a deliverable that will be in the contract before paying the funds
- Require that any data generated by federal grants is provided to the government (or a designated repository) within 3 years after completion; start with basic research
- Demonstrate the return on investments in the materials data infrastructure

4. Support data sharing and deposition (provide resources)
- Increase of MGI funds dedicated to MII development to 10 - repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development/collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for universal data repository (e.g., at NIST): creation and maintenance
- Fund or establish data repositories with common/uniform data collection format
- Create/identify secure data sharing site (like AMRDEC) for government contractors, not a repository
- AFRL take lead on making a database that can be shared
- Infrastructure for materials related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. Current HT approaches/support are limited to too fundamental single-phase DFT calculations

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or alternatively work closer with groups like NIST. We cannot allow other organizations to define our needs for us

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community

Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, Inovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, Questek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Lab
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl Institute of Standards & Tech
Jeff Williams, GE Aviation
Terry Wong, Rockedyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company

Page 30: Materials Genome Initiative: materials data · universallyaccepted best practices. Under the MGI, Federal agencies have funded research and development designedtoexplore, develop,

13

-shy‐ Share13 ROIs for MII (the13 Materials Innovation Infrastructure)-shy‐ ICME projects due to their iterative nature take longer and are more expensive to realize benefits

in industry implementation13 Multi-shy‐year ICME programs are becoming13 a hard sell to management difficult to13 establish ROI and13 realize benefits claimed

3 Engage the community-shy‐ Write grant proposals-shy‐ Collaboration13 (or e-shy‐collaboration) science that will help us13 identify13 the specific13 incentives13 and

impediments for establishing mutually beneficial highly productive sharing of expertise between13 disciplinary experts in13 materials manufacturing and13 data sciences

-shy‐ Continue to13 work o hard13 problemsmdashdevelop13 open13 source codes discover new descriptors data

mining uncertainty analysis-shy‐ Organize as a community (governmentindustryacademia nationalinternational) -shy‐ Follow data13 recommendations in MGI strategic planmdashworkshops to develop roadmap-shy‐ Attend13 more events like the MGI workshop13 mdash gaining13 lessons learned from other organizations is

extremely valuable

4 ldquoDefine critical datardquo13 to be13 compiled-shy‐ Consensus o what fundamental genomic data is most-shy‐important to compile (thermodynamics13

kinetics13 properties [strength]13 high-shy‐throughput13 DFT) -shy‐ Agree to13 the concept of a ldquouniversal repositoryrdquo for all genomic data that everyone will contribute

to and use (cf everyone maintaining and hosting their13 own databases) this includes issues of13 data

security and sharing etc -shy‐ List of most-shy‐important ICMEMGI13 materials development activities from an industrycommercial13

perspective (alloys composites piezoelectrics etc)-shy‐ Some13 ongoing repository efforts seem redundant We13 should avoid unnecessary competition and

try not13 to step o other peoples toes5 Explore and leverage data solutions developed13 outside materials13 community-shy‐ Semantic web infrastructure13 ndash evaluate13 feasibility of lsquounstructured materials datardquo management

(Technical pathcosttimeframe) -shy‐ Keep communicating outside13 the13 community about solutions -shy‐ Focus on available13 tools open ones when possible

6 Train13 students-shy‐ Sponsor students13 who can think in terms13 of IT and MSE

-shy‐ Training workshops (through professional societies) for training the new generation of materials students13 and researchers13 in the use of readily (already) accessible data science tools13 and cloud

services13 to make13 the13 best use13 of this emerging13 infrastructure13 in enhancing13 their own individual day-shy‐to-shy‐day productivity

7 Establish community13 norms13 for publishing materials13 data-shy‐ Involve publishers by requiring data when publishing Begin by making just the data available first

and then work on standardization panel

26

13

-shy‐ Data associated with publications should be shared along with the publications Not only the

metadata (eg average grain size vs cooling rate) but also the source of the data (eg micrographs used13 to13 generate13 grain size temperaturetime charts13 used to calculate cooling rates etc)

Government-shy‐Led13 Priorities1 Lead13 data strategyapproach13 development-shy‐ Hold meeting like this one in 6-shy‐1213 months with purpose13 of developing roadmap for MII-shy‐ Define a technical roadmap for13 the materials data infrastructure

-shy‐ Governmentrsquos got to decide on a strategy-shy‐ Whatrsquos the critical nucleus eg getting13 CALPHAD off the13 ground

-shy‐ Government must establish an official stance on security and infrastructure-shy‐ List of most-shy‐important ICMEMGI materials development activities from a government perspective

(alloys composites piezoelectrics etc)13 -shy‐ Needs assessment for professional workforce development-shy‐ Work with industry to make sure the research-shy‐to-shy‐engineering13 gap is filled What is the practical13 use

of the projects -shy‐ Provide13 the13 materials community organizing function

-shy‐ Business case for productivity improvement with13 data readily available

-shy‐ Facilitator as honest broker of capability awareness and data13 sharing

-shy‐ Interdisciplinary coordinated effort13 to avoidminimize duplication and redundancy

2 Facilitate data deposition (policy amp technology)-shy‐ Setting expectations and guidelines for best practices and standards where13 relevant -shy‐ Invest in tool13 making data management as easy as possible-shy‐ Promote standards developmentcommon data structuresshare knowledge representations -shy‐ Learn about the WBC Semantic Web capabilities and limitations -shy‐ Government should foster community development for both data package refinement and creating

standard approach to creating an enterprise toolset -shy‐ Provide13 motivation for standard data13 collection ensure13 it includes pedigreeprovenancemeta-shy‐data

-shy‐ Facilitator for materials data13 standards (good and trustworthy data13 vs bad data) -shy‐ Facilitator for forming consortium for data13 mining data13 warehousing and sharing

-shy‐ Facilitate13 industry standards for ICME13 model validation and maturity levels-shy‐ Facilitate13 standards for model-shy‐based13 process and13 supplier qualification

3 Incentivize data deposition-shy‐ Provide13 motivation for13 standard data collection ensure it13 includes13 pedigreeprovenancemeta-shy‐data-shy‐ Push ldquodata13 management planrdquo requirements to ldquodata13 depositrdquo requirements -shy‐ Make data sharing a requirement -shy‐ Contractual requirements that preserve data for future use

-shy‐ Demand data submission as a deliverable that will be in the contract before paying the funds -shy‐ Require that any data generated13 by federal grants is provided13 to13 the government (or a designated13

repository)13 within 3 years after completionmdashstart with basic13 research

-shy‐ Demonstrate the return on investments in the materials data infrastructure

27

13

4 Support13 data sharing13 and deposition13 (provide resources)-shy‐ Increase of MGI13 funds dedicated to MII13 development to 10-shy‐-shy‐repositories amp collab platforms -shy‐ Commit to13 long-shy‐term support for data repositories amp collaborative platforms-shy‐ Funding for development collection of fundamental genomic data

-shy‐ Support thermodynamicskinetics database development for light-shy‐weight alloy systems-shy‐ Funding for universal data13 repository (eg at NIST) ndash creation and maintenance

-shy‐ Fund or establish data13 repositories with commonuniform data13 collection format -shy‐ Createidentify secure data sharing site (like AMRDEC)13 for13 government13 contractorsmdashnot a

repository

-shy‐ AFRL take lead13 o making a database that can13 be13 shared

-shy‐ Infrastructure for materials related semantic web applications5 Support13 data creation-shy‐ Support experimental measurement effort to provide13 quality data for13 database development -shy‐ Support more13 high-shy‐throughput13 (HT)13 studies13 for applied programs13 to populate data Current HT

approaches support are13 limited to too fundamental single-shy‐phase DFT calculations 6 Enhance data management13 competency-shy‐ Government needs to increase in-shy‐house expertise in13 these areas or alternatively work closer with13

groups like NIST We cannot allow other organizations to define our needs for us7 Provide quality datasets13 for model13 developmentvalidation-shy‐ Need for open model benchmark cases with data to be available to the community

28

13

Attendees

John Allison University of MichiganJohn Beatty Army Research Laboratory

Ben13 Blaiszik University of ChicagoANL Carrie Campbell NIST

Sam Chance13 InovexJennifer13 Fielding Air Force Research LabMarco Fornari Central Michigan

Tim Hawes Decisive Analytics Corp

Lou Hector General MotorsAndrea Helbach Air Force Research LabKris Hill ITI-shy‐GlobalMike Glavicic Rolls Royce

Ro Gorham NCDMM

Dennis Griffin NASA

Mike Groeber Air Force Research LabScott Henry ASM InternationalMatt Jacobsen Air Force Research LabWill Joost Dept of Energy-shy‐EERE

Surya13 Kalidindi Georgia Tech

Kevin Kendig Air Force Research LabBen13 Leever Air Force Research LabLara Liou GE Aviation

Kenny Lipkowitz Office of Naval Research

Bill Mullins Office of Naval Research

Benji Maruyama Air Force Research LabRajiv Naik Pratt amp WhitneyJim Neumann Honeywell Ruth13 Pachter Air Force Research LabClare Paul Air Force Research LabAmra Peles United13 Technologies Research13 CenterAndy Powell GE Aviation

Brian13 Puchala University MichiganAjit Roy Air Force Research LabAyman13 Salem Materials ResourcesTom Searles Materials Data Management Jason Sebastian QuestekRichard13 Serwecki NASA

Dongwon Shin Oak Ridge13 National Labo

29

13

Harlan Shober RJ LeeJeff13 Sheehy NASA

Jeff13 Simmons Air Force Research LabVeera Sundararaghavan University of MichiganSesh Tamirisakandala RTI Glenn Tarcea University of MichiganTJ Turner Air Force Research LabVasisht Venkatesh Pratt amp WhitneyChuck Ward Air Force Research LabJim Warren Natl Institute of Standards amp TechJeff13 Williams GE Aviation

Terry Wong Rockedyne

Wayne Ziegler Army Research13 Laboratory

Jake Zindel Ford Motor Company

30

Page 31: Materials Genome Initiative: materials data · universallyaccepted best practices. Under the MGI, Federal agencies have funded research and development designedtoexplore, develop,

13

-shy‐ Data associated with publications should be shared along with the publications Not only the

metadata (eg average grain size vs cooling rate) but also the source of the data (eg micrographs used13 to13 generate13 grain size temperaturetime charts13 used to calculate cooling rates etc)

Government-shy‐Led13 Priorities1 Lead13 data strategyapproach13 development-shy‐ Hold meeting like this one in 6-shy‐1213 months with purpose13 of developing roadmap for MII-shy‐ Define a technical roadmap for13 the materials data infrastructure

-shy‐ Governmentrsquos got to decide on a strategy-shy‐ Whatrsquos the critical nucleus eg getting13 CALPHAD off the13 ground

-shy‐ Government must establish an official stance on security and infrastructure-shy‐ List of most-shy‐important ICMEMGI materials development activities from a government perspective

(alloys composites piezoelectrics etc)13 -shy‐ Needs assessment for professional workforce development-shy‐ Work with industry to make sure the research-shy‐to-shy‐engineering13 gap is filled What is the practical13 use

of the projects -shy‐ Provide13 the13 materials community organizing function

-shy‐ Business case for productivity improvement with13 data readily available

-shy‐ Facilitator as honest broker of capability awareness and data13 sharing

-shy‐ Interdisciplinary coordinated effort13 to avoidminimize duplication and redundancy

2 Facilitate data deposition (policy amp technology)-shy‐ Setting expectations and guidelines for best practices and standards where13 relevant -shy‐ Invest in tool13 making data management as easy as possible-shy‐ Promote standards developmentcommon data structuresshare knowledge representations -shy‐ Learn about the WBC Semantic Web capabilities and limitations -shy‐ Government should foster community development for both data package refinement and creating

standard approach to creating an enterprise toolset -shy‐ Provide13 motivation for standard data13 collection ensure13 it includes pedigreeprovenancemeta-shy‐data

-shy‐ Facilitator for materials data13 standards (good and trustworthy data13 vs bad data) -shy‐ Facilitator for forming consortium for data13 mining data13 warehousing and sharing

-shy‐ Facilitate13 industry standards for ICME13 model validation and maturity levels-shy‐ Facilitate13 standards for model-shy‐based13 process and13 supplier qualification

3 Incentivize data deposition-shy‐ Provide13 motivation for13 standard data collection ensure it13 includes13 pedigreeprovenancemeta-shy‐data-shy‐ Push ldquodata13 management planrdquo requirements to ldquodata13 depositrdquo requirements -shy‐ Make data sharing a requirement -shy‐ Contractual requirements that preserve data for future use

-shy‐ Demand data submission as a deliverable that will be in the contract before paying the funds -shy‐ Require that any data generated13 by federal grants is provided13 to13 the government (or a designated13

repository)13 within 3 years after completionmdashstart with basic13 research

-shy‐ Demonstrate the return on investments in the materials data infrastructure

27

13

4. Support data sharing and deposition (provide resources)
- Increase the share of MGI funds dedicated to MII development to 10%: repositories & collab platforms
- Commit to long-term support for data repositories & collaborative platforms
- Funding for development and collection of fundamental genomic data
- Support thermodynamics/kinetics database development for light-weight alloy systems
- Funding for a universal data repository (e.g., at NIST): creation and maintenance
- Fund or establish data repositories with a common/uniform data collection format
- Create/identify a secure data sharing site (like AMRDEC) for government contractors, not a repository
- AFRL take the lead on making a database that can be shared
- Infrastructure for materials-related semantic web applications

5. Support data creation
- Support experimental measurement effort to provide quality data for database development
- Support more high-throughput (HT) studies for applied programs to populate data. The HT approaches currently supported are limited to single-phase DFT calculations, which are too fundamental.

6. Enhance data management competency
- Government needs to increase in-house expertise in these areas or, alternatively, work more closely with groups like NIST. We cannot allow other organizations to define our needs for us.

7. Provide quality datasets for model development/validation
- Need for open model benchmark cases with data to be available to the community


Attendees

John Allison, University of Michigan
John Beatty, Army Research Laboratory
Ben Blaiszik, University of Chicago/ANL
Carrie Campbell, NIST
Sam Chance, Inovex
Jennifer Fielding, Air Force Research Lab
Marco Fornari, Central Michigan
Tim Hawes, Decisive Analytics Corp.
Lou Hector, General Motors
Andrea Helbach, Air Force Research Lab
Kris Hill, ITI-Global
Mike Glavicic, Rolls Royce
Ro Gorham, NCDMM
Dennis Griffin, NASA
Mike Groeber, Air Force Research Lab
Scott Henry, ASM International
Matt Jacobsen, Air Force Research Lab
Will Joost, Dept. of Energy-EERE
Surya Kalidindi, Georgia Tech
Kevin Kendig, Air Force Research Lab
Ben Leever, Air Force Research Lab
Lara Liou, GE Aviation
Kenny Lipkowitz, Office of Naval Research
Bill Mullins, Office of Naval Research
Benji Maruyama, Air Force Research Lab
Rajiv Naik, Pratt & Whitney
Jim Neumann, Honeywell
Ruth Pachter, Air Force Research Lab
Clare Paul, Air Force Research Lab
Amra Peles, United Technologies Research Center
Andy Powell, GE Aviation
Brian Puchala, University of Michigan
Ajit Roy, Air Force Research Lab
Ayman Salem, Materials Resources
Tom Searles, Materials Data Management
Jason Sebastian, Questek
Richard Serwecki, NASA
Dongwon Shin, Oak Ridge National Laboratory
Harlan Shober, RJ Lee
Jeff Sheehy, NASA
Jeff Simmons, Air Force Research Lab
Veera Sundararaghavan, University of Michigan
Sesh Tamirisakandala, RTI
Glenn Tarcea, University of Michigan
TJ Turner, Air Force Research Lab
Vasisht Venkatesh, Pratt & Whitney
Chuck Ward, Air Force Research Lab
Jim Warren, Natl. Institute of Standards & Tech.
Jeff Williams, GE Aviation
Terry Wong, Rocketdyne
Wayne Ziegler, Army Research Laboratory
Jake Zindel, Ford Motor Company
