
Page 1: NIEM as Big Data in a Network with Data Science Dr. Brand Niemann Director and Senior Enterprise Architect – Data Scientist Semantic Community


NIEM as Big Data in a Network with Data Science

Dr. Brand Niemann
Director and Senior Enterprise Architect – Data Scientist

Semantic Community
http://semanticommunity.info/

AOL Government Blogger
http://gov.aol.com/bloggers/brand-niemann/

February 25, 2013
http://semanticommunity.info/NIEM

Page 2

Preface

• Donna Roy, executive director of the Information Sharing Environment office that handles NIEM within the Department of Homeland Security, recently made this astute observation:
– "The biggest gap at the federal level is in the recruiting and in the business case around staffing up the human support cadre and that DHS lacks data scientists at the headquarters level."

• This data scientist and Semantic Web expert would tell her that NIEM is not a network data model, while the Semantic Web is!

Page 3

Definitions

• First, some definitions will help:
– NIEM (noun); pronounced "neam," similar to "team"; National Information Exchange Model; a community-driven, government-wide, standards-based approach to exchanging information.

– NIEM-ified (adjective); indicating an entity such as a project or organization that has implemented a NIEM-based exchange to advance its mission.

– Make it MyNIEM: Ever wish you could have all the NIEM information that's important to you, right at your fingertips? Coming Soon: MyNIEM will give you the ability to customize NIEM.gov to meet your needs.

Page 4

Audit of NIEM Web Site

• I am doing an audit of the NIEM Web Site content as part of my work for the Japanese government.

• I have noticed the following so far:
– Some web pages do not have a scroll bar
– The My NIEM page returns an error message (this has been fixed)
– Some links do not deliver as advertised, e.g., https://www.niem.gov/spotlight/Pages/niem-adoption-is-growing-states-new-report.aspx
– The Site Map appears incomplete
– I am wondering what "NIEMified Web site" means
– Why not make it Digital Government Strategy compliant?

Page 5

NIEM Web Site

https://www.niem.gov/Pages/default.aspx

Page 6

Dialogue with Donna Roy

• I asked Donna Roy, "Any light you can shed on these findings would be appreciated," and she replied:
– "Thanks for the audit. We will fix the broken links and attempt to clarify content. I find it hard to believe a website audit is a meaningful part of a substantial report on digital govt strategy for Japan. If you need more info, perhaps sitting with a practitioner or two might get you better case study info. Appreciate the independent assessment though."

• And I replied:
– "Thank you, I just noticed MY NIEM is working now. I had Cory Casanave and David Webber, both excellent NIEM practitioners, present and demo to the Japan Government people last week. Perhaps we should meet to discuss my work so you would have a better understanding of data science and what a data scientist does."

Page 7

My Process

• So here is what I did:
– Started with the NIEM Site Map
– Copied it to MindTouch to make it Digital Government Strategy compliant
– Made it Big Data and Machine Readable
– Made it NIEMified and MyNIEM

Page 8

NIEM Web Site Sitemap

https://www.niem.gov/Pages/sitemap.aspx

Page 9

More Definitions

• Again, some definitions will help:

– MindTouch: A leading open-source social knowledge base platform that supports HTML 5, well-defined URLs, and APIs (Application Programming Interfaces, i.e., Web services).

– Digital Government Strategy: "Everything should be an API": A system of machine-to-machine interaction over a network. My Comment: Web APIs involve the transfer of data, but not a user interface.

– Big Data: The ability to find patterns, correlations and insights across multistructured data will become a mainstream requirement as companies try to better innovate and find operational efficiencies across business processes that leverage data. These include capabilities that enable the collection, storage, management, correlation, organization, exploration and analysis of multistructured data.

– Machine-Readable Format: Refers to information or data that is in a format that can be easily processed by a computer without human intervention while ensuring no semantic meaning is lost. My note: It also needs to retain context, which is done here.

– NIEMified and MyNIEM: Give every NIEM Web Sitemap Property (771) a well-defined Namespace (URI-URL), and customize my uses of NIEM Web Site content in MindTouch, Excel, and Spotfire to create a NIEM Data Network.
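As a rough illustration of NIEMification, each sitemap entry can be given its own URL under the knowledge-base address used in this deck. The Python sketch below rests on stated assumptions: the section/title pairs are hypothetical, and the underscore-for-spaces path scheme only mimics the MindTouch page URLs cited in these slides; it is not necessarily the scheme used in the audit.

```python
import re
from urllib.parse import quote

# Base URL taken from this deck; the path/fragment scheme below is an assumption.
BASE = "http://semanticommunity.info/NIEM"


def slug(text: str) -> str:
    """Replace runs of whitespace with underscores, then percent-encode the rest."""
    return quote(re.sub(r"\s+", "_", text.strip()), safe="_-")


def mint_uri(section: str, title: str) -> str:
    """Give one sitemap entry a well-defined URL: BASE/Section#Title (hypothetical scheme)."""
    return f"{BASE}/{slug(section)}#{slug(title)}"


# Hypothetical sitemap entries, for illustration only.
sitemap = [
    ("About NIEM", "NIEM Adoption"),
    ("Technical Resources", "Q & A"),
]

for section, title in sitemap:
    print(mint_uri(section, title))
```

Percent-encoding keeps characters such as "&" from breaking the URL; a real implementation would also need to guarantee that all 771 minted URLs are unique.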

Page 10

NIEM Artifact Definitions

• Some NIEM artifact definitions will help:
– Property: A set of related data values and their definitions (Version 2.1 has 5,984)
– Type: A description of a class of objects that share the same operations, abstract attributes and relationships, and semantics (Version 2.1 has 2,846)

– TypeContainsProperty: A Type contains a Property (Version 2.1 has 5,361)

– Facet: Lookup data defined in XML Schemas (XSD) (Version 2.1 has 46,742)

– Namespace (URI-URL): Associates XML element and attribute names with unique URI references (Version 2.1 has only 64)
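The five artifact types above can be pictured as a small data model in which TypeContainsProperty is simply an edge from a Type to a Property; that edge list is what makes a network view possible. The Python sketch below is illustrative only: the class names mirror the slide's vocabulary, while the prefix, URI, element names, and definitions are hypothetical rather than taken from the NIEM 2.1 release.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Namespace:
    prefix: str
    uri: str  # the URI that element and attribute names are associated with


@dataclass(frozen=True)
class Property:
    name: str
    namespace: Namespace
    definition: str


@dataclass(frozen=True)
class Facet:
    type_name: str  # the code-list Type this lookup value belongs to
    value: str      # one allowed value, e.g. an XSD enumeration


@dataclass
class Type:
    name: str
    namespace: Namespace
    definition: str
    # Each entry here is one TypeContainsProperty relationship.
    properties: List[Property] = field(default_factory=list)


def qname(namespace: Namespace, name: str) -> str:
    return f"{namespace.prefix}:{name}"


def contains_edges(types: List[Type]) -> List[Tuple[str, str, str]]:
    """Flatten Type -> Property containment into (subject, predicate, object) rows."""
    return [
        (qname(t.namespace, t.name), "contains", qname(p.namespace, p.name))
        for t in types
        for p in t.properties
    ]


ex = Namespace("ex", "http://example.org/niem-sketch")  # hypothetical namespace
name = Property("PersonName", ex, "A combination of names for a person.")
birth = Property("PersonBirthDate", ex, "A date a person was born.")
person = Type("PersonType", ex, "A data type for a human being.", [name, birth])
print(contains_edges([person]))
```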

Page 11

Results

• The results of my audit, NIEMification, and MyNIEM are presented in:
– A MindTouch Knowledge Base
– An Excel Spreadsheet with a Data Dictionary (64 data elements in the 5 NIEM artifact types; see previous slide)
– A NIEM Data Network in a Spotfire Dashboard
– Tutorial slides in PowerPoint

Page 12

NIEM Knowledge Base in MindTouch

http://semanticommunity.info/NIEM

Page 13

NIEM Knowledge Base in Excel

http://semanticommunity.info/@api/deki/files/22484/NIEMAuditFebruary2013.xls

My note: This is the Semantic Web format of Subject, Predicate, and Object in a spreadsheet that can be machine processed and network graphed. See supplemental slides.
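A minimal Python sketch of what "machine processed and network graphed" can mean: read the three-column sheet (assumed here to be exported as CSV with the headers Subject, Predicate, Object), treat each row as a labeled edge, and count how many edges touch each node. The sample rows are hypothetical, not taken from NIEMAuditFebruary2013.xls.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical CSV export of a Subject/Predicate/Object sheet (headers assumed).
SAMPLE = """Subject,Predicate,Object
NIEM Web Site,hasPage,Site Map
NIEM Web Site,hasPage,My NIEM
Site Map,linksTo,My NIEM
"""


def load_triples(text):
    """Read each spreadsheet row as one (subject, predicate, object) triple."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["Subject"], row["Predicate"], row["Object"]) for row in reader]


def adjacency(triples):
    """Directed, labeled adjacency list: subject -> [(predicate, object), ...]."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


def degree(triples):
    """Number of edges touching each node, counting both directions."""
    counts = Counter()
    for subject, _predicate, obj in triples:
        counts[subject] += 1
        counts[obj] += 1
    return dict(counts)


triples = load_triples(SAMPLE)
print(adjacency(triples))
print(degree(triples))
```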

Page 14

NIEM Properties in Excel

http://semanticommunity.info/@api/deki/files/22484/NIEMAuditFebruary2013.xls

Page 15

NIEM Data Network in Spotfire

https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?NIEM21DataNetwork-Spotfire

Page 16

NIEM Namespaces in Spotfire

https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?NIEM21DataNetwork-Spotfire

My note: Only 64 for 1000s of Properties and Types. Also a Version 4.1!

Page 17

Conclusions and Recommendations

• It was easy to NIEMify NIEM by giving every NIEM Web Sitemap Property (771) a well-defined Namespace (URI-URL).

• It was easy to create MyNIEM by customizing NIEM Web Site content in MindTouch, Excel, and Spotfire.

• The NIEM Knowledge Base and Excel Spreadsheets were used to create a Federated NIEM Data Network in Spotfire with previous NIEM and ISE Spotfire Analytics.

• There were only 64 Namespaces for all those 1000s of Properties and Types.
– There should be more well-defined namespaces (URI-URL), as shown using our Semantic Web Strategy for Data.

• Add the use of NIEM by Federal agencies, State governments, private sector organizations, and foreign partners to the Federated NIEM Data Network.
– This supports the recent National Strategy for Information Sharing and Safeguarding (see the mention of NIEM in the subsection on Building on Success).

Page 18

Supplemental Slides Outline

• Data Science Team
• Semantic Community: Mission Statement for 2013
• Current US Government Semantic Web Strategy
• International Linked Open Data Strategy: Linked Open Data Cloud Data
• Our Semantic Web Strategy for Data: Simple Explanation
• My 5-Step Method To Get to 5-Stars With Open Data for a System of Systems Architecture
• Spotfire for Data Science Analytics
• Summary: Building a Digital Government by Example

Page 19

Data Science Team

• Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community

• Dr. Tom Rindflesch, Research Group Lead for Semantic Medline, National Library of Medicine

• Dr. Victor Pollara, Senior Principal Scientist, Noblis

• Dr. Eric Little, Director of Information Management, Orbis Technologies

• Mark Guiton, Director, Government Relations, Cray Inc.
– Cray-YarcData announced semifinalists last month
  • http://www.yarcdata.com/press-release-12-3-12.html

Page 20

Semantic Community: Mission Statement for 2013

• Help the Data Transparency Coalition help the 113th Congress with the re-introduction of the Data Act by Building the Federal Financial Information Network in the Cloud for the 113th Congress, January 4, Slides.

• Continue to work with Big Data Analytics (e.g. Recorded Future, Spotfire, etc.), Content Analytics and Knowledge Management (e.g. MindTouch), and Semantic Technologies (e.g. Be Informed, Semantic Insights, etc.) for data science and data journalism. Slides.

• Help start Open Government Data for Japan (and the US and Europe) with the Right Data (Statistical) with the Right People (Data Scientists) Working on the Right Business Problems (Return on Investment): January 21, Slides.

• Help the Federal Big Data Senior Steering Group with A Semantic Web Strategy for Big Data and to move From the Year of Big Data to the Year of the Data Scientist Working With Big Data, January 24, Slides.

• Help the ACT-IAC AMWG, C&T SIG, and ET-SIG with Big Data on Mobile Devices, Collaboration and Transformation, & Government Challenges With Big Data, January 16 and February 23, Slides.

http://semanticommunity.info/#Welcome_to_Semantic_Community.info:_Community_Infrastructure_Sandbox_for_2013

Page 21

Current US Government Semantic Web Strategy

• Data.gov Advocates RDFa 1.1 Lite for Semantic Web Strategy.
– See Comment From Owen Ambur on Next Slide.

• I believe there is a better way to handle this that I showed the W3C eGov Special Interest Group on January 21st and have recommended for the reintroduction of the Data Act to the 113th Congress.
– Create a Semantic Index of Strong Relationships (SR) in RDF Format in a Spreadsheet.
  • See next slide for example (spreadsheet and words)
– Integrate That With Other Spreadsheets and Relational Databases in An Interoperability Interface (e.g. Dashboard) That Can Be Searched.

• Essentially:
– Computer Scientists Use RD2RDF (James Hendler)
– Data Scientists Use SR2Excel2RDF (Brand Niemann)
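To make the "SR2Excel2RDF" idea concrete, the sketch below turns spreadsheet rows of strong relationships into N-Triples, one of the standard RDF serializations. It is a minimal Python example, not the author's tool: the base URI is hypothetical, and the relationship labels in the sample rows are placeholders.

```python
from urllib.parse import quote

# Hypothetical base URI for identifiers minted from spreadsheet labels.
BASE = "http://example.org/sr/"


def iri(label: str) -> str:
    """Mint an IRI from a cell label (spaces -> underscores, then percent-encode)."""
    return f"<{BASE}{quote(label.strip().replace(' ', '_'))}>"


def literal(text: str) -> str:
    """N-Triples string literal with backslash, quote, and line breaks escaped."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def row_to_ntriple(subject: str, predicate: str, obj: str, object_is_literal: bool = False) -> str:
    """Serialize one spreadsheet row (Subject, Predicate, Object) as an N-Triples line."""
    o = literal(obj) if object_is_literal else iri(obj)
    return f"{iri(subject)} {iri(predicate)} {o} ."


# Placeholder rows standing in for a semantic index of strong relationships.
rows = [
    ("Data Act", "relatedTo", "Federal Financial Information Network", False),
    ("Data Act", "label", "Data Act", True),
]
print("\n".join(row_to_ntriple(*r) for r in rows))
```

Keeping the spreadsheet as the editable source and generating RDF from it is the design choice the slide argues for; the serializer itself stays trivial.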

Page 22

Comment From Owen Ambur

• OMB's official guidance to agencies on implementation of section 10 of the GPRA Modernization Act (GPRAMA) says they may use XML, JSON, spreadsheets, or CSVs in order to meet the requirement to publish their strategic and performance plans and reports in machine-readable format... but not PDF or HTML, at least not without "enhanced structural elements".[1]

• I couldn't help but chuckle at how [1] is a PDF. I get your point, however, which I think reinforces mine, that there is no US federal policy that prefers RDFa 1.1 over HTML Microdata for publishing metadata in HTML.
– [1] RDFa Lite 1.1, W3C Recommendation, June 7, 2012, Manu Sporny, editor, see http://www.w3.org/TR/rdfa-lite/

• Source: Owen Ambur, December 18, 2012, W3C eGov Mailing List.

Page 23

International Linked Open Data Strategy: Linked Open Data Cloud Data

http://semanticommunity.info/@api/deki/files/8824/=VIVO.xlsx

My Question: Is it easy to add columns for who links to whom?

Answer: Not in a single table. SPARQL can't do cross-tabulation (Richard Cyganiak).
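Richard Cyganiak's point concerns SPARQL result tables; once the links are exported as a plain (source, target) list, the who-links-to-whom cross-tab is an ordinary pivot. The Python sketch below is a minimal illustration of that pivot; the dataset names are hypothetical.

```python
from typing import List, Tuple


def crosstab(links: List[Tuple[str, str]]) -> list:
    """Pivot (source, target) link pairs into a who-links-to-whom 0/1 matrix."""
    names = sorted({name for pair in links for name in pair})
    linked = set(links)
    header = [""] + names
    # Row = linking dataset, column = linked-to dataset; 1 marks an outgoing link.
    rows = [[src] + [1 if (src, dst) in linked else 0 for dst in names] for src in names]
    return [header] + rows


# Hypothetical links between three datasets.
links = [("Dataset A", "Dataset B"), ("Dataset B", "Dataset C"), ("Dataset A", "Dataset C")]
for row in crosstab(links):
    print(row)
```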

Page 24

International Linked Open Data: Comments to David Wood

• The Linked Open Data Cloud is not actually “linked data”.
– RDF at Data.gov is not linked data.

• The analytical and statistical communities view Data.gov and Linked Open Data as “IT projects”.
– Former Census Bureau Director Robert Groves.

• Conventional tools can do linked data and data integration.
– Spotfire Information Designer, Informatica, Information Builders, etc.

http://manning.com/dwood/LinkedData_MEAP_ch1.pdf
http://semanticommunity.info/AOL_Government/Exploiting_Linked_Data_with_BI_Tools

Page 25

Our Semantic Web Strategy for Data: Simple Explanation

• One Table:
– Two Columns:
  • Example: Column 1: Section and Column 2: URL
  • Note: A Column 3: Description could be in the URL
  • Example: See Slide 23
– Three Columns:
  • Example: Column 1: Subject, Column 2: Object, and Column 3: Predicate
  • Note: This is the Semantic Web’s Linked Open Data Cloud as Linked Open Data for Network Analytics!
  • Example: See Slide 13
– Four Columns:
  • Examples: Column 1: Subject, Column 2: Attribute, Column 3: From, and Column 4: To, or Column 1: City, Column 2: Country, Column 3: Longitude, and Column 4: Latitude
  • Note: This is the format for Spotfire’s Network Analytics Module developed for the CIA
  • Example: See Next Slide and Semantic Medline

Page 26

Edge and Node Tables

Edge table:

Name | Means of Transport | From | To
Mr. A | By bus | Boston | New York
Mr. A | By train | New York | Boston
Mr. A | By bus | Boston | New York
Mr. A | By airplane | New York | Amsterdam
Mr. A | By airplane | Amsterdam | Boston
Mr. B | By airplane | London | Amsterdam
Mr. B | By airplane | Amsterdam | Moscow
Mr. B | By airplane | Moscow | Stockholm
Mr. B | By airplane | Stockholm | London
Mr. C | By car | Stockholm | Gothenburg
Mr. C | By car | Gothenburg | Stockholm

Node table:

City | Longitude | Latitude | Country
Boston | -71.06 | 42.36 | USA
Gothenburg | 11.93 | 57.70 | Sweden
Moscow | 37.67 | 65.77 | Russia
Stockholm | 18.07 | 59.32 | Sweden
London | -0.13 | 51.90 | England
Amsterdam | 4.90 | 52.37 | Holland
New York | -74.00 | 40.16 | USA

To create a new network visualization, you must provide an edge data table. A node data table is optional, since the application can generate one from your edge table as soon as you have made the necessary settings for the edges. The edge table must contain at least two columns, but usually more than two are needed for the network graph to give any useful insight into the data. The table should also contain a meaningful relation between the columns: for example, people traveling to or from cities, or friendship relationships.
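The paragraph above notes that a node table can be generated from the edge table. A minimal Python sketch of that derivation, using the edge and node data from this slide exactly as given: collect the distinct cities from the From and To columns, count each city's degree (an added metric, not part of the slide), and attach the node attributes. This illustrates the idea only; it is not how Spotfire implements it.

```python
from collections import Counter

# Edge table from the slide: (Name, Means of Transport, From, To).
EDGES = [
    ("Mr. A", "By bus", "Boston", "New York"),
    ("Mr. A", "By train", "New York", "Boston"),
    ("Mr. A", "By bus", "Boston", "New York"),
    ("Mr. A", "By airplane", "New York", "Amsterdam"),
    ("Mr. A", "By airplane", "Amsterdam", "Boston"),
    ("Mr. B", "By airplane", "London", "Amsterdam"),
    ("Mr. B", "By airplane", "Amsterdam", "Moscow"),
    ("Mr. B", "By airplane", "Moscow", "Stockholm"),
    ("Mr. B", "By airplane", "Stockholm", "London"),
    ("Mr. C", "By car", "Stockholm", "Gothenburg"),
    ("Mr. C", "By car", "Gothenburg", "Stockholm"),
]

# Node attributes from the slide: City -> (Longitude, Latitude, Country).
COORDS = {
    "Boston": (-71.06, 42.36, "USA"),
    "Gothenburg": (11.93, 57.70, "Sweden"),
    "Moscow": (37.67, 65.77, "Russia"),
    "Stockholm": (18.07, 59.32, "Sweden"),
    "London": (-0.13, 51.90, "England"),
    "Amsterdam": (4.90, 52.37, "Holland"),
    "New York": (-74.00, 40.16, "USA"),
}


def node_table(edges, coords):
    """Derive one row per distinct city from the From/To columns, with its degree."""
    degree = Counter()
    for _name, _transport, origin, destination in edges:
        degree[origin] += 1
        degree[destination] += 1
    rows = []
    for city in sorted(degree):
        # Cities missing from the attribute table still get a node, with empty attributes.
        lon, lat, country = coords.get(city, (None, None, None))
        rows.append((city, degree[city], lon, lat, country))
    return rows


for row in node_table(EDGES, COORDS):
    print(row)
```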

Page 27

Our Semantic Web Strategy for Data: Spotfire Network Analytics

http://semanticommunity.info/AOL_Government/Social_Media_-_Six_Degrees_of_Separation_and_Now_Even_Less

Page 28

My 5-Step Method

• So what I like to do to illustrate (data science) and explain (data journalism) is the following (like a recipe):
– Put the Best Content into a Knowledge Base (e.g. MindTouch*)
  • NASA Big Data
– Put the Knowledge Base into a Spreadsheet (Excel*)
  • Linked Data to Subparts of the Knowledge Base
– Put the Spreadsheet into a Dashboard (Spotfire*)
  • Data Integration and Interoperability Interface
– Put the Dashboard into a Semantic Model (Excel*)
  • Data Dictionaries and Models
– Put the Semantic Model into Dynamic Case Management (Be Informed*)
  • Structured Process for Updating Data in the Dashboard

* Examples of tools used.

Page 29

To Get to 5-Stars With Open Data

Star | Definition | Example / Tool*
★ | Make your stuff available on the Web (whatever format) under an open license | This Story / MindTouch
★★ | Make it available as structured data (e.g., Excel instead of image scan of a table) | Spreadsheet / Excel
★★★ | Use non-proprietary formats (e.g., CSV instead of Excel) | Table / MindTouch and Spotfire
★★★★ | Use URIs to identify things, so that people can point at your stuff | Table of Contents / MindTouch and Spotfire
★★★★★ | Link your data to other data to provide context | Table / MindTouch and Spotfire

* Examples of tools used.
Source of Star and Definition: http://www.w3.org/DesignIssues/LinkedData.html
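The five-star scheme is cumulative: each star presumes the ones before it. The Python sketch below encodes that rule as a simple rating function; the criterion key names are my own labels for the rows in the table, and the two sample datasets are hypothetical.

```python
# The five cumulative criteria, in table order (key names are illustrative labels).
CRITERIA = [
    ("open_license_on_web", "Available on the Web under an open license"),
    ("structured", "Available as structured data"),
    ("non_proprietary", "Uses non-proprietary formats"),
    ("uses_uris", "Uses URIs to identify things"),
    ("linked", "Linked to other data for context"),
]


def star_rating(dataset: dict) -> int:
    """Count stars; a star only counts if every earlier criterion is also met."""
    stars = 0
    for key, _description in CRITERIA:
        if not dataset.get(key, False):
            break
        stars += 1
    return stars


# Hypothetical examples: an openly licensed Excel file versus a fully linked CSV table.
excel_file = {"open_license_on_web": True, "structured": True}
linked_table = {key: True for key, _ in CRITERIA}
print(star_rating(excel_file), star_rating(linked_table))
```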

Page 30

System of Systems Architecture

Semantic Index of Linked Data (e.g. Excel)

Dynamic Case Management (e.g. Be Informed)

Data Science Library (e.g. Spotfire)

Data Science Products (e.g. Spotfire)

Page 31

Data Federation in Spotfire: In-Memory and In-Database Data

• In-Memory Data
– When you are working with in-memory data tables (text files, Excel files, information links, etc.), you have access to all the functionality of Spotfire. You have the opportunity to use all columns as filters and perform any number of calculations. You can also use any of the tools within Spotfire to cluster data, calculate new columns, bin columns, make predictions, etc. See Working With Large Data Volumes for some tips on how to improve the performance of an analysis with lots of data.

• In-Database Data
– When a connection to an external source is set up, all calculations are done using the external system and not with the Spotfire data engine. This allows you to work with data volumes too large to fit into primary memory and to take advantage of the power of the external system. When working with external data connections, you access only the current selection of data, and all aggregations and calculations are made in-database (in-db).
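The practical difference between the two modes is where the aggregation runs. The Python sketch below uses the standard-library sqlite3 module as a stand-in for an external database such as Teradata; it is an analogy, not Spotfire's API. Because sqlite3 here is itself in memory, what matters is that one function ships every row to the client while the other ships back only the summary. The trips reuse the means of transport from the edge table on slide 26.

```python
import sqlite3
from collections import Counter

# sqlite3 stands in for an external database; this is an analogy, not Spotfire's API.
TRIPS = [
    ("Mr. A", "By bus"), ("Mr. A", "By train"), ("Mr. A", "By bus"),
    ("Mr. A", "By airplane"), ("Mr. A", "By airplane"),
    ("Mr. B", "By airplane"), ("Mr. B", "By airplane"),
    ("Mr. B", "By airplane"), ("Mr. B", "By airplane"),
    ("Mr. C", "By car"), ("Mr. C", "By car"),
]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trips (traveler TEXT, mode TEXT)")
    conn.executemany("INSERT INTO trips VALUES (?, ?)", TRIPS)
    return conn


def count_in_memory(conn: sqlite3.Connection) -> dict:
    """In-memory style: pull every row into the client, then aggregate locally."""
    rows = conn.execute("SELECT mode FROM trips").fetchall()
    return dict(Counter(mode for (mode,) in rows))


def count_in_database(conn: sqlite3.Connection) -> dict:
    """In-database style: push the GROUP BY down; only the summary rows come back."""
    return dict(conn.execute("SELECT mode, COUNT(*) FROM trips GROUP BY mode"))


conn = make_db()
print(count_in_memory(conn))
print(count_in_database(conn))
```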

Page 32

Data Federation in Spotfire: Database Connections, Information Links, & Analytics Library

• A database connection dialog is used to set up a connection to, say, a Teradata database, so that you can analyze data from the database without bringing it into your analysis.

• An information link is a structured request for data which can be sent to the database. These specifications include one or more columns and may include one or more filters.
– Stated in plain English, an information link could be: "Fetch the Name, Address and Phone_number for employees that pass the filter High_Income."

– Information links can also be used to limit what data to open in an analysis in a number of different ways.

• The library provides publishing capabilities for all of your analysis materials, so you can share data with your colleagues. The library can be used directly from Spotfire by anyone who has at least read privileges.
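The plain-English information link above maps naturally onto a parameterized SQL query: a list of columns plus named filters. The sketch below is a hypothetical Python stand-in, not Spotfire's information-link API; the InformationLink class, the employees table, its sample rows, and the salary threshold that defines High_Income are all assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class InformationLink:
    """Hypothetical stand-in for an information link: columns plus named filters."""
    table: str
    columns: List[str]
    # filter name -> (SQL condition with ? placeholders, parameter values)
    filters: Dict[str, Tuple[str, Sequence]] = field(default_factory=dict)

    def to_sql(self) -> Tuple[str, list]:
        # Table/column names must come from trusted configuration; only values are parameterized.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        params: list = []
        if self.filters:
            conditions = []
            for condition, values in self.filters.values():
                conditions.append(condition)
                params.extend(values)
            sql += " WHERE " + " AND ".join(conditions)
        return sql, params

    def fetch(self, conn: sqlite3.Connection) -> list:
        sql, params = self.to_sql()
        return conn.execute(sql, params).fetchall()


# "Fetch the Name, Address and Phone_number for employees that pass the filter High_Income."
# The salary threshold that defines High_Income is an assumption.
link = InformationLink(
    table="employees",
    columns=["Name", "Address", "Phone_number"],
    filters={"High_Income": ("Salary > ?", [100000])},
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (Name TEXT, Address TEXT, Phone_number TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [("Ann", "1 Main St", "555-0100", 150000), ("Bob", "2 Oak Ave", "555-0101", 60000)],
)
print(link.fetch(conn))
```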

Page 33

Spotfire for Big Data Analytics: Microscope

http://semanticommunity.info/Emerging_Technology_SIG_Big_Data_Committee/Government_Challenges_With_Big_Data#Spotfire_Dashboard

NASA GCMD: Gateway to Big Data
NSF Big Data Awards: Follow the Work
OSTP Harnessing The Power of Digital Data Report: Well-Defined URLs
PCAST Designing a Digital Future Report: Interoperability Interface
NITRD Dashboards: Live Demonstrations
Four Clicks: See, Sort/Search, Download, & Share (iPad)

Page 34

Data Science Analytics Library: Telescope & Library

https://silverspotfire.tibco.com/us/library#/users/bniemann/Public

Live Links to Outside Data Sources
Live Information Links Between Analytics

Page 35

Summary

• Semantic Community has a NIEM and Open Government Data Strategy:
– Open Source Platform for Creating Knowledge Bases:
  • HTML 5
  • Well-defined URLs
  • APIs
– Machine-readable Semantic Web – Linked Data Formats:
  • Subject
  • Object
  • Predicate
– Data Science Analytics:
  • Federation
  • Visualizations and Statistics
  • Network Graphs

• This Supports Building a Digital Government by Example:
– See Next Slide

Page 36

Building a Digital Government by Example

http://semanticommunity.info/AOL_Government/Building_a_Digital_Government