Data.gov Review of New and Existing Applications

Brand K. Niemann, Rich W. LaValley, Dr. W. Chris Hardy

Presentation to the Data Architecture Subcommittee (DAS)

September 10, 2009

Advanced Concepts and Integrated Systems (ACIS)

SAIC

© 2008 Science Applications International Corporation. All rights reserved. SAIC and the SAIC logo are registered trademarks of Science Applications International Corporation in the U.S. and/or other countries.

Overview

• Review Data Sets

• Review and Demonstrate New and Existing Applications

• Feedback and Comments

Summary

Tools Data

COTS & GOTS, Desktop & Web

Review variety of applications from a variety of sources

Review Datasets and Application Sources

Data Sources

• http://www.data.gov/details/92 (June 2009)

Application Sources

• http://data-gov.tw.rpi.edu/wiki/Main_Page

• http://wiki.sunlightlabs.com/Main_Page

• http://www.gov2expo.com/gov2expo2009

By the Numbers

Data Sources

www.data.gov

788

Data-gov.tw.rpi.edu

http://data-gov.tw.rpi.edu/wiki/Demos

Applications: 11

Converted to RDF: 16

Apps for America 2

http://sunlightlabs.com/

46

Gov2.0 Expo

http://www.gov2expo.com/gov2expo2009/public/schedule/presentations

35 (5 Categories)

Other: https://analyzethe.us/

Palantir Government

See also Data.Gov dashboard

http://spreadsheets.google.com/pub?key=tchvwRko8_bEQ9c36b33fOA&gid=10

The Giant Warehouse of Data

File Type Contributed

Influence the kind of applications that are developed.

Review the Challenge (Open Government)

Give us the tools and we can do it ourselves. Lend your hand and your coding skills (Tim O’Reilly)

http://blip.tv/file/2552824

1. Be an Organizer

2. Volunteer skills, developers – parse a state – 50 states

3. Provide Specific Results, Work together

4. Visualize Data

(Clay Johnson, Sunlight Labs)

http://blip.tv/file/2075676

5. Visually explore and interact with data to facilitate sense making (DAS, 9/10/2009)

Age of Visualization and Analysis

Emerging Trends in Data Visualization, July 30, 2009, DM Radio

http://www.information-management.com/dmradio/-10015788-1.html

Heat Maps, Tag Clouds, Concepts Layers Widgets, Dashboards, Sliders, Filters

View of data over time is a story

http://www.smartmoney.com/investing/bonds/the-living-yield-curve-7923/

The Yield Curve

View of data over time is a story (temporal and geospatial characteristics)

http://www.palantirtech.com/government/analysis-blog/uncovering-a-bot-net-exploring-router-data-using-palantir

Efficient Access of Data Sources

• Data Imaging
  • Direct, ad hoc extraction of selected data elements from a native file
  • Representation of the content of the data extracted as an integer matrix
    • Dates become integers in YYYYMMDD format
    • Time becomes number of seconds after midnight
    • Character names/descriptions are assigned index values in a table
    • Numerical values are expressed as integers with an understood base

• Benefits
  • Minimal overhead in configuration for data handling
  • Significant compression of working files without loss of content
  • Substantial acceleration of data retrieval and analysis capabilities achieved by:
    • Reduction of tests to integer (= 1 word) compares
    • Exploiting matrix-based processing efficiencies
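The encoding rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenters' implementation: the function and class names (`encode_date`, `encode_time`, `IndexTable`, `build_image`, `count_by_router`) are hypothetical, and the choice of which sample columns hold the router serial number and fault code is a guess made for the example.

```python
from datetime import datetime


def encode_date(d):
    """Date -> integer in YYYYMMDD format."""
    return d.year * 10000 + d.month * 100 + d.day


def encode_time(d):
    """Time -> number of seconds after midnight."""
    return d.hour * 3600 + d.minute * 60 + d.second


class IndexTable:
    """Hypothetical conversion table: each distinct string gets an integer index."""

    def __init__(self):
        self.index_of = {}
        self.values = []

    def encode(self, text):
        if text not in self.index_of:
            self.index_of[text] = len(self.values)
            self.values.append(text)
        return self.index_of[text]


def build_image(rows, routers, faults):
    """Turn (timestamp, router, fault) text rows into an integer matrix."""
    image = []
    for stamp, router, fault in rows:
        d = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
        image.append([encode_date(d), encode_time(d),
                      routers.encode(router), faults.encode(fault)])
    return image


def count_by_router(image, routers):
    """Count records per router using only integer indexing on column 2."""
    counts = [0] * len(routers.values)
    for row in image:
        counts[row[2]] += 1
    return dict(zip(routers.values, counts))


# Assumed column mapping: timestamp, router serial number, fault code.
SAMPLE_ROWS = [
    ("1/7/2007 6:29:00", "00-08-74-36-37-21", "9S26"),
    ("1/1/2007 3:36:00", "00-08-74-52-83-98", "4Z55"),
    ("1/7/2007 12:31:00", "00-08-74-52-73-79", "4Z55"),
    ("1/7/2007 17:36:00", "00-08-74-52-73-79", "4Z55"),
]

if __name__ == "__main__":
    routers, faults = IndexTable(), IndexTable()
    image = build_image(SAMPLE_ROWS, routers, faults)
    for row in image:
        print(row)
    print(count_by_router(image, routers))
```

Once the text is encoded, a query such as "records per router" touches only small integers; the conversion tables are consulted only when results are turned back into readable names.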

Efficient Access of Data Sources

Example: 479,921 records / 69,588,548 bytes

Data elements labeled on the slide: Date/Time of X-mission (transmission), Router Serial Number, Fault Code

1/7/2007 7:00:00 6 1 1/7/2007 6:29:00 00-08-74-36-37-21 29.224.42.199 80 1545 9S26

1/1/2007 4:00:00 3 2 1/1/2007 3:36:00 00-08-74-52-83-98 129.224.42.199 443 1970 4Z55

1/2/2007 14:01:00 14 3 1/2/2007 14:01:00 00-08-74-52-73-79 129.251.240.179 443 1073 2J89

1/7/2007 0:01:00 0 1 1/7/2007 0:01:00 00-08-74-52-83-98 129.251.240.179 80 7095 4Q66

1/7/2007 22:00:00 21 1 1/7/2007 21:01:00 00-08-74-06-36-24 129.3.1.91 22 2014 7X44

1/5/2007 8:01:00 8 6 1/5/2007 8:01:00 00-08-74-08-66-92 129.3.1.91 443 5821 5G49

1/7/2007 13:00:00 12 1 1/7/2007 12:31:00 00-08-74-52-73-79 129.40.42.144 22 1605 4Z55

1/7/2007 18:00:00 17 1 1/7/2007 17:36:00 00-08-74-52-73-79 129.40.42.144 443 922 4Z55

1/5/2007 12:01:00 12 6 1/5/2007 12:01:00 00-08-74-52-83-98 129.66.124.144 80 2825 3D39

1/5/2007 6:01:00 6 6 1/5/2007 6:01:00 00-08-74-06-36-24 129.9.137.79 21 3653 1G29

. . .

Data Elements of Interest: 21,596,488 bytes

Image Generation: 44.5 secs

Image Size:

- 7,678,736 bytes plus 12,592 bytes in conversion tables
- 9:1 compression over total data set
- 2.8:1 compression of data sought

Query for Error Counts by Router:

- Direct: more than 1 minute
- Matrix-Based: 9.7 secs
- Image-Based: 1.2 secs
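The reported ratios can be checked from the byte counts on this slide. The short calculation below is only a sanity check; it assumes the conversion tables are counted as part of the image size, which the slide does not state explicitly, and the constant names are made up for the example.

```python
# Figures taken from the slide.
TOTAL_BYTES = 69_588_548      # full data set (479,921 records)
ELEMENTS_BYTES = 21_596_488   # data elements of interest
IMAGE_BYTES = 7_678_736       # integer image
TABLE_BYTES = 12_592          # conversion tables

# Assumption: the image and its conversion tables are stored together.
image_total = IMAGE_BYTES + TABLE_BYTES

overall_ratio = TOTAL_BYTES / image_total      # compression over total data set
selected_ratio = ELEMENTS_BYTES / image_total  # compression of data sought

# Query timings from the slide, in seconds.
MATRIX_SECS = 9.7
IMAGE_SECS = 1.2
image_vs_matrix = MATRIX_SECS / IMAGE_SECS

print(f"Overall compression:  {overall_ratio:.1f}:1")
print(f"Selected compression: {selected_ratio:.1f}:1")
print(f"Image-based vs matrix-based query: {image_vs_matrix:.1f}x faster")
```

Both ratios round to the values quoted on the slide (9:1 and 2.8:1). The direct query is given only as "more than 1 minute", so its speedup over the image-based query can only be bounded from below (at least 50x).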

-- Neighborhood to Live

Name Source

Crime in the US 1998-2007

FBI Tableau 5 Application

Data.gov

New Application

State

--Related

Are you Safe?

http://www.areyousafedc.com/

Existing Application

City

--Related

Every Block

http://dc.everyblock.com/crime/by-offense/theft/

Existing Application

City

--Related

Density of firearms / Death Rate

http://www.datamasher.org/mash-ups/test-123#table-tab

Existing Application

State

-- Purchasing a Car, Planning a Vacation

Name Source

Fuel Efficient Cars

www.fueleconomy.gov

Heat Map Explorer (COTS)

New Application

Federal

Hurricane data (1990 – 2006)

-- Related www.nhc.noaa.gov/

Tableau 5 (COTS)

New Application

Federal

See other examples

http://www.tableausoftware.com/learning/examples

Discussion and Feedback

niemannb@saic.com

lavalleyr@saic.com

hardywi@saic.com

-- Other Backup

Name Source

World Copper Smelters

http://tin.er.usgs.gov/copper/output/copper-fLD.kml

Data.gov

Existing Application

World Copper Smelters.bmp

USGS Oil and Gas Assessment Database

http://energy.cr.usgs.gov/oilgas/wep

Data.gov

Existing Application

World Petroleum Assessment.bmp

-- Emerging Technologies Backup