NYC Data Web (static version) - A Semantic, Open Public Data Exchange for NYC
Joel Natividad, TCG
Thursday, June 9, 2011, SemTech 2011
NYC DataWeb: A Platform for Integrating Public Data into NYC.gov
About Me
• TCG Software
• Software Services arm of “The Chatterjee Group”
• Several portfolio companies in Life Sciences, Telecom, Aviation, Energy, Real Estate & Information Technology
• Headquartered in NYC
• Delivery Centers in Bangalore, Kolkata & Mumbai
• I look after TCG's Knowledge Engineering practice
Background
Main Goals
• Stimulate development of apps that improve access to information and government transparency; and
• Encourage innovation and the creation of new IP with commercial potential
CROWDSOURCING
• Wisdom of the Crowd
• Self-selecting, motivated developers
• Bang for the Buck
• Ignites Entrepreneurship
CROWDSOURCING
• Challenge: Improve Netflix's Recommendation Algorithm (Cinematch) by 10%
• Dataset:
• 100 million ratings (training set)
• Half a million users
• 18 thousand movies
• Prize: One million US dollars
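The 10% target was a reduction in prediction error: the Netflix Prize scored submissions by root-mean-square error (RMSE) on held-out ratings, and the grand prize required an RMSE at least 10% lower than Cinematch's. A minimal Python sketch of that arithmetic, using a handful of made-up ratings rather than contest data; the function names are illustrative only:

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def improvement(baseline_rmse, candidate_rmse):
    """Fractional reduction in RMSE relative to the baseline (0.10 means 10%)."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse


# Hypothetical 1-5 star ratings, not Netflix data.
actual = [4, 3, 5, 2, 4]
baseline = [3, 3, 4, 3, 3]   # stand-in for a Cinematch-style baseline
candidate = [4, 3, 4, 2, 3]  # stand-in for an improved model

b = rmse(baseline, actual)
c = rmse(candidate, actual)
print(f"baseline RMSE={b:.4f}  candidate RMSE={c:.4f}  improvement={improvement(b, c):.1%}")
print("meets 10% target:", improvement(b, c) >= 0.10)
```

With these toy numbers the candidate cuts RMSE from √0.8 to √0.4, an improvement of about 29%.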
STATISTICS
• Just 6 days into the contest, Cinematch was bested by 1%
• 20,000 Teams, 150 countries
• Entrants:
• Bell Labs
• Opera Solutions
• Renowned universities
CROWDSOURCING
• Washington DC CTO - Vivek Kundra
• First Federal CIO - Vivek Kundra
• Open Government Initiative
• Recovery.gov
• Data.gov
• USAspending.gov
• IT Dashboard
• Performance.gov
• Fedspace
• Citizen Services Dashboard
Life Support: budget slashed from $34 million to $8 million
Open Data in NYC
Council Member Gale Brewer
$500 million!!!
Why $500 million?!?!
“Integrated” Inter-Agency System
Data Integration Alphabet Soup
SOA, EAI, ORB, SOAP, RPC, XML, XSLT, JMS, EJB, MOM, MDA, BPM, BPEL, POJO
Principles
• Cost Effective (NOT $500 million)
• Easy to Use (Developers/Publishers/Citizens)
• Based on Open Standards
• Low Adoption Curve
• Help Accelerate Open Data Innovation
• Usable Data Now!
The Next Web of Open Linked Data (February 2009)
Usable Data Now
• “Beautiful” Website
• Usable by Developers/Publishers/Citizens
• Based on Open Standards
• Low Adoption Curve
• Help Accelerate Open Data Innovation
• Usable Data Now!
What NYC BigApps Developers Were Doing
Siloed Data
ETL Processes
• Spent an inordinate amount of time interpreting data
• Massaged data was then staged locally
• Developers kept reinventing the wheel
• Limited data mashups
• Applications disconnected from the NYC Datamine
Download & Decipher
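To make the "download & decipher" pattern concrete, here is a hedged Python sketch of the kind of one-off ETL step each app team ended up writing: parse a raw CSV export, rename cryptic columns, coerce types, and stage the result in a local SQLite database. The CSV content, column codes, and their meanings are hypothetical, not actual NYC Datamine files:

```python
import csv
import io
import sqlite3

# Hypothetical raw export: agency-specific column codes, a blank value, stray spaces.
RAW_CSV = """BORO_CD,FAC_NM,CAP_NUM
3, Pool C ,
1,Library A,120
2,Rec Center B,85
"""

# The hand-built "decipher" step every team repeated (column meanings are assumed).
COLUMN_MAP = {"BORO_CD": "borough_code", "FAC_NM": "facility_name", "CAP_NUM": "capacity"}


def extract_transform(text):
    """Parse the CSV, rename columns, trim whitespace, and coerce types."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {COLUMN_MAP[key]: value.strip() for key, value in raw.items()}
        row["borough_code"] = int(row["borough_code"])
        # A blank capacity becomes NULL rather than 0.
        row["capacity"] = int(row["capacity"]) if row["capacity"] else None
        rows.append(row)
    return rows


def load(rows, db_path=":memory:"):
    """Stage the massaged rows locally; this copy no longer tracks the source."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE facility (borough_code INTEGER, facility_name TEXT, capacity INTEGER)"
    )
    conn.executemany(
        "INSERT INTO facility VALUES (:borough_code, :facility_name, :capacity)", rows
    )
    conn.commit()
    return conn


conn = load(extract_transform(RAW_CSV))
print(conn.execute(
    "SELECT facility_name, capacity FROM facility ORDER BY borough_code"
).fetchall())
```

The point is the duplication: every team wrote some variant of `COLUMN_MAP` and the type coercion, and each local copy drifted from the source once it was staged.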
There must be a Better Way
How it Started
• Oct 12, 2010 - NYC BigApps 2.0 announced
• Nov 9, 2010 - NYC BigApps 2.0 kickoff meeting
• Late Nov 2010 - spoke with Revelytix/Spry about collaborating
• Early Dec 2010 - started work on NYCDataWeb
• Jan 26, 2011, ~4:30 p.m. - submitted entry
What We Did
[Architecture diagram: Query & Results; Domain Ontology, Metadata Ontology and Mapping Ontology; query engine components (Re-Writer, Planner, Optimizer, Indexes, Cache, Rules, Definitions); Siloed Data]
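The diagram summarizes a federated query pipeline: queries phrased against a domain ontology are rewritten, via mapping definitions, into queries the siloed sources understand. The sketch below is a toy illustration of that rewrite step only, in Python with in-memory SQLite databases standing in for the silos. The `nyc:` predicates, tables, and mapping format are hypothetical; it leaves out the optimizer, indexes, cache, and rules shown in the diagram, and it is not how the NYCDataWeb engine was implemented:

```python
import sqlite3

# Hypothetical mapping definitions: domain-ontology predicate -> source column.
# Table and column names come only from this trusted dict, never from user input.
MAPPINGS = {
    "parks_db": {"table": "parks", "key": "park_id",
                 "predicates": {"nyc:name": "park_name", "nyc:borough": "boro"}},
    "schools_db": {"table": "schools", "key": "dbn",
                   "predicates": {"nyc:name": "school_name", "nyc:borough": "borough"}},
}


def rewrite(select, where, source):
    """Re-write a domain-level query into (sql, params) for one source, or None."""
    m = MAPPINGS[source]
    preds = m["predicates"]
    if not all(p in preds for p in [*select, *where]):
        return None  # this source cannot answer every predicate the query uses
    cols = ", ".join(preds[p] for p in select)
    sql = f"SELECT {m['key']}, {cols} FROM {m['table']}"
    params = []
    if where:
        sql += " WHERE " + " AND ".join(f"{preds[p]} = ?" for p in where)
        params = [where[p] for p in where]
    return sql, params


def federated_query(select, where, silos):
    """Send the rewritten query to every source that can answer it; merge the rows."""
    results = []
    for source, conn in silos.items():
        plan = rewrite(select, where, source)
        if plan is None:
            continue
        sql, params = plan
        for key, *values in conn.execute(sql, params):
            results.append((source, key, dict(zip(select, values))))
    return results


def make_silo(ddl, insert_sql, rows):
    """Create one in-memory 'silo' with a single populated table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


silos = {
    "parks_db": make_silo(
        "CREATE TABLE parks (park_id TEXT, park_name TEXT, boro TEXT)",
        "INSERT INTO parks VALUES (?, ?, ?)",
        [("P1", "Park A", "Bronx"), ("P2", "Park B", "Queens")]),
    "schools_db": make_silo(
        "CREATE TABLE schools (dbn TEXT, school_name TEXT, borough TEXT)",
        "INSERT INTO schools VALUES (?, ?, ?)",
        [("S1", "School C", "Bronx")]),
}

# "Names of things in the Bronx", asked once against the domain vocabulary.
for row in federated_query(["nyc:name"], {"nyc:borough": "Bronx"}, silos):
    print(row)
```

A real engine would also consult metadata (which sources are fresh, indexed, or cached) to plan and optimize; here every capable source is simply queried in turn.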
“Beautiful” Website
Three dashboards were built
• NYC Agile Analytics (Spry)
• NYCreation (SMW+): visualized SPARQL query results
• NYCmantics (SMW+): NYC datamine explorer
What’s Next?
Semantic Gap
[Diagram: the semantic gap between Developers and "3.0"]
JumpStart Semantics
The Computer for the rest of us.
Semantics for the rest of us.
Semantics for the REST of us.
Phase 2: Aug 2011 (Powered by NYCDataWeb)
• Hide Complexity (Simplicity = Adoption)
• Incorporate the whole NYC datamine
• Make it easier for Publishers
• Make it easier for Developers
• Make it easier for Citizens
• Open-source collaboration with vendors & other institutions
• Incorporate the best of Socrata and data.gov
• Improved Visualizations
• Position NYCDataWeb as the accelerated data mashup platform
Phase 3: Nov 2011 (NYC BigApps 2011)
• DataWeb Deployment Framework SMW bundle
• More Data Sources (Federator - Spinner)
• Linked Open Data
• Make it easier STILL for Publishers, Developers and Citizens
• Enable Widespread Adoption of NYCDataWeb (NYCDataWeb bootcamp)
NYC Information Web
The Broader Vision
[Diagram: Query & Results over a Domain Ontology, drawing on RDF from Web Pages, Sensors, Partners (Ontology/RDF), Agency Data, and Other Triplestores]
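The vision above is, at heart, one query over many RDF sources. RDF data from different publishers can be merged by pooling the triples, and shared identifiers are what let a single query join them. A toy Python sketch of that idea, not an RDF library or a SPARQL engine: the `nyc:`/`sensor:` prefixed names, the triples, and the functions are all hypothetical, and only basic graph patterns (conjunctions of triple patterns, the core of a SPARQL `WHERE` clause) are supported:

```python
# Triples as (subject, predicate, object) tuples; all names and values are hypothetical.
agency_data = {
    ("nyc:park/P1", "rdf:type", "nyc:Park"),
    ("nyc:park/P1", "nyc:borough", "nyc:Bronx"),
}
sensor_data = {
    ("sensor:42", "nyc:locatedIn", "nyc:park/P1"),
    ("sensor:42", "nyc:airQualityIndex", 37),
}
partner_data = {
    ("nyc:park/P1", "nyc:name", "Park A"),
}


def is_var(term):
    """Terms starting with '?' are query variables, as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to a different value
            new[p] = t
        elif p != t:
            return None  # constant term does not match
    return new


def query(graph, patterns):
    """Evaluate a basic graph pattern by joining one triple pattern at a time."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Merging the sources is a plain set union of their triples.
graph = agency_data | sensor_data | partner_data

results = query(graph, [
    ("?park", "rdf:type", "nyc:Park"),
    ("?park", "nyc:name", "?name"),
    ("?sensor", "nyc:locatedIn", "?park"),
    ("?sensor", "nyc:airQualityIndex", "?aqi"),
])
for r in results:
    print(r["?name"], r["?aqi"])
```

The join works only because all three sources name the park with the same identifier and use the same predicates; a shared domain ontology supplies the vocabulary half of that agreement.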
Phase 4: Post NYC BigApps 2011
• Multiple solutions powered by NYCDataWeb
• <Your city/community/company here> DataWeb
• Help foster a viable ecosystem of Linked Data
• ... keep standing on the shoulders of giants
Semantic Web
Hans Rosling shows the best stats you've ever seen (February 2006)
PUBLIC
We need your help & feedback
A Platform for Integrating Public Data into NYC.gov
Find out more at http://knoodl.com/ui/groups/NYC_Homepage
CREDITS
• Lego Faceparty picture by RichardAM (http://www.richard-am.net/)
• Lego Inauguration Pictures from various Flickr Users (sluggobear, Atwater, Dan Hontz)
• Lego Luke loses his hand by Flickr user wwwayazdotcom
• Tim Berners-Lee highlight from TED (http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html)
• Hans Rosling highlight from TED (http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html)
• FlowerPowerpont2.pptx provided by Anna Rosling Rönnlund of gapminder
• “Star Wars Gangsta Rap” highlight, SizzlechestXXX (http://www.youtube.com/watch?v=Ij4w7ChpuaM)
• Various screenshots provided by Revelytix, Spry Inc. and TCG Software Services