Strata NY: Best Practices for Publishing Data
![Page 1: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/1.jpg)
FIND AND UNDERSTAND DATA
October 2012
Hjalmar Gislason, founder & CEO - [email protected]
Best Practices for
Publishing Data
![Page 2: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/2.jpg)
Founder and CEO
Hjalmar Gislason
Twitter: @datamarket
Slides: http://blog.datamarket.com/
![Page 3: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/3.jpg)
![Page 4: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/4.jpg)
![Page 5: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/5.jpg)
Heavy Data Consumers
Providers of Data Delivery Technology
![Page 6: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/6.jpg)
| BEST PRACTICES for PUBLISHING DATA | Hjalmar Gislason, [email protected] | October 2012
Computers Humans
![Page 7: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/7.jpg)
Computers
• Structure
Humans
• Understand and use
![Page 8: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/8.jpg)
Computers
• Structure
Humans
• Understand and use
![Page 9: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/9.jpg)
Publishing for Computers
1. Simple formats
2. Indexes, unique IDs and meta-data
3. FAQs and feedback channels
![Page 10: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/10.jpg)
"Don't anthropomorphize computers - they hate it."
- Unknown
Simple Formats
![Page 11: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/11.jpg)
Simple Formats
![Page 12: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/12.jpg)
Simple Formats: Tim Berners-Lee’s Five Stars
![Page 13: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/13.jpg)
Simple Formats: You lost me at “Semantics”
![Page 14: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/14.jpg)
Standards will emerge and there will be more and more of them
• RDF
• OData vs. GData
• DSPL
• SDMX
![Page 15: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/15.jpg)
Indexes, unique ids and meta-data
![Page 16: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/16.jpg)
Indexes, unique ids and meta-data
![Page 17: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/17.jpg)
Indexes, unique ids and meta-data
![Page 18: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/18.jpg)
Indexes, unique IDs and meta-data
• Must: Unique ID, Title, Last updated
• Should: Meta-data
• Why?
  • No need for scraping
  • Less load on your end
  • Ensures full coverage
  • Ensures content removal and updates
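A minimal sketch of why such an index spares consumers from scraping: a client compares two snapshots of the index by unique ID and learns what was added, updated, or removed. The index layout, the field names (`id`, `title`, `last_updated`), and the sample entries are assumptions made up for illustration, not a format prescribed in the talk.

```python
from typing import Dict, List, TypedDict


class IndexEntry(TypedDict):
    id: str            # unique, stable identifier (hypothetical field name)
    title: str
    last_updated: str  # ISO 8601 UTC timestamp, so string comparison orders correctly


def diff_index(old: List[IndexEntry], new: List[IndexEntry]) -> Dict[str, List[str]]:
    """Compare two snapshots of a publisher's index by unique ID."""
    old_by_id = {entry["id"]: entry for entry in old}
    new_by_id = {entry["id"]: entry for entry in new}
    added = sorted(new_by_id.keys() - old_by_id.keys())
    removed = sorted(old_by_id.keys() - new_by_id.keys())
    updated = sorted(
        i for i in new_by_id.keys() & old_by_id.keys()
        if new_by_id[i]["last_updated"] > old_by_id[i]["last_updated"]
    )
    return {"added": added, "updated": updated, "removed": removed}


# Made-up sample snapshots of the same index on two consecutive days.
yesterday: List[IndexEntry] = [
    {"id": "ds-001", "title": "Population by region", "last_updated": "2012-09-01T00:00:00Z"},
    {"id": "ds-002", "title": "Fish catch by species", "last_updated": "2012-09-15T00:00:00Z"},
]
today: List[IndexEntry] = [
    {"id": "ds-001", "title": "Population by region", "last_updated": "2012-10-01T00:00:00Z"},
    {"id": "ds-003", "title": "Energy use by sector", "last_updated": "2012-10-02T00:00:00Z"},
]

changes = diff_index(yesterday, today)
print(changes)
# {'added': ['ds-003'], 'updated': ['ds-001'], 'removed': ['ds-002']}
```

The consumer only fetches what changed (less load on the publisher), sees every dataset (full coverage), and notices removals, which scraping rarely detects reliably.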
![Page 19: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/19.jpg)
Indexes, unique IDs and meta-data
• Hard to emphasize enough!
• Unique IDs for everything: Datasets, columns, entities, ...
• Why?
  • Continuity: A small change for a man = giant leap for a computer
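The continuity point can be illustrated with a small sketch. Two releases of the same hypothetical dataset differ only in a reworded column title: trivial for a human reader, but a consumer that looks the column up by title loses it, while one that uses the stable ID does not. The structure, names, and values below are made up for illustration.

```python
# Two releases of the same (hypothetical) dataset. Between them the publisher
# reworded a column title; the column ID stayed the same. Values are made up.
release_1 = {
    "columns": [{"id": "col-pop", "title": "Population"}],
    "rows": [{"col-pop": 100}],
}
release_2 = {
    "columns": [{"id": "col-pop", "title": "Population (total)"}],
    "rows": [{"col-pop": 105}],
}


def value_by_title(release, title):
    """Fragile lookup: depends on the human-readable title staying identical."""
    for column in release["columns"]:
        if column["title"] == title:
            return release["rows"][0][column["id"]]
    return None  # title not found: the consumer silently loses the column


def value_by_id(release, column_id):
    """Robust lookup: depends only on the stable unique ID."""
    return release["rows"][0].get(column_id)


print(value_by_title(release_1, "Population"))  # 100
print(value_by_title(release_2, "Population"))  # None
print(value_by_id(release_2, "col-pop"))        # 105
```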
![Page 20: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/20.jpg)
Indexes, unique IDs and meta-data
• Any relevant contextual information
  • URL(s), descriptions, methodology, next updated, authors, keywords, units, license information, ...
![Page 21: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/21.jpg)
FAQs and feedback channels
#1 reason for not publishing data:
“There are errors in the data and I don't want others to discover them”
![Page 22: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/22.jpg)
FAQs and feedback channels
#1 reason for not publishing data:
“There are errors in the data and I do want others to discover them”
![Page 23: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/23.jpg)
FAQs and feedback channels
![Page 24: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/24.jpg)
FAQs and feedback channels
![Page 25: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/25.jpg)
Publishing for Computers
1. Simple formats
2. Indexes, unique IDs and meta-data
3. FAQs and feedback channels
![Page 26: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/26.jpg)
Computers
• Structure
Humans
• Understand and use
![Page 27: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/27.jpg)
Publishing for Humans
1. Search / Discovery
2. Visualization
3. Download
![Page 28: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/28.jpg)
Search / Discovery
• Requirements differ from web/text search
  • A lot less textual content to base search on
  • Synonyms, dictionaries, autocomplete
  • But (hopefully) good meta-data = facets and filtering
• Give people ways to browse
  • Categories vs. tags vs. search
  • Serendipity: Random, related, interesting...
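As a rough sketch of "good meta-data = facets and filtering": when each catalogue entry carries structured meta-data, facet counts and filters fall out of a few lines of code, even though there is little free text to search. The catalogue entries and the field names (`category`, `frequency`) are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical catalogue entries; only the meta-data is shown.
CATALOGUE: List[Dict[str, str]] = [
    {"id": "ds-001", "title": "Population by region", "category": "Demographics", "frequency": "annual"},
    {"id": "ds-002", "title": "Fish catch by species", "category": "Fisheries", "frequency": "monthly"},
    {"id": "ds-003", "title": "Births and deaths", "category": "Demographics", "frequency": "monthly"},
]


def facet_counts(entries: List[Dict[str, str]], field: str) -> Counter:
    """Count how many entries fall under each value of a meta-data field."""
    return Counter(entry[field] for entry in entries)


def filter_by(entries: List[Dict[str, str]], **criteria: str) -> List[Dict[str, str]]:
    """Keep only the entries whose meta-data matches every given criterion."""
    return [
        entry for entry in entries
        if all(entry.get(field) == value for field, value in criteria.items())
    ]


category_facet = facet_counts(CATALOGUE, "category")
monthly_demographics = filter_by(CATALOGUE, category="Demographics", frequency="monthly")

print(category_facet["Demographics"])             # 2
print([e["id"] for e in monthly_demographics])    # ['ds-003']
```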
![Page 29: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/29.jpg)
Search / Discovery
![Page 30: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/30.jpg)
Visualize
![Page 31: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/31.jpg)
![Page 32: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/32.jpg)
109 columns × 340 lines = 37,060 cells
![Page 33: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/33.jpg)
![Page 34: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/34.jpg)
![Page 35: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/35.jpg)
![Page 36: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/36.jpg)
Visualize
• What you should offer depends on the data
• Statistical data
  • Focus on the most common charts and get them right
  • Do NOT invent new visualizations or chart types
• Use standards-compatible technologies
  • No Flash!
  • Charting and visualization libraries
![Page 37: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/37.jpg)
Visualize
![Page 38: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/38.jpg)
Visualize
![Page 39: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/39.jpg)
Download
• Make it easy to use your data outside your tools
• Play nicely with those providing functionality beyond what you can offer: Tableau, R, SAS, MATLAB, Mathematica, SPSS, ...
• Provide downloads in the formats most commonly used by your users:
  • Raw data: Excel, CSV, feeds (R, Excel live feeds, APIs)
  • Charts and visualizations: Bitmap, vector, PPT, embeds?
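A minimal sketch of a raw-data CSV download, using Python's standard `csv` module so that quoting is handled correctly and the file opens cleanly in Excel, R, and similar tools. The function name `to_csv`, the column names, and the sample rows are assumptions for illustration.

```python
import csv
import io
from typing import Iterable, List, Sequence


def to_csv(columns: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Serialize a header and data rows to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # default dialect: comma-separated, minimal quoting
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data; note the comma inside the second column title.
columns = ["year", "catch, all species (tonnes)"]
rows: List[Sequence[object]] = [[2010, 10], [2011, 12]]

csv_text = to_csv(columns, rows)

# Reading the text back shows the title containing a comma survived intact.
parsed = list(csv.reader(io.StringIO(csv_text)))
print(parsed[0])
# ['year', 'catch, all species (tonnes)']
```

Using a library writer rather than joining strings with commas is the design choice here: values containing commas, quotes, or line breaks are escaped for you.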
![Page 40: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/40.jpg)
Computers
• Structure
  • Simple formats
  • Indexes, unique IDs and meta-data
  • FAQs and feedback channels
Humans
• Understand and use
  • Search / Discovery
  • Visualization
  • Download
![Page 41: Strata NY: Best Practices for Publishing Data](https://reader033.fdocuments.us/reader033/viewer/2022051514/54b57c984a7959fd0c8b4581/html5/thumbnails/41.jpg)
FIND AND UNDERSTAND DATA
Twitter: @datamarket · Facebook: DataMarket · E-mail: [email protected]
Hjalmar Gislason, founder & CEO