(PROJEKTURA) Big Data Open Data story for TGG

OPEN DATA, BIG DATA STORY Ratko Mutavdzic, PROJEKTURA, Experience Architect


OPEN DATA VS. BIG DATA

THE NEW REALITY, BUT TWO DIFFERENT THINGS

• Open Data ≠ Big Data (usually)
• Big Data ≠ Open Data (usually)
• Open Data could grow into Big Data
• Big Data = BUSINESS, THEN TRANSPARENCY (Business Conversation)
• Open Data = TRANSPARENCY, THEN BUSINESS (Government Conversation)

OPEN DATA

THE NEW REALITY: 2006 EC MEPSIR Study

International Right-To-Know Goal

“to raise global awareness of individuals’ right to access government information”, and

“to promote access to information as a fundamental human right”

WHY OPEN DATA

TRANSPARENCY, PARTICIPATION, COLLABORATION

WHAT MAKES DATA OPEN

the right information is available for people to make the right decisions... at all levels of the organization

• open format
  • published via industry standards like XML, RDF, HTML, CSV for data, PDF for documents
• metadata
  • published via standards like Dublin Core
• catalogue of open data sources
  • http://logd.tw.rpi.edu/
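To make the metadata bullet concrete, here is a minimal sketch that serializes a dataset description using the 15 Dublin Core Elements 1.1 with Python's standard library. The dataset values and the `metadata` wrapper element are hypothetical; a real catalogue would embed such a record in its own publishing format.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements defined by Dublin Core 1.1
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

def dublin_core_record(fields: dict[str, str]) -> str:
    """Serialize element-name -> value pairs as a small Dublin Core XML record."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical description of an open dataset
record = dublin_core_record({
    "title": "Air quality measurements",
    "publisher": "Example City Environment Office",
    "format": "text/csv",
    "rights": "CC-BY-4.0",
})
print(record)
```

Rejecting unknown element names keeps the record within the shared standard, which is the point of publishing metadata "via standards".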

PUBLIC DATA, OPEN FORMATS, MACHINE READABLE, ACCESSIBLE

PUBLIC INFORMATION POOL

Public Information / Content Pool (source: adapted from OECD, 2006)

CATEGORY: Public Sector Information
• OBJECTIVE: information re-use
• EXAMPLE: geo data, statistical data, other numbers
• CHARACTERISTICS: transformation of raw data by value addition; frequent combination of information types

CATEGORY: Public Sector Content
• OBJECTIVE: content availability
• EXAMPLE: geo data, statistical data, other numbers
• CHARACTERISTICS: education and cultural value; limited commercial exploitation; content not transformed

KEY COMPONENTS

WHAT IS AN OPEN DATA SOLUTION: few words of wisdom

• Open Data Portal
• Open Data API

CKAN, Cloud

Most respected Open Data implementation

OPEN DATA IN THE U.K.: open.data.gov.uk

OPEN LINKED DATA

AGENDA

(LINKED) (OPEN) DATA: few words of wisdom

• What is Open (Linked) Data?
• Linked Data Standards and Tools
• Linked Open Data in Practice

PUBLIC DATA vs. OPEN DATA


• difficult to find
• difficult to reuse
• difficult to integrate

WHAT IS OPEN DATA?

WHAT IS LINKED OPEN DATA?


OSIJEK
has a state administration
has employment
has finance
has an entertainment life
has sports events
has a university

LINKED OPEN DATA

2 KEY INGREDIENTS tbd

Facilitating data integration through:
• Common data model
• Building relations

KEY INGREDIENTS


1. RDF: RESOURCE DESCRIPTION FRAMEWORK (GRAPH-BASED DATA)
• identifies objects (URIs)
• interlinks information (relationships)

2. VOCABULARIES (ONTOLOGIES)
• provide a shared understanding of data
• organize knowledge in a machine-comprehensible way
• give an exploitable meaning to the data
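A minimal sketch of how a shared vocabulary gives data a common meaning: two hypothetical publishers use different local column names, and mapping both to Dublin Core Terms property URIs (`title`, `creator`) makes their records directly comparable. The source names and columns (`naslov`, `autor` are Croatian for title and author) are illustrative only.

```python
# Shared vocabulary: Dublin Core Terms property URIs
DCTERMS = "http://purl.org/dc/terms/"

# Each (hypothetical) publisher maps its own column names to the shared terms
MAPPINGS = {
    "library_a": {"TITLE": DCTERMS + "title", "AUTHOR": DCTERMS + "creator"},
    "library_b": {"naslov": DCTERMS + "title", "autor": DCTERMS + "creator"},
}

def to_shared(source: str, record: dict[str, str]) -> dict[str, str]:
    """Rename a record's fields to shared vocabulary URIs, dropping unmapped fields."""
    mapping = MAPPINGS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

a = to_shared("library_a", {"TITLE": "Practical RDF", "AUTHOR": "David Nelson Jr."})
b = to_shared("library_b", {"naslov": "Practical RDF", "autor": "David Nelson Jr."})
print(a == b)  # True: same meaning once both use the shared vocabulary
```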

LINKED OPEN DATA

5-STAR OPEN DATA MODEL: Tim Berners-Lee, Linked Data initiative

1. make your stuff available on the Web (whatever format) under an open licence
2. make it available as structured data (e.g. Excel instead of an image scan of a table)
3. use a non-proprietary format (e.g. CSV instead of Excel)
4. use URIs to denote things, so that people can point at your stuff
5. link your data to other data to provide context

http://lab.linkeddata.deri.ie/2010/star-scheme-by-example
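The five stars are cumulative: each one assumes the previous ones. A minimal sketch of a self-check, assuming a publisher records each criterion as a yes/no flag (the `Dataset` fields and `stars` function are hypothetical, not part of any official tool):

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    on_web: bool           # published on the Web
    open_license: bool     # under an open licence
    structured: bool       # machine-readable structure, e.g. a spreadsheet, not a scan
    non_proprietary: bool  # e.g. CSV instead of XLSX
    uses_uris: bool        # things are identified with URIs
    linked: bool           # links to other data for context

def stars(d: Dataset) -> int:
    """Count stars cumulatively: each star requires all the previous ones."""
    criteria = [
        d.on_web and d.open_license,
        d.structured,
        d.non_proprietary,
        d.uses_uris,
        d.linked,
    ]
    count = 0
    for met in criteria:
        if not met:
            break
        count += 1
    return count

# An openly licensed Excel file on the Web
print(stars(Dataset(True, True, True, False, False, False)))  # 2
```

Stopping at the first unmet criterion mirrors the model: linked data without an open licence still earns zero stars.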

ON THE WEB, OPEN LICENSE

1 STAR

• ON THE WEB
  • wide access
  • Google can index it
  • people can find it themselves

• OPEN LICENCE
  • regulates reuse of data
  • helps maintain provenance
  • strengthens business reuse

http://opendefinition.org/licenses/

STRUCTURED DATA

2 STAR

• MACHINE READABLE

FORMATS

2 STAR

• GOOD: XLSX, CSV, JSON, MICRODATA
• “GOOD”: WEB, DOCX
• BAD: PDF
• BAD, BAD: charts, maps, images

• SCREEN SCRAPING? http://scraperwiki.com


NON-PROPRIETARY FORMATS

3 STAR

• Freedom in how to process, analyse and visualise data
• PROPRIETARY: DOCX, XLSX, PDF
• NON-PROPRIETARY: CSV, XML, JSON, MICRODATA, RDF
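The "freedom to process" argument in practice: a CSV file can be read and re-published with nothing but a language's standard library. A minimal sketch using the rows of the book table that appears later in this talk (the lowercase column names are an assumption):

```python
import csv
import io
import json

# A CSV extract of the book table
CSV_TEXT = """isbn,title,author,publisher_id,pages
112349987,Practical RDF,David Nelson Jr.,11692,443
234998021,C# for Dummies,Rick Torrensen,11692,1120
501334301,Calling the Stack,Shelly Monroe,45009,128
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
for row in rows:
    row["pages"] = int(row["pages"])  # CSV carries no types, so convert explicitly

print(json.dumps(rows[1]))             # the same record, re-published as JSON
print(sum(r["pages"] for r in rows))   # 1691
```

The trade-off shows in the `int` conversion: CSV is open and simple, but the consumer must know which columns are numbers.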


USE OF URI

4 STAR

• Unique identifiers enable others to point to the data
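A minimal sketch of minting HTTP URIs for things, assuming the publisher controls a namespace such as the hypothetical `http://data.example.org/`. The key should be stable (an ISBN rather than a row number) so the URI keeps pointing at the same thing.

```python
from urllib.parse import quote

# Hypothetical namespace controlled by the data publisher
BASE = "http://data.example.org/"

def mint_uri(kind: str, key: str) -> str:
    """Build a stable HTTP URI for a thing from its type and a local key."""
    return f"{BASE}{quote(kind, safe='')}/{quote(key, safe='')}"

print(mint_uri("book", "234998021"))         # http://data.example.org/book/234998021
print(mint_uri("author", "Rick Torrensen"))  # http://data.example.org/author/Rick%20Torrensen
```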


LINKING DATA (AND RDF)

5 STAR: Link your data to other data to provide context

http://lod-cloud.net

• The “Linked Data” approach has its use cases in web applications with a LOT of data and little semantics

• Example: define a simple relationship and apply it to large, heterogeneous data collections
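A minimal sketch of that example: one simple relationship (a hypothetical `locatedIn` property) applied across two differently shaped sources, so both end up pointing at the same city URI. All URIs, field names and records here are illustrative assumptions.

```python
# Hypothetical shared URI for a city, plus a lookup from names to URIs
CITIES = {"Osijek": "http://data.example.org/city/Osijek"}
LOCATED_IN = "http://data.example.org/vocab/locatedIn"  # hypothetical property

# Two heterogeneous sources: different shapes, different field names
jobs = [{"employer": "University", "town": "Osijek"}, {"employer": "Shop", "town": "Unknown"}]
events = [{"name": "Football match", "venue_city": "Osijek"}]

def link(source: str, records: list[dict], field: str) -> list[tuple[str, str, str]]:
    """Apply one simple relationship to every record whose city is known."""
    triples = []
    for i, record in enumerate(records):
        city_uri = CITIES.get(record.get(field, ""))
        if city_uri is not None:
            subject = f"http://data.example.org/{source}/{i}"
            triples.append((subject, LOCATED_IN, city_uri))
    return triples

links = link("jobs", jobs, "town") + link("events", events, "venue_city")
for triple in links:
    print(triple)
```

Records whose city is not recognised are simply left unlinked rather than guessed.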

RESOURCE DESCRIPTION FRAMEWORK

Part of the 5 STAR story


• The Web is a global, universal information space for documents
• Can we do the same for DATA and make the web into a database?
• RDF is the DATA FORMAT for that database

RDF 101: small pieces, loosely joined, easy to reuse, easy to recombine, unexpected reuse, iterative

TYPICAL DATABASE TABLE


ISBN | TITLE | AUTHOR | PUBLISHERID | PAGES
112349987 | Practical RDF | David Nelson Jr. | 11692 | 443
234998021 | C# for Dummies | Rick Torrensen | 11692 | 1120
501334301 | Calling the Stack | Shelly Monroe | 45009 | 128
... | ... | ... | ... | ...


Columns are properties, rows are subjects: each intersection is a property of the subject.

LINKING DATA

book --title--> “C# for Dummies”
(subject) (property) (value)

The essence of RDF: the “TRIPLE”
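The row/column reading of the book table translates mechanically into triples. A minimal sketch, assuming a hypothetical namespace for book and property URIs; each row becomes a subject, each column a property, each cell a value:

```python
BASE = "http://data.example.org/"  # hypothetical namespace

HEADER = ["ISBN", "TITLE", "AUTHOR", "PUBLISHERID", "PAGES"]
ROWS = [
    ["112349987", "Practical RDF", "David Nelson Jr.", "11692", "443"],
    ["234998021", "C# for Dummies", "Rick Torrensen", "11692", "1120"],
    ["501334301", "Calling the Stack", "Shelly Monroe", "45009", "128"],
]

def table_to_triples(header, rows, key_col=0):
    """Each row becomes a subject, each column a property, each cell a value."""
    triples = []
    for row in rows:
        subject = f"{BASE}book/{row[key_col]}"
        for column, value in zip(header, row):
            prop = f"{BASE}vocab/{column.lower()}"
            triples.append((subject, prop, value))
    return triples

triples = table_to_triples(HEADER, ROWS)
print(len(triples))  # 15: 3 rows x 5 columns
print(triples[6])    # the TITLE cell of the second row
```

A real conversion would also pick established vocabulary terms instead of inventing `vocab/...` properties.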

TYPICAL DATABASE TABLE

SELECTING MULTIPLE PROPERTIES


LINKING DATA

book --title--> “C# for Dummies”
book --isbn--> 2349908
book --author--> “Rick Torrensen”
book --publisher--> publisher --name--> “Amazon”

Multiple properties graphically: think in terms of graphs, not XML or documents.

Relationship between “things”
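Because a value can itself be a resource with its own properties, questions become walks through the graph. A minimal sketch with plain Python sets; the book and publisher URIs are hypothetical (the publisher key reuses PUBLISHERID 11692 from the table), and the property names follow the diagram:

```python
# Hypothetical URIs for the nodes in the diagram
BOOK = "http://data.example.org/book/234998021"
PUBLISHER = "http://data.example.org/publisher/11692"

GRAPH = {
    (BOOK, "title", "C# for Dummies"),
    (BOOK, "author", "Rick Torrensen"),
    (BOOK, "publisher", PUBLISHER),  # the value is itself a resource...
    (PUBLISHER, "name", "Amazon"),   # ...so it can have properties of its own
}

def follow(graph, start: str, *path: str) -> set[str]:
    """Walk a path of properties from a start node, returning the values reached."""
    nodes = {start}
    for prop in path:
        nodes = {o for (s, p, o) in graph if s in nodes and p == prop}
    return nodes

print(follow(GRAPH, BOOK, "publisher", "name"))  # {'Amazon'}
```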

USING THE WEB INFRASTRUCTURE


• For a Web-scale database we need to be able to identify things globally and uniquely

• URIs (URLs) already provide those capabilities
• Name things with URIs, specifically http:// URIs
• This is THE KEY to linked data

RDF IN PRACTICE

named resources: http://example.com/thing, http://example.com/other
named relations: http://example.com/rel
numeric values and literals: “text”, 3.141592

• The URI identifies the thing you are describing
• If two people create data using the same URI, then they are describing the same thing
• That makes it easy to merge data from different sources together
• RDF data can use URIs from many different websites
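Why shared URIs make merging easy: if triples are treated as a set, combining two sources is a set union, and identical statements collapse automatically. A minimal sketch reusing the example URIs above; the triple `"more text"` and the split into two sources are hypothetical.

```python
THING = "http://example.com/thing"
REL = "http://example.com/rel"
OTHER = "http://example.com/other"

# Two hypothetical publishers describe the same URI independently
source_a = {(THING, REL, "text"), (THING, REL, OTHER)}
source_b = {(THING, REL, "text"), (THING, REL, 3.141592), (OTHER, REL, "more text")}

# Both use the same URI, so merging is just a set union of triples
merged = source_a | source_b

def describe(graph, subject):
    """All (property, value) pairs known about a subject after the merge."""
    return {(p, o) for (s, p, o) in graph if s == subject}

print(len(merged))                   # 4: the shared "text" triple appears once
print(len(describe(merged, THING)))  # 3
```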

Cloud

• Monitors air and water quality
• Citizens rate quality via SMS

OPEN LINKED DATA IN THE U.K.: open.data.gov.uk

LINKED DATA STANDARDS

Government Linked Data (GLD) WG http://www.w3.org/2011/gld/

SPARQL

Common understanding about “things” supports the automatic generation of new information

• Query language of the semantic web. It lets us:
  • Pull values from STRUCTURED and SEMI-STRUCTURED data
  • Explore data by querying UNKNOWN RELATIONSHIPS
  • Perform COMPLEX JOINS OF DISPARATE DATABASES
  • Transform RDF from one vocabulary to another

SPARQL


# prefix declarations
PREFIX foo: <http://example.com/resources/>
...
# result clause
SELECT ...
# dataset definition
FROM ...
# query pattern
WHERE {
  ...
}
# query modifiers
ORDER BY ...
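What the WHERE clause does, in miniature: each triple pattern is matched against the data, and variables shared between patterns act as joins. The sketch below is a toy evaluator for basic graph patterns only (no FILTER, OPTIONAL or ORDER BY); the data reuses the book table under the `foo:` namespace from the skeleton, and `match`/`select` are hypothetical helpers. A real deployment would use a SPARQL engine or triple store.

```python
from typing import Optional

Triple = tuple[str, str, str]
EX = "http://example.com/resources/"  # the foo: prefix from the skeleton

GRAPH: list[Triple] = [
    (EX + "book1", EX + "title", "Practical RDF"),
    (EX + "book1", EX + "publisherid", "11692"),
    (EX + "book2", EX + "title", "C# for Dummies"),
    (EX + "book2", EX + "publisherid", "11692"),
    (EX + "book3", EX + "title", "Calling the Stack"),
    (EX + "book3", EX + "publisherid", "45009"),
]

def match(pattern: Triple, triple: Triple, binding: dict) -> Optional[dict]:
    """Extend binding so pattern matches triple; terms starting with '?' are variables."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.get(term, value) != value:
                return None  # variable already bound to something else
            extended[term] = value
        elif term != value:
            return None      # constant term does not match
    return extended

def select(graph: list[Triple], where: list[Triple]) -> list[dict]:
    """Evaluate a basic graph pattern: every binding that satisfies all patterns."""
    bindings = [{}]
    for pattern in where:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

# Roughly: SELECT ?title WHERE { ?book foo:publisherid "11692" . ?book foo:title ?title }
rows = select(GRAPH, [("?book", EX + "publisherid", "11692"), ("?book", EX + "title", "?title")])
print(sorted(r["?title"] for r in rows))  # ['C# for Dummies', 'Practical RDF']
```

The shared `?book` variable is what turns two independent lookups into a join.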

SO, WHAT DO WE DO WITH THE DATA?

HACKATHONS: DATA + APPLICATIONS!

BE SURE THAT YOU HAVE AN APP-BUILDING PROCESS… with paid teams, unpaid volunteers, hackathons, open data camps, student competitions

• value is not in the raw data alone (but the data needs to be published first!)

• applications that use the data are key to open data success

• the size of an application does not guarantee its value or success

• value to the Citizen is the bottom line

MANY FORMS, SAME PURPOSE

FOR EXAMPLE, INVOLVE… HACKATHONS: a strange word for a noble cause. Building the future together in… 48 hours.

PROTOTYPES

OR RESULTS COMING FROM HACKATHONS, but also from many different scenarios of organization and citizen engagement resulting in apps

OPEN GOVERNMENT: PARTICIPATION

OPEN DATA ARCHITECTURE

ARCHITECTURE FOR OPEN: HYBRID?

Department B

Internal Private PORTAL

Published Data

Department A

Internal Network

Model that controls sensitive data and supports external scalability and availability. Bridge to PUBLIC. Keywords: Public and Private Cloud, Provider Datacenter, SLA.

Agency

External Public PORTAL

Published Data

is using

is publishing

Everybody

External Network
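One way the bridge between the internal and the public portal can "control sensitive data" is an allow-list applied at publishing time. A minimal sketch; the field names and the `publish` helper are hypothetical, not part of any portal product:

```python
# Hypothetical allow-list: only these fields may leave the internal network
PUBLIC_FIELDS = ("station", "date", "pm10")

def publish(internal_records: list[dict]) -> list[dict]:
    """Copy only allow-listed fields into the dataset for the external portal."""
    return [
        {field: record[field] for field in PUBLIC_FIELDS if field in record}
        for record in internal_records
    ]

internal = [
    {"station": "S1", "date": "2013-05-01", "pm10": 41, "internal_note": "do not publish"},
]
print(publish(internal))  # [{'station': 'S1', 'date': '2013-05-01', 'pm10': 41}]
```

An allow-list is chosen over a deny-list so that a column newly added inside the organization stays private until someone explicitly decides to publish it.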

CURRENT VIEW ON OPTIONS

CKAN ?

Private Cloud

LINUX VM, MS VM

Public Cloud (MS AZURE) | Public Cloud (non-MS)

public Azure infrastructure | public cloud infrastructure

PaaS | IaaS (LINUX VM)

CKAN, SOCRATA, OGDI

IaaS (LINUX VM)

CKAN, SOCRATA

private infrastructure

private infrastructure that can be built on a Microsoft or Linux-based stack

public Microsoft Azure infrastructure supporting “pure play” PaaS solutions and VM-based solutions (MS and non-MS)

public cloud infrastructure, non-Microsoft (usually AWS or OpenStack or …)

solutions on Linux

solutions on MS

CKAN

OPEN SOURCE DATA PORTAL SOFTWARE: CKAN is open source and can be downloaded and used for free

• fully featured, mature, open source data management solution:
  • publish and find datasets
  • store and manage data
  • engage with users and others
  • customize and extend

• rich user base: data.gov.uk, publicdata.eu,…
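"Publish and find datasets" is also available to programs through CKAN's Action API. A minimal sketch that builds a `package_search` URL (API version 3) and picks the CSV download links out of a response. The portal address and the sample response are made up, trimmed to the documented shape (`success`, `result.results[].resources[]`); fetching the URL for real would use e.g. `urllib.request.urlopen` plus `json.load`.

```python
from urllib.parse import urlencode

def package_search_url(portal: str, query: str, rows: int = 10) -> str:
    """URL for the CKAN Action API package_search call (API version 3)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal.rstrip('/')}/api/3/action/package_search?{params}"

def csv_urls(response: dict) -> list[str]:
    """Collect download URLs of CSV resources from a package_search response."""
    if not response.get("success"):
        raise RuntimeError("CKAN reported an error")
    urls = []
    for package in response["result"]["results"]:
        for resource in package.get("resources", []):
            if resource.get("format", "").upper() == "CSV":
                urls.append(resource["url"])
    return urls

# Hypothetical portal and a made-up response of the documented shape
print(package_search_url("https://ckan.example.org/", "air quality", rows=5))
sample = {
    "success": True,
    "result": {"count": 1, "results": [{
        "name": "air-quality",
        "resources": [
            {"format": "CSV", "url": "https://ckan.example.org/air.csv"},
            {"format": "PDF", "url": "https://ckan.example.org/report.pdf"},
        ],
    }]},
}
print(csv_urls(sample))  # ['https://ckan.example.org/air.csv']
```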

BIG DATA

CIO

Source: Forrester, “Evaluating Big Data Predictive Analytics Solutions”, 2012

“Big Data is the frontier of a firm’s ability to store, process, and access (SPA) all of the data it needs to operate, make decisions, reduce risks, and serve customers.”

“Predictive analytics solutions allow firms to discover, evaluate, optimize, and deploy predictive models by analyzing data sources to improve business outcomes.”

Internet of Things, Internet of Everything

source: “US Unprepared for Internet Device Flood”, Kurt Stammberger, MOCANA

Examples of New Multi-Structured Data: data types collected as big data and/or with advanced analytics (source: “Big Data Analytics”, survey of 325 companies, TDWI 2011)

• Structured (tables, records)
• Semistructured (XML and similar)
• Complex (hierarchical or legacy)
• Events (messages, usually in real time)
• Unstructured (text, audio, video)
• Social media (blogs, tweets, social networks)
• Web logs and clickstreams
• Spatial (long/lat coordinates, GPS output)
• Machine-generated (sensors, RFID, devices)
• Scientific (astronomy, genomes, physics)
• Other

Key Elements of Big Data (source: “Understanding the Elements of Big Data”, Karmasphere, 2011)

[Figure: DATA (Structured, Unstructured); BIG DATA (Hadoop, Data Management and Storage); BIG ANALYTICS (Analytics Development, Analytics and Use); DATA USE (BI and Visualization, Applications)]

Where is Big Data coming from... today

• twitter: 12+ TB of tweet data every day
• facebook: 25+ TB of log data every day
• google: XX+ TB of search logs every day
• people: 2+ billion people on the web today

• RFID: 30 billion RFID tags today
• smart meters: 76 million smart meters today
• GPS devices: 100+ million GPS devices per year
• trade: 5 million trade events per second

• phone cams: 4.6 billion cams today
• cameras: 100+ thousand video feeds from surveillance
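Daily volumes are easier to compare with system capacity when expressed per second. A small worked conversion of the figures above, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def tb_per_day_to_mb_per_s(tb_per_day: float) -> float:
    """Convert decimal terabytes per day to decimal megabytes per second."""
    return tb_per_day * 10**12 / SECONDS_PER_DAY / 10**6

def per_second_to_per_day(events_per_second: float) -> float:
    """Scale a per-second event rate up to a full day."""
    return events_per_second * SECONDS_PER_DAY

print(round(tb_per_day_to_mb_per_s(12)))  # twitter: 139 MB/s, around the clock
print(round(tb_per_day_to_mb_per_s(25)))  # facebook: 289 MB/s
print(per_second_to_per_day(5_000_000))   # trade: 432000000000 events per day
```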

Two types of Big Data:
• Data in Motion (Streams)
• Data at Rest (Oceans)
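The two types call for different processing styles. A minimal sketch contrasting them on the same hypothetical sensor readings: for data in motion, an incremental mean is updated as each value arrives and nothing is stored; for data at rest, the whole dataset is available and is processed afterwards.

```python
from typing import Iterable, Iterator

def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Data in motion: emit an updated mean after every reading, keeping O(1) state."""
    count, mean = 0, 0.0
    for value in stream:
        count += 1
        mean += (value - mean) / count  # incremental update, no history kept
        yield mean

def batch_mean(stored: list[float]) -> float:
    """Data at rest: the full dataset is stored, so compute over all of it at once."""
    return sum(stored) / len(stored)

readings = [10.0, 12.0, 14.0]  # hypothetical sensor readings
print(list(running_mean(readings)))  # [10.0, 11.0, 12.0]
print(batch_mean(readings))          # 12.0
```

Both reach the same final answer; the streaming version also has an answer after every reading, which is what real-time reaction needs.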

New era of computing

New Questions

VOLUME (SIZE). SOCIAL ANALYTICS: What’s the social sentiment of my product?

VELOCITY (SPEED). LIVE DATA FEED: How do I optimize my services based on patterns of weather, traffic, etc.?

VARIETY (STRUCTURE). ADVANCED ANALYTICS: How do I better predict future outcomes?

... so this is not our ordinary enterprise environment? Well...

So What? Well... Big Data for Finance. Social Media: Trustworthy Borrowers vs. Defaulters

They all use a Big Data approach and combine it with “scoring as a service” mechanisms, like...

KREDITECH
• Looks at 8,000 indicators like location data, social graph, behavioral analytics, e-commerce shopping behavior and device data...
• So: GPS, likes, friends, locations, posts, movement, duration on page, shopping, apps installed, operating system...

ZESTFINANCE: credit scoring via big data; looks at 70,000 signals and feeds them into 10 separate underwriting models

KLOUT

LENDDO https://www.lenddo.com/ looks at the applicant’s connections on Facebook and Twitter. Key to getting the loan: highly trusted individuals in your social network.

LENDUP https://www.lendup.com/ looks at social media activity to ensure that the factual data provided on the online application matches what can be inferred from Facebook and Twitter.

WONGA https://www.wonga.com/ considers the time of day and the way a candidate clicks around the site in determining whether to grant a loan.

So What? Well... Big Data for Telcos

• Two different strategies for growth and mature markets:
  • Growth: acquisition strategy, simple BI needs (reporting)
  • Mature: differentiation strategy, complex BI needs (data mining)

• Classical Big Data problems: Churn Management (on prepaid)
  • When to engage (mid billing cycle) www.globys.com
  • When not to engage (leave good customers alone) www.venda.com

• How can telcos invent business models?
  • IMPROVE SERVICES: Data = Improved Business (Amazon)
  • MOBILE ADVERTISING: Data = Better Advertising (Google)
  • SELL ACCESS TO INSIGHTS: Data = Business (comScore)
  • BECOME GATEKEEPER: Data = Personal Risk (www.reputation.com)

Telco: competition in a mature market

Early operator initiatives will still involve a strong element of traditional business intelligence and analytics. Structured records: Call and Billing Records, Electronic Data Records, Location Records...

Unstructured: Phone Calls, Text Messages, Social Media posts...

OPEN DATA + BIG DATA?

IMAGINE THE WORLD... where you don’t have control over the things that happen around you.

• OPEN DATA: You can fetch and use any data that exists around you. You can connect that data to any other source and personalize its use.

• BIG DATA: You can fetch and use any volume of data that is flowing from devices around you and from your own usage.

• OPEN BIG DATA: WE CAN PREDICT AND REACT TO ANY ACTION IMMEDIATELY

INTRO: just a few words about me

so, if we all nod our heads... we can continue...

• Ratko Mutavdzic is the founder of PROJEKTURA, a consulting company that works with new and emerging technologies and introduces them to corporate and enterprise environments. Prior to this, he spent 15 years at Microsoft, starting in the consulting practice and then leading several different sales and technology teams.

• He is the author of a number of published papers on different aspects of technology and of successful blogs on new technologies and project management, and an active contributor to a number of social networks exploring the use and advance of new ways to connect and share innovation and invention.

• He frequently speaks at conferences, meetings, workshops, coffee shops and generally at every place where people like to explore, challenge, investigate, think and innovate.

• Keywords: change, project, program, portfolio, innovation, startup

note: more contact info on the last slide