Geospatial Intelligence Middle East 2013_Big Data_Steven Ramage

Steven Ramage Head of Ordnance Survey International Big Data Considerations Geospatial Intelligence Middle East, May 2013

description

Some initial considerations and discussion points on geospatial big data. Location adds context and relevance. We need to consider a number of "V" factors, including Value.

Transcript of Geospatial Intelligence Middle East 2013_Big Data_Steven Ramage

Page 1: Geospatial Intelligence Middle East 2013_Big Data_Steven Ramage

Steven Ramage

Head of Ordnance Survey International

Big Data Considerations

Geospatial Intelligence Middle East,

May 2013

Page 2

Geospatial Intelligence Middle East 2013

Recently the Military GIS and Intelligence communities have gained a better understanding of the incredible increase in “Cloud”-empowered applications, the challenges and opportunities of Big Data, the importance of social media, the availability of improved applications, and the dramatic improvement in the quality and availability of remote sensing data. This, together with the increased speed of GIS applications and the integration of a full-motion video analysis product, empowers military forces and national security agencies to exploit and analyze full-motion video from UAVs and other airborne vehicles.

http://tinyurl.com/cd8z6y5

Page 3

http://www.computerweekly.com/feature/Ordnance-Survey-gets-to-grips-with-geospatial-big-data

“Ordnance Survey has all but completed a five-year IT improvement programme to enhance its operations. That programme – with Oracle as the main IT partner – has already transformed those operations into an enterprise grid computing system that pulls 17 databases into one Oracle spatial database management platform. The platform supports all geospatial data types and models. The system combines open source Linux with Oracle’s grid computing architecture, which makes it possible to coordinate large numbers of low-cost servers and corresponding storage so they operate like one large computer.”

Page 4

• Big thinking (value)

• Big strategy (necessary)

• Big governance (stewardship)

• Big access (sharing)

• Big cooperation (supply chain)

• Big privacy (security)

• Big quality (QA/QC)

• Big people (skills training)

Big Data challenges

Page 5

What is big data?


Page 6

Source: Russell Jurney, Hortonworks blog, 4 April 2013: http://hortonworks.com/blog/big-data-defined/

• Wikipedia defines it as the problems posed by the awkwardness of legacy tools in supporting massive datasets. But what is a massive dataset? Megabytes? Yottabytes?

• A collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

• There is a ‘Big Data’ opportunity: transformative economics. Big Data is the opportunity space created by new open source, distributed systems from the consumer internet space.

Page 7

The big data environment

• Volume: data at rest; levels increasing

• Velocity: data in motion; the speed at which it transits enterprises and entire industries is faster than ever

• Variety: data in many forms; hundreds of millions of web pages, emails and unstructured data, such as Word documents and PDFs, as well as a nearly infinite number of events and information from every enterprise data centre

• Value: do you need it?

Page 8

Facebook now has 50 billion photographs

Page 9

• It uses local storage to be fast but inexpensive

• It uses clusters of commodity hardware to be inexpensive

• It uses free software to be inexpensive

• It is open source to build from community learning

• Cheap storage means logging enormous volumes of data to many disks is easy. Processing this data is less so. Distributed systems that have the above four properties are disruptive because they are approximately 100 times cheaper than other systems for processing large volumes of data, and because they deliver high I/O performance.

The big data environment
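The distributed-processing idea above is the pattern that Apache Hadoop implements as MapReduce. Each machine works on the data held on its own local disk, and the partial results are then combined. The following is a minimal plain-Python sketch of the map, shuffle and reduce phases, counting geotagged points per grid cell. It does not use Hadoop itself, and the partitions, coordinates and 1-degree cell size are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: each inner list stands in for the records held on one
# node's local disk in a cluster of commodity machines.
partitions = [
    [(51.50, -0.12), (51.51, -0.13), (53.48, -2.24)],
    [(51.52, -0.10), (55.95, -3.19)],
    [(53.47, -2.25), (55.94, -3.20), (51.50, -0.11)],
]

CELL_DEG = 1.0  # assumed grid cell size in degrees


def map_phase(records):
    """Emit (cell, 1) for every point; runs independently on each partition."""
    for lat, lon in records:
        cell = (int(lat // CELL_DEG), int(lon // CELL_DEG))
        yield cell, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts emitted for each cell."""
    return {cell: sum(values) for cell, values in groups.items()}


mapped = chain.from_iterable(map_phase(p) for p in partitions)
counts = reduce_phase(shuffle(mapped))
for cell, n in sorted(counts.items()):
    print(cell, n)
```

In a real cluster the map calls would run in parallel on the machines holding each partition, and only the small intermediate pairs would cross the network.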

Page 10

• Apache Hadoop is one such system. Hadoop ties together a cluster of commodity machines with local storage, using free and open source software to store and process vast amounts of data at a fraction of the cost of other systems [Example: Esri/spatial-framework-for-hadoop, GitHub: social network for programmers]

• SAN storage: $2-10/GB; local storage: $0.05/GB

The big data environment
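As a rough check on the storage prices quoted on this slide, the sketch below prices a hypothetical 100 TB dataset at both rates. The dataset size is an assumption. Only the per-GB prices come from the slide.

```python
# Indicative per-GB prices from the slide (2013, US dollars).
SAN_PER_GB_LOW = 2.00
SAN_PER_GB_HIGH = 10.00
LOCAL_PER_GB = 0.05

# Hypothetical dataset: 100 TB, using decimal units (1 TB = 1000 GB).
dataset_gb = 100 * 1000

local_cost = dataset_gb * LOCAL_PER_GB
san_low = dataset_gb * SAN_PER_GB_LOW
san_high = dataset_gb * SAN_PER_GB_HIGH

# How many times more the SAN option costs at each end of its price range.
ratio_low = san_low / local_cost
ratio_high = san_high / local_cost

print(f"Local storage: ${local_cost:,.0f}")
print(f"SAN storage:   ${san_low:,.0f} to ${san_high:,.0f}")
print(f"SAN is {ratio_low:.0f}x to {ratio_high:.0f}x the cost of local storage")
```

On these prices SAN storage works out at 40 to 200 times the cost of local disk. That range brackets the "approximately 100 times cheaper" figure given for distributed systems.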

Page 11

• Capture every shred of data in the cheapest place possible

• Provide access to this data across the organization

• Mine the data for value

• “To undergo the transformative processes that unabridged access to data provides, enabling bigger, better, faster, more profound insight than ever before”. Blogger

The big data environment

Page 12

• How many of us need to undertake operations that rank every web page that exists?

• What processing tasks cannot be handled on a single computer, or even a laptop? [Megabyte to gigabyte range]

• Weren’t you doing data analysis before data became big?

• Do you have the requirement, or the capability, to check for correlations or patterns that you can act on if you have even more data?

• False positives. In ‘The curse of big data’, Vincent Granville notes that even if a dataset includes 1000 items, there are many millions of possible correlations, and a few will be extremely high purely by chance.

• Getting more into the field of data science (stats, quality, etc.)

Most data isn’t big and businesses are wasting money pretending it is: www.qz.com/81661/most-data
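Granville's point about false positives is easy to reproduce. The sketch below generates purely random, unrelated variables and tests every pair. Some pairs still look strongly correlated. The variable and observation counts are arbitrary assumptions, not figures from his article.

```python
import random
from itertools import combinations
from math import sqrt

random.seed(42)  # fixed seed so the run is repeatable

N_VARIABLES = 200    # hypothetical number of unrelated metrics
N_OBSERVATIONS = 20  # hypothetical number of observations per metric

# Every variable is independent random noise, so no real relationships exist.
data = [[random.gauss(0, 1) for _ in range(N_OBSERVATIONS)]
        for _ in range(N_VARIABLES)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Test every pair of variables: 200 * 199 / 2 = 19,900 comparisons.
correlations = [pearson(x, y) for x, y in combinations(data, 2)]
strongest = max(correlations, key=abs)
strong_count = sum(1 for r in correlations if abs(r) > 0.5)

print(f"pairs tested: {len(correlations)}")
print(f"strongest correlation found: {strongest:.2f}")
print(f"pairs with |r| > 0.5: {strong_count}")
```

Adding more variables increases the number of pairs tested, so the strongest purely chance correlation tends to grow as well.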

Page 13

Mapping the global Twitter heartbeat: the geography of Twitter

http://firstmonday.org/ojs/index.php/fm/article/view/4366/3654

• In 2012, supercomputing manufacturer Silicon Graphics International (SGI), the University of Illinois and social media data vendor GNIP collaborated to create the “Global Twitter Heartbeat” project (http://www.sgi.com/go/twitter) in order to map global emotion expressed on Twitter in real time.

• GNIP provided access to the Twitter Decahose, which consists of 10 percent of all tweets sent globally each day.

• SGI provided access to one of its new UV2000 supercomputers with 256 processors and 4TB of RAM running the Linux operating system.

Page 14

Twitter

From 12:01AM on 23 October 2012 through to 11:59PM on 30 November 2012, the Twitter Decahose from GNIP streamed 1,535,929,521 tweets from 71,273,997 unique users, averaging 38 million tweets from 13.7 million users each day.

Use the location of social media posts for emergency warning, real-time local situation reporting, etc.
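Using the location of posts for situation reporting typically starts with a proximity filter. The following is a minimal sketch that assumes posts already carry WGS84 latitude/longitude, and it uses the haversine great-circle formula. The posts, the incident location (roughly central Southampton) and the 5 km radius are hypothetical.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Post:
    user: str
    text: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def posts_near(posts, lat, lon, radius_km):
    """Return the posts located within radius_km of (lat, lon)."""
    return [p for p in posts if haversine_km(lat, lon, p.lat, p.lon) <= radius_km]


# Hypothetical geotagged posts and incident location.
posts = [
    Post("a", "Road flooded near the docks", 50.897, -1.404),
    Post("b", "Heavy rain here too", 50.910, -1.400),
    Post("c", "Sunny in Edinburgh", 55.953, -3.188),
]
incident = (50.9, -1.4)

for p in posts_near(posts, *incident, radius_km=5):
    print(p.user, p.text)
```

At Decahose volumes the same test would normally run against a spatial index rather than a linear scan. The distance logic stays the same.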

Page 15

Big data perspective on mapping the geography of Twitter

• iPhones and BlackBerry devices yield an additional 1% of all tweets being georeferenced

• However, they’ve been missed by previous studies because they store their geographic information in the textual Location field rather than the machine-readable Geo metadata field

• In the big data era we need to look at the data itself, not just assume it follows the manual.

Kalev Leetaru, University of Illinois, on CrisisMappers

http://www.CrisisMappers.net
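Recovering those extra georeferenced tweets means parsing coordinates out of free text. The sketch below assumes the Location field holds a decimal "latitude,longitude" pair, optionally preceded by a client label such as "iPhone:". The exact prefixes that real clients wrote are an assumption here, and a production parser would need to handle more formats.

```python
import re

# Assumed pattern: an optional "label:" prefix (no digits in the label),
# then a decimal-degree "latitude,longitude" pair and nothing else.
COORD_PATTERN = re.compile(
    r"^\s*(?:[^\d\-+:]+:\s*)?"          # optional client label, e.g. "iPhone:"
    r"([-+]?\d{1,2}(?:\.\d+)?)\s*,\s*"  # latitude
    r"([-+]?\d{1,3}(?:\.\d+)?)\s*$"     # longitude
)


def coords_from_location_field(text):
    """Return (lat, lon) if the free-text Location field holds coordinates, else None."""
    match = COORD_PATTERN.match(text or "")
    if not match:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    # Reject values that parse but are not valid geographic coordinates.
    if -90 <= lat <= 90 and -180 <= lon <= 180:
        return lat, lon
    return None


for value in ["iPhone: 51.507351,-0.127758", "ÜT: 40.7128,-74.0060", "London, UK"]:
    print(repr(value), "->", coords_from_location_field(value))
```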

Page 16


Why do we need big data?

Page 17

Analytics plus geospatial data is changing the way we get insights (hidden patterns)

• Geospatial analytics gives you the ability to ask “where” questions of business data

Where did it happen? Where will it happen? Where is it happening?

Source: Teradata

Page 18

Analytics plus geospatial data is changing the way we get insights

• Where are my customers?

• Where are my competitors?

• How far will customers travel to a branch or store?

• Which of my competitor’s customers can I draw to a branch or store?

• Which customers live close to a branch or store?

• Where can I increase profitability?

• How can I mitigate financial risk from flooding?
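Several of these questions, such as how far customers will travel and which customers live close to a branch, reduce to a nearest-facility calculation. The following is a minimal sketch. It assumes all locations are already in a projected grid, such as British National Grid eastings/northings in metres, so straight-line distance is Euclidean. The branches, customers and 2 km catchment are hypothetical.

```python
from math import hypot

# Hypothetical locations as (easting, northing) in metres.
branches = {
    "North": (440000, 115000),
    "South": (442000, 108000),
}
customers = {
    "C1": (440500, 114000),
    "C2": (441800, 109500),
    "C3": (445000, 112000),
}


def distance_m(a, b):
    """Straight-line distance between two projected points, in metres."""
    return hypot(a[0] - b[0], a[1] - b[1])


def nearest_branch(point, branches):
    """Return (branch name, distance in metres) for the closest branch."""
    name = min(branches, key=lambda n: distance_m(point, branches[n]))
    return name, distance_m(point, branches[name])


assignments = {cid: nearest_branch(xy, branches) for cid, xy in customers.items()}
for cid, (branch, dist) in assignments.items():
    print(f"{cid}: nearest branch {branch}, {dist / 1000:.1f} km")

# Customers within an assumed 2 km catchment of their nearest branch.
close = sorted(cid for cid, (_, dist) in assignments.items() if dist <= 2000)
print("Within 2 km:", close)
```

Straight-line distance is only a first approximation of "how far customers will travel". A road-network distance would be the natural refinement.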

Page 19

Is there a ‘problem with crowdsourcing intelligence’?

DefenceIQ, May 2013, Thomas Chappelow

http://www.defenceiq.com/defence-technology/articles/the-problem-with-crowdsourcing-intelligence-in-syr/

• blogging, tweeting, mapping and photographing every single detail…creating an unprecedented mountain of information that can be farmed for actionable intelligence

• lack of traditional sources to rely on, the global intelligence community has to look elsewhere for information… crowdsourcing appears a juicy prospect – until it goes wrong

• Provenance, verification and trust

• Just as important for HUMINT as GEOINT
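One simple way to put provenance and verification into practice is to require independent corroboration before acting on a crowdsourced report. The sketch below accepts a report only if enough distinct sources reported from nearby within a time window. The thresholds (3 sources, 500 m, 30 minutes), the hypothetical Report structure and the projected coordinates are all assumptions for illustration, not an established standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import hypot


@dataclass
class Report:
    source: str    # who submitted it (account, handle, sensor, etc.)
    x: float       # easting in metres (assumed projected coordinates)
    y: float       # northing in metres
    time: datetime


def is_corroborated(report, others, min_sources=3, radius_m=500.0,
                    window=timedelta(minutes=30)):
    """True if at least min_sources distinct sources, counting the report's own,
    reported within radius_m metres and the time window of this report."""
    sources = {report.source}
    for other in others:
        near = hypot(other.x - report.x, other.y - report.y) <= radius_m
        recent = abs(other.time - report.time) <= window
        if near and recent:
            sources.add(other.source)  # repeat reports from one source count once
    return len(sources) >= min_sources


t0 = datetime(2013, 5, 1, 12, 0)
first = Report("alice", 1000, 1000, t0)
others = [
    Report("bob", 1200, 1100, t0 + timedelta(minutes=10)),
    Report("bob", 1100, 900, t0 + timedelta(minutes=12)),  # same source again
    Report("carol", 5000, 5000, t0),                        # too far away
]
print(is_corroborated(first, others))  # False: only alice and bob agree

others.append(Report("dave", 900, 1300, t0 + timedelta(minutes=20)))
print(is_corroborated(first, others))  # True: alice, bob and dave agree
```

Counting distinct sources rather than raw reports is the key design choice. One account posting repeatedly should not look like independent confirmation.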

Page 20

Big data cycle


Page 21

Some experience gained

Page 22

• Ordnance Survey is 222 years old

• Civilian organisation since 1983; 1100 staff

• Independent Government Department and Executive Agency reporting directly to a Government Minister

• Trading Fund since April 1999

• Annual Report for 2011/12: Revenue of £141.8m, profit before exceptional items of £31.9m, dividend £17.2m

• Southampton headquarters with 26 field offices in Great Britain

Ordnance Survey today

Page 23

The size of the task

Topographic Layer (approximate volumes)

1:1250 scale = 17 000 km²

1:2500 scale = 158 000 km²

1:10 000 scale = 66 000 km²

Over one million units of change per year.

Address Layer: 27.5 million geocoded postal addresses, with 500 000 changes per year.

Transport Network Layer: 5.37 million km of roads, 3.97 million links, 885 881 route instructions; over 20 000 changes per month.
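Below is a back-of-envelope reading of these volumes. It treats "over one million" and "over 20 000" as round lower bounds. As an assumption, it also treats the three scale bands as non-overlapping areas.

```python
# Figures as quoted on the slide; "over" values are treated as lower bounds.
topographic_area_km2 = {"1:1250": 17_000, "1:2500": 158_000, "1:10 000": 66_000}
topographic_changes_per_year = 1_000_000
address_changes_per_year = 500_000
transport_changes_per_month = 20_000

DAYS_PER_YEAR = 365

# Assumption: the three scale bands cover separate, non-overlapping areas.
total_area_km2 = sum(topographic_area_km2.values())
topo_per_day = topographic_changes_per_year / DAYS_PER_YEAR
address_per_day = address_changes_per_year / DAYS_PER_YEAR
transport_per_year = transport_changes_per_month * 12

print(f"Topographic coverage: {total_area_km2:,} km2")
print(f"Topographic changes: about {topo_per_day:,.0f} per day")
print(f"Address changes: about {address_per_day:,.0f} per day")
print(f"Transport network changes: over {transport_per_year:,} per year")
```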

Page 24

Updating the Ordnance Survey database

Page 25

Wide Range of Customers and Markets

Page 26
Page 27

A database to connect via real-world information

• Every object represented in OS MasterMap has a unique reference identifier called a TOID (TOpographic IDentifier). These TOIDs can be used to connect other information and are linked to other core references.
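Because every feature carries a TOID, third-party records that store the TOID can be joined to the mapping without any geometric matching. The following is a minimal sketch using plain dictionaries. The "osgb…" identifiers, attributes and occupier records are made-up placeholders, not real OS MasterMap data.

```python
# Hypothetical features keyed by TOID (placeholder identifiers).
buildings = {
    "osgb1000000000000001": {"theme": "Buildings", "area_m2": 120.5},
    "osgb1000000000000002": {"theme": "Buildings", "area_m2": 310.0},
}

# Hypothetical third-party records that reference features by TOID.
business_rates = [
    {"toid": "osgb1000000000000001", "occupier": "Corner Shop Ltd"},
    {"toid": "osgb1000000000000003", "occupier": "Unmatched record"},
]


def join_on_toid(features, records):
    """Attach each record to the feature sharing its TOID; collect records with no match."""
    joined, unmatched = [], []
    for record in records:
        feature = features.get(record["toid"])
        if feature is None:
            unmatched.append(record)
        else:
            joined.append({**feature, **record})
    return joined, unmatched


joined, unmatched = join_on_toid(buildings, business_rates)
print(joined)
print("No matching TOID:", [r["toid"] for r in unmatched])
```

Reporting unmatched records separately matters in practice. A TOID that no longer resolves usually signals a change in the underlying feature that needs attention.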

Page 28

OS MasterMap current layers

Page 29

Ordnance Survey and IBM Netezza


Page 30

• Stress testing our data

• Data queries

• New insights

• Storytelling with location data

Using IBM Netezza for high-performance geospatial analytics

Page 31

Netezza and geospatial analytics

• In-database geospatial analytic functions

• Native understanding of geospatial data

• High performance out of the box

• Scales to terabytes of data

• No indexes or aggregates to manage

• Open, standards-based interface and data model

Analyse all data in a single appliance
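In-database analytic functions run the spatial calculation where the data lives instead of exporting the data first. Netezza's own spatial functions are not shown here. As a stand-in, this sketch uses Python's built-in sqlite3 module and registers a hypothetical planar_distance function with create_function so that it can be called inside SQL. The table and coordinates are invented.

```python
import sqlite3
from math import hypot

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE addresses (id INTEGER PRIMARY KEY, easting REAL, northing REAL)"
)
conn.executemany(
    "INSERT INTO addresses (easting, northing) VALUES (?, ?)",
    [(441000, 112000), (441300, 112400), (450000, 120000)],
)

# Register a Python function (4 arguments) so SQL can call it next to the data.
conn.create_function(
    "planar_distance", 4, lambda x1, y1, x2, y2: hypot(x2 - x1, y2 - y1)
)

# Addresses within 1000 m of a query point, nearest first.
rows = conn.execute(
    """
    SELECT id, planar_distance(easting, northing, ?, ?) AS d
    FROM addresses
    WHERE planar_distance(easting, northing, ?, ?) <= ?
    ORDER BY d
    """,
    (441000, 112000, 441000, 112000, 1000),
).fetchall()
print(rows)
```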

Page 32

Stress testing our data

Page 33

Stress testing our data – Volume of data

Page 34

Data queries

Page 35

Data queries – Volume of data

Page 36

Data queries – Volume of data

We analysed 41 million records in 19 hours. We could not run this query in the past.

Page 37

New Insights

Page 38

New insights – Volume and variety of data

Page 39

Storytelling with location data

Page 40

Storytelling with location data

Page 41

Big Data – Linked Data

• As Ordnance Survey approaches the end of the transformation of its operations, it is preparing its data to exploit the myriad interconnections that can exist between physical entities in what has been described as the “Internet of Things”. This web of interconnections between disparate objects and ideas is made possible through linked data technology.

• Linked data assigns a unique identifier, a uniform resource identifier (URI), to each thing of interest, and records facts about those things as triples (subject, predicate, object). For example, population data can be linked to socio-economic statistics for a given town.

• The Linked Data Web is currently estimated to include more than 30 billion triples, some 20% of which have geographic content.
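In RDF-style linked data, each thing is named by a URI and each fact is a subject-predicate-object triple. Datasets about the same town therefore connect simply by reusing its URI. The sketch below stores a few triples as Python tuples and answers pattern queries with wildcards. The example.org URIs, the town "Exampleton" and its figures are entirely hypothetical, and a real system would use an RDF store and SPARQL.

```python
# Hypothetical URIs; example.org is reserved for illustration.
EX = "http://example.org/"
TOWN = EX + "id/town/exampleton"

# Facts from two notional datasets share the same subject URI, which links them.
triples = [
    (TOWN, EX + "def/name", "Exampleton"),
    (TOWN, EX + "def/population", 52_000),         # from a population dataset
    (TOWN, EX + "def/unemploymentRate", 0.061),    # from a socio-economic dataset
    (EX + "id/town/otherville", EX + "def/population", 8_500),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a (subject, predicate, object) pattern; None is a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


def describe(subject):
    """Collect every predicate/object pair recorded for one subject URI."""
    return {p.rsplit("/", 1)[-1]: o for _, p, o in match(triples, s=subject)}


print(describe(TOWN))
print(match(triples, p=EX + "def/population"))
```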

Page 42

Joining up Government

Page 43

‘Find me all GPs in my ward, bus stops within a 500 metre radius of those GPs, but exclude bus stops in areas of high crime’.

Environment, Transport, Health, Education, Business, Weather, Crime, Council

Hyperlocal example
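The hyperlocal query above can be expressed as three filters. First, select GPs by ward. Second, keep bus stops within 500 m of any of them. Third, drop stops in high-crime areas. The following is a minimal sketch. It assumes projected coordinates in metres, and it assumes ward membership and crime classification have already been attached from other datasets. All records are hypothetical.

```python
from math import hypot

# Hypothetical records; coordinates are (easting, northing) in metres.
gps = [
    {"name": "GP A", "ward": "Central", "xy": (1000, 1000)},
    {"name": "GP B", "ward": "Central", "xy": (3000, 1000)},
    {"name": "GP C", "ward": "Harbour", "xy": (8000, 8000)},
]
bus_stops = [
    {"id": "S1", "xy": (1200, 1300), "area": "A01"},
    {"id": "S2", "xy": (1400, 900), "area": "A02"},
    {"id": "S3", "xy": (3100, 1200), "area": "A03"},
    {"id": "S4", "xy": (5000, 5000), "area": "A01"},
]
high_crime_areas = {"A02"}  # assumed output of a separate crime classification


def distance_m(a, b):
    """Straight-line distance between two projected points, in metres."""
    return hypot(a[0] - b[0], a[1] - b[1])


def stops_near_gps(ward, radius_m=500):
    """Bus stops within radius_m of any GP in the ward, excluding high-crime areas."""
    ward_gps = [gp for gp in gps if gp["ward"] == ward]
    result = []
    for stop in bus_stops:
        if stop["area"] in high_crime_areas:
            continue
        if any(distance_m(stop["xy"], gp["xy"]) <= radius_m for gp in ward_gps):
            result.append(stop["id"])
    return result


print(stops_near_gps("Central"))
```

Each filter draws on a different dataset (health, transport, crime), which is why shared identifiers and consistent coordinates matter for joining up government data.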

Page 44

• Big thinking (value)

• Big strategy (necessary)

• Big governance (stewardship)

• Big access (sharing)

• Big cooperation (supply chain)

• Big privacy (security)

• Big quality (QA/QC)

• Big people (skills training)

Big Data challenges

Page 45

• Strategic review and assessment

• Capacity and capability building

• Knowledge transfer and training

• Value of geographic information

• Technology direction: 3D, quality, open standards and much more

• National authoritative mapping

• National address infrastructure

• National geodetic infrastructure

• National spatial data infrastructure

Ordnance Survey International: advisory services

Page 46

Thank you for your attention. For further information contact:

Steven Ramage, Head of Ordnance Survey International


Ordnance Survey International