BCS Address Day - Open Addresses
Address Day: what next after the Address Wars
Jeni Tennison - @JeniT
5 March 2015
https://openaddressesuk.org - @openaddressesuk
In economics, a public good is a good that is both non-excludable and non-rivalrous in that individuals cannot be effectively excluded from use and where use by one individual does not reduce availability to others.
Wikipedia - Public good
"Tompkins Square Park Central Knoll" by David Shankbone - (CC BY-SA 3.0) via Wikimedia Commons
open data
public good
sum of what everyone would pay
what it costs to maintain
When should a good be public?
Address data should be open data
● National Information Infrastructure
● Not just for posting mail...
  ○ geocoding for route finding
  ○ associating people with areas
  ○ classification for targeting interventions
  ○ linking datasets together
● Denmark has taken this step
  ○ 1000% increase in use of address data
  ○ costs = €0.2M - benefits = €14M
Current real life problems
● Startup wanting to build an application
  ○ prohibitive costs
  ○ prohibitive licensing complexity
● SME with a geodemographic product
  ○ prohibitive costs
  ○ limiting customer base & growth
● New build owners
  ○ 3 months to register to vote, order pizza
Funding public goods
● Government via taxation
● Collaborative bound by contract
● Cross-subsidy by selling other goods
● Voluntary effort
● Social norms
"The sale of the PAF with the Royal Mail was a mistake. Public access to public sector data must never be sold or given away again. This type of information, like census information and many other data sets, is very expensive to collect and collate into useable form, but it also has huge potential value to the economy and society as a whole if it is kept as an open, public good."
Bernard Jenkin, Chair of Public Administration Select Committee
Hypothesis 1: the maintenance of open address data can only be effectively funded through taxation
Hypothesis 2: it is possible to build and maintain a sustainable open address database using collaboration, cross-subsidy and voluntary effort
Goals
● Free, openly licensed, up-to-date bulk downloads of addresses
● Freemium services over that data
  ○ e.g. validation, auto-completion, geocoding
● 100% open source, collaboratively maintained
● Initial ~£400k investment from government
  ○ compared with £25M annual cost of maintaining PAF
Eventual Architecture
“Definitive” UK address list
- where the address data is safe to use
- where each record has confidence and provenance
Bulk
- Download
- Upload
APIs
- Add
- Sort
- Validate
- Search
URLs
- Linked data
- Extensibility
Service Providers: Aggregators, digital, telecoms, public sector, distribution, academics, manufacturers etc
Services - Websites, Users
Value
Revenue for sustainability
This takes time
Large datasets and inference to tackle the bulk of the challenge “80/20” rule
Ongoing, collaborative maintenance
Targeted work. Low-volume records to fill existing gaps in available datasets
NB: dates are “just for fun”
Approaches
1. Load open datasets containing addresses
2. Build out crowdsourcing mechanisms
3. Use inference to fill gaps
and throughout:
● keep track of provenance
● keep track of confidence
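One way to picture "keep track of provenance, keep track of confidence" is a record type that carries both alongside the address itself. The following is only a minimal sketch: the class and field names (`Provenance`, `AddressRecord`, `source`, `method`, `retrieved`), the 0-to-1 confidence scale and the example values are assumptions for illustration, not the Open Addresses schema.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Provenance:
    source: str      # hypothetical: dataset or contributor the address came from
    method: str      # hypothetical: "bulk-load", "crowdsourced" or "inferred"
    retrieved: date  # when the record was obtained


@dataclass
class AddressRecord:
    street: str
    town: str
    postcode: str
    confidence: float  # assumed scale: 0.0 (no trust) to 1.0 (full trust)
    provenance: list[Provenance] = field(default_factory=list)

    def add_source(self, prov: Provenance) -> None:
        """Record another sighting of this address."""
        self.provenance.append(prov)


# Example values are illustrative only.
record = AddressRecord("St James Square", "Cheltenham", "GL50 3PR", confidence=0.5)
record.add_source(Provenance("Companies House", "bulk-load", date(2015, 3, 5)))
```

Keeping provenance as a list rather than a single value lets the same address accumulate evidence from each of the three approaches (loading, crowdsourcing, inference).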
Loading datasets
Third Party IPR
Possibly infected if validated against PAF or AddressBase ⇒ most Government “open” data is infected
A few not:
● Companies House
● err...
Platform for loading bulk data
Originally developed for OpenCorporates
Sandboxed environment for running scripts
Motivating crowdsourcing
Bulk
- Download
- Upload
APIs
- Add
- Sort
- Validate
- Search
URLs
- Linked data
- Extensibility
Value
Building Blocks
- towns, postcodes, streets
- used to parse data and provide confidence in the address list
- links between towns, postcodes and streets are learned from addresses
Authoritative and definitive UK address list
- where the address data is safe to use
- where each record has confidence and provenance
Revenue for sustainability
Address parsing service
● Turn free-text addresses into building blocks
● Can be used with data containing third party IPR
● Optional “contribute” option
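To make "turn free-text addresses into building blocks" concrete, here is a rough sketch of such a parser. It is not the Open Addresses parsing service: the function name `parse_address`, the output keys, the simplified postcode pattern and the rule that the last two comma-separated parts are street and town are all assumptions made for illustration.

```python
import re

# Simplified UK postcode pattern (assumption: covers common formats only).
POSTCODE_RE = re.compile(
    r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\s*$", re.IGNORECASE
)


def parse_address(text: str) -> dict:
    """Split a comma-separated address into hypothetical building blocks."""
    match = POSTCODE_RE.search(text)
    postcode = None
    if match:
        postcode = f"{match.group(1).upper()} {match.group(2).upper()}"
        text = text[: match.start()]  # drop the postcode from the free text
    parts = [p.strip() for p in text.split(",") if p.strip()]
    # Assumption: the last two remaining parts are street and town.
    town = parts.pop() if parts else None
    street = parts.pop() if parts else None
    return {
        "name_or_number": ", ".join(parts) or None,
        "street": street,
        "town": town,
        "postcode": postcode,
    }


print(parse_address("St James House, St James Square, Cheltenham, GL50 3PR"))
```

A real service would need to cope with missing commas, misspellings and multi-line input; the positional rule here is only good enough to show the shape of the output.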
Inference
[Figure: house numbers on Fogralea, ZE1 0SE. © Open Addresses Ltd.]
Odd numbers: 7 9 11 13 15 17 19 21 23 25 27 29
Even numbers: 6 8 10 12 14 16 18 20 22 24 26 28
What about nos. 1 to 4?
Same postcode? We cannot know!
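The gap-filling idea on this slide can be sketched as follows. This is an illustrative guess at the kind of rule involved, not the project's algorithm: the function name `infer_missing`, the assumption that odd and even numbers sit on opposite sides of the street, and the sample set of known numbers are all hypothetical. In line with the slide's caveat, the sketch never extrapolates below the lowest or above the highest known number.

```python
def infer_missing(known: set[int]) -> set[int]:
    """Infer house numbers lying between known numbers of the same parity.

    Assumption: odd and even numbers run along opposite sides of the street,
    so a gap is only filled when bounded on both ends by same-parity numbers.
    """
    inferred: set[int] = set()
    for parity in (0, 1):
        side = sorted(n for n in known if n % 2 == parity)
        if len(side) < 2:
            continue  # nothing to interpolate between
        candidates = range(side[0], side[-1] + 1, 2)
        inferred.update(n for n in candidates if n not in known)
    return inferred


known = {6, 10, 28, 7, 13, 29}  # hypothetical sightings on one street
print(sorted(infer_missing(known)))
```

Numbers 1 to 4 are never produced from this input, because nothing below 6 has been seen and the sketch has no evidence that they exist or share the postcode.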
Enabling collaborative maintenance
St James House, St James Square, Cheltenham, GL50 3PR
7, St James Square, Cheltenham, GL50 3PT
St James North 1, St James Square, Cheltenham, GL50 3PR
St James North 3, St James Square, Cheltenham, GL50 3PR
3, St James Square, Cheltenham, GL50 3PR
St James House, St James Square, Cheltenham Spa, GL50 3PR
St James North 1, St James Square, Cheltenham, GL50 3PR
St James Place, Jessop Avenue, Cheltenham, GL50 3PR
St James House, St James Square, Cheltenham, GL50 3PR
Apt. 3, St James Place, Jessop Avenue, Cheltenham, GL50 3PR
56, Cheltenham Road, London, SE15 3AR
Calculating confidence
Sector   Town             Count  Total  Confidence
...
HD3 4    HUDDERSFIELD        66     66      87.71%
...
DG8 6    NEWTON STEWART      11     12      65.69%
DG8 6    STRANRAER            1     12       0.00%
DG8 7    NEWTON STEWART       1      1       0.00%
...
W3 6     LONDON             196    196      92.96%
...
CH44 4   WALLASEY            23     29      76.06%
CH44 4   WIRRAL               6     29       8.22%
This postcode/town association is right but confidence is low because of the low count
This postcode/town association is incorrect
Another correct postcode/town association, but with a higher count
This is what happens when post towns are re-organised; Wirral is now split into Birkenhead, Wallasey, Wirral and Prenton
This is what a correct postcode/town association looks like
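The slides do not state the formula behind the Confidence column. As a stand-in, the sketch below scores a postcode-sector/town pairing with the lower bound of the Wilson score interval, which shares the key property described above: a pairing seen only a handful of times scores low even if every sighting agrees. It does not reproduce the percentages in the table (for example it gives roughly 0.21 for 1 of 1, where the table shows 0.00%). The function name and the default `z = 1.96` are assumptions.

```python
from math import sqrt


def wilson_lower_bound(count: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the proportion count/total."""
    if total == 0:
        return 0.0
    p = count / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


# (count, total) pairs taken from the table's Count and Total columns.
for count, total in [(66, 66), (11, 12), (1, 12), (1, 1)]:
    print(count, total, round(wilson_lower_bound(count, total), 4))
```

Whatever formula is actually used, the design point is the same: confidence should depend on both the share of sightings that agree and the number of sightings.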
Provenance
Summary
● Built most of the supporting platform
  ○ parsing free text / messy addresses
  ○ collaborative loading of data
  ○ providing downloads, search & URL identity
  ○ recording provenance & assigning confidence
  ○ using inference to fill in gaps
● We have low numbers of addresses currently
  ○ but the right mechanisms to add more
  ○ and many potential partners
What next?
● Building the platform
● Building the community of collaborators
● Building services to aid cross-subsidy
● Increasing quantity & quality of addresses
● Can anyone else reuse the technology?
● Can anyone else reuse the approach?
Open Addresses Ltd. is a new company being set up to create and maintain an address database for the UK that will be made available to the public as Open Data. It will facilitate the collaborative maintenance of the address database with various stakeholders from the UK Government, industry and non-profit.
Offices
Where?