ALL OVER THE MAP: COMPARING THE ACCURACY OF …...Accurate geocoding of a small record batch. Free...
Transcript of ALL OVER THE MAP: COMPARING THE ACCURACY OF …...Accurate geocoding of a small record batch. Free...
The picture can't be displayed.
Robert Saul, EMI Consulting
Brian Billing, AEP Ohio
August 21, 2019
1
A L L O V E R T H E M A P : C O M P A R I N G T H E A C C U R A C Y O F G E O C O D I N G S O U R C E S
2 0 1 9 I E P E C C O N F E R E N C E P R E S E N T A T I O N
A G E N D A
The picture can't be displayed.
1. “Why Does it Matter?”
2. Introduction
3. Methods
4. Results
5. Conclusion
“ W H Y D O E S I T M A T T E R ? ”B E N E F I T S O F S P A T I A L A N A L Y S I S I N E E P R O G R A M S
M I C R O - T A R G E T I N G S P A T I A L P A T T E R N S S E G M E N T A T I O N
“ W H Y D O E S I T M A T T E R ? ”B E N E F I T S O F S P A T I A L A N A L Y S I S I N E E P R O G R A M S
As energy efficiency programs become more successful, utilities are more frequently using spatial analysis to identify and influence hard-to-reach customers, such as:
• Low-income
• Environmentally disadvantaged
M I C R O - T A R G E T I N G S P A T I A L P A T T E R N S S E G M E N T A T I O N
“ W H Y D O E S I T M A T T E R ? ”B E N E F I T S O F S P A T I A L A N A L Y S I S I N E E P R O G R A M S
As energy efficiency programs become more successful, utilities are more frequently using spatial analysis to identify and influence hard-to-reach customers, such as:
• Low-income
• Environmentally disadvantaged
Many utilities are considering new program designs that hinge on specific geographic energy usage patterns, such as:
• Demand-response
• Electric vehicles
• Non-wires solutions
M I C R O - T A R G E T I N G S P A T I A L P A T T E R N S S E G M E N T A T I O N
“ W H Y D O E S I T M A T T E R ? ”B E N E F I T S O F S P A T I A L A N A L Y S I S I N E E P R O G R A M S
As energy efficiency programs become more successful, utilities are more frequently using spatial analysis to identify and influence hard-to-reach customers, such as:
• Low-income
• Environmentally disadvantaged
Many utilities are considering new program designs that hinge on specific geographic energy usage patterns, such as:
• Demand-response
• Electric vehicles
• Non-wires solutions
M I C R O - T A R G E T I N G S P A T I A L P A T T E R N S
To better inform program design, utilities are combining program data with geographic data such as:
• Weather
• Traffic
• Demographics
S E G M E N T A T I O N
“ W H Y D O E S I T M A T T E R ? ”L A R G E S H I F T S I N G E O C O D I N G S E R V I C E S
O L D I N F O I N S T A N D -A L O N E S E R V I C E S
C H A N G E I N A P I R E Q U I R E M E N T S
C H A N G E S T O S E R V I C E C A P A B I L I T I E S
1 .
2.
3.
“ W H Y D O E S I T M A T T E R ? ”L A R G E S H I F T S I N G E O C O D I N G S E R V I C E S
O L D I N F O I N S T A N D -A L O N E S E R V I C E S
C H A N G E I N A P I R E Q U I R E M E N T S
C H A N G E S T O S E R V I C E C A P A B I L I T I E S
1 .
2.
3.https://www.esri.com/arcgis-blog/products/analytics/analytics/data-and-maps-for-arcgis-2015-is-now-available/
“ W H Y D O E S I T M A T T E R ? ”L A R G E S H I F T S I N G E O C O D I N G S E R V I C E S
O L D I N F O I N S T A N D -A L O N E S E R V I C E S
C H A N G E I N A P I R E Q U I R E M E N T S
C H A N G E S T O S E R V I C E C A P A B I L I T I E S
1 .
2.
3.https://www.littlemissdata.com/blog/ggmap-updated
“Google has recently changed its API requirements… users are now required to provide an API key and enable billing.”
“ W H Y D O E S I T M A T T E R ? ”L A R G E S H I F T S I N G E O C O D I N G S E R V I C E S
O L D I N F O I N S T A N D -A L O N E S E R V I C E S
C H A N G E I N A P I R E Q U I R E M E N T S
C H A N G E S T O S E R V I C E C A P A B I L I T I E S
1 .
2.
3.https://multiplottr.com/
“ W H Y D O E S I T M A T T E R ? ”W H A T A C C U R A C Y I S N E E D E D A N D A T W H A T C O S T ?
“ W H Y D O E S I T M A T T E R ? ”W H A T A C C U R A C Y I S N E E D E D A N D A T W H A T C O S T ?
https://community.tableau.com/thread/232540
A G E N D A
The picture can't be displayed.
1. “Why Does it Matter?”
2. Introduction
3. Methods
4. Results
5. Conclusion
I N T R O D U C T I O NR E V I E W O F G E O C O D I N G S E R V I C E S A N D S I T E S
G E O C O D I N G S E R V I C EF R E E O R P A I D
S E R V I C EH A S A P I ?
Alteryx Free
ArcGIS Paid
BatchGeo Free and Paid
GeoLytics Paid
Google Free and Paid
Google Sheets Free and Paid
HERE Free and Paid
MapLarge Paid
OpenStreetMap Free
Texas A&M Geoservices Geocoding Free and Paid
TomTom Free and Paid
United States Census Bureau Geocoder API Free
I N T R O D U C T I O NC L I E N T A N D P R O G R A M
C L I E N T
AEP Ohio: Electric investor-owned utility serving nearly 1.5 million customers.
I N T R O D U C T I O NC L I E N T A N D P R O G R A M
C L I E N T G E O G R A P H Y
AEP Ohio: Electric investor-owned utility serving nearly 1.5 million customers.
Majority of AEP Ohio customers located in and around Columbus, OH.
I N T R O D U C T I O NC L I E N T A N D P R O G R A M
G E O G R A P H Y D A T A
Majority of AEP Ohio customers located in and around Columbus, OH.
Examined the geolocational data of 151 AEP Ohio residential survey respondents who completed a Qualtrics web survey in January 2019.
Survey respondents part of residential rebate program evaluation.
C L I E N T
AEP Ohio: Electric investor-owned utility serving nearly 1.5 million customers.
A G E N D A
The picture can't be displayed.
1. “Why Does it Matter?”
2. Introduction
3. Methods
4. Results
5. Conclusion
M E T H O D SD A T A S E T S
R A W D A T A S E T
Raw dataset of survey respondent service addresses as received from AEP Ohio.
M E T H O D SD A T A S E T S
R A W D A T A S E T C L E A N D A T A S E T
Raw dataset of survey respondent service addresses as received from AEP Ohio.
Cleaned dataset of survey respondent service addresses using cleaning best-practices recommended by the Harvard School of Public Health (HSPH 2017).
M E T H O D SD A T A S E T S
R A W D A T A S E T C L E A N D A T A S E T
Raw dataset of survey respondent service addresses as received from AEP Ohio.
Cleaned dataset of survey respondent service addresses using cleaning best-practices recommended by the Harvard School of Public Health (HSPH 2017).
Differences in the geocoded locations of raw data and the cleaned data would give an indication of how effectively each service handles data quality issues.
H Y P O T H E S I S
M E T H O D SG E O C O D I N G S O U R C E S T E S T E D
U S E D G O O G L E A P I A S “ B A S E L I N E ”
P E R F O R M E D S P O T -C H E C K O N 2 0 R E C O R D S
C O M P A R E D B A S E L I N E T O 6 O T H E R S E R V I C E S
1 .
2.
3.
M E T H O D SG E O C O D I N G S O U R C E S T E S T E D
U S E D G O O G L E A P I A S “ B A S E L I N E ”
P E R F O R M E D S P O T -C H E C K O N 2 0 R E C O R D S
C O M P A R E D B A S E L I N E T O 6 O T H E R S E R V I C E S
1 .
2.
3.
M E T H O D SG E O C O D I N G S O U R C E S T E S T E D
U S E D G O O G L E A P I A S “ B A S E L I N E ”
P E R F O R M E D S P O T -C H E C K O N 2 0 R E C O R D S
C O M P A R E D B A S E L I N E T O 6 O T H E R S E R V I C E S
1 .
2.
3.
X
M E T H O D SG E O C O D I N G S O U R C E S T E S T E D
U S E D G O O G L E A P I A S “ B A S E L I N E ”
P E R F O R M E D S P O T -C H E C K O N 2 0 R E C O R D S
C O M P A R E D B A S E L I N E T O 6 O T H E R S E R V I C E S
1 .
2.
3.
G E O C O D I N GS E R V I C E
G E O C O D I N G T Y P EG E O C O D E D
A SB A S E L I N E
G E O C O D E SP E R F O R M E D
O N B O T HR A W A N D
C L E A NA D D R E S S
D A T A
G E O C O D E SB A S E D O N
I P A D D R E S S
2013 Garmin
Map layer used as address locator tool in ArcMap
ArcGIS Online Online application
Census API API
Google API API
Google Sheets
Script within online application
OSM API API
QualtricsLocational data provided with survey responses
A G E N D A
The picture can't be displayed.
1. “Why Does it Matter?”
2. Introduction
3. Methods
4. Results
5. Conclusion
R E S U L T SS U M M A R Y S T A T I S T I C S F R O M C O M P A R I S O N T O B A S E L I N E
In this particular experiment, found little difference in how the services geocoded the raw addresses and the cleaned addresses.
Likely speaks to good data practices on the part of AEP Ohio rather than the capabilities of the geocoding services.
R A W V S C L E A N
R E S U L T SS U M M A R Y S T A T I S T I C S F R O M C O M P A R I S O N T O B A S E L I N E
In this particular experiment, found little difference in how the services geocoded the raw addresses and the cleaned addresses.
Likely speaks to good data practices on the part of AEP Ohio rather than the capabilities of the geocoding services.
• Qualtrics was the outlier.
R A W V S C L E A N B A S E L I N E V S O T H E R S E R V I C E S
G E O C O D I N GS E R V I C E
U N M A T C H E DG E O C O D I N G
R E C O R D S
A V E R A G E D I S T A N C EF R O M B A S E L I N EC O O R D I N A T E ( I N
M I L E S )
M E D I A N D I S T A N C EF R O M B A S E L I N EC O O R D I N A T E ( I N
M I L E S )
Google Sheets 0 < 0.01 < 0.01
Census API 10 0.05 0.02
2013 Garmin 0 0.22 0.02
OSM API 61 0.32 0.11
Qualtrics 0 95.30 6.15
R E S U L T SS U M M A R Y S T A T I S T I C S F R O M C O M P A R I S O N T O B A S E L I N E
In this particular experiment, found little difference in how the services geocoded the raw addresses and the cleaned addresses.
Likely speaks to good data practices on the part of AEP Ohio rather than the capabilities of the geocoding services.
• Qualtrics was the outlier.
• OSM and the Census did not find matches for all addresses.
R A W V S C L E A N B A S E L I N E V S O T H E R S E R V I C E S
G E O C O D I N GS E R V I C E
U N M A T C H E DG E O C O D I N G
R E C O R D S
A V E R A G E D I S T A N C EF R O M B A S E L I N EC O O R D I N A T E ( I N
M I L E S )
M E D I A N D I S T A N C EF R O M B A S E L I N EC O O R D I N A T E ( I N
M I L E S )
Google Sheets 0 < 0.01 < 0.01
Census API 10 0.05 0.02
2013 Garmin 0 0.22 0.02
OSM API 61 0.32 0.11
Qualtrics 0 95.30 6.15
R E S U L T SS U M M A R Y S T A T I S T I C S F R O M C O M P A R I S O N T O B A S E L I N E
In this particular experiment, found little difference in how the services geocoded the raw addresses and the cleaned addresses.
Likely speaks to good data practices on the part of AEP Ohio rather than the capabilities of the geocoding services.
• Qualtrics was the outlier.
• OSM and the Census did not find matches for all addresses.
• Excluding Sheets, Census API had results closest to baseline.
R A W V S C L E A N B A S E L I N E V S O T H E R S E R V I C E S
G E O C O D I N GS E R V I C E
U N M A T C H E DG E O C O D I N G
R E C O R D S
A V E R A G E D I S T A N C EF R O M B A S E L I N EC O O R D I N A T E ( I N
M I L E S )
M E D I A N D I S T A N C EF R O M B A S E L I N EC O O R D I N A T E ( I N
M I L E S )
Google Sheets 0 < 0.01 < 0.01
Census API 10 0.05 0.02
2013 Garmin 0 0.22 0.02
OSM API 61 0.32 0.11
Qualtrics 0 95.30 6.15
R E S U L T SF U R T H E R E X P L O R I N G D I F F E R E N C E S
Q U A L T R I C S : S U R V E Y R E S P O N D E N T L O C A T I O N
G E O C O D I N G S E R V I C E S V I S U A L C O M P A R I S O N
C O M P A R I N G B A S E L I N E T O A R C G I S O N L I N E
1 .
2.
3.
R E S U L T SF U R T H E R E X P L O R I N G D I F F E R E N C E S
Q U A L T R I C S : S U R V E Y R E S P O N D E N T L O C A T I O N
G E O C O D I N G S E R V I C E S V I S U A L C O M P A R I S O N
C O M P A R I N G B A S E L I N E T O A R C G I S O N L I N E
1 .
2.
3.
R E S U L T SF U R T H E R E X P L O R I N G D I F F E R E N C E S
Q U A L T R I C S : S U R V E Y R E S P O N D E N T L O C A T I O N
G E O C O D I N G S E R V I C E S V I S U A L C O M P A R I S O N
C O M P A R I N G B A S E L I N E T O A R C G I S O N L I N E
1 .
2.
3.
R E S U L T SF U R T H E R E X P L O R I N G D I F F E R E N C E S
Q U A L T R I C S : S U R V E Y R E S P O N D E N T L O C A T I O N
G E O C O D I N G S E R V I C E S V I S U A L C O M P A R I S O N
C O M P A R I N G B A S E L I N E T O A R C G I S O N L I N E
1 .
2.
3.
A G E N D A
The picture can't be displayed.
1. “Why Does it Matter?”
2. Introduction
3. Methods
4. Results
5. Conclusion
C O N C L U S I O NT I P S A N D T R I C K S
Geocode a small amount of records (less than 100): may be easiest to use to Google Sheets.
S M A L L B A T C H
C O N C L U S I O NT I P S A N D T R I C K S
Geocode a small amount of records (less than 100): may be easiest to use to Google Sheets.
Geocode a larger amount of records (less than 40k): using the Google API may be free to use.*
S M A L L B A T C H L A R G E B A T C H
* Statement was true at the time the paper was written.
C O N C L U S I O NT I P S A N D T R I C K S
S M A L L B A T C H L A R G E B A T C H
Can perform a second set of geocoding on a subset of records using a different geocoding service.
Can check the subset of geocodes that are flagged as potentially less accurate based on the geocoders accuracy field.
I N C R E A S E A C C U R A C Y
* Statement was true at the time the paper was written.
Geocode a larger amount of records (less than 40k): using the Google API may be free to use.*
Geocode a small amount of records (less than 100): may be easiest to use to Google Sheets.
C O N C L U S I O NS U M M A R Y T A B L E
G E O C O D I N GS E R V I C E
R E C O M M E N D E D U S E
C O S T F O R S M A L LN U M B E R O FG E O C O D E DL O C A T I O N S
C O S T F O R L A R G EN U M B E R O FG E O C O D E DL O C A T I O N S
A C C U R A C YI N C L U D E SA C C U R A C Y
M E T R I C
Google API Accurate geocoding of a small number of records; have help from a proficient coder
Free (less than 40,000
calls a month)
High(between $4 and
$5 per 1,000 calls)High Yes
Google Sheets Accurate geocoding of a small record batch.Free
(unclear when pay barrier starts)
Unknown(unclear when pay
barrier starts)High No
Census API
Fairly accurate geocoding of large number of records; unmatched records won’t pose an issue; have help from a proficient coder; pair with another service for unmatched addresses
Free Moderate –High Yes
2013 Garmin North America
Fairly accurate geocoding of large number of records; fast processing time; own Garmin map layers; have help from an ArcGIS user
Cost of acquiring Garmin map layer (cost unknown)
Moderate –High Yes
OSM API
Less accurate geocoding of large number of records; unmatched records won’t pose an issue; have help from a proficient coder; pair with another service for unmatched addresses
Free Low –Moderate No
Qualtrics Quick check that survey respondents are answering in roughly the predicted pattern
Free (no additional cost with Qualtrics service) Low No
ArcGIS Online Accurate geocoding; high price Approximately $4 for 1,000 geocodes High Yes
C O N C L U S I O NW H A T A C C U R A C Y I S N E E D E D A N D A T W H A T C O S T ?
C O N C L U S I O NW H A T A C C U R A C Y I S N E E D E D A N D A T W H A T C O S T ?
https://oehha.ca.gov/calenviroscreen/maps-data