Post on 16-Dec-2015
Evaluating district-level income distribution for India using nighttime satellite imagery and other datasets
Tilottama Ghosh, CIRES, University of Colorado, Boulder, USA, and Indicus Analytics Private Limited, New Delhi, India
Mayuri Chaturvedi, Indicus Analytics Private Limited, New Delhi, India
Laveesh Bhandari, Indicus Analytics Private Limited, New Delhi, India
Chris Elvidge, NOAA National Geophysical Data Center (NGDC), Boulder, Colorado, USA
Kim Baugh, CIRES, University of Colorado, Boulder, USA
India Geospatial Forum, Gurgaon, Haryana, 8th February, 2012
Overview
• Introduction
• Research objective
• Methods – data used
• Analysis
– Step 1: State-level graphical analysis
– Step 2: Model 1
– Step 3: Model 2
• Results
• Discussion
• Conclusion and Future considerations
Why use nightlights to study income distribution?
• Inclusive growth is one of the major policy thrust areas in the current as well as the next Five-Year Plan
• Income distribution data not easy to come by
• Limitations include:
– Under-reporting, over-reporting, misreporting
– Inappropriate sampling and/or weighting
– Lack of standardization across sampling organizations
– Enormous expense involved in data collection
– Political and economic situations in areas inhibiting data collection
– Huge time lags between collection and publication, and low frequency of data collection
– Coarse spatial resolution, Modifiable Areal Unit Problem
• Nightlights (NL) can help circumvent these problems
Introduction
Research objective
In this paper, we examine the relationship between nightlights and income distribution, as captured by the number of households in different income brackets. We then include other datasets to improve the estimation.
Use multivariate regression techniques to study the statistical relationship
Map the prediction errors to identify regions of maximum estimation errors
Use socio-economic insights to understand probable reasons behind the errors
Data used
Radiance-calibrated nighttime image of India, 2004
Source: NOAA, NGDC
LandScan population data, 2004
Source: Oak Ridge National Laboratory with United States Department of Energy.
State and districts shapefile of India
Source: Indicus Analytics Pvt. Ltd.
Methods
Data used
• Three categories of households defined on the basis of annual household income
– Upper income households (earning more than Rs 10 lakh per annum)
– Middle income households (earning Rs 3-10 lakh per annum)
– Lower income households (earning less than Rs 3 lakh per annum)
• Sum of lights extracted for the States and the Districts
• Area calculated for the districts
• Total population extracted for the districts
• Percentage of rural population in each district calculated from Indicus’ data repository, which comprises urban, rural, and total population
• Sum of lights and number of households in each income category graphed at the State level
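The extraction of a sum of lights per district can be sketched as a zonal sum over a raster. This is a minimal illustration only, assuming the radiance image and the district boundaries have already been rasterized onto the same grid; the function name `zonal_sum` and the toy arrays are hypothetical and are not taken from the study.

```python
import numpy as np

def zonal_sum(radiance, zone_ids, n_zones):
    """Sum raster values per zone; zone_ids holds one integer zone label per pixel."""
    return np.bincount(zone_ids.ravel(), weights=radiance.ravel(), minlength=n_zones)

# Toy 3x4 radiance grid and a district-label grid of the same shape (both invented).
radiance = np.array([[1.0, 2.0, 0.0, 4.0],
                     [3.0, 0.5, 0.5, 1.0],
                     [0.0, 2.0, 6.0, 1.0]])
districts = np.array([[0, 0, 1, 1],
                      [0, 0, 1, 1],
                      [2, 2, 2, 1]])

# One sum of lights per district label (0, 1, 2).
sum_of_lights = zonal_sum(radiance, districts, n_zones=3)
print(sum_of_lights)
```

The same call applied to a population grid would give total population per district, and counting pixels per label gives an area proxy on an equal-area grid.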
State-level graphical analysis: Lower income households
[Figure: scatter plot by state. x-axis: Sum of lights ('000); y-axis: Total HH 0-3l ('000). States plotted: MH, AP, RJ, TN, KR, GJ, UP, PB, MP, HR, WB, DEL, KL, OR, CH, BI, JH, J&K, UK. R2 = 0.61]
Analysis – Step 1
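The R2 reported for each state-level graph can be reproduced in principle with a simple least-squares line. The sketch below assumes a straight-line fit of household counts on sum of lights, which the slides do not state explicitly; the function name `r_squared` and the numbers are illustrative only.

```python
import numpy as np

def r_squared(x, y):
    """R2 of a straight-line least-squares fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Invented state-level values: sum of lights ('000) and households ('000).
lights = np.array([300.0, 800.0, 1200.0, 1900.0, 2600.0, 3300.0])
households = np.array([2500.0, 9000.0, 7000.0, 16000.0, 30000.0, 21000.0])

print(round(r_squared(lights, households), 2))
```

For a one-variable fit this value equals the squared Pearson correlation between the two series.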
State-level graphical analysis: Middle income households
[Figure: scatter plot by state. x-axis: Sum of lights ('000); y-axis: Total HH 3-10l ('000). States plotted: MH, AP, RJ, TN, KR, GJ, UP, PB, MP, HR, WB, DEL, KL, OR, CH, BI, JH, J&K, UK. R2 = 0.81]
State-level graphical analysis: Upper income households
[Figure: scatter plot by state. x-axis: Sum of lights ('000); y-axis: Total HH >10l ('000). States plotted: MH, AP, RJ, TN, KR, UP, PB, MP, HR, WB, DEL, KL, CH, BI, J&K. R2 = 0.77]
State-level graphical analysis - inferences
• Lights clearly have a relationship with the number of households in different income categories, but are not able to capture the entire picture at the state level
• Examples highlight the need for analysis at a finer spatial resolution
– Maharashtra and Andhra Pradesh (similar lights, dissimilar incomes)
– Madhya Pradesh and Rajasthan (similar incomes, dissimilar lights)
– Uttar Pradesh in the graph and in the NL image (variegated lighting pattern)
• Complex role of population is highlighted
Model 1: Using nighttime lights and dummy variables
The relationship between nighttime lights and the number of households in each income category appeared to be logarithmic
Analysis – Step 2, Developing Model 1
Model 1
Dummy variables were created for commercially and administratively important districts which are also high population zones
Model 1: Hypotheses of the model
• While we can have data on households in different income brackets, we can obtain information only on total sum of lights in a region
• Hypothesis One: NL should be more closely associated with the richer in any given region than with the poorer
• Hypothesis Two: NL will most likely tend to under-estimate the number of poor households and over-estimate the rich households
• Logarithmic multivariate regression model used for all three income categories using the same predictor variables
[Figure labels: Number of households; Contribution to Nightlights]
Analysis – Model 1
Model 1: Model coefficients
Ln Y = α + β1 (Ln X1) + β2 X2 + β3 X3 + β4 X4 + β5 X5
where Y is the number of households in an income category, X1 is the sum of lights (NL), and X2 to X5 are dummy variables
* Significant at the 99% confidence level, $ Significant at the 95% confidence level, # Significant at the 90% confidence level
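A log-linear model of this form can be fitted by ordinary least squares. The sketch below is an assumption-laden illustration, not the authors' estimation code: the function name `fit_log_linear`, the synthetic districts, and the "true" coefficients are all invented so that the fit can be checked against known values.

```python
import numpy as np

def fit_log_linear(y, sum_of_lights, dummies):
    """OLS fit of ln(y) on ln(sum_of_lights) plus 0/1 dummy columns."""
    X = np.column_stack([np.ones(len(y)), np.log(sum_of_lights), dummies])
    coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    return coef  # ordered as [alpha, beta1, beta2, ...]

# Synthetic data: 200 hypothetical districts with four 0/1 dummy variables.
rng = np.random.default_rng(0)
n = 200
lights = rng.uniform(10.0, 1000.0, size=n)
dummies = rng.integers(0, 2, size=(n, 4)).astype(float)

# Invented coefficients used only to generate the synthetic household counts.
true_coef = np.array([1.0, 0.8, 0.5, 0.3, 0.2, 0.1])
ln_y = (true_coef[0] + true_coef[1] * np.log(lights)
        + dummies @ true_coef[2:] + rng.normal(0.0, 0.05, size=n))
households = np.exp(ln_y)

coef = fit_log_linear(households, lights, dummies)
print(np.round(coef, 2))
```

Fitting the same function once per income category, with the same predictors each time, mirrors the setup described on the hypotheses slide.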
Model 1: Inferences
• The relationship between NL and the number of households tightens as income goes up, as seen in the higher adjusted R2 values for the middle and upper income category models
• Magnitude of the coefficient for NL (β1) increases as we move from the lower to the higher income segments
• Most of the predictor variables are significant at the 99% confidence level
• Coefficients of all dummy variables go up monotonically for higher income groups
• Lights are better able to estimate households in more affluent categories (Hypothesis One)
• β’s consistently highest for the Metropolitan dummy followed by dummy for Suburbs of Metros for all three models
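The adjusted R2 referred to above penalizes R2 for the number of predictors. A minimal sketch of the standard formula follows; the sample size and R2 value in the example are invented and are not results from the study.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Invented example: R2 of 0.80 from 100 observations and 5 predictors.
example = adjusted_r2(0.80, n_obs=100, n_predictors=5)
print(round(example, 4))
```

Because the penalty grows with the number of predictors, adjusted R2 is the fairer measure when comparing models with different numbers of variables.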
Model 1: Discussion
• Error maps were created to study the pattern of relationship between nighttime lights and number of households in each income category
• Under-estimation of number of households was observed in lower income category for highly populated states with over 80% rural population
• Under-estimation of upper income households by NL observed in high population density states of UP, Bihar and Kerala
• Under-estimation was smaller for upper- and middle-income households
• Over-estimation of lower income households in border districts of Rajasthan
• Over-estimation of lower income households in agriculturally rich states of Punjab, Haryana
• Thus, both Hypothesis One and Hypothesis Two were supported
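The under- and over-estimation discussed above can be expressed as a signed percentage error per district. The sketch below assumes the convention that negative values mean under-estimation and that predictions come from a log model and must be exponentiated first; the slides do not give the exact error formula, and the function name and numbers are hypothetical.

```python
import numpy as np

def percent_error(actual, predicted_ln):
    """Signed percentage error; negative means the model under-estimates."""
    predicted = np.exp(predicted_ln)  # back-transform from ln(households)
    return 100.0 * (predicted - actual) / actual

# Invented household counts for three districts and their log-scale predictions.
actual = np.array([1000.0, 2000.0, 500.0])
predicted_ln = np.log(np.array([900.0, 2000.0, 600.0]))

errors = percent_error(actual, predicted_ln)
print(np.round(errors, 1))
```

Joining such an error column to the district shapefile is what would allow the errors to be mapped.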
Model 2: Using nighttime lights, population density data & including another dummy variable
Analysis – Step 3, Developing Model 2
Population density calculated at the district level
A dummy variable created for districts with percentage of rural population greater than 80%
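The two additional Model 2 variables can be sketched as follows. This is an illustration under the assumption that population, area, and rural population are available per district; the function name, the units, and the figures are invented.

```python
import numpy as np

def density_and_rural_dummy(population, area_km2, rural_population, threshold_pct=80.0):
    """Return population density and a 0/1 flag for districts above the rural threshold."""
    density = population / area_km2
    rural_pct = 100.0 * rural_population / population
    return density, (rural_pct > threshold_pct).astype(int)

# Two hypothetical districts: one mostly rural, one mostly urban.
population = np.array([1_000_000.0, 500_000.0])
area_km2 = np.array([2000.0, 250.0])
rural_population = np.array([850_000.0, 100_000.0])

density, rural_dummy = density_and_rural_dummy(population, area_km2, rural_population)
print(density, rural_dummy)
```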
Model 2: Model coefficients
Ln Y = α + β1 (Ln X1) + β2 (Ln X2) + β3 X3 + β4 X4 + β5 X5 + β6 X6 + β7 X7
where Y is the number of households in an income category, X1 is the sum of lights (NL), X2 is population density, and X3 to X7 are dummy variables
* Significant at the 99% confidence level, $ Significant at the 95% confidence level, # Significant at the 90% confidence level
Analysis – Model 2
Model 2: Inferences
• Inclusion of population density and the dummy variable for districts with rural population greater than 80% increases the R2 for all three income categories
• Highest percentage increase (about 13%) in R2 value is seen for households in the lowest income category
• Magnitude of the coefficient for NL (β1) is highest for the higher income group
• Magnitude of the coefficient for population density (β2) is lowest for the higher income group
• The rural population indicator is most significant for the lowest income group
• In fact, the rural indicator is negatively correlated with the middle and upper income households
• Coefficients of all other dummy variables go up monotonically for higher income groups
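The effect of adding predictors on R2 can be illustrated with two nested least-squares fits. The sketch below uses synthetic data and omits the other dummy variables for brevity; it also assumes that "percentage increase" means the relative change in R2, which the slides do not specify. All names and values are invented.

```python
import numpy as np

def ols_r2(X, y):
    """R2 of an OLS fit of y on the columns of X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Synthetic districts: log lights, log population density, and a rural 0/1 flag.
rng = np.random.default_rng(1)
n = 300
ln_lights = rng.normal(5.0, 1.0, n)
ln_density = rng.normal(6.0, 1.0, n)
rural = (rng.uniform(size=n) > 0.7).astype(float)
ln_y = (2.0 + 0.5 * ln_lights + 0.6 * ln_density + 0.4 * rural
        + rng.normal(0.0, 0.3, n))

r2_m1 = ols_r2(ln_lights, ln_y)                                       # lights only
r2_m2 = ols_r2(np.column_stack([ln_lights, ln_density, rural]), ln_y)  # lights + density + rural
pct_increase = 100.0 * (r2_m2 - r2_m1) / r2_m1
print(round(r2_m1, 2), round(r2_m2, 2), round(pct_increase, 1))
```

Plain R2 can never fall when predictors are added to a nested OLS model, which is one reason to compare adjusted R2 as well.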
Results
Comparing error maps of Model 1 and Model 2
[Figure: Error maps – Lower income households; panels: Model 1, Model 2]
[Figure: Error maps – Middle income households; panels: Model 1, Model 2]
[Figure: Error maps – Upper income households; panels: Model 1, Model 2]
Discussion
• A good relationship exists between nighttime lights and income distribution at the district level, and the relationship is stronger for households in the highest income category
• Inclusion of population density and a dummy variable for districts with rural population greater than 80% causes the greatest improvement in the estimates of the lower income households
• A study of the error maps shows that, in general, Model 2 expands the yellow areas in the maps (-5 to +5% error), which we consider ‘acceptable’ percentage errors, across all the income groups
• Districts with anomalous estimates of economic activity by nightlights share some of the following characteristics:
– High population density in urban areas
– Big share of rural population and presence of large expanses of cultivated areas which are not lit
– Lack of government provision of public amenities
– Presence of affluent farmers
– Presence of military bases along border areas
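The comparison of 'acceptable' areas between the two models can be summarized as the share of districts whose percentage error falls within the -5 to +5% band. The sketch below is illustrative only: the function name and the error values for both models are invented, not read from the maps.

```python
import numpy as np

def share_within_band(pct_errors, band=5.0):
    """Fraction of districts whose absolute percentage error is at most `band`."""
    pct_errors = np.asarray(pct_errors, dtype=float)
    return float(np.mean(np.abs(pct_errors) <= band))

# Invented percentage errors for the same six districts under each model.
model1_errors = [-12.0, -4.0, 3.0, 9.0, 20.0, -1.0]
model2_errors = [-6.0, -2.0, 1.0, 4.0, 7.0, 0.0]

share_m1 = share_within_band(model1_errors)
share_m2 = share_within_band(model2_errors)
print(share_m1, round(share_m2, 3))
```

A larger share for Model 2 would correspond to the expansion of the yellow class seen on the maps.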
Conclusion and Future considerations
• Finer spatial resolution analysis of nightlights is more effective in understanding and using this remotely sensed spatial data as a proxy of economic activity
• The same holds true for spatial population data
• The developed models (with further improvements) can be used to estimate households in different income categories for years when such data are not available
• These models can be useful in studying income inequality.
• Data such as land use, land cover, and vegetation cover are among the variables that could be included to improve the model
Thank You!!
Questions?