U-Air: When Urban Air Quality Meets Big Data
![Page 1: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/1.jpg)
U-Air: When Urban Air Quality Meets Big Data
Yu Zheng, Lead Researcher, Microsoft Research Asia
![Page 2: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/2.jpg)
Background
• Air quality
  – NO2, SO2
  – Aerosols: PM2.5, PM10
• Why it matters
  – Healthcare
  – Pollution control and dispersal
• Reality
  – Building a measurement station is not easy
  – A limited number of stations (poor coverage)
Beijing has only 15 air quality monitoring stations in its urban area (50 km × 40 km)
[Figure: an air quality monitor station]
![Page 3: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/3.jpg)
2PM, June 17, 2013
![Page 4: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/4.jpg)
Challenges
• Air quality varies by location non-linearly
• Affected by many factors
  – Weather, traffic, land use…
  – Subtle to model with a clear formula
[Figure: deviation of PM2.5 between S12 and S13. x-axis: deviation (0 to 480); y-axis: proportion (0.00 to 0.30); annotation: >35%. A) Beijing (8/24/2012 - 3/8/2013)]
![Page 5: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/5.jpg)
We do not really know the air quality of a location without a monitoring station!
![Page 6: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/6.jpg)
Challenges
• Existing methods do not work well
  – Linear interpolation
  – Classical dispersion models
    • Gaussian Plume models and Operational Street Canyon models
    • Many parameters are difficult to obtain: vehicle emission rates, street geometry, the roughness coefficient of the urban surface…
  – Satellite remote sensing
    • Suffers from clouds
    • Does not reflect ground air quality
    • Varies with humidity, temperature, location, and season
  – Outsourced crowd sensing using portable devices
    • Limited to a few gases: CO2 and CO
    • Sensors for detecting aerosols are not portable: PM10, PM2.5
    • A long sensing process, 1-2 hours
[Figure: an aerosol sensing device, annotated 30,000+ USD, 10 ug/m3, 202×85×168 (mm)]
![Page 7: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/7.jpg)
Inferring Real-Time and Fine-Grained air quality throughout a city using Big Data
Meteorology Traffic POIs Road networks Human Mobility
Historical air quality data Real-time air quality reports
![Page 8: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/8.jpg)
![Page 9: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/9.jpg)
Applications
• Location-based air quality awareness
  – Fine-grained pollution alert
  – Routing based on air quality
• Identify candidate locations for setting up new monitoring stations
• A step towards identifying the root cause of air pollution
[Figure: map of monitoring stations labeled S1 to S10. B) Shanghai]
![Page 10: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/10.jpg)
Difficulties
• How to identify features from each kind of data source
• Incorporate multiple heterogeneous data sources into a learning model
  – Spatially-related data: POIs, road networks
  – Temporally-related data: traffic, meteorology, human mobility
• Data sparseness (little training data)
  – Limited number of stations
  – Many places to infer
![Page 11: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/11.jpg)
Methodology Overview
• Partition a city into disjoint grids
• Extract features for each grid from its affecting region
  – Meteorological features
  – Traffic features
  – Human mobility features
  – POI features
  – Road network features
• Co-training-based semi-supervised learning model for each pollutant
  – Predict the AQI labels
  – Data sparsity
  – Two classifiers
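The grid partition step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `grid_index`, the equirectangular approximation, and the 1 km default cell size are assumptions (the dataset table later lists 50×50 km as 2500 grids, which suggests 1 km cells).

```python
import math


def grid_index(lat, lon, south, west, cell_km=1.0):
    """Map a (lat, lon) point to the (row, col) of a square grid cell.

    Assumptions: (south, west) is the south-west corner of the city's
    bounding box, and an equirectangular approximation is accurate
    enough at city scale.
    """
    km_per_deg_lat = 111.32  # approximate length of one degree of latitude
    km_per_deg_lon = 111.32 * math.cos(math.radians(south))
    row = int((lat - south) * km_per_deg_lat // cell_km)
    col = int((lon - west) * km_per_deg_lon // cell_km)
    return row, col


# Hypothetical corner coordinates, used only for illustration.
SOUTH, WEST = 39.8, 116.2
print(grid_index(SOUTH + 2.5 / 111.32, WEST, SOUTH, WEST))
```

Every data record with a coordinate (a GPS point, a POI, a road intersection) can then be assigned to a cell, so that features are aggregated per grid.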
![Page 12: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/12.jpg)
Meteorological Features: Fm
• Rainy, Sunny, Cloudy, Foggy
• Wind speed
• Temperature
• Humidity
• Barometric pressure
[Figure: AQI of PM10, August to Dec. 2012 in Beijing. Legend: Good, Moderate, Unhealthy-S, Unhealthy, Very Unhealthy]
![Page 13: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/13.jpg)
Traffic Features: Ft
• Distribution of speed by time: F(v)
• Expectation of speed: E(v)
• Standard deviation of speed: D(v)
[Figure: AQI against traffic features. Axis labels: 0 ≤ v < 20, 20 ≤ v < 40, v ≥ 40, E(v), D(v), km. Legend: Good, Moderate, Unhealthy-S, Unhealthy, Very Unhealthy]
GPS trajectories generated by over 30,000 taxis, from August to Dec. 2012 in Beijing
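The three traffic features can be computed from the speed samples observed in one grid's affecting region during one time slot. The sketch below is an illustration under assumptions: the function name is hypothetical, speeds are taken to be in km/h, and the three bins follow the labels on the slide (0 ≤ v < 20, 20 ≤ v < 40, v ≥ 40).

```python
from statistics import fmean, pstdev


def traffic_features(speeds_kmh):
    """Return (F(v), E(v), D(v)) for a non-empty list of speed samples.

    F(v) is the share of samples in each speed bin, E(v) is the mean,
    and D(v) is the population standard deviation.
    """
    if not speeds_kmh:
        raise ValueError("no speed samples for this grid and time slot")
    bins = [0, 0, 0]
    for v in speeds_kmh:
        if v < 20:
            bins[0] += 1
        elif v < 40:
            bins[1] += 1
        else:
            bins[2] += 1
    n = len(speeds_kmh)
    distribution = [count / n for count in bins]
    return distribution, fmean(speeds_kmh), pstdev(speeds_kmh)


f_v, e_v, d_v = traffic_features([10, 30, 50, 30])
print(f_v, e_v, round(d_v, 3))
```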
![Page 14: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/14.jpg)
Extracting Traffic Features
• Offline spatio-temporal indexing
  – ta: arrival time
  – Traj: trajectory ID
  – Ii: the index of the first GPS point (in the trajectory) entering a grid
  – Io: the index of the last GPS point (in the trajectory) leaving a grid
[Figure: spatio-temporal index. Each grid gi keeps a time-ordered list (earliest to latest, over the time span of the data) of records (ta, Traj, Ii, Io); each record refers to a trajectory Traj1 … Trajm stored as a point sequence p1 → p2 → … → pn. Other labels: Taxi2, Taxi7, Taxim, td, tp, drop1, drop2, picki, dropm]
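One possible shape for such an index is sketched below: each grid holds its records sorted by arrival time, so the records falling in a time window can be found by binary search. The class name `GridIndex`, its methods, and the use of plain numbers as timestamps are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class GridIndex:
    """Per-grid list of (ta, traj_id, i_in, i_out) records sorted by ta."""

    def __init__(self):
        self._times = defaultdict(list)    # grid id -> sorted arrival times
        self._records = defaultdict(list)  # grid id -> records in the same order

    def add(self, grid_id, ta, traj_id, i_in, i_out):
        # Insert while keeping both lists ordered by arrival time.
        pos = bisect_right(self._times[grid_id], ta)
        self._times[grid_id].insert(pos, ta)
        self._records[grid_id].insert(pos, (ta, traj_id, i_in, i_out))

    def query(self, grid_id, t_start, t_end):
        """Records whose arrival time lies in [t_start, t_end]."""
        times = self._times.get(grid_id, [])
        lo = bisect_left(times, t_start)
        hi = bisect_right(times, t_end)
        return self._records.get(grid_id, [])[lo:hi]


index = GridIndex()
index.add("g1", 100, "Traj1", 3, 7)
index.add("g1", 50, "Traj2", 0, 4)
index.add("g1", 200, "Traj3", 10, 12)
print(index.query("g1", 60, 150))
```

With the matching records in hand, the GPS points between `i_in` and `i_out` of each trajectory give the speed samples for that grid and time window.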
![Page 15: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/15.jpg)
Human Mobility Features: Fh
• Human mobility implies
  – Traffic flow
  – Land use of a location
  – Function of a region (like residential or business areas)
• Features:
  – Number of arrivals and departures
[Figure: A) AQI of PM10, B) AQI of NO2. Axes: fa, fl. Legend: Good, Moderate, Unhealthy-S, Unhealthy, Very Unhealthy]
![Page 16: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/16.jpg)
POI Features: Fp
• Why POI
  – Indicate the land use and the function of the region
  – Indicate the traffic patterns in the region
• Features
  – Distribution of POIs over categories
  – Proportion of vacant places
  – The changes in the number of POIs
• Factories, shopping malls
• Hotels and real estate
• Parks, decoration and furniture markets
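Two of the POI features listed above are simple to sketch: the distribution of POIs over categories and the change in the number of POIs between two snapshots. The function names and category strings below are hypothetical, and normalizing the distribution to shares (rather than raw counts) is an assumption.

```python
from collections import Counter


def poi_category_distribution(poi_categories, all_categories):
    """Share of POIs per category, in the order given by all_categories."""
    total = len(poi_categories)
    if total == 0:
        return [0.0] * len(all_categories)
    counts = Counter(poi_categories)
    return [counts[c] / total for c in all_categories]


def poi_count_change(count_earlier, count_later):
    """Change in the number of POIs between two snapshots of a region."""
    return count_later - count_earlier


categories = ["factory", "hotel", "park", "mall"]
region_pois = ["factory", "park", "park", "hotel"]
print(poi_category_distribution(region_pois, categories))
print(poi_count_change(120, 131))
```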
![Page 17: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/17.jpg)
Road Network Features: Fr
• Why road networks
  – Have a strong correlation with traffic flows
  – A good complement to traffic modeling
• Features:
  – Total length of highways
  – Total length of other (low-level) road segments
  – The number of intersections in the grid’s affecting region
![Page 18: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/18.jpg)
Semi-Supervised Learning Model
• Philosophy of the model
  – States of air quality
    • Temporal dependency in a location
    • Geo-correlation between locations
  – Generation of air pollutants
    • Emission from a location
    • Propagation among locations
  – Two sets of features
    • Spatially-related
    • Temporally-related
[Figure: stations s1 to s4 and a location l shown at times t1, t2, … ti. Axes: time, geo space. Legend: a location with AQI labels, a location to be inferred, temporal dependency, spatial correlation]
[Figure: co-training. Spatial features (POIs: Fp, road networks: Fr) feed the spatial classifier; temporal features (meteorologic: Fm, traffic: Ft, human mobility: Fh) feed the temporal classifier]
![Page 19: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/19.jpg)
Co-Training-Based Learning Model
• Temporal classifier
  – Model the temporal dependency of the air quality in a location
  – Using temporally-related features
  – Based on a Linear-Chain Conditional Random Field (CRF)
[Figure: linear-chain CRF. Label nodes Y over consecutive time steps t-1, t, t+1, each connected to that step's features Fm, Ft, Fh]
![Page 20: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/20.jpg)
Co-Training-Based Learning Model
• Spatial classifier
  – Model the spatial correlation between AQI of different locations
  – Using spatially-related features
  – Based on a BP neural network
• Input generation
  – Select n stations to pair with
  – Perform m rounds
[Figure: spatial classifier in two stages, input generation and ANN. Input generation labels: Fp, Fr, l, and c for stations 1 … k; Fp, Fr, l for location x; D1, D2; ∆P, ∆R, and d between each station and x. ANN labels: weights w, w' and biases b, b', b'']
![Page 21: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/21.jpg)
Learning Process
[Figure: learning process. The temporal classifier (linear-chain CRF) takes temporally-related features and the spatial classifier (input generation plus ANN) takes spatially-related features. Labels: training, labeled data, unlabeled data, inference]
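The learning process can be read as a standard co-training loop over two feature views. The sketch below is a toy illustration under several assumptions: `CentroidClassifier` is a hypothetical stand-in for both the CRF and the ANN used in the slides, and the rule "each classifier labels the unlabeled examples it is most confident about" is the generic co-training recipe rather than a confirmed detail of this work.

```python
import math


class CentroidClassifier:
    """Toy nearest-centroid model standing in for the CRF and the ANN."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_proba(self, x):
        # Softmax over negative distances, shifted for numerical stability.
        dists = {c: math.dist(x, m) for c, m in self.centroids.items()}
        nearest = min(dists.values())
        scores = {c: math.exp(-(d - nearest)) for c, d in dists.items()}
        total = sum(scores.values())
        return {c: s / total for c, s in scores.items()}


def co_train(labeled, unlabeled, rounds=10, per_round=1):
    """labeled: (spatial_view, temporal_view, label) tuples;
    unlabeled: (spatial_view, temporal_view) tuples."""
    labeled, pool = list(labeled), list(unlabeled)
    sc, tc = CentroidClassifier(), CentroidClassifier()

    def refit():
        labels = [y for _, _, y in labeled]
        sc.fit([s for s, _, _ in labeled], labels)
        tc.fit([t for _, t, _ in labeled], labels)

    refit()
    for _ in range(rounds):
        for clf, view in ((sc, 0), (tc, 1)):
            # Score every pooled example with this classifier's own view.
            scored = []
            for idx, example in enumerate(pool):
                proba = clf.predict_proba(example[view])
                label = max(proba, key=proba.get)
                scored.append((proba[label], idx, label))
            chosen = sorted(scored, reverse=True)[:per_round]
            # Move the most confident examples into the labeled set.
            for _, idx, label in chosen:
                labeled.append((pool[idx][0], pool[idx][1], label))
            for idx in sorted((i for _, i, _ in chosen), reverse=True):
                del pool[idx]
        if not pool:
            break
        refit()
    refit()
    return sc, tc


seed = [([0.0], [0.0], "G"), ([10.0], [10.0], "U")]
pool = [([1.0], [0.5]), ([9.0], [9.5]), ([0.5], [1.0]), ([9.5], [9.0])]
sc, tc = co_train(seed, pool)
print(sc.predict_proba([0.2]))
```

The point of the two views is that an example one classifier labels confidently becomes training data for the other, which is how the approach copes with having only a few labeled stations.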
![Page 22: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/22.jpg)
Inference Process
[Figure: inference process. The temporal classifier (linear-chain CRF) runs inference on temporally-related features and the spatial classifier (input generation plus ANN) on spatially-related features; each outputs a list of values ( , …, ), and the two lists are combined by a product (×)]
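The final step of the inference figure shows a product of two lists of values. A plausible reading, stated here as an assumption rather than the confirmed rule, is that each classifier outputs one probability per AQI class and the predicted class is the one with the largest product. The function name `combine` is hypothetical.

```python
def combine(p_spatial, p_temporal):
    """Return the class with the largest product of the two probabilities.

    Both arguments map class labels to probabilities; only classes
    present in both mappings are considered.
    """
    classes = sorted(p_spatial.keys() & p_temporal.keys())
    if not classes:
        raise ValueError("the two classifiers share no class labels")
    return max(classes, key=lambda c: p_spatial[c] * p_temporal[c])


p_sc = {"G": 0.5, "M": 0.3, "S": 0.15, "U": 0.05}
p_tc = {"G": 0.2, "M": 0.6, "S": 0.1, "U": 0.1}
print(combine(p_sc, p_tc))
```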
![Page 23: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/23.jpg)
Evaluation
• Datasets

| Data sources | | Beijing | Shanghai | Shenzhen | Wuhan |
|---|---|---|---|---|---|
| POI | 2012 Q1 | 271,634 | 321,529 | 107,061 | 102,467 |
| | 2012 Q3 | 272,109 | 317,829 | 107,171 | 104,634 |
| Road | #. Segments | 162,246 | 171,191 | 45,231 | 38,477 |
| | Highways | 1,497 km | 1,963 km | 256 km | 1,193 km |
| | Roads | 18,525 km | 25,530 km | 6,100 km | 9,691 km |
| | #. Intersec. | 49,981 | 70,293 | 32,112 | 25,359 |
| AQI | #. Station | 22 | 10 | 9 | 10 |
| | Hours | 23,300 | 8,588 | 6,489 | 6,741 |
| | Time spans | 8/24/2012-3/8/2013 | 1/19/2013-3/8/2013 | 2/4/2013-3/8/2013 | 2/4/2013-3/8/2013 |
| Urban size (grids) | | 50×50 km (2500) | 50×50 km (2500) | 57×45 km (2565) | 45×25 km (1165) |
[Figure: maps of the monitoring stations (labeled S1, S2, …) in each city. A) Beijing B) Shanghai C) Shenzhen D) Wuhan]
![Page 24: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/24.jpg)
Evaluation
• Ground Truth
  – Remove a station
  – Cross cities
• Baselines
  – Linear and Gaussian Interpolations
  – Classical Dispersion Model
  – Decision Tree (DT)
  – CRF-ALL
  – ANN-ALL
![Page 25: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/25.jpg)
Evaluation
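• Does every kind of feature count?

| Features | PM10 Precision | PM10 Recall | NO2 Precision | NO2 Recall |
|---|---|---|---|---|
| | 0.572 | 0.514 | 0.477 | 0.454 |
| | 0.341 | 0.36 | 0.371 | 0.35 |
| | 0.327 | 0.364 | 0.411 | 0.483 |
| + | 0.441 | 0.443 | 0.307 | 0.354 |
| + | 0.664 | 0.675 | 0.634 | 0.635 |
| +++ | 0.731 | 0.734 | 0.701 | 0.691 |
| ++++ | 0.773 | 0.754 | 0.723 | 0.704 |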
![Page 26: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/26.jpg)
Evaluation
• Overall performance of the co-training
[Figure: precision (0.65 to 0.80) against number of iterations (0 to 160) for SC, TC, and Co-Training]
[Figure: accuracy (0.0 to 0.8) on NO2 and PM10 for U-Air, Linear, Gaussian, Classical, DT, CRF-ALL, and ANN-ALL]
![Page 27: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/27.jpg)
Evaluation
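• Confusion matrix of Co-Training on PM10

| Ground truth (rows) / Predictions (columns) | G | M | S | U | Recall |
|---|---|---|---|---|---|
| G | 3789 | 402 | 102 | 0 | 0.883 |
| M | 602 | 3614 | 204 | 0 | 0.818 |
| S | 41 | 200 | 532 | 50 | 0.646 |
| U | 0 | 22 | 70 | 219 | 0.704 |
| Precision | 0.855 | 0.853 | 0.586 | 0.814 | 0.828 |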
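The per-class recall and precision and the overall accuracy in this confusion matrix can be recomputed from the raw counts. The sketch below copies the counts from the slide, with rows as ground truth and columns as predictions; the variable and function names are illustrative.

```python
LABELS = ["G", "M", "S", "U"]
# Rows: ground truth; columns: predictions (counts copied from the slide).
CONFUSION = [
    [3789, 402, 102, 0],
    [602, 3614, 204, 0],
    [41, 200, 532, 50],
    [0, 22, 70, 219],
]


def recall(matrix):
    """Correct predictions divided by the row total, per class."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def precision(matrix):
    """Correct predictions divided by the column total, per class."""
    return [matrix[i][i] / sum(row[i] for row in matrix)
            for i in range(len(matrix))]


def accuracy(matrix):
    """Diagonal total divided by the grand total."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


for label, r, p in zip(LABELS, recall(CONFUSION), precision(CONFUSION)):
    print(label, round(r, 3), round(p, 3))
print("accuracy", round(accuracy(CONFUSION), 3))
```

Rounded to three decimals, these reproduce the recall column, the precision row, and the 0.828 overall figure reported on the slide.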
![Page 28: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/28.jpg)
Evaluation
• Performance of Spatial classifier

| Cities | PM2.5 Prec. | PM2.5 Rec. | PM10 Prec. | PM10 Rec. | NO2 Prec. | NO2 Rec. |
|---|---|---|---|---|---|---|
| Beijing | 0.764 | 0.763 | 0.762 | 0.745 | 0.730 | 0.749 |
| Shanghai | 0.705 | 0.725 | 0.702 | 0.718 | 0.715 | 0.706 |
| Shenzhen | 0.740 | 0.737 | 0.710 | 0.742 | 0.732 | 0.722 |
| Wuhan | 0.727 | 0.723 | 0.731 | 0.739 | 0.744 | 0.719 |
![Page 29: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/29.jpg)
Evaluation
• Efficiency study
• Inferring the AQIs for entire Beijing in 5 minutes

| Procedures | | Time (ms) |
|---|---|---|
| Feature extraction (per grid) | & | 53.2 |
| | | 28.8 |
| | | 14.4 |
| Inference (per grid) | SC | 21.5 |
| | TC | 13.1 |
| Total | | 131 |
![Page 30: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/30.jpg)
Conclusion
• Infer fine-grained air quality with
  – Real-time and historical air quality readings from existing stations
  – Other data sources: meteorology, POIs, road network, human mobility, and traffic condition
• Co-Training-based semi-supervised learning approach
  – Deal with data sparsity by learning from unlabeled data
  – Model the spatial correlation among the air quality of different locations
  – Model the temporal dependency of the air quality in a location
• Results
  – 0.82 with traffic data (co-training)
  – 0.76 if only using spatial classifier
![Page 31: U-Air: When Urban Air Quality Meets Big Data](https://reader035.fdocuments.us/reader035/viewer/2022062521/56816721550346895ddba6ab/html5/thumbnails/31.jpg)
Next Step
• Predict the air quality in 1~2 hours
• Identify the root cause of air pollution by
  – Studying the correlations between AQIs and different features
  – Data visualization across multiple-domain data sources