Smart Traffic Lights that Learn ! Multi-Agent Reinforcement Learning Integrated Network of...
-
Upload
jonathan-laba -
Category
Technology
-
view
291 -
download
2
description
Transcript of Smart Traffic Lights that Learn ! Multi-Agent Reinforcement Learning Integrated Network of...
Smart Traffic Lights that Learn !
Multi-Agent Reinforcement Learning Integrated Network of Adaptive Traffic Signal Controllers
M A R L I N
Samah El-Tantawy, Ph.D. Post Doctoral Fellow, Dept of Civil Engineering
Baher Abdulhai, Ph.D., P.Eng. Director, ITS Centre and Testbed, Dept of Civil Engineering
Hossam Abdelgawad, Ph.D., P.Eng. Manager of ITS Centre and Testbed
ACGM 2013- Intelligent Transport for Smart Cities
Outline 2
1. In a Nutshell 2. Theory in Brief Reinforcement Learning and Game Theory
3. Applications City of Toronto Testbed
4. Hardware in the Loop Testing Approach Integration with PEEK ATC-1000
Next Steps Q&A
In a Nutshell 3
Grand objective
Intersections "talk to each other",
Each is affected by what is happening upstream
Each affects what is happening downstream –
Whole network control in one shot from a grand brain is the dream
Issue
Intractable theoretically,
Too complex practically,
Requires massive and very expensive communication
Solution
Decentralized,
Self learning: agents learn to control their local intersection, and
Game theory based: agents learn to collaborate
What is MARLIN? 4
Artificial-intelligence-based control software
Enables traffic lights to self-learn and self-collaborate with neighbouring traffic lights
Cuts down motorists’ delay, fuel consumption and the negative environmental effects of congestion
Easier to operate (self learning)
Less expensive communication if even necessary (less costly)
MARLIN-ATSC: Level 4
Evolution of “Adaptive” Signal Control
Level 0 • Fixed-Time
and Actuated Control
• TRANSYT • 1969, UK
Level 1 • Centralized
Control, Off-line Optimization
• SCATS • 1979,
Australia • >50
installations worldwide
Level 2 • Centralized
Control, On-line Optimization
• SCOOT • 1981, UK • >170
installations worldwide
Level 3 • Distributed
Control, Model-Based
• OPAC, RHODES • 1992, USA • 5 installations in
USA
Level 4 • Distributed
Self-Learning Control
• MARLIN-ATSC • 2011, Canada
5
Issues with Leading ATSC Technologies?
• Expensive • Not scalable • Not robust
Centralized
• Relying on an accurate traffic modelling framework
• the accuracy of which is questionable Model-Based
• Increasing the complexity of the system exponentially with the increase in the number of intersections/controllers
Curse of Dimensionality
• Requiring highly skilled labour to operate due to their complexity.
Human Intervention
Requirements
6
Why is MARLIN Different? 7
MARLIN
Self-Learning
Decentralized
Model-Free
Coordinated Scalable
Pattern Sensitive
Generic
Human Intervention Requirements
Centralized
Inefficient Coordination
Model-Based
Curse of Dimensionality
Prediction Requirement
Specific Design
Learning the Control Law: Reinforcement Learning Architecture
8
Environment
RL Architecture
Agent
State Reward Action
Goal: Optimal Control law = mapping between states and actions
)],(),(max[),(),( 111 kkkkk
a
kkkkkkk asQasQrasQasQ
),(maxarg 11 asQa kk
a
k Balancing exploration and exploitation
Q a1 a2
s1 -10 -5
s2 -3 -15
Q Table
RL-based ATSC Architecture
RL Software Agent
State (Queue Lengths)
Reward
(Delay Savings)
Action (Extend /Switch)
Traffic Simulation Environment
9
10
Each agent plays a game with each adjacent intersection in its neighborhood
I5 I6 I4
I2 I3 I1
I8 I9 I7
I5 I6 I4
I2 I3 I1
I8 I9 I7
I5 I6 I4
I2 I3 I1
I8 I9 I7
Example for Intermediate Intersection (4 Games )
Example for Edge Intersection ( 3 Games)
Example for Corner Intersection ( 2 Games)
MARLIN- ATSC: Coordination Principle
MARLIN-ATSC: (a) Independent Mode, (b) Integrated Mode
MARLIN-ATSC Available Modes
Queue Length 1
Delay 1 Extend 1 Queue Length 2
Delay 2 Extend 2
MARLIN-ATSC
Queue Length 1
Delay 1 Extend 1 Queue Length 2
Delay 2 Extend 2
(a)
(b)
11
Large-Scale Application Network-Wide MOE in the Normal Scenario
12
System
MOE
BC
%
Improvments
MARL-TI Vs.
BC
%
Improvments
MARLIN-IC Vs.
BC
% Improvments
MARLIN-IC Vs.
MARL-TI
Average Intersection
Delay (sec/veh)35.27 27% 38% 14%
Throughput (veh) 23084 3% 6% 3%
Avg Queue Length (veh) 8.66 24% 32% 11%
Std. Avg. Queue Length
(veh)2.12 23% 31% 10%
Avg. Link Delay (sec) 9.45 10% 47% 41%
Avg. Link Stop Time (sec) 2.74 6% 26% 21%
Avg. Link Travel Time
(sec)16.81 6% 27% 22%
CO2 Emission Factor
(gm/km)587.28 28% 30% 2%
Large-Scale Application % Improvement in Average Delay
13
MARLIN-IC vs BC
% Improvement Area 1
Area 2
Area 3
Large-Scale Application Average Route Travel Time for Selected Routes
14
0
1
2
3
4
5
6
7
8
1 2 3 4 5 6 7 8 9 10 11 12
Average T
ravel
Tim
e (
min
)
Time Interval (5 min)
Gardiner EB
BC MARL-TI MARLIN-IC
Fre
eway
Large-Scale Application Average Route Travel Time for Selected Routes
15
0
2
4
6
8
10
12
14
16
18
20
1 2 3 4 5 6 7 8 9 10 11 12
Av
era
ge T
ra
vel
Tim
e (
min
)
Time Interval (5 min)
LakeShore EB to Spadina NB
BC MARL-TI MARLIN-IC
Maj
or
Art
eria
l
Controller Interface Device(CID) RS485 to USB
Traffic Signal Controller
RS485 - SDLC protocol
USB - SDLC protocol
Industrial Computer
Ethernet - NTCIP protocol
Paramics Modeller
MARLIN-HILS Architecture 16
HILS Setup: Demo 17
Conclusion 18
MARLIN state of the art gen4+
Thoroughly developed and tested
Patent Pending Status
On going:
HILS & PEEK ATC-1000 Integration
Potential Field Operation Test
Productization
From TSP to People Priority (PSP)
Samah El-Tantawy [email protected]
Baher Abdulhai [email protected]
Hossam Abdelgawad [email protected]
ACGM 2013- Intelligent Transport for Smart Cities
Smart Traffic Lights that Learn !
Multi-Agent Reinforcement Learning Integrated Network of Adaptive Traffic Signal Controllers
M A R L I N
Samah ElTantawy, Ph.D. Post Doctoral Fellow, Dept of Civil Engineering
Baher Abdulhai, Ph.D., P.Eng. Director, ITS Centre and Testbed, Dept of Civil Engineering
Hossam Abdelgawad, Ph.D., P.Eng. Manager of ITS Centre and Testbed
ACGM 2013- Intelligent Transport for Smart Cities