Adaptive cognitive driving systems

What has been done, what needs to be done. An Adaptive Cognitive System to Support the Driving Task of Vehicles of the Future. Francesco Biral, University of Trento. [email protected]

Description

“An Adaptive Cognitive System to Support the Driving Task of Vehicles of the Future”: guest seminar presentation delivered at the Institute for Transport Studies on 8th July 2014 by Dr Francesco Biral from the University of Trento. The seminar introduced the concept of artificial “co-drivers” as an enabling technology for future intelligent transportation systems. The talk was divided into three parts. In the first part, the design principles of co-drivers are introduced in the more general context of human-robot interaction, and the key technologies needed to implement co-drivers are clarified. In particular, architectural issues are discussed along with human-like sensory-motor strategies and the emulation theory of cognition, which are recognised as necessary building blocks. In the second part, the co-driver developed for the EU project interactIVe is presented as a first instantiation of this notion and of the framework discussed above; experimental examples, limitations and the performance of the current implementation are shown. In the third part, the impact of co-driver technology is considered; in particular, a range of application fields and possible research lines is identified.

Biography: Francesco Biral is Associate Professor of Mechanism and Machine Theory at the University of Trento, where he is responsible for the courses "Modelling and Simulation of Mechanical Systems" and "Vehicle Dynamics and Control" in the Master Course of Mechatronics Engineering. He received his Diploma (M.Sc.) in Mechanical Engineering and his Doctorate (Ph.D.) in Mechanism and Machine Theory, in 1997 and 2000 respectively, both from the University of Padova, Italy. From 2000 to 2002 he was a Postdoctoral Researcher at the Department of Mechanical and Structural Engineering of the University of Trento, and from 2002 to August 2012 he was Assistant Professor at the University of Trento; since September 2012 he has been Associate Professor. His research interests include symbolic and numerical modelling and simulation of dynamical systems (mainly ground vehicles), the efficient solution of optimal control problems for driver/rider modelling, and the development of safety systems for intelligent vehicles. In these fields he has carried out a variety of theoretical and experimental research activities with a multidisciplinary approach, within the scope of both international and industrially funded projects. In the last 10 years he has actively participated in a total of 8 granted projects and 5 industrial research projects, and was coordinator of the EU FP7 SAFERIDER project for the University of Trento as well as of several industrial research projects.

Transcript of Adaptive cognitive driving systems

Page 1: Adaptive cognitive driving systems

What has been done, what needs to be done.

An Adaptive Cognitive System to Support the Driving Task of Vehicles of the Future

Francesco Biral

University of Trento

[email protected]

Page 2: Adaptive cognitive driving systems

What do we need to do?

What will I talk about today?

What kind of technology do we need to support drivers and interact with them in complex road scenarios?

Page 3: Adaptive cognitive driving systems

Self-driving cars do not take the driver into the loop

Self driving cars are only a part of the answer

Will an artificial robot that drives like an expert human driver be sufficient?

Page 4: Adaptive cognitive driving systems

Will drivers accept robotic cars?

Potential reactions of some customers

Certain segments of the population will be less likely to embrace autonomous driving (e.g., car enthusiasts).

The “Digital Natives” and “Gen. Now” generations’ identity is less likely to be attached to the “driving experience.”

Page 5: Adaptive cognitive driving systems

like horse and rider

like a driving instructor

Driver and machine need to cooperate


Page 6: Adaptive cognitive driving systems

We must understand/know the driver. We must know his/her goals, intentions, and driving abilities.

A human would drive in distinctly different ways

depending on what his/her goal is

depending on his/her driving skills and experience

Page 7: Adaptive cognitive driving systems

Like in the riding-horse metaphor

understand your intentions

understand your driving abilities

silently and gently support you when necessary

improving your manoeuvre

executing autonomously a manoeuvre you initiated

execute a task when required (e.g., take me there)

leave you in control, but intervene only when the driver reaches his/her limits, underestimates a scenario, or did not see a better option

The ideal support


Page 8: Adaptive cognitive driving systems

Design principles of Co-Drivers: definition of a co-driver

Architecture

The simulation theory of cognition and human-like sensory-motor cycles

Co-Driver developed in the EU interactIVe project: an instantiation of a Co-Driver

Experimental examples

Limitations

Impact of Co-Driver technology & research lines

Contents


Page 9: Adaptive cognitive driving systems

What is a Co-Driver? Theoretical background

Page 10: Adaptive cognitive driving systems

Incomplete definitions: Autopilot (emphasis on automatism)

Companion Driver (robot: emphasis on human-robot interaction)

Virtual User (alter-ego: emphasis on reproducing human skills)

Driver Tutor (emphasis on supervising)

Natural co-drivers exist (animals, and especially horses – the H-metaphor; Flemisch, Norman, et al.)

What a Co-Driver might be?

What is a co-driver?

Page 11: Adaptive cognitive driving systems

A “co-driver” is an intelligent agent which:

Understands human driver intentions.

Produces human-like motor behaviours.

Has an internal structure that copies the human one (at least in some sense).

Interacts accordingly, rectifying mistakenly executed actions.

Characteristics of a Co-Driver

What a Co-Driver is

Page 12: Adaptive cognitive driving systems

Co-Driver must “understand” human driver

How would a human drive?

This question has multiple answers! Answers depend on some higher-level motivations/goals.

The co-driver must put itself “in the shoes” of the human driver and understand the goal.

It is not an easy task, since there is a multiplicity of goals.

Key Technology #1


Page 13: Adaptive cognitive driving systems

The co-driver must understand the goals of the human driver (humans can do that with other humans – how do they do it?)

The Simulation Theory of Cognition is a conceptual framework which essentially states that “thinking is simulation of perception and action”, carried out as covert motor-sensory activity. Understanding of others’ intentions is also a simulation process, carried out via the “mirroring” of observed motor activities of others. Hurley, S.L., 2008. The shared circuits model (SCM): how control, mirroring, and simulation can enable imitation, deliberation, and mindreading. Behav. Brain Sci. 31, 1–58.

Grush, R. 2004. "The Emulation Theory of Representation: Motor Control, Imagery, and Perception." Behavioral and Brain Sciences 27 (3): 377-396.

Jeannerod, M. 2001. "Neural Simulation of Action: A Unifying Mechanism for Motor Cognition." NeuroImage 14 (1 II): S103-S109.

Understand the goal: theoretical background


Page 14: Adaptive cognitive driving systems

The Generative Approach is to generate agent behaviours under a number of alternative hypotheses, which are then tested by comparison with observed behaviours (“multiple simulations” are run in parallel, and the most salient one(s) are selected). The observed behaviour identifies the internal states of the observed agent, and thus the intentions (Haruno, Wolpert, and Kawato 2001; Demiris and Khadhouri 2006).
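As an illustration only, the generative loop can be sketched in a few lines of Python; the hypothesis set, the simulate callback and the scalar comparison of controls are all assumptions made for the example, not the interfaces of any of the cited systems.

```python
def infer_intention(hypotheses, observed_control, simulate):
    """Generative ('like-me') inference sketch: covertly simulate each
    hypothesis and keep the one whose predicted control best matches
    the control actually observed from the human."""
    best, best_err = None, float("inf")
    for h in hypotheses:
        predicted = simulate(h)                    # covert motor activity
        err = (predicted - observed_control) ** 2  # mismatch score
        if err < best_err:
            best, best_err = h, err
    return best

# usage: intentions as candidate target speeds, with a toy simulator
guess = infer_intention(
    hypotheses=[10.0, 20.0, 30.0],                 # target speeds [m/s]
    observed_control=0.4,                          # observed accel. [m/s^2]
    simulate=lambda u_target: 0.05 * (u_target - 15.0),
)
print(guess)  # 20.0: the hypothesis whose simulated action best matches
```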

“Like me” framework for understanding of others’ intentions: “others who act like me have internal states like me” (Meltzoff). The “like-me” framework essentially states that one agent “stands in the shoes of another”.

Simulation/mirroring theories of cognition


Page 15: Adaptive cognitive driving systems

Summing up: agents with similar sensory-motor systems, capable of covert motor activities, can use their sensory-motor systems to “simulate” observed actions, and thus know the intentions of the observed agent.

“Putting the co-driver in the shoes of the real driver” means that the co-driver “emulates” the real driver, as in covert motor activities.

Objective: link driver behaviour to meaningful goals (understand driver goals/motivations).

Simulation/mirroring theories of cognition


Page 16: Adaptive cognitive driving systems

Exemplification of terms used

[Figure: map of longitudinal and lateral controls, with numbered alternative hypotheses (1–8, 3a–3c) linking goals, driver behaviours and covert motor activities]

Page 17: Adaptive cognitive driving systems

The co-driver must be able to generate human-like motor primitives:

Humanlike. Reproduce human sensory-motor strategies (path planning and motor patterns just like a human).

D. Liu, E. Todorov, Evidence for the Flexible Sensorimotor Strategies Predicted by Optimal Feedback Control, Journal of Neuroscience, 2007, 27(35):9354–9368.

P. Viviani, T. Flash, Minimum-jerk, two-thirds power law, and isochrony: converging approaches to movement planning. J. Exp. Psychol. 21: 32-53, 1995.

“Even if skilled performance on a certain task is not exactly optimal, but is just ‘good enough’, it has been made good enough by processes whose limit is optimality”.

Human motor patterns respond to optimality criteria and may be reproduced by Receding Horizon Optimal Control (minimum intervention principle); see the sketch below.

Key Technology #2

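For concreteness, the classic rest-to-rest minimum-jerk profile (the Flash-Hogan quintic) can be written in a few lines of Python; this is a textbook illustration of the optimality criterion above, not the co-driver's actual planner.

```python
import numpy as np

def min_jerk(x0, xT, T, t):
    """Minimum-jerk rest-to-rest profile: the unique quintic with zero
    velocity and acceleration at both ends,
    x(t) = x0 + (xT - x0) * (10 r^3 - 15 r^4 + 6 r^5), with r = t/T."""
    r = np.clip(t / T, 0.0, 1.0)
    return x0 + (xT - x0) * (10 * r**3 - 15 * r**4 + 6 * r**5)

# usage: a 3.5 m lateral displacement completed in 4 s
t = np.linspace(0.0, 4.0, 9)
print(np.round(min_jerk(0.0, 3.5, 4.0, t), 2))
```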

Page 18: Adaptive cognitive driving systems

Experimental data: drivers reduce speed in curves to maintain accuracy in lateral position.


Acceleration patterns and the two-thirds power law

In order to improve movement accuracy, while preserving average speed, it is convenient to increase speed in straighter arcs and reduce it along curvier ones.

$$a_{lat} = a_0\,\sqrt{\left(1 - \left(\frac{v}{v_0}\right)^{2}\right)^{2} + 2\left(\frac{v}{v_0}\right)^{2}}\,, \qquad v = \alpha\,\sqrt[3]{\rho}$$

(ρ: radius of curvature)
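As a hedged numerical illustration of the speed rule, assuming ρ = 1/|κ| is the radius of curvature and α an illustrative driver-dependent gain:

```python
import numpy as np

def two_thirds_speed(curvature, alpha=1.0):
    """Two-thirds power law as a speed rule: v = alpha * rho**(1/3),
    with rho = 1/|kappa| the radius of curvature."""
    rho = 1.0 / np.maximum(np.abs(curvature), 1e-9)
    return alpha * np.cbrt(rho)

# a tight curve (kappa = 0.05 1/m) commands a lower speed than a gentle one
print(np.round(two_thirds_speed(np.array([0.05, 0.005])), 2))
```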

Page 19: Adaptive cognitive driving systems

Architecture #1: the sense-think-act paradigm. It is the traditional architecture of AI, also known as the computer metaphor. The central idea is the existence of an “internal model of the world”. Problems:

perception “per se”;

not scalable (interfaces are choke points);

difficult to test;

is not what happens in the human brain;

not fault tolerant;

hard to reconcile with motor imagery and covert sensory-motor activity.

Key Technology #3: Architecture


Page 20: Adaptive cognitive driving systems

A tutor made of a one-level virtual driver (called “reference maneuver”) was built into SASPENCE and INSAFES (+ evasive maneuver).

Limitation: missing motor imagery; it was not able to “understand” the driver’s goal (it gave recommendations for a pre-defined goal).

Da Lio, Biral, et al., T-ITS, 2010 (2 papers)

Sense-think-act success story/1


Page 21: Adaptive cognitive driving systems

(Versailles test track: reference manoeuvre (red) vs. real driver (blue) movie)

Sense-think-act success story/2


Page 22: Adaptive cognitive driving systems

Optimal control replanning on a 200 m horizon every 0.1 s (see the sketch below)

Behaviour on a curve

Other examples

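A minimal sketch of such a receding-horizon loop in Python. The plan_jerk stand-in reproduces only the first control of a jerk-optimal speed-matching plan with zero time pressure, j(0) = 6(u_f − u_i)/T² − 4 a_i/T, consistent with the closed-form solution shown later in the talk; the horizon T, the target speed and all numbers are illustrative.

```python
def plan_jerk(u, a, u_target, T=5.0):
    """First control of a jerk-optimal plan that reaches u_target with
    zero acceleration in T seconds (speed-matching initial jerk, wT = 0)."""
    return 6.0 * (u_target - u) / T**2 - 4.0 * a / T

# receding horizon: replan every 0.1 s, apply only the first control
s, u, a, dt = 0.0, 25.0, 0.0, 0.1
for _ in range(100):
    j = plan_jerk(u, a, u_target=15.0)
    s, u, a = s + u * dt, u + a * dt, a + j * dt
print(round(u, 2))  # speed smoothly approaching 15 m/s
```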

Page 23: Adaptive cognitive driving systems

Decomposition into parallel behaviours (hierarchical levels of competence).

It is based on Perception-Action cycles (no internal model of the world).

Multi-goal, multi-sensor (perceptual synthesis), robust, scalable, subsumptive, each level includes sub-level competences.

R. A. Brooks. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, 2(1):14–23, March 1986.

Architecture: the behavioural model


Page 24: Adaptive cognitive driving systems

The Shared Circuits Model combines the above ideas into an interpretative scheme within the behavioural architecture.

“Thinking” is a simulated interaction.

Emulation theory of cognition (Grush, Hurley, Jeannerod, et al.) enables imitation, motor imagery, deliberation, mindreading, understanding…

Theory of Cognition by means of emulation


[Diagram: Shared Circuits Model – a perception-action loop through the environment, with an inverse model and a forward emulator; simulated input and inhibited output allow the same circuitry to mirror one’s own acts and others’ acts (imitation)]

Page 25: Adaptive cognitive driving systems

The general idea is that the brain monitors a small number of task parameters y instead of the full state x, generates abstract commands v, and maps them into muscle activations u using motor synergies.

Humans are organized in hierarchies of subsumptive behaviors

(Brooks, 1986; Michon 1985; Hatakka et al. 2002; Hollnagel and Woods 1999, 2005).

Human cognition is “grounded”: the intelligent agent is seen in the loop with the environment, and perception and action are no longer divided.

(Gibson 1986; Varela, Thompson, and Rosch 1991; Thelen and Smith 1994; Van Gelder 1995; Harvey 1996; Clark 1997; Seitz 2000; Beer 2000; Barsalou 2008)

The traditional paradigm of AI (the computer metaphor: input-processing-output) suffers from the symbol-grounding problems of abstract amodal symbol systems. It fails in modelling mutual understanding between agents.

Human-like sensory-motor systems


Page 26: Adaptive cognitive driving systems

The ECOM is a subsumptive hierarchical behavioural model of human driving.

(Hollnagel and Woods, 1999, 2002, 2005)

Successfully used in FP7 DIPLECS.

The Extended Control Model (ECOM)


Long term goals and psychological states (e.g., go home quickly)

Short-term goals and driving styles (e.g., overtake “a” instead of “b”).

Space-Time Trajectories (i.e., including speed).

Vehicle control.
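The four ECOM layers can be written down as a simple ordered structure. The level names (targeting, monitoring, regulating, tracking) follow the ECOM literature; the sketch itself is only illustrative.

```python
# ECOM layers, top to bottom, paired with the examples from the slide
ECOM_LEVELS = [
    ("targeting",  "long-term goals and psychological states (e.g., go home quickly)"),
    ("monitoring", "short-term goals and driving styles (e.g., overtake 'a' instead of 'b')"),
    ("regulating", "space-time trajectories, including speed"),
    ("tracking",   "vehicle control"),
]

for name, role in ECOM_LEVELS:
    print(f"{name:>10}: {role}")
```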

Page 27: Adaptive cognitive driving systems

It is inspired by this general organization of the sensorimotor system.

The low-level controller receives information about the plant state x, and generates an abstract and more compact state representation y(x) that is sent to the high level. The high-level controller monitors task progress, and issues commands v(y) which in general specify how y should change.

The job of the low-level controller is to compute energy-efficient controls u(v,x) consistent with v. Thus the low-level controller does not solve a specific subtask (as usually assumed in hierarchical reinforcement learning), but instead performs an instantaneous feedback transformation. This enables the high level to control y unencumbered by the full details of the plant.

Hierarchical control scheme

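A toy two-level controller in Python illustrates the scheme: the high level sees only the abstract parameter y and commands its desired rate of change v, while the low level turns v into a plant input consistent with the full state. The double-integrator plant and all gains are assumptions made for the example.

```python
def high_level(y, y_goal, gain=1.0):
    """Monitor task progress on the abstract state y; command v(y)."""
    return gain * (y_goal - y)          # desired rate of change of y

def low_level(v, x, damping=2.0):
    """Instantaneous feedback transformation u(v, x): realize the
    commanded rate v on y = position of a unit-mass double integrator."""
    pos, vel = x
    return damping * (v - vel)          # force driving velocity toward v

# closed loop: x -> y -> v -> u, stepped at 100 Hz
x, dt = [0.0, 0.0], 0.01
for _ in range(1000):
    v = high_level(x[0], y_goal=1.0)
    u = low_level(v, x)
    x = [x[0] + x[1] * dt, x[1] + u * dt]
print(round(x[0], 3))  # position settles near the goal y = 1
```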

[The slide reproduces the introduction of the source paper on hierarchical control of redundant systems, Journal of Robotic Systems 22(11), 691–710 (2005), including its Figure 1, “Schematic illustration of the proposed framework”.]

Page 28: Adaptive cognitive driving systems

Co-Driver in the interactIVe CRF Demonstrator

Page 29: Adaptive cognitive driving systems

Design an agent capable of enacting the “like-me” framework, which means that it must have sensory-motor strategies similar to those of a human, and that it must be capable of using them to mirror human behaviour, for inference of intentions and human-machine interaction, for preventive safety, emergency handling and efficient vehicle control.

Implemented: a four-layer, ECOM-like, behavioural subsumptive architecture.

Forward/mirroring mechanisms (by Optimal Control).

Motor imagery, inference of driver’s goals.

Main goal



Page 30: Adaptive cognitive driving systems

Forward emulators are vehicle dynamics models that neglect high frequencies (not afforded by humans) but consider non-linearities.

A predictive model here serves to test the viability of different hypotheses of human driving intentions. Thus, its main requirement is similarity to the model humans use (if not, co-driver predictions will not match observations even for a correct hypothesis).

We make the assumption that slow, even if non-linear, phenomena can be human-controlled, whereas faster ones cannot (due to human actuation limits and bandwidth).

Inverse emulators are minimum-jerk/minimum-time optimal control (OC) plans that link perceptual goals (i.e., desired states) to the actions needed to achieve those goals.

Other approaches may be based on machine learning; for example, the learning of either or both inverse and forward models.

OC has the advantage that it needs only the forward model and the optimality criteria.

Building blocks#1 - Emulators

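A forward emulator in the spirit described above can be as small as a low-bandwidth triple integrator in (s, u, a) driven by a jerk command, matching the system model introduced later in the talk; the step size and the gain k_p are illustrative.

```python
def forward_emulator(state, jerk, dt=0.1, kp=1.0):
    """One low-frequency step of the longitudinal forward model:
    ds/dt = u, du/dt = a, da/dt = kp * j (faster dynamics neglected,
    consistent with human actuation limits)."""
    s, u, a = state
    return (s + u * dt, u + a * dt, a + kp * jerk * dt)

# covert simulation: roll a candidate jerk plan forward for 2 s
state = (0.0, 20.0, 0.0)   # s [m], u [m/s], a [m/s^2]
for _ in range(20):
    state = forward_emulator(state, jerk=-0.5)
print(tuple(round(x, 2) for x in state))
```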

Page 31: Adaptive cognitive driving systems

Motor primitives are parametric instantiations of inverse emulators that achieve specified goals. They are the solutions of inverse models that determine the (optimal) control required to reach a desired state at some future time T.

Since there may be several types of final states and optimization criteria, the inversion problem produces a corresponding number of solutions, which we may regard as different, parameterized motor primitives.

There are 4 motor primitives:

Speed Adaptation (SA)

Speed Matching (SM)

Lateral Displacement (LD)

Lane Alignment (LA).

Building blocks#2 – Motor primitives

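As a data-structure sketch (names and fields assumed for illustration), the four primitives can be represented as parameterized goal specifications:

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class Kind(Enum):
    SA = auto()  # Speed Adaptation: reach a target speed
    SM = auto()  # Speed Matching: reach a target speed at a target position
    LD = auto()  # Lateral Displacement: reach a lateral offset in time T
    LA = auto()  # Lane Alignment: re-align with the lane

@dataclass
class MotorPrimitive:
    kind: Kind
    goal: dict = field(default_factory=dict)  # perceptual goal (desired states)
    wT: float = 0.0                           # time-pressure weight

# e.g., a Speed Matching primitive: reach 20 m/s at s = 150 m
sm = MotorPrimitive(Kind.SM, {"uT": 20.0, "sT": 150.0}, wT=0.1)
```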

Page 32: Adaptive cognitive driving systems

Motor primitives

Example for longitudinal dynamics #1

Optimal control formulation with some simplifications

System model:

$$\frac{d}{dt}s(t) = u(t), \qquad \frac{d}{dt}u(t) = \underbrace{\frac{1}{M_e}\left(f_x(p(t),u(t)) - k_0 - k_v\,u(t)^2\right)}_{a(t)}, \qquad \frac{d}{dt}a(t) = k_p\,j_x(t)$$

States: $x = [s(t),\, u(t),\, a(t)]$

Goal function:

$$J = \int_0^T \left(j_x(t)^2 + w_T\right) dt$$

Boundary conditions: $B(x(0), x(T)) = 0$

Adjoining the system model with multipliers $\lambda_i(t)$:

$$J = \int_0^T \left[\, j_x(t)^2 + w_T + \lambda_1(t)\!\left(\frac{d}{dt}s(t) - u(t)\right) + \lambda_2(t)\!\left(\frac{d}{dt}u(t) - a(t)\right) + \lambda_3(t)\!\left(\frac{d}{dt}a(t) - k_p\,j_x(t)\right) \right] dt$$

Page 33: Adaptive cognitive driving systems

Speed matching

Example for longitudinal dynamics #2

We define the motor primitive boundary conditions

After the first variation and applying Pontryagin’s Principle we get:

Perceptual goals: boundary conditions for Speed Matching (SM):

$$B(x(0),x(T)) = \begin{bmatrix} s(0) = 0 \\ u(0) = u_i \\ a(0) = a_i \\ s(T) = s_f \\ u(T) = u_f \\ a(T) = 0 \end{bmatrix}$$

Optimal control law:

$$j_x(t) = -\frac{k_p\,\lambda_3(t)}{2}$$

Co-state equations:

$$\frac{d}{dt}\lambda_1(t) = 0, \qquad \frac{d}{dt}\lambda_2(t) + \lambda_1(t) = w_T, \qquad \frac{d}{dt}\lambda_3(t) + \lambda_2(t) = 0$$

Page 34: Adaptive cognitive driving systems

Speed matching

Example for longitudinal dynamics #2

Given the space $s_f$ we can solve for the minimum time T.

Nonlinear equation in T, to be solved numerically:

$$a(\zeta) = \left(\frac{3\zeta^2}{T^2} - \frac{4\zeta}{T} + 1\right) a_i + \left(-\frac{6\zeta^2}{T^3} + \frac{6\zeta}{T^2}\right) u_f + \left(\frac{6\zeta^2}{T^3} - \frac{6\zeta}{T^2}\right) u_i + \left(\frac{\zeta^3}{12} - \frac{T\zeta^2}{8} + \frac{T^2\zeta}{24}\right) w_T$$

$$u(\zeta) = \left(\frac{\zeta^3}{T^2} - \frac{2\zeta^2}{T} + \zeta\right) a_i + \left(-\frac{2\zeta^3}{T^3} + \frac{3\zeta^2}{T^2}\right) u_f + \left(\frac{2\zeta^3}{T^3} - \frac{3\zeta^2}{T^2} + 1\right) u_i + \left(\frac{\zeta^4}{48} - \frac{T\zeta^3}{24} + \frac{T^2\zeta^2}{48}\right) w_T$$

$$s_f = s(\zeta)\big|_{\zeta=T}, \quad s(\zeta) = \left(\frac{\zeta^4}{4T^2} - \frac{2\zeta^3}{3T} + \frac{\zeta^2}{2}\right) a_i + \left(-\frac{\zeta^4}{2T^3} + \frac{\zeta^3}{T^2}\right) u_f + \left(\frac{\zeta^4}{2T^3} - \frac{\zeta^3}{T^2} + \zeta\right) u_i + \left(\frac{\zeta^5}{240} - \frac{T\zeta^4}{96} + \frac{T^2\zeta^3}{144}\right) w_T$$

$$\lambda_3(\zeta) = \left(-\frac{12\zeta}{T^2} + \frac{8}{T}\right) a_i + \left(\frac{24\zeta}{T^3} - \frac{12}{T^2}\right) u_f + \left(-\frac{24\zeta}{T^3} + \frac{12}{T^2}\right) u_i + \left(-\frac{\zeta^2}{2} + \frac{T\zeta}{2} - \frac{T^2}{12}\right) w_T$$

Optimal control:

$$j_x(\zeta) = -\frac{\lambda_3(\zeta)}{2}$$
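Evaluating the closed form at ζ = T gives s(T) = a_i T²/12 + (u_i + u_f) T/2 + w_T T⁵/1440, which is monotone in T for the usual sign conventions, so the maneuver time can be found by bisection. A hedged numerical sketch (helper names and bracketing bounds are illustrative):

```python
def sm_distance(T, ui, uf, ai=0.0, wT=0.0):
    """Distance covered by the Speed Matching primitive at t = T
    (the closed-form s(zeta) above, evaluated at zeta = T)."""
    return ai * T**2 / 12 + (ui + uf) * T / 2 + wT * T**5 / 1440

def solve_maneuver_time(sf, ui, uf, ai=0.0, wT=0.0, lo=1e-3, hi=120.0):
    """Bisection on the monotone map T -> s(T) for the maneuver duration."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if sm_distance(mid, ui, uf, ai, wT) < sf:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# match a 20 m/s vehicle ahead, starting from 25 m/s, over 150 m
print(round(solve_maneuver_time(150.0, 25.0, 20.0), 2))  # ~6.67 s
```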

Page 35: Adaptive cognitive driving systems

Speed matching

Example for longitudinal dynamics #2

For different values of w_T: note the different values of initial jerk.

Page 36: Adaptive cognitive driving systems

Space-time trajectories deal with either longitudinal or lateral control to manage one single motor task.

There are 6 functions:

FollowObject (FO): approach a preceding obstacle with a desired time gap t_h (time headway)

ClearObject (CO): The purpose of this maneuver is to clear a frontal object on either side of the host vehicle

FreeFlow (FF): this maneuver produces an SA primitive by guessing a target speed u_T

LaneFollow (LF)

LandMarks (LM)

Curves (CU)

Building blocks#3 - Trajectories


Page 37: Adaptive cognitive driving systems

It is a combination of motor primitives

Obstacle lateral movement model: re-align with the road in some time T* according to the same ego-vehicle lateral forward model

first compute the encounter time T° using the longitudinal motion models;

produce an LD primitive (i.e., parameters n_T and T) such that a specified clearance c_0 is obtained at T°.

Example Clear Object



…carry out inference of host vehicle intentions, namely: re-using the framework we are developing.

However, although a fascinating research possibility, we opted not to use this approach here for a number of practical reasons: the speed and direction of travel of other vehicles is known with less accuracy than one’s own, acceleration measurement tends not to be reliable, other driver controls are not directly observable, and a view of the road network from the host vehicle perspective is not easily available. These limitations could of course be overcome in future cooperative systems applications.

Follow Object (FO). The purpose of this maneuver is to approach a preceding vehicle, as in Fig. 1 using maneuver a, producing a desired time headway gap t_h.

The simplified obstacle longitudinal motion model assumes that the longitudinal velocity v_o (i.e., the obstacle velocity projected onto the lane direction) is fairly constant. To deal with accelerating obstacles we rely on the continuous updating of motor plans in receding horizon iterations. In section IV below we discuss the limitations of this simplification.

The FollowObject maneuver is thus a perception-action map, which takes as input the object, the desired time headway, and the time pressure parameter w_T, and returns a Speed Matching (SM) primitive:

$$\text{FollowObject}: (\text{object},\, t_h,\, w_T) \rightarrow SM(x_T, u_T, w_T) \tag{15}$$

This means computing the target point x_T and velocity u_T that correspond to following the object as required:

$$u_T = v_o, \qquad x_T = s_o + v_o T - v_o t_h - l_o \tag{16}$$

where l_o is a longitudinal clearance that accounts for the lengths of the host vehicle and the obstacle plus any extra desired clearance, v_o t_h is the aimed-at time headway gap, s_o is the initial distance of the object, and T is the maneuver duration, which is obtained by solving (16) together with (5).

The FollowObject function thus instantiates an SM primitive. In Fig. 4, arrows between two levels indicate this form of input/output relationship.

The current value of the longitudinal control:

$$j_{p,0} = j_{p,SM}(0;\, x_o, u_o, w_T) \tag{17}$$

is here of particular significance, because it indicates how the co-driver ought to drive now in order to follow the object, which can be directly compared with the longitudinal control that the human driver employs.

Note that the followed object does not need to be in the host vehicle’s lane for this function to apply. If it is travelling in a parallel lane, including the case where it is behind the host car, then this function may be used to compute maneuvers that, for example, open a gap before a lane change may be executed.

Clear Object (CO). The purpose of this maneuver is to clear a frontal object on either side of the host vehicle (Fig. 5). As indicated, understanding the directional intentions of the object vehicle would ideally require knowing the road network, in order to find which lane the obstacle might be following. Since, in the present version of the system, we only know the geometry of our own road/lane, what we can do is to assess whether the object is moving in our own road, or whether it is moving across the road, in which case our understanding of its intentions will be correspondingly degraded.

If the object were moving in our road, its lateral movement would follow a model similar to (14), i.e., the object would sooner or later re-align with the lane. Since we cannot measure the curvature of the object trajectory directly, we simplify the problem by setting Δ0 = 0 in (10), and κ(.) = 0 in A1.1. With these simplifications, the OC problem given by (9, 10, A1.1, 13) can be solved analytically, yielding the approximate predictive model for object lateral motion employed here:

$$s_n = s_{n,0} + v_n\, t \left[ \left(1 - \frac{t}{T^*}\right)^{2} + \frac{1}{2}\left(\frac{t}{T^*}\right)^{3} \right] \tag{18}$$

Note that the model contains a parameter T*, which stands for how long the object maneuver will last: in essence, a kind of intentional assessment. At this point we do not try to estimate T*, but use the heuristically-derived figure T* ~ 2.5 s (see also the next section and section IV).

The maximum lateral displacement of the object will be achieved at t = T*:

$$s_{n,\max} = s_{n,0} + \frac{v_n T^*}{2} \tag{19}$$

If this position falls within one lane of the current object lane, then model (18) is confirmed (i.e., we assume the object is following our road, possibly changing one lane only). If not, the object is considered to be crossing our road. In this case its transverse motion is taken to be uniform:

$$s_n = s_{n,0} + v_n\, t \tag{20}$$

With an object predictive model (in our case the simple equations 16-first, 18, 20) we can now compute evasive maneuvers, as Fig. 5 shows. The dark vehicle is the obstacle and…

Fig. 5. Evasive maneuvers.


Encounter time T°; simplified object lateral motion: $n_O = n_{O,0} + v_n\,t/2$, for $t = 0 \ldots T^*$.
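The object lateral-motion predictor of equations (18)-(20) is straightforward to sketch; in this hedged Python version the lane width, the hold beyond T* and all defaults are assumptions.

```python
def predict_object_lateral(t, sn0, vn, lane_width=3.5, T_star=2.5):
    """Predicted lateral position of an observed object.
    If the extrapolated peak displacement (eq. 19) stays within one lane,
    use the re-alignment model (eq. 18); otherwise treat the object as
    crossing the road with uniform transverse motion (eq. 20)."""
    if abs(vn) * T_star / 2 <= lane_width:             # eq. (19) check
        tc = min(t, T_star)                            # hold after T*
        r = tc / T_star
        return sn0 + vn * tc * ((1 - r)**2 + 0.5 * r**3)   # eq. (18)
    return sn0 + vn * t                                # eq. (20)

print(round(predict_object_lateral(2.5, 0.0, 1.0), 2))  # 1.25 = vn*T*/2
```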

Page 38: Adaptive cognitive driving systems

Combines level 2 motor chunks into maneuvers.

Makes arrays of hypotheses, including incorrect ones that will be used for inference of intentions.


Building blocks#4 – Navigation hypotheses


$$J_i = w_\delta\,\|j_\delta - \hat{j}_\delta\|^2 + w_p\,\|j_p - \hat{j}_p\|^2 + w_n J_n, \qquad w_n J_n = \text{steering cost}$$

Page 39: Adaptive cognitive driving systems

Building blocks#5a – Inference of intentions. Compares the co-driver motor output of the generated hypotheses with the human control (generative approach).

Uses a saliency approach.


[Figure: the same map of longitudinal and lateral controls as before, with hypotheses 1–8 and 3a–3c]

$$J_i = w_\delta\,\|j_\delta - \hat{j}_\delta\|^2 + w_p\,\|j_p - \hat{j}_p\|^2 + w_n J_n, \qquad w_n J_n = \text{steering cost}$$
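Per hypothesis, the cost is a one-line evaluation; in this sketch the weights and the scalar treatment of the control signals are illustrative assumptions.

```python
def saliency_cost(j_lat, j_lat_hat, j_lon, j_lon_hat, Jn,
                  w_lat=1.0, w_lon=1.0, wn=0.1):
    """J_i for one hypothesis: weighted mismatch between the co-driver's
    planned controls and the human's observed controls (hat terms), plus
    the steering cost wn*Jn; the least-cost hypothesis is the most salient,
    i.e., the inferred intention."""
    return (w_lat * (j_lat - j_lat_hat)**2
            + w_lon * (j_lon - j_lon_hat)**2
            + wn * Jn)

costs = {"keep lane":   saliency_cost(0.0, 0.1, -0.2, -0.25, 0.0),
         "change lane": saliency_cost(0.8, 0.1, -0.2, -0.25, 0.5)}
print(min(costs, key=costs.get))  # "keep lane"
```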

Page 40: Adaptive cognitive driving systems

Building blocks#5b – Interaction


Interactions are application-dependent (CRF CS is an assistive system)

Intention is known.

If it is correct the system does nothing.

If it is incorrect, the system knows two ways to rectify it.

The system suggests the longitudinal correction, except when the lateral correction stays within the lane.

Page 41: Adaptive cognitive driving systems

Inference of driver intentions (model identification problem).

Implements motor imagery, imitation, mindreading (model identification) for all meaningful goals.

Top level (goal/motivations level)


Page 42: Adaptive cognitive driving systems

User tests: examples and discussion

Page 43: Adaptive cognitive driving systems

The test route was a 53 km loop from CRF headquarters in Orbassano (Turin, Italy) to Pinerolo, Piossasco and back, which included urban arterials, extra-urban roads, motorways, roundabouts, ramps, and intersections.

A total of 35 hours of logs have been collected, including sensor data, co-driver output, and images from a front camera.

24 test users

Vehicle demonstrator & User Test Route


Page 44: Adaptive cognitive driving systems

How does it work? Experimental results

Example #1: Car following

Page 45: Adaptive cognitive driving systems

How does it work? Experimental results

Example #2: Pedestrian

Standing still Crossing

Page 46: Adaptive cognitive driving systems

How does it work? Experimental results

Example #3: Cut-in manoeuvre

Page 47: Adaptive cognitive driving systems

When the two agents disagree, it is necessary to manually inspect the recordings to assess the reason. Reasons for mismatch may be:

poorly-optimized co-driver (frequent during development)

perception noise

driver error

simple difference of “opinions” between the two agents (see below).



Fig. 8 (a) shows an example situation, which happened 1.1 s before the event depicted in Fig. 7, when, for the first time, the co-driver detected a risk for maneuver 1.

Fig. 8 (a) compares the longitudinal acceleration of the two agents: the driver did nothing for approximately the next second (in the meantime a warning was issued). Then, beginning at about 1.1 s, the driver used the same longitudinal acceleration the co-driver planned 1 second earlier. Later, after ~3.7 s, the driver departs again from the (current) co-driver plan. In this example the difference between the two agents may be attributed to a delayed reaction of the human driver and (after 3.7 s) to sudden brake release, accepting a slightly short time headway (0.9 s) for a while.

If the difference between the two agents (driver minus co-driver acceleration) is plotted for all frames together, a distribution similar to that of Fig. 8 (b) is obtained. Similarly, for lateral control, the difference between the lateral positions (co-driver minus driver) yields the distribution in Fig. 8 (c).

Before commenting further, let us introduce Fig. 9, which provides a different interpretation, looking at data during a course sample of 10 minutes.

Fig. 9 (a) thus plots the average difference of the two agents' acceleration in the 5-second prediction window (i.e., the mean of the differences shown in Fig. 8 (a)).

In a, d, e, and similar situations later, the human driver employs relatively high accelerations in low gears. The co-driver does not try to discover exactly how fast the human would like to drive because it is not important: these are FreeFlow states.

Another reason for mismatch occurs in situations like b. This case corresponds to a FollowObject state (15), which happens when the longitudinal control is limited by obstacles ahead. FO states are marked with vertical bands (light green in the online version of the paper). In the FollowObject state (like in b) a typical pattern is often observed: as the vehicle ahead gets closer, the difference between co-driver acceleration plans and the real driver execution increases (the driver going faster) until the co-driver state switches to short (between 1.1 and 0.6 s) and very short (below 0.6 s) time headway. Note this is not the current time headway, but the time headway goal used in (15), which may be reached at the end of the motor unit, i.e., the intention of the driver. A yellow or red warning is issued by the HMI, marked with gray dents in the bands, for short (less than 1.1 s) or very short (less than 0.6 s) intentional time headways.

These two reasons together explain the upper part of the distribution of Fig. 8 (b), with the 0.0025 quantiles curve corresponding to the largest differences occurring in a dangerous situation. Thus, the curve peaks at approximately 2-3 s in the future (related to delayed reaction of the driver).

Mismatches of the opposite sign (the driver using less acceleration than the co-driver) occur primarily before curves, such as c and g, which are, however, at a roundabout entrance. The driver, approaching a 'yield' sign, reduces speed more than simply required by the curve (and sometimes even stops), presumably for visibility reasons that are not considered by the co-driver. This discrepancy is not dangerous. The condition in f is instead one curve where the driver goes faster, though not fast enough to trigger a dangerous curve state.

The "slowness" of the driver explains the lower part of Fig. 8 (b).

Overall the two agents disagree for three reasons, of which one is incorrect driver behavior, the other two being concretely different choices, in which one of the two opts for going slower.

The root mean square of the acceleration mismatch is 0.25 m/s², which is approximately the darker strip in Fig. 8 (b).

Fig. 9 (b) deals with the lateral dynamics. A partly rectified representation is used, which shows the lane changes. Both trajectory and prediction error are plotted (respectively violet and blue in the online version). The lane edges, as seen from the lane recognition camera, are shown and the lane itself is shaded. Non-motorway sections, including ramps and roundabouts, exhibit a quite complex geometry. Note how close the…

Fig. 8. (a, top) difference in longitudinal acceleration between the two agents; (b, center) distribution of acceleration difference; (c, bottom) distribution of lateral position difference.


Page 48: Adaptive cognitive driving systems

Anticipation of overtake


These events represent a different form of inference of intentions, pertaining to a higher cognitive level, which are detected by the switching from FollowObject to ClearObject second-level behaviors, with the intention to overtake being not yet manifested in terms of LD motor primitives.

This happens with anticipation ranging from 1.6 s to 8.1 s (median 4.1 s).

Fig. 12 summarizes the situations for the 5 lane changes with overtaking. It shows the camera view when the overtake intention is detected (first row), when the lane crossing is predicted (center) and when it actually takes place (bottom). The last element of the first row, at the top right, is associated with Fig. 11. In case 14, the "overtake" state and the "lane crossing" state happen simultaneously, when the left lane becomes free (fourth column in Fig. 12).

A. Comparison with other approaches within the literature

A number of alternative approaches exist in the literature for prediction of intent in driving. A review is given by Doshi [121], and one more, focused on gaze, in [122].

The vast majority of methods are classifiers that learn and recognize stereotyped head or gaze patterns preceding action execution. Pre-attentive vision may also be important [72].

While these methods may use a variety of algorithms (e.g., Hidden Markov Models, Neural Networks of various kinds, Bayesian Networks, etc.), they belong to a single class of intention inference methods termed "action-to-goal" (Csibra [123]), which predicts the "likely outcome of ongoing activity".

Conversely, the method of this paper belongs to the "goal-to-action" approach, which is also defined as "teleological", meaning that actions are functional to some end [123] and consequential to that end. The teleological interpretation thus moves from a plausible goal to a generation (from which the method is also termed "generative" [53]) of the expected sequence of actions, with a granularity level sufficient for comparison with observed actions. It thus anticipates expected actions before they actually begin (if they do not begin, a revision of the intentions is carried out) and is hence termed "predictive tracking of dynamic actions" [123].

The prediction of overtaking, for maneuvers 2, 6, 10, 14 and 18, is one example of "goal-to-action". It is obtained because the overtake maneuver is the only one that has meaning (Fig. 11) within the context. It thus anticipates the appearance of the LD motor primitives by a few seconds (Fig. 12).

An interesting comparison can be carried out with the method of McCall [124], which shows that learning and classification of the head pose helps to predict lane changes. Simple trajectory forecasts produce discrimination power (DP) equal to 0.95 at 2.5 s before lane change (95% detection probability and 5% false alarm rate in lane-keeping). By including head movement classification the same DP is achieved 0.5 s earlier.

For trajectory forecast, a comparison may be attempted with Fig. 8 (c). This distribution potentially allows for derivation of false alarm rates and detection probabilities. For example, for a vehicle keeping the lane at the center, the two horizontal lines at -1.8 and +1.8 meters represent the lane edges. The forecast co-driver trajectory falls outside of the lane at 5 s with 5% probability (2.5% per side), which becomes 0.5% if the forecast is considered at 2.5 s. Conversely, for a lane-change maneuver, the left edge might look like the dashed s-shaped line (from +1.8 to -1.8 m). Thus an estimation of 97.5% detection probability may be derived (as the proportion of trajectories falling above the lane edge at the end).

However, this ideal situation is not achieved because the vehicle starting position may be closer to one edge or may follow a different profile. Indeed, the two false alarms occurred within the example data set in such conditions. The situation is actually clearer for motorways than for complex curved geometries, due to greater perception noise and lower accuracy of the forward emulator in the latter.

McCall (and others) use a classifier to predict the probability of lane change, which is then binarized with a proper threshold choice. Different thresholds produce different couplings of detection probability and false alarm rates, which are plotted as receiver operator characteristic (ROC) curves.

Fig. 12. Anticipation of overtake, anticipation of lane change and actual lane change for cases 2, 6, 10, 14 and 18 (left to right). First row: overtake intention detected as second-level state transition (FollowObject to ClearObject behavior). Second row: lane change detected as motor primitive level transition (LD crossing the lane). Third row: actual lane crossing.


Page 49: Adaptive cognitive driving systems

Anticipation of overtake (continued)


…driver passes to edges during transitions (e.g., taking the first exit ramp at cycle ~700). The numbered light green bands stand for the lane changes. The interval between when the LD motor primitive predicts the lane crossing and the actual crossing of the lane is shaded. There are 21 changes correctly predicted, with anticipation ranging from 1.1 s to 2.4 s (median 1.6 s).

There are two false crossing predictions, labeled a and b, that happen in the non-motorway section when the trajectory passes close to edges. Fig. 10 shows the camera view to demonstrate how demanding this situation actually is. Despite the false prediction, the absolute value of the lateral prediction error is limited (a fraction of the vehicle width). Non-motorway segments are characterized by often-irregular lane geometry, with splitting and merging lanes, often with missing marking traits (Fig. 10 b), or else the camera failing to recognize them.

Fig. 8 (c) shows that the 0.0025 quantile curves, i.e., 99.5% of the LD motor primitives, depart from the real trajectory by less than one quarter of a lane in 2 s, less than half a lane in 2.5 s and less than one full lane in 5 s. The points where larger deviations happen may be seen in Fig. 9 (the prediction error is plotted). They are typically at inversions of the heading angle and in complex geometries (a and b).

In Fig. 9 (b), the dashed blue vertical lines before lane changes 2, 6, 10, 14 and 18 mark the point where the co-driver switches from FollowObject (second-level) behavior to ClearObject behavior. This is the point where the agent realizes that the human intention may be to overtake.

For example, Fig. 11 shows the control output space 4.1 s before lane change 18, showing how the driver is going to choose the overtake maneuver.

Fig. 10. Complex geometry at false alarm points.

Fig. 11. Detection of the intention to overtake (the camera view for this is given by the top right frame of Fig. 12).

Fig. 9. Comparison of driver and co-driver on a 10 minute course. (a, top) longitudinal dynamics (see text). (b, bottom) lateral dynamics.

Page 50: Adaptive cognitive driving systems

Incorrect interpretation

Sensory system plays a fundamental role.

Must be accurate.


Page 51: Adaptive cognitive driving systems

Missing behaviors (incomplete PA architecture), e.g., the overtake maneuver. Complex behaviors may often be decomposed into simpler ones (e.g., overtake -> lane change + free flow + lane change).

In this case the system understands the single phases but not the complex one (it still works!).

Inaccurate behaviors. A co-driver with non-human behaviors fails to understand the intentions (e.g., forgetting to model understeer).

Missing hypotheses. The co-driver uses the closest hypotheses and may fail.

The plausibility approach, together with behavioral discretization, increases robustness at the expense of granularity of intention resolution.

Behaviours from basic principles (e.g., adaptive lane keeping arises naturally from the use of the second manoeuvre).

Discussion– Inference of intentions


Page 52: Adaptive cognitive driving systems

The system has been tested on a ~50 km road path with ordinary drivers (24 drivers, twice each for a total of 35 hours).

False alarms were few (2–4 per trip), mostly due to noise in the perception system. Very few alarms may be ascribed to incomplete/missing/imperfectly designed co-driver behaviors (most of these being due to a mismatch between driving styles, so not critical).

The collected data will help refine the motor primitives and behaviors built into the system.

The hierarchical architecture is easily scalable, maintainable and testable.

Major limitations: the system does not yet work at road intersections (it has poor understanding of intersecting vehicles’ intentions).

Discussion– User tests


Page 53: Adaptive cognitive driving systems

Technology impact on future applications

Page 54: Adaptive cognitive driving systems

A co-driver that understands driver goals is a “friend” that:

Enables Adaptive Automation (offering the appropriate support type and level at any time, as a human peer would do)

Improves execution of manoeuvres (substituting machine execution for human execution while preserving the goal – just like chassis control, but at the navigation-cognitive level)

Navigates by hints (just like a horse), largely autonomously, until new goals become manifest from the human

Takes over/supervises driver control (under certain conditions)

Is understandable to other drivers.

Unified framework for smart (safe, green, comfort) functions.

Peer-to-peer human-robot interactions


Page 55: Adaptive cognitive driving systems

Virtual drivers enable cooperative swarm behaviors.

They exchange each other’s goals (i.e., their drivers’ ECOM states).

Inference of other agents’ goals can be carried out if they are not cooperative (they also have some goal).

Safety as emergent behavior: each agent adapts its own plans to the others, producing a collective emerging swarm behavior.

Green cooperative driving as an emerging behavior (produced by the energy-efficiency criterion in the mirroring mechanisms).

Cooperative systems


Page 56: Adaptive cognitive driving systems

Novel situations happen occasionally.

Drivers are often not prepared to handle rare events, because they have no prior experience.

Motor imagery can be used to analyze them by simulation and, through synthetic learning, to extend the subsumptive architecture with novel PA loops.

Accidents are the rarest of such events.

Co-drivers (even if they don’t survive) will collect very detailed accident data that could be later used for synthetic learning.

Accidents (and near misses) of some co-drivers will teach something to the others, which will become more and more capable of reacting properly to rare events.

Cognitive co-drivers will be able to self-extend their application domain.

Synthetic learning for novel situations and rare events


Page 57: Adaptive cognitive driving systems

Co-drivers will gradually collect more and more experience

This can be shared:

improved driver profiles (personas),

improved interactions,

improved handling of critical situations,

special driver classes (the elderly, and people with some disabilities),

naturalistic data collection,

other...

Learning interaction
