Joint Attention in Autonomous Driving (JAAD) › pdf › 1609.04741.pdf · 2020-04-24 ·...

1

Joint Attention in Autonomous Driving (JAAD)Iuliia Kotseruba Amir Rasouli and John K Tsotsos

yulia_k aras tsotsoscseyorkuca

AbstractmdashIn this paper we present a novel dataset for a criticalaspect of autonomous driving the joint attention that must occurbetween drivers and of pedestrians cyclists or other driversThis dataset is produced with the intention of demonstrating thebehavioral variability of traffic participants We also show howvisual complexity of the behaviors and scene understanding isaffected by various factors such as different weather conditionsgeographical locations traffic and demographics of the peopleinvolved The ground truth data conveys information regardingthe location of participants (bounding boxes) the physical con-ditions (eg lighting and speed) and the behavior of the partiesinvolved

I INTRODUCTION

Autonomous driving has been a topic of interest for decadesImplementing autonomous vehicles can have a great economicand social impacts including reducing the cost of drivingincreasing fuel efficiency and safety enabling transportationfor non-drivers and reducing the stress of driving by allowingmotorists to rest and work while traveling [1] As for themacroeconomic impacts it is estimated that autonomous vehi-cles industry and related software and hardware technologieswill account for a market size of more than 40 billion dollarsby 2030 [2]

Partial autonomy has long been used in commercial vehiclesin the form of technologies such as cruise control parkassist automatic braking etc Fully autonomous vehicles alsohave been successfully developed and tested under certainconditions For example the DARPA challenge 2005 set thetask of autonomously driving a 732 miles predefined terrainin the deserts of Nevada Out of 23 final contestants 4 carssuccessfully completed the course within the allowable timelimit (10 hours) while driving fully autonomously [3]

Despite such success stories in autonomous control systemsdesigning fully autonomous vehicles for urban environmentsstill remains an unsolved problem Aside from challengesassociated with developing suitable infrastructures and regu-lating the autonomous behaviors [1] in order to be usable inurban environments autonomous cars must have a high levelof precision and meet very high safety standards [4]

Today one of the major dilemmas faced by autonomousvehicles is how to interact with the environment includinginfrastructure cars drivers or pedestrians [5] [6] [7] Thelapses in communication can be a source of numerous erro-neous behaviors [8] such as failure to predict the movement ofother vehicles [9] [10] or to respond to unexpected behaviorsof other drivers [11]

The impact of perceptual failures on the behavior of anautonomous car is also evident in the 2015 annual report onGooglersquos self-driving car [12] This report is based on testingself-driving cars for more than 424 000 miles of driving on

public roads including both highways and streets Throughoutthese trials a total of 341 disengagements occurred in whichthe driver had to take over the car and about 90 of thecases occurred in busy streets The interesting implication hereis that over 13 of the disengagements were due toldquoperceptiondiscrepancyrdquo in which the vehicle was unable to understand itsenvironment and about 10 of the cases were due to incorrectprediction of traffic participants and inability to respond toreckless behaviors

There have been a number of recent developments toaddress these issues A natural solution is establishing wirelesscommunication between traffic participants This approach hasbeen tested for a number of years using cellular technology[7] [13] This technique enables vehicle to vehicle (V2V)and vehicle to infrastructure (V2I) communication allowingtasks such as Cooperative Adaptive Cruise Control (CACC)improving positioning technologies such as GPS and intel-ligent speed adoption in various roads Peer to peer trafficcommunication is expected to enter the market by 2019

Although V2V and V2I communications are deemed tosolve a number of issues in autonomous driving they alsohave a number of drawbacks This technology relies heavilyon cellular technology which is costly and has much lowerreliability compared to traditional sensors such as radars andcameras In addition communication highly depends on allparties functioning properly A malfunction in any commu-nication device in any of the systems involved can lead tocatastrophic safety issues

Maintaining communication with pedestrians is even moreimportant for safe autonomous driving Inattention from bothdrivers and pedestrians regardless of their knowledge of thevehicle code is one of the major reasons for traffic accidentsmost of which involve pedestrians at crosswalk locations [14]

Honda recently released a new technology similar to V2Vcommunication that attempts to establish a connection withpedestrians through their cellular phones [15] Using thismethod the pedestrianrsquos phone broadcasts its position warningthe autonomous car that a pedestrian is about to cross the streetso the car can respond accordingly This technology also cango one step further and inform the car about the state of thepedestrian for instance whether heshe is listening to musictexting or is on a call Given the technological and regulatoryobstacles to developing such technologies using them in thenear future does not seem feasible

In late 2015 Google patented a different technology tocommunicate with pedestrians using a visual interface calledpedestrian notifications [16] In this approach the Google carestimates the trajectories of pedestrian movements If the carfinds the behavior of a pedestrian to be uncertain (ie cannotdecide whether or not heshe is crossing the street) it notifies

arX

iv1

609

0474

1v6

[cs

RO

] 2

2 A

pr 2

020

2

the corresponding pedestrian about the action it is about totake using a screen installed on the front hood of the carAnother proposed option for communication is via a sounddevice or other kinds of physical devices (possibly a roboticarm) This technology has been criticized for being distractingand lacking the ability to efficiently communicate if more thanone pedestrian is involved

Given the problems associated with establishing explicitcommunication with other vehicles and pedestrians Nissanin their latest development announced a passive method ofdealing with uncertainties in the behavior of pedestrians [17][18] in attempt to understand human drivers and pedestriansbehaviors in various traffic scenarios The main objective ofthis work is to passively predict pedestrian behavior usingvisual input and only use an interface eg a green light toinform them about the intention of the autonomous car

Toyota in partnership with MIT and Stanford recently an-nounced using a similar passive approach toward autonomousdriving [19] Information such as the type of equipmentthe pedestrian carries hisher pose direction of motion andbehavior as well as the human driverrsquos reactions to eventsare extracted from videos of traffic situations and their 3Dreconstructions This information is used to design a controlsystem for determining what course of action to take in givensituations At present no data resulting from this study hasbeen released and very little information is available aboutthe type of autonomous behavior the scientists are seeking toobtain

Given the importance of pedestrian safety multiple studieswere conducted in the last several decades to find factors thatinfluence decisions and behavior of traffic participants Forexample a driverrsquos pedestrian awareness can be measuredbased on whether the driver is decelerating upon seeing apedestrian ([20] [21] [22]) Several recent studies also pointout that pedestrianrsquos behavior such as establishing eye-contactsmiling and other forms of non-verbal communication canhave a significant impact on the driverrsquos actions ([23] [24])Although a majority of these studies are aimed at developingbetter infrastructure and traffic regulations their conclusionsare relevant for autonomous driving as well

In an attempt to better understand the problem of vehicle tovehicle (V2V) and vehicle to pedestrian (V2P) communicationin the autonomous driving context we suggest viewing it as aninstance of joint attention and discuss why existing approachesmay not be adequate in this context We propose a noveldataset that highlights the visual and behavioral complexityof traffic scene understanding and is potentially valuable forstudying the joint attention issues

II AUTONOMOUS DRIVING AND JOINT ATTENTION

According to a common definition joint attention is theability to detect and influence an observable attentional be-havior of another agent in social interaction and acknowledgethem as an intentional agent [25] However it is important tonote that joint attention is more than simultaneous lookingattention detection and social coordination but also includesan intentional understanding of the observed behavior ofothers

Since joint attention is a prerequisite for efficient commu-nication it has been gaining increasing interest in the fieldsof robotics and human-robot interaction Kismet [26] and Cog[27] both built at MIT in the late 1990s were some of thefirst successes in social robotics These robots were able tomaintain and follow eye gaze reacted to the behavior of theircaregivers and recognized simple gestures such as declarativepointing More recent work in this area is likewise concernedwith gaze following [28] [29] [27] pointing [30] [27] andreaching [31] turn-taking [32] and social referencing [33]With a few exceptions [34] [35] almost all joint attentionscenarios are implemented with stationary robots or roboticheads according to a recent comprehensive survey [36]

Surprisingly despite increasing interest for joint attentionin the fields of robotics and human-robot interaction it hasnot been explicitly mentioned in the context of autonomousdriving For example communication between the driver andpedestrian is an instance of joint attention Consider thefollowing scenario a pedestrian crossing the street (shown inFigure 1a) Initially she is looking at her cell phone but as sheapproaches the curb she looks up and slows down because thevehicle is still moving When the car slows down she speedsup and crosses the street In this scenario all elements of jointattention are apparent Looking at the car and walking sloweris an observable attention behavior The driver slowing downthe car indicates that he noticed the pedestrian and is yieldingHis intention is clearly interpreted as such as the pedestrianspeeds up and continues to cross A similar scene is shown inFigure 1d Here the pedestrian is standing at the crossing andlooking both ways to find a gap in traffic Again once shenotices that the driver is slowing down she begins to cross

While these are fairly typical behaviors for marked cross-ings there are many more possible scenarios of communi-cation between the traffic participants Humans recognize amyriad of ldquosocial cuesrdquo in everyday traffic situations Apartfrom establishing eye contact or waving hands people maybe making assumptions about the way a driver would behavebased on visual characteristics such as the carrsquos make andmodel [6] Understanding these social cues is not alwaysstraightforward Aside from visual processing challenges suchas variation in lighting conditions weather or scene clutterthere is also a need to understand the context in which thesocial cue is observed For instance if the autonomous carsees someone waving his hand it needs to know whetherit is a policeman directing traffic a pedestrian attempting tocross the street or someone hailing a taxi Consider Figure 1bwhere a man is crossing the street and makes a slight gestureas a signal of yielding to the driver or Figure 1c wherea man is jaywalking and acknowledges the driver with ahand gesture Responding to each of these scenarios from thedriverrsquos perspective can be quite different and would requirehigh-level reasoning and deep scene analysis

Today automotive industry giants such as BMW Tesla Fordand Volkswagen who are actively working on autonomousdriving systems rely on visual analysis technologies devel-oped by Mobileye1 to handle obstacle avoidance pedestrian

1httpwwwmobileyecom

3

(a)

(b)

(c)

(d)

Figure 1 Examples of joint attention

4

detection or traffic scene understanding Mobileyersquos approachto solving visual tasks is to use deep learning techniques whichrequire a large amount of data collected from hundreds ofhours of driving This system has been successfully testedand is currently being used in semi-autonomous vehiclesHowever the question remains open whether deep learningsuffices for achieving full autonomy in which tasks are notlimited to detection of pedestrians cars or obstacles (whichare not still fully reliable [37] [38]) but also involve mergingwith ongoing traffic dealing with unexpected behaviors suchas jaywalking responding to emergency vehicles and yieldingto other vehicles or pedestrians at intersections

To answer this question we need to consider the followingcharacteristics of deep learning algorithms First even thoughdeep learning algorithms perform very well in tasks such asobject recognition they lack the ability to establish causalrelationships between what is observed and the context inwhich it has occurred [39] [40] This problem also has beenempirically demonstrated by training neural networks overvarious types of data [39]

The second limitation of deep learning is the lack ofrobustness to changes in visual input [41] This problem canoccur when a deep neural network misclassifies an object dueto minor changes (at a pixel level) to an image [42] or evenrecognizes an object from a randomly generated image [43]

III EXISTING DATASETS

The autonomous driving datasets currently available to thepublic are primarily intended for applications such as 3Dmapping navigation and car and pedestrian detection Outof these datasets only a limited number contain data that canbe used for behavioral studies Below some of these datasetsare listed

bull KITTI [44] This is perhaps one of the most knownpublicly available datasets for autonomous driving Itcontains data collected from various locations such asresidential areas and city streets highways and gatedenvironments The main application is for 3D reconstruc-tion 3D detection tracking and visual odometry Someof the videos in KITTI show pedestrians other vehiclesand cyclists movements alongside the car The data hasno annotation of their behaviors

bull Caltech pedestrian detection benchmark [45] This is avery large dataset of pedestrians consisting of approx-imately 10 hours of driving in regular traffic in urbanenvironments The annotations include temporal corre-spondence between bounding boxes around pedestriansand detailed occlusion lables

bull Berkeley pedestrian dataset [46] This dataset consists ofa large number of videos of pedestrians collected froma stationary car at street intersections Bounding boxesaround pedestrians are provided for pedestrian detectionand tracking

bull Semantic Structure From Motion (SSFM) [47] As thename implies this dataset is collected for scene under-standing The annotation is limited to bounding boxesaround the objects of interest and name tags for the

purpose of detection This dataset includes a number ofstreet view videos of cars and pedestrians walking

bull The German Traffic Sign Detection Benchmark [48] Thisdataset consists of 900 high-resolution images of roadsand streets some of which show pedestrians crossing andcars The ground truth for the dataset only specifies thepositions of traffic signs in the images

bull The enpeda (environment perception and driver assis-tance) Image Sequence Analysis Test Site (EISATS) [49]EISATS contains short synthetic and real videos of carsdriving on roads and streets The sole purpose of thisdataset is comparative performance evaluation of stereovision and motion analysis The available annotation islimited to the camerarsquos intrinsic parameters

bull Daimler Pedestrian Benchmark Datasets [50] These areparticularly useful datasets for various scenarios of pedes-trian detection such as segmentation classification andpath prediction The sensors of choice are monocular andbinocular cameras and the datasets contain both colorand grayscale images The ground truth data is limitedto the detection applications and does not include anybehavioral analysis

bull UvA Person Tracking from Overlapping CamerasDatasets [51] These datasets mainly are concerned withthe tasks of tracking and pose and trajectory estimationusing multiple cameras The ground truth is also limitedto only facilitate tracking applications

In recent years traffic behavior of drivers and pedestrians hasbecame a widely studied topic for collision prevention andtraffic safety Several large-scale naturalistic driving studieshave been conducted in the USA [52] [53] [54] whichaccumulated over 4 petabytes of data (video audio instrumen-tal traffic weather etc) from hundreds of volunteer driversin multiple locations However only some depersonalizedgeneral statistics are available to the general public [55] whileonly qualified researchers have access to the raw video andsensor data

IV THE JAAD DATASET

The JAAD dataset was created to facilitate studying thebehavior of traffic participants The data consists of 346high-resolution video clips (5-15s) with annotations showingvarious situations typical for urban driving These clips wereextracted from approx 240 hours of driving videos collectedin several locations Two vehicles equipped with wide-anglevideo cameras were used for data collection (Table I) Cameraswere mounted inside the cars in the center of the windshieldbelow the rear view mirror

The video clips represent a wide variety of scenarios involv-ing pedestrians and other drivers Most of the data is collectedin urban areas (downtown and suburban) only a few clipsare filmed in rural locations Many of the situations resemblethe ones we have described earlier where pedestrians waitat the designated crossings In other samples pedestrians maybe walking along the road and look back to see if there isa gap in traffic (Figure 4c) peek from behind the obstacleto see if it is safe to cross Figure 4d waiting to cross on a

5

Figure 2 A timeline of events recovered from the behavioral data Here a single pedestrian is crossing the parking lot Initiallythe driver is moving slow and as he notices the pedestrian ahead slows down to let her pass At the same time the pedestriancrosses without looking first then turns to check if the road is safe and as she sees the driver yielding continues to crossThe difference in resolution between the images is due to the changes in distance to the pedestrian as the car moves forward

divider between the lanes carrying heavy objects or walkingwith children or pets Our dataset captures pedestrians ofvarious ages walking alone and in groups which may be afactor affecting their behavior For example elderly people andparents with children may walk slower and be more cautious

of clips Location Resolution Camera model55 North York ON Canada 1920times 1080 GoPro HERO+

276 Kremenchuk Ukraine 1280times 720 Highscreen BlackBox Connect

6 Hamburg Germany 1280times 720 Highscreen BlackBox Connect

5 New York USA 1920times 1080 GoPro HERO+4 Lviv Ukraine 1920times 1080 Garmin GDR-35

Table I Locations and equipment used to capture videos inthe JAAD dataset

The dataset contains fewer clips of interactions with otherdrivers most of them occur in uncontrolled intersections inparking lots or when another driver is moving across severallanes to make a turn

Most of the videos in the dataset were recorded during thedaytime and only a few clips were filmed at night sunset andsunrise The last two conditions are particularly challengingas the sun is glaring directly into the camera (Figure 5b)We also tried to capture a variety of weather conditions(Figure 5) as yet another factor affecting the behavior oftraffic participants For example during the heavy snow or rainpeople wearing hooded jackets or carrying umbrellas may havelimited visibility of the road Since their faces are obstructedit is also harder to tell if they are paying attention to the trafficfrom the driverrsquos perspective

We attempted to capture all of these conditions for furtheranalysis by providing two kinds of annotations for the databounding boxes and textual annotations Bounding boxes areprovided only for cars and pedestrians that interact with orrequire attention of the driver (eg another car yielding to the

driver pedestrian waiting to cross the street etc) Boundingboxes for each video are written into an xml file with framenumber coordinates width height and occlusion flag

Textual annotations are created using BORIS2 [56] - eventlogging software for video observations It allows to assignpredefined behaviors to different subjects seen in the videoand can also save some additional data such as video file idlocation where the observation has been made etc

A list of all behaviors independent variables and theirvalues is shown in Table II We save the following data foreach video clip weather time of the day age and gender of thepedestrians location and whether it is a designated crosswalkEach pedestrian is assigned a label (pedestrian1 pedestrian2etc) We also distinguish between the driver inside the car andother drivers which are labeled as ldquoDriverrdquo and ldquoDriver_otherrdquorespectively This is necessary for the situations where two ormore drivers are interacting Finally a range of behaviors isdefined for drivers and pedestrians walking standing lookingmoving etc

An example of textual annotation is shown in Figure 3The sequence of events recovered from this data is shown inFigure 2

The dataset is available to download at httpdatanvision2eecsyorkucaJAAD_dataset

V CONCLUSION

In this paper we presented a new dataset for the purpose ofstudying joint attention in the context of autonomous drivingTwo types of annotations accompanying each video clip inthe dataset make it suitable for pedestrian and car detectionas well as other areas of research which could benefit fromstudying joint attention and human non-verbal communicationsuch as social robotics

2httpwwwborisunitoit

6

Categorical variable Valuestime_of_day daynight

weather clearsnowraincloudylocation streetindoorparking_lot

designated_crossing yesnoage_gender ChildYoungAdultSenior MaleFemale

Behavior event TypeCrossing stateStopped state

Moving fast stateMoving slow state

Speed up stateSlow down stateClear path stateLooking state

Look pointSignal point

Handwave point

Table II Variables associated with each video and types of events represented in the dataset There are two types of behaviorevents state and point State event may have an arbitrary duration while point events last a short fixed amount of time (01sec) and signify a quick glance or gestures made by pedestrians

Observation id GOPR0103_528_542Media file(s)Player 1 GOPR0103_528_542MP4Observation date 2016-07-15 151538DescriptionTime offset (s) 0000

Independent variablesvariable value

weather rain

age_gender AF

designated no

location plaza

time_of_day daytime

Time Media file path Media total length FPS Subject Behavior Comment Status

019 GOPR0088_335_344MP4 901 2997 Driver moving slow START

0208 GOPR0088_335_344MP4 901 2997 pedestrian crossing START

0308 GOPR0088_335_344MP4 901 2997 pedestrian looking START

1301 GOPR0088_335_344MP4 901 2997 Driver moving slow STOP

1302 GOPR0088_335_344MP4 901 2997 Driver slow down START

1892 GOPR0088_335_344MP4 901 2997 pedestrian looking STOP

8351 GOPR0088_335_344MP4 901 2997 pedestrian crossing STOP

899 GOPR0088_335_344MP4 901 2997 Driver slow down STOP

Figure 3 Example of textual annotation for a video created using BORIS The file contains the id and the name of the videofile a tab-separated list of independent variables (weather age and gender of pedestrians whether the crossing is designatedor not location and time of the day) and a tab-separated list of events Each event has an associated time stamp subjectbehavior and status which may be used to recover sequence of events for analysis

ACKNOWLEDGMENT

We thank Mr Viktor Kotseruba for assistance with process-ing videos for this dataset

REFERENCES

[1] T Litman (2014 dec) Autonomous Vehicle Implementation PredictionsImplications for Transport Planning Online [Online] Availablehttpwwwvtpiorgavippdf

[2] (2014 nov) Think Act Autonomous Driving Online[Online] Available httpsnewrolandbergercomwp-contentuploadsRoland_Berger_Autonomous-Driving1pdf

[3] S Thrun M Montemerlo H Dahlkamp D Stavens A Aron J DiebelJ G Philip Fong G H Morgan Halpenny K Lau M P Celia OakleyV Pratt and P Stang ldquoStanley The Robot that Won the DARPA GrandChallengerdquo Journal of Field Robotics vol 23 no 9 pp 661ndash692 2006

[4] N Kalra and S M Paddock (2006 apr) Driving to Safety HowMany Miles of Driving Would It Take to Demonstrate AutonomousVehicle Reliability [Online] Available httpwwwrandorgpubsresearch_reportsRR1478html

[5] W Knight (2015 dec) Can This Man Make AI More Human Online[Online] Available httpswwwtechnologyreviewcoms544606can-this-man-make-aimore-human

[6] L Gomes (2014 jul) Urban Jungle a Tough Chal-lenge for Googlersquos Autonomous Cars Online [Online]Available httpswwwtechnologyreviewcoms529466urban-jungle-a-tough-challenge-for-googles-autonomous-cars

[7] G Silberg and R Wallace (2012) Self-driving cars The next revolution Online [On-line] Available httpswwwkpmgcomCaenIssuesAndInsightsArticlesPublicationsDocumentsself-driving-cars-next-revolutionpdf

[8] S E Anthony (2016 mar) The Trollable Self-Driving CarOnline [Online] Available httpwwwslatecomarticlestechnologyfuture_tense201603

[9] M Richtel and C Dougherty (2016 sep) Googlersquos Driverless CarsRun Into Problem Cars With Drivers Online [Online] Availablehttpwwwnytimescom20150902technologypersonaltechgoogle-says-its-not-the-driverless-cars-fault-its-other-drivershtml_r=1

[10] (2016 feb) Google Self-Driving Car Project Monthly Report Online[Online] Available httpsstaticgoogleusercontentcomselfdrivingcarfilesreportsreport-0216pdf

[11] W Knight (2013 oct) Driverless Cars Are Further Away Than YouThink Online [Online] Available httpswwwtechnologyreviewcom

7

(a)

(b)

(c)

(d)

Figure 4 More examples of joint attention

8

(a) Sunset (b) Sunrise

(c) After a heavy snowfall (d) During a heavy rain

(e) Multiple pedestrians crossing (f) At the parking lot

Figure 5 Sample frames from the dataset showing different weather conditions and locations

s520431driverless-cars-are-further-away-than-you-think[12] (2015 dec) Google Self-Driving Car Testing Report on

Disengagements of Autonomous Mode Online [Online] Avail-able httpsstaticgoogleusercontentcommediawwwgooglecomenselfdrivingcarfilesreportsreport-annual-15pdf

[13] W Knight (2015) Car-to-Car Communication Online [On-line] Available httpswwwtechnologyreviewcoms534981car-to-car-communication

[14] D R Ragland and M F Mitman ldquoDriverPedestrian Understanding andBehavior at Marked and Unmarked Crosswalksrdquo Safe TransportationResearch amp Education Center Institute of Transportation Studies(UCB) UC Berkeley Tech Rep 2007 [Online] Available httpescholarshiporgucitem1h52s226

[15] (2016 may) Honda tech warns drivers of pedestrian presence Online[Online] Available httpwwwcnetcomroadshownewsnikola-motor-company-the-ev-startup-with-the-worst-most-obvious-name-ever

[16] C P Urmson I J Mahon D A Dolgov and J Zhu ldquoPedestrian

notificationsrdquo 2015 [Online] Available httpswwwgooglecompatentsUS8954252

[17] G Stern (2015 feb) Robot Cars and Coordinated Chaos Online[Online] Available httpwwwwsjcomarticlesrobot-cars-and-the-language-of-coordinated-chaos-1423540869

[18] R Jones (2016 apr) T3 Interview Nissanrsquos research chief talksautonomous vehicles and gunning it in his Nissan GT-R Black EditionOnline [Online] Available httpwwwt3comfeaturest3-interview-dr-maarten-sierhuis-nissan-s-director-of-research-at-silicon-valley-talks-autonomous-vehicles-and-gunning-it-in-his-gt-r-black-edition

[19] E Ackerman and E Guizzo (2015 sep) Toyota Announces MajorPush Into AI and Robotics Wants Cars That Never Crash Online [On-line] Available httpspectrumieeeorgautomatonroboticsartificial-intelligencetoyota-announces-major-push-into-ai-and-robotics

[20] M Akamatsu Y Sakaguchi and M Okuwa ldquoModeling of DrivingBehavior When Approaching an Intersection Based on Measured Be-havioral Data on An Actual Roadrdquo in Human factors and Ergonomics

9

Figure 6 A selection of images of pedestrians from the dataset

Society Annual Meeting Proceedings vol 47 no 16 2003 pp 1895ndash1899

[21] Y Fukagawa and K Yamada ldquoEstimating driver awareness of pedes-trians from driving behavior based on a probabilistic modelrdquo in IEEEIntelligent Vehicles Symposium (IV) 2013

[22] M T Phan I Thouvenin V Fremont and V Cherfaoui ldquoEstimatingDriver Unawareness of Pedestrian Based On Visual Behaviors andDriving Behaviorsrdquo in International Joint Conference on ComputerVision Imaging and Computer Graphics Theory and Applications 2014

[23] Z Ren X Jiang and W Wang ldquoAnalysis of the Influence of Pedes-trianrsquos eye Contact on Driverrsquos Comfort Boundary During the CrossingConflictrdquo Green Intelligent Transportation System and Safety ProcediaEngineering 2016

[24] N Gueguen C Eyssartier and S Meineri ldquoA pedestrianrsquos smile anddriverrsquos behavior When a smile increases careful drivingrdquo Journal ofSafety Research vol 56 pp 83ndash88 2016

[25] F Kaplan and V V Hafner ldquoThe Challenges of Joint AttentionrdquoInteraction Studies vol 7 no 2 pp 135ndash169 2006

[26] C Breazeal and B Scassellati ldquoHow to build robots that makefriends and influence peoplerdquo Intelligent Robots and Systems pp

858ndash863 1999 [Online] Available httpieeexploreieeeorgxplsabs_alljsparnumber=812787

[27] B Scassellati ldquoImitation and Mechanisms of Joint Attention ADevelopmental Structure for Building Social Skills on a HumanoidRobotrdquo Lecture Notes in Computer Science vol 1562 pp 176ndash195 1999 [Online] Available httpwwwspringerlinkcomcontentwljp04e4h5b4lthh

[28] A P Shon D B Grimes C L Baker M W Hoffman Z Shengli andR P N Rao ldquoProbabilistic gaze imitation and saliency learning in arobotic headrdquo Proceedings - IEEE International Conference on Roboticsand Automation vol 2005 pp 2865ndash2870 2005

[29] I Fasel G Deak J Triesch and J Movellan ldquoCombining embodiedmodels and empirical research for understanding the development ofshared attentionrdquo Proceedings 2nd International Conference on Devel-opment and Learning ICDL 2002 pp 21ndash27 2002

[30] V V Hafner and F Kaplan ldquoLearning to interpret pointing gesturesExperiments with four-legged autonomous robotsrdquo Lecture Notes inComputer Science (including subseries Lecture Notes in Artificial In-telligence and Lecture Notes in Bioinformatics) vol 3575 LNAI pp225ndash234 2005

10

[31] M Doniec G Sun and B Scassellati ldquoActive Learning of JointAttentionrdquo IEEE-RAS International Conference on Humanoid Robotspp 34ndash39 2006 [Online] Available httpieeexploreieeeorglpdocsepic03wrapperhtmarnumber=4115577

[32] P Andry P Gaussier S Moga J Banquet and J Nadel ldquoLearning andCommunication in Imitation An Autnomous Robot Perspectiverdquo IEEETransaction on Systems Man and Cybernetics Part A Systems andHumans vol 31 no 5 pp 431ndash444 2001

[33] S Boucenna P Gaussier and L Hafemeister ldquoDevelopment of jointattention and social referencingrdquo 2011 IEEE International Conferenceon Development and Learning ICDL 2011 pp 1ndash6 2011

[34] A D May C Dondrup and M Hanheide ldquoShow me your movesConveying navigation intention of a mobile robot to humansrdquo MobileRobots (ECMR) 2015 European Conference on pp 1ndash6 2015

[35] H Ishiguro T Ono M Imai and T Maeda ldquoRobovie an InteractiveHumanoid Robotrdquo IEEE International Conference vol 2 no 1 pp1848ndash 1855 2002 [Online] Available httponlinelibrarywileycomdoi101002cbdv200490137abstract$delimiter026E30F$nhttpwwwingentaconnectcomcontentmcb04920010000002800000006art00006$delimiter026E30F$nhttpieeexploreieeeorgstampstampjsptp=amparnumber=1014810ampisnumber=21842

[36] J F Ferreira and J Dias ldquoAttentional mechanisms for socially inter-active robots - A surveyrdquo IEEE Transactions on Autonomous MentalDevelopment vol 6 no 2 pp 110ndash125 2014

[37] L Kelion ldquoTesla says autopilot involved in second car crashrdquo jul 2016[Online] Available httpwwwbbccomnewstechnology-36783345

[38] S Thielman ldquoFatal crash prompts federal investiga-tion of Tesla self-driving carsrdquo jul 2016 [Online]Available httpswwwtheguardiancomtechnology2016jul13tesla-autopilot-investigation-fatal-crash

[39] J Vincent ldquoWhat counts as artificially intelligent AI anddeep learning explainedrdquo The Verge feb 2016 [Online] Avail-able httpwwwthevergecom201622911133682deep-learning-ai-explained-machine-learning

[40] W Knight ldquoCan This Man Make AI More Hu-manrdquo MIT Technology Review dec 2015 [Online] Avail-able httpswwwtechnologyreviewcoms544606can-this-man-make-aimore-human

[41] B Goertzel ldquoAre there Deep Reasons Underlying the Pathologies ofToday rsquo s Deep Learning Algorithms rdquo Artificial General IntelligenceLecture Notes in Computer Science vol 9205 pp 70ndash79 2015

[42] C Szegedy W Zaremba and I Sutskever ldquoIntriguing propertiesof neural networksrdquo in ICLR 2014 pp 1ndash10 [Online] Availablehttparxivorgabs13126199

[43] A Nguyen J Yosinski and J Clune ldquoDeep Neural Networks are EasilyFooled High Confidence Predictions for Unrecognizable Imagesrdquo inCVPR 2015 [Online] Available httparxivorgabs14121897

[44] (2016 may) The KITTI Vision Benchmark Suite Online [Online]Available httpwwwcvlibsnetdatasetskitti

[45] P Dollar C Wojek B Schiele and P Perona ldquoPedestrian detectionAn evaluation of the state of the artrdquo PAMI vol 34 2012

[46] K Fragkiadaki W Zhang G Zhang and J Shi ldquoTwo-GranularityTracking Mediating Trajectory and Detection Graphs for Tracking underOcclusionsrdquo in ECCV 2012

[47] S Y Bao M Bagra Y-W Chao and S Savarese ldquoSemantic Structurefrom Motion with Points Regions and Objectsrdquo in CVPR 2012

[48] (2015) The German Traffic Sign Detection Benchmark Online [Online]Available httpbenchmarkinirubdesection=gtsdbampsubsection=dataset

[49] R Klette (2014) The enpeda Image Sequence AnalysisTest Site (EISATS) Online [Online] Available httpccvwordpressfosaucklandacnzeisats1

[50] D M Gavrila (2015) Daimler PedestrianBenchmark Datasets [Online] Available httpwwwgavrilanetDatasetsDaimler_Pedestrian_Benchmark_Ddaimler_pedestrian_benchmark_dhtml

[51] UvA Person Tracking from Overlapping CamerasDataset [Online] Available httpwwwgavrilanetDatasetsUniv__of_Amsterdam_Multi-Cam_Puniv__of_amsterdam_multi-cam_phtml

[52] 100-Car Naturalistic Driving Study [Online] Available httpwwwnhtsagovResearchHuman+FactorsNaturalistic+driving+studies

[53] Transportation Active Safety 110-Car Naturalistic DrivingStudy [Online] Available httpwwwengriupuiedu~yiduresearchhtmlTransportation

[54] SHRP2 Naturalistic Driving Study [Online] Available httpsinsightshrp2ndsus

[55] Virginia Tech Transportation Institute Data Warehouse [Online] Avail-able httpforumsvttivteduindexphpfilescategory2-vtti-data-sets

[56] O Friard and M Gamba ldquoBoris a free versatile open-source event-logging software for videoaudio coding and live observationsrdquo Methodsin Ecology and Evolution vol 7 no 11 pp 1325ndash1330 2016

I Introduction

II Autonomous driving and joint attention

III Existing datasets

IV The JAAD Dataset

V Conclusion

References

2

the corresponding pedestrian about the action it is about totake using a screen installed on the front hood of the carAnother proposed option for communication is via a sounddevice or other kinds of physical devices (possibly a roboticarm) This technology has been criticized for being distractingand lacking the ability to efficiently communicate if more thanone pedestrian is involved

Given the problems associated with establishing explicitcommunication with other vehicles and pedestrians Nissanin their latest development announced a passive method ofdealing with uncertainties in the behavior of pedestrians [17][18] in attempt to understand human drivers and pedestriansbehaviors in various traffic scenarios The main objective ofthis work is to passively predict pedestrian behavior usingvisual input and only use an interface eg a green light toinform them about the intention of the autonomous car

Toyota in partnership with MIT and Stanford recently an-nounced using a similar passive approach toward autonomousdriving [19] Information such as the type of equipmentthe pedestrian carries hisher pose direction of motion andbehavior as well as the human driverrsquos reactions to eventsare extracted from videos of traffic situations and their 3Dreconstructions This information is used to design a controlsystem for determining what course of action to take in givensituations At present no data resulting from this study hasbeen released and very little information is available aboutthe type of autonomous behavior the scientists are seeking toobtain

Given the importance of pedestrian safety multiple studieswere conducted in the last several decades to find factors thatinfluence decisions and behavior of traffic participants Forexample a driverrsquos pedestrian awareness can be measuredbased on whether the driver is decelerating upon seeing apedestrian ([20] [21] [22]) Several recent studies also pointout that pedestrianrsquos behavior such as establishing eye-contactsmiling and other forms of non-verbal communication canhave a significant impact on the driverrsquos actions ([23] [24])Although a majority of these studies are aimed at developingbetter infrastructure and traffic regulations their conclusionsare relevant for autonomous driving as well

In an attempt to better understand the problem of vehicle tovehicle (V2V) and vehicle to pedestrian (V2P) communicationin the autonomous driving context we suggest viewing it as aninstance of joint attention and discuss why existing approachesmay not be adequate in this context We propose a noveldataset that highlights the visual and behavioral complexityof traffic scene understanding and is potentially valuable forstudying the joint attention issues

II AUTONOMOUS DRIVING AND JOINT ATTENTION

According to a common definition joint attention is theability to detect and influence an observable attentional be-havior of another agent in social interaction and acknowledgethem as an intentional agent [25] However it is important tonote that joint attention is more than simultaneous lookingattention detection and social coordination but also includesan intentional understanding of the observed behavior ofothers

Since joint attention is a prerequisite for efficient commu-nication it has been gaining increasing interest in the fieldsof robotics and human-robot interaction Kismet [26] and Cog[27] both built at MIT in the late 1990s were some of thefirst successes in social robotics These robots were able tomaintain and follow eye gaze reacted to the behavior of theircaregivers and recognized simple gestures such as declarativepointing More recent work in this area is likewise concernedwith gaze following [28] [29] [27] pointing [30] [27] andreaching [31] turn-taking [32] and social referencing [33]With a few exceptions [34] [35] almost all joint attentionscenarios are implemented with stationary robots or roboticheads according to a recent comprehensive survey [36]

Surprisingly despite increasing interest for joint attentionin the fields of robotics and human-robot interaction it hasnot been explicitly mentioned in the context of autonomousdriving For example communication between the driver andpedestrian is an instance of joint attention Consider thefollowing scenario a pedestrian crossing the street (shown inFigure 1a) Initially she is looking at her cell phone but as sheapproaches the curb she looks up and slows down because thevehicle is still moving When the car slows down she speedsup and crosses the street In this scenario all elements of jointattention are apparent Looking at the car and walking sloweris an observable attention behavior The driver slowing downthe car indicates that he noticed the pedestrian and is yieldingHis intention is clearly interpreted as such as the pedestrianspeeds up and continues to cross A similar scene is shown inFigure 1d Here the pedestrian is standing at the crossing andlooking both ways to find a gap in traffic Again once shenotices that the driver is slowing down she begins to cross

While these are fairly typical behaviors for marked cross-ings there are many more possible scenarios of communi-cation between the traffic participants Humans recognize amyriad of ldquosocial cuesrdquo in everyday traffic situations Apartfrom establishing eye contact or waving hands people maybe making assumptions about the way a driver would behavebased on visual characteristics such as the carrsquos make andmodel [6] Understanding these social cues is not alwaysstraightforward Aside from visual processing challenges suchas variation in lighting conditions weather or scene clutterthere is also a need to understand the context in which thesocial cue is observed For instance if the autonomous carsees someone waving his hand it needs to know whetherit is a policeman directing traffic a pedestrian attempting tocross the street or someone hailing a taxi Consider Figure 1bwhere a man is crossing the street and makes a slight gestureas a signal of yielding to the driver or Figure 1c wherea man is jaywalking and acknowledges the driver with ahand gesture Responding to each of these scenarios from thedriverrsquos perspective can be quite different and would requirehigh-level reasoning and deep scene analysis

Today automotive industry giants such as BMW Tesla Fordand Volkswagen who are actively working on autonomousdriving systems rely on visual analysis technologies devel-oped by Mobileye1 to handle obstacle avoidance pedestrian

1httpwwwmobileyecom

3

(a)

(b)

(c)

(d)


4
















IV THE JAAD DATASET



5
















V CONCLUSION



6








Handwave point




weather rain

age_gender AF

designated no

location plaza

time_of_day daytime











ACKNOWLEDGMENT


REFERENCES












7

(a)

(b)

(c)

(d)


8
















9














10



























I Introduction



IV The JAAD Dataset

V Conclusion

References

3

(a)

(b)

(c)

(d)


4
















IV THE JAAD DATASET



5
















V CONCLUSION



6








Handwave point




weather rain

age_gender AF

designated no

location plaza

time_of_day daytime











ACKNOWLEDGMENT


REFERENCES












7

(a)

(b)

(c)

(d)


8
















9














10



























I Introduction



IV The JAAD Dataset

V Conclusion

References

4
















IV THE JAAD DATASET



5
















V CONCLUSION



6








Handwave point




weather rain

age_gender AF

designated no

location plaza

time_of_day daytime











ACKNOWLEDGMENT


REFERENCES












7

(a)

(b)

(c)

(d)


8
















9














10



























I Introduction



IV The JAAD Dataset

V Conclusion

References

5
















V CONCLUSION



6








Handwave point




weather rain

age_gender AF

designated no

location plaza

time_of_day daytime











ACKNOWLEDGMENT


REFERENCES












7

(a)

(b)

(c)

(d)


8
















9














10



























I Introduction



IV The JAAD Dataset

V Conclusion

References

6








Handwave point




weather rain

age_gender AF

designated no

location plaza

time_of_day daytime











ACKNOWLEDGMENT


REFERENCES












7

(a)

(b)

(c)

(d)


8
















9














10



























I Introduction



IV The JAAD Dataset

V Conclusion

References

7

(a)

(b)

(c)

(d)


8
















9














10



























I Introduction



IV The JAAD Dataset

V Conclusion

References

8
















9














10



























I Introduction



IV The JAAD Dataset

V Conclusion

References

9














10



























I Introduction



IV The JAAD Dataset

V Conclusion

References

10



























I Introduction



IV The JAAD Dataset

V Conclusion

References

Joint Attention in Autonomous Driving (JAAD) › pdf › 1609.04741.pdf · 2020-04-24 ·...

Documents

Transcript of Joint Attention in Autonomous Driving (JAAD) › pdf › 1609.04741.pdf · 2020-04-24 ·...