
Electronic Outlook - A Comparison of Human Outlook with a Computer-vision Solution

Mogens Blanke1, Søren Hansen1, Jonathan Dyssel Stets2

Thomas Koester3, Jesper E. Brøsted3, Adrian Llopart Maurin1 & Nicolai Nykvist1

1 Department of Electrical Engineering, Automation and Control Group, Technical University of Denmark,

Elektrovej Building 326, 2800 Kongens Lyngby

2 Department of Applied Mathematics and Computer Science, Image Analysis and Computer Graphics, Technical University of Denmark,

Richard Petersens Plads, Building 324, 2800 Kongens Lyngby

3 FORCE Technology, Hjortekærsvej 99, 2800 Kongens Lyngby

Updated version of March 19, 2019

Research report prepared for the Danish Maritime Authority


Printed at DTU, Kongens Lyngby, Denmark, March 2019

© 2019 DTU and FORCE Technology. This report can be freely distributed with proper reference to the source. No changes are permitted in the document.

ISBN 978-87-971357-0-9
DOI 10.13140/RG.2.2.26172.08325



Contents

1 Summary
2 Introduction
3 Outlook for navigation
   3.1 Human outlook
   3.2 State of art
   3.3 Scope of this investigation
4 Electronic outlook
   4.1 Camera comparisons
5 Object detection and classification
   5.1 Image-based Object Detection
   5.2 Related work
   5.3 Mask R-CNN detection and classification
   5.4 Dataset
   5.5 Training
   5.6 Performance
6 Results
   6.1 Temporal Comparison
7 Discussion
8 Conclusions
Bibliography


Chapter 1

Summary

Considering whether a temporarily unattended bridge could be allowed, Maritime Authorities wish to investigate whether sensor technology is available that, when supported by sophisticated computer algorithms, is able to provide outlook with the same reliability and safety as the average human outlook.

This document reports findings from a comparative study of human versus electronic outlook. Assessment of the navigator's outlook is based on measurements with a wearable eye-tracker, and the areas of their visual attention are recorded on video. Simultaneously, a set of electro-optical sensors provides image data as input to computer algorithms that detect and classify objects within visual range.

Ambient light conditions on the bridge prevented eye tracking from disclosing which objects on the Radar and ECDIS screens caught the attention of the navigator. The scope of this investigation was therefore limited to the navigator's visual attention on objects at sea. The report compares these eye-tracking measurements with object recognition made from camera recordings in the visual spectrum and subsequent computerized object classification.

The report deduces, from the observations of eye fixations, when the navigator became aware of a particular object. It analyses how the human observations compare with those of the technology solution. On the technology side, the report presents approaches to detection and classification that appeared to be efficient in coastal areas with confined passages. The quality of outlook in different ambient light conditions is illustrated. The main findings are:

• Eye-tracking glasses were found useful to show fixations on objects at sea in daylight conditions.

• The computer-vision algorithm detects objects in parallel, whereas the human does so sequentially, and the computer classifies objects on average 24 s earlier than the navigator's first fixation on the object. The deep learning algorithm trained in this study should, however, be improved to achieve better performance in some situations.

• The time between object detection and passage of own ship is adequate for making navigation decisions with both human and electronic outlook.



• Low-light conditions (dusk and night) are effectively dealt with by Long Wave InfraRed (LWIR) camera technology. LWIR shows objects as equally visible by day and by night.

• Colour information from cameras is necessary to assist decision support and electronic navigation.

• A system for electronic outlook should employ sensor and data fusion with radar, AIS and ECDIS.

• Decision support based on electronic outlook should include object tracking and situation awareness techniques.

• Quality assurance and approval of machine learning algorithms for object classification at sea have unsolved issues. A standard vocabulary ought to be available for objects at sea, and publicly available databases with annotated images from traffic in open seas, near-coast areas and rivers should exist in order for authorities to assess the quality of, or approve, navigation support based on machine learning methods.


Abbreviations and acronyms

AIS             Automatic Identification System for marine vehicles
CNN             Convolutional Neural Net
ECDIS           Electronic Chart Display and Information System
GNSS            Global Navigation Satellite System
HFOV            Horizontal Field of View
JAI             Manufacturer of industry-grade cameras
LWIR            Long Wave InfraRed (8-14 µm wavelength)
Mask R-CNN      R-CNN network with dedicated segmentation
NIR             Near InfraRed (800-1000 nm wavelength)
Radar           RAdio Detection And Ranging system
R-CNN           Region-based CNN
RGB             Red Green Blue colour coding of digital images
RoI             Region of Interest used in image analysis
Teledyne Dalsa  Manufacturer of LWIR cameras and others
Tobii glasses   Eye-tracking glasses from Tobii AB, Sweden


Chapter 2

Introduction

Look-out for navigation is the task of observing various objects which can have an impact on a ship's planned route and manoeuvring capabilities, for example other vessels, buoys and land. If the look-out is a separate person on the bridge, observations are reported to the officer in charge, who decides on any remedial actions. The look-out is made using sight and aided by available technology such as radar, AIS and ECDIS systems. Developments within camera technology and computer-vision algorithms have provided an additional possible source for look-out. This report investigates the quality of this "electronic outlook" and compares it with human look-out.

A survey of maritime object detection and tracking methods was recently (2017) published by [24], who emphasized that Radar, an IMO-required instrument on merchant vessels, is sensitive to the meteorological conditions and to the shape, size, and material of the targets. They emphasize that radar data need to be supplemented by other situational-awareness sensors to obtain safe navigation and collision avoidance. Commercial electro-optic sensors are available for several spectral ranges: visible (450-800 nm), near infrared (NIR, 800-950 nm) and long wave infrared (LWIR, 8-14 µm). [28] investigated the detectability of objects at sea seen from an aircraft in the visible and LWIR spectral ranges.

A brief overview of the approach taken in this investigation was presented in [4]. This report first summarises the task of watchkeeping/lookout for navigation.

Chapter 3 describes how human outlook is observed through measurements where a navigator wears eye-tracking glasses. Chapter 4 outlines the use of electro-optical and other sensors to provide electronic means to replicate the human observation of the surroundings. Chapter 5 introduces available technology for object detection and classification at sea, where image processing and machine learning techniques are instrumental. Chapter 6 presents the findings of this study from ferries in near-coastal and shallow-water navigation. Chapter 7 discusses the results, their limitations and perspectives, and Chapter 8 finally offers conclusions and outlines directions for further research.


Chapter 3

Outlook for navigation

The analysis of manual lookout/watchkeeping is based on a combination of observations on board several vessels in Danish waters. Camera recordings and mapping of objects at sea were conducted on measurement campaigns during 2018. Eye-tracking measurements were conducted during the summer of 2018.

Background knowledge for this study includes generic observations, made by FORCE Technology in Lyngby, on board a large number of vessels during the period 2000-2018. The generic experience also includes observations from ship-simulator exercises, as well as literature-based studies, [31], [33], and general knowledge of maritime human factors. Eye-tracking glasses have been used in a simulator context, but their use on board a bridge seems to be a new undertaking.

3.1 Human outlook

The manual outlook serves several purposes: prevention of collision with other vessels, safe and efficient navigation at sea, avoiding grounding and collision with either fixed or floating objects, optimal steering and manoeuvring in relation to waves, wind and visibility, and general observation of the sea, including observation of abnormal conditions like emergencies or pollution.

The look-out also involves an element of continuous self-monitoring of the efficiency of the look-out and adjustment when necessary, e.g. in cases of reduced visibility, rain, disturbing sunlight, the need to observe far-away objects through binoculars, or stress or fatigue of the look-out.

Outlook is made by both sight and sound, which literally means that not only the eyes are used, but also the hearing is active. Attention is paid to manoeuvring, fog or emergency signals from other vessels and to relevant radio communication.

The outlook is just one among several tasks of the navigator on the bridge. Other tasks that require attention include observation of the condition of engines and systems, handling of cargo and passengers, safety-related routines, communication internally on board the vessel and with external parties, management of staff and other administrative tasks, QA and documentation, and handling of safety-critical situations on board. These supplemental duties imply that the navigator needs to share his/her attention among several tasks.


Figure 3.1: Screen-shot from eye tracking of the navigator onboard the Elsinore-Helsingborg ferry M/S Pernille

Endogenous and exogenous driven visual attention

The look-out task involves both endogenous- and exogenous-driven activities.

Endogenous activities are visual attention controlled by the navigator himself on his own initiative and based on relevant knowledge and experience, such as observing navigational markings, sighting of land and watching out for other vessels.

Exogenous activities are those caused by an external event catching the attention of the navigator, for instance the sight of a vessel that the navigator has not been looking for, or signals by light, sound or radio. Everyday scenarios will typically be a combination of endogenous and exogenous look-out activities.

As an example of endogenously controlled visual attention by the navigator, consider the screen-shot in Figure 3.1.

In this case the ferry Pernille is crossing a traffic separation scheme in which inbound traffic could be approaching from the port side. The visibility (for both navigator and radar) can easily be obscured by the Kronborg Castle headland, but the navigator knows from experience that larger ships with considerable speed might be approaching. Based on this experience, the navigator keeps an eye on AIS information and on the headland area to stay prepared for possible traffic at the port side. The visibility of the ferry Pernille from other vessels is likewise obscured by the same headland, thus making it a possible hazard to proceed without extra caution. The navigator's visual attention is not drawn to the area by a concrete vessel (which would have been an activity of exogenous visual attention) but is directed there based on his experience and anticipation, which is an activity of endogenous visual attention.

When it comes to performing an outlook, it makes sense to distinguish between two types, namely observations that do not require action and observations that require action, e.g. to follow COLREGS and ultimately to prevent a collision or grounding. The navigator's actions are often a combination of several elements, including signalling, steering and engine manoeuvres.


The decision to act or not to act depends on the navigator's interpretation of the information stemming from these look-out related behaviours:

1. Repeated observations to determine if there is a risk of collision and, if so, taking countermeasures

2. Repeated observations to determine if executed countermeasures (during 1) above) have the desired effect

3. General visual observation (watching) of nothing in particular, but often focused on the direction of the vessel and abeam/passed in relation to the progression of the navigation

4. Exogenous visual attention triggered when something turns up, possibly in combination with information on bridge instruments

5. Endogenous visual attention, i.e. the navigator expects upcoming, observable objects using e.g. sea charts (buoys), radar or AIS.

The above unfolding of human outlook reveals the activity as complex and multifaceted. It is therefore generally accepted in the maritime domain that there is no unified, single, correct or "perfect" way to perform outlook. Rather, there is an acceptable margin within which different navigators' behaviours can unfold while still fulfilling the purpose of a good-enough outlook.

3.2 State of art

In the maritime context, the use of eye tracking as a means to examine the visual attention of ship navigators is nothing new, at least not when it comes to the use of eye tracking in simulation environments. [3] investigated operators' foci of attention during simulated dynamic positioning operation. [2] examined the difference in attention allocation between novice and expert navigators during use of the Conning Officer Virtual Environment, a simulation system developed to train shiphandling. [2] concluded that there was a clear link between the experts' superior shiphandling performance and a "tight Attention-allocation pattern that focused only on the relevant areas of interest. Novices' Attention-allocation patterns were highly scattered and irregular" (p. xviii). [22] and [27] focused on evaluating and improving the training of navigators using eye-tracking data, and [23] suggested using (stationary) eye tracking to determine or monitor the level of fatigue in the boat driver with the purpose of enhancing situation awareness. [15] used eye-tracking data to suggest improvements of the usability design of the ship's bridge layout and of the software's graphical user interface on a maritime navigation display. [14] also investigated eye-tracking data in the pursuit of a recommendable optimal visual scan pattern for navigators, aiming to mitigate the mental workload needed to monitor the increasing amount of technology used at the maritime ship bridge.

[11] performed a somewhat rare example of an investigation using eye tracking during actual, real-life navigation. They investigated gaze-behaviour data from 16 experienced and novice boat drivers during high-speed navigation and concluded that novices looked more at objects closer to the boat, while experts looked more at things far from the boat. Also, novice boat drivers were more focused on electronic displays, while the experts were focused mostly outside the boat and "further used the paper-based sea chart to a larger extent than the novice drivers" (p. 277).

Figure 3.2: Tobii eye-tracking glasses. Photo: FORCE Technology 2018

The methodology of using eye-tracking devices in real-life maritime situations is not often seen, making this study quite unique.

Eye tracking technology applied in this investigation

The eye-tracking data were collected using Tobii Pro Glasses 2 ([1]), a lightweight wearable technology. It consists of two units: a head unit (the glasses) connected via an HDMI cable to a belt-clip recorder unit.

The head unit has a scene camera recording the wearer's front view (including audio), and the frame has infrared illuminators and sensors installed, using the corneal-reflection (dark pupil) eye-tracking technique. The belt-clip unit holds an SD card for recording data, operates on rechargeable batteries and is Wi-Fi controlled through PC-based software (in this case iMotions). This setup makes it very easy for the person wearing the eye tracker to move freely around the ship, and due to the non-invasive design, most subjects easily forget they are even wearing the glasses while performing their job. Additional specifications can be found in the Tobii Pro Glasses 2 User's Manual (2018, p. 40).

Based on the recording from the scene camera and the associated eye-tracking data, the iMotions software (current version 7.1) produces a video showing what was in the wearer's field of view during the recording (a first-person-perspective replay), including a graphical overlay: a yellow dot indicating where in the field of view the person was looking at any given time. The software can be set to illustrate fixations either by increasing the size of the yellow dot or by changing its colour. A fixation is defined as a period (100 ms or more) in which the person's eyes are locked toward a specific object (or location) in the field of view. Fixations are excellent measures of visual attention. Fixations are directly related to cognitive processing [17], and during a fixation the person is analyzing and interpreting information from the object or area of focus [22].
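To make the fixation criterion concrete, the following is a minimal sketch of a dispersion-threshold (I-DT style) fixation detector. The 100 ms minimum duration is from the definition above; the sample rate, dispersion threshold and data layout are illustrative assumptions and do not reflect the iMotions implementation.

```python
import numpy as np

def detect_fixations(gaze_xy, t_ms, max_dispersion_px=25.0, min_duration_ms=100.0):
    """Minimal I-DT (dispersion threshold) fixation detector.

    gaze_xy: (N, 2) gaze positions in scene-camera pixel coordinates.
    t_ms:    (N,) sample timestamps in milliseconds.
    Returns a list of (t_start, t_end, centroid) tuples, one per fixation.
    """
    fixations = []
    i, n = 0, len(t_ms)
    while i < n:
        j = i
        # Grow the window while the gaze stays within the dispersion threshold.
        while j + 1 < n:
            w = gaze_xy[i:j + 2]
            if np.ptp(w[:, 0]) + np.ptp(w[:, 1]) > max_dispersion_px:
                break
            j += 1
        if t_ms[j] - t_ms[i] >= min_duration_ms:
            fixations.append((t_ms[i], t_ms[j], gaze_xy[i:j + 1].mean(axis=0)))
            i = j + 1  # continue searching after this fixation
        else:
            i += 1     # window too short or too spread; slide the start
    return fixations

# Synthetic example: 50 Hz samples, steady gaze for 0.4 s, then a new target.
t = np.arange(0, 1000.0, 20.0)  # 50 samples over one second, in ms
xy = np.vstack([np.full((20, 2), 500.0), np.full((30, 2), 900.0)])
xy += np.random.default_rng(0).normal(0.0, 2.0, xy.shape)  # measurement jitter
for t0, t1, c in detect_fixations(xy, t):
    print(f"fixation {t0:.0f}-{t1:.0f} ms around ({c[0]:.0f}, {c[1]:.0f})")
```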


Figure 3.3: Eye tracking example.

The image in Figure 3.3 shows a single frame from the replay of an eye-tracking recording (MF Højestene). The yellow dot is the location of the navigator's fixation, and the yellow line illustrates eye movements faster than 100 ms, referred to as saccades.

3.3 Scope of this investigation

In the course of using the Tobii eye-tracking glasses on a bridge together with the iMotions software, the following limitations appeared:

• Objects on the Radar or ECDIS screens are tiny compared to outside objects, and the fixation circle is too big to distinguish which object on a screen is catching attention

• Very high differences in light intensity from outside to instrument screens

• Slight drift between the direction of view indicated by the eye-tracking and the actual direction

• The Tobii glasses camera is excellent for daylight conditions but is not sensitive enough for dusk or night conditions.

These technical limitations of the eye-tracking measurements restricted the scope of this investigation, and it was necessary to limit the scope to simply comparing eye-tracking fixations on objects at sea with classification of the same objects made by the computer-vision based part of the electronic outlook. Observation with eye tracking was limited to daylight conditions.

The electronic outlook then focuses on object detection and classification in daylight images. Machine learning techniques are explained, and the quality of object recognition at sea is assessed from training and validation images. It is discussed why the usual softmax, precision and recall measures from deep learning are not sufficient as quality indicators for navigation purposes. Instead, we show how traditional statistical measures (true and false positives and negatives) are employed at an object level and provide useful quality indicators for use in navigation.

This study has not employed means to disclose how the navigator interprets what he sees. The eye-tracking glasses can determine where the navigator has had visual focus. The detailed recognition of objects and their behaviour is therefore not in the scope of this investigation. The study does not reflect on which actions the navigator could or should take.


Chapter 4

Electronic outlook

This chapter describes the instrument platform used for electronic outlook, and it presents images available from the camera suite in the different spectral ranges selected for the study.

The electronic outlook system in this comparison consists of 5 cameras, an FMCW radar, an AIS receiver, a GPS unit and a 6-degrees-of-freedom inertial measurement unit.

Figure 4.1: Sketch of the sensor platform. The five camera houses are looking forward in the ship's heading. Camera units, FMCW radar and GPS receiver plus IMU are fixed on the platform.

The vision system comprises:

• 2 colour cameras (JAI GO-5000C), 5 Mpixel, resolution 2560 x 2048, 12 bit, lens 55 deg HFOV. A polarizing filter is added.

• 2 monochrome cameras (JAI GO-5000M), 5 Mpixel, resolution 2560 x 2048, 12 bit, lens 55 deg HFOV. An NIR filter (800-1000 nm) can be added.

• 1 infrared LWIR camera, uncooled microbolometer sensor (Teledyne Dalsa Calibir 640), resolution 640 x 480, 14 bit, lens 90 deg HFOV.

The equipment is mounted on a forward-facing stand on board the ferries. Object detection and classification algorithms are run as post-processing of the images. The platform is sketched in Fig. 4.1. The colour and monochrome cameras have an overlap of 4 degrees. Fig. 4.2 shows the instrument platform mounted on a ferry in a narrow passage in the southern Funen archipelago.
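As a back-of-the-envelope illustration of what these specifications imply for detection range, the sketch below estimates how many horizontal pixels an object subtends, using the HFOV and resolution figures listed above under a simple pinhole assumption. The 10 m vessel width and 1 km distance are illustrative values.

```python
import math

def pixels_on_target(width_m, dist_m, hfov_deg, hres_px):
    """Horizontal pixel count subtended by an object of a given width at a
    given distance, assuming a pinhole camera and the object centred in view."""
    subtended = 2.0 * math.atan(width_m / (2.0 * dist_m))  # radians
    px_per_rad = hres_px / math.radians(hfov_deg)          # uniform-sampling approx.
    return subtended * px_per_rad

# HFOV and resolution figures from the camera list above.
for name, res, fov in [("JAI RGB/monochrome", 2560, 55.0),
                       ("Teledyne Dalsa LWIR", 640, 90.0)]:
    px = pixels_on_target(width_m=10.0, dist_m=1000.0, hfov_deg=fov, hres_px=res)
    print(f"{name}: a 10 m wide vessel at 1 km spans about {px:.0f} px")
```

The result (roughly 27 px for the JAI cameras and 4 px for the LWIR camera) is consistent with the LWIR camera's approximately 4 times lower horizontal resolution noted later in this chapter.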


Figure 4.2: Southern Funen archipelago. Sensor platform mounted beyond the wheelhouse / ship's bridge.

4.1 Camera comparisons

Measurements were made in daylight, at dusk and in the late evening to assess image properties in conditions of normal, low and very low ambient illumination.

Daylight

Daylight conditions depicted by RGB camera recordings and monochrome NIR-range photos are shown in Figures 4.3 and 4.4. High contrast is a visual challenge, as visual images have 8-bit depth only, but the full 12-bit resolution of the cameras is maintained in the subsequent image analysis.

Figure 4.3: RGB camera with polarizing filters. South Funen archipelago. Ambient condition is daylight and good visibility


Figure 4.4: Monochrome camera with NIR filter 800-1000 nm. South Funen archipelago. Ambient condition is daylight and good visibility

Figure 4.5: LWIR camera 8-14 µm. South Funen archipelago. Ambient condition is daylight and good visibility

The LWIR image in Figure 4.5 is recorded at the same instant as the RGB and NIR images. The 14-bit resolution of this camera allows a temperature resolution of objects of 0.006 deg. When visualizing the LWIR image, we use range compression to zoom in on the temperature range that is relevant for the image. The LWIR is not sensitive to sun reflections in the water or in clouds. As seen, several vessels are recognizable in the LWIR image that are less apparent in the RGB and NIR pictures, even though the LWIR has 4 times lower horizontal resolution than the JAI cameras.
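The range compression used for visualization can be sketched as a simple windowing of the 14-bit counts followed by scaling to 8 bits. The window limits and synthetic frame below are illustrative assumptions, not the values used for the figures.

```python
import numpy as np

def compress_lwir(raw14, lo, hi):
    """Map a 14-bit LWIR frame to 8 bits for display, zooming the contrast
    into the count window [lo, hi] covering the relevant temperature range."""
    clipped = np.clip(raw14.astype(np.float32), lo, hi)
    return ((clipped - lo) / float(hi - lo) * 255.0).astype(np.uint8)

# Illustrative 640 x 480 frame of 14-bit counts, windowed around its median.
frame = np.random.default_rng(1).integers(0, 2**14, size=(480, 640))
mid = int(np.median(frame))
display = compress_lwir(frame, lo=mid - 400, hi=mid + 400)
print(display.dtype, display.min(), display.max())
```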

Dusk - 26 min after sunset

Figures 4.6, 4.7 and 4.8 show the RGB, NIR and LWIR images observed 26 min after Almanac sunset. Navigation lights on the approaching ferry are clearly visible in the NIR image, and the heat silhouette is very clearly visible in the LWIR image. Disturbed by glare from the sunset, the RGB image needs contrast enhancement to distinguish the navigation lights on the approaching ferry.

Figure 4.6: RGB camera with polarizing filters. South Funen archipelago at dusk

Figure 4.7: Monochrome with NIR filter. South Funen archipelago at dusk

Dark - 60 min after sunset

Measurements were made with cameras mounted with parallel optical axes, pointing forward in own vessel. Figure 4.10 shows crossing traffic: a ferry with illumination along the vessel and from cabins, saloons and deck, and a freighter behind with two mast lights and sparse deck illumination. The cargo vessel's lights could be mistaken for land/street illumination. Images from long wave infrared cameras, shown in Figure 4.9, are not sensitive to visible light, but only to the surface temperatures of observed objects, and both of the crossing vessels are clearly distinguishable. The LWIR cameras have different horizontal fields of view (HFOV), 45 deg for the FLIR camera and 90 deg for the Teledyne Dalsa. While the LWIR cameras clearly show signatures of other vessels, the red-white-green colour codes of navigation lights provide essential information that is visible in the RGB images, but not in the LWIR images.


Figure 4.8: LWIR camera, South Funen archipelago at dusk

(a) Teledyne Dalsa LWIR (90 deg HFOV) (b) FLIR LWIR (30 deg HFOV)

Figure 4.9: LWIR camera recordings, 640 x 480 pixels. Oresund toward Elsinore. Ambient condition: dark but good visibility one hour after sunset


Figure 4.10: RGB image. Deck and cabin lights are clearly visible on the crossing ferry; a cargo vessel behind it is somewhat difficult to distinguish from lights at the coastline. Strait of Oresund toward Elsinore. One hour after sunset


Chapter 5

Object detection and classification

This chapter gives a brief overview of the object detection and classification techniques employed in this study and describes the choices made for the sensor system design. We start by introducing the object detection methods and related work. Then we introduce the learning-based object detection algorithm we use and the training data captured with the on-board camera equipment. Finally, we discuss the results and performance of our implementation.

5.1 Image-based Object Detection

Object detection and classification is the task of determining what is present in a given image. There can be multiple ways to implement a detector and classifier, and which method to use depends on the specific use case. The relevant types of object detection and classification outputs are described below to give an understanding of what results can be obtained from images:

Image Classification To determine which class or classes an image belongs to. Often, the result of image classification is a probability of the image belonging to certain classes from a predefined list. In a maritime environment, classes such as 'ferry', 'sailboat', 'buoy', etc. would be obvious examples. It is, however, not directly possible to tell what specifically in the image caused the classification result.

Object localization To localize an object in a specific image with, e.g., a bounding box. With object localization, it is possible to tell where in the image a specific object is located, as well as an approximation of its relative size in the image. In other words, we can get a coordinate of the position of the object in the image.

Semantic segmentation A pixel-wise segmentation of one or more objects in the image. This provides more exact information, as only the pixels that belong to a certain object are classified. Consequently, it is possible to get not only the position of the object in the image, but also a good approximation of the object's shape.


Instance segmentation Similar to semantic segmentation, but objects of the same class are classified as different instances. This is advantageous when two objects of the same class partially overlap, as they will be segmented separately. This makes it possible to count the number of objects belonging to a class and prepares the base for tracking them individually in a series of images.

Many proposed solutions exist for all of the above-mentioned outputs. Recently, data-driven solutions, such as deep neural networks, have proved to give robust and accurate results, but these require large sets of annotated training data. Annotations often have to be done manually, and especially pixel-wise annotation for semantic and instance segmentation must be accurate and is therefore cumbersome. Techniques that require less or no prior data also exist but tend to be less general than a learning-based approach. Since our system is operating near the coast, many types and sizes of boats and ships can appear in the images. Additionally, we can have both land and water as background. The following section provides an outline of some challenges for maritime environments along with related prior work.

5.2 Related work

Object detection, classification and tracking in a maritime environment is a well-explored area, and several previous works address it. Challenges include waves that can cause a rapid change in the frame of reference [10], sudden changes of illumination and unwanted reflections from the water [5], and the possibility of poor weather conditions that reduce the range of sight. The survey papers [24], [21] mention a range of methods that deal with detection and classification in images of maritime environments. Horizon-line detection and background subtraction appear to be effective for object detection [34], [32], [25]. Object tracking was shown by [7] to be efficient when using a scale-invariant feature transform (SIFT). Utilizing infrared together with visible-light images was analyzed in [24], and thermal imaging also seems to have the ability to provide information about objects on the water [19]. With recent progress in deep-learning-based segmentation and classification methods, visible-light images are an obvious choice for object detection. Training data are available in the ImageNet [8] picture base, and such existing images can provide a base for training. For specifically maritime environments, [18] and [6] show that deep-learning-based methods are effective, and annotated data from the maritime environment exist from the harbour of Singapore [24]. This study has used training data collected from observations on board ferries in Danish coastal waters.

In our study, [30] investigated different networks for machine learning, including R-CNN and Fast R-CNN, which do not employ segmentation. The classification was of modest accuracy in general, but objects with many instances in the training data set, in this case buoys in Danish coastal areas, performed much better than the average precision and recall. False positives observed by machine learning, e.g. confusing a cloud for a sail, could be remedied by combining deep learning with methods from classical computer vision. These were investigated in [25].


Figure 5.1: Mask R-CNN network. (From: https://medium.com/@jonathan_hui/image-segmentation-with-mask-r-cnn-ebe6d793272)

5.3 Mask R-CNN detection and classification

Based on the results of the eye-tracking measurements, we decided to use Mask R-CNN [16] on visible-light images, which is able to perform pixel-wise segmentation of several classes. This section gives an introduction to the specific architecture of the Mask R-CNN implementation utilized for the visible-light images. Objects that are within visual range of the cameras are detected and classified using a Convolutional Neural Network (CNN), also referred to as deep-learning technology. The network architecture employed in this study to detect different objects in the maritime environment is Mask R-CNN [16], which has the novelty of not only recognizing and detecting (with a bounding box) several classes, but also segmenting all instances of each one and creating the corresponding binary mask at a pixel level. Mask R-CNN is the culmination of an architectural line that started with the Region-Based Convolutional Neural Network (R-CNN) [13], followed by Fast R-CNN [12] and then Faster R-CNN [26]. The Mask R-CNN architecture, shown in Figure 5.1, has a branch at the end of the CNN that predicts mask segmentation at a pixel-to-pixel level. Misalignment and loss of data are avoided, and pixel-level precision is obtained, by an RoI-Align layer that uses bi-linear interpolation to remove quantization at the region-of-interest boundaries.
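For illustration, a minimal inference sketch with a publicly available Mask R-CNN implementation (torchvision) is shown below. It is not the implementation used in this study; the input file name is hypothetical, the stock COCO weights stand in for our fine-tuned buoy/ship network, and the weights argument may differ between torchvision versions.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Mask R-CNN pre-trained on COCO; stock weights stand in for the fine-tuned
# buoy/ship network trained in this study.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = Image.open("frame.jpg").convert("RGB")   # hypothetical input frame
with torch.no_grad():
    pred = model([to_tensor(image)])[0]

# Each detection carries a class label, a confidence score, a bounding box
# and a per-pixel mask, the extra branch that distinguishes Mask R-CNN.
for box, label, score, mask in zip(pred["boxes"], pred["labels"],
                                   pred["scores"], pred["masks"]):
    if score < 0.5:
        continue                                  # drop low-confidence output
    segmentation = mask[0] > 0.5                  # (H, W) boolean mask
    print(int(label), float(score), [round(v, 1) for v in box.tolist()],
          int(segmentation.sum()), "mask pixels")
```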

5.4 Dataset

In this study, object detection is purely based on image data, so the following gives a basic description of the digital images produced by our vision system. We found that existing maritime image datasets are not sufficient to cover the scenarios we encounter in our recordings. Consequently, we chose to manually label a small set of images to refine the Mask R-CNN. The annotation process is also described in this section.


Image Format

We define a digital image as a 2-dimensional array of discrete values representing the pixels in the image. The number of horizontal and vertical pixels in the image is referred to as the resolution of the image and is a determining factor of how detailed the information of a given scene can be stored in an image. The amount of information stored per pixel is referred to as the bit depth. In a grayscale image, a pixel commonly contains 8 bits of information, which means that each pixel can take 256 levels of intensity. A higher bit depth allows storing more information in each pixel, for example a higher dynamic range. An image can consist of one or more channels; an example of this is a colour image, which contains 3 channels: red, green and blue. This means that a colour image takes more storage space and consequently is slower to process than a grayscale image of the same resolution. On the other hand, it contains more information, which can be valuable for some applications. In our case, the resolution of the colour images is a quarter of that of the monochrome images, because they need to be GRBG-demosaiced to produce a colour image from their raw format.
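As a sketch of the demosaicing step, the snippet below converts a raw GRBG Bayer frame to a colour image with OpenCV. The synthetic data and scaling are illustrative assumptions; only the GRBG pattern, the resolution and the 12-bit depth come from the text.

```python
import cv2
import numpy as np

# Stand-in for one raw GRBG Bayer frame: 12-bit values in 16-bit words at
# the monochrome sensor resolution (synthetic data, illustrative only).
rng = np.random.default_rng(0)
raw = rng.integers(0, 4096, size=(2048, 2560), dtype=np.uint16)

# Demosaic the Bayer mosaic into a 3-channel image. Each 2x2 GRBG cell holds
# one R, two G and one B sample, so the effective colour resolution is about
# a quarter of the monochrome sensor's, as noted above. OpenCV's Bayer
# constant naming depends on pattern alignment; BayerGR is assumed here.
bgr = cv2.cvtColor(raw, cv2.COLOR_BayerGR2BGR)

# Shift the 12-bit values down to 8 bits for display purposes.
bgr8 = (bgr >> 4).astype(np.uint8)
print(bgr8.shape)  # (2048, 2560, 3)
```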

To be able to accurately detect objects in the images we acquire with our system, we need to train our model with similar data. We do this by manually annotating a set of images, as further described in the following.

Instance Labeling

A subset of images is hand-annotated and used both for network refinement and to test the performance of the detection algorithm. The subset is labelled for instance segmentation, so that the pixels belonging to each object in the image are labelled separately with a polygon shape. Manually labelling images for instance segmentation is time-consuming, and to ease the process we use the free web-based annotation tool LabelMe [29] to create the polygons. Figure 5.2 shows the LabelMe interface, where the user can separately draw polygons for each object as desired. When the user enters the interface, a selected image from the repository is displayed in the centre. The user may label objects of interest by clicking on 'create polygon' and selecting control points along the boundary of the object. Each object is assigned a class by naming it appropriately, and an image can contain several classes and objects.

The process of labelling is rather time-consuming when segmentation is desired, because the exact shape of each object has to be marked. Annotation using only a bounding box around objects is much faster. The simpler bounding-box approach to labelling can be used for detection and classification, albeit with a different network.
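A polygon annotation of this kind can be rasterized into a binary instance mask with a few lines of code; the sketch below shows the idea. The polygon coordinates are illustrative, and the helper is not part of the LabelMe tool itself.

```python
import numpy as np
from PIL import Image, ImageDraw

def polygon_to_mask(points, height, width):
    """Rasterize one LabelMe-style polygon (a list of (x, y) control points)
    into a binary instance mask of the given image size."""
    canvas = Image.new("L", (width, height), 0)
    ImageDraw.Draw(canvas).polygon([tuple(p) for p in points], fill=1)
    return np.array(canvas, dtype=bool)

# Illustrative polygon roughly outlining a small boat in a 480 x 640 image.
boat = [(100, 300), (220, 290), (230, 330), (90, 335)]
mask = polygon_to_mask(boat, height=480, width=640)
print(mask.shape, int(mask.sum()), "pixels labelled")
```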

Data

The training data consist of images captured with our onboard RGB camera setup, and additional images were acquired with a DSLR camera on separate trips. Images from internet sources are also added to the training data. All images are manually labelled using the previously mentioned technique.


Figure 5.2: Screenshot of the LabelMe interface. The green polygons represent the boundary of the labelled pixels for a boat and two buoys, respectively.

The validation set consists only of images from the onboard RGB camera setup, as we wish to evaluate the performance of the object detection on our on-board camera system. In summary, the labelled images in the dataset comprise:

Data source                 Number of images
On-board RGB camera setup   330
On-board DSLR               179
Internet source             8
In total                    517

The 517 images are annotated with two classes: buoy and ship. A total of 600 buoy and 639 ship instances are annotated across the dataset. Examples of the images labelled for instance segmentation are shown in Figure 5.3.

5.5 Training

We split the onboard RGB images so that 406 images are used for training and 111 images are used for validation. To produce additional training data, we use the following data augmentation on each of the on-board RGB training images (a code sketch of the pipeline follows the list and Figure 5.4):

Image rotation Rotating the images randomly between −25 deg and 25 deg.

Image flip Flipping images horizontally (mirroring).


Figure 5.3: Sample images from our labelled dataset. The first row shows the images captured by our RGB cameras. The second row shows the masks of the labelled pixels, where the two classes are shown in red and blue respectively. Note that the instances within a class are labelled individually, even though they are visualized with the same colour.

(a) Original (b) Rotated (c) Flipped (d) Flip + Rotate (e) Added noise

Figure 5.4: Examples of data augmentation on the training set. The original training image is shown on the left, and the next 4 images show the output after data augmentation.

Image flipping and rotating Combining the flipping and rotation.

Noise addition A colour value replaces the image pixel at every 50th pixel.

The augmentation is visualized in Figure 5.4 and increases the dataset with an additional 5 × 406 images. Each of these images is cropped into 16 regions in a 4 × 4 grid. After this operation, the total increase of the dataset is 16 × 5 × 406 images, resulting in 16 × 5 × 406 + 406 × 5 = 34510 images.
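The pipeline can be sketched as follows. The rotation range, flip, 4 × 4 crop grid and every-50th-pixel noise follow the description above; the image content, the PIL-based implementation and the random seed are illustrative assumptions.

```python
import numpy as np
from PIL import Image

rng = np.random.default_rng(42)

def augment(img):
    """The four augmented variants described above: rotation, horizontal
    flip, flip plus rotation, and sparse colour noise."""
    rot = img.rotate(rng.uniform(-25, 25))
    flip = img.transpose(Image.FLIP_LEFT_RIGHT)
    flip_rot = flip.rotate(rng.uniform(-25, 25))
    noisy = np.array(img)
    noisy.reshape(-1, 3)[::50] = rng.integers(0, 256, 3, dtype=np.uint8)
    return [rot, flip, flip_rot, Image.fromarray(noisy)]

def crop_grid(img, rows=4, cols=4):
    """Crop an image into a 4 x 4 grid of equally sized tiles."""
    w, h = img.size
    tw, th = w // cols, h // rows
    return [img.crop((c * tw, r * th, (c + 1) * tw, (r + 1) * th))
            for r in range(rows) for c in range(cols)]

img = Image.new("RGB", (640, 512), (70, 110, 160))   # stand-in for one frame
versions = [img] + augment(img)                      # 5 versions per image
tiles = [t for v in versions for t in crop_grid(v)]  # 16 crops of each version
per_source = len(versions) + len(tiles)              # 5 + 80 = 85
print(406 * per_source)                              # 406 * 85 = 34510
```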

The Mask R-CNN has been pre-trained using the weights from the COCO dataset [20], and we fine-tune the network to detect the two classes provided in our training data: buoy and ship. The network was trained for 40 epochs on the first 4 layers (the classification layers), then another 60 epochs for the rest of the layers, and finally 80 epochs for the whole network. The learning rate was set to 0.0003 and the momentum to 0.9. The total training took around 24 hours on a GeForce GTX 1080 GPU.
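A staged fine-tuning schedule of this kind could look roughly like the sketch below. The epoch counts, learning rate and momentum are from the text; the torchvision model, the parameter-name prefixes used for freezing and the training-loop placeholder are assumptions, since the study's actual framework is not detailed here.

```python
import torch
import torchvision

# Randomly initialized head for 3 classes: background + buoy + ship.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=3)

# Staged schedule from the text: 40 epochs on the classification layers,
# 60 on later layers, 80 on the whole network. The prefix choices below
# are assumptions about which torchvision modules correspond to each stage.
stages = [(40, ("roi_heads",)),                        # head only
          (60, ("roi_heads", "rpn", "backbone.fpn")),  # later layers
          (80, ("",))]                                 # everything

for epochs, prefixes in stages:
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in prefixes)
    optimizer = torch.optim.SGD(
        (p for p in model.parameters() if p.requires_grad),
        lr=3e-4, momentum=0.9)                         # values from the text
    for _ in range(epochs):
        pass  # one pass over the augmented set: forward, loss, backward, step
```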

5.6 Performance

We evaluate the performance of our network using the validation images, consisting only of acquisitions from the on-board RGB camera system. With the above-mentioned training procedure, we obtain a mean average precision (mAP) of 62.74%. The 0.5-mAP is used, which means that detections whose region intersection with the ground truth is less than 50% are not counted in the calculation.
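The 50% region-intersection criterion is the usual intersection-over-union (IoU) test, sketched below for axis-aligned boxes; the evaluation here uses segmented regions, but the criterion is the same, and the example boxes are illustrative.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    A detection counts toward the 0.5-mAP only if its IoU with a same-class
    ground-truth region is at least 0.5."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 100, 100), (50, 0, 150, 100)))  # 0.33: below the threshold
print(iou((0, 0, 100, 100), (10, 0, 110, 100)))  # 0.82: counts as a match
```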

For our study, we work with two stages of object detection: first, detecting and classifying a relevant object in the image; second, how accurately it is segmented. To discuss the results, we use the following terminology (a counting sketch follows the list):

True positive Object present in the frame and detected.

False positive A detection occurs in the frame but without the presence of an object.

True negative Object not present in the frame and no detection occurs.

False negative Object present in the frame and not detected.
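A minimal sketch of how such counts can be tallied for one image and one class is shown below, reusing the iou() helper from the previous sketch; the greedy matching strategy and the example boxes are illustrative assumptions.

```python
def tally(gt_boxes, det_boxes, iou_thresh=0.5):
    """Greedy matching of detections to ground truth for one image and one
    class, returning (true_pos, false_pos, false_neg) as defined above.
    Reuses iou() from the previous sketch."""
    unmatched = list(gt_boxes)
    true_pos = 0
    for det in det_boxes:
        best = max(unmatched, key=lambda g: iou(g, det), default=None)
        if best is not None and iou(best, det) >= iou_thresh:
            unmatched.remove(best)   # each object can be matched only once
            true_pos += 1
    false_pos = len(det_boxes) - true_pos  # detections with no object behind
    false_neg = len(unmatched)             # objects the network never found
    return true_pos, false_pos, false_neg

# One annotated buoy; one well-overlapping detection and one spurious one.
print(tally([(40, 40, 60, 60)], [(42, 41, 61, 62), (300, 10, 330, 40)]))
# -> (1, 1, 0)
```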

For our application, we need a good overall localization of the object in the image, but not necessarily a precise segmentation border around the object. Figure 5.5 shows sample predictions of the network, and by visual inspection we conclude that the segmentation of the objects is acceptable in most cases where a true positive detection occurs.

We also wish to investigate to what extent the network detects the objects it is supposed to find, and the occurrence of false positives and false classifications. To do this, we compare the reference (ground truth) annotations with the predictions provided by the network. The precision of the segmentation mask is omitted here, so only the object classification is reflected in this part of the results. Note that our validation set consists of annotated images with one or more objects, but images without objects are also included in the set. Table 5.1 shows the results of the object detections and classifications. We consider the two object classes buoy and ship and divide the detections into a near and a far field. The border between near and far is determined by human inspection and is therefore not a measurable distance.

The results show that the near-field object detections work best: 100% of the buoys and 100% of the ships are found, and 0 buoys and ships are misclassified. In the far field, only approximately 33% of the buoys and 66% of the ships are detected. Misclassification is present at far distance, where 1 buoy is detected as a ship, while 0 ships are detected as a buoy. A total of 6 buoys and 34 ships were detected in the images without actually being present. Note that these numbers were originally a bit higher, as our own ship is visible in some of our validation images; see examples in Figure 5.6. False detections caused by our own ship can be omitted, as they can easily be identified as false positives since they occur at a fixed location in all images.


Figure 5.5: Example outputs from the Mask R-CNN network fine-tuned with our training data. The predicted instance segmentations are visualized on top of the input image.

                     Detected
                 B     S     ∼B    ∼S

Reference
  near    B     47     0     0     -
          S      0    83     -     0
  far     B     27     1    54     -
          S      0    51     -     0
  none   ∼B      6     -     -     -
         ∼S      -    34     -     -

Table 5.1: Performance of the object detection. The detected objects are compared to the reference objects from the validation set. The number of detections (and no-detections, indicated with ∼) is noted for the two types of objects: buoy (B) and ship (S). The buoy and ship detections are divided into two categories: near and far.


Figure 5.6: Examples of false positives. Some originate from sea reflections and others from clouds. Combination with classical methods could be one way to avoid such artifacts.


The remaining false positives are most often detections on the water, where a piece of land far away is detected as a ship, or in the region above the horizon line, where clouds are detected as ships. Figure 5.6 shows examples of the false positive detections. False positives in the cloud region could be removed by detecting the horizon in the image. Additionally, it would be an advantage to add more object classes to fully cover the objects present on the water in our image set.

The false negatives shown in Table 5.1 appear in particular when objects are far away. These missed detections of objects at long distance are related to the pixel area the object occupies in the images. Figures 5.7 and 5.8 show histograms of missed detections, shown in blue, and of detections, shown in red, as a function of the pixel area occupied by the object. All objects larger than 2500 pixels are detected but are not shown in these histograms. This classification behaviour is a consequence of the choice of network and of its training. The figures also show that missed detection happens in a few cases for objects of larger area in the image. Object tracking and more robust classification methods could improve the detection, but it should be noted that approaching objects are eventually detected as the distance to own ship decreases.

Other methods that are specialized to detect objects of very small size in images [9] could detect a RIB boat already when it occupied 5 pixels in an image and could estimate the category when it occupied 38 × 4 pixels.

Using object tracking and fusion with information from NIR and LWIR imaging could provide additional suppression of artifacts, but this could not be included in the scope of this study.


Figure 5.7: Histogram of pixel area versus buoy detections.

Figure 5.8: Histogram of pixel area versus ship detections.


Chapter 6

Results

This chapter compares the human outlook, assessed via the fixations determined by the eye-tracking system, with object classifications made using RGB images.

The comparison between the electronic system's outlook capabilities and its human counterpart is hence done by looking at the instant of first observation of a given object. The eye-tracking software gives an indication of fixation on an object when the human lookout has been gazing at it for a certain length of time. This time is compared to the timestamp at which the Mask R-CNN indicates its first detection and classification of the object. Figure 6.1 shows a snapshot of eye-tracking. The right part shows what the lookout is focusing on. The yellow line shows that the eyes wander around, which is normal. Fixation is indicated by the red circle. The electronic outlook is illustrated in Figure 6.2.

Figure 6.1: Eye-tracking of the manual look-out's fixations. Left: Forward-facing camera used as reference in the analysis. Right: Eye-tracking result. The yellow spot surrounded by a thin red line indicates fixation on an object.

6.1 Temporal Comparison

This section presents an analysis of the time-wise differences between the electronic lookout system and its human counterpart. This is achieved by timestamping the detections of objects observed by the electronic lookout and comparing them with fixations captured by the eye-tracking system.


Figure 6.2: Object detection and classification on two RGB images, shown by highlighting the detected objects and showing the bearing to the detected objects.

A comparison is done by examining the difference

∆t_obs = t_HO − t_EO    (6.1)

where t_HO is the time at which the eye-tracking system indicates the first fixation on an object, and t_EO is the time at which the electronic outlook first detects and classifies the same object. Figure 6.3 shows a histogram of ∆t_obs.

Figure 6.3: Histogram of time differences between observations made by the human lookout and the electronic lookout (calculated by (6.1)). The imposed normal distribution has parameters µ = 23.9 s and σ = 41.0 s. This means that the electronic outlook classifies objects on average 24 seconds earlier than the human eye fixation.

Figure 6.4 shows the histograms of the time difference ∆t_obs for ships and buoys separately. A positive value of the time difference means that the electronic outlook classifies an object earlier than the navigator has a fixation on it.

Figure 6.4: Histograms of time differences between observations made by the human lookout and the electronic lookout (calculated by (6.1)). On average, the electronic outlook detects and classifies objects 30 s earlier for ships and 11 s earlier for buoys, compared to the human eye fixation. It is expected that the negative outliers could be avoided by improving the neural network.

The time elapsed between the instant of detection of an object and the instant when this object passes behind the RGB cameras' field of view is defined as the time to react. Two time differences are defined to analyse this characteristic,

∆t_HO = t_pass − t_HO    (6.2)

∆t_EO = t_pass − t_EO    (6.3)

where t_pass is defined as the time instant when the object passes behind the RGB cameras' field of view.
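Given matched per-object timestamps, the three time differences of equations (6.1)-(6.3) are straightforward to compute; the sketch below uses illustrative values, not the measured data.

```python
import numpy as np

# Matched per-object timestamps in seconds (illustrative values only).
# t_HO: first eye fixation; t_EO: first electronic classification;
# t_pass: the object passes behind the cameras' field of view.
t_HO = np.array([102.0, 250.5, 401.2, 515.0])
t_EO = np.array([80.1, 230.0, 399.5, 470.3])
t_pass = np.array([190.0, 340.0, 480.0, 600.0])

dt_obs = t_HO - t_EO   # eq. (6.1): positive when electronic outlook is first
dt_HO = t_pass - t_HO  # eq. (6.2): human time to react
dt_EO = t_pass - t_EO  # eq. (6.3): electronic time to react

print(f"mean dt_obs = {dt_obs.mean():.1f} s, std = {dt_obs.std(ddof=1):.1f} s")
print(f"time to react: human {dt_HO.mean():.1f} s, electronic {dt_EO.mean():.1f} s")
```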

Figure 6.5 shows a scatter plot of ∆t_HO vs. ∆t_EO. It is seen that electronic outlook allows more time to react. Figure 6.6 shows the same results but zoomed in to the range 0-200 s before passing own vessel. This range covers the majority of the measurements.


Figure 6.5: Scatter diagram of time to react. The trend line shows that the time to react is longer with electronic outlook than the time to react from a fixation.

Figure 6.6: Scatter diagram of time to react, zoomed to the range 0-200 s. Again, the trend line shows that the time to react is longer with electronic outlook than the time to react from a fixation.


Chapter 7

Discussion

Since the ship has Radar and AIS sensors on board, the detection of objects that are visible to Radar or have AIS transmitters installed can be done quite accurately. However, several objects are not visible on Radar, such as leisure surfboards and sea kayaks, boats without Radar reflectors and AIS transmitters, and even containers that were accidentally dropped overboard. Electronic outlook with object classification is therefore very important, so that the ship acts in a safe manner also when non-Radar-detectable objects are in the area. Thus, a combination of object positions from these sensors and the Mask R-CNN architecture could increase the performance and the results. An example of this is using the detected object positions from the radar as possible region proposals in the network.

Further work will therefore fuse on-board radar and AIS information to improve the performance of the vision system. This will require calibration that enables Radar and AIS data to be transformed from their respective coordinate systems into, e.g., the pixel coordinates of the input images to the CNN. These data could be used for region proposals in the network and be particularly useful in situations with reduced camera visibility.
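As a sketch of what such a calibration enables, the snippet below maps a target bearing from radar or AIS to a horizontal pixel column in a forward-facing camera under a pinhole assumption. The function, the alignment assumptions and the example values are illustrative; a full calibration would also handle lens distortion, camera pose and vertical placement.

```python
import math

def bearing_to_pixel(bearing_deg, heading_deg, hfov_deg=55.0, hres_px=2560):
    """Map a target bearing (e.g. from radar or AIS) to a horizontal pixel
    column of a forward-facing camera, assuming a pinhole model with the
    optical axis along the ship's heading. Returns None if out of view."""
    relative = (bearing_deg - heading_deg + 180.0) % 360.0 - 180.0  # wrap
    half_fov = hfov_deg / 2.0
    if abs(relative) > half_fov:
        return None
    focal_px = (hres_px / 2.0) / math.tan(math.radians(half_fov))
    return hres_px / 2.0 + focal_px * math.tan(math.radians(relative))

# A radar contact bearing 055 deg seen from a ship heading 045 deg:
print(bearing_to_pixel(bearing_deg=55.0, heading_deg=45.0))  # ~ column 1714
```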

Coverage of this analysis

Some kinds of behaviour related to look-out are not captured by only observing the areas of fixation with eye-tracking glasses, but require further interpretation:

• General visual observation (watching) of nothing in particular, but often focused on the direction of the vessel and abeam/passed in relation to the progression of the navigation.

• Exogenously oriented attention in relation to the first item above – something turns up. This can include comparison or verification with information from instruments, e.g. radar or AIS.

• Endogenously driven observation of objects known from other sources – for instance sea charts (buoys), radar and AIS (vessels) – that are expected to be observable.

Such interpretation of the situation, which is part of a situational awareness scenario, was not within the scope of this study.


Repeated observations to determine whether there is a risk of collision, and in that connection to take countermeasures, are visible in the eye-tracking measurements, as are repeated observations to determine whether countermeasures have the desired effect. Such awareness behaviours of the navigator are, however, also outside the scope of the object detection and classification presented in this report.

Electronic outlook as decision support

Look-out is just one among several tasks of the navigator on the bridge. Other tasks include: observation of the condition of engines and systems; handling of cargo and passengers; safety-related routines; communication internally on board the vessel and with external parties; management of staff and other administrative tasks; QA and documentation tasks; and handling of safety-critical situations on board.

With several other tasks to attend to, which might sometimes distract the navigator, it is believed that electronic outlook could serve as a "fifth sense" decision support for the navigator and perhaps make it possible to have a temporarily unmanned bridge in conditions with little to no other traffic.


Chapter 8

Conclusions

This study compared human outlook with electronic outlook. By pairing the instants of fixation from eye-tracking glasses with the instants of electronic detection by cameras and Mask R-CNN classification, the study provided statistics comparing the instant of detection/fixation and the time until an object passes out of the camera's field of view, which is close to the passage of own ship and hence one of the essential parameters for object detection.

The performance of the Mask R-CNN was evaluated on the validation set of annotated RGB images. Object detection showed a satisfactory detection probability for objects larger than 400–500 pixels in an image, a quantification that is useful for camera system design for electronic outlook.

Some outliers were found in the form of false detections, and a single instance of a missed detection was also found in the validation data. Robustification of the classifiers will be needed to obtain the required dependability of electronic outlook and is a topic of further research; techniques could include object tracking and the combination of classical image processing with deep learning methods for classification.
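As an example of the kind of tracking-based robustification mentioned above, a detection can be required to persist over several consecutive frames before it is reported, which suppresses single-frame false positives. The sketch below is a minimal illustration with an assumed matching radius and frame count, not this project's implementation:

def persistent_detections(frames, radius=30.0, n_required=3):
    # frames: list of per-frame lists of detection centroids (x, y) in pixels.
    # Returns the detections of the last frame that have been matched in at
    # least n_required consecutive frames.
    tracks = []  # each entry: (x, y, consecutive_frame_count)
    for dets in frames:
        new_tracks = []
        for (x, y) in dets:
            # match to the longest-lived existing track within `radius` pixels
            best = None
            for (tx, ty, c) in tracks:
                if (x - tx) ** 2 + (y - ty) ** 2 <= radius ** 2:
                    if best is None or c > best[2]:
                        best = (tx, ty, c)
            new_tracks.append((x, y, best[2] + 1 if best else 1))
        tracks = new_tracks
    return [(x, y) for (x, y, c) in tracks if c >= n_required]

# A detection appearing in only one frame (a likely false positive) is filtered out:
frames = [[(100, 50)], [(102, 51), (400, 300)], [(104, 52)]]
print(persistent_detections(frames))  # -> [(104, 52)]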

The situational awareness elements in a comparison were not covered but will be the subject of further research.

The main findings were:

• Eye-tracking glasses were found useful for showing fixations on objects at sea in daylight conditions.

• The computer-vision algorithm detects objects in parallel, whereas the human does so sequentially, and the computer classifies objects on average 24 s earlier than the navigator has a fixation on the object. The deep learning algorithm trained in this study should, however, be improved to achieve better performance in some situations.

• The time between object detection and passage of own ship is adequate for making navigation decisions with both human and electronic outlook.

• Low-light conditions (dusk and night) are effectively dealt with by Long Wave InfraRed (LWIR) camera technology; LWIR shows objects as equally visible by day and by night.


• Colour information from cameras is necessary to assist decision support and electronic navigation.

• A system for electronic outlook should employ sensor and data fusion with radar, AIS and ECDIS.

• Decision support based on electronic outlook should include object tracking and situation awareness techniques.

• Quality assurance and approval of machine learning algorithms for object classification at sea have unsolved issues. A standard vocabulary ought to be available for objects at sea, and publicly available databases with annotated images of traffic in open seas, near-coast areas and rivers should exist in order for authorities to assess the quality of, or approve, navigation support based on machine learning methods.

Acknowledgements

The project would like to acknowledge the collaboration of the Owners, Masters and Navigators of M/F Højestene, M/F Marstal and M/S Pernille for allowing us on board for eye-tracking measurements, and of these ferries and M/F Isefjord for inviting the team on board for camera mapping of objects at sea where the respective ferries operate. The authors would also like to acknowledge the dedicated efforts made by present and former laboratory engineers and graduate students. Jakob Bang from the Danish Maritime Authority and Jens Peter Weiss Hartmann from the Danish Geo Data Authority guided and advised during the study, and the study was made possible by funding from the Danish Maritime Foundation via DTU's Maritime Centre. The authors gratefully appreciate the enthusiastic collaboration with the above-mentioned collaborators and advisers and acknowledge, in particular, the funding made available.
