
Eindhoven University of Technology

MASTER

Impact of algorithmic approximation on quality-of-control for image-based control systems

Bimpisidis, K.

Award date: 2019

Link to publication

Disclaimer
This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain


Impact of Algorithmic Approximation on Quality-of-Control for Image-Based Control Systems

Master Thesis Report

Konstantinos Bimpisidis

Department of Electrical Engineering
Electronic Systems Research Group

Assessment Committee:
Assistant Professor Dr. Dip Goswami
Ph.D. Candidates Sajid Mohamed and Sayandip De
Assistant Professor Dr. Vladimir Cuk

Eindhoven, April 2019


Abstract

Image Processing (IP) applications have become popular with the advent of efficient algorithms and low-cost, high-resolution CMOS cameras. However, IP applications are compute-intensive, consume a lot of energy and have long processing times. Recent works have proposed image approximation for an energy-efficient design of these applications; it also reduces the impact of the long processing times. The challenge is that IP applications often work as part of bigger closed-loop control systems, e.g. an advanced driver assistance system (ADAS). The impact of image approximations, which tolerate a certain error, on these image-based control (IBC) systems is therefore very important. However, there is a lack of tool support to evaluate the performance of such closed-loop IBC systems when the IP is approximated.

In this work, we study the impact of algorithmic approximation on the quality-of-control for IBC systems. We propose a framework for performance evaluation of image approximation in a closed-loop automotive IBC system. Our framework is written in C++ and uses V-REP as the simulation environment. For the simulation, V-REP runs as a server and the C++ module as a client in synchronous mode. We show the effectiveness of our framework using a vision-based lateral control example.

Our results show that approximate computing improves the processing time by up to a factor of 3.5. The measurements on our framework allowed us to develop a thorough understanding of the impact of approximation and to achieve an overall quality-of-control improvement of up to 50% when using approximate computing.



Contents

List of Figures

List of Tables

1 Introduction
1.1 Introduction
1.2 Motivation
1.3 Thesis Structure

2 Related Work
2.1 Motivation
2.2 Image-Based Control
2.3 Lane Departure Detection Algorithms
2.4 Approximate Computing
2.5 Applications of Approximate Computing

3 Problem Statement
3.1 Problem Statement
3.2 Contribution
3.3 Publications

4 Image Based Control
4.1 Motivation
4.2 Vision-Based Lateral Control

5 Approximate Computing
5.1 Motivation
5.2 Image Signal Processing
5.3 Approximating the Image Signal Processing
5.4 Performance Optimization of the Image Signal Processing

6 Lane Departure Detection
6.1 Motivation
6.2 Performance of Feature Extraction Algorithms
6.3 Lane Detection Algorithm

7 Experimental Setup
7.1 Experimental Setup

8 Application Profiling
8.1 Profiling Setup
8.2 Profiling Method
8.3 Experimental Results

9 Quality-of-Control Degradation due to Approximation
9.1 Motivation
9.2 Experimental Results

10 Quality-of-Control Improvement due to Controller Design
10.1 Motivation
10.2 Experimental Results
10.3 Comparative Analysis

11 Conclusion
11.1 Summary
11.2 Future Work and Future Directions

Bibliography



List of Figures

1.1 Generic camera-based autonomous vehicle control
4.1 Sensor-to-actuator delay
5.1 Traditional ISP pipeline
5.2 Scheduling between producer and consumer stages
6.1 Lane departure detection flowchart
6.2 Performance of feature extraction algorithms on non-approximated images through probability density function of identified lane pixels
6.3 Performance of various feature extraction algorithms on approximated images through probability density function of identified lane pixels
6.4 Bird's eye view transformation using the region of interest points
6.5 Lane detection algorithm
7.1 Software framework of experimental setup
8.1 Profiling diagram
8.2 Side-by-side comparison between a boxplot and a probability density function of a normal distribution [1]
8.3 Single vs multi-image comparison of the different functions (outliers removed)
8.4 Single vs multi-image comparison of the different pipelines (outliers removed)
9.1 Image quality degradation
9.2 Modelled degradation vs actual response on curved track
9.3 Normalized MSE results on straight vs curved track
9.4 Response of various pipelines for the straight track
10.1 Straight track simulation results for all pipelines in comparison to v0
10.2 Curved track simulation results for all pipelines in comparison to v0
10.3 Normalized MSE results
10.4 Cumulative comparison of different normalized performance metrics
10.5 Side-by-side comparison of different normalized performance metrics



List of Tables

4.1 Vehicle parameters and terminology
5.1 Pipeline versions for different ISP stages
5.2 Profiling of unoptimized ISP (order of magnitude)
6.1 Region of interest selection points
7.1 Experimental setup hardware
8.1 Profiling results on single image
8.2 Sensor-to-actuator delay profiling results
8.3 Profiling results on dataset of 200 images
10.1 Controller design parameters
10.2 Overall performance results



Chapter 1

Introduction

1.1 Introduction

Autonomous driving is a trend driven by the fact that human beings are known to be susceptible to driving mistakes [2], whether due to distractions from the direct environment or because driving is done under fatigue, alcohol or drugs. Although a fully automated vehicle is still a challenge [3], significant progress has been achieved by ongoing research [4]: from the Linriccan Wonder, the very first radio-controlled car, to the Mercedes-Benz van of the 1980s that incorporated computer vision, and to the cars of today from most major car manufacturers, which possess numerous electronic aids such as collision avoidance, advanced driver assistance systems (ADAS), and lane departure warning systems (LDWS).

These aids, which can be considered a substantial step towards fully automated vehicles, use at least one camera to enhance safety. As part of the Sensing-Perception-Decision architecture [5] of autonomous driving systems, cameras are a fundamental sensing device, whose data can be utilized in object recognition and tracking tasks that allow for proper path planning. With cameras being largely integrated into both mid- and luxury-class vehicles [6], their annual growth rate is expected to be around 20% per vehicle by 2023 [7], playing a vital role in computer vision and, therefore, in image-based control (IBC).

As cameras are increasingly adopted by the car industry, the need for diverse functionality becomes highly relevant. Camera sensors need to provide pleasant images for human display, as well as predictable and reliable images to be used by computer vision [7]. These camera capabilities allow ADAS to be employed for vision-based perception tasks, such as object recognition, localization and mapping, and path or motion planning [6], which are, in principle, IBC tasks and may enable a safer driving experience and bridge the gap towards fully autonomous driving.

As can be seen in Figure 1.1, an autonomous vehicle may consist of an image signal processor (ISP), the image processing algorithm and the controller. A traditional image signal processor processes the RAW output of an image sensor and produces a compressed image that can be used by a vision application. Typical ISP stages include demosaicing, denoising, white balancing and color mapping, gamut mapping, tone mapping, and compression. These stages, albeit standard, require excessive computation to produce a high-quality output image, which in the case of computer vision applications is not necessarily essential [8] and can be approximated.

Approximate computing is a concept that relies on building systems with acceptable behavior from inexact hardware or software components. It allows application accuracy to be traded off for considerable performance and energy gains [9] at design time. A key challenge in approximate computing is the identification of those sections, either in hardware or software, that can actually be approximated, as there is always a risk of crashing an application if a critical component is approximated.



Figure 1.1: Generic camera-based autonomous vehicle control

1.2 Motivation

An autonomous vehicle uses multiple cameras for additional safety during navigation. In this project, we study a use case of vision-based lateral control using a single camera, where the camera output is processed by an image processing application and the autonomous functionality is maintained by a controller that actuates based on the input from the camera. The camera produces 60 frames per second, and the rest of the application needs to achieve real-time performance in order to process all those frames.

The camera is the sensor and is attached to the system (the vehicle). Each frame from the camera sensor goes through a series of processing stages. Initially, the frames need to be processed by an image signal processor (ISP), which converts the RAW output of the camera sensor into a format that is useful for human and computer vision. The resulting image is then ready to be processed by the image processing stage, which performs feature extraction and provides information about the car's environment to the image-based controller. The controller uses the visual feedback from the camera and actuates the required steering angle in order to allow the car to drive autonomously. In principle, a fundamental parameter in the control design is the sampling period. It depends on the time the application needs to finish the required computations, from sensing to actuating. Typically, shorter sampling periods are necessary to achieve real-time performance and maintain high quality-of-control.

The autonomous car application with its pipeline of stages can be seen in Figure 1.1. The ISP stage can be rather time consuming and is usually mapped onto specialized hardware, such as digital signal processors (DSPs). Given its high computational complexity, we attempt to approximate the ISP by coarsely skipping some of the ISP stages. Approximate computing is a technique that improves the runtime performance of an error-resilient application, which is desired by the controller. The error resilience of the application can be achieved in the image processing stage. When modified accordingly for the use case of lane detection, the image processing can handle the approximation by successfully extracting features from the inaccurate images.

To improve the quality-of-control of this image-based control application, seven different approximated ISP pipelines are developed. This allows us to analyze the trade-off between runtime improvement and quality degradation with respect to quality-of-control for different degrees of approximation. The pipelines are optimized to exploit the available parallelism using the Halide programming language.



1.3 Thesis Structure

This thesis consists of eleven chapters. Chapter 1 provides an introduction to the topic. The related work on the topic is discussed in Chapter 2. Chapter 3 provides the problem statement and the contribution of this project. Chapters 4, 5, and 6 provide background information on image-based control, approximate computing, and lane departure detection, as those are the fundamental blocks of this thesis. The design choices are also discussed in those chapters. Chapter 7 describes the software-in-the-loop setup used for the execution of the experiments. Chapters 8, 9, and 10 motivate and analyze the experimental results. In Chapter 8, the profiling of the application is described. Chapter 9 discusses the quality-of-control degradation caused by approximation without taking into account the runtime improvements. Chapter 10 discusses the improvement in quality-of-control when the runtime improvements are taken into account. Lastly, Chapter 11 provides the conclusion of this thesis, offering a summary and future directions of this work.



Chapter 2

Related Work

2.1 Motivation

Approximate computing is gaining popularity due to the energy and performance benefits it offers in error-resilient applications. Although its effects have been thoroughly studied at different levels across the computing stack, there is a limited amount of research on the impact of approximations on the bigger closed-loop system. The aim of this project is to improve the quality-of-control of compute-intensive image-based control (IBC) applications by using approximation. IBC is a class of data-intensive feedback control systems whose feedback is provided by a camera sensor. Given that the use case of the application is vision-based lateral control, the requirement on the image processing is to correctly identify the lanes, using a lane departure detection algorithm.

2.2 Image-Based Control

The compute-intensive image processing stage of an IBC system results in longer sampling periods, which are determined at design time and negatively affect the QoC of the underlying system. Usually the control design of data-intensive feedback control applications relies on a worst-case approach [10]. This includes the image processing computations, whose duration may vary but is not taken into account.

Mohamed et al. [11] proposed to deal with the workload variations caused by image processing using a scenario-aware approach. Instead of designing the controller for the worst-case (WC) scenario, the sampling period of the controller is continuously adapted to the actual workload at run-time. In this way, the average sampling period is improved with respect to a WC design. However, this approach treats the image preprocessing pipeline as a black box and assumes a constant image preprocessing delay.

2.3 Lane Departure Detection Algorithms

The image processing step is required in order to process the output of the image sensor and extract useful information that can be used by the controller for decision making. Yeniaydin et al. [12] identified the impact that excessive computations in the image processing stage can have on real-time applications and proposed a feature-based detection algorithm, which only performs basic operations in order to identify the lane markers. In the preprocessing step, the RGB image is converted to gray-scale in order to select the region-of-interest (ROI) and extract two binary images, one produced by global thresholding and the other by edge detection using the Sobel filter. These two binary images are then combined using a neighborhood-AND to generate the bird's eye view (BEV) of the image, and lane identification is performed based on maximum likelihood estimation on histogram plots.



Although a feature-based lane detection approach such as the one in [12] allows for increased flexibility in each of the intermediate steps of the algorithm, it uses many steps, which may hinder real-time performance. Baili et al. [13] followed a slightly different approach; using a front-mounted camera view instead of a BEV for selecting the ROI in their algorithm, they attempted to minimize the steps needed by the algorithm. Their preprocessing stage includes a conversion from RGB to a YCbCr intensity image, followed by an averaging filter to reduce noise. Then, the intensity image is thresholded in order to obtain the lane marks. Finally, for lane identification, the Hough transformation is used in combination with a horizontal differencing filter, which plays the role of the edge detector. This approach achieves good performance, but is susceptible to the performance of the edge detection, since the Hough transformation requires accurately detected edges.

2.4 Approximate Computing

Approximate computing is gaining popularity due to the energy and performance benefits it offers in error-tolerant applications. It allows improvements in deep neural networks (DNNs), since they require highly intensive computations on large amounts of data in order to achieve superior accuracy. Chen et al. [14] attempted to address the challenges of DNNs by using approximate computing. They proposed an adaptive compression approach to reduce the communication overhead in distributed DNN training, and an approximate quantization that reduces the bit precision of DNN data structures such as weights and activations in order to reduce the computation costs. The proposed approach proved to be quite effective, achieving orders of magnitude improvements both in computational costs and in communication overhead, and although it targets the very specific area of DNNs, its methodology can be exploited by a broader set of applications.

Contrary to Chen et al., who targeted a software-based application of approximate computing, Buckler et al. [8] endeavored to elucidate the application of approximate computing in hardware-based computer vision. Adopting the concepts of adaptive compression and approximate quantization, Buckler et al. proposed an image sensor design for computer vision tasks that entirely avoids the ISP stages and generates inaccurate, subsampled image data, achieving a considerable reduction in energy while keeping the classification accuracy on typical DNN datasets, such as CIFAR-10, within acceptable levels. To achieve that, the proposed approach approximates the effect of ISP stages, such as demosaicing and gamma compression, by performing subsampling within the ADC stage of the circuitry of the image sensor. However, even though the results proved rather promising, the validation of the design used an empirical approach and specific classification datasets, which limits the conclusions to specific algorithms and datasets.

2.5 Applications of Approximate Computing

Although the effects of approximate computing have been thoroughly studied at the algorithmic level, there is a limited number of end-to-end studies that show the impact approximation has on high-level applications. Mercat et al. [15] conducted such a study by exploring the impact of approximate computing on a high-efficiency video coding (HEVC) encoder. HEVC encoders contain several algorithms that aim at the minimization of a cost function by performing search space exploration (MSSE) in order to optimize their performance. This MSSE can be approximated if low value-added computations are skipped, leading to considerable energy benefits that can enable the use of such encoders in ultra-low-power applications such as the Internet of Things (IoT) domain.

Hashemi et al. [16] identified that approximate computing can be utilized in domains other than machine learning or image processing, and demonstrated an end-to-end case study in the domain of biometric security systems, targeting an iris scanning application. The challenge in an iris scan application is that it is a multidimensional problem to approximate, as more than one algorithm needs to be involved in a pipelined fashion in order to produce results and to quantify the effect of approximation.



The proposed approach uses a four-stage pipeline with a camera sensor stage to obtain the image, a focus assessment stage to pick the frame with the best focus among subsequent frames, an iris segmentation stage that computes the center points of the iris and the pupil, and finally a normalization stage that produces the iris signature. Interestingly, the approximation knobs are chosen only at the stages that can actually provide runtime benefits; an approach that led to a considerable speedup of the entire pipeline while maintaining the target accuracies that are set by industry standards for iris encoding.



Chapter 3

Problem Statement

3.1 Problem Statement

The possible performance gains from approximating the image signal processing (ISP) can have a significant impact on applications in which image processing is the bottleneck, such as image-based control (IBC) systems. A simple paradigm of an image-based control system includes a sensing task, which processes the output of a camera sensor, a computation task for the controller, and an actuation task that applies the decisions of the controller. Due to the heavy workload of the ISP when computing an image using the traditional ISP stages, the sensing task is the major bottleneck. The sampling period of the controller, which equals the time between two consecutive sensing tasks, is therefore impacted, and so is the quality-of-control (QoC), which depends on the sampling period. The main focus of this project is to evaluate the impact of algorithmic approximation of the image processing pipeline of an image signal processor on the QoC of IBC systems.

The research question, therefore, is: Can we improve the quality-of-control of image-based control systems by approximating the stages of a traditional image signal processing pipeline?

3.2 Contribution

The main contributions of this project include the following:

1. Integration of an image preprocessing pipeline and approximate computing in a baseline IBC framework.

2. Characterization of accurate and approximate algorithms to identify the regions-of-interest for approximation.

3. Study and characterization of approximation choices of the imaging pipeline & computer vision algorithms with respect to QoC.

3.1. Approximation vs. QoC without taking into account the timing impact.

3.2. Approximation vs. QoC taking into account the timing impact on sampling period.

4. Application profiling to obtain execution times and to compute optimized sampling periods.

5. Trade-off analysis between approximation and QoC for optimized sampling period designs.

6. Development of a tool-chain written in a high-performance language (C++/Halide) that allows the exploration of different approximations and their impact on quality-of-control.



3.3 Publications

This work resulted in the publication of the paper titled:

• IMACS: A Framework for Performance Evaluation of Image Approximation in a Closed-loop System.

This paper focuses on the tool-chain that was developed in this thesis and was accepted for the 7th EUROMICRO/IEEE Workshop on Embedded and Cyber-Physical Systems (ECYPS'2019). In addition, a second publication is in progress, which focuses on the trade-offs between image approximation and control performance.



Chapter 4

Image Based Control

4.1 Motivation

Image-Based Control (IBC) systems are closed-loop feedback control systems that form the backbone of many modern applications such as advanced driver assistance systems (ADAS), lane departure warning (LDW) systems, autonomous driving systems, and visual navigation systems. A typical IBC system consists of a sensing task (Ts), which processes the output of a camera sensor, a computation task (Tc), which executes the control algorithm, and an actuation task (Ta) that applies the decisions made by the controller (see Fig. 4.1).

The sensing task (Ts) in a typical IBC system is composed of an image signal processing (ISP) pipeline, which preprocesses the RAW image captured by the camera sensor and converts it to a compressed format. This is followed by a set of image processing (IP) algorithms, designed as per the application requirements, which extract the features to be used by the control algorithm. Due to the heavy workloads of the ISP pipeline and the IP algorithms, the sensing task (Ts) becomes the main bottleneck in determining a faster sampling period for the controller. This impacts the control performance of the entire IBC system [11].

Figure 4.1: Sensor-to-actuator delay

4.2 Vision-Based Lateral Control

Image-based control is used for applications that use visual feedback for motion control of a continuous-time plant, as defined by [11]:

$\dot{x}(t) = A_c x(t) + B_c u(t)$    (4.1)

$y(t) = C_c x(t)$    (4.2)

where x(t) and y(t) are the state and output of the plant, u(t) the control input, and Ac, Bc, Cc the state, input, and output matrices. Since the implementation of such a controller is on a digital platform, the above model is discretized using the Zero-Order Hold (ZOH) method with a sampling period h:



$x[k+1] = A_d x[k] + B_d u[k]$    (4.3)

$y[k] = C_d x[k]$    (4.4)

where $A_d = e^{A_c h}$, $B_d = \int_0^{h} e^{A_c s}\, ds \cdot B_c$, and $C_d = C_c$. The sensor-to-actuator delay is considered non-zero but smaller than the sampling period. When the magnitudes of the two are comparable, the switching that occurs in the control input due to sensing becomes relevant and needs to be modelled [11][17]:

$x[k+1] = A x[k] + B_0(h)\, u[k] + B_1(h)\, u[k-1]$    (4.5)

where $A = e^{A_c h}$, $B_0(h) = \int_0^{h-D_c} e^{A_c s}\, ds \cdot B_c$, and $B_1(h) = \int_{h-D_c}^{h} e^{A_c s}\, ds \cdot B_c$, with $D_c$ the sensor-to-actuator delay. Therefore, a new state $z[k] = [x[k]\;\; u[k-1]]^T$ can be introduced, assuming that $z[0] = [x(0)\;\; 0]^T$, which results in the augmented higher-order system:

$z[k+1] = A_{aug}(h)\, z[k] + B_{aug}(h)\, u[k]$    (4.6)

where

$A_{aug}(h) = \begin{bmatrix} A & B_1(h) \\ 0 & 0 \end{bmatrix}$ and $B_{aug}(h) = \begin{bmatrix} B_0(h) \\ I \end{bmatrix}$,

with $I$ the identity matrix.
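For concreteness, the discretization and delay handling of Equations (4.3)-(4.6) can be computed numerically by exponentiating an augmented matrix (the Van Loan construction). The sketch below uses the Eigen library purely for illustration; Eigen is an assumption here rather than the toolchain of this thesis, and the small plant and the values of h and Dc in main() are placeholders, not the vehicle model of Section 4.2.

    #include <Eigen/Dense>
    #include <unsupported/Eigen/MatrixFunctions>  // provides MatrixBase::exp()
    #include <iostream>

    using Eigen::MatrixXd;

    // Van Loan trick: exp([[Ac, Bc], [0, 0]] * t) = [[e^{Ac t}, Int_0^t e^{Ac s} ds * Bc], [0, I]].
    void vanLoan(const MatrixXd& Ac, const MatrixXd& Bc, double t,
                 MatrixXd& Phi, MatrixXd& Gamma) {
      const int n = Ac.rows(), m = Bc.cols();
      MatrixXd M = MatrixXd::Zero(n + m, n + m);
      M.topLeftCorner(n, n) = Ac;
      M.topRightCorner(n, m) = Bc;
      MatrixXd Mt = M * t;
      MatrixXd E = Mt.exp();
      Phi = E.topLeftCorner(n, n);     // e^{Ac t}
      Gamma = E.topRightCorner(n, m);  // Int_0^t e^{Ac s} ds * Bc
    }

    int main() {
      // Placeholder 2-state plant, not the vehicle model of Section 4.2.
      MatrixXd Ac(2, 2), Bc(2, 1);
      Ac << 0, 1, -2, -3;
      Bc << 0, 1;
      const double h = 0.02;    // sampling period [s]
      const double Dc = 0.015;  // sensor-to-actuator delay, Dc < h

      MatrixXd A, Bd, Phi1, B0, Phi2, Gtau;
      vanLoan(Ac, Bc, h, A, Bd);          // A = e^{Ac h}; Bd is the plain ZOH input matrix of Eq. (4.3)
      vanLoan(Ac, Bc, h - Dc, Phi1, B0);  // B0(h) = Int_0^{h-Dc} e^{Ac s} ds * Bc
      vanLoan(Ac, Bc, Dc, Phi2, Gtau);
      MatrixXd B1 = Phi1 * Gtau;          // B1(h) = Int_{h-Dc}^{h} e^{Ac s} ds * Bc

      // Augmented system of Eq. (4.6) with state z[k] = [x[k]; u[k-1]].
      const int n = Ac.rows(), m = Bc.cols();
      MatrixXd Aaug = MatrixXd::Zero(n + m, n + m);
      Aaug.topLeftCorner(n, n) = A;
      Aaug.topRightCorner(n, m) = B1;
      MatrixXd Baug(n + m, m);
      Baug.topRows(n) = B0;
      Baug.bottomRows(m) = MatrixXd::Identity(m, m);

      std::cout << "Aaug =\n" << Aaug << "\nBaug =\n" << Baug << std::endl;
      return 0;
    }

Substituting the Ac and Bc of the lateral model in Section 4.2 in place of the placeholder plant yields the matrices actually used for controller design.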

Table 4.1: Vehicle parameters and terminology

Symbol   Explanation
ux       Longitudinal velocity
uy       Lateral velocity
δf       Front wheel steering angle
ψ        Yaw rate
lf       Front axle to center of gravity (CoG) distance
lr       Rear axle to CoG distance
Iψ       Total inertia of the vehicle around the CoG
cf       Cornering stiffness of the front tires
cr       Cornering stiffness of the rear tires
m        Mass of the vehicle
L        Look-ahead distance of the camera
yL       Distance from the center of the lane
KL       Curvature of the road at the look-ahead distance
εL       Angle between the tangent to the road and the vehicle orientation

A case study of an IBC application is the vision-based lateral control of autonomous vehicles. It additionally includes the vision dynamics, taking into account the offset from the center of the lane at a look-ahead distance and the angle between the tangent to the road and the vehicle orientation [18]. Taking into account the bicycle vehicle model from [18], the state-space representation is altered so that the state vector is x(t) = [uy ψ yL εL KL]^T, the output y(t) is yL, the control input u(t) is δf, and the output matrix Cc = [0 0 1 0 0]. The state and input matrices are:

$$A_c = \begin{bmatrix} -\dfrac{c_f + c_r}{m u_x} & \dfrac{-m u_x^2 + c_r l_r - c_f l_f}{m u_x} & 0 & 0 & 0 \\[4pt] \dfrac{-c_f l_f + c_r l_r}{I_\psi u_x} & -\dfrac{c_f l_f^2 + c_r l_r^2}{I_\psi u_x} & 0 & 0 & 0 \\[4pt] -1 & -L & 0 & u_x & 0 \\ 0 & -1 & 0 & 0 & u_x \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad B_c = \begin{bmatrix} \dfrac{c_f}{m} \\[4pt] \dfrac{c_f l_f}{I_\psi} \\ 0 \\ 0 \\ 0 \end{bmatrix},$$

where the different parameters are explained in Table 4.1.
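As a companion to the matrices above, the following sketch fills in Ac and Bc element by element, again with Eigen assumed only for illustration; the function names are ours, and concrete parameter values are left to the caller (the symbols follow Table 4.1).

    #include <Eigen/Dense>

    // Builds the 5x5 state matrix Ac of the vision-based lateral model;
    // the entries mirror the matrix written above.
    Eigen::MatrixXd lateralAc(double ux, double m, double Ipsi, double cf, double cr,
                              double lf, double lr, double L) {
      Eigen::MatrixXd Ac = Eigen::MatrixXd::Zero(5, 5);
      Ac(0, 0) = -(cf + cr) / (m * ux);
      Ac(0, 1) = (-m * ux * ux + cr * lr - cf * lf) / (m * ux);
      Ac(1, 0) = (-cf * lf + cr * lr) / (Ipsi * ux);
      Ac(1, 1) = -(cf * lf * lf + cr * lr * lr) / (Ipsi * ux);
      Ac(2, 0) = -1.0; Ac(2, 1) = -L; Ac(2, 3) = ux;  // yL dynamics
      Ac(3, 1) = -1.0; Ac(3, 4) = ux;                 // epsilonL dynamics
      return Ac;                                      // last row (KL) stays zero
    }

    // Builds the 5x1 input matrix Bc for the steering input delta_f.
    Eigen::VectorXd lateralBc(double m, double Ipsi, double cf, double lf) {
      Eigen::VectorXd Bc = Eigen::VectorXd::Zero(5);
      Bc(0) = cf / m;
      Bc(1) = cf * lf / Ipsi;
      return Bc;
    }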



Chapter 5

Approximate Computing

5.1 Motivation

Approximate computing is a technique that leverages the tolerance of applications to errors or inexact computations that reduce the quality in a controllable and acceptable manner. A large number of modern applications can tolerate such inexact computations and thereby boost performance and energy efficiency [19]. Software techniques such as loop perforation, memoization, precision scaling, task dropping, and data sampling [20], or hardware techniques such as overscaling, clock over-gating, body-biasing and refresh-rate reduction [19], may yield benefits of possibly up to 50% [21] in execution time and analogous improvements in energy efficiency. In the case of IBC systems used in the context of ADAS or autonomous driving, the obvious bottom line is to avoid crashing the controlled vehicle. Thus, the output of an approximate computing algorithm needs to be carefully monitored. There are different approximation strategies available [20], either software-based, such as precision scaling and loop perforation, or hardware-based, such as using inexact hardware. A common denominator of these approaches is that they are application-specific.
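To make one of the software techniques above concrete, here is a minimal sketch of loop perforation on a generic per-row image filter; the filter, the data layout and the function name are invented for illustration and are not code from this thesis.

    #include <cstdint>
    #include <vector>

    // Loop perforation: process only every `skip`-th row and leave the rest
    // untouched, trading result quality for execution time.
    void filterRowsPerforated(std::vector<std::vector<uint8_t>>& image, int skip) {
      for (std::size_t r = 0; r < image.size(); r += skip) {
        for (uint8_t& px : image[r]) {
          px = static_cast<uint8_t>((3 * px + 128) / 4);  // placeholder per-pixel filter
        }
      }
      // skip = 1 gives the exact result; skip = 2 roughly halves the work.
    }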

5.2 Image Signal Processing

An image signal processor is specific to the image sensor, and its exact implementation varies between manufacturers. Its purpose is to convert the RAW output of the image sensor into a compressed image that is pleasant to the human eye. A traditional ISP pipeline consists of commonly found stages [8] such as demosaicing, denoising, color transformations, gamut mapping, tone mapping and compression. As an example, these ISP stages are quite similar to the ones in Android's Camera Hardware Abstraction Layer (HAL) subsystem, which consists of hot pixel correction, demosaicing, noise reduction, shading and geometric correction, color correction, tone curve adjustment and edge enhancement stages [22]. A RAW-formatted image is the input to the ISP pipeline. This RAW image is generated by the image sensor using arrays of photodiodes. The radiance is first converted to charge by the photodiodes, and subsequently to voltage. The overall layout of the photodiode array is called a mosaic, as each of the photodiodes can only detect a specific color, which is either red, green, or blue. The image is then processed by each of the ISP stages in a pipelined architecture, leading to a compressed image that can be used by a computer vision application.

The first processing stage of an ISP is the demosaic stage. It generates a three-channel RGB image by interpolating the values of neighboring pixels. Next, the image is denoised. Due to imperfections in the electronic circuitry of the image sensor, the demosaiced image contains some noise. Denoising exploits the self-similarity of the image in order to improve the signal-to-noise ratio, by averaging neighboring pixels that resemble each other. After the image is denoised, it undergoes a series of color transformations.



Figure 5.1: Traditional ISP pipeline

Color mapping is applied to correct the intensity of the green channel, and white balancing to match the temperature of the image to the scene it depicts. Then, the image is processed by the gamut mapping stage, in which the pixels are corrected to lie within an acceptable color range. A final correction to the dynamic range of the image is applied in the tone mapping stage, which directly impacts the contrast. Finally, the image is compressed in order to either be transmitted to a different processing unit or become suitable for storage.

The outcome of each ISP stage can be seen in Figure 5.1. The RAW image is basically a buffer of intensities per pixel. For this reason, the image at this stage is very dark and not suitable for a computer vision application. When the image is demosaiced, each pixel is mapped to a three-channel RGB representation with red, green and blue channels. The demosaic algorithm determines the intensities of the missing channels based on information from neighboring pixels. This is a lossy stage, as information in terms of bits per pixel is lost. The demosaiced image has a hue tone, which is due to the Bayer mosaic pattern. In the Bayer mosaic, green is the dominant color, as the color arrays in the image sensor are placed in rows of green-blue and red-green. As can be seen, the color transformation corrects that hue tone. The effect of gamma correction is not directly visible in Figure 5.1 when compared to the output of the color transformation stage, but it adds to the highlights and black tone appearance of the final result. The effect of the tone mapping stage is clearly visible, as it makes the image pleasant to the human eye by fixing the brightness of different parts of the image.
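As a small illustration of what a single ISP stage amounts to computationally, the sketch below applies gamma correction to one 8-bit channel value; the 1/2.2 exponent is a commonly used default and is an assumption, not a value taken from this thesis.

    #include <cmath>
    #include <cstdint>

    // Gamma correction of a single 8-bit channel value.
    uint8_t gammaCorrect(uint8_t v, double gamma = 2.2) {
      const double normalized = v / 255.0;                   // map to [0, 1]
      const double corrected = std::pow(normalized, 1.0 / gamma);
      return static_cast<uint8_t>(corrected * 255.0 + 0.5);  // back to [0, 255]
    }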

5.3 Approximating the Image Signal Processing

The ISP stages are crucial for representing an image in a way that is pleasant to the human eye. However, some of the stages may be unnecessary for computer vision, and skipping them might be beneficial for computer vision applications. Approximating the image sensor of the camera and its corresponding ISP chip is achieved by using an improved version of the fully configurable and reversible imaging pipeline (CRIP) simulator that was developed in [8]. CRIP is a tool based on the Halide programming language, which is embedded in C++, and allows both the backward and forward simulation of a camera sensor. It can revert the stages from a compressed JPEG or PNG image back to a RAW format and then apply, in the forward direction, a different combination of ISP stages. Since the V-REP camera provides a compressed image, this characteristic can be utilized within the framework in order to explore the effect of approximation on the ISP pipeline.



Table 5.1: Pipeline versions for different ISP stages

Version   ISP Stages                Explanation
v0        None                      No change to data
v1        Rto, Rg, Rtr, Ftr, Fto    Skip gamut mapping
v2        Rto, Rg, Rtr, Ftr, Fg     Skip tone mapping
v3        Rto, Rg, Rtr, Fg, Fto     Skip color transform
v4        Rto, Rg, Rtr, Ftr         Only do color transform
v5        Rto, Rg, Rtr, Fg          Only do gamut mapping
v6        Rto, Rg, Rtr, Fto         Only do tone mapping
v7        Rto, Rg, Rtr              Reverse to demosaic

The limitation of the tool used in [8] is that Buckler et al. were interested solely in image classification accuracy on common image recognition datasets. Profiling of the application was out of scope and, as such, the developed versions lacked a natural continuation of a realistic ISP. As an example, skipping one stage in their approach was achieved by just reversing that stage, which, although providing the desired results for image classification, made the profiling comparison between different versions unrealistic. In order to approximate the ISP, seven different pipeline versions are used, as listed in Table 5.1. The baseline framework coincides with v0, which does not alter the image, whereas the rest of the pipeline versions are combinations of the different ISP stages. The stages used are either reverse stages, e.g. reverse tone map (Rto), reverse gamut map (Rg), and reverse white balancing and color map transformation (Rtr), their forward equivalents Fto, Fg, Ftr, or combinations of the above. In fact, v0 performs all the reverse and forward stages, and the rest of the pipelines take the v0 result as input and perform further processing according to Table 5.1; a sketch of this version-to-stage mapping is given below.
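A minimal sketch of this mapping is shown here. The Image type and the stage functions Rto, Rg, Rtr, Fto, Fg, Ftr are empty placeholders standing in for the CRIP/Halide implementations; only the version-to-stage assignment follows Table 5.1 and the description above.

    #include <functional>
    #include <vector>

    struct Image { /* pixel data omitted */ };
    using Stage = std::function<void(Image&)>;

    // Placeholder stages; the real ones live in the CRIP/Halide pipeline.
    void Rto(Image&) {}  void Rg(Image&) {}  void Rtr(Image&) {}
    void Fto(Image&) {}  void Fg(Image&) {}  void Ftr(Image&) {}

    // Maps a pipeline version from Table 5.1 to its chain of stages.
    std::vector<Stage> pipelineForVersion(int version) {
      std::vector<Stage> p = {Rto, Rg, Rtr};  // reverse stages, common to all versions
      switch (version) {
        case 0: p.insert(p.end(), {Ftr, Fg, Fto}); break;  // v0: all reverse + forward stages (data unchanged)
        case 1: p.insert(p.end(), {Ftr, Fto}); break;      // skip gamut mapping
        case 2: p.insert(p.end(), {Ftr, Fg}); break;       // skip tone mapping
        case 3: p.insert(p.end(), {Fg, Fto}); break;       // skip color transform
        case 4: p.push_back(Ftr); break;                   // only color transform
        case 5: p.push_back(Fg); break;                    // only gamut mapping
        case 6: p.push_back(Fto); break;                   // only tone mapping
        default: break;                                    // v7: reverse to demosaic only
      }
      return p;
    }

    void runPipeline(Image& img, int version) {
      for (const Stage& s : pipelineForVersion(version)) s(img);
    }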

5.4 Performance Optimization of the Image Signal Processing

The image signal processing is typically a pipeline of processing stages with high computational complexity. As a result, it is usually mapped onto specialized hardware [23], such as Field Programmable Gate Arrays (FPGAs) and Digital Signal Processors (DSPs), that can increase the efficiency of the operations and the overall performance. Executing the ISP algorithm in a serial way prevents the efficient utilization of the hardware upon which the algorithm is running. The Halide programming language [24] is used in order to exploit the available parallelism of the ISP algorithm and the data locality between adjacent stages. It allows the actual computations and the order of those computations to be specified separately, by separating the algorithm from the schedule.

Var x, y, c;
Var x_vi, x_vo;
Var y_i, y_o;

intermediate
    .compute_at(output, y_i)
    .store_at(output, y_i)
    .reorder(c, x, y)
    .unroll(c)
    .vectorize(x);

Listing 5.1: Schedule of intermediate stages

output
    .compute_root()
    .split(y, y_o, y_i, 32)
    .split(x, x_vo, x_vi, 32)
    .reorder(c, x_vi, y_i, x_vo, y_o)
    .vectorize(x_vi)
    .bound(c, 0, 3)
    .unroll(c)
    .parallel(x_vo)
    .parallel(y_o);

Listing 5.2: Schedule of output stage



Figure 5.2: Scheduling between producer and consumer stages

Table 5.2: Profiling of unoptimized ISP (order of magnitude)

                     eisp   eld   eibc
Order of magnitude   s      ms    µs

Listings 5.1 and 5.2 show the optimized schedules for the intermediate and the output stages, respectively. Instead of computing the ISP stages one by one in a pipelined way, the computation of the different stages is scheduled and only performed while computing the output stage. The computations are performed on three-channel (c) images of a given resolution (x × y), as images within Halide are expressed as a tuple (x, y, c) and are known to have three channels. This tuple corresponds to a nest of three loops over the width, the height, and the channel of the image. An immediate optimization is to reorder the loops so that the channel becomes the innermost loop, which can then be unrolled. Last, the image is vectorized in the x dimension, so that multiple computations occur at once. The computations of every producer stage are stored and computed as needed for the consumer output stage. This method maximizes the data locality of the computations.

At the output stage, the computations are performed in a parallel tiled traversal. In order to achieve that, the image is split into smaller tiles and the pixels within those tiles are iterated over. The inner x dimension is vectorized in chunks of 32 pixels, which allows efficient computation of multiple pixels at the same time. The resulting tiles can be computed in parallel using a thread pool and a task queue. In addition, loop unrolling is again used in order to optimize the loop over the 3 channels of the image. Figure 5.2 shows a 2-stage example of how the order of computations looks between the intermediate and the output stages. The consumer stage works with four tiles and parallel execution for each tile. Within the tiles, each vector of the consumer is computed using the just-in-time computations of the relevant rows from all the producers. It is a fine-grained, per-pixel scheduling that only requires enough memory to store the producer computations that are needed. Storing these intermediate results at the consumer's disposal costs memory, but avoids redundant computations; this is a trade-off that depends on the underlying architecture.

An initial profiling showed that the ISP (eisp) as a component performs drastically slower than the rest of the application and proves to be a major bottleneck for obtaining realistic results. As can be seen in Table 5.2, the execution time of the ISP is in the order of magnitude of seconds, whereas the lane detection (eld) is in the order of milliseconds and the controller (eibc) in the order of microseconds. Therefore, a serial execution of the ISP does not yield realistic results, which justifies the necessity for optimization.



Chapter 6

Lane Departure Detection

6.1 Motivation

Lane departure detection is a driving assistance method that uses computer vision in order to identify the lane markers of the road and detect the deviation from the middle of the current lane. As shown in Figure 6.1, lane departure detection techniques consist of three fundamental steps: the image preprocessing, the lane detection, and the lane tracking [25][26]. The preprocessing step typically includes a region of interest (ROI) selection, followed by a bird's eye view (BEV) transformation or a front-mounted camera perspective. The lane detection step is responsible for the feature extraction needed to perform lane identification, which can be achieved using either edge detection algorithms such as Sobel or Canny, or thresholding techniques combined with edge detection by applying a neighborhood-AND [12]. Finally, the lane tracking step allows estimating the position of the vehicle and its deviation from the center of the lane.

Figure 6.1: Lane departure detection flowchart



6.2 Performance of Feature Extraction Algorithms

As there is a plethora of techniques for lane departure detection, it is left to the application designer to opt for the most effective one for the required use case. The use case of lane departure detection on approximated images requires an error-resilient feature extraction step, since a failure of this step can cause the failure of the entire application. In order to evaluate the performance of common feature extraction algorithms, a baseline lane detection algorithm was used. The preprocessing step of the algorithm uses a bird's eye view transformation and a transformation from the RGB color domain to the YCbCr color domain. The grayscale image is then used by the feature extraction mechanism. The lane tracking step of the algorithm uses the sliding-window approach, which allows the performance of the preceding feature extraction step to be examined. The feature extraction stage identifies the candidate lane pixels of the image and can be verified for correctness by examining the resulting histogram. When the lanes are correctly identified, the histogram is bimodal, and the two prominent peaks reflect the positions of the lane markers.

The baseline image processing algorithm can effectively handle image streams designated for human vision, but can prove problematic when it comes to approximated image streams designated for computer vision. In order to identify why the approximated image streams are not correctly handled, the feature extraction part of the baseline image processing block, which consists of a combined filter of the Sobel edge detector and a color masking filter, was examined. This combined filter produces a binary image in which the identified features are the white lane pixels. By summing those pixels along the columns of the binary image, the proximity of those pixels to each other becomes more evident, and the most prominent peaks therefore indicate the horizontal positions of the base of the lanes. This procedure is then repeated in slices over the entire image, which allows sliding windows to be identified, upon which a polynomial can be fitted in order to identify the lane markers.
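The column-summing step just described could look like the following sketch. OpenCV is assumed here purely for illustration, and the function name is ours.

    #include <opencv2/opencv.hpp>

    // Sum each column of a binary lane mask and take the strongest peak in each
    // image half as the base position of the left and right lane markers.
    void laneBaseColumns(const cv::Mat& binary /* CV_8U, 0 or 255 */,
                         int& leftCol, int& rightCol) {
      cv::Mat hist;
      cv::reduce(binary, hist, 0 /* collapse rows */, cv::REDUCE_SUM, CV_32S);

      const int mid = hist.cols / 2;
      cv::Point peakLeft, peakRight;
      cv::minMaxLoc(hist.colRange(0, mid), nullptr, nullptr, nullptr, &peakLeft);
      cv::minMaxLoc(hist.colRange(mid, hist.cols), nullptr, nullptr, nullptr, &peakRight);

      leftCol = peakLeft.x;          // most prominent peak in the left half
      rightCol = mid + peakRight.x;  // most prominent peak in the right half
    }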

Following the baseline image processing approach, the lane markers can be correctly identified for non-approximated images. The effect of different edge detection operators, such as Prewitt, Roberts, Scharr, Sobel, and Canny, is explored in order to evaluate their effectiveness. As can be seen in Figure 6.2, a combined filter of a color masking filter with an edge detector produces similar results for Sobel, Scharr, Prewitt, and Roberts, whereas for Canny the result is quite noisy over the entire image. Thus, the edge detectors, besides Canny, could be used interchangeably and provide accurate results. However, edge detectors come at a high computational cost, as they use convolutional kernels in order to calculate every pixel. Therefore, the best approach for non-approximated images, both in computational complexity and performance, is to reduce the feature extraction to only the color masking, which produces results similar to those of the different combined filters.

In order to handle RAW-formatted or partially processed images, the feature extraction mechanism of the image processing algorithm had to be modified. The color masking approach is no longer sufficient to provide a good estimate of the lane markers. In fact, no information could be extracted, as the histogram was zero for all column indexes of the investigated RAW image, which was provided using pipeline v1. In this case, the edge detectors perform fairly well as standalone feature extractors, as can be seen in Figure 6.3, since all of them succeed in identifying two prominent peaks in the histogram. However, for all edge detectors excluding perhaps Canny, the amount of noise in the histogram is rather high. The Roberts, Scharr, and Prewitt edge detectors are noisy along the entire histogram, whereas Sobel and Canny are noisy at the left side of the histogram, which coincides with the area to the left of the left lane marker. In order to effectively identify the lane markers by obtaining a histogram with minimum noise, image thresholding was used as the feature extractor. For thresholding, Otsu's binarization method is used, which automatically calculates the threshold based on the provided histogram. Otsu's binarization method performs very well when the histogram is bimodal, as is the case in lane marker extraction.



Figure 6.2: Performance of feature extraction algorithms on non-approximated images through probability density function of identified lane pixels

Figure 6.3: Performance of various feature extraction algorithms on approximated images through probability density function of identified lane pixels



6.3 Lane Detection Algorithm

The lane detection algorithm being used is shown in Figure 6.5. It processes the input stream of images and calculates the lateral deviation of the trajectory of the vehicle with respect to the actual middle of the lane. It is based on three separate stages: preprocessing, feature extraction and lane tracking. The individual algorithms of each stage are selected with V-REP in mind. The goal is to be able to perform lane tracking based on images that are generated from V-REP. As seen in the input step of Figure 6.5, the terrain in V-REP is developed in a way that resembles a real-life road scenario, but assumes clearly distinguishable white lanes, which is common for highways in the Netherlands.

The preprocessing stage performs the alterations to the image that enable effective feature extraction. It consists of three steps, namely region of interest (ROI) selection, the bird's eye view transformation, and the color domain transformation from RGB to YCbCr. The region of interest selection is performed in order to retain only the useful portion of the image for further processing. The points used can be seen in Table 6.1. Since they correspond to locations in an image generated by V-REP, their selection is application specific and needs to be adapted for images generated by a different camera. The region of interest points are used in the subsequent bird's eye view transformation step. As can be seen in Figure 6.4, the four source ROI points are mapped to the four destination ROI points and the image is geometrically warped using a perspective transformation. Once the bird's eye view image is generated, it is converted from the RGB to the YCbCr color domain. Only the Y component is used, as it provides the grayscale image according to Equation 6.1 [27]. Converting the image to grayscale allows operations to be performed on a single-channel image, as opposed to the three-channel RGB representation.

Y = 0.299 · R + 0.587 · G + 0.114 · B    (6.1)
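
For illustration, this preprocessing stage can be sketched as follows using the OpenCV C++ API; OpenCV is assumed here purely for convenience (the thesis does not prescribe a specific image library), and the 512 × 512 working resolution is inferred from the ROI coordinates in Table 6.1.

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

// Preprocessing sketch: ROI-based bird's eye view transform followed by
// conversion to the Y (grayscale) channel, cf. Equation 6.1.
cv::Mat preprocess(const cv::Mat& rgb)
{
    // Source ROI points and their bird's eye view destinations (Table 6.1).
    std::vector<cv::Point2f> src = {{233, 280}, {277, 280}, {50, 512}, {462, 512}};
    std::vector<cv::Point2f> dst = {{120, 0}, {392, 0}, {120, 512}, {392, 512}};

    // Inverse perspective mapping: warp the image to a bird's eye view.
    cv::Mat M = cv::getPerspectiveTransform(src, dst);
    cv::Mat bev;
    cv::warpPerspective(rgb, bev, M, cv::Size(512, 512));

    // RGB -> YCbCr; keep only the luma channel as the grayscale image.
    cv::Mat ycrcb, channels[3];
    cv::cvtColor(bev, ycrcb, cv::COLOR_RGB2YCrCb);
    cv::split(ycrcb, channels);
    return channels[0];
}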

The second stage of the lane detection algorithm is feature extraction. Since it is required to handle both RGB and RAW images, the algorithm incorporates a quality-of-result (QoR) check to guarantee that the rest of the algorithm does not operate on corrupted data. This check is performed as early as possible within the algorithm to guarantee that the vehicle will not crash; this early crash/no-crash check therefore plays the role of a dynamic knob for algorithmic approximation. It is realized by checking whether the brightness of each pixel in the color masked image is above a threshold, so the color masking step doubles as the QoR check. In color masking, the grayscale image is processed pixel by pixel: pixels brighter than a threshold are set to the highest brightness, while the rest are set to zero. The outcome is an image in which the white lane markers are clearly distinguishable from the rest of the image, which is black. This outcome is used for the QoR check and, when the check succeeds, the algorithm passes it on for further processing in the later stages. When color masking fails to identify bright pixels that represent the white markers, the algorithm falls back to a dynamic thresholding technique. The color masking approach uses a static threshold to identify white pixels; when the ISP stages of the image have been approximated, this static threshold is no longer effective. To achieve dynamic thresholding, Otsu's binarization algorithm is used. Otsu's algorithm operates on bimodal histograms and automatically identifies the optimal threshold. This allows the white pixels to be identified correctly and avoids performing subsequent calculations on corrupted data.
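
A minimal sketch of this QoR check and its fallback is given below, again using OpenCV calls for illustration only; the static threshold value and the pixel-count criterion are hypothetical placeholders, not the values used in the actual implementation.

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// Feature extraction with a QoR check: try the static color mask first and
// fall back to Otsu's dynamic threshold when too few bright pixels are found.
cv::Mat extract_lane_pixels(const cv::Mat& gray)
{
    const double static_threshold = 200;   // hypothetical value for clearly white markers
    const int    min_lane_pixels  = 500;   // hypothetical QoR criterion

    cv::Mat mask;
    cv::threshold(gray, mask, static_threshold, 255, cv::THRESH_BINARY);

    // QoR check: if the static mask finds almost nothing (e.g. for RAW or
    // partially processed input), switch to Otsu's automatic threshold.
    if (cv::countNonZero(mask) < min_lane_pixels) {
        cv::threshold(gray, mask, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);
    }
    return mask;
}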

Table 6.1: Region of interest selection points

Point Source (x, y) Destination (x, y)

A (233, 280) (120, 0)
B (277, 280) (392, 0)
C (50, 512) (120, 512)
D (462, 512) (392, 512)


Figure 6.4: Bird’s eye view transformation using the region of interest points

Figure 6.5: Lane detection algorithm


The final stage of the lane detection algorithm applies the sliding window technique to identify the lane markers and calculate the lateral deviation from the middle of the lane. The sliding window technique operates on the output image of the feature extraction stage. Initially, the histogram of the bottom half of the image is calculated and the base points of both the left and right lane markers are identified. The two base points are used to center the bottom windows, within which the algorithm searches for candidate pixels. Once all candidate pixels within the first windows are identified, their mean position indicates the center of the next windows, which are then used for candidate pixel exploration. This process repeats until all windows that fall within the height of the image have been processed and all candidate pixels have been identified. The pseudo-code of the algorithm can be seen in Algorithm 1.

Algorithm 1 Sliding window algorithm

procedure SlidingWindow
    histogram ← histogram of image bottom half
    left_base ← peak of histogram left half
    right_base ← peak of histogram right half
    while window < nr_windows do
        left_window ← calculate_window(left_base, window_width, window_height)
        right_window ← calculate_window(right_base, window_width, window_height)
        if IsWithinWindow(left_window) then
            left_line_indexes[window] ← candidate pixels
        if IsWithinWindow(right_window) then
            right_line_indexes[window] ← candidate pixels
        left_base ← mean(left_line_indexes[window])
        right_base ← mean(right_line_indexes[window])
        window ← window + 1
    end while
end procedure

Once the candidate pixels are identified across the image for both lane markers, the lateral deviation from the middle of the lane is calculated. The left and right lane markers are each fitted with a second-degree polynomial, and the lateral deviation is evaluated at the look-ahead distance that was designed for the IBC controller.
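
As an illustration of this final step, the sketch below fits a second-degree polynomial x = a·y² + b·y + c to the candidate pixels of one lane marker by solving the normal equations, and evaluates it at the look-ahead row; the coordinate convention, the sign of the deviation, and the pixel-to-centimetre scale are assumptions made for the example.

#include <array>
#include <vector>

// Least-squares fit of x = a*y^2 + b*y + c, evaluated at a given row.
double lane_x_at(const std::vector<double>& ys, const std::vector<double>& xs, double y_eval)
{
    std::array<std::array<double, 4>, 3> m{};            // augmented normal equations [A | b]
    for (size_t i = 0; i < ys.size(); ++i) {
        const double y = ys[i];
        const double basis[3] = {y * y, y, 1.0};
        for (int r = 0; r < 3; ++r) {
            for (int c = 0; c < 3; ++c) m[r][c] += basis[r] * basis[c];
            m[r][3] += basis[r] * xs[i];
        }
    }
    for (int p = 0; p < 3; ++p)                           // naive elimination (no pivoting)
        for (int r = p + 1; r < 3; ++r) {
            const double f = m[r][p] / m[p][p];
            for (int c = p; c < 4; ++c) m[r][c] -= f * m[p][c];
        }
    double coef[3];                                       // back substitution: a, b, c
    for (int r = 2; r >= 0; --r) {
        coef[r] = m[r][3];
        for (int c = r + 1; c < 3; ++c) coef[r] -= m[r][c] * coef[c];
        coef[r] /= m[r][r];
    }
    return coef[0] * y_eval * y_eval + coef[1] * y_eval + coef[2];
}

// Lateral deviation at the look-ahead row: offset of the lane centre from the
// image centre, scaled by a hypothetical pixel-to-centimetre calibration factor.
double lateral_deviation_cm(double left_x, double right_x, double image_centre_x, double cm_per_pixel)
{
    return (0.5 * (left_x + right_x) - image_centre_x) * cm_per_pixel;
}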


Chapter 7

Experimental Setup

7.1 Experimental Setup

The experimental setup consists of a software framework with a modular design of four distinct components, as shown in Figure 7.1. The system component refers to the vehicle with a front-mounted camera on a highway track, which is simulated using the virtual robot experimentation platform (V-REP) [28]. V-REP is a robot simulation framework that enables the development of complex simulation scenarios and allows the control entity of the simulation to be external. For the external control entity, V-REP's remote API is used in a client-server architecture, in which the server is the V-REP simulation and the client is the control entity written in C++. V-REP's synchronous communication mode is used, which advances each simulation step within V-REP in full synchronization with the external control entity. The interaction with the vehicle is limited to extracting the image taken by the camera and setting the steering angle by adjusting the rotational speed of the front wheels accordingly.
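
A minimal sketch of such a synchronous client loop is shown below, using the legacy remote API C bindings that ship with V-REP; the object names, the fixed wheel speed, and the mapping from controller output to joint commands are hypothetical, and the calls used in the actual framework may differ.

// Synchronous V-REP remote API client loop (legacy C bindings).
extern "C" {
#include "extApi.h"
}

int main()
{
    // Connect to a V-REP instance exposing the remote API on port 19997.
    int clientID = simxStart("127.0.0.1", 19997, true, true, 5000, 5);
    if (clientID == -1) return 1;

    simxSynchronous(clientID, true);                      // lock-step with the client
    simxStartSimulation(clientID, simx_opmode_blocking);

    int cam = 0, left_motor = 0, right_motor = 0;         // object names below are hypothetical
    simxGetObjectHandle(clientID, "vision_sensor", &cam, simx_opmode_blocking);
    simxGetObjectHandle(clientID, "left_motor", &left_motor, simx_opmode_blocking);
    simxGetObjectHandle(clientID, "right_motor", &right_motor, simx_opmode_blocking);

    for (int step = 0; step < 1000; ++step) {
        int resolution[2];
        simxUChar* image = nullptr;
        simxGetVisionSensorImage(clientID, cam, resolution, &image, 0, simx_opmode_blocking);

        // ... ISP, lane detection and the controller would process `image` here ...
        float wheel_speed = 2.0f;                         // placeholder actuation
        simxSetJointTargetVelocity(clientID, left_motor, wheel_speed, simx_opmode_oneshot);
        simxSetJointTargetVelocity(clientID, right_motor, wheel_speed, simx_opmode_oneshot);

        simxSynchronousTrigger(clientID);                 // advance exactly one simulation step
    }
    simxStopSimulation(clientID, simx_opmode_blocking);
    simxFinish(clientID);
    return 0;
}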

The extracted image is processed by the Image Signal Processing (ISP) component and fed to the lane detection module. The ISP component makes it possible to simulate the image sensor of the camera and to approximate its corresponding ISP module. This is achieved using the fully configurable and reversible imaging pipeline (CRIP) simulator that was developed in [8]. CRIP is a tool based on the Halide programming language, which allows both the backward and forward simulation of a camera sensor: it can revert the stages of a compressed JPEG or PNG image back to a RAW format and then apply a different combination of ISP stages in the forward direction. This is necessary in order to overcome the limitation that V-REP does not provide a RAW output image. The CRIP tool is further developed for the purpose of this project, in order to achieve a logically continuous ISP pipeline that also allows realistic profiling. Furthermore, to obtain realistic profiling results, the different ISP algorithms have been optimized for execution on the CPU by exploiting the available parallelism and data locality. These were the limitations of the approach in [8] that had to be resolved in the current project.
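
As an indication of the kind of CPU optimization applied to the ISP stages, the sketch below shows a generic Halide stage scheduled for thread-level parallelism and cache-friendly access; the stage itself (a simple per-pixel gain) is a placeholder and not one of the actual CRIP kernels.

#include "Halide.h"

// Placeholder ISP stage in Halide, scheduled for a multi-core CPU.
Halide::Func make_stage(Halide::Buffer<uint8_t> input)
{
    Halide::Var x("x"), y("y"), c("c"), yi("yi");
    Halide::Func stage("stage");

    // Algorithm: what is computed (a simple gain, clamped to 8 bits).
    stage(x, y, c) = Halide::cast<uint8_t>(Halide::min(input(x, y, c) * 1.1f, 255.0f));

    // Schedule: how it is computed. Rows are split into strips processed in
    // parallel across threads, and the inner loop is vectorized for SIMD.
    stage.split(y, y, yi, 16).parallel(y).vectorize(x, 8);
    return stage;
}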

The lane detection component, written in C++, processes the image from the ISP component in order to identify the lane markers and calculate the lateral deviation from the middle of the lane, which can then be used by the controller. Since it is required to handle both compressed and RAW images, the algorithm incorporates a quality-of-result (QoR) check to guarantee that the rest of the algorithm does not operate on corrupted data, which would lead the vehicle to crash. The lane detection algorithm consists of a bird's-eye-view (BEV) transformation stage using inverse perspective mapping, a feature extraction stage based on a combined filter of color masking and thresholding, and a sliding window stage, in which the lateral deviation is calculated.

The controller is the last component of the framework and is written in C++. It models the dynamics of the vehicle and implements a proportional controller. It interacts with the lane detection component, from which it receives the lateral deviation, and calculates the steering angle required to keep the vehicle in the middle of the lane. The steering angle is then fed to the V-REP simulation, which is part of the system component.
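
A minimal sketch of such a proportional lateral controller is given below; the gain and the steering saturation are hypothetical values, not the parameters designed in this work.

#include <algorithm>

// Proportional lateral controller sketch: steering angle from lateral deviation.
// kp and max_angle_rad are hypothetical; the designed controller additionally
// accounts for the sensor-to-actuator delay through its sampling period.
double steering_angle_rad(double lateral_deviation_m, double kp = 0.5, double max_angle_rad = 0.5)
{
    double angle = -kp * lateral_deviation_m;                 // steer against the deviation
    return std::clamp(angle, -max_angle_rad, max_angle_rad);  // actuator saturation
}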


Figure 7.1: Software framework of experimental setup

The application is developed and deployed on an HP ZBook workstation running the Ubuntu 18.04.2 LTS operating system. It uses an Intel i7 processor with the specifications listed in Table 7.1. In brief, it is a 4-core x86_64 architecture with 8 CPU threads, clocked at 2.6 GHz. It has split instruction and data L1 caches, and a total of three levels of cache. The flags listed in Table 7.1 indicate specific features of the processor, which can be exploited by the compiler in order to optimize the code. For example, the Halide target for the host CPU is "x86-64-linux-avx-avx2-f16c-fma-sse41". The compiler identifies that x86_64 is the underlying architecture and Linux (Ubuntu) the operating system. The compiled code benefits from the support of advanced vector extensions (AVX), their extension (AVX2), and streaming SIMD extensions 4.1 (SSE4.1); these flags define the available instruction set and allow operations to be vectorized. The fused multiply-add (FMA) flag similarly specifies an instruction set extension that allows operation fusion.


Table 7.1: Experimental setup hardware

Parameter Value

Architecture: x86_64
CPU op-modes: 32-bit, 64-bit
Byte Order: Little Endian
CPUs: 8
On-line CPUs list: 0-7
Threads per core: 2
Cores per socket: 4
Sockets: 1
NUMA nodes: 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 94
Model name: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
Stepping: 3
CPU MHz: 900.102
CPU max MHz: 3500.0000
CPU min MHz: 800.0000
BogoMIPS: 5184.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
NUMA node0 CPUs: 0-7
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d


Chapter 8

Application Profiling

8.1 Profiling Setup

In order to gain better insight into the application and identify possible bottlenecks, the application needs to be profiled. The application consists of several components, as described in Chapter 7. The use of V-REP as the simulation environment for the validation of the experimental approach posed a fundamental difficulty: the application processes images generated by the camera that is part of the simulation environment, but V-REP does not provide a RAW-formatted image, only an RGB one. This requires additional effort to traverse the ISP first backwards and then in the standard forward manner.

The profiling pipeline is depicted in Figure 8.1. The system and camera blocks refer to the actual vehicle within the simulation and its camera. They are depicted in order to give a high-level view of the entire application and show the relevance of each of the remaining blocks, which represent the execution times of the fundamental modules explained in Section 7.1. The only addition is the e_rev_isp block, which reflects the time needed to traverse the ISP backwards. This is only required to overcome V-REP's inability to provide RAW images and is out of scope for the actual profiling. The first execution time to profile is e_isp, the time of the forward ISP pipeline. It is the actual point of ISP approximation and varies across the different pipeline versions. Next, e_ld is the execution time of the lane detection stage and, finally, e_ibc is the execution time of the IBC controller.

The application is profiled in order to measure the total time needed by the different components. The total execution time is calculated by Equation 8.4. This total, the sensor-to-actuator delay, is the time from the moment an image is captured by the camera (sensing) to the moment a new steering angle is actuated (actuating).

Figure 8.1: Profiling diagram (System → Camera → e_rev_isp → e_isp → e_ld → e_ibc)


Figure 8.2: Side-by-side comparison between a boxplot and a probability density function of a normal distribution [1]

It is the sensor-to-actuator delay and enables the controller parameters to be designed based on the sampling period. For a given camera frame rate and sensor-to-actuator delay, the sampling period h is calculated as shown in Equation 8.5.

e_isp = Demosaic + Denoise + Pipeline version + Compress    (8.1)

e_ld = Decompress + Lane Detection    (8.2)

e_ibc = Controller    (8.3)

delay = e_isp + e_ld + e_ibc    (8.4)

h = ceil(delay / frame rate) · frame rate    (8.5)
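
Read together, Equations 8.4 and 8.5 can be computed as in the sketch below, under the assumption that "frame rate" in Equation 8.5 denotes the camera frame period (e.g. 1/60 s for a 60 fps sensor), which is what makes the expression dimensionally consistent.

#include <cmath>

// Sensor-to-actuator delay (Equation 8.4) and sampling period (Equation 8.5),
// with frame_time interpreted as the camera frame period in the same unit as the delays.
double sampling_period(double e_isp, double e_ld, double e_ibc, double frame_time)
{
    double delay = e_isp + e_ld + e_ibc;                // Equation 8.4
    return std::ceil(delay / frame_time) * frame_time;  // Equation 8.5
}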

8.2 Profiling Method

The application is profiled using the std::chrono library that is built into C++. This library makes it possible to measure the elapsed time of specific functions of the application, and it provides a high-resolution clock with a resolution of 1 ns. The data analysis is performed using boxplots [29]. The elapsed time of the application is measured in batches of one hundred iterations, with a total of one hundred batches. The resolution of the images that were used is 512 × 256 pixels. The distribution of the measurements is split into quartiles and represented in the form of a boxplot.
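
A minimal sketch of such a measurement harness is shown below; wrapping the function under test in a std::function and averaging within each batch are assumptions of the example, not necessarily the exact bookkeeping used in the thesis tooling.

#include <chrono>
#include <functional>
#include <vector>

// Measure a stage in batches of iterations using std::chrono's high-resolution clock.
// Returns one average elapsed time (in milliseconds) per batch.
std::vector<double> profile_stage(const std::function<void()>& stage,
                                  int batches = 100, int iterations = 100)
{
    std::vector<double> batch_ms;
    for (int b = 0; b < batches; ++b) {
        auto start = std::chrono::high_resolution_clock::now();
        for (int i = 0; i < iterations; ++i) stage();     // run the function under test
        auto stop = std::chrono::high_resolution_clock::now();
        batch_ms.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count() / iterations);
    }
    return batch_ms;
}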

As can be seen in Figure 8.2, the distribution can be described using five indicative parameters, namely the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum. The interquartile range (IQR) is the difference between the 25th and the 75th percentiles.


The median corresponds to the 50th percentile, and the minimum and maximum (the whisker bounds) are calculated as shown in Equations 8.6 and 8.7. As can be seen in Figure 8.2, for a normally distributed dataset approximately 99.3% of the data values lie between the minimum and the maximum, with the remaining 0.7% of the data being outliers.

Minimum = Q1 − 1.5 · IQR    (8.6)

Maximum = Q3 + 1.5 · IQR    (8.7)
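
For reference, the five-number summary and the whisker bounds of Equations 8.6 and 8.7 can be computed as in the sketch below; the linear-interpolation percentile is one of several common conventions and may differ slightly from the one used by the plotting tool.

#include <algorithm>
#include <vector>

// Percentile with linear interpolation between order statistics (one common convention).
double percentile(std::vector<double> v, double p)
{
    std::sort(v.begin(), v.end());
    double idx = p * (v.size() - 1);
    size_t lo = static_cast<size_t>(idx);
    double frac = idx - lo;
    return (lo + 1 < v.size()) ? v[lo] * (1.0 - frac) + v[lo + 1] * frac : v[lo];
}

struct BoxplotStats { double q1, median, q3, whisker_low, whisker_high; };

// Five-number summary with Tukey whiskers (Equations 8.6 and 8.7).
BoxplotStats boxplot_stats(const std::vector<double>& samples)
{
    BoxplotStats s{};
    s.q1 = percentile(samples, 0.25);
    s.median = percentile(samples, 0.50);
    s.q3 = percentile(samples, 0.75);
    const double iqr = s.q3 - s.q1;
    s.whisker_low  = s.q1 - 1.5 * iqr;   // "Minimum" of Equation 8.6
    s.whisker_high = s.q3 + 1.5 * iqr;   // "Maximum" of Equation 8.7
    return s;
}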

This statistical approach gives a fundamental insight into the performance of the application. Additionally, it helps to identify the bottlenecks and to evaluate the comparative performance of the different components of the application. The application is profiled on a general-purpose CPU running Linux, which is not a real-time operating system. By repeating the measurements, the instructions are loaded into the processor caches and the impact of cache misses on the results is minimized.

8.3 Experimental Results

Following the procedure described in Section 8.2, profiling of the application was performed on the platform described in Section 7.1. The results obtained for a profiling round on a single image are shown in Table 8.1. Twelve different functions are profiled in order to obtain each element of Equations 8.1, 8.2, and 8.3. In this way, the total delay for each pipeline version is calculated according to Equation 8.4. The result is shown in Table 8.2.

Each of the functions in Table 8.1 contributes to the total execution time of the application and, therefore, affects the sensor-to-actuator delay. The execution times of the demosaic, denoise, compress, decompress, lane detection, and controller steps are always taken into account and are considered the fixed part of the total time. On the other hand, the ISP pipelines v0 to v6 are different approximations of the ISP, and each of them combined with the fixed part provides a different solution. Of the functions in the fixed part, the most time consuming is the denoise. Since it uses an implementation of the fast non-local means denoising algorithm [30], its cost depends on the size of the convolutional kernel that processes the image; for the purpose of our application this kernel is of size 3, but larger kernels lead to longer execution times. The controller, on the other hand, is the fastest component, as it requires less than a microsecond (roughly 0.75 µs). This is due to the fact that the controller consists of only a few matrix multiplications for the computation of the state matrices. The lane detection algorithm requires approximately 3 ms, which proves to be a good result when aiming for real-time performance.

For the variable part of the total execution time, which concerns the different ISP pipeline versions, the impact of the different ISP approximations becomes evident. All approximated versions except v3 achieve a speedup, either more (v1, v4, v6) or less (v2, v5) significant. Pipeline v3 is in fact slightly slower than the baseline pipeline v0, although it skips one stage. This can be attributed to dataflow dependencies: since v3 skips the color transform stage, the gamut mapping stage that follows operates directly on the data of the demosaiced image, which proves to be very inefficient. Dataflow inter-dependencies within the application are an interesting topic for further research, but are out of scope for the current project. Out of all stages, gamut mapping proves to be the bottleneck of the application, as it adds the largest delay to the execution time. This is attributed to the fact that its algorithm is non-linear per pixel, leading to higher computational complexity compared to the other ISP stages. It is therefore also identified as a prime candidate for approximation.

The results in Table 8.1 provide a good insight into the bottlenecks of the application. However, they are focused on a single 512 × 256 image and can be misleading. For that reason, a larger dataset of 200 images of 512 × 256 resolution was profiled. The results can be seen in Table 8.3. Although the outcome is similar to that of the single image, it shows higher variance and, for almost all functions in Table 8.1, the worst case is slightly higher than in Table 8.1. This is attributed to the fact that different images have different complexity, with some of them requiring additional computational effort to be processed.


Table 8.1: Profiling results on single image

Function Minimum (ms) Q1 (ms) Median (ms) Q3 (ms) Maximum (ms)

Demosaic 0.161 0.174 0.178 0.183 0.197
Denoise 6.962 7.132 7.191 7.245 7.414
(v0) Ftr, Fg, Fto 41.71 43.432 43.909 44.581 46.303
(v1) Ftr, Fto 3.119 3.282 3.297 3.39 3.552
(v2) Ftr, Fg 31.462 33.068 33.355 34.138 35.743
(v3) Fg, Fto 42.839 44.508 45.003 45.621 47.29
(v4) Ftr 0.15 0.166 0.17 0.176 0.192
(v5) Fg 31.088 33.528 34.152 35.154 37.594
(v6) Fto 3.287 3.368 3.38 3.421 3.501
Compress 1.448 1.647 1.71 1.78 1.98
Decompress 2.655 2.888 2.926 3.043 3.276
Lane Detection 2.901 2.96 2.978 2.999 3.058
Controller 0.708 (µs) 0.724 (µs) 0.73 (µs) 0.735 (µs) 0.751 (µs)

Table 8.2: Sensor-to-actuator delay profiling results

ISP Delay (ms) Frames per second Speedup

v0 68.3 14 1.0
v1 24.3 41 2.8
v2 57.6 17 1.2
v3 69.6 14 1.0
v4 20.1 49 3.4
v5 55.2 18 1.2
v6 23 43 3.0
v7 19.5 51 3.5

An example of contradictory behavior is the lane detection stage, which for all pipelines shows lower values than in the single-image case. This may well be caused by the particular image of the single-image case requiring excessive computations. The visual comparison of this difference between single- and multi-image profiling can be seen in the boxplots of Figures 8.3 and 8.4. In those figures, the whiskers indicate the range of the profiling set, and wider ranges mean more scattered data. This is the case for most multi-image functions and pipelines. However, it is also noticeable that the main boxes of the plots, which contain the majority of the samples, show a degree of agreement between the single- and multi-image sets, as they extend over similar values.

The overall performance of each of the different pipelines is shown in Table 8.2. It can be seen that v0 and v3 are the most time-consuming pipelines, which for the given ISP algorithm achieve only 14 fps. On the other hand, the fastest pipelines are v1, v4, and v6, achieving a speedup of approximately a factor of 3. The total runtime improvement for the sensor-to-actuator delay is 60% for v1, 70% for v4, and 70% for v6. On the lower end, pipelines v2 and v5 achieve a runtime improvement of about 20%. Note that the degree of performance improvement is not related to the number of stages skipped by the ISP pipeline: for example, v1 skips only one stage, whereas v6 skips two stages. The degree of improvement depends on the actual algorithms that are being approximated. The high-performing versions have in common the absence of gamut mapping, which is a non-linear and highly compute-intensive algorithm. Those versions also achieve a much better utilization of a 60 fps camera sensor, reaching up to 51 fps.


Figure 8.3: Single vs multi-image comparison of the different functions (outliers removed). Panels: (a) Decompress, (b) Compress, (c) Demosaic, (d) Denoise, (e) Lane Detection, (f) Controller; each panel shows boxplots of execution time (ms) for the single- and multi-image profiling sets.


Figure 8.4: Single vs multi-image comparison of the different pipelines (outliers removed). Panels: (a) ISP pipeline v0, (b) ISP pipeline v1, (c) ISP pipeline v2, (d) ISP pipeline v3, (e) ISP pipeline v4, (f) ISP pipeline v5, (g) ISP pipeline v6; each panel shows boxplots of execution time (ms) for the single- and multi-image profiling sets.


Table 8.3: Profiling results on dataset of 200 images

Function Minimum (ms) Q1 (ms) Median (ms) Q3 (ms) Maximum (ms)

Demosaic 0.347 0.367 0.373 0.381 0.401
Denoise 6.416 7.144 7.281 7.63 8.358
(v0) Ftr, Fg, Fto 43.252 45.338 46.014 46.728 48.814
(v1) Ftr, Fto 2.863 3.41 3.514 3.775 4.321
(v2) Ftr, Fg 34.061 35.395 35.799 36.284 37.618
(v3) Fg, Fto 42.691 45.138 45.913 46.77 49.217
(v4) Ftr 0.141 0.167 0.174 0.185 0.212
(v5) Fg 31.821 33.276 34.035 34.245 35.7
(v6) Fto 3.096 3.26 3.286 3.369 3.533
Compress v0 1.983 2.39 2.518 2.661 3.068
Compress v1 1.965 2.429 2.558 2.739 3.203
Compress v2 1.643 2.144 2.292 2.479 2.98
Compress v3 1.9 2.437 2.592 2.796 3.333
Compress v4 1.674 2.045 2.176 2.293 2.664
Compress v5 1.663 2.04 2.174 2.29 2.666
Compress v6 1.96 2.361 2.491 2.629 3.03
Compress v7 1.64 2.017 2.148 2.269 2.647
Decompress v0 3.422 4.159 4.359 4.651 5.388
Decompress v1 3.293 4.24 4.521 4.871 5.818
Decompress v2 3.617 4.536 4.799 5.148 6.067
Decompress v3 3.647 4.583 4.849 5.207 6.143
Decompress v4 3.325 4.047 4.24 4.529 5.251
Decompress v5 3.316 4.04 4.231 4.522 5.247
Decompress v6 3.491 4.239 4.444 4.737 5.485
Decompress v7 3.36 4.095 4.285 4.586 5.322
Lane Detection v0 1.978 2.053 2.073 2.103 2.177
Lane Detection v1 1.978 2.047 2.066 2.094 2.163
Lane Detection v2 1.966 2.033 2.054 2.078 2.145
Lane Detection v3 1.982 2.048 2.068 2.092 2.158
Lane Detection v4 1.965 2.035 2.054 2.082 2.152
Lane Detection v5 2.599 2.671 2.692 2.72 2.792
Lane Detection v6 1.976 2.053 2.072 2.104 2.181
Lane Detection v7 2.598 2.669 2.688 2.717 2.789
Controller v0 0.67 (µs) 0.845 (µs) 0.906 (µs) 0.962 (µs) 1.137 (µs)
Controller v1 0.694 (µs) 0.844 (µs) 0.897 (µs) 0.944 (µs) 1.094 (µs)
Controller v2 0.73 (µs) 0.875 (µs) 0.929 (µs) 0.972 (µs) 1.117 (µs)
Controller v3 0.727 (µs) 0.877 (µs) 0.932 (µs) 0.977 (µs) 1.127 (µs)
Controller v4 0.735 (µs) 0.882 (µs) 0.936 (µs) 0.98 (µs) 1.127 (µs)
Controller v5 0.826 (µs) 0.927 (µs) 0.957 (µs) 0.994 (µs) 1.095 (µs)
Controller v6 0.734 (µs) 0.887 (µs) 0.942 (µs) 0.989 (µs) 1.142 (µs)
Controller v7 0.823 (µs) 0.926 (µs) 0.956 (µs) 0.994 (µs) 1.096 (µs)


Chapter 9

Quality-of-Control Degradation due to Approximation

9.1 Motivation

The ISP of the camera sensor is approximated in order to evaluate the possible performance gains for the application. Each approximated pipeline skips different parts of the ISP and leads to a different output image, which is then used by the lane detection algorithm in the application stream. The difference between the various pipelines lies in the level of degradation they cause in the output image. In order to measure this degradation, the output of each pipeline is compared to the baseline v0 using the peak signal-to-noise ratio (PSNR) as a metric, which quantifies the mean squared error per pixel and per channel of the RGB image. PSNR was chosen as the metric because, for the given set of approximated images, its results are consistent with human perception. Higher PSNR values indicate better image reconstruction.
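
A minimal PSNR computation over a pair of 8-bit RGB images, consistent with the per-pixel, per-channel MSE described above, is sketched below; the exact averaging performed by the thesis tooling may differ.

#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// PSNR (dB) between a baseline (v0) image and an approximated image, both stored
// as interleaved 8-bit RGB samples of equal length.
double psnr_db(const std::vector<uint8_t>& baseline, const std::vector<uint8_t>& approx)
{
    double mse = 0.0;
    for (size_t i = 0; i < baseline.size(); ++i) {
        const double d = double(baseline[i]) - double(approx[i]);
        mse += d * d;
    }
    mse /= baseline.size();                              // mean squared error per sample
    if (mse == 0.0) return std::numeric_limits<double>::infinity();
    return 10.0 * std::log10(255.0 * 255.0 / mse);       // 255 is the peak value for 8 bits
}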

Next, the impact of image degradation is analyzed. First, the degradation is modeled as white noise, using the variance calculated on the same dataset for all approximations. This makes it possible to project a theoretical range in which the actual result of a simulation is expected to lie. Then, the simulation of the entire application is run for all pipelines, without taking the timing improvements into account, using a baseline sampling period of 60 ms. Settling time and mean squared error (MSE) are the two metrics used to quantify the actual impact on the quality-of-control. The MSE is calculated for two different use cases, namely a straight and a curved track.
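
The white-noise model amounts to perturbing the baseline lateral-deviation measurement with zero-mean Gaussian noise of the per-pipeline variance, as sketched below; the unit of the deviation and the way the variance is plugged in are assumptions of the example.

#include <cmath>
#include <random>

// Model the approximation-induced degradation as additive white (Gaussian) noise on
// the measured lateral deviation; `variance` is the per-pipeline value of Figure 9.1(b).
double noisy_measurement(double lateral_deviation, double variance, std::mt19937& rng)
{
    std::normal_distribution<double> noise(0.0, std::sqrt(variance));
    return lateral_deviation + noise(rng);
}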

Figure 9.1: Image quality degradation. Panels: (a) PSNR difference of ISP pipelines (dB), (b) variance of different ISP pipelines with respect to the baseline v0.


Figure 9.2: Modelled degradation vs actual response on the curved track. Panels: (a) modelling result v0-v1, (b) simulation result v0-v1, (c) modelling result v0-v2, (d) simulation result v0-v2, (e) modelling result v0-v4, (f) simulation result v0-v4, (g) modelling result v0-v5, (h) simulation result v0-v5; each panel shows lateral deviation (cm) over time (s).


9.2 Experimental Results

Initially, the image quality degradation test is run on a dataset of 200 baseline (v0) images. The baseline image dataset is first processed by the different approximated pipelines. The PSNR is calculated and the degradation is quantified as white noise, using the variance of each pipeline with respect to the baseline. Figure 9.1(a) shows the degradation in terms of PSNR and Figure 9.1(b) the variance of each pipeline compared to the baseline v0. This provides insight into the exact degradation based on the same image dataset for all pipelines, as no separate simulation is run. As can be seen, pipelines v2 and v4 show the largest variance as well as a very low PSNR. This indicates that the errors from those approximations have the most negative impact on the application. On the contrary, although v5 is as bad as v2 and v4 in terms of PSNR, it is handled well by the rest of the application and shows less variance; in that case, the application is error-resilient to the approximation.

The simulation is then run for all available pipelines v0-v7 in order to obtain the actual quality-of-control that each pipeline achieves. The difference with modelling the degradation as white noise is that in this case every measurement of the lateral deviation differs per pipeline and, since subsequent measurements depend on it, the error propagates through the response. The MSE results of the simulation can be seen in Figure 9.3. In both track cases, the simulation results follow the modelled ones. In the straight track case, the worst quality-of-control degradation is observed for pipelines v2 and v4, with a degradation of 10% and 9% respectively compared to the result of the baseline v0. The other versions perform relatively better, ranging from 1% to 4% worse MSE. For the curved track, versions v2 and v4 show the worst MSE of all versions, performing 80% and 40% worse than the baseline v0.

A safe conclusion at this point is that the application is able to handle the degradation in quality-of-control for all pipeline versions and keep it within acceptable levels. The application avoids crashing the vehicle in all cases and allows subsequent corrections once the runtime improvements are exploited. The lane detection stage plays an important role in making the application error-resilient, which minimizes the impact of the error introduced at the ISP stage. As can be seen in Figure 9.2, both the actual simulation result and the modelled one show responses that closely follow the baseline case. The figure shows four indicative examples, with pipelines v1 and v5 resembling the baseline more closely and pipelines v2 and v4 showing inferior performance.

Besides the MSE, the settling time is also used as a metric for the quality-of-control. The settling time can be observed for the straight track use case. It is the time required to reach zero deviation from the middle of the lane when starting from an initial deviation of approximately 18 cm. However, as can be seen in Figure 9.4, the settling time is not influenced by the quality degradation of the image when the timing improvement is not taken into account: for all pipelines, using the same baseline sampling period of 60 ms, the settling time is 1.6 seconds.

Figure 9.3: Normalized MSE results on the straight vs the curved track. Panels: (a) straight track, (b) curved track; each panel shows the normalized MSE per pipeline v0-v7.


Figure 9.4: Response of the various pipelines (v0-v7) for the straight track: lateral deviation (cm) over time (s).


Chapter 10

Quality-of-Control Improvement due to Controller Design

10.1 Motivation

In the previous two chapters, the application was profiled and the image degradation due to approximation was quantified. On the one hand, approximating the ISP yielded a runtime improvement of up to a factor of 3.5 compared to the baseline v0. On the other hand, the quality-of-control degraded by up to 80% compared to the baseline v0. The degradation in quality-of-control was measured without taking the runtime improvements into account, which made it possible to verify the benefits and drawbacks of approximate computing and its trade-off between performance and accuracy. In this chapter, the runtime improvement due to approximation is taken into account in order to analyze the outcome of that performance-accuracy trade-off. The sampling period of the controller is adjusted separately for each approximated ISP pipeline version. Again, two use cases are considered, namely a straight and a curved track, with mean squared error (MSE) and settling time as quality metrics.

10.2 Experimental Results

The simulation was run for all the different ISP pipelines. Table 10.1 shows the controller design parameters. The delay and sampling period are calculated as described in Chapter 8. The sampling period is rounded up to the nearest multiple of 5 ms, in line with the parameters needed to achieve 60 fps for the V-REP camera. The results of the simulation on the straight track are shown in Figure 10.1. Observing the response of the approximated pipelines in comparison to the baseline v0 enables the settling time of each version to be compared. Of the different versions, v7 has the fastest settling time, with a noticeable difference from v0. As can also be seen in Table 10.2, it settles within 0.9 s compared to 1.6 s for v0, achieving a 32% improvement. This result is attributed on the one hand to the shorter sampling period of v7, which directly improves the settling time. On the other hand, v7 was the version with the least image degradation of all approximated versions, together with v1. This allows its response to avoid the spikes seen in the responses of other pipelines, such as v2 and v4. Those two were the versions with the largest variance compared to the baseline and, although somewhat faster than v0, show a rather noisy response. Nevertheless, they still manage to achieve a 29% and 39% improvement in settling time compared to v0. Of the remaining versions, v5 achieved an improvement of 40% and v3 improved by 9% in settling time compared to v0.

As already mentioned, v2 and v4 are rather noisy due to the image degradation that they incur. Although this does not affect the settling time, it does affect the MSE.


Table 10.1: Controller design parameters

v0 v1 v2 v3 v4 v5 v6 v7

Delay (ms) 68.3 24.3 57.6 69.6 20.1 55.2 23 19.5
Sampling Period (ms) 70 25 60 70 25 60 25 20

Figure 10.1: Straight track simulation results for all pipelines in comparison to v0. Panels: (a) v0 and v1, (b) v0 and v2, (c) v0 and v3, (d) v0 and v4, (e) v0 and v5, (f) v0 and v6; each panel shows lateral deviation (cm) over time (s).


From Figure 10.1 it may appear that v4 is visually worse than v2. This is only partially true; the difference is attributed to the increased number of samples that v4 processes within the duration of one simulation. Pipeline v2 processes 116 samples in total, as opposed to the 280 samples of v4. This is why more spikes are observed in the response of v4: it simply has more opportunities to approximate.

Figure 10.2: Curved track simulation results for all pipelines in comparison to v0. Panels: (a) v0 and v1, (b) v0 and v2, (c) v0 and v3, (d) v0 and v4, (e) v0 and v5, (f) v0 and v6; each panel shows lateral deviation (cm) over time (s).

The poor MSE performance of v2 and v4 becomes more apparent in the simulation on the curved track. As can be seen in Figure 10.2, they are the two approximated versions with the worst noticeable performance. On the curved track the vehicle enters a curve and, therefore, the reference is a step from zero to approximately 58 cm; the MSE evaluation, however, focuses on the latter part of the response. The overall result is visualized in Figure 10.3. The best performing versions in the curved track simulation are v1 and v6, with 20% and 40% improvement respectively, compared to v0.


Figure 10.3: Normalized MSE results. Panels: (a) straight track, (b) curved track; each panel shows the normalized MSE per pipeline v0-v7.

Pipelines v2 and v4 perform 40% and 20% worse than v0 and are the worst pipelines overall regarding MSE. This is attributed to the increased image degradation they incur compared to the other approximated pipelines. For the straight track, v4, v6, and v7 achieve a 60% improvement compared to v0.

10.3 Comparative Analysis

For the overall comparison, which can be seen in Table 10.2, the performance metrics taken into account are the sensor-to-actuator delay, the energy, the memory footprint, the mean MSE over the two track scenarios, and the settling time for the different pipeline versions. The memory footprint and the energy add two further dimensions to the problem. Whereas energy is highly correlated with the actual execution time, which in this case is chosen to be the sensor-to-actuator delay, the memory footprint gives an insight into how much information is contained in the image, depending on the level of approximation. Table 10.2 shows the size of the images processed by the different pipelines, measured on a dataset of 200 sample images for each case. Pipelines v2, v4, v5, and v7 reduce the memory requirements by roughly 20%, 20%, 30%, and 30% respectively. In contrast, v1 shows a result identical to v0, which can also be explained by the PSNR comparison discussed in Chapter 9: the higher the reconstruction accuracy, the more memory is needed to store the images, with the worst case being the baseline v0.

Table 10.2: Overall performance results

ISP Delay (ms) Energy (J) Memory (KB) Mean MSE Settling Time (s)

v0 68.3 4.4 317.5 1 1.6
v1 24.3 1.6 317.8 0.65 1
v2 57.6 3.7 243.7 1.1 1.2
v3 69.6 4.5 302.4 0.95 1.5
v4 20.1 1.3 247.6 0.8 1
v5 55.2 3.6 222.8 0.95 1
v6 23 1.5 278.8 0.5 1
v7 19.5 1.2 222.2 0.75 0.9

The energy results of Table 10.2 indicate how energy efficient each pipeline is when taking into account only its actual execution time, which in our case is the sensor-to-actuator delay. According to the product specification of the Intel i7-6700 processor that is used as the underlying hardware, the thermal design power (TDP) of this processor is 65 W, under a base clock frequency of 3.5 GHz [31], which is taken as the worst case. Since the energy result is proportional to the execution time, the outcome follows the result of the sensor-to-actuator delay.


The fastest-performing ISP approximations, v4 and v6, yield a 70% improvement in energy, while v1 performs 60% better than v0.

The overall performance of all pipelines when all performance metrics are taken into account can be seen in Figure 10.4, with the per-metric breakdown shown side by side in Figure 10.5. Each of the metrics is normalized with respect to the baseline v0 in order to show the relative improvement or degradation per metric. The normalized metrics are then accumulated per pipeline in order to visualize the overall impact of each approximation. With five metrics used, the baseline v0 is awarded 5 units. As can be seen, all approximated pipelines achieve an overall improvement over the baseline. The best performing approximations are the ones that show the biggest improvement in sensor-to-actuator delay, namely pipelines v1, v4, v6, and v7. Their overall performance improvement is 40%, 45%, 50%, and 50% respectively, and these are also the versions with the fastest settling times. As for the rest, pipelines v2, v3, and v5 perform 16%, 3%, and 33% better than the baseline v0. The lower improvement achieved by those approximations is attributed to their higher sensor-to-actuator delay and, therefore, their slower settling times compared to the other approximations.
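
A sketch of the scoring behind Figure 10.4 is given below, under the assumption that every metric is lower-is-better and is normalized by dividing it by its baseline value before the five ratios are summed, so that v0 scores exactly 5 units and lower totals indicate better overall performance.

#include <array>
#include <cstddef>

// Cumulative normalized score per pipeline (cf. Figure 10.4). The five metrics are
// delay, energy, memory, mean MSE, and settling time, all assumed lower-is-better.
double cumulative_score(const std::array<double, 5>& pipeline_metrics,
                        const std::array<double, 5>& baseline_metrics)
{
    double score = 0.0;
    for (std::size_t i = 0; i < pipeline_metrics.size(); ++i)
        score += pipeline_metrics[i] / baseline_metrics[i];
    return score;                                        // 5.0 for the baseline v0
}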

Figure 10.4: Cumulative comparison of the different normalized performance metrics (delay, energy, memory, MSE, settling time) per pipeline v0-v7.


Figure 10.5: Side-by-side comparison of the different normalized performance metrics (delay, energy, memory, MSE, settling time) per pipeline v0-v7.


Chapter 11

Conclusion

11.1 Summary

This work has shown the impact of algorithmic approximation on the quality-of-control of image-based control systems. For the analysis, we used the use case of a lateral controller that performs lane keeping for an autonomous vehicle. The application that performs the image signal processing, lane detection, and control computations was developed in C++, with the ISP programmed in the domain-specific language Halide. The simulations were run in the V-REP robotic simulator, using a vehicle and two separate tracks, namely a straight and a curved track.

The application was made error-resilient by adapting the lane detection algorithm so that it can handle the degree of approximation that the different approximate ISP versions introduce. We ran this application on an Intel i7 processor, exploiting the available parallelism of the application through Halide on the eight available threads distributed over four cores. We initially benchmarked the application by conducting careful and detailed profiling of a total of 8 different pipeline versions, each on a dataset of 200 images obtained from V-REP. Then, we evaluated the degradation in quality-of-control caused by approximation, without taking into account the impact of approximation on runtime performance. Finally, the runtime performance gains due to approximation were taken into account and the improvement in quality-of-control was quantified. We used two separate metrics, namely the settling time and the mean squared error, to evaluate the performance of the controller. Additionally, we used the energy, the memory footprint, and the sensor-to-actuator delay to measure the improvement in application performance.

Using this approach, we shed light on how real-time IBC applications can benefit from approximate computing. The approximated ISP pipelines achieved a maximum speedup of a factor of 3.5 for the sensor-to-actuator delay and a 40% improvement in settling time. The overall performance achieved across the combined metrics was up to 50% better than the baseline. We also showed the improvement of the application in metrics such as energy and memory footprint, which suggests that approximate computing is an attractive approach when the application is subject to tighter constraints, such as in an embedded environment.

11.2 Future Work and Future Directions

Our future plans include further optimization of the quality-of-control by improving the control strategy. The current work has quantified the approximation error that each approximated ISP pipeline incurs and modelled it as white noise. This quantified error can be used in the future to develop a linear quadratic Gaussian control strategy that takes into account the measurement noise due to approximation.

In addition, an interesting extension of this work would be to implement the application in an embedded environment, using a real-time operating system in combination with an ARM processor for the lane detection and a DSP for the ISP processing.


That would allow us to evaluate the effectiveness of our approach when the application has hard real-time requirements.

Finally, in this work we focused on improving the quality-of-control for a camera-based application. It would be interesting to combine more sensors in a future iteration of this work, using sensor fusion.


Bibliography

[1] M. Galarnyk, "Understanding boxplots," September 2018. [Online]. Available: https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51

[2] "Critical reasons for crashes investigated in the national motor vehicle crash causation survey," National Center for Statistics and Analysis, U.S. Department of Transportation, February 2015. [Online]. Available: https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/812115

[3] L. Fridman, D. E. Brown, M. Glazer, W. Angell, S. Dodd, B. Jenik, J. Terwilliger, J. Kindelsberger, L. Ding, S. Seaman, H. Abraham, A. Mehler, A. Sipperley, A. Pettinato, B. Seppelt, L. Angell, B. Mehler, and B. Reimer, "MIT autonomous vehicle technology study: Large-scale deep learning based analysis of driver behavior and interaction with automation," CoRR, vol. abs/1711.06976, 2017.

[4] K. Bimbraw, "Autonomous cars: Past, present and future - a review of the developments in the last century, the present scenario and the expected future of autonomous vehicle technology," in 12th International Conference on Informatics in Control, Automation and Robotics (ICINCO), vol. 01, July 2015, pp. 191-198.

[5] S. Liu, L. Li, J. Tang, S. Wu, and J.-L. Gaudiot, Creating Autonomous Vehicle Systems. Morgan & Claypool, pp. 1-7, 2017.

[6] R. Okuda, Y. Kajiwara, and K. Terashima, "A survey of technical trend of ADAS and autonomous driving," in Proceedings of Technical Program - International Symposium on VLSI Technology, Systems and Application (VLSI-TSA), April 2014, pp. 1-4.

[7] C. Turner, "Next-generation automotive image processing with ARM Mali-C71," in ARM Tech Forum Korea. ARM Ltd., 2017.

[8] M. Buckler, S. Jayasuriya, and A. Sampson, "Reconfiguring the imaging pipeline for computer vision," in The IEEE International Conference on Computer Vision (ICCV), 2017.

[9] Q. Xu, T. Mytkowicz, and N. S. Kim, "Approximate computing: A survey," IEEE Design & Test, vol. 33, no. 1, pp. 8-22, Feb 2016.

[10] E. van Horssen, "Data-intensive feedback control: switched systems analysis and design," Ph.D. dissertation, Department of Mechanical Engineering, Feb. 2018.

[11] S. Mohamed, D. Zhu, D. Goswami, and T. Basten, "Optimising quality-of-control for data-intensive multiprocessor image-based control systems considering workload variations," in DSD, 2018.

[12] Y. Yenaydin and K. W. Schmidt, "A lane detection algorithm based on reliable lane markings," in 26th Signal Processing and Communications Applications Conference (SIU), May 2018, pp. 1-4.


[13] J. Baili, M. Marzougui, A. Sboui, S. Lahouar, M. Hergli, J. S. C. Bose, and K. Besbes, "Lane departure detection using image processing techniques," in 2nd International Conference on Anti-Cyber Crimes (ICACC), March 2017, pp. 238-241.

[14] C. Chen, J. Choi, K. Gopalakrishnan, V. Srinivasan, and S. Venkataramani, "Exploiting approximate computing for deep learning acceleration," in Design, Automation & Test in Europe Conference & Exhibition (DATE), March 2018, pp. 821-826.

[15] A. Mercat, J. Bonnot, M. Pelcat, W. Hamidouche, and D. Menard, "Exploiting computation skip to reduce energy consumption by approximate computing, an HEVC encoder case study," in Design, Automation & Test in Europe Conference & Exhibition (DATE), March 2017, pp. 494-499.

[16] S. Hashemi, H. Tann, F. Buttafuoco, and S. Reda, "Approximate computing for biometric security systems: A case study on iris scanning," in Design, Automation & Test in Europe Conference & Exhibition (DATE), March 2018, pp. 319-324.

[17] A. Y. Bhave and B. H. Krogh, "Performance bounds on state-feedback controllers with network delay," in 47th IEEE Conference on Decision and Control, Dec 2008, pp. 4608-4613.

[18] J. Kosecka, R. Blasi, C. J. Taylor, and J. Malik, "A comparative study of vision-based lateral control strategies for autonomous highway driving," in Proceedings of the 1998 IEEE International Conference on Robotics and Automation, vol. 3, May 1998, pp. 1903-1908.

[19] L. Anghel, M. Benabdenbi, A. Bosio, and E. I. Vatajelu, "Test and reliability in approximate computing," in International Mixed Signals Testing Workshop (IMSTW), July 2017, pp. 1-6.

[20] S. Mittal, "A survey of techniques for approximate computing," ACM Comput. Surv., vol. 48, no. 4, pp. 62:1-62:33, Mar. 2016.

[21] A. Agrawal, J. Choi, K. Gopalakrishnan, S. Gupta, R. Nair, J. Oh, D. A. Prener, S. Shukla, V. Srinivasan, and Z. Sura, "Approximate computing: Challenges and opportunities," in IEEE International Conference on Rebooting Computing (ICRC), Oct 2016, pp. 1-8.

[22] "Android's camera hardware abstraction layer," March 2019. [Online]. Available: https://source.android.com/devices/camera/camera3_requests_hal

[23] K. Grabowski and A. Napieralski, "Hardware architecture for advanced image processing," in IEEE Nuclear Science Symposium and Medical Imaging Conference, Oct 2010, pp. 3626-3633.

[24] J. Ragan-Kelley, A. Adams, D. Sharlet, C. Barnes, S. Paris, M. Levoy, S. Amarasinghe, and F. Durand, "Halide: Decoupling algorithms from schedules for high-performance image processing," Commun. ACM, vol. 61, no. 1, pp. 106-115, Dec. 2017.

[25] S. P. Narote, P. N. Bhujbal, A. S. Narote, and D. M. Dhane, "A review of recent advances in lane detection and departure warning system," Pattern Recognition, vol. 73, pp. 216-234, 2018.

[26] A. M. Kumar and P. Simon, "Review of lane detection and tracking algorithms in advanced driver assistance system," International Journal of Computer Science & Information Technology (IJCSIT), vol. 7, pp. 65-78, 2015.

[27] C. Saravanan, "Color image to grayscale image conversion," in Second International Conference on Computer Engineering and Applications, vol. 2, March 2010, pp. 196-199.


[28] E. Rohmer, S. P. N. Singh, and M. Freese, "V-REP: A versatile and scalable robot simulation framework," in IEEE/RSJ International Conference on Intelligent Robots and Systems, Nov 2013, pp. 1321-1326.

[29] J. W. Tukey, Exploratory Data Analysis, 1977, Section 2C.

[30] V. Karnati, M. Uliyar, and S. Dey, "Fast non-local algorithm for image denoising," in Proceedings of the 16th IEEE International Conference on Image Processing (ICIP'09), 2009, pp. 3829-3832.

[31] Intel, "Product specification - Intel Core i7-6700 processor." [Online]. Available: https://ark.intel.com/content/www/us/en/ark/products/88196/intel-core-i7-6700-processor-8m-cache-up-to-4-00-ghz.html
