Vision-Based Estimation for Guidance, Navigation, and Control of an Aerial Vehicle

M. K. KAISER
Air Force Research Laboratory

N. R. GANS, Member, IEEE
University of Texas, Dallas

W. E. DIXON, Senior Member, IEEE
University of Florida

While a Global Positioning System (GPS) is the most widely used sensor modality for aircraft navigation, researchers have been motivated to investigate other navigational sensor modalities because of the desire to operate in GPS denied environments. Due to advances in computer vision and control theory, monocular camera systems have received growing interest as an alternative/collaborative sensor to GPS systems. Cameras can act as navigational sensors by detecting and tracking feature points in an image. Current methods have a limited ability to relate feature points as they enter and leave the camera field of view (FOV).

A vision-based position and orientation estimation method for aircraft navigation and control is described. This estimation method accounts for a limited camera FOV by releasing tracked features that are about to leave the FOV and tracking new features. At each time instant that new features are selected for tracking, the previous pose estimate is updated. The vision-based estimation scheme can provide input directly to the vehicle guidance system and autopilot. Simulations are performed wherein the vision-based pose estimation is integrated with a nonlinear flight model of an aircraft. Experimental verification of the pose estimation is performed using the modelled aircraft.

Manuscript received August 8, 2008; revised February 12, 2009; released for publication March 2, 2009.

IEEE Log No. T-AES/46/3/937960.

Refereeing of this contribution was handled by M. Braasch.

This research is supported in part by the NSF CAREER award CMS-0547448, AFOSR Contracts F49620-03-1-0381 and F49620-03-1-0170, AFRL Contract FA4819-05-D-0011, and by Research Grant US-3715-05 from BARD, the United States-Israel Binational Agricultural Research and Development Fund, at the University of Florida.

Authors' addresses: M. K. Kaiser, Air Force Research Laboratory, Munitions Directorate, Eglin AFB, FL 32542-6810; N. R. Gans and W. E. Dixon, Department of Mechanical and Aerospace Engineering, University of Florida, Gale Lemerand Dr., Gainesville, FL 32611, E-mail: ([email protected]).

0018-9251/10/$26.00 © 2010 IEEE

I. INTRODUCTION

Global Positioning System (GPS) is the primary navigational sensor modality used for vehicle guidance, navigation, and control. However, a comprehensive study, referred to as the Volpe report [1], indicates several vulnerabilities of GPS associated with signal disruptions. The Volpe report delineates the sources of interference with the GPS signal into two categories, unintentional and deliberate disruptions. Some of the unintentional disruptions include ionosphere interference (also known as ionospheric scintillation) and RF interference (broadcast television, VHF, cell phones, and two-way pagers); whereas some of the intentional disruptions involve jamming, spoofing, and meaconing. Some of the ultimate recommendations of this report were to "create awareness among members of the domestic and global transportation community of the need for GPS backup systems..." and to "conduct a comprehensive analysis of GPS backup navigation...", which included instrument landing systems (ILS), long range navigation (LORAN), and inertial navigation systems (INS) [1].

The Volpe report inspired a search for strategies to mitigate the vulnerabilities of the current GPS navigation protocol. Nearly all resulting strategies followed the suggested GPS backup methods that revert to archaic/legacy methods. Unfortunately, these navigational modalities are limited by the range of their land-based transmitters, which are expensive and may not be feasible for remote, hazardous, or adversarial environments. Based on these restrictions, researchers have investigated local methods of estimating position when GPS is denied.

Given the advancements in computer vision and estimation and control theory, monocular camera systems have received growing interest as a local alternative/collaborative sensor to GPS systems. One issue that has inhibited the use of a vision system as a navigational aid is the difficulty in reconstructing inertial measurements from the projected image. Current approaches to estimating the aircraft state through a camera system utilize the motion of feature points in an image. A geometric approach is proposed here (building on our preliminary results [2, 3]) that uses a series of homography relationships to estimate position and orientation with respect to an inertial pose. This approach creates a series of "daisy-chained" pose estimates (see [4] and [5]), in which the current feature points can be related to previously viewed feature points to determine the current coordinates between each successive image. Through these relationships previously recorded GPS data can be linked with the image data to provide position measurements in navigational regions where GPS is denied. The method also delivers an accurate estimation of vehicle attitude, which is an
open problem in aerial vehicle control. The position and attitude (i.e., pose) estimation method can be executed in real time, making it amenable for use in closed-loop guidance control of an aircraft.

The concept of vision-based control for a flight vehicle has been an active area of research over the last decade. Recent literature focused on vision-based state estimation for use in control of a flight vehicle can be categorized by several distinctions. One distinction is that some methods require simultaneous sensor fusion [6, 7], while other methods rely solely on camera feedback [8]. Research can further be categorized into methods that require a priori knowledge of landmarks (such as pattern or shape [9, 10], light intensity variations [11], and runway edges or lights [12, 13]) versus techniques that do not require any prior knowledge of landmarks [14-16].

Another category of research includes methods that require the image features to remain in the field of view (FOV) [14] versus methods that are capable of acquiring new features [17]. Finally, methods can be categorized according to the vision-based technique for information extraction, such as: optic flow [18], simultaneous localization and mapping (SLAM) [19, 20], stereo vision [21], or epipolar geometry [14, 22-24]. This last category might also be delineated between methods that are more computationally intensive and therefore indicative of the level of real-time, on-board computational feasibility.

Methods using homography relationships between images to estimate the pose of an aircraft are presented by Caballero et al. [24] and Shakernia et al. [14] (where it is referred to as the "planar essential matrix"). The method presented by Caballero et al. is limited to flying above a planar environment, and this method creates an image mosaic, which can be costly in terms of memory. Shakernia's approach does not account for feature points entering and exiting the camera FOV. The method developed here is designed for use with a fixed-wing aircraft; thus the method explicitly acquires new feature points when the current features risk leaving the image, and no target model is needed, as compared to other methods [9-13]. The requirement of flying over a constant planar surface is also relaxed to allow flight over piecewise planar patches, which is more characteristic of real-world scenarios.

The efforts used here share some concepts associated with visual SLAM (VSLAM). There is no strict definition of VSLAM, and there are many different approaches. Some authors (e.g., [25-27]) make a distinction between "local VSLAM" and "global VSLAM." In this categorization, local VSLAM is concerned with estimating the current state of the robot and world map through matching visual features from frame to frame, and global VSLAM is concerned with recognizing when features have been previously encountered and with updating estimates of the robot and map (sometimes referred to as "closing loops"). To address both issues, many researchers use invariant features, such as the scale-invariant feature transform (SIFT) [28], which can be accurately matched from frame to frame or from multiple camera viewpoints. Many VSLAM approaches use probabilistic filters (e.g., an extended Kalman filter or a particle filter) [26, 27, 29-31] that typically estimate a state vector composed of the camera/robot position, orientation, and velocity, and the 3D coordinates of visual features in the world frame. The use of epipolar geometry [25, 32] is an alternative to filter-based approaches. A final possible category is methods that build a true 3D map (i.e., a map that is easily interpreted by a human being, such as walls or topography) [25-27, 30, 31] and those that build a more abstract map that is designed to allow the camera/robot to accurately navigate and recognize its location but that is not designed for human interpretation.

The method presented can provide an estimate of the position and attitude of an unmanned air vehicle (UAV), and the method can be extended to map the location of static landmarks in the world frame. Hence this approach can be used in VSLAM of the UAV, with applications toward path planning, real-time trajectory generation, obstacle avoidance, multi-vehicle coordination control and task assignment, etc. By using the daisy-chaining strategy, the coordinates of static features that have moved out of the FOV can also be estimated. The estimates of static features can be maintained as a map, or they can be used as measurements in existing VSLAM methods. The daisy-chaining method in this paper is best suited to a local VSLAM approach, where the global problem of closing loops can be addressed through existing methods. Unlike typical VSLAM approaches, camera/vehicle odometry is not required as an input for the approach in this paper.

To investigate the performance of the developed method, a numerical simulation is provided for a nonlinear, six degrees-of-freedom model of an Osprey UAV. The simulation illustrates the ability of the estimation method to reconstruct the UAV pose in the presence of disturbances, such as errors in the initial altitude measure, image quantization, and noise. To illustrate the potential use of the estimates in a feedback loop, an autopilot was also included in the simulation, with inputs complementary with the outputs from the estimation method. A specific maneuver is created to perform a simultaneous rolling, pitching, and yawing motion of the aircraft, combined with a fixed-mounted camera. The aircraft/autopilot modeling effort and maneuver are intended to test the robustness of the homography-based estimation method as well as to provide proof-of-concept in
using the camera as the primary sensor for achieving closed-loop autonomous flight.

Based on the outcomes from the simulation, the performance of the developed estimation method is also experimentally tested through two flight tests with an Osprey UAV. One experiment compares the estimated inertial coordinates of the UAV to the GPS position reported by two on-board GPS units. The GPS units provide different outputs, but the homography-based estimation is shown to provide approximately equivalent performance. A second experiment is also performed, where precision laser-measured ground markers were viewed by the flight camera. In this experiment, the images of the known marker locations are used to generate a precise ground truth. The estimation method (which did not use any information about marker locations) matches the reconstructed ground truth data.

Section II provides details on pose reconstruction using the homography-based daisy-chaining method. Simulations are presented in Section III to demonstrate the performance of the method, and Section IV includes the experimental results. Section V describes potential future efforts to improve the developed method for broader application.

II. POSE RECONSTRUCTION FROM TWO VIEWS

A. Euclidean Relationships

Consider a body-fixed coordinate frame Fc that defines the position and attitude of a camera with respect to a constant world frame Fw. The world frame could represent a departure point, destination, or some other point of interest. The rotation and translation of Fc with respect to Fw are defined as R(t) ∈ R^{3×3} and x(t) ∈ R^3, respectively. The camera rotation and translation of Fc between two time instances, t0 and t1, are denoted by R01(t1) and x01(t1). During the camera motion a collection of I (where I ≥ 4) coplanar and noncollinear static feature points are assumed to be visible in a plane π. The assumption of four coplanar and noncollinear feature points is only required to simplify the subsequent analysis and is made without loss of generality. Image processing techniques can be used to select coplanar and noncollinear feature points within an image, such as in [33] and [34]. However, if four coplanar target points are not available, then the subsequent development can also exploit a variety of linear solutions for eight or more noncoplanar points (e.g., the classic eight-point algorithm [35, 36] or virtual parallax [37, 38]) or nonlinear solutions for five or more points [39]. A feature point pi(t) has coordinates m̄i(t) = [xi(t), yi(t), zi(t)]^T ∈ R^3, ∀ i ∈ {1, ..., I}, in Fc. Standard geometric relationships can be applied to the coordinate systems depicted in Fig. 1 to develop the following relationships:

Fig. 1. Euclidean relationships between two camera poses.

\bar{m}_i(t_1) = R_{01}(t_1)\,\bar{m}_i(t_0) + x_{01}(t_1) = \underbrace{\left( R_{01}(t_1) + \frac{x_{01}(t_1)}{d(t_0)}\, n(t_0)^T \right)}_{H(t_1)} \bar{m}_i(t_0)    (1)

where H(t1) is the Euclidean homography matrix at time t1, n(t0) is the constant unit vector normal to the plane π from Fc at time t0, and d(t0) is the constant distance between the plane π and Fc along n(t0). After normalizing the Euclidean coordinates as

m_i(t) = \frac{\bar{m}_i(t)}{z_i(t)}    (2)

the relationship in (1) can be rewritten as

m_i(t_1) = \underbrace{\frac{z_i(t_0)}{z_i(t_1)}}_{\alpha_i} H(t_1)\, m_i(t_0)    (3)

where αi ∈ R, ∀ i ∈ {1, ..., I}, is a scaling factor. Further details on the Euclidean homography can be found in [40] and [41].
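To make the algebra in (1)-(3) concrete, the following self-contained numerical sketch (not taken from the paper; the rotation, translation, plane normal, and distance are assumed values) builds H(t1) and checks that it maps coplanar points between the two views exactly as the rigid-body transform does.

```python
# Minimal numerical check of the Euclidean homography relations (1)-(3),
# using assumed values for R01, x01, n(t0), and d(t0).
import numpy as np

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

R01 = rot_z(np.deg2rad(5.0))          # camera rotation between t0 and t1
x01 = np.array([0.2, -0.1, 0.05])     # camera translation between t0 and t1

n0 = np.array([0.0, 0.0, 1.0])        # unit normal of plane pi at t0
d0 = 10.0                             # distance from camera to plane at t0

# Coplanar feature points in Fc at t0 (each satisfies n0 . m = d0).
points_t0 = np.array([[1.0, 1.0, d0],
                      [-1.0, 2.0, d0],
                      [2.0, -1.0, d0],
                      [-2.0, -2.0, d0]])

# Euclidean homography H(t1) = R01 + (x01 / d0) n0^T from (1).
H = R01 + np.outer(x01 / d0, n0)

for m0 in points_t0:
    m1 = R01 @ m0 + x01                # rigid-body transform of the point
    assert np.allclose(m1, H @ m0)     # (1): homography gives the same result
    alpha = m0[2] / m1[2]              # alpha_i = z_i(t0) / z_i(t1)
    # (2)-(3): normalized coordinates are related through alpha_i H(t1).
    assert np.allclose(m1 / m1[2], alpha * (H @ (m0 / m0[2])))

print("Homography relations (1)-(3) verified on synthetic coplanar points.")
```

Any four coplanar, noncollinear points satisfy the same relation, which is what the decomposition discussed in Section IIB exploits.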

B. Projective Relationships

Using standard projective geometry, the Euclidean coordinate m̄i(t) can be expressed in image-space pixel coordinates as pi(t) = [ui(t), vi(t), 1]^T. The projected pixel coordinates are related to the normalized Euclidean coordinates mi(t) by the pin-hole camera model as [40]

p_i = A m_i    (4)

where A is an invertible, upper triangular camera calibration matrix defined as

A \triangleq \begin{bmatrix} a & -a\cos\phi & u_0 \\ 0 & \frac{b}{\sin\phi} & v_0 \\ 0 & 0 & 1 \end{bmatrix}.    (5)

In (5), u0 and v0 ∈ R denote the pixel coordinates of the principal point (the image center as defined by the intersection of the optical axis with the image
plane), a, b ∈ R represent scaling factors of the pixel dimensions, and φ ∈ R is the skew angle between the camera axes. By using (4), the Euclidean relationship in (3) can be expressed as

p_i(t_1) = \alpha_i A H(t_1) A^{-1} p_i(t_0) = \alpha_i G(t_1)\, p_i(t_0).    (6)

Sets of linear equations can be developed from (6) to determine the projective and Euclidean homography matrices G(t1) and H(t1) up to a scalar multiple. Various techniques [41, 42] can be used to decompose the Euclidean homography to obtain αi(t1), n(t0), x01(t1)/d(t0), and R01(t1). The decomposition methods in [41] and [42] generally return two physically valid solutions. The correct solution can be chosen by knowledge of the correct value for the normal vector or by using an image taken at a third pose. The distance d(t0) must be separately measured (e.g., through an altimeter or radar range finder) or estimated using a priori knowledge of the relative feature point locations or as an estimator signal in a feedback control.
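As an illustration only (the paper does not specify an implementation), G(t1) in (6) can be estimated from four or more pixel correspondences and decomposed with an off-the-shelf routine such as OpenCV; the calibration matrix and point lists below are placeholders.

```python
# Illustrative sketch (not the authors' implementation): estimate G(t1) from
# pixel correspondences of the tracked coplanar features and decompose it
# using the camera calibration matrix A.
import numpy as np
import cv2

A = np.array([[800.0, 0.0, 320.0],     # assumed pin-hole calibration (5),
              [0.0, 800.0, 240.0],     # zero skew for simplicity
              [0.0, 0.0, 1.0]])

# Pixel coordinates of the I >= 4 tracked coplanar features at t0 and t1.
p_t0 = np.array([[100., 120.], [420., 110.], [400., 300.], [120., 310.]])
p_t1 = np.array([[110., 125.], [430., 118.], [409., 303.], [131., 318.]])

# Projective homography G(t1) from (6), determined up to a scalar multiple.
G, _ = cv2.findHomography(p_t0, p_t1)

# Decompose into candidate R01, x01/d(t0), and n(t0); OpenCV normalizes by the
# calibration internally (i.e., it works with H = A^{-1} G A).
num, rotations, scaled_translations, normals = cv2.decomposeHomographyMat(G, A)

for R01, x01_over_d, n0 in zip(rotations, scaled_translations, normals):
    print("candidate R01:\n", R01)
    print("candidate x01/d(t0):", x01_over_d.ravel())
    print("candidate n(t0):", n0.ravel())
```

The multiple candidates returned by the decomposition correspond to the ambiguity noted above; the physically consistent solution is kept using knowledge of the plane normal or a third view.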

C. Chained Pose Reconstruction for Aerial Vehicles

Consider an aerial vehicle equipped with a GPS and a camera capable of viewing a landscape. A technique is developed in this section to estimate the position and attitude using camera data when the GPS signal is denied. A camera has a limited FOV, and motion of a vehicle can cause observed feature points to leave the image. The method presented here chains together pose estimations from sequential sets of tracked points. This approach allows the system to halt tracking a set of image features if the set is likely to leave the image and allows the system to begin tracking a new set of features while maintaining the pose estimate. Thus the estimation can continue indefinitely and is not limited by the camera's FOV.

The subsequent development assumes that the aerial vehicle begins operating at time t0, where the translation and rotation (i.e., x0(t0) and R0(t0) in Fig. 2) between Fc and Fw at time t0 are known. The rotation between Fc and Fw at time t0 can be determined through the bearing information of the GPS, along with other sensors such as a gyroscope and/or compass. Without loss of generality, the GPS unit is assumed to be fixed to the origin of the aerial vehicle's coordinate frame, and the constant position and attitude of the camera frame are known with respect to the position and attitude of the aerial vehicle coordinate frame. The subsequent development further assumes that the GPS is capable of delivering altitude, perhaps in conjunction with an altimeter, so that the altitude a(t0) is known.

Fig. 2. Illustration of pose estimation chaining.

Fig. 3. Depth estimation from altitude.

As illustrated in Fig. 2, when the camera is initially at a pose {x0(t0), R0(t0)}, the initial set of tracked coplanar and noncollinear feature points is contained in the plane πa. The plane πa is perpendicular to the unit vector na(t0) in the camera frame, and the plane lies at a distance da(t0) from the camera frame origin. The feature points lying in πa have Euclidean coordinates m̄ai(t0) ∈ R^3, ∀ i ∈ {1, ..., I}, in Fc. An image is captured at this time, resulting in image points with pixel coordinates pa(t0). At time t1 the vehicle has undergone some rotation R01(t1) and translation x01(t1). At this time the points in πa have Euclidean coordinates m̄ai(t1) ∈ R^3, ∀ i ∈ {1, ..., I}, in Fc. Another image is captured, resulting in image points with pixel coordinates pa(t1).

As described in Section IIB, R01(t1) and x01(t1)/da(t0) can be solved from two corresponding images of the feature points pa(t0) and pa(t1). A measurement or estimate for da(t0) is required to recover x01(t1). This estimation is possible with distance sensors or with a priori knowledge of the geometric distances between the points in πa. However, with an additional assumption, it is possible to estimate da(t0) geometrically using altitude information from the last GPS reading and/or an altimeter. From the illustration in Fig. 3, if a(t0) is the height above πa (e.g., the slope of the ground is constant between the feature points and the projection of the plane's location to the ground), then the distance da(t0) can be determined by the projection:

d_a(t_0) = n_a(t_0)^T a(t_0)    (7)

where na(t0) is known from the homography decomposition.

Once R01(t1), da(t0), and x01(t1) have been determined, the rotation R1(t1) and translation x1(t1)
can be determined with respect to Fw as

R_1 = R_0 R_{01}
x_1 = R_{01} x_{01} + x_0.

As illustrated in Fig. 2, at time t1, a new collection of feature points mbi(t1) is visible in the planar patch denoted by πb. Capturing an image of these points gives a set of image points with pixel coordinates pb(t1) that can be tracked over time. At time t2 the sets of points pb(t1) and pb(t2) can be used to determine R12(t2) and x12(t2)/db(t1), which provides the rotation and scaled translation of Fc with respect to Fw. If πb and πa are the same plane, then db(t1) can be determined as

d_b(t_1) = d_a(t_1) = d_a(t_0) + x_{01}(t_1) \cdot n(t_0).    (8)

When πb and πa are the same plane, x12(t2) can be correctly scaled, and R2(t2) and x2(t2) can be computed in a similar manner as described for R1(t1) and x1(t1). These estimates can be propagated by chaining them together at each time instance without further use of the GPS.
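A minimal sketch of the depth bookkeeping in (7) and (8) for this coplanar case is given below; the altitude, normal, and scaled-translation values are assumed for illustration and are not from the simulation or flight data.

```python
# Sketch of the coplanar-case depth bookkeeping in (7)-(8); numeric values
# are assumed for illustration only.
import numpy as np

# Altitude vector a(t0) expressed in the camera frame (camera looking down),
# and the plane normal n_a(t0) recovered from the homography decomposition.
a_t0 = np.array([0.0, 0.0, 60.0])
n_a_t0 = np.array([0.0, 0.0, 1.0])

# (7): distance from the camera to the plane pi_a along n_a(t0).
d_a_t0 = float(n_a_t0 @ a_t0)

# Scaled translation x01/d_a(t0) returned by the decomposition, and the
# recovered metric translation x01(t1).
x01_over_d = np.array([0.01, -0.005, 0.02])
x01_t1 = d_a_t0 * x01_over_d

# (8): if pi_b and pi_a are the same plane, the propagated distance is
# d_b(t1) = d_a(t0) + x01(t1) . n(t0).
d_b_t1 = d_a_t0 + float(x01_t1 @ n_a_t0)

print(f"d_a(t0) = {d_a_t0:.2f} m, d_b(t1) = {d_b_t1:.2f} m")
```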

In the general case pb and pa may not be coplanar, and (8) cannot be used to determine db(t1). If pb and pa are both visible for two or more frames, it is still possible to calculate db(t) through geometric means. Let t1− denote some time before the daisy-chain operation is performed, when both pb and pa are visible in the image. At time t1− an additional set of homography equations can be solved for the points pb and pa at times t1 and t1−:

m_{ai}(t_1) = \alpha_a \left( \bar{R}_1(t_1) + \frac{\bar{x}_1(t_1)\, n_a(t_{1^-})^T}{d_a(t_{1^-})} \right) m_{ai}(t_{1^-})    (9)

m_{bi}(t_1) = \alpha_b \left( \bar{R}_1(t_1) + \frac{\bar{x}_1(t_1)\, n_b(t_{1^-})^T}{d_b(t_{1^-})} \right) m_{bi}(t_{1^-})    (10)

where R̄1(t1) and x̄1(t1) are the rotation and translation, respectively, that the camera undergoes from time t1− to t1, and

\alpha_a = \frac{z_{ai}(t_{1^-})}{z_{ai}(t_1)} \qquad \text{and} \qquad \alpha_b = \frac{z_{bi}(t_{1^-})}{z_{bi}(t_1)}.

Note that R̄1(t1) and x̄1(t1) have the same values in equations (9) and (10), but the distance and normal to the plane are different for the two sets of points. The distance da(t1−) is known from (8), and the scaled translations

x_a(t_1) = \frac{\bar{x}_1(t_1)}{d_a(t_{1^-})} \qquad \text{and} \qquad x_b(t_1) = \frac{\bar{x}_1(t_1)}{d_b(t_{1^-})}

can be recovered from the homography decomposition in (9) and (10). Given da(t1−), xa(t1), and xb(t1), the translation x̄1(t1) can be determined as

\bar{x}_1(t_1) = d_a(t_{1^-})\, x_a(t_1)

Fig. 4. Air and Sea Inc. Osprey UAV used in experiments and modeled for simulations.

and db(t1−) can then be determined as

d_b(t_{1^-}) = \frac{x_b(t_1)^T\, \bar{x}_1(t_1)}{\|x_b(t_1)\|^2}.

The distance db(t1) can then be found by using (8), with db(t1−) in place of da(t0). Alternatively, additional sensors, such as an altimeter, can provide an additional estimate of the change in altitude. These estimates can be used in conjunction with (8) to update depth estimates. Note that an error in estimating the distance signals d does not affect the orientation estimate, but it does cause a scaling error in the translation estimate. The effects of an error in the estimate of d(0) are investigated in Section III.
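The hand-off to a non-coplanar patch can be sketched as follows, using the relations above; the numerical values are synthetic, and the scale of xb is recovered with a least-squares projection consistent with the expression for db(t1−).

```python
# Sketch of the depth hand-off between planar patches when pi_a and pi_b are
# not coplanar, following (9)-(10). The decomposition outputs x_a, x_b and
# the known distance d_a(t1-) are assumed values here.
import numpy as np

d_a = 58.0                             # d_a(t1-), known from (8)
x_bar = np.array([1.5, -0.4, 0.8])     # true camera translation, used only to
                                       # build consistent synthetic inputs
d_b_true = 63.0

x_a = x_bar / d_a                      # scaled translation from (9)
x_b = x_bar / d_b_true                 # scaled translation from (10)

# Recover the metric translation and the distance to the new patch.
x_bar_hat = d_a * x_a                  # x_bar_1(t1) = d_a(t1-) x_a(t1)
d_b_hat = float(x_b @ x_bar_hat) / float(x_b @ x_b)

print(f"recovered translation {x_bar_hat}, d_b(t1-) = {d_b_hat:.2f} m")
```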

III. SIMULATION RESULTS

To facilitate the subsequent flight tests, a high fidelity vehicle simulation was developed to investigate the feasibility of the proposed vision-based state estimation and guidance method. Prior to the simulation, the Osprey fixed-wing UAV by Air and Sea Composites, Inc. (see Fig. 4) was selected as the experimental testbed because of cost and payload capacity factors and because of the fact that the pusher prop configuration is amenable to forward looking camera placement. Given that the Osprey UAV was going to be used as the experimental testbed, a full, nonlinear model of the equations of motion and aerodynamics of the Osprey UAV was developed (see [43] for details regarding the model development) along with an autopilot design. The autopilot design allowed for the vehicle to perform simple commanded maneuvers that an autonomous aircraft would typically be expected to receive from an on-board guidance system. The autopilot was constructed to use state estimates that came from the homography-based algorithm. Preliminary modal analysis of the Osprey vehicle flying at a 60 m altitude at 25 m/s indicated a short-period frequency
ω_sp = 10.1 rad/s and damping ζ_sp = 0.85; a phugoid mode frequency ω_ph = 0.34 rad/s and damping ζ_ph = 0.24; a dutch-roll frequency ω_dr = 4.39 rad/s and damping ζ_dr = 0.19; a roll subsidence time constant of τ_r = 0.08 s; and a spiral mode time-to-double t_td = 39.8 s. These rigid-body frequencies are crucial for the autopilot design as well as for determining what, if any, of the state estimation values coming from the camera and proposed technique are favorable to be used in a closed-loop sense, due to the imminent phase lag associated with the camera frame rate (sampling frequency) and the phase lag resulting from the signal filtering.

In the simulations five patches of 4 feature points are programmed to lie along a 500 m ground track. For simplicity all planar patches lie in the same plane. The task is to perform the state estimation during a maneuver. The commanded maneuver is to simultaneously perform a 10 m lateral shift to the right and a 10 m longitudinal increase in altitude. This particular maneuver results in the vehicle simultaneously pitching, rolling, and yawing, while translating. For simulation purposes the camera is mounted underneath the fuselage looking downwards. The camera model is intended to be representative of a typical 640 × 480 lines of resolution charge-coupled device equipped with a 10 mm lens. To more accurately capture true system performance, pixel coordinates are rounded to the nearest integer to model errors due to camera pixilation effects (i.e., quantization noise), and furthermore a 5% error is added to the estimated vehicle altitude to test robustness. While some modeling errors are present in this analysis in order to address some practical issues, a full error analysis is intentionally avoided since statistical error analysis for camera-based systems is still somewhat ambiguous. Some of the error sources which are present in real-world systems include, but are not limited to: feature point selection/extraction/tracking algorithms, spatial separation of feature points in the image plane, lens aberrations, camera calibration error, coplanarity of 4 feature points, distance to the feature point plane, exposure time (effectively "shutter speed"), rolling shutter effects (if not a global shutter camera), interlacing effects (if not a progressive scan camera), and numerical errors associated with the construction and decomposition of the homography relationships.
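The two disturbance models mentioned above (pixel quantization and a 5% altitude error) amount to the following kind of corruption of the measurements; the calibration numbers and the projection helper are generic placeholders, not the simulation code.

```python
# Sketch of the measurement corruption used to stress the estimator: pixel
# coordinates are rounded to whole pixels and the altitude is biased by 5%.
import numpy as np

A = np.array([[900.0, 0.0, 320.0],   # representative pin-hole calibration for
              [0.0, 900.0, 240.0],   # a 640x480 imager; not the simulated one
              [0.0, 0.0, 1.0]])

def project(A, m_bar):
    """Project a 3D point in the camera frame to pixel coordinates via (4)."""
    p = A @ (m_bar / m_bar[2])
    return p[:2]

def quantize(p):
    """Round to the nearest integer pixel to model quantization noise."""
    return np.round(p)

m_bar = np.array([3.0, -2.0, 60.0])          # feature point in the camera frame
p_ideal = project(A, m_bar)
p_meas = quantize(p_ideal)

altitude_true = 60.0
altitude_meas = 1.05 * altitude_true         # 5% altitude error from the text

print("ideal pixel:", p_ideal, "quantized pixel:", p_meas)
print("measured altitude:", altitude_meas)
```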

The first simulation is designed to test the accuracy of the vision-based estimation. Vision is not used in the feedback for this maneuver, and the estimated pose is compared to the true pose. The results of this preliminary analysis are given in Figs. 5 and 6. The effects of noise are visible, but the estimated pose is accurate.

The second simulation is intended to examine the effects of using the vision-based estimate as a sensor in closed-loop control. This simulation involves replacing the perfect position and attitude measurements, used in the guidance system and autopilot, with position and attitude estimations determined from the vision-based method. The resulting control architecture and sensor suite for this UAV is given in Fig. 7. The noise content of the estimated position and attitude requires filtering prior to being used by the autopilot to prevent the high-frequency noise from being passed to the aircraft actuators. As expected, the noise occurs at 30 Hz and corresponds to the frame rate of the camera. First-order low-pass filters (cutoff frequency as low as 4 rad/s) are used to filter the noise. The noise also prevents effective differentiation of the position and attitude and necessitates the use of rate gyros for yaw and roll damping, as depicted in Fig. 7. The air data system is also included, as shown in Fig. 7, for the initial altitude measurement since it is more accurate for altitude than current GPS solutions. The results of the camera-in-the-loop system performing the same guidance-commanded autonomous maneuver are given in Figs. 8 and 9.
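A first-order low-pass filter of the kind described above, running at the 30 Hz camera rate with a 4 rad/s cutoff, can be sketched as follows; the backward-Euler discretization is an assumption, since the paper does not state how the filter was realized.

```python
# Sketch of first-order low-pass filtering of the 30 Hz pose estimates before
# they reach the autopilot. A backward-Euler discretization is assumed.
import numpy as np

def first_order_lpf(samples, cutoff_rad_s=4.0, rate_hz=30.0):
    """y[k] = y[k-1] + alpha*(x[k] - y[k-1]), alpha set by cutoff and rate."""
    dt = 1.0 / rate_hz
    alpha = cutoff_rad_s * dt / (1.0 + cutoff_rad_s * dt)
    y = np.empty_like(samples, dtype=float)
    y_prev = samples[0]
    for k, x in enumerate(samples):
        y_prev = y_prev + alpha * (x - y_prev)
        y[k] = y_prev
    return y

# Example: a noisy altitude estimate sampled at the camera frame rate.
t = np.arange(0.0, 5.0, 1.0 / 30.0)
noisy_altitude = 60.0 + 0.5 * np.random.randn(t.size)
smooth_altitude = first_order_lpf(noisy_altitude)
print("std before/after filtering:",
      noisy_altitude.std(), smooth_altitude.std())
```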

The simulation results indicate that a camera supplemented with minimal sensors, such as rate gyros and barometric altitude, can be used for completely autonomous flight of a fixed-wing vehicle; however, some residual oscillation effects due to noise are present in the vehicle attitude response. A majority of the noise source can be directly attributed to camera pixilation effects and the corresponding phase lag introduced by the first-order filtering.

To test the performance of the system over long time periods, a Monte Carlo analysis is performed. This analysis includes a sequence of constant altitude "s-curves" for 1, 5, 10, 15, 30, 45, and 60 sec. Each s-curve takes 1 min to perform, and the UAV's forward velocity is constant at 10 m/s. Quantization noise is included. Each simulation is performed 1000 times, and the mean and standard deviation of the pose and rotation error are plotted versus time and total distance traveled and total rotation performed. Total distance refers to the length of the curve followed, not the straight distance from start to finish. Similarly, as the rotation evaluates to sinusoidal values to achieve the s-curve, the total rotation refers to the rms of the angular values over time.

The results are seen in Figs. 10 and 11. The expected error for both simulations is roughly linear with the time/distance travelled. The expected pose error after an hour of flight and a travelled distance of 30 km is 0.66 km. The expected rotation error over this same time period is 0.21 rad. This amounts to an error of 2.2% of distance travelled and 0.39% of total rotation performed.

Fig. 5. Actual aircraft position versus estimated position. State estimation only.

Fig. 6. Actual aircraft attitude versus estimated attitude. State estimation only.

Fig. 7. Autonomous control architecture.

IV. EXPERIMENTAL RESULTS

Based on the results of the simulation, two sets of experiments are conducted to establish the feasibility of the proposed vision-based state estimation method. To give broad results, each experiment features different equipment and different methods to generate a ground truth comparison. In both flight tests artificial, red markers (i.e., fiducial features) are placed along a stretch of the runway. The runway is fairly flat, thus each patch of points is roughly coplanar, although this was not assumed to be true in the experiment. Due to the orientation changes of the aircraft during the flight, the normal vector n is estimated at each handoff. A camera mounted on the radio controlled UAV (Fig. 4) records the view of the runway; video data is captured using a digital video tape recorder and analyzed offline. Sample video frames from the flights are provided in Figs. 14 and 16. The features are extracted and tracked using a color-based thresholding method. Fig. 12 represents a 3D surface constructed from a nonlinear combination of the red, green, and blue color space values of a particular image frame. Note that the large spikes correspond to the location of the red landmarks. An example of the output of the tracking algorithm is also given in Fig. 13. The four trajectories in this particular case represent the path of the landmarks from the first patch as it enters and exits the FOV (from top to bottom). As a point of interest, note that the center of the image plane does not correspond to the location of the optical axis. This apparent oddity is in keeping with the camera calibration results, and it is most likely congruous with a low-end imager and lens.

Fig. 8. Actual aircraft position versus estimated position in closed-loop control.

Fig. 9. Actual aircraft attitude versus estimated attitude in closed-loop control.


A. Daisy Chained Homography versus GPS

The first flight experiment compares position estimates from the daisy-chained pose estimation to the position estimated using a Garmin GPS 35 receiver. A copy of the video is overlaid with GPS, as seen in Fig. 14, which shows the effects of video interlacing. A second GPS unit (manufactured by Eagle Tree Systems, LLC) is also onboard to test inter-GPS accuracy. The use of two GPS units provides a comparison for the vision-based method, which is intended to compute GPS-like information. Poor image quality, including focus, motion blur, and interlacing of the digital video, necessitates a downsampling of the feature extraction, by a factor of five, to reduce noise. The downsampling results in a 5 Hz input signal.

Fig. 10. Error plots of Monte Carlo analysis, showing mean position error and standard deviation of position error over long time period and long distance. Position error refers to norm of difference between known 3D position vector and estimated 3D position vector.

Results of the experiment are given in Fig. 15. In the legend for Fig. 15, GPS2 represents the overlaid GPS data, and GPS1 represents the onboard data logger GPS values. A "*" on the time axis indicates a time when a daisy-chaining hand-off is performed, and pose reconstruction is performed using a new set of feature points. The results from this test appear to be very promising.

Fig. 11. Error plots of Monte Carlo analysis, showing mean orientation error and standard deviation of orientation error over long time period and long distance. Orientation error refers to norm of difference between known 3D orientation and estimated 3D orientation, where orientation is represented as an angle/axis vector.

Fig. 12. Example of 3D contour plot generated from matrix designed as nonlinear combination of red and green color space matrices.

Significant disagreement exists between the two GPS measurements (low-cost GPS receivers are notorious for having a poor altitude measurement [44]), and the vision-based estimation remains proportionate to the two GPS measurements.

Fig. 15. Results from first experimental flight test. Estimated position is compared with two GPS signals.

Fig. 13. Image plane trajectories made by landmarks from patch 1 entering and exiting FOV.

Fig. 14. Single video frame with GPS overlay illustrating landmarks placed along inside edge of runway.

Furthermore, the estimates agree closely with GPS2 for downrange and crossrange translation, and with GPS1 for altitude translation. There is no discernible discontinuity or increased error at the daisy-chain hand-off times. Note that the resolution of the vision-based estimation (5 Hz) is also higher than that of both GPS units (1 Hz). The image processing and pose estimation code can be executed in real time (> 30 Hz) on a Pentium M laptop (single core) with 1 GB RAM. The disagreement between the two GPS
units raises concerns about their accuracy. This, in part, inspired the experiment in Section IVB.

B. Daisy Chained Homography versus Precise Geometric Reconstruction

The accuracy of the recorded GPS data proved to be questionable, as the two GPS records had significant disagreement. A second flight test was designed that was intended to enable the vehicle position to be determined with greater accuracy than what could be achieved with the inexpensive GPS units used in the first round of testing. An additional drawback with the first flight test was that, due to cost constraints, an inertial measurement unit (IMU) was not employed to capture attitude information. In addition to the issues with the GPS output being considered as truth data, it was also of interest to investigate whether using a wider angle FOV lens on a more forward pointing camera could improve the results. The fundamental assumption of this additional testing was that if the locations of the ground targets were known precisely, then vehicle position and attitude could be determined through geometric reconstruction. With the trajectories of the four landmarks of each patch recorded, vehicle localization and orientation could then be performed offline for analysis purposes. While this geometric reconstruction was also based on image data, the precise knowledge of the fiducial position in 3D world coordinates allowed for a very accurate estimate. If the geolocation of the points were known (e.g., through previous physical mapping or template matching with accurate satellite images), this would represent an accurate means to geolocate the UAV. In essence, this experiment compared the best vision-based estimate, given all available knowledge of the precise Euclidean position of the fiducial markers, with the daisy-chained homography estimate, which required no knowledge other than initial position and attitude. Details on the geometric reconstruction of attitude and position information for the ground truth in this experiment are described in [43]. A similar procedure using three points that remained in the FOV is described in [15].

By using a wider FOV lens, the resulting video image exhibited severe lens distortion effects that required correcting in order for the homography to operate correctly. The distortion can be seen in Fig. 16, where the four landmarks in the lower portion of the frame appear to be located on a spherical surface as opposed to a plane. This same image, with the radial distortion removed, is given in Fig. 17. The interlacing effect, prevalent in Fig. 14, has also been removed using the freeware video editing software VirtualDub.¹

Fig. 16. Single video frame from second flight test experiment illustrating effect of forward-looking, wider FOV camera.

Fig. 17. Single video frame from second flight test experiment with lens distortion removed.
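Radial distortion removal of the kind required here can be sketched with standard tools; the calibration matrix and distortion coefficients below are placeholders, and OpenCV is used only as an example (the paper does not state which software performed the correction).

```python
# Illustrative sketch of removing radial lens distortion from a video frame
# before homography estimation. Calibration and distortion values are assumed.
import numpy as np
import cv2

A = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
dist_coeffs = np.array([-0.35, 0.12, 0.0, 0.0, 0.0])   # k1, k2, p1, p2, k3

# A synthetic gray frame stands in for a captured video frame.
frame = np.full((480, 640, 3), 128, dtype=np.uint8)
undistorted = cv2.undistort(frame, A, dist_coeffs)
print("undistorted frame shape:", undistorted.shape)
```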


Results of this flight experiment are given in Figs. 18-21. The initial value of d(0) is estimated from the detailed knowledge of the first four points, but subsequent estimates of d(t) are generated using the methods in Section IIC. A "*" in the plot indicates a time when a daisy-chaining hand-off was performed and when pose reconstruction is performed using a new set of feature points. The position estimate of the daisy-chained homography agrees closely with the position estimate of the geometric reconstruction. The attitude estimate of both methods shows much more variation than the translation estimates, and there is less agreement between the two estimates. Attitude is expressed in roll/pitch/yaw angles. Despite the disagreement, the relative shape of both signals is similar, and the homography tracks the geometric reconstruction well, with the exception of the first 3 s of the roll angle and a possible drift in the heading angle towards the end of the flight. Since the developed estimation is based on chaining successive approximations, some error propagation is inevitable. (See Figs. 10 and 11.)

¹VirtualDub, copyright (c) by Avery Lee, http://www.virtualdub.org/index.html.

Fig. 18. Results from second experimental flight test. Estimated position is compared to construction using complete geometric knowledge of ground targets.

Fig. 19. Results from second experimental flight test. Estimated attitude and bearing is compared to construction using complete geometric knowledge of ground targets.

The focus of this study was to investigate the feasibility of using the camera feedback for pose estimation. Future efforts will focus on quantifying the error propagation, and these efforts will investigate methods that may be used to mitigate the error propagation, such as collaborative sensing between multiple UAVs, sensor integration with other sensors, error correction by sporadic GPS feedback, etc.

V. CONCLUSIONS

The efforts in this paper integrated new vision-based pose estimation methods for guidance, navigation, and control of an aerial vehicle. This method is based on epipolar geometry, with a novel "daisy-chaining" approach that allows image features to enter and leave the FOV while maintaining pose estimation. Furthermore, no known target is required. A nonlinear aircraft flight model for an Osprey UAV was developed to investigate the performance of the developed method in simulations. Simulations also include testing the pose estimation method in closed-loop control of the aircraft.

Fig. 20. Results from second experimental flight test shows disagreement between daisy-chained position estimate and position estimation using complete geometric knowledge of ground targets.

Fig. 21. Results from second experimental flight test shows disagreement between daisy-chained orientation estimate and orientation estimation using complete geometric knowledge of ground targets.

Based on the successful simulation results, two flight tests were also performed using a radio controlled Osprey UAV. The first flight test indicated that the estimation method provided pose estimates approximately equal to those of the two GPS units. However, conclusive results were difficult to obtain due to discrepancies in the GPS units and due to lack of attitude information. The second flight test used precise information about the Euclidean position of fiducial markers as a ground truth for the estimation method. The daisy-chaining estimation method was successfully demonstrated to reconstruct both position and attitude measurements in comparison to the reconstructed ground truth using detailed geometric knowledge.

In addition to providing pose estimates for control, the daisy-chaining method in this paper is suitable for use as a localization algorithm in local VSLAM applications. It is feasible to keep track of the location of previously tracked feature points, resulting in a map useful for global VSLAM applications. Future efforts will work to further develop such VSLAM
applications. Future research will also investigate methods to mitigate the inevitable error propagation through the inclusion of additional sensors, such as IMUs and intermittent GPS. Additional future work will focus on determining what, if any, bounds can be placed on estimate errors over time and under various common error sources.

REFERENCES

[1] J. A. V. N. T. S. Center, Vulnerability assessment of the transport infrastructure relying on the Global Positioning System, Office of the Assistant Secretary for Transportation Policy, U.S. Department of Transportation, Report, Aug. 2001.
[2] Kaiser, K., Gans, N., and Dixon, W., Position and orientation of an aerial vehicle through chained, vision-based pose reconstruction, in Proceedings of the AIAA Conference on Guidance, Navigation and Control, 2005.
[3] Kaiser, K., Gans, N., and Dixon, W., Localization and control of an aerial vehicle through chained, vision-based pose reconstruction, in Proceedings of the American Control Conference, 2007, 5934-5939.
[4] Hu, G., Mehta, S., Gans, N., and Dixon, W. E., Daisy chaining based visual servo control, Part I: Adaptive quaternion-based tracking control, in Proceedings of the IEEE Multi-conference on Systems and Control, 2007, 1474-1479.
[5] Hu, G., Gans, N., Mehta, S., and Dixon, W. E., Daisy chaining based visual servo control, Part II: Extensions, applications and open problems, in Proceedings of the IEEE Multi-conference on Systems and Control, 2007, 729-734.
[6] Chatterji, G., Menon, P., and Sridhar, B., GPS/machine vision navigation system for aircraft, IEEE Transactions on Aerospace and Electronic Systems, 33, 3 (July 1997), 1012-1025.
[7] Roberts, J. M., Corke, P. I., and Buskey, G., Low-cost flight control system for a small autonomous helicopter, in Proceedings of the Australasian Conference on Robotics and Automation, 2002.
[8] Zhang, H. and Ostrowski, J., Visual servoing with dynamics: Control of an unmanned blimp, in Proceedings of the IEEE International Conference on Robotics and Automation, vol. 1, May 1999, 618-623.
[9] Sharp, C. S., Shakernia, O., and Sastry, S. S., A vision system for landing an unmanned aerial vehicle, in Proceedings of the IEEE International Conference on Robotics and Automation, 2001, 1720-1727.
[10] Jones, C. G., Heyder-Bruckner, J. F., Richardson, T. S., and Jones, D. C., Vision-based control for unmanned rotorcraft, in Proceedings of the AIAA Guidance, Navigation, and Control Conference, 2006.
[11] Saripalli, S., Montgomery, J. F., and Sukhatme, G. S., Visually guided landing of an unmanned aerial vehicle, IEEE Transactions on Robotics and Automation, 19, 3 (2003), 371-380.
[12] Chatterji, G. B., Menon, P. K., and Sridhar, B., Vision-based position and attitude determination for aircraft night landing, Journal of Guidance, Control, and Dynamics, 21, 1 (1998), 84-92.
[13] Liu, T. and Fleming, G., Videogrammetric determination of aircraft position and attitude for vision-based autonomous landing, presented at the AIAA Aerospace Sciences Meeting and Exhibit, 2006.
[14] Shakernia, O., Ma, Y., Koo, T. J., and Sastry, S., Landing an unmanned air vehicle: Vision based motion estimation and non-linear control, Asian Journal of Control, 1 (1999), 128-145.
[15] Yakimenko, O., Kaminer, I., Lentz, W., and Ghyzel, P., Unmanned aircraft navigation for shipboard landing using infrared vision, IEEE Transactions on Aerospace and Electronic Systems, 38, 4 (Oct. 2002), 1181-1200.
[16] Koch, A., Wittich, H., and Thielecke, F., A vision-based navigation algorithm for a VTOL-UAV, presented at the AIAA Guidance, Navigation, and Control Conference, 2006.
[17] Jianchao, Y., A new scheme of vision based navigation for flying vehicles–concept study and experiment evaluation, in Proceedings of the IEEE International Conference on Control, Automation, Robotics and Vision, 2002, 643-648.
[18] Barber, D. B., Griffiths, S. R., McLain, T. W., and Beard, R. W., Autonomous landing of miniature aerial vehicles, Journal of Aerospace Computing, Information, and Communication, 4 (2007), 770.
[19] Bryson, M. and Sukkarieh, S., Observability analysis and active control for airborne SLAM, IEEE Transactions on Aerospace and Electronic Systems, 44, 1 (Jan. 2008), 261-280.
[20] Kim, J. and Sukkarieh, S., Autonomous airborne navigation in unknown terrain environments, IEEE Transactions on Aerospace and Electronic Systems, 40, 3 (July 2004), 1031-1045.
[21] Trisiripisal, P., Parks, M. R., Abbot, A. L., Liu, T., and Fleming, G. A., Stereo analysis for vision-based guidance and control of aircraft landing, presented at the AIAA Aerospace Sciences Meeting and Exhibit, 2006.
[22] Wu, A. D., Johnson, E. N., and Proctor, A. A., Vision-aided inertial navigation for flight control, Journal of Aerospace Computing, Information, and Communication, 2 (2005), 348-360.
[23] Prazenica, R. J., Watkins, A., Kurdila, A. J., Ke, Q. F., and Kanade, T., Vision-based Kalman filtering for aircraft state estimation and structure from motion, presented at the AIAA Guidance, Navigation, and Control Conference, 2005.
[24] Caballero, F., Merino, L., Ferruz, J., and Ollero, A., Improving vision-based planar motion estimation for unmanned aerial vehicles through online mosaicing, in Proceedings of the IEEE International Conference on Robotics and Automation, 2006, 2860-2865.
[25] Se, S., Lowe, D., and Little, J., Global localization using distinctive visual features, in Proceedings of the IEEE/RAS International Symposium on Robotics and Automation, 2002, 226-231.
[26] Eustice, R., Singh, H., Leonard, J., Walter, M., and Ballard, R., Visually navigating the RMS Titanic with SLAM information filters, in Proceedings of Robotics: Science and Systems, June 2005.
[27] Jensfelt, P., Kragic, D., Folkesson, J., and Bjorkman, M., A framework for vision based bearing only 3D SLAM, in Proceedings of the IEEE International Conference on Robotics and Automation, 2006, 1944-1950.
[28] Lowe, D., Object recognition from local scale-invariant features, in Proceedings of the IEEE International Conference on Computer Vision, 1999, 1150-1157.
[29] Davison, A. J., Reid, I. D., Molton, N. D., and Stasse, O., MonoSLAM: Real-time single camera SLAM, IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 6 (2007), 1052-1067.
[30] Kim, J-H. and Sukkarieh, S., Airborne simultaneous localisation and map building, in Proceedings of the IEEE International Conference on Robotics and Automation, 2003.
[31] Jung, I-K. and Lacroix, S., High resolution terrain mapping using low altitude aerial stereo imagery, in Proceedings of the IEEE International Conference on Computer Vision, 2003, 946-951.
[32] Goncalves, L., di Bernardo, E., Benson, D., Svedman, M., Ostrowski, J., Karlsson, N., and Pirjanian, P., A visual front-end for simultaneous localization and mapping, in Proceedings of the IEEE International Conference on Robotics and Automation, 2005, 44-49.
[33] Baillard, C. and Zisserman, A., Automatic reconstruction of planar models from multiple views, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1999, 559-565.
[34] Okada, K., Kagami, S., Inaba, M., and Inoue, H., Plane segment finder: Algorithm, implementation and applications, in Proceedings of the IEEE International Conference on Robotics and Automation, 2001, 2120-2125.
[35] Longuet-Higgins, H., A computer algorithm for reconstructing a scene from two projections, Nature, 293 (Sept. 1981), 133-135.
[36] Hartley, R., in Computer Vision: Proceedings of the 1992 European Conference on Computer Vision, Lecture Notes in Computer Science.
[37] Boufama, B. and Mohr, R., Epipole and fundamental matrix estimation using virtual parallax, in Proceedings of the IEEE International Conference on Computer Vision, 1995, 1030-1036.
[38] Malis, E. and Chaumette, F., 2 1/2 D visual servoing with respect to unknown objects through a new estimation scheme of camera displacement, International Journal of Computer Vision, 37, 1 (2000), 79-97.
[39] Nister, D., An efficient solution to the five-point relative pose problem, IEEE Transactions on Pattern Analysis and Machine Intelligence, 26 (2004), 79-97.
[40] Ma, Y., Soatto, S., Kosecka, J., and Sastry, S., An Invitation to 3-D Vision, New York: Springer, 2004.
[41] Faugeras, O. D. and Lustman, F., Motion and structure from motion in a piecewise planar environment, International Journal of Pattern Recognition and Artificial Intelligence, 2, 3 (1988), 485-508.
[42] Zhang, Z. and Hanson, A., 3D reconstruction based on homography mapping, presented at the ARPA Image Understanding Workshop, Palm Springs, CA, 1996.
[43] Kaiser, M. K., Vision-based estimation, localization, and control of an unmanned aerial vehicle, Ph.D. dissertation, Department of Mechanical and Aerospace Engineering, University of Florida, Gainesville, FL, May 2008.
[44] Kim, J. and Sukkarieh, S., A baro-altimeter augmented INS/GPS navigation system for an uninhabited aerial vehicle, in Proceedings of the International Symposium on Satellite Navigation Technology Including Mobile Positioning and Location Services, Melbourne, Australia, 2003.

Michael Kent Kaiser received his B.S. degree in aerospace engineering from Auburn University, Auburn, AL, in 1990, an M.S. degree in mechanical engineering from the University of Houston, Houston, TX, in 1995, and a Ph.D. in aerospace engineering from the University of Florida, Gainesville, in 2008.

He has nearly 20 years of work experience within the aerospace industry, which includes his current employment with Mosaic ATM, as well as previous employment with the Air Force Research Labs (AFRL) and Lockheed Martin Corporation in Houston, TX, Palmdale, CA, and Marietta, GA. Some of his research interests include system identification, aerodynamic performance based vehicle design, flight dynamics and control, and vision-based navigation, guidance, and control.

Nicholas Gans (M'05) received his B.S. degree in electrical engineering from Case Western Reserve University, Cleveland, OH, in 1999. He earned his M.S. in electrical and computer engineering and his Ph.D. in systems and entrepreneurial engineering from the University of Illinois at Urbana-Champaign in December 2005.

After completing his doctoral studies he worked as a postdoctoral researcher with the Mechanical and Aerospace Engineering Department at the University of Florida. He is currently a postdoctoral associate with the National Research Council, where he conducts research for the Air Force Research Laboratory at Eglin AFB in Florida. His research interests include nonlinear and adaptive control, with focus on vision-based control and estimation, robotics, and autonomous vehicles.

Dr. Gans is a member of the IEEE Robotics and Automation Society (RAS) and the IEEE Control System Society (CSS), and he has served on the program committees for several conferences.

Warren Dixon (M'00-SM'05) received his Ph.D. degree in 2000 from the Department of Electrical and Computer Engineering at Clemson University, Clemson, SC.

After completing his doctoral studies, he was selected as a Eugene P. Wigner Fellow at Oak Ridge National Laboratory (ORNL), where he worked in the Robotics and Energetic Systems Group. In 2004 he joined the faculty of the University of Florida in the Mechanical and Aerospace Engineering Department. Dr. Dixon's main research interest has been the development and application of Lyapunov-based control techniques for uncertain nonlinear systems.

Dr. Dixon has published 2 books, an edited collection, 4 chapters, and over 180 refereed journal and conference papers. He was awarded the 2001 ORNL Early Career Award for Engineering Achievement for his contributions to Lyapunov-based control methods. He was awarded the 2004 DOE Outstanding Mentor Award for his student advising at ORNL. He was awarded an NSF CAREER award, in 2006, for new development and application of Lyapunov-based control methods. He was also awarded the 2006 IEEE Robotics and Automation Society (RAS) Early Academic Career Award. He serves on the IEEE CSS Technical Committee on Intelligent Control, is a primary member of the ASME DSC Division Mechatronics Technical Committee, is a member of numerous conference program committees, and serves on the conference editorial board for the IEEE CSS and RAS and the ASME DSC. He served as an appointed member to the IEEE CSS Board of Governors for 2008. He is currently an associate editor for IEEE Transactions on Systems, Man and Cybernetics: Part B Cybernetics, Automatica, International Journal of Robust and Nonlinear Control, and Journal of Robotics.
