
Autonomous Airborne Video-Aided Navigation

KYUNGSUK LEE, JASON M. KRIESEL, and NAHUM GAT
Opto-Knowledge Systems, Inc. (OKSI), Torrance, CA 90502

Received December 2008; Revised June 2010

ABSTRACT: We present an autonomous airborne video-aided navigation system that uses video from an onboard camera, data from an IMU, and digitally stored georeferenced landmark images. The system enables self-contained navigation in the absence of GPS. Relative position and motion are tracked by comparing simple mathematical representations of consecutive video frames. Periodically, a single image frame is compared to the landmark image to determine absolute position and correct for any possible drift or bias in calculating the relative motion. This paper describes the computational approach, test flight hardware, and test results obtained using actual flight data. The techniques are designed to be used for UAVs, cruise missiles, or smart munitions, and provide a cost-effective system for navigation in GPS-denied environments.

INTRODUCTION

The U.S. Department of Defense relies heavily on GPS for both targeting and navigation by soldiers, manned and unmanned ground and aerial platforms, and guided munitions. However, GPS signals are susceptible to jamming and can be difficult to utilize in certain locations, such as "urban canyons." A high-quality, low-cost backup to GPS is needed so that the targeting and navigation capabilities of the U.S. warfighter are not compromised under adverse conditions.

Inertial Navigation Systems (INS), Attitude Heading Reference Systems (AHRS), and Inertial Measurement Units (IMU) use gyros, accelerometers, and magnetometers to track motion and determine heading. While such systems can be used as GPS alternatives, the sensors are known to suffer from drift and random walk errors during long-duration operations. Position and attitude determination via double integration of accelerometers and gyros is particularly susceptible to errors. Errors accumulate over time and can produce relatively large and unacceptable mistakes in navigation. High-end INS devices take great pains to monitor the drift and bias and can do fairly well over short distances, but these systems are prohibitively expensive (cost > $20k) for use onboard low-cost platforms or expendable munitions. In addition, even these higher cost inertial devices do not have a means to make an absolute verification of the location of the platform at any time, and must "blindly" trust the dead reckoning calculations.

Because of these limitations, reasonably priced inertial systems alone do not provide a reliable backup solution to GPS. Thus, a reliable, low-cost alternative to inertial systems is needed for situations where GPS is jammed or otherwise unavailable. Such an alternative system would allow munitions and other platforms to navigate and target in GPS-denied environments, and would provide an option for lunar and planetary surface explorations where GPS is simply not available.

Our solution to this need is a video-aided navigation system (VANS) that does not rely on GPS and can work with a low-cost inertial-type device (and appears promising for use without an inertial system altogether). A sequence of video images contains large amounts of information that can be used for vehicle navigation and control, object detection and identification, obstacle avoidance, and many other tasks. Unlike radar- or laser-based systems, computer vision is passive and emits no external signals. As a result, vision systems can operate undetectably in hostile environments. Video camera(s), inertial device(s), batteries, etc., are generally already part of the payload of a typical Unmanned Aerial Vehicle (UAV); therefore, the additional size, weight, power, and cost requirements are minimal.

PREVIOUS WORK

There have been many previous efforts and research projects related to computer vision-based airborne navigation systems. Most of these have focused on feature tracking and optical flow-based methods to estimate platform motion, and they compute the platform positions through the registration of images taken at multiple views (e.g., a video sequence and/or landmark images). They usually utilize a variety of sensor systems to take advantage of their coupling effect or to compensate for the weaknesses of each system.

There have been several studies incorporating imaging systems with IMU or GPS/IMU navigation systems [1–4]. While some of the previous efforts ([1] and [2]) are of limited use where GPS is not available, other efforts (such as [3] and [4]) have shown the benefit of fusing imaging and inertial systems in GPS-denied environments for improved performance over inertial-only navigation systems. These efforts use a stochastic feature tracking method, employing a Kalman filter for feature correspondence searches between images. The advantage of this method is that one can track and minimize the INS navigation error by analyzing the discrepancy between INS-predicted feature positions and image-coregistered feature positions, where the discrepancy serves in the Kalman filter as INS error samples obtained from a source independent of the INS itself.

Other efforts on feature tracking-based methods have been investigated based on well-known solutions for 3-D scene reconstruction [5], theories in camera calibration and image registration [6], or motion estimation [7] to provide relative platform positions.

A more accurate absolute position of a platform can also be estimated by tracking/matching the 2-D projections of located landmark features to platform sensor images [8, 9], or by reconstructing the terrain map from multiple images to compare with reference data such as Digital Elevation Models (DEMs) [10]. These techniques are computationally intensive and are limited to navigation over areas where landmark imagery or a DEM is available.

Compared to previous techniques, our video-aided navigation technique is based on relatively simple and fast processes. Unlike other techniques that rely primarily on inertial measurements, in principle, our technique does not even require an IMU. It can work using only a sequence of video frames, an altimeter, and Digital Terrain Elevation Data (DTED) to estimate camera-pointing positions, where the camera-pointing position is the position on the ground where the extended camera optical axis intersects the terrain. The camera-pointing position does not necessarily match the actual platform track, because the platform attitude may be such that the camera does not point straight down. Inertial measurements are therefore useful in determining the platform attitude changes, enabling conversion of changes in the camera-pointing position into changes in ground position. The significance of the INS data in the current technique is that the primary motion calculation is accomplished with the video data, not the INS, and only rate values are used from the inertial system, which eliminates the need for an expensive INS.

The technique described here is divided into two basic modes of operation, Relative Navigation (RelNav) and Absolute Navigation (AbsNav). This combined approach avoids the excessive computational time that would occur when using only absolute position determination, and the potential inaccuracies of using only relative position determination.

RelNav uses video sequences to track the camera-pointing position and update the current position. This algorithm is computationally fast and can execute in real time at video frame rates; however, this mode may suffer from accumulated errors over a long track due to the resolution of the cameras, the inherent distortions in video imagery, and the use of a low-cost IMU. Therefore, AbsNav runs periodically to update the platform's position (latitude/longitude/altitude) and attitude (i.e., roll/pitch/yaw) by correcting errors accumulated by RelNav and by providing an absolute reference to landmark imagery. The two modes are described in the next two sections, respectively, and results obtained by applying the techniques to actual flight data are presented in the third. The techniques were initially developed under Small Business Innovation Research projects in 2002, and more specific details of how the techniques were developed, as well as data from progressive tests, can be found in the technical reports [11] and [12].

The goal of the work presented here is to enable the airborne platform to continue navigating without GPS by using an onboard camera, an inexpensive INS (such as an IMU), an altimeter, and an occasional comparison to landmark imagery. The techniques provide autonomous navigation capabilities for small platforms such as UAVs or Micro Air Vehicles (MAVs).

RELATIVE NAVIGATION

The RelNav algorithm, depicted in a block diagram in Figure 1, is applied to real-time streaming video from an onboard camera. The algorithm compares successive frames in a video sequence and determines the change in the camera-pointing position from one frame to the next, as illustrated in Figure 2. The algorithm uses inertial measurements to remove the perceived change in motion due to changes in the attitude of the platform, as opposed to actual changes in platform position.


For example, for the images in Figure 2 the platform is moving primarily in the +y direction, and the perceived motion in the x direction is actually due to roll motion. The image distortion between successive video frames due to the change of attitude can, in principle, be calculated using a 3D projection from ground to image plane with an affine transform. However, the image distortion can also be more simply approximated by rotation and translation, which is the approximation technique that RelNav uses to quickly determine the change in camera-pointing position. The change in the actual platform position is then computed with an attitude correction using IMU rate data.

Rotation Extraction

To calculate the relative rotation between frames, a Radon transformation is applied to subsampled circular portions of the images. This circular sampling avoids the edge-content mismatch that arises with rectangular images, an issue that often occurs when a Fourier transform technique is applied directly to the image data. The Radon transform [13] is the sum of the pixels along a ray, parameterized by s, at radius ρ from the origin and angle of inclination θ. The Radon operator maps the image domain I(x, y) to the Radon domain, or ray-sum image, R(θ, ρ), in which a point corresponds to the sum of the pixels along a ray in the image domain.

R(\theta, \rho) = \int_{-r}^{r} I(\rho\cos\theta - s\sin\theta,\ \rho\sin\theta + s\cos\theta)\, ds \qquad (1)

It is noted that two images that are rotated relative to each other produce two "ray-sum images" that differ only by a linear shift. This idea can be used to detect the rotation angle between two similar images by transforming them into the Radon domain and then extracting the shift (translation) between the two ray-sum signals. In practice, the ray-sum signal R(θ, 0) is used instead of the full signal to save processing time, yet still maintain comparable performance.

Figure 3(a) shows an input image with rays sampled at equally spaced angular directions 30° apart, and Figure 3(b) shows the same image rotated by −30°. The ray-sum array for image (a) consists of the sums (S1, S2, S3, S4, S5, S6), while that of the rotated image (b) consists of the sums (S2, S3, S4, S5, S6, S1). Thus the only difference between the two ray-sum signals is that the signal of the latter image (b) is shifted from that of the former image (a) by "−1" translation units.

Fig. 2–Three successive frames in a video sequence showing the change in the camera-pointing position from one frame to the next, along with corresponding cartoon representations of an aircraft. The change due to platform motion (+y direction) and roll motion (x direction) is calculated using image matching techniques and inertial data.

Fig. 1–Block Diagram of the Relative Navigation code.


Here one translation unit is equivalent to a 30° rotation. In practice, angular sampling can be done at a much higher resolution so that one translation unit would typically be 1°, 0.5°, or even less if necessary, depending on the spatial resolution of the video images. In the case where the translation unit is 1°, the ray-sum signal would be a 1-D array consisting of 180 elements, represented as (S1, S2, S3, ..., S180), and theoretically the system would be able to resolve rotations between images down to 1°.
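As a concrete illustration of the ray-sum idea, the following is a minimal sketch (not the authors' code) of how the signal R(θ, 0) could be computed by summing pixels along rays through the image center; the function name and parameters are illustrative.

```python
import numpy as np

def ray_sum_signal(image, num_angles=180, radius=None):
    """Ray-sum signal R(theta, 0): sums of the pixels of a grayscale image
    along rays through the image center, sampled at num_angles equally
    spaced angles over 180 degrees. With num_angles = 180, one translation
    unit corresponds to a 1-degree rotation."""
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    if radius is None:
        radius = int(min(cx, cy))            # stay inside the circular region
    s = np.arange(-radius, radius + 1)       # samples along each ray
    signal = np.empty(num_angles)
    for i in range(num_angles):
        theta = np.pi * i / num_angles
        # Ray through the center (rho = 0): (x, y) = center + s * (-sin, cos)
        x = np.clip(np.round(cx - s * np.sin(theta)).astype(int), 0, w - 1)
        y = np.clip(np.round(cy + s * np.cos(theta)).astype(int), 0, h - 1)
        signal[i] = image[y, x].sum()
    return signal
```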

To detect the shift between the two ray-sum signals, we assume that two ray-sum signals, S(n) and T(n), centered at the same point but rotated from one another, are periodic translations of one another such that T(n) = S(n + k). Then the Fourier transforms of these two ray-sum signals have the same magnitude but different phase. The relationship between the Fourier transforms of S and T is described as:

\mathrm{FFT}(T) = \mathrm{FFT}(S) \cdot e^{j 2\pi k u} \qquad (2)

By the shift theorem, the difference k can be detected in terms of a delta function as follows:

\mathrm{FFT}^{-1}\!\left\{ \frac{\mathrm{FFT}(S(n)) \cdot \mathrm{FFT}^{*}(T(n))}{\left| \mathrm{FFT}(S(n)) \cdot \mathrm{FFT}^{*}(T(n)) \right|} \right\} = \delta(n - k) \qquad (3)

In practice, it is implemented in terms of a fast and simple convolution. To do this, we calculate cross-correlation scores by iterating over a given range of angles:

\mathrm{score}(k) = \sum_{n} \frac{\bigl(S(n)-\mu_S\bigr)\bigl(T(n+k)-\mu_T\bigr)}{\sigma_S\,\sigma_T} \qquad (4)

If a point in one image does not match the same point in the other image (i.e., the points do not correspond), then the correlation between the two ray-sum signals has a low score. The correct angular shift between images is determined by finding the maximum cross-correlation score.
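The rotation estimate of Eq. (4) could be evaluated, for example, with a circular shift of one normalized ray-sum signal against the other; the sketch below is a minimal illustration under that assumption, not the authors' implementation.

```python
import numpy as np

def rotation_from_ray_sums(S, T):
    """Estimate the rotation between two frames from their ray-sum signals
    S and T (one element per sampled angle), using the normalized circular
    cross-correlation of Eq. (4). Returns the shift in translation units
    (e.g., degrees if the signals hold 180 samples over 180 degrees)."""
    S = (S - S.mean()) / S.std()
    T = (T - T.mean()) / T.std()
    N = len(S)
    # score(k) = sum_n S(n) * T(n + k), with periodic (circular) indexing
    scores = np.array([np.dot(S, np.roll(T, -k)) for k in range(N)])
    k_best = int(np.argmax(scores))
    # Report the smallest-magnitude equivalent shift (the signals are periodic)
    return k_best if k_best <= N // 2 else k_best - N
```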

Translation Extraction

To find the translation between frames (ΔX and ΔY), the cross-correlation score is maximized in an iterative process in which the images are translated along search directions chosen by the optimizer. The starting position is found by calculating an estimate of the change of camera-pointing position (ΔXc, ΔYc) using the plane velocity (V) from a previous calculation, along with Δroll and Δpitch from the IMU measurement:

\Delta\tilde{X}_c = V_x\,\Delta t + H\cdot\tan(\Delta \mathrm{roll}), \qquad \Delta\tilde{Y}_c = V_y\,\Delta t + H\cdot\tan(\Delta \mathrm{pitch}) \qquad (5)

where H is the altitude at the previous frame state.

The approximated change can easily be converted into camera pixel units as follows:

\bigl(\Delta\tilde{X}_c,\ \Delta\tilde{Y}_c\bigr)_{\mathrm{pixel}} = \bigl(\Delta\tilde{X}_c,\ \Delta\tilde{Y}_c\bigr)\cdot\frac{f}{H\cdot p} \qquad (6)

where f is the focal length and p is the CCD pixel size.

Thus, the camera-pointing position of the current frame is approximately shifted by (ΔXc, ΔYc)_pixel from the center position of the previous frame. Next, a Downhill Simplex method [14] is employed in an optimal search where this initial shift is used as the starting position of the Simplex run. The score that Simplex produces at a given position is the result of the Radon-based correlation calculation in Eq. (4), and the process is continued over a fixed number of iterations or until a sufficiently high correlation score is found.

Fig. 3–Ray-sum calculation of two images of the same scene but rotated by 30°.


The matching point in the image that achieves the highest score is thus the calculated change of camera-pointing position, (ΔXc, ΔYc)_pixel.
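A minimal sketch of this step is shown below: the initial shift of Eqs. (5)-(6) seeds a Nelder-Mead (Downhill Simplex) refinement. The scoring helper score_fn is hypothetical and stands in for the correlation of Eq. (4) evaluated on the shifted frames; this is an illustration under those assumptions, not the authors' code.

```python
import numpy as np
from scipy.optimize import minimize

def initial_pixel_shift(Vx, Vy, dt, H, d_roll, d_pitch, f, p):
    """Initial camera-pointing shift, Eqs. (5)-(6): velocities in m/s,
    altitude H in m, angles in radians, focal length f and pixel size p
    in consistent units. Returns the estimated shift in pixels."""
    dXc = Vx * dt + H * np.tan(d_roll)   # ground-plane shift in meters
    dYc = Vy * dt + H * np.tan(d_pitch)
    scale = f / (H * p)                  # meters on the ground -> pixels
    return dXc * scale, dYc * scale

def refine_shift(prev_frame, curr_frame, shift0, score_fn, max_iter=100):
    """Refine (dX, dY)_pixel with the Downhill Simplex (Nelder-Mead) method,
    maximizing the correlation score returned by the hypothetical helper
    score_fn(prev_frame, curr_frame, (dx, dy))."""
    result = minimize(lambda s: -score_fn(prev_frame, curr_frame, s),
                      x0=np.asarray(shift0, dtype=float),
                      method="Nelder-Mead",
                      options={"maxiter": max_iter, "xatol": 0.25})
    return result.x   # best (dX, dY) in pixels
```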

Extraction of Camera-Pointing Position and Actual Platform Position

Since the change of camera-pointing position (ΔXc, ΔYc)_pixel at the current frame state is in pixel units, it is converted back into meters to give the camera-pointing position at the current frame state as follows:

\bigl(X_c,\ Y_c\bigr)_{\mathrm{current}} = \bigl(X_c,\ Y_c\bigr)_{\mathrm{previous}} + \bigl(\Delta X_c,\ \Delta Y_c\bigr)_{\mathrm{pixel}} \cdot \frac{H}{\cos(\Delta \mathrm{roll})\cos(\Delta \mathrm{pitch})} \cdot \frac{p}{f} \qquad (7)

The actual position (Xp, Yp) of the platform is now computed from the camera-pointing position by applying a simple roll/pitch/yaw correction using an Euler transform [15] from a coordinate system centered on the platform to one centered on the ground:

\begin{bmatrix} x \\ y \\ z \end{bmatrix} =
\begin{bmatrix}
\cos\theta\cos\psi & -\cos\phi\sin\psi + \sin\phi\sin\theta\cos\psi & \sin\phi\sin\psi + \cos\phi\sin\theta\cos\psi \\
\cos\theta\sin\psi & \cos\phi\cos\psi + \sin\phi\sin\theta\sin\psi & -\sin\phi\cos\psi + \cos\phi\sin\theta\sin\psi \\
-\sin\theta & \sin\phi\cos\theta & \cos\phi\cos\theta
\end{bmatrix}
\begin{bmatrix} 0 \\ 0 \\ f \end{bmatrix} \qquad (8)

where θ, φ, and ψ are the roll, pitch, and yaw at the current frame state. Then, using ray-tracing analysis, the vector (x, y, z) is projected onto the ground as:

(x,\ y,\ z) \xrightarrow{\ \text{Ground Projection}\ } \left( -\frac{xH}{z},\ -\frac{yH}{z},\ -H \right) \qquad (9)

Then the actual position of the platform at the current frame state is:

\bigl(X_p,\ Y_p\bigr)_{\mathrm{current}} = \bigl(X_c,\ Y_c\bigr)_{\mathrm{current}} - \left( -\frac{xH}{z},\ -\frac{yH}{z} \right) \qquad (10)

It is noted that the above-ground-level (AGL) altitude (H) is determined from pressure altimeter data, which provides the above-sea-level altitude of the platform, together with DTED, a database of ground terrain elevation; it is updated at the video frame rate. Alternatively, the AGL altitude can be calculated from a laser altimeter or possibly from triangulation using two platform positions. These techniques will be investigated in future research.
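Taken together, Eqs. (8)-(10) amount to a short attitude correction. The following is a minimal sketch of that conversion under the paper's angle convention (θ = roll, φ = pitch, ψ = yaw); the function names are illustrative.

```python
import numpy as np

def euler_matrix(roll, pitch, yaw):
    """Rotation matrix of Eq. (8), with theta = roll, phi = pitch,
    psi = yaw (angles in radians)."""
    t, ph, ps = roll, pitch, yaw
    return np.array([
        [np.cos(t)*np.cos(ps),
         -np.cos(ph)*np.sin(ps) + np.sin(ph)*np.sin(t)*np.cos(ps),
          np.sin(ph)*np.sin(ps) + np.cos(ph)*np.sin(t)*np.cos(ps)],
        [np.cos(t)*np.sin(ps),
          np.cos(ph)*np.cos(ps) + np.sin(ph)*np.sin(t)*np.sin(ps),
         -np.sin(ph)*np.cos(ps) + np.cos(ph)*np.sin(t)*np.sin(ps)],
        [-np.sin(t), np.sin(ph)*np.cos(t), np.cos(ph)*np.cos(t)],
    ])

def platform_position(Xc, Yc, roll, pitch, yaw, H, f):
    """Convert the camera-pointing position (Xc, Yc) to the platform
    position (Xp, Yp) via Eqs. (8)-(10); H is the AGL altitude and f
    the focal length."""
    x, y, z = euler_matrix(roll, pitch, yaw) @ np.array([0.0, 0.0, f])
    gx, gy = -x * H / z, -y * H / z      # ground projection, Eq. (9)
    return Xc - gx, Yc - gy              # platform position, Eq. (10)
```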

In summary, RelNav continuously tracks the camera-pointing positions based on frame-by-frame image analysis along with instantaneous rate values from an IMU; it then estimates the actual platform position using the current platform attitude. Though the RelNav calculation utilizes inertial measurements, it does not suffer in the same way from the accumulated error that occurs in dead reckoning systems relying on the integration of inertial measurements over time. This is because the calculation of the current platform position does not depend on the previous platform position in an accumulating manner, but is calculated directly from the current camera-pointing position. However, to check the position accuracy of RelNav, AbsNav is periodically invoked, as described in the next section.

ABSOLUTE NAVIGATION

The AbsNav algorithm compares a single video frame to a portion of a georeferenced landmark archival image (e.g., from a previous flight over the same area or from a satellite). A difference in spatial resolution (or GSD) between the video frame and landmark image is removed before comparison by projecting and resampling the video frame onto the same grid as the landmark image. When a match is found, the position of the platform at the time of the video frame is known. This can be used to correct the RelNav results as needed. The update rate for the AbsNav algorithm depends critically on the computational resources available, considering the size, weight, and power restrictions of the intended platform. A higher update rate would lead to improved accuracy, but at the cost of a higher processing burden. A typical update period is on the order of hundreds of seconds, because the drift of the RelNav system requires a reset on roughly that time scale; the optimal value also depends on the aircraft speed, the accuracy of the IMU used, and the features (or lack thereof) of the landscape.

The process of the AbsNav algorithm is broken up into two main steps, Global Search and Fine Search, as illustrated in Figure 4.

Global Search

The purpose of the Global Search is to find an approximate location (lat/lon) to be used as the starting point for the Fine Search. This process drastically saves computational time by reducing the search bound of the more detailed Fine Search.


In the Global Search, the video image is projected onto the ground coordinate system to match the pixel size and orientation of the landmark image. This projection is performed based on best estimates of position (lat/lon/alt) and attitude (roll/pitch/yaw), which come from the RelNav algorithm. In order to create a projection image, we first define the Earth coordinate system (East: +x-axis, North: +y-axis) with an origin point at the lens position. The CCD image plane is then placed in an ideal position so that it is oriented identically to the Earth's x-y coordinates with a nadir viewing geometry. Then the center position of the CCD plane is at (0, 0, f), and the (m, n)th pixel position in the CCD is at (mp, np, f) in the Earth coordinate system, where p is the pixel size and f is the focal length of the camera lens. Given an initial set of attitude parameters and altitude (roll, pitch, yaw, and height, or θ, φ, ψ, and h), each CCD pixel vector (mp, np, f) is transformed according to the sensor's orientation using the ground projection

(m,\ n) \xrightarrow{\ \text{Earth Coordinate}\ } (mp,\ np,\ f) \xrightarrow{\ \text{Euler Transform}\ } (x,\ y,\ z) \xrightarrow{\ \text{Ground Projection}\ } \left( -\frac{xH}{z},\ -\frac{yH}{z},\ -H \right) \qquad (11)

Our analysis uses a pinhole camera model [5]. In practice, this requires calibration of the camera to determine internal parameters such as the actual location of the image center (optical axis), lens distortions, and so on. Due to possible errors in the estimates of position (lat/lon/alt) and attitude (roll/pitch/yaw) from RelNav, the ground projection image of the video frame usually does not exactly match the portion of the landmark image corresponding to the projected area. Around this initial area, the projected video image is "stepped" through a designated search area of the larger landmark image to find the portion of the landmark image that best matches the video image. The algorithm uses a cross-correlation coefficient [16] to find a matching position:

r(m, n) = \frac{\sum\sum \bigl[ L(x, y) - \bar{L}(x, y) \bigr] \bigl[ V(x - m,\ y - n) - \bar{V} \bigr]}{\sqrt{\sum\sum \bigl[ L(x, y) - \bar{L}(x, y) \bigr]^{2} \sum\sum \bigl[ V(x - m,\ y - n) - \bar{V} \bigr]^{2}}} \qquad (12)

where V is the video image projected onto the ground, V̄ is the average of V, L is the landmark image, and L̄ is the average of L in the region coincident with V. The correlation coefficient is calculated along with histogram matching, where the histogram matching is used to compensate for differences in the environmental conditions, seasonal changes, and camera parameters between the present video image and the archived landmark image. The position is updated based on the best matches (highest correlation), and these are used as input to the Fine Search algorithm.
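As an off-the-shelf stand-in for the stepping and correlation of Eq. (12), OpenCV's normalized cross-correlation template matching could be used as sketched below; histogram matching of the two images is assumed to have been applied beforehand, and the function name is illustrative rather than the authors' implementation.

```python
import numpy as np
import cv2

def global_search(landmark, video_ground_proj):
    """Slide the ground-projected video frame over the landmark image and
    return the top-left offset of the best match and its score, using
    OpenCV's normalized cross-correlation (TM_CCOEFF_NORMED) as a stand-in
    for Eq. (12)."""
    L = landmark.astype(np.float32)
    V = video_ground_proj.astype(np.float32)
    scores = cv2.matchTemplate(L, V, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(scores)
    return max_loc, max_val
```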

Fine Search

The Fine Search uses the updated position and the same roll, pitch, yaw, and altitude initially used in the Global Search. Starting with these initial values, the Fine Search iterates within a local region of the landmark image until it finds an optimal set of all six degrees of freedom (roll/pitch/yaw, lat/lon/alt). For the optimization, we once again use the Downhill Simplex method [14]. The algorithm updates all six parameters, creates a rendered video image projected from the landmark image back onto the CCD plane using the ray tracing of Eq. (11) in reverse based on the updated parameters, and finds the maximum correlation between the rendered video image and the originally captured video image. In the Global Search, the original video image is projected onto the landmark image (ground), while in the Fine Search, a portion of the landmark image is iteratively projected onto the video image while varying the projection parameters. Table 1 summarizes the two main computational modes in the AbsNav algorithm.
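A minimal sketch of the Fine Search loop is given below. The helper render_from_landmark, which projects the landmark back onto the CCD plane for a candidate parameter set (the inverse of Eq. (11)), is hypothetical; the correlation score is a plain normalized correlation, and this is an illustration rather than the authors' code.

```python
import numpy as np
from scipy.optimize import minimize

def fine_search(video_frame, landmark, params0, render_from_landmark):
    """Refine all six degrees of freedom (lat, lon, alt, roll, pitch, yaw)
    with the Downhill Simplex (Nelder-Mead) method, maximizing the
    correlation between the captured video frame and the landmark image
    rendered onto the CCD plane for each candidate parameter set."""
    def neg_correlation(params):
        rendered = render_from_landmark(landmark, params)
        a = (video_frame - video_frame.mean()).ravel()
        b = (rendered - rendered.mean()).ravel()
        return -np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    result = minimize(neg_correlation,
                      x0=np.asarray(params0, dtype=float),
                      method="Nelder-Mead",
                      options={"maxiter": 200})
    return result.x   # refined (lat, lon, alt, roll, pitch, yaw)
```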

Fig. 4–Schematic of Absolute Navigation code.


RESULTS

FLIGHT DATA COLLECTION

Flight collections for testing the video navigation system were conducted over the South Bay region of Los Angeles, California. This region has a variety of terrain, providing the opportunity to collect aerial imagery over urban/suburban terrain (Torrance), undeveloped hillsides (Palos Verdes), coastline, oil refineries, industrial developments, and the Port of Los Angeles. The flight system comprised various types of cameras, which were developed to fit into a standard aerial camera mount, shown in Figure 5.

Imagery was collected with several cameras and at two different altitudes, approximately 5,000 ft and 10,000 ft above sea level. The data sets taken as a whole can be used to optimize the algorithms for varied conditions and flight equipment, as well as to investigate potential strengths and weaknesses of different approaches. The cameras used for flight video capture included visible/near-infrared (VNIR), short-wave infrared (SWIR), and long-wave infrared (LWIR) cameras to investigate different applications, including the potential for day/night operations. One issue is that while thermal infrared cameras can be used at night, corresponding satellite imagery is not readily available.

The IMU system used was the Crossbow AHRS400CC [17] with nine-axis measurements, which combines linear accelerometers, rotational rate sensors, and a magnetometer. It uses an onboard DSP with a Kalman filter algorithm and has a 60 Hz data rate.

Fig. 5–Pictures of flight package used to collect aerial image data.

Table 1—Basic description of the two main computations in the Absolute Navigation algorithm

Global Search
  Input: Roll/pitch from AHRS; altitude from DTED and altimeter; latitude, longitude, and yaw from Relative Navigation.
  Output: Estimated latitude and longitude (X/Y position).
  Techniques: Use attitude and altitude measurements to warp the video image to the coordinate frame of the landmark image; use the updated lat/lon values to set a search bound; step through the landmark image on a global scale to find the best estimate for lat/lon.

Fine Search
  Input: Same input as Global Search for roll/pitch/yaw and altitude; updated lat/lon from Global Search.
  Output: More accurate estimate of lat/lon along with roll/pitch/yaw and altitude.
  Techniques: Use the lat/lon values from the Global Search; use attitude and altitude measurements as a search bound; iteratively warp a portion of the landmark image with different values of attitude and altitude; iterate the Fine Search for lat/lon.


TEST OF RELATIVE NAVIGATION WITH FLIGHT DATA

The RelNav algorithm was applied to the flight data, and exemplary results are shown in Figure 6. This particular test represents an extreme case, as the flight line included relatively large excursions in pitch/roll/yaw of around ±5°/±10°/±10°, respectively.

A comparison between the calculated and reference camera-pointing tracks is shown in Figure 6(a). In the figure, the dashed line is the estimate of the camera-pointing track from the video navigation algorithm, and the triangle markers are the reference camera-pointing track obtained from a visual comparison between video image frames and satellite imagery. For the visual comparison, a set of frames was sampled from the video sequence with a gap of 50 frames between samples. Using a "human eye," the locations of the sample frames were found in the georeferenced satellite image, and a set of pixels in the satellite image was picked that best matched the center pixel of each frame. The pixels were then converted to the Earth's geo-coordinate system (lat/lon) to provide the reference camera-pointing track. This manual track provides the best test of the image processing algorithms since the results are not dependent on potential errors associated with AHRS inaccuracies and timing issues with GPS data.

The platform's GPS flight track and the calculated roll/pitch corrected position from the RelNav algorithm are compared in Figure 6(b), where the solid line is the platform's flight track obtained from GPS data, and the dashed line is the estimate from the video navigation algorithm. The video navigation track is calculated by transforming the camera-pointing position shown in Figure 6(a) into an aircraft position using a roll/pitch/yaw transformation based on the angles reported by the AHRS. It is noted that GPS and AHRS (IMU) data were interpolated to synchronize with the video frame rate. Figure 7 shows the error plot between the GPS flight track and the RelNav estimated track shown in Figure 6(b).

The RelNav results over this approximately 2,500-meter-long flight line are summarized in Table 2, showing the RMS errors between the estimated and reference camera-pointing tracks, and between the estimated and GPS flight tracks. The RelNav position error is less than 50 m, with the largest errors occurring during the most extreme maneuvers.

Fig. 6–Relative Navigation test results overlaid on the satellite image coordinate frame.


We note that the largest source of error is the use of AHRS angles to convert from camera-pointing position to aircraft position, not errors in the image analysis portion of the algorithm (as evidenced by the close agreement between the calculated and reference camera-pointing positions shown in Figure 6(a)).

TEST OF ABSOLUTE NAVIGATION WITH FLIGHT DATA

The AbsNav algorithm was also tested using similar flight data. In Figure 8(a), a video frame is shown, and in Figure 8(b), the portion of the IKONOS image used for the Global Search is shown. This landmark image represents a relatively large search region. In practice, the location of the aircraft prior to execution of the AbsNav code should be accurate enough that the search area can be much smaller.

At various steps in the AbsNav code, one can calculate the camera-pointing position (i.e., the center of the field of view) and the platform location. Results of these calculations are shown in Table 3. The camera-pointing position was compared to the reference camera-pointing position visually determined on the 1 m × 1 m coordinate system defined by the IKONOS image. The results show that the code calculations agree very well with the visual comparison results. In other words, the algorithm correctly locates a match between an input video frame and a portion of the satellite image. In comparison, a calculation of the camera-pointing position using the GPS location, along with the camera-pointing angle reported by the AHRS, differs from the actual camera-pointing position by more than 200 m. Thus the combination of GPS + AHRS data does not provide an adequate reference for comparing the results of the calculation; we believe this is due to errors in the AHRS angle data on the order of ±1°.

Table 2—RMS errors of the mismatches between estimates and references

        Camera Pointing    Flight Track
RMS     24.4 m             46.2 m

Fig. 8–Input images for the AbsNav test. (a) Video image: the original image is 1,280 × 1,024 pixels with a GSD of approximately 0.34 m. The image was resampled to 640 × 512 pixels with a GSD of approximately 2 m for input to the algorithm. (b) Portion of the IKONOS image used in the Global Search; this portion is 4,600 × 4,000 pixels, where each pixel is 1 m × 1 m. The image was resampled to 2,300 × 2,000 pixels with a GSD of approximately 2 m for input to the algorithm.

Fig. 7–Error plot of Fig. 6(b): position errors between the GPS track and the estimated track of the platform flight.


In fact, Table 3 lists precisely how the angles determined by the AbsNav algorithm differ from those reported by the AHRS. Viewed another way, the AbsNav algorithm can be used to check the accuracy of the AHRS and potentially correct for errors due to gyro bias or other sources.

Figure 9 shows a visual representation of the results presented in Table 3. Figure 9(a) shows the video frame (i.e., the same image as in Figure 8(a)). Figures 9(b) to (d) are extracted from a portion of the landmark image where the center is defined in three different ways, using (b) the measured GPS and AHRS data, (c) the results of the Global Search, and (d) the results of the Fine Search. These three different images correspond to the first three rows in Table 3.

The image corresponding to the output of the Fine Search, Figure 9(d), is visually very similar to the actual video image taken during the flight, Figure 9(a).

Table 3—Comparison of AbsNav code outputs (Global Search and Fine Search) to measured values. Camera pointing and platform location values are in meters defined by the IKONOS coordinate grid. As discussed in the text, the "Visual Determination" provides the best test of the algorithm, to which the Fine Search results agree within 5 m.

Case                   Camera Pointing       Plane Location        Roll    Pitch   Yaw     Altitude
                       X [m]     Y [m]       X [m]     Y [m]       [deg]   [deg]   [deg]   [m]
GPS & AHRS Data        2,351     1,689       2,136     1,864       -4.3    3.5     2.7     1,417
Global Search          2,138     1,814       1,923     1,989       -4.3    3.5     2.7     1,417
Fine Search            2,156     1,820       2,229     2,058       -1.6    4.8     0.9     1,404
Visual Determination   2,160     1,818       N/A       N/A         N/A     N/A     N/A     N/A

Fig. 9–Images related to the test of AbsNav. The image shown in (a) is the flight image, and the images in (b), (c), and (d) are generated from IKONOS using the values for camera-pointing position and roll/pitch/yaw/altitude as listed in Table 3. The close match between (a) and (d) is a visual example of the success of the Fine Search algorithm.


This close match is a visual demonstration of the success of the AbsNav technique. The approximate "error" in the match is less than 5 m, as shown in Table 3 (between Fine Search and Visual Determination). Note that there are subtle differences between the two images due to differences in the cameras used, a difference in the time of day, and actual changes to the scene between the date of the IKONOS image and the date of the flight.

One can see that the image generated using the actual GPS and AHRS measurements, Figure 9(b), is noticeably different from the actual video frame, Figure 9(a). This difference is due to a combination of inadequate time resolution of the GPS data and errors in the AHRS data (as discussed above). This difference illustrates the unreliable targeting and positioning that were obtained using the relatively inexpensive GPS and INS systems used in the test flights.

CONCLUSION

This paper presents navigation techniques for UAVs, cruise missiles, and other platforms that use video imagery. The system does not rely on GPS and provides an autonomous image-based navigation system using inertial measurements and landmark imagery to reduce positioning error. The algorithms have been developed, tested, and optimized, and brass-board hardware has been used to collect extensive data sets for further development. The next step needed to transition the technology is the further development of a real-time system. Following this, an embedded prototype system can be produced and tested.

ACKNOWLEDGMENTS

This work was funded by the Office of Naval Research. The authors would like to thank Mr. Joel Gat for careful reading of the manuscript and insightful comments.

REFERENCES

1. Sullivan, D., and Brown, A., "High Accuracy Autonomous Image Georeferencing Using a GPS/Inertial-Aided Digital Imaging System," Proceedings of the 2002 National Technical Meeting of The Institute of Navigation, San Diego, CA, January 2002, pp. 598–603.

2. Brown, A., Bockius, B., Johnson, B., Holland, H., and Wetlesen, D., "Flight Test Results of a Video-Aided GPS/Inertial Navigation System," Proceedings of the 20th International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS 2007), Fort Worth, TX, September 2007, pp. 1111–1117.

3. Veth, M., and Raquet, J., "Fusion of Low-Cost Imaging and Inertial Sensors for Navigation," Proceedings of the 19th International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS 2006), Fort Worth, TX, September 2006, pp. 1093–1103.

4. Ebcin, S., and Veth, M., "Tightly-Coupled Image-Aided Inertial Navigation Using the Unscented Kalman Filter," Proceedings of the 20th International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS 2007), Fort Worth, TX, September 2007, pp. 1851–1860.

5. Hartley, R. I., and Zisserman, A., Multiple View Geometry in Computer Vision, Cambridge University Press, 2000.

6. Heikkila, J., and Silven, O., "A Four Step Camera Calibration Procedure with Implicit Image Correction," Proceedings of the Computer Vision and Pattern Recognition Conference, San Juan, Puerto Rico, June 17–19, 1997.

7. Horn, B. K. P., Hilden, M., and Negahdaripour, S., "Closed Form Solutions of Absolute Orientation Using Orthogonal Matrices," Journal of the Optical Society of America A, Vol. 5, No. 7, 1987.

8. Kumar, R., Sawhney, H. S., Asmuth, J. C., Pope, A., and Hue, S., "Registration of Video to Geo-Referenced Imagery," ICPR '98, Brisbane, Australia, August 16–20, 1998.

9. Zhang, Z., Deriche, R., Faugeras, O., and Luong, Q.-T., "A Robust Technique for Matching Two Uncalibrated Images Through the Recovery of the Unknown Epipolar Geometry," Artificial Intelligence Journal, Vol. 78, 1995, pp. 87–119.

10. Rodriguez, J. J., and Aggarwal, J. K., "Matching Aerial Images to 3-D Terrain Maps," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, 1990, pp. 1138–1149.

11. Gat, N., and Lee, K., "Video Based Autonomous Navigation," Final Report: ONR SBIR Phase-I N00014-02-M-0167, November 2002.

12. Kriesel, J., Lee, K., and Gat, N., "Video Based Autonomous Navigation in GPS Denied Environments," Final Report: ONR SBIR Phase-II N00014-03-C-0463, September 2006.

13. Deans, S. R., The Radon Transform and Some of Its Applications, New York: John Wiley & Sons, 1983.

14. Nelder, J. A., and Mead, R., "A Simplex Method for Function Minimization," Computer Journal, Vol. 7, 1965, pp. 308–313.

15. Titterton, D. H., and Weston, J. L., Strapdown Inertial Navigation Technology, 2nd edition, revised, AIAA, 2004.

16. Gonzalez, R. C., and Wintz, P., Digital Image Processing, 2nd edition, Addison-Wesley, 1987.

17. http://www.xbow.com/products/product_pdf_files/inertial_pdf/6020-0025-01_b_ahrs400cc.pdf (retrieved February 3, 2010).
