Physics-Based Integration of Multiple Sensing Modalities for Scene Interpretation

N. NANDHAKUMAR, SENIOR MEMBER, IEEE, AND J. K. AGGARWAL, FELLOW, IEEE

Invited Paper

The fusion of multiple imaging modalities offers many advantages over the analysis, separately, of the individual sensory modalities. In this paper we present a unique approach to the integrated analysis of disparate sources of imagery for object recognition. The approach is based on physics-based modeling of the image generation mechanisms. Such models make possible features that are physically meaningful and have an improved capacity to differentiate between multiple classes of objects. We illustrate the use of the physics-based approach to develop multisensory vision systems for different object recognition application domains. The paper discusses the integration of different suites of sensors, the integration of image-derived information with model-derived information, and the physics-based simulation of multisensory imagery.

I. INTRODUCTION

It is well known that the human visual system extracts a great deal of information from a single gray level image. This fact motivated computer vision researchers to devote much of their attention to analyzing isolated gray scale images. However, research in computer vision has made it increasingly evident that formulation of the interpretation of a single image (of a general scene) as a computational problem results in an underconstrained task. Several approaches have been investigated to alleviate the ill-posed nature of image interpretation tasks. The extraction of additional information from the image or from other sources, including other images, has been seen as a way of constraining the interpretation [1]. Such approaches may be broadly grouped into the following categories: 1) the extraction and fusion of multiple cues from the same image, e.g., the fusion of multiple shape-from-X methods (e.g., shape from shading, shape from texture), 2) the use of multiple views of the scene, e.g., stereo, and more recently 3) the fusion of information from different modalities of sensing, e.g., infrared and laser ranging.

Manuscript received May 29, 1996; revised September 6, 1996. N. Nandhakumar's work was supported by a RADIUS seed contract from Lockheed-Martin, AFOSR Contract F49620-93-C-0063, and ARPA Contract F33615-94-C-1529. J. K. Aggarwal's work was supported by Army Research Office Contracts DAAH-94-G-0417 and DAAH 049510494.

N. Nandhakumar is with Electroglas, Inc., Santa Clara, CA 95054 USA (e-mail: [email protected]).

J. K. Aggarwal is with the Department of Electrical and Computer Engineering, University of Texas, Austin, TX 78712-1084 USA (e-mail: [email protected]).

Publisher Item Identifier S 0018-9219(97)00641-5.


Various researchers have referred to each of the above approaches as multisensory approaches to computer vision. The order in which the above approaches have been listed indicates, approximately, the chronological order in which these methods have been investigated. The order is also indicative of the increasing amount of additional information that can be extracted from the scene and which can be brought to bear on the interpretation task.

Past research in computer vision has yielded analytically well defined algorithms for extracting simple information (e.g., edges, two-dimensional (2-D) shape, stereo range, etc.) from images acquired by any one modality of sensing. When multiple sensors, multiple processing modules, and/or different modalities of imaging are to be combined in a vision system, it is important to address the development of 1) models relating the images of each sensor to scene variables, 2) models relating sensors to each other, and 3) algorithms for extracting and combining the different information in the images.

The choice of a computational framework for a multisensory vision system depends on the application task. Several computational paradigms have been employed in different recent multisensory vision systems. The paradigms can be categorized as: 1) statistical, 2) variational, 3) artificial intelligence (AI), and 4) phenomenological (physics-based) approaches. Statistical approaches typically involve Bayesian schemes which model multisensory information using multivariate probability models or as a collection of individual (but mutually constrained) classifiers/estimators. These schemes are appropriate when the domain of application renders probabilistic models to be intuitively natural forms of models of sensor performance and the state of the sensed environment. An alternative, deterministic, approach is based on variational principles wherein a criterion functional is optimized.

The criterion functional implicitly models world knowledge and also explicitly includes constraints from multiple sensors. Adoption of this approach results in an iterative numerical relaxation scheme that optimizes the criterion functional.

The complexity of the task sometimes precludes simple analytical formulations for scene interpretation tasks. Models relating the images of each sensor to scene variables, models relating sensors to each other, and algorithms for extracting and combining the different information in the images usually embody many variables which are not known prior to their interpretation. This necessitates the use of heuristic and empirical methods for analyzing the images. The development of complex interpretation strategies and knowledge representational mechanisms for using such methods has been intensively researched in the field of AI. Many of these ideas can be employed in the design of a multisensory vision system.

Most of the research (to date) in multisensory computer vision has adopted either a statistical-, variational-, or AI-based approach. Only very recently has research been directed at using phenomenological or physics-based models for multisensory vision. These models are based on physical laws, e.g., the conservation of energy. Such models relate each of the sensed signals to the various physical parameters of the imaged object. The objective is to solve for the unknown physical parameters using the known constraints and signal values. The physical parameters then serve as meaningful features for object classification.

Denote sensed information as s_i. Each imaging modality (viz. physical sensor) may yield many types of sensed information s_i. For example, we may have s_1 = "thermal intensity," s_2 = "stereo range," s_3 = "visual intensity," s_4 = "visual edge strength," etc. Let s_i(x, y) denote the value of sensed information s_i at any specified pixel location (x, y). For the sake of brevity, s_i will be used instead of s_i(x, y) in the following. Each source of information is related to object parameters, p_j, and ambient scene parameters, q_k, via a physical model of the following general form

s_i = f_i(p_1, p_2, ..., p_M, q_1, q_2, ..., q_N)    (1)

where M and N are the number of object and scene parameters, respectively. Note that for each s_i, only a subset of the entire set of parameters has nonzero coefficients. Examples of p_j include visual reflectance of the surface, relative surface orientation, material density, and surface roughness. Examples of q_k include ambient temperature, direction of illumination, intensity of solar insolation, and ambient wind speed. The functional form of f_i depends on the sensor used and the specific radiometric, photometric, or projective principle that underlies image formation in that sensor. In addition to the above, various natural laws describe the physical behavior of the objects, e.g., principles of rigidity and the law of the conservation of energy. These lead to additional (known) constraints of the following general form

g(p_1, ..., p_M, q_1, ..., q_N) = 0.    (2)

The general problem is to use the functional forms (1) and (2) established above to derive a feature F that depends only on the object properties, p_j, and is independent of the scene variables, q_k. In some cases, it may be possible to solve for a specific object property p_j itself. If this property is a distinguishing and scene-invariant property of the object, such as material density or thermal capacitance, then this property can be used as the feature, F, for recognition. Note that in general, the equations are nonlinear, and hence solving them is not straightforward. Also, it may be possible to specify a larger number of equations than required, thus leading to an overconstrained system. An error minimization approach may then be used to solve for the unknowns. Alternatively, an overconstrained system supports the derivation of algebraic invariants [2] that are functions of image information and physical properties of objects, and invariant to scene conditions.
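
To make the error-minimization route concrete, the following minimal Python sketch solves an overconstrained system of the form (1) and (2) for the object parameters by nonlinear least squares. The model functions f1, f2 and the constraint g are hypothetical stand-ins (they do not come from any of the systems described later), and the numerical values are illustrative only.

import numpy as np
from scipy.optimize import least_squares

# Hypothetical physical models of the form (1): sensed values as functions of
# object parameters p = (p1, p2) and known scene parameters q = (q1, q2).
def f1(p, q):
    return p[0] * np.cos(q[0])          # e.g., a reflectance-like model

def f2(p, q):
    return p[1] * q[1] + 0.1 * p[0]     # e.g., an emission-like model

# Hypothetical constraint of the form (2), e.g., an energy-balance relation.
def g(p, q):
    return p[0] + p[1] * q[1] - 1.0

def residuals(p, s, q):
    # Stack model-versus-measurement errors and constraint violations.
    return np.array([f1(p, q) - s[0], f2(p, q) - s[1], g(p, q)])

s_measured = np.array([0.42, 0.78])   # sensed values s1, s2 at a pixel
q_known = np.array([0.3, 1.2])        # known ambient scene parameters
fit = least_squares(residuals, x0=np.ones(2), args=(s_measured, q_known))
p_hat = fit.x                         # estimated object parameters

The recovered p_hat plays the role of the scene-invariant feature F used for classification.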

The physics-based approach has been successfully adopted for a number of scene interpretation tasks. Section II provides an overview of several of these applications—and thus illustrates the usefulness of this paradigm as a general approach for the interpretation of multisensory imagery. Section III describes the integration of multisensory information for computing features based on physics-based models, where some of the information is obtained from the image, and some of it is obtained from the model of the hypothesized object. Another important issue related to the interpretation of multisensory imagery is the simulation of multisensory imagery. A unified scheme for generating imagery as would be sensed by different co-boresighted sensors is described in Section IV. Section V contains concluding remarks.

II. INTEGRATION OF DIFFERENT SENSING MODALITIES

The physics-based approach has been used to address, successfully, a number of object recognition tasks that rely on the analysis of multisensory imagery. We provide below an overview of several of these approaches to illustrate this paradigm. We describe the integration of infrared and visual imagery for classifying objects occurring in outdoor scenes into broadly defined classes, such as vehicles, buildings, pavement, and vegetation. Next, we describe the integration of underwater acoustic and visual imagery for the classification of the imaged sea floor into two types—sediment or manganese deposits. We then describe the interpretation of different channels of polarized radiation sensed by optical cameras, and by a synthetic aperture radar (SAR) system. We end the section with brief discussions on the fusion of some other sensing modalities such as different channels of color information, and optical and radar imagery.

A. Integration of Infrared and Optical Imagery

The interpretation of thermal and visual imagery of outdoor scenes using a physics-based approach has been described in [3]–[6]. The approach is based on a model of energy exchange between the imaged surface and the environment (including the cameras).

Fig. 1. (a) Energy exchange at the surface of an object in an outdoor scene and (b) equivalent thermal circuit for the surface of the object.

This approach allows the estimation of internal properties of the object, such as its ability to sink/source radiation, i.e., its thermal capacitance. These estimates of internal object properties are physically meaningful and powerful features for classifying objects into broadly defined classes such as buildings, vegetation, vehicles, and roads. These features are tolerant to changes in viewing direction, illumination, surface coating, and ambient conditions. The approach can therefore be used in different types of outdoor autonomous navigation systems. It is interesting to note that the perceptual systems in certain snakes combine thermal and visual imaging senses to produce such a map of heat sinks and sources [7]–[9].

An overview of this approach is presented. The analysis of two types of imagery is discussed: 1) a single set of multisensory data and 2) a temporal sequence of multisensory data. The model used for integrating thermal and visual imagery of outdoor scenes is based on the principle of conservation of energy. At the surface of the imaged object [Fig. 1(a)], the energy absorbed by the surface, W_abs, equals the energy lost to the environment, W_lost:

W_abs = W_lost.    (3)

The energy absorbed by the surface is given by

W_abs = α_s W_s cos θ_i    (4)

where W_s is the incident solar irradiation on a horizontal surface and is given by available empirical models (based on the time, date, and latitude of the scene) or by measurement with a pyrheliometer, θ_i is the angle between the direction of irradiation and the surface normal, and α_s is the surface absorptivity, which is related to the visual reflectance ρ_v by α_s = 1 − ρ_v. Note that it is reasonable to use the visual reflectance to estimate the energy absorbed by the surface since approximately 90% of the energy in solar irradiation lies in the visible wavelengths [10]. A simplified shape-from-shading approach is used to compute θ_i and ρ_v from the visual image and is described in detail in [3]. The energy lost by the surface to the environment is given by

W_lost = W_cv + W_rad + W_cd    (5)

where W_cv denotes the heat convected from the surface to the air, which has temperature T_amb and velocity v, W_rad is the heat lost by the surface to the environment via radiation, and W_cd denotes the heat conducted from the surface into the interior of the object. The radiation heat loss is computed from

W_rad = σ (T_s^4 − T_amb^4)    (6)

where σ denotes the Stefan–Boltzmann constant, T_s is the surface temperature of the imaged object, and T_amb is the ambient temperature.

The surface temperature is computed from the thermal image based on an appropriate model of radiation energy exchange between the surface and the infrared camera [3]. The resulting relationship between the surface temperature and the image gray level produced by the infrared camera is of the form

(7)

where C_1, C_2, C_3, and C_4 are known constants, T_s is the surface temperature of the imaged object, λ is the wavelength of energy, λ_1 and λ_2 are the limits of the spectral band of the specific camera being used, ε is the thermal emissivity of the surface, and G is the gray level of a pixel in the thermal image.

The convected heat transfer is given by

W_cv = h (T_s − T_amb)    (8)

where h is the average convected heat transfer coefficient for the imaged surface. For mixed air flow conditions this coefficient may be computed by

(9)

where the coefficients in (9) are known thermophysical constants of the surrounding air [10], v is the wind speed, and L is the characteristic length of the imaged object. In the existing method, the wind speed is measured by an anemometer and a value of 1 m is assumed for L.

Considering a unit area on the surface of the imaged object, the equivalent thermal circuit for the surface is shown in Fig. 1(b).

Fig. 2. (a) Visual image of a scene and (b) thermal image of the scene.

Table 1 Thermal Capacitance of Objects Commonly Imaged in Outdoor Scenes. A Useful and Physically Meaningful Feature for Object Recognition.

Object               Thermal Capacitance (× 10^6 J/K)
Asphalt Pavement     1.95
Concrete Wall        2.03
Brick Wall           1.51
Wood (Oak) Wall      1.91
Granite              2.25
Automobile           0.18

C_T is the lumped thermal capacitance of the object and is given by C_T = ρ V c, where ρ is the density of the object, V is the volume, and c is the specific heat. The resistances in the circuit correspond to the convective and radiative heat transfer paths from the surface to the environment.

1) Analyzing a Single Set of Multisensor Data: It is clear from Fig. 1(b) that the conduction heat flux W_cd depends on the lumped thermal capacitance C_T of the object. A relatively high value for C_T implies that the object is able to sink or source relatively large amounts of heat. An estimate of W_cd, therefore, provides us with a relative estimate of the thermal capacitance of the object, albeit a very approximate one. Table 1 lists values of C_T for typical objects imaged in outdoor scenes. The values have been normalized for unit volume of the object.

Note that the thermal capacitance for walls and pavements is significantly greater than that for automobiles, and hence W_cd may be expected to be higher for the former regions. Plants absorb a significant percentage of the incident solar radiation, which is used for photosynthesis and also for transpiration. Very little of the absorbed radiation is convected into the air. Therefore, the estimate of W_cd will be almost as large (typically 95%) as that of the absorbed heat flux. Thus W_cd is useful in estimating the object's ability to sink/source heat radiation, a feature shown to be useful in discriminating between different classes of objects. However, in order to minimize the feature's dependence on differences in absorbed heat flux, a normalized feature was defined to be the ratio W_cd / W_abs.

The steps involved in evaluating this internal object property from the thermal and visual image pair of a scene are as follows. The registered thermal and visual image pair is segmented. The simplified shape-from-shading approach discussed in [3] yields the surface reflectance ρ_v for each region, and it yields the value of θ_i at each pixel. The thermal image provides an estimate of the surface temperature T_s at each pixel by using a table of values of the gray level G generated by (7) for different values of T_s, and a fast table-lookup scheme. A value of 0.9 is assumed for the surface emissivity of all objects. Equations (3)–(6), (8), and (9) are applied at each pixel to estimate W_cd and thence the ratio W_cd / W_abs.
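
A minimal sketch of this per-pixel computation is given below, assuming the gray-level-to-temperature lookup and the simplified shape-from-shading step are available as callables; surface_temperature_lookup, reflectance, and cos_incidence are hypothetical placeholders for those steps, not routines from [3].

import numpy as np

SIGMA = 5.67e-8   # Stefan-Boltzmann constant (W m^-2 K^-4)

def conducted_fraction(thermal_img, visual_img, W_s, T_amb, h,
                       surface_temperature_lookup, reflectance, cos_incidence):
    # Per-pixel estimate of the normalized feature W_cd / W_abs from a
    # registered thermal and visual image pair.
    T_s = surface_temperature_lookup(thermal_img)              # surface temperature (K), cf. (7)
    rho_v = reflectance(visual_img)                            # visual reflectance per region
    W_abs = (1.0 - rho_v) * W_s * cos_incidence(visual_img)    # absorbed flux, cf. (4)
    W_rad = SIGMA * (T_s ** 4 - T_amb ** 4)                    # radiative loss, cf. (6)
    W_cv = h * (T_s - T_amb)                                   # convective loss, cf. (8)
    W_cd = W_abs - W_cv - W_rad                                # conservation of energy, cf. (3), (5)
    return W_cd / np.maximum(W_abs, 1e-6)                      # normalized feature per pixel

The mode of this ratio over each segmented region gives the region-level feature used for classification.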

The above approach was tested on real data. Fig. 2(a) shows the visual image of an outdoor scene. Fig. 2(b) shows the registered thermal image of the same scene. The above approach was used to compute (at each pixel) the ratio between the energy conducted into the object and the absorbed radiation. The mode of this value is computed for each region [Fig. 3(a)]. The values of this internal object property occur in the expected ranges for each class of objects and help distinguish between these object classes [Fig. 3(b)]. The values are lowest for vehicles, highest for vegetation, and in between for buildings and pavements. Classification of objects using this property value is discussed in [5].

2) Analyzing a Temporal Sequence of Multisensor Data: Temporal sequences of multisensor image data are commonly used for remote sensing and surveillance applications. A temporal sequence of multisensor data consisting of thermal imagery, visual imagery, and scene conditions makes possible a more reliable estimate of the imaged object's relative ability to sink/source heat radiation.

Fig. 3. (a) Mode of the feature value computed for each region and (b) classification using the feature value.

Observe that the relationship between the conducted heat flux W_cd and the thermal capacitance C_T of the object is given by

W_cd = C_T (dT_s / dt).    (10)

A finite (backward) difference approximation to this equation may be used for estimating C_T as

C_T = W_cd (t_2 − t_1) / [T_s(t_2) − T_s(t_1)]    (11)

where t_1 and t_2 are the time instants at which the data were acquired, T_s(t_1) and T_s(t_2) are the corresponding surface temperatures, and W_cd is the conducted heat flux, which is assumed to be constant during the time interval. However, W_cd does vary, and an average value of [W_cd(t_1) + W_cd(t_2)] / 2 is used in (11).

The above method was tested on temporal sequences

of thermal and visual image pairs acquired at intervals of three hours. For each thermal and visual image pair (viz. at each time instant) the method described in Section II-A was applied to evaluate the various components of surface energy exchange at each pixel. W_cd is thus evaluated for each pixel at each time instant. Equation (11) is then evaluated at each pixel, for each pair of successive data sets in the temporal sequence. The estimated values of C_T for the different classes are close to those listed in Table 1, and can be used to distinguish between different classes of imaged materials. A statistically robust scheme for computing this parameter, and the experimental results obtained, are described in [6].
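
The backward-difference estimate in (11) is straightforward to express in code; the sketch below assumes per-pixel arrays of the conducted flux and surface temperature at two acquisition times and is illustrative only.

import numpy as np

def thermal_capacitance_estimate(W_cd_t1, W_cd_t2, T_s_t1, T_s_t2, t1, t2):
    # Finite (backward) difference estimate of the lumped thermal capacitance,
    # cf. (11), using the average conducted flux over the interval [t1, t2].
    W_cd_avg = 0.5 * (W_cd_t1 + W_cd_t2)
    dT = T_s_t2 - T_s_t1
    return W_cd_avg * (t2 - t1) / np.where(np.abs(dT) > 1e-3, dT, np.nan)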

B. Integrating Sonar and Optical Imagery

Computer analysis of underwater imagery has many important military and commercial applications. These include accurate underwater terrain assessment by autonomous underwater vehicles and remotely operated vehicles, seafloor mapping, exploration of natural resources, location of objects lost at sea, detecting mines, etc. In the past, underwater imagery has utilized two major sensors: sonar and visual. However, each approach by itself cannot provide sufficient information to constrain the imaged object's identity. For instance, in sonar imaging, surfaces which differ in roughness and other material properties may give rise to images of similar intensities. On the other hand, the optical surface reflectances of different object classes are similar and hence the visual images of their surfaces might be indistinguishable. Thus each modality needs additional information in order to uniquely determine object identity.

A physics-based approach for automated analysis of imagery acquired from underwater environments has been described in [11]. The approach described in [11] is based upon fusion of information from sonar and visual sensors to constrain interpretation of the imaged object. An overview of this approach is presented. The integrated interpretation of sonar and visual data uses a physical model, called the composite roughness model [12], which is based on the analysis of the energy transport across the interface between two fluids. This approach is used to estimate material properties of the seafloor which serve as physically meaningful features for seafloor classification.

A common use of optical imagery is for sensing the microprofile of the imaged surface [13], [14]. Standard photogrammetric techniques are applied to stereo photographs (Fig. 4) to record heights of bottom features. The Fourier transform of the autocorrelation of this relief produces the 2-D relief spectrum, denoted as W(k_x, k_y), where k_x and k_y are, respectively, the x and y components of the 2-D wave vector, k. The 2-D surface relief can be modeled by a two-parameter spectrum, where β and γ are dimensionless parameters used to describe the 2-D wave spectrum and k is the wavenumber, i.e., the magnitude of the 2-D wave vector. Hence, β and γ can be computed using the stereoscopic images of the surface.

Many studies have shown that a strong relationship exists between backscattering strength and material type.

Fig. 4. Optical stereo image pair of the seafloor. The surface roughness spectrum is computed for this pair.

This relationship is quantified in the composite roughness model proposed by Jackson [12]. Good agreement exists between real data and predicted values at moderate grazing angles and frequencies of 40 kHz or higher [15]. The backscattering cross section of the surface, σ(θ), is computed from the sonar data (Fig. 5). Here, θ is the grazing angle of the acoustic energy. The composite roughness model, which treats the boundary between the seafloor and water as a two-fluid interface, relates the backscattering cross section to: 1) the ratio of compressional wave speed to water sound speed, denoted as ν, 2) the ratio of object mass density to water mass density, denoted as ρ, and 3) the surface roughness.

The small-scale roughness backscattering cross section is modeled by [12]

(12)

where k_w is the acoustic wavenumber in water and W(k) is the power spectrum of the 2-D roughness distribution obtained from analyzing the optical stereo imagery.

(13)

The critical angle in (13) is determined by the sound speed ratio ν. The small-scale roughness backscattering cross section σ(θ) is obtained from the sonar data. Thus σ(θ) is available for various values of θ. The power spectrum of the 2-D roughness distribution is obtained from bottom photographs. Hence, W(k) is available for each of the wavenumbers k. The backscattering cross section model, σ_m(θ), given by (12) and (13), is now fit to the data-derived backscattering cross section σ(θ). This is a nonlinear regression problem where ν and ρ are the parameters of the regression fit. A least mean squared error minimization technique is adopted to compute ν and ρ.

Fig. 5. Raw sonar data of the seafloor. The image was formed using 69 sonar pings of a 40 kHz system. Reflections from each ping form a radial line in the image. Azimuthal resolution is 5°.

A feature vector consisting of ν and ρ is then used to classify the seafloor, e.g., as sandy sediment or sediment rich in manganese nodules, using a minimum distance classifier. Table 2 lists results of analyzing imagery from a seafloor consisting of only sediment and bereft of manganese nodules.
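
The regression and classification steps can be sketched as follows; sigma_model is a hypothetical placeholder for the composite-roughness prediction of (12) and (13) (the true model also depends on the measured roughness spectrum), and the class prototypes are supplied by the caller.

import numpy as np
from scipy.optimize import least_squares

def sigma_model(theta, nu, rho):
    # Hypothetical stand-in for the composite-roughness backscatter model.
    return (rho - 1.0) ** 2 * np.sin(theta) ** 2 + 0.01 * (nu - 1.0)

def fit_seafloor_parameters(theta_obs, sigma_obs, nu0=1.85, rho0=1.4):
    # Nonlinear least-squares fit of (nu, rho) to the sonar-derived cross
    # sections, starting from the initial values quoted in Table 2.
    fit = least_squares(lambda x: sigma_model(theta_obs, x[0], x[1]) - sigma_obs,
                        x0=np.array([nu0, rho0]))
    return fit.x

def classify_seafloor(nu, rho, prototypes):
    # Minimum-distance classification in (nu, rho) feature space;
    # prototypes maps a class name to its (nu, rho) prototype.
    return min(prototypes, key=lambda c: np.hypot(nu - prototypes[c][0],
                                                  rho - prototypes[c][1]))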

C. Interpreting Polarized Radiation

Often, an object being imaged emanates energy (reflected or emitted energy) that has polarization information which can be “decoded” to make inferences on the type of object surface being sensed.

Table 2 Estimated Values of ν and ρ for Real Data from Sediment. Ground Truth Values are ν = 1.4, ρ = 1.0014. Initial Values Were ν_0 = 1.85, ρ_0 = 1.4. Classification is Based on the Distance Values.

Ping    ν       ρ        d_sed   d_Mn    Class
0       1.07    1.0013   0.40    2.18    sed
5       1.07    1.0017   0.40    2.18    sed
10      1.08    1.0019   0.40    2.16    sed
15      1.07    1.0013   0.40    2.18    sed
20      1.07    1.0014   0.40    2.18    sed
25      1.09    1.0018   0.38    2.13    sed
30      1.08    1.0014   0.39    2.16    sed
35      1.09    1.0019   0.38    2.18    sed
40      1.08    1.0016   0.39    2.16    sed
45      1.06    1.0011   0.41    2.21    sed
50      1.07    1.0012   0.40    2.18    sed
55      1.85    1.4      0.12    0.236   sed
60      1.07    1.0012   0.40    2.18    sed

Filters or sensor configurations may be used to create different images corresponding to only specific modes of polarization. Complementary information from these images yields information about object structure or material type. Two different examples are presented below—the first discusses, briefly, the analysis of optical imagery, and the second discusses the analysis of different polarization channels of SAR imagery.

1) Polarized Optical Imagery: When light is incident on commonly occurring surfaces, the reflected light consists of either a specular component, a diffuse component, or a combination of the two. If the incident light is unpolarized, the diffuse component of the reflected light remains unpolarized while the specular component of the reflected light becomes partially linearly polarized. The degree to which the specularly reflected component becomes partially polarized depends on the electrical conductivity of the reflecting surface. For materials with large conductivities, the polarization is reduced. This property has been used to segment metal surfaces from dielectric surfaces [16].

Consider unpolarized light incident on a surface, with angle of incidence ψ (with respect to the surface normal), and the specularly reflected light at an angle of emittance also equaling ψ. The specular plane is defined to be the plane containing the incident ray and the reflected ray. For specularly reflected light the magnitude of the polarization component perpendicular to the specular plane is larger than the magnitude of the polarization component parallel to the specular plane. The Fresnel reflection coefficients, F⊥ and F∥, corresponding to the polarization components that are, respectively, perpendicular to and parallel to the specular plane, have values between zero and one. The magnitude ratio of these polarization components is the polarization Fresnel ratio (PFR) and equals F⊥/F∥. The value of the PFR varies with the specular angle of incidence, ψ. Wolff argues that for values of ψ between 30° and 80° the value of the PFR is below 2.0 for metals and above 2.0 for dielectrics. Hence, the PFR can be used to differentiate these two classes of materials. Different thresholds on the PFR may be used to detect metals coated with translucent insulating materials.

One way of computing the PFR of an object in the scene is to acquire a sequence of images using a camera in front of which is mounted a polarizer that is incrementally rotated. The maximum intensity, I_max, at a pixel location, and the minimum intensity, I_min, at that location are noted. The ratio I_max/I_min approximates the PFR if the diffuse component of the reflection is much weaker than the specular component—which holds for many surfaces, but is violated for a surface with highly diffuse albedo, such as a sheet of bond paper. An alternative method of estimating the PFR is to acquire a smaller number of images corresponding to fewer orientations of the polarizer. The intensity variation at any pixel is then modeled as a sinusoidal function—and the parameters of the fitted model are used to find I_max and I_min.
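
A minimal sketch of the second method is given below: a Malus-type sinusoid is fitted to a pixel's intensities at a few polarizer orientations, and I_max/I_min is taken as the PFR estimate. The model form and the sample values are assumptions for illustration, not taken from [16].

import numpy as np
from scipy.optimize import curve_fit

def transmitted_intensity(phi, i_mean, i_amp, phi0):
    # Sinusoidal variation of intensity with polarizer angle phi (radians).
    return i_mean + i_amp * np.cos(2.0 * (phi - phi0))

def estimate_pfr(angles, intensities):
    # Fit the sinusoid and return the polarization Fresnel ratio I_max / I_min.
    p0 = [intensities.mean(), 0.5 * (intensities.max() - intensities.min()), 0.0]
    popt, _ = curve_fit(transmitted_intensity, angles, intensities, p0=p0)
    i_mean, i_amp = popt[0], abs(popt[1])
    return (i_mean + i_amp) / (i_mean - i_amp)

# Illustrative use with synthetic measurements at four polarizer angles.
angles = np.deg2rad([0.0, 45.0, 90.0, 135.0])
intensities = np.array([0.82, 0.55, 0.30, 0.57])
print(estimate_pfr(angles, intensities))   # > 2.0 suggests a dielectric, < 2.0 a metal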

The above approach is hampered when the illumination is a distributed source instead of a point source, and also when the specular angle of incidence is close to the grazing angle. More recently, the approach has been extended to distinguish edges caused by specularities from those caused by occluding boundaries [17]. Further extensions have been reported to include extended light sources, and object recognition applications [18].

2) Polarized SAR Imagery: A physics-based approach has been reported for the interpretation of ultra-wideband (UWB) SAR imagery for object classification [19], [20]. The UWB sensor (50 MHz to 1 GHz) is used for detection of man-made objects, including those which are obscured by a random medium, e.g., vehicles obscured by foliage, and buried objects. An electromagnetic model is used to predict the backscatter received from the scene objects. The backscatter from a scene object consists of two components: 1) an “early-time” portion that precedes 2) a “late-time” portion. The early-time (physical optics) portion of the backscatter is due to reflected energy and is highly dependent on the structure of the object surface that is illuminated by the incident energy. The late-time response is due to resonant modes excited by the incident energy, and is dependent on the gross shape of the object. Hence, the early-time responses from scene objects that differ in structure, such as vehicles and trees, have different sensitivities to the aspect angle of the incident wave. Aspect angle sensitivity of the backscatter is extracted by reconstructing the SAR image of an object over smaller subapertures of the full synthetic aperture. This multi-aperture approach provides important angular information while still maintaining detectable levels in the subsequent SAR images. The backscatter from man-made objects differs from that from natural objects in both its angular and its polarimetric dependence. This is primarily due to the fact that man-made objects are more planar in structure and more conductive than natural objects. The structure of natural objects, such as trees and rocks, changes in a random manner with viewpoint, and hence, the aspect-angle dependence of backscatter from natural objects is more random than that for man-made objects. This characteristic is used to develop feature vectors using a multi-aperture approach to discriminate vehicles hidden in a deciduous forest from surrounding objects of little interest.

Fig. 6. Approximation of the early-time portion of the backscatter for a flat plate: (a) the incident wave illuminates the plate at aspect angle θ, (b) the silhouette area function S(z) is determined, and (c) the second derivative of the function is used in the physical optics approximation to estimate the backscatter.

Fig. 7. UWB-SAR image of a pickup truck (a) using the full aperture, (b) using a quarter subaperture which is off-broadside of the truck, and (c) using a quarter subaperture which is along the broadside of the truck.

Considering the physical optics or early-time portion of the energy scattered by the imaged surface, and assuming an impulse incident wave, the resulting scattered wave is given by

(14)

where the first factor is the contribution of the incident wave in generating the local surface current at z, S(z) is the silhouette area of the scatterer projected along the direction of propagation of the incident wave, z, and z = 0 at the beginning of the scatterer. Equation (14) can be used to provide a simple approximation of the scattered field due to the incident impulse wave [21], and it is evident that the scattered field will be dependent on aspect angle since S(z) changes with aspect angle to the scatterer. An example of this calculation is shown in Fig. 6 for a flat plate

where the incident wave arrives at aspect angle θ to the surface of the plate. We observe that the backscatter will consist of two finite-valued peaks which are due to the surface discontinuities at the edges and tips of the plate. These closely spaced peaks have a maximum magnitude when the direction of propagation of the incident wave is normal to the plate surface.

The aspect angle sensitivity of an imaged object can be computed as follows. First, a SAR image is reconstructed using all range profiles that are collected at various aspect angles across the full synthetic aperture angle interval. An object of interest is detected when a pixel in the image is found to exceed a local contrast threshold, i.e., a “glint” is detected. Next, the range image profiles are grouped into subsets of consecutively acquired (adjacent) range profiles, each subset comprising a single subaperture. An image of an object of interest can be reconstructed using the range profiles of a single subaperture. When this is done for each subaperture, the aspect angle sensitivity of the object's backscatter can be determined by examining the variation of the object's backscatter across the different subapertures.
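
The subaperture bookkeeping can be sketched as follows; reconstruct is a hypothetical callable that forms a SAR image from a subset of range profiles (e.g., by backprojection), and is not part of the system described in [19], [20].

import numpy as np

def subaperture_energy(range_profiles, n_subapertures, reconstruct, glint_pixel):
    # Backscatter energy of a detected glint pixel as a function of
    # subaperture position (i.e., aspect angle).
    groups = np.array_split(range_profiles, n_subapertures)   # adjacent profiles
    energies = [np.abs(reconstruct(g)[glint_pixel]) ** 2 for g in groups]
    return np.array(energies)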

The SAR system used in the study was fully polarimetric, in that the radar has the capability of transmitting two orthogonal polarizations consecutively and receiving two orthogonal polarizations simultaneously at each point along the aperture. Therefore, multi-aperture processing can be applied to both co- and cross-polarized channels. Given two orthogonal polarization bases H and V, E_HH and E_VV are defined to be the backscatter energy as a function of subaperture position (aspect angle) for the co-polarized channels, where the first and second subscripts indicate the transmit and receive polarizations, respectively.

Fig. 8. Aspect angle dependence of backscatter from (a) a broadside vehicle (truck) and clutter (tree), where the “specular flash” is observed at aperture position seven, and (b) a truck whose specular flash is not observed and a tree which appears aspect angle dependent.

Similarly, multi-aperture processing of the cross-polarized channels results in E_HV and E_VH—however, due to symmetry, the cross-polarized returns are generally similar to one another. A normalized correlation coefficient, R_HHVV, is defined

(15)

The coefficients R_HHHV and R_VVVH may be defined in a similar manner. The coefficient value is bounded by |R_HHVV| ≤ 1; R_HHVV = 0 if and only if E_HH and E_VV are independent, and |R_HHVV| = 1 if and only if E_HH = cE_VV, where c is a constant. A feature vector R can now be formulated whose elements are the correlation coefficients R_HHVV, R_HHHV, and R_VVVH.

Consider man-made vehicles and natural objects in an imaged scene. Vehicles are generally metallic and have more large, flat planar surfaces than objects occurring in nature such as trees, rocks, etc. Hence, at certain aspect angles, the radar may observe a large increase in backscatter from vehicles due to specular reflections. In contrast, trees, tree limbs, and foliage will not exhibit this large increase in backscatter, and will have more random fluctuations with aspect angle due to their small radar cross sections and random distribution within a resolution cell. The backscatter is independently random for both co- and cross-polarized channels. Therefore, the correlation coefficient defined in (15) will be low for natural objects. The response for vehicles will be similar for both co- and cross-polarized channels and, consequently, the correlation coefficient defined in (15) will be larger for vehicles than for natural objects.
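
The sketch below computes such a feature vector from the per-subaperture energy signatures; a standard mean-removed correlation coefficient is used here as a stand-in, since the exact normalization of (15) is not reproduced.

import numpy as np

def corr(a, b):
    # Normalized (mean-removed) correlation of two energy-versus-aspect signatures.
    a, b = a - a.mean(), b - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def polarimetric_feature(E_hh, E_vv, E_hv, E_vh):
    # Feature vector R = (R_HHVV, R_HHHV, R_VVVH); high values suggest a
    # man-made (planar, conductive) scatterer, low values suggest natural clutter.
    return (corr(E_hh, E_vv), corr(E_hh, E_hv), corr(E_vv, E_vh))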

The UWB SAR images of a pickup truck using the full aperture and two subapertures are shown in Fig. 7. The aspect-angle signatures for the broadside truck and a tree are shown in Fig. 8(a). An increase in backscatter energy is observed near aperture position seven, where the specular flash from the vehicle is observed. The tree, on the other hand, has a random but bounded backscatter energy across the aperture, as expected. However, two factors may degrade detection performance or increase false alarm rates using this approach: 1) due to changing foliage density with aspect angle, clutter can often appear to be aspect angle dependent, and 2) the specular flash from the vehicle may not be observed along the synthetic aperture, making it difficult to discriminate between vehicles and clutter [Fig. 8(b)]. These difficulties are overcome by using the feature vector described above, which combines both angular and polarimetric diversity information (Fig. 9).

D. Other Imaging Sensors

The physics-based approach has been used for the integration of other types of sensors, in addition to the imaging modalities discussed above. A brief overview of some examples is given below.

1) Color Image Interpretation: Color may also be considered to be multisensory information since irradiation in three different spectral bands is sensed. Initial work in physics-based interpretation of color imagery was conducted by Klinker et al. [22], who discuss the segmentation of objects using physical models of color image generation. Their model consists of a dichromatic reflection model that is a linear combination of surface reflection (highlights) and reflection from the surface body. The combined spectral distribution of matte and highlight points forms a skewed T-shaped cluster in red-green-blue space, where the matte points lie along one limb of the T and the highlight points lie along the other limb. Principal component analysis of color distributions in small nonoverlapping windows provides initial hypotheses of the reflection type. Adjacent windows are merged if the color clusters have similar orientations. These form “linear hypotheses.” Next, skewed T-shaped clusters are detected. This specifies the dichromatic model used to locally resegment the color image via a recursive region merging process. Thus a combination of bottom-up and top-down processing segments images into regions corresponding to objects of different color.
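
The window-level analysis can be sketched as below: the principal direction of the RGB distribution in a small window provides the initial linear hypothesis. The function is an illustrative fragment only; the merging and skewed-T detection steps of [22] are not reproduced.

import numpy as np

def window_color_direction(window_rgb):
    # Principal direction and elongation of the RGB distribution in a small
    # image window (array of shape (h, w, 3)); a strongly elongated cluster
    # supports a linear hypothesis for that window.
    pixels = window_rgb.reshape(-1, 3).astype(float)
    pixels -= pixels.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(pixels, rowvar=False))
    direction = evecs[:, -1]                       # dominant color direction
    elongation = evals[-1] / (evals[:-1].sum() + 1e-12)
    return direction, elongation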

Fig. 9. Full polarimetric multi-aperture response of (a) a pickup truck oriented broadside with specular flash at position seven, producing a feature vector R = (0.93, 0.57, 0.25), (b) a tree (R = (0.01, 0.04, −0.04)), (c) a pickup truck whose specular flash is not observed (R = (0.75, 0.71, 0.92)), and (d) a tree which appears aspect angle dependent (R = (0.36, 0.23, −0.32)).

Healey [23], [24] reports a color segmentation approach that uses a reflection model that includes metallic as well as dichromatic surfaces. The segmentation algorithm considers the color information at each pixel to comprise a Gaussian random vector with three variables. Segmentation is achieved by a recursive subdivision of the image and by the analysis of the resulting region-level statistics of the random vector. This physics-based approach has been extended for object recognition using color information [25], [26].

2) Radar and Optical Imagery: The integration of visual images and low-resolution microwave radar scattering cross sections to reconstruct the three-dimensional (3-D) shapes of objects for space robotic applications is discussed in [27]. Their objective is to “combine the interpreted output of these sensors into a consistent world view that is in some way better than its component interpretations.” The visual image yields contours and a partial surface shape description for the viewed object. The radar system provides an estimate of the range and a set of polarized radar scattering cross sections, which is a vector of four components. An “intelligent decision module” uses the information derived from the visual image to find a standard geometrical shape for the imaged object. If this is possible, then a closed-form expression is used to predict the radar cross section. Otherwise, an electromagnetic model uses the sparse surface description to compute the radar cross section using a finite approximation technique. The unknown shape characteristics of the surface are then solved for iteratively based on minimizing the difference between the predicted and sensed radar cross sections. This technique is illustrated by a simulation reported in [27].

III. INTEGRATION OF IMAGE AND MODEL INFORMATION

Model-based object recognition is a commonly adopted paradigm in computer vision and pattern recognition, wherein features extracted from a region in the image are compared with those stored in a model—resulting in classification of the image region. This paradigm is easily extended to include a physics-based approach where the features stored in object models are based on material properties. A further extension is possible in which the model stores the values of physical properties (such as density, specific heat, conductivity) of the materials that comprise the object.

For an image region under consideration, a hypothesis is made of the identity of the object, and hence of the materials present in the scene. The values of these materials' properties as stored in the model, along with image gray levels at the locations at which these materials are hypothesized to occur, are used to compute a feature. The value of this feature is used to verify or refute the hypothesis. The feature is based on a physics-based approach similar to that described in Section II-A. An overview of such an approach used for recognition of objects in long wave infrared (LWIR) imagery is presented below [28]–[30].

A. Features Using Model and Image Information

Applying the principle of conservation of energy at the surface of an imaged object yielded the energy balance equations (3) and (5) in Section II-A. While those equations applied for thin plate objects, a more general form that accounts for nonuniform temperature distributions in the material is given by

W_abs = W_cv + W_rad + W_st + W_cd    (16)

where W_abs, W_cv, and W_rad are as defined earlier, W_st is the energy used to raise the temperature of the local infinitesimal volume at the surface of the object, and W_cd is the energy conducted into the interior of the object. Hence we have W_st = ρ c V (∂T_s/∂t) and W_cd = −k (∂T/∂z), where k is the thermal conductivity and z is the depth from the surface.

The new energy balance equation may be rewritten in the following linear form

(17)

Using the expressions for the various energy components as presented in the previous section, we can express each term in the above expression as

(18)

Note that a calibrated LWIR image provides the radiometric temperature (assuming an emissivity close to unity, which is true for most surfaces). Hence the temperature-dependent terms in (18) can be computed from the LWIR image alone (and knowledge of the ambient temperature), while the material properties, such as ρc and k, are provided by the model when the identity and pose of the object are hypothesized. The “driving conditions,” or unknown scene parameters that can change from scene to scene, are given by the remaining factors in (18). Thus, through (18), each pixel in the thermal image defines a point in a five-dimensional (5-D) thermophysical space.

Consider two different LWIR images of a scene obtained under different scene conditions and from different viewpoints. For a given object, points are selected such that 1) the points are visible in both views and 2) each point lies on a different component of the object which differs in material composition and/or surface orientation. Assume (for the nonce) that the object pose for each view, and the point correspondence between the two views, are available (or hypothesized). A point in each view yields a measurement vector m with components defined by (18) and a corresponding driving-conditions vector d. Let a collection of five of these vectors compose a matrix, M_1, for the first scene/image. These same points in the second scene will define vectors that compose a matrix, M_2. The driving-condition matrices, D_1 from the first scene and D_2 from the second scene, are each of size 5 × 5.

Since the points are selected to be on different material types and/or different surface orientations, the thermophysical diversity causes the vectors m_i to span the 5-D space, as will also the vectors d_i. Without loss of generality, assume that the vectors m_1, ..., m_5 span the space and also that d_1, ..., d_5 span the space. These five points specify the measurement matrices, M_1 and M_2, corresponding to the first and second scene, respectively. The point selection process here is analogous to the selection of characteristic 3-D points in the construction of geometric invariants. Since M_1 and M_2 are of full rank, there exists a linear transformation A such that M_2 = A M_1. Hence, an induced nonsingular linear transformation can be shown to exist between D_1 and D_2.

Consider the measurement vector m of a point as defined in (18). From one scene to the next we expect the two object properties—thermal capacitance and conductance—to remain constant. Hence the linear transformation between the measurement vectors of the two scenes must be of the form

(19)

The transformation of a measurement vector from one scene to the next is given by

(20)

The first two elements, the thermal capacitance and the thermal conductance, are held constant and the other, scene-dependent, elements are allowed to change. In general, we have M_2 = A M_1, where M_1 and M_2 are the matrices derived from the two scenes for the chosen points.
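
Given the stated full-rank property, the scene-to-scene map can be recovered directly, as in the sketch below; the 5 x 5 matrices are assumed to hold the per-point measurement vectors of (18) as columns, and the structural constraints of (19) are not enforced here.

import numpy as np

def scene_transformation(M1, M2):
    # Recover the linear map A with M2 = A @ M1 from two full-rank 5x5
    # measurement matrices (per-point measurement vectors as columns).
    return M2 @ np.linalg.inv(M1)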

Algebraic elimination of the transformation parameters using four copies of the linear form (17) subjected to the transformation (19) provides us with invariants, i.e., functions of the measurement vectors for a scene, which remain constant and are independent of the driving conditions and the transformation.

Fig. 10. Three of the five vehicles used to test the object recognition approach. From top left (clockwise): tank, truck 1, and truck 2. The axes superimposed on the images show the object-centered reference frames. The numbered points indicate the object surfaces used to form the measurement matrices. These points are selected such that there are a variety of different materials and/or surface normals within the set.

This elimination may be performed by using recently reported symbolic techniques [31]. The five invariant functions derived by this elimination process can be divided into two types. Each is a ratio of determinants. The first type of invariant function uses determinants formed from components of three of the four vectors.

(21)

where m_{j,i} denotes the jth component of the ith vector (ith point).

The second type is formed from components of all four vectors.

(22)

where m_{j,i} denotes the jth component of the ith vector (ith point). Since the four measurement vectors are linearly independent, we can assume without loss of generality that the denominator determinants in (21) and (22) are nonzero. The first type has four independent functions given four points, and the second type has one.

In order for the invariant feature to be useful for object recognition, not only must the value of the feature be invariant to scene conditions, but the value must be different if the measurement vector is obtained from a scene that does not contain the hypothesized object, and/or if the hypothesized pose is incorrect.

Table 3 Values of the I1-Type Feature Used to Identify the Vehicle Class, Truck 1. The Feature Consisted of Point Set {4,7,8,10} Corresponding to the Points Labeled in the Figure. The Feature Value is Formed Using the Thermophysical Model of Truck 1 and the Data from the Respective Other Vehicles. When this Feature is Applied to the Correctly Hypothesized Data, It has a Mean Value of −0.57 and a Standard Deviation of 0.13. This I1-Type Feature Produces a Good Stability Measure of 4.5, and Good Separability Between Correct and Incorrect Hypotheses. The Feature Values for Incorrect Hypotheses are at Least 3.32 Standard Deviations Away from the Mean Value for the Correct Hypothesis.

Hypothesis:     Truck 1    Truck 1    Truck 1    Truck 1     Truck 1
Data From:      Truck 1    Tank       Van        Car         Truck 2

11 am           −0.70      27.28      0.33       −1          0.68
12 pm           −0.71      0.09       4.83       15.58       4.15
1 pm            −0.45      0.68       0.00       11.73       4.6e12
2 pm            −0.66      −1.00      −1         71.23       −1.00
3 pm            −0.40      −1.00      −1         −1.00       −1.00
4 pm            −0.54      −1         1          5.42        22.29
9 am            −0.68      1.38       −1.00      −6.66e14    −7.03
10 am           −0.45      −1.00      −1         6.50        1

Since the formulation above takes into account only feature invariance but not separability, a search for the best set of points that both identifies the object and separates the classes must be conducted over a given set of points identified on the object.

B. Scheme for Object Recognition

The hypothesize-and-verify scheme for object recognition consists of the following steps: 1) extract geometric features, e.g., lines and conics; 2) for an image region, R, hypothesize the object class, O, and its pose using, for example, geometric invariants as proposed by Forsyth et al. [32]; 3) use the model of object O and project the visible points labeled 1, ..., N onto the image region using scaled orthographic projection; 4) for the point labeled i in the image region, assign the thermophysical properties of the point labeled i in the model of object O; 5) using the gray levels at each point and the assigned thermophysical properties, compute the measurement matrices M_1 and M_2, and hence compute the feature using (21) or (22); and finally, 6) compare the feature with the model prototype to verify the hypothesis.
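
Structurally, the scheme is a loop of the following form; every helper named here (extract_geometric_features, hypothesize_pose, and so on) is a hypothetical placeholder marking where the corresponding step plugs in, and the tolerance is illustrative.

def recognize(region, lwir_image_1, lwir_image_2, model_db, helpers, tol=0.5):
    # Hypothesize-and-verify recognition following steps 1)-6); `helpers`
    # bundles one callable per step (all hypothetical placeholders).
    lines, conics = helpers.extract_geometric_features(region)           # step 1
    for obj in model_db:
        pose = helpers.hypothesize_pose(obj, lines, conics)              # step 2
        if pose is None:
            continue
        points = helpers.project_visible_points(obj, pose, region)       # step 3
        props = helpers.assign_thermophysical_properties(obj, points)    # step 4
        m1 = helpers.measurement_matrix(lwir_image_1, points, props)     # step 5
        m2 = helpers.measurement_matrix(lwir_image_2, points, props)
        feature = helpers.invariant_feature(m1, m2)                      # cf. (21), (22)
        if abs(feature - obj.prototype_feature) < tol:                   # step 6
            return obj
    return None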

The above technique was applied to real LWIR imagery acquired at different times of the day. Several types of vehicles were imaged. Three of these vehicles are shown in Fig. 10. Several points were selected (as indicated in the figures) on surfaces of different materials and/or orientations. Given an image of a vehicle, 1) the pose of the vehicle is assumed known, and then 2) the front and rear wheels are used to establish an object-centered reference frame. The center of the rear wheel is used as the origin, and the center of the front wheel is used to specify the direction and scaling of the axes. The coordinates of the selected points are expressed in terms of this 2-D object-centered frame. For example, when a truck-1 vehicle is hypothesized for an image actually obtained of a tank or some unknown vehicle, the material properties of the truck-1 are used, but image measurements are obtained from the image of the tank at locations given by transforming the coordinates of the truck-1 points (in the truck-1-centered coordinate frame) to the image frame computed for the unknown vehicle. The features are computed based on the image data and model information. Table 3 shows interclass and intraclass variation for a feature of the I1 type under the truck-1 hypothesis, for images obtained from different vehicles at eight different times over two days. The feature of the other type exhibits similar behavior: high between-class separation and low within-class variation.
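The wheel-based object-centered frame described above amounts to a 2-D similarity transform. The sketch below, with hypothetical names and example coordinates, maps model points expressed in that frame into image coordinates; it assumes the wheelbase defines one unit of model length.

```python
import numpy as np

def wheel_frame_to_image(model_pts, rear_img, front_img):
    """Map 2-D object-centered coordinates into image coordinates.

    The object frame is assumed to place its origin at the rear-wheel
    center, with the x axis pointing toward the front-wheel center and
    the rear-to-front wheel distance defining unit scale (a similarity
    transform), mirroring the construction described in the text."""
    rear = np.asarray(rear_img, dtype=float)
    front = np.asarray(front_img, dtype=float)
    axis = front - rear                      # image direction of the object x axis
    scale = np.linalg.norm(axis)             # wheelbase in pixels = one model unit
    ux = axis / scale                        # unit x axis in the image
    uy = np.array([-ux[1], ux[0]])           # perpendicular unit y axis

    pts = np.asarray(model_pts, dtype=float)
    # Each model point (x, y) maps to rear + scale * (x * ux + y * uy).
    return rear + scale * (pts[:, :1] * ux + pts[:, 1:2] * uy)

# Example: two hypothetical truck-1 points mapped into an image whose
# rear and front wheel centers were located at (120, 310) and (260, 305).
pts_img = wheel_frame_to_image([[0.5, 0.3], [0.9, 0.6]],
                               rear_img=(120, 310), front_img=(260, 305))
```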

IV. SYNTHESIS OF MULTISENSORY IMAGERY

Model-derived features and imagery are useful in the development of recognition systems that use trainable classifiers, or other model-driven approaches. In order to achieve low error rates it is necessary to train the classifier on multiple images of the same object in a variety of scene conditions. This requires the existence of a large training database. Creating this database using real scene imagery is expensive and sometimes impossible. It is also difficult to maintain a large database for training. Furthermore, if the classifier is to be trained on multimodal data, the size of the data set is even greater. An attractive solution to the storage and data acquisition problems is to create accurate artificial images of the objects. Not only are storage space and database maintenance requirements lessened, but computer generation of object imagery provides greater flexibility in specifying environmental and object conditions. A unified model of different modes of sensing, such as visual, thermal, and ladar, augments the ability to quickly generate a large multisensory data set for training, without the expense of field work. These reasons have motivated the development of synthetic image generation and feature prediction systems.

One of the principal issues facing researchers in the area of recognition systems is the establishment of models that can simulate the image-generating physical processes peculiar to each of the different imaging modalities. Further, it is attractive to use a single modeling scheme that can be used for the different modalities. Early work in this area by Oh et al. [33] addressed the generation of thermal and visual imagery using 3-D object models stored in the form of octrees [34]. The simulation of energy exchange and energy flows within the object gave rise to predictions of surface temperatures and hence the thermal image, while the 3-D surface was used to create the visual image.


Fig. 11. The quadtree representation of a binary image, and the octree representation of a 3-D object.

Further work by Karthik et al. [35] increased the accuracy of the model by incorporating nonhomogeneities in the object model. This approach was further augmented to allow the simulation of laser radar (range, reflectance, and doppler) imagery in addition to the visual and infrared modalities [36], [37]. An overview of this approach is presented below.

When the goal of the simulation is to generate predictions of only the feature values used for recognition, as opposed to imagery, then a more efficient scheme may be used. Rather than model a fully tessellated 3-D object, one can represent (at a coarse level of resolution) only the major components of an object, each consisting of uniform material properties. Thus very few nodes are used in the simulation of energy flows and exchange. Accurate predictions of multisensory physics-based features have been produced by this approach [38].

A. 3-D Object Model Construction

The 3-D object model is represented in the form of an octree. The octree of a 3-D object is obtained by subdividing the universe into eight equal octants successively until each octant corresponds either to empty space or to a volume inside the object. The octree is an extension of the quadtree, which may be used to represent 2-D objects. The quadtree of a binary image is obtained by subdividing the image into four quadrants successively until each quadrant is either entirely black or white. Each quadrant is then a node in the tree. Each node either has four children or is a leaf node. The nodes can be white, black, or gray. A gray node is not a leaf node and has four child nodes that can be gray, white, or black nodes. The quadtree and octree representations are illustrated in Fig. 11.
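A minimal sketch of the quadtree construction just described follows; the recursion stops when a quadrant is uniformly black or white, and the 3-D octree case is the direct analogue with eight octants per node. The function and label names are illustrative.

```python
import numpy as np

BLACK, WHITE, GRAY = "black", "white", "gray"

def build_quadtree(img):
    """Recursively build a quadtree for a square binary image (side a power
    of two).  A node is a leaf labeled black or white when its quadrant is
    uniform; otherwise it is a gray node with four children."""
    if img.all():
        return (BLACK, None)
    if not img.any():
        return (WHITE, None)
    h, w = img.shape
    h2, w2 = h // 2, w // 2
    children = [build_quadtree(img[:h2, :w2]),   # NW quadrant
                build_quadtree(img[:h2, w2:]),   # NE quadrant
                build_quadtree(img[h2:, :w2]),   # SW quadrant
                build_quadtree(img[h2:, w2:])]   # SE quadrant
    return (GRAY, children)

# Toy 4x4 binary image: object pixels on the left, background on the right.
tree = build_quadtree(np.array([[1, 1, 0, 0],
                                [1, 1, 0, 0],
                                [1, 1, 0, 0],
                                [1, 1, 1, 0]]))
```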

Multiple silhouettes are used as the input to the octree formation. The contour of each silhouette is smoothed using a tension spline, and the contour normal at each contour pixel is computed. For each view, a region contour (RC) quadtree is constructed, wherein a node can be either black (object), white (nonobject), contour (contour pixel), or gray (parent of others). The contour normal is stored with each contour node. Using the RC quadtrees from different views, a volume intersection is performed and a volume-surface (VS) octree is constructed, wherein the nodes are black (internal), white (empty), surface, or gray (parents of the others).

Fig. 12. Visual image of object reconstructed from three silhouettes, showing false surfaces resulting from the "basic" volume intersection approach.

Fig. 13. Visual image of object reconstructed using the improved volume intersection technique based on silhouette partitioning.

Each surface node is encoded with the surface normal. The above approach suffers from the disadvantage that concave boundaries in the silhouettes result in false volumes in the 3-D reconstruction. However, these errors may be avoided by using line-drawing information to partition the silhouettes into different regions prior to volume intersection [37].
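The following sketch illustrates the volume intersection step on a dense voxel grid rather than on the octree used in the paper; the projector functions and grid layout are assumptions made purely for clarity.

```python
import numpy as np

def volume_intersection(silhouettes, projectors, grid_shape):
    """Minimal volume-intersection sketch on a dense voxel grid.

    `silhouettes` are binary images, and `projectors` are functions mapping
    a voxel center (x, y, z) to pixel coordinates (u, v) in the matching view.
    A voxel is kept only if it projects inside every silhouette, which is
    exactly why silhouette concavities can leave false volumes behind."""
    occupied = np.ones(grid_shape, dtype=bool)
    centers = np.indices(grid_shape).reshape(3, -1).T + 0.5
    for sil, project in zip(silhouettes, projectors):
        keep = np.zeros(len(centers), dtype=bool)
        for k, c in enumerate(centers):
            u, v = project(c)
            inside = 0 <= v < sil.shape[0] and 0 <= u < sil.shape[1]
            keep[k] = inside and bool(sil[int(v), int(u)])
        occupied &= keep.reshape(grid_shape)   # carve away voxels outside any view
    return occupied
```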

B. Generating Different Imagery

1) Visual Image Generation: It is easy to generate the visual image of an object from its octree representation. All the surface nodes are scanned and checked for visibility. Only the visible faces of the leaf nodes (voxels) are projected to form the visual image. Lambertian or other reflectance functions may be assumed. Orthographic, weak perspective, or perspective projection can be employed. Any arbitrary viewpoint, as well as the direction of illumination, can be specified. Figs. 12 and 13 are examples of results generated by this process.
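A minimal rendering sketch along these lines is shown below, assuming orthographic projection along the viewing axis, Lambertian shading, and a z-buffer for visibility; the data layout (voxel centers plus unit normals) is an assumption for illustration, not the paper's data structure.

```python
import numpy as np

def render_visual(surface_voxels, normals, light_dir, image_size, scale=1.0):
    """Sketch of visual-image synthesis from surface voxels: orthographic
    projection along +z with Lambertian shading and a z-buffer for
    visibility.  `surface_voxels` holds voxel centers and `normals` their
    unit surface normals, as would be read from the VS-octree surface nodes."""
    img = np.zeros(image_size)
    zbuf = np.full(image_size, -np.inf)
    light = np.asarray(light_dir, dtype=float)
    light = light / np.linalg.norm(light)
    for p, n in zip(surface_voxels, normals):
        u, v = int(p[0] * scale), int(p[1] * scale)        # orthographic drop of z
        if 0 <= v < image_size[0] and 0 <= u < image_size[1] and p[2] > zbuf[v, u]:
            zbuf[v, u] = p[2]                              # nearest surface wins
            img[v, u] = max(0.0, float(np.dot(n, light)))  # Lambertian shading term
    return img
```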


2) Thermal Image Generation: Each voxel (leaf node) in the octree is encoded with thermophysical material properties, and links are attached to spatially adjacent nodes. Applying the conservation of energy to each node, we have the energy balance law

\rho_i c_i V_i \frac{T_i^{t+1} - T_i^{t}}{\Delta t} = \alpha_i I_s A_i \cos\theta_i + \dot{q}_i V_i + \sum_{j=1}^{N_i} \frac{k_{ij} A_{ij}}{d_{ij}} \left(T_j^{t} - T_i^{t}\right) + h A_i \left(T_{amb} - T_i^{t}\right) + h_r A_i \left(T_{amb} - T_i^{t}\right)    (23)

where \rho is the mass density, c is the specific heat, V is the volume of the node, T denotes the temperature of the node, the subscript i denotes the node number (spatial location), the superscript t represents the time instant, \Delta t denotes the time interval, I_s is the magnitude of solar irradiation, \theta is the angle between the direction of solar irradiation and the surface normal, \alpha is the absorptivity of the node, A_i is the surface area of the node (relevant face), \dot{q} is the volumetric heat generation, N_i is the number of adjacent nodes, k is the thermal conductivity, A_{ij} is the area of the interface between node i and each of the adjacent nodes, d_{ij} is the distance between the centroid of node i and the centroid of the j-th adjacent node, h is the convection coefficient, T_{amb} is the ambient temperature, and h_r is the radiation coefficient. The above process is simulated for the desired environmental and object conditions to predict surface temperatures.
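For illustration, one explicit time step of the discretized energy balance (23) might be coded as follows; the node dictionary fields and the linearized radiation term are assumptions consistent with the reconstruction above, not the authors' implementation.

```python
import numpy as np

def step_temperatures(T, nodes, dt, I_s, T_amb):
    """One explicit time step of the lumped energy balance of (23).
    `T` is the array of node temperatures; `nodes` is assumed to be a list
    of dicts holding each voxel's thermophysical data and the indices of its
    neighbors.  All field names are illustrative."""
    T_new = T.copy()
    for i, nd in enumerate(nodes):
        q = nd["alpha"] * I_s * np.cos(nd["theta"]) * nd["A_surf"]   # absorbed insolation
        q += nd["q_gen"] * nd["V"]                                   # volumetric heat generation
        for j, A_ij, d_ij in nd["neighbors"]:                        # conduction to adjacent nodes
            q += nd["k"] * A_ij * (T[j] - T[i]) / d_ij
        q += nd["h"] * nd["A_surf"] * (T_amb - T[i])                 # convection to ambient
        q += nd["h_r"] * nd["A_surf"] * (T_amb - T[i])               # linearized radiation exchange
        # rho * c * V * dT/dt = sum of heat flows into the node
        T_new[i] = T[i] + dt * q / (nd["rho"] * nd["c"] * nd["V"])
    return T_new
```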

The gray level, L, produced by an infrared camera operating in the 8 µm–12 µm band and at distances of up to a few hundred meters from commonly occurring objects is related to the surface temperature, T, by

L = a \int_{\lambda_1}^{\lambda_2} \frac{C_1 \lambda^{-5}}{\exp\left(C_2 / \lambda T\right) - 1} \, d\lambda + b    (24)

where \lambda_1 = 8 µm, \lambda_2 = 12 µm, and C_1 and C_2 are the radiation constants of Planck's law. a and b are constants based on the camera and digitizing parameters. Thus the surface temperatures can be projected using the same technique as for visual imagery and then transformed by the above expression to produce the thermal image.
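A sketch of the gray-level computation implied by (24) is given below; it numerically integrates the Planck radiance over the 8-12 um band and applies affine camera and digitizer constants, whose values here are placeholders.

```python
import numpy as np

# Planck's first and second radiation constants (SI units).
C1 = 3.7418e-16   # W m^2
C2 = 1.4388e-2    # m K

def thermal_gray_level(T, a=1.0, b=0.0, lam1=8e-6, lam2=12e-6, n=200):
    """Band-integrated Planck radiance over [lam1, lam2] at temperature T (K),
    mapped to a gray level by placeholder camera/digitizer constants a, b."""
    lam = np.linspace(lam1, lam2, n)
    radiance = C1 * lam**-5 / (np.exp(C2 / (lam * T)) - 1.0)
    dlam = (lam2 - lam1) / (n - 1)
    # Trapezoidal integration of the spectral radiance over the band.
    band_energy = float(np.sum((radiance[:-1] + radiance[1:]) * 0.5) * dlam)
    return a * band_energy + b
```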

3) Ladar Range Image: An AM ladar range image is created by sensing the difference between the phase of the modulation envelope of the transmitted signal and that of the received signal. Since the phase difference is periodic with a period of 2\pi, the range measurement is only available as a fraction of the wavelength. The ambiguity interval for AM ladar can be written as

R_a = \frac{c}{2 f_{AM}}    (25)

where R_a is the ambiguity interval, c is the speed of light, and f_{AM} is the modulation frequency.
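As a worked example of (25), the short sketch below computes the ambiguity interval for a hypothetical 7.5 MHz modulation frequency.

```python
C = 3.0e8  # speed of light, m/s

def ambiguity_interval(f_am_hz):
    """Ambiguity interval of an AM ladar, R_a = c / (2 * f_AM), as in (25)."""
    return C / (2.0 * f_am_hz)

# With a hypothetical 7.5 MHz modulation, R_a = 20 m, so two targets whose
# ranges differ by a multiple of 20 m produce the same phase measurement.
print(ambiguity_interval(7.5e6))  # -> 20.0
```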

A range measurement is subject to noise. Let the actual range be given by

(26)

Fig. 14. Simulated ladar images of a T-72M1 tank model synthesized from silhouettes. The images contain real backgrounds with the simulated object placed in the scene. The top image is the absolute range image. The middle image is the fine range image with an ambiguity interval of 19.8 m. The bottom image is the reflectance image. Constructive and destructive interference noise has been simulated on the object, and its effect on the reflectance image is shown. Destructive interference prevents extraction of range information. This effect on the fine range image is shown.

where n is the number of ambiguity intervals, \Delta is the fraction of the ambiguity interval measured by the sensor, and \lambda is the wavelength of the modulating envelope. The range including the effects of noise is

\hat{R} = R + \delta R    (27)

where \hat{R} is the output of the laser radar imaging system, R is the actual range of (26), and \delta R is the noise [39]

(28)

where A_n is the Rayleigh-distributed noise amplitude, A_s is the Rayleigh-distributed signal amplitude, SNR is the detector signal-to-noise ratio, and \phi is the uniformly distributed phase error. Typically, a modulating-frequency wavelength of 15–20 m is used for relative range, depending on the object to be sensed. In addition to noise in the range values, speckle noise due to rough surfaces is


common. Appropriate statistical models of this process can be used to simulate speckle noise in the reflectance image. Fig. 14 shows an example of ladar image synthesis.
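A rough sketch of fine-range synthesis with noise is shown below; it wraps the true range into the ambiguity interval and, as a simplification, perturbs it with Gaussian range noise rather than the Rayleigh-amplitude and uniform-phase model of (28).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_fine_range(true_range_m, ambiguity_m, range_noise_std_m):
    """Hedged sketch of fine-range synthesis: the true range is perturbed by
    additive range noise (a Gaussian placeholder, not the paper's model) and
    then wrapped into the ambiguity interval, since only the fractional part
    of the interval is observable."""
    noisy = true_range_m + rng.normal(0.0, range_noise_std_m)
    return noisy % ambiguity_m

# Example: a target at 57.3 m sensed with a 19.8 m ambiguity interval.
fine = simulate_fine_range(57.3, 19.8, 0.05)
```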

V. CONCLUSION

In this paper we described a new class of techniques for integrating information from different sensing modalities: a physics-based approach that uses appropriate physical models of the image generation process. These models usually rely on the principle of conservation of energy, which, when applied to the imaged scene, provides analytical constraints between material properties and the imaged gray levels. This approach makes available physically meaningful object features that are highly specific; they provide good separation between different classes of objects.

The advantages of multisensory approaches to computer vision are evident from the discussions in the previous sections. The integration of multiple sensors and/or multiple sensing modalities is an effective method of minimizing the ambiguities inherent in interpreting perceived scenes. Although the multisensory approach is useful for a variety of tasks, including pose determination, surface reconstruction, object recognition, and motion computation, among others, the current paper addressed the problem of object recognition. Several problems that were previously difficult or even impossible to solve because of the ill-posed nature of their formulations are converted to well-posed problems with the adoption of a multisensory approach.

Recent and continuing developments in multisensory vision research may be attributed to several factors, including: 1) new sensor technology that makes affordable previously unexplored sensing modalities, 2) new scientific contributions in computational approaches to sensor fusion, and 3) new insights into the electro-physiological mechanisms of multisensory perception in biological perceptual systems. Most of the progress to date may be attributed to the second factor listed above. The development of new, affordable sensors is currently an important and active area of research and may be expected to have a significant future impact on the capabilities of vision systems. For example, the availability of low-cost imaging laser ranging sensors, passive infrared sensors, and high-frequency radar imagers would provide significant impetus to research in developing multisensor-based autonomous navigation, object recognition, and surface reconstruction techniques. Many lessons from nature are yet to be learned from neuro-physiological and psycho-physiological studies of natural perceptual systems. Such studies may provide useful clues for deciding what combinations of sensing modalities are useful for a specific task, and may also provide new computational models for intersensory perception.

REFERENCES

[1] N. Nandhakumar and J. K. Aggarwal, "Multisensory computer vision," in Advances in Computers, vol. 34, M. C. Yovits, Ed. New York: Academic, 1992, pp. 59–111.

[2] D. Hilbert, Theory of Algebraic Invariants. Cambridge, U.K.: Cambridge Univ. Press, 1993.

[3] N. Nandhakumar and J. K. Aggarwal, "Integrated analysis of thermal and visual images for scene interpretation," IEEE Trans. Patt. Anal. Mach. Intell., vol. 10, pp. 469–481, July 1988.

[4] N. Nandhakumar and J. K. Aggarwal, "Multisensor integration: experiments in integrating thermal and visual sensors," in Proc. IEEE Computer Soc. 1st Int. Conf. on Computer Vision, London, U.K., June 1987, pp. 83–92.

[5] N. Nandhakumar and J. K. Aggarwal, "Thermal and visual information fusion for outdoor scene perception," in Proc. IEEE Int. Conf. on Robot. Automation, Philadelphia, PA, Apr. 1988, pp. 1306–1308.

[6] N. Nandhakumar, "Robust physics-based sensor fusion," J. Opt. Soc. Amer. A (JOSA-A), special issue on Physics-Based Computer Vision, vol. 11, no. 11, pp. 1–9, Nov. 1994.

[7] P. H. Hartline, L. Kass, and M. S. Loop, "Merging of modalities in the optic tectum: Infrared and visual information integration in rattlesnakes," Science, vol. 199, pp. 1225–1229, 1978.

[8] E. A. Newman and P. H. Hartline, "The infrared 'vision' of snakes," Scientif. Amer., vol. 246, no. 3, pp. 116–127, Mar. 1982.

[9] P. H. Hartline, "The optic tectum of reptiles: Neurophysiological studies," in Comparative Neurology of the Optic Tectum, H. Vanegas, Ed. New York: Plenum, 1984, pp. 601–618.

[10] F. P. Incropera and D. P. DeWitt, Fundamentals of Heat Transfer. New York: Wiley, 1981.

[11] N. Nandhakumar and S. Malik, "Multisensor integration for underwater scene classification," Appl. Intell., vol. 5, pp. 207–216, 1995.

[12] D. R. Jackson, D. P. Winebrenner, and A. Ishimaru, "Application of the composite roughness model to high-frequency bottom backscattering," J. Acoust. Soc. Amer., vol. 79, no. 5, pp. 1410–1422, May 1986.

[13] D. M. Owen, "A multi-shot stereoscopic camera for close-up ocean-bottom photography," in Deep-Sea Photography, J. B. Hersey, Ed. Baltimore: Johns Hopkins, 1967, pp. 95–105.

[14] C. J. Shipek, "Deep-sea photography in support of underwater acoustic research," in Deep-Sea Photography, J. B. Hersey, Ed. Baltimore: Johns Hopkins, 1967, pp. 89–94.

[15] S. Stanic et al., "High-frequency acoustic backscattering from a coarse shell ocean bottom," J. Acoust. Soc. Amer., vol. 85, no. 1, pp. 125–136, Jan. 1989.

[16] L. B. Wolff, "Polarization-based material classification from specular reflection," IEEE Trans. Patt. Anal. Mach. Intell., vol. 10, pp. 1059–1071, Nov. 1990.

[17] L. B. Wolff, "Scene understanding from propagation and consistency of polarization-based constraints," in Proc. IEEE Computer Soc. Computer Vision and Patt. Recognition Conf., Seattle, WA, June 21–23, 1994, pp. 1000–1005.

[18] L. B. Wolff, "Reflectance modeling for object recognition and detection in outdoor scenes," in Proc. ARPA Image Understanding Workshop, Palm Springs, CA, Feb. 12–15, 1996, pp. 799–803.

[19] R. Kapoor and N. Nandhakumar, "A physics-based approach for detecting man-made objects in UWB-SAR imagery," in IEEE Computer Soc. Workshop on Physics-Based Models for Computer Vision, Cambridge, MA, June 18–20, 1995, pp. 33–39.

[20] R. Kapoor and N. Nandhakumar, "Features for detecting obscured objects in ultra-wideband (UWB) SAR imagery using a phenomenological approach," to be published in Patt. Recognition.

[21] E. M. Kennaugh and D. L. Moffatt, "Transient and impulse response approximations," Proc. IEEE, vol. PROC-53, pp. 893–901, 1965.

[22] G. J. Klinker, S. A. Shafer, and T. Kanade, "Image segmentation and reflection analysis through color," in Proc. DARPA Image Understand. Workshop, Cambridge, MA, 1988, pp. 838–853.

[23] G. Healey, "Using color to segment images of 3-D scenes," in Proc. SPIE Conf. Applicat. AI, Orlando, FL, Apr. 1991, vol. 1468, pp. 814–825.

[24] G. Healey, S. Shafer, and L. Wolff, Eds., Physics-Based Vision: Principles and Practice, Color. Boston: Jones and Bartlett, 1992.

[25] G. Healey and D. Slater, "Using illumination invariant color histogram descriptors for recognition," in Proc. IEEE Computer Society Conf. on Computer Vision and Patt. Recognition, Seattle, WA, June 21–23, 1994, pp. 355–360.

[26] M. Swain and D. Ballard, "Color indexing," Int. J. Computer Vision, vol. 7, pp. 11–32, 1991.

[27] S. W. Shaw, R. J. P. deFigueiredo, and K. Kumar, "Fusion of radar and optical sensors for space robotic vision," in


Proc. IEEE Robot. Automat. Conf., Philadelphia, PA, 1988, pp. 1842–1846.

[28] N. Nandhakumar, V. Velten, and J. Michel, "Thermophysical affine invariants from IR imagery for object recognition," in IEEE Computer Soc. Workshop on Physics-Based Models for Computer Vision, Cambridge, MA, June 18–20, 1995, pp. 48–54.

[29] J. D. Michel, T. Saxena, N. Nandhakumar, and D. Kapur, "Using elimination methods to compute thermophysical algebraic invariants from infrared imagery," in Proc. AAAI-96, Seattle, WA, 1996.

[30] N. Nandhakumar et al., "Robust thermophysics-based interpretation of radiometrically uncalibrated IR images for ATR and site change detection," IEEE Trans. Image Process., Dec. 1996.

[31] D. Kapur, Y. N. Lakshman, and T. Saxena, "Computing invariants using elimination methods," in Proc. IEEE Int. Symp. on Computer Vision, Coral Gables, FL, Nov. 21–23, 1995, pp. 97–102.

[32] D. Forsyth et al., "Invariant descriptors for 3D object recognition and pose," IEEE Trans. Patt. Anal. Mach. Intell., vol. 13, Oct. 1991.

[33] C. Oh, N. Nandhakumar, and J. K. Aggarwal, "Integrated modeling of thermal and visual image generation," in IEEE Conf. on Computer Vision and Patt. Recognition, San Diego, CA, June 4–8, 1989.

[34] C. H. Chien and J. K. Aggarwal, "Volume/surface octrees for the representation of 3D objects," Computer Vision, Graphics, and Image Process., vol. 36, pp. 100–113, 1986.

[35] S. Karthik, N. Nandhakumar, and J. K. Aggarwal, "Nonhomogeneous 3-D objects for thermal and visual image synthesis," in SPIE Conf. on Applicat. AI, Orlando, FL, 1991.

[36] J. D. Michel and N. Nandhakumar, "Unified octree-based object models for multisensor fusion," in 2nd IEEE Workshop on CAD-Based Vision, Champion, PA, Feb. 8–11, 1994, pp. 181–168.

[37] J. D. Michel and N. Nandhakumar, "Unified 3D models for multisensor image synthesis," Computer Vision, Graphics, and Image Process., vol. 57, no. 4, July 1995.

[38] G. Hoekstra and N. Nandhakumar, "Quasi-invariant behavior of thermophysical features for interpretation of multisensory imagery," Opt. Engin., vol. 35, no. 3, pp. 1–14, Mar. 1996.

[39] J. Leonard and E. G. Zelnio, "Intrinsic separability of laser radar imagery," in Proc. 2nd Automatic Target Recognizer Syst. and Technol. Conf., Mar. 17, 1992.

N. Nandhakumar (Senior Member, IEEE) received the B.E. (honors) degree in electronics and communication engineering from the P.S.G. College of Technology, University of Madras, India, in 1981, the M.S.E. degree in computer, information and control engineering from the University of Michigan, Ann Arbor, in 1983, and the Ph.D. degree in electrical engineering from the University of Texas at Austin in 1987.

He is currently a Senior Member of the Technical Staff at Electroglas, Inc., Santa Clara, CA,

where he is leading the development of new machine vision technology for semiconductor wafer inspection. He has also served as Assistant Professor of Electrical Engineering at the University of Virginia, and Director of the Machine Vision Laboratory from 1989 to 1996. The results of his research have appeared in more than 70 papers in journals, conference proceedings, and book chapters. He is Associate Editor of Pattern Recognition, and has been the Guest Editor of several special issues on sensor fusion.

Dr. Nandhakumar was awarded the 1993 Young Faculty Teaching Award by the University of Virginia's Electrical Engineering Department. He has been Chairman of the 1994, 1993, and 1992 SPIE Conferences on Sensor Fusion and Aerospace Applications, Co-Chairman of the 1991 SPIE Conference on Applications of Artificial Intelligence, member of the international advisory committee of the 2nd Asian Conference on Computer Vision, and member of the program committees of the 1993 and 1997 IEEE CVPR Conferences, and the 1st and 2nd IEEE International Conferences on Image Processing. He is a member of the SPIE.

J. K. Aggarwal (Fellow, IEEE) is the Cullen Professor of Electrical and Computer Engineering and Director of the Computer and Vision Research Center at The University of Texas at Austin, where he has served on the faculty since 1964. His research interests include computer vision, parallel processing of images, and pattern recognition. He is the author or editor of seven books and 31 book chapters and author of over 160 journal papers, as well as numerous proceedings papers and technical reports.

Dr. Aggarwal was the recipient of the 1996 Technical Achievement Award of the IEEE Computer Society.
