

Analysis of Remotely Sensed Data: The Formative Decades and the Future

    John A. Richards, Fellow, IEEE

Abstract: Developments in the field of image understanding in remote sensing over the past four decades are reviewed, with an emphasis, initially, on the contributions of David Landgrebe and his colleagues at the Laboratory for Applications of Remote Sensing, Purdue University. The differences in approach required for multispectral, hyperspectral, and radar image data are emphasised, culminating with a commentary on methods commonly adopted for multisource image analysis. The treatment concludes by examining the requirements of an operational multisource thematic mapping process, in which it is suggested that the most practical approach is to analyze each data type separately, by techniques optimized to that data's characteristics, and then to fuse at the label level.

Index Terms: Fusion, multisensor, multisource, thematic mapping.

    I. INTRODUCTION

DAVID Landgrebe and his coworkers from the Laboratory for Applications of Remote Sensing (LARS) at Purdue University were seminal in devising many of the procedures that are now commonplace in thematic mapping from remotely sensed image data. Because of early data limitations, many of the analytical procedures developed had to appeal to simple data models and techniques taken directly from signal processing (such as maximum-likelihood estimation). Nevertheless, many of those methods have endured and set the benchmarks against which newer developments are often assessed.

In many ways, the field had its genesis in the signal processing methods of the 1950s and 1960s and their extension to handling image data. It received special impetus with interest in lunar and planetary mapping missions, which drove not only the development of imaging technologies, but also many of the image processing and enhancement procedures that sit alongside methods for thematic mapping. A good account of this can be seen in the first edition of Castleman's book [1].

We are now at the stage where we have available a range of successful and widely used analytical procedures, some of which are tailored to particular data types. We also have an abundance of data, quite contrary to the situation at the start of the spaceborne era, so that the analyst is now challenged to choose from among a number of coincident datasets when undertaking thematic mapping in an operational setting.

But perhaps the greatest contemporary challenge is to derive practical labeling methodologies for effective thematic mapping

Manuscript received February 15, 2004; revised July 26, 2004.

The author is with the Research School of Information Sciences and Engineering, The Australian National University, Canberra ACT 0200, Australia (e-mail: [email protected]).

    Digital Object Identifier 10.1109/TGRS.2004.837326

from mixed data types to allow the inherent information contents of complementary datasets to be brought together. Indeed, much of the current research agenda is focused on finding viable fusion methods. It is difficult, though, to envisage operational image understanding with a continuation of methods that seek to fuse data. At minimum, the fact that each given data type is usually best handled by its own (matched) analytical methods belies data fusion as an operational method for multisource image analysis. Instead, individual data types may be best analyzed separately, with combination occurring at the label level through some form of symbolic processing.

After reviewing the significant developments in thematic mapping, concentrating particularly on the work of Landgrebe and colleagues, the problem of operational multisource classification is considered from the perspective of label-level fusion.

    II. PROBLEM DOMAIN

The essential problem in thematic mapping is to specify the data or information that needs to be gathered about a pixel to allow a label to be attached to the pixel consistent with an application of interest. In a modern operational setting, the application requirements would be specified by a client, and a consultant would have the task of choosing the datasets and analytical methods for generating an acceptable thematic map. That is the ideal. In reality, and certainly in the early years of remote sensing, the problem statement is more like: given the data or information actually available, how effectively can a pixel be labeled into a class of interest?

The process is a mapping from available data (and sometimes other information) to a label. We express this mapping as

$\mathbf{x} \rightarrow \omega_i$

where

$\omega_i \in \{\omega_1, \omega_2, \ldots, \omega_M\}$

in which $\mathbf{x}$ is the data description of the pixel, and $\omega_i$ is the associated class of interest. As indicated, $\mathbf{x}$ generally consists of a (column) pixel vector of the measurements.

We need to find procedures that allow us to extract meaningful information ($\omega_i$) from the data ($\mathbf{x}$) and then move to an understanding of the scene being imaged. In other words, we need to move along the chain

data $\rightarrow$ information $\rightarrow$ understanding

All relevant, coregistered spatial measurements can contribute to the pixel data description and thus aid in scene understanding. A more complete data description of a pixel would then be



Fig. 1. Deriving the most appropriate label for a pixel based on spatial, contextual, and prior sources of data or information.

$\mathbf{x} = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_S\}$

where the $\mathbf{x}_s$ represent the different data types. Recognition of this more complete pixel descriptor was the forerunner of data fusion methods, and paralleled also the evolution of geographic information systems (GIS).

It is not just pixel-specific measurements that tell us something useful about what the pixel represents; its context in relation to other pixels is also significant. That context could be associations among near neighboring pixels (spatial context) or could be represented by the texture of the region in which the pixel resides. So, the data description more generally could be expressed

$\mathbf{x} = \{\mathbf{x}_1, \ldots, \mathbf{x}_S, c\}$

where $c$ is a representation of context. Further, we may have some preexisting information available about the pixel, from the knowledge of some expert or from the application domain, so that the pixel description most generally is now a data and information description of the form

$\mathbf{x} = \{\mathbf{x}_1, \ldots, \mathbf{x}_S, c, a\} \qquad (1)$

where $a$ represents sources of available ancillary information.

Notwithstanding the vast range of procedures, over the past three to four decades research has focused on devising mapping techniques for labeling pixels based on differing degrees of complexity in the data/information description of (1), as depicted in Fig. 1.

    III. DRIVERS

Early algorithm development was based on multispectral visible and infrared (optical) data with about 12 to 18 channels.1 With so few samples of reflectance spectra it was not possible to understand the corresponding ground cover type for a pixel by relying on scientific (spectroscopic) knowledge. Nevertheless, the available samples were sufficient to differentiate fundamental cover types and to study vegetation type and condition. Building on earlier work with learning machines [2] and statistical pattern recognition [3], the logical analytical method to separate apparent classes in the recorded data was to use discriminant analysis of one form or another.

1Even though there had been aircraft radar missions since the early 1960s, the single data channel available (even with the spaceborne missions of the late 1970s and early 1980s) meant that automated interpretation techniques were not relevant (as against image processing and enhancement tools to make the data more interpretable by a human analyst).

What has happened since? Three broad trends in sensor and information system developments have driven the evolution of machine-assisted interpretation algorithms. They are:

improvement in spectral measurements in the optical domain, both in numbers of channels and spectral resolution;

availability of multidimensional (wavelength, polarization, and incidence angle) radar data;

ready availability of geolocated spatial data types in geographical information systems.

The many and varied methods developed by different research groups have been in response largely to one of these three evolutions. We now highlight those developments, concentrating, where appropriate, on the major contributions that grew out of the Laboratory for Applications of Remote Sensing (LARS) under the leadership of David Landgrebe.2

    IV. MULTISPECTRAL AND HYPERSPECTRAL DATA METHODS

A. Fundamentals

Although not readily available,3 a LARS technical report in 1966 [5] by Landgrebe first explored the use of pattern recognition procedures for analyzing remotely sensed data. Several related reports followed in 1966 and 1967, leading to the first general publication of the signal-processing-based pattern recognition approach to pixel labeling in 1969 [6]. Gaussian maximum-likelihood classification was advanced as the most reasonable way to undertake pixel labeling with the small numbers of spectral measurements per pixel then available, thus introducing the analytical process that has become the standard thematic mapping procedure in remote sensing ever since. Essentially, it seeks to find the class $\omega_i$ for a pixel that maximizes the posterior probability $p(\omega_i \mid \mathbf{x})$. By using Bayes' rule, that is equivalent to maximizing

$p(\mathbf{x} \mid \omega_i)\, p(\omega_i) \qquad (2)$

in which $p(\omega_i)$ is the prior probability of class membership of the pixel.
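As a concrete illustration of the decision rule in (2), the following minimal sketch labels pixels with the Gaussian maximum-likelihood discriminant evaluated in log form. It is not the LARS implementation; the class names, band count, and training statistics are invented purely for illustration.

```python
import numpy as np

def train_gaussian_ml(samples_by_class):
    """Estimate mean, covariance, and prior p(omega_i) for each class from training pixels."""
    stats = {}
    total = sum(len(s) for s in samples_by_class.values())
    for name, pixels in samples_by_class.items():
        pixels = np.asarray(pixels, dtype=float)        # shape (n_samples, n_bands)
        stats[name] = (pixels.mean(axis=0),
                       np.cov(pixels, rowvar=False),
                       len(pixels) / total)
    return stats

def classify_gaussian_ml(x, stats):
    """Return the class maximizing ln p(x|omega_i) + ln p(omega_i), as in (2)."""
    best, best_score = None, -np.inf
    for name, (mean, cov, prior) in stats.items():
        d = x - mean
        score = (np.log(prior)
                 - 0.5 * np.log(np.linalg.det(cov))
                 - 0.5 * d @ np.linalg.solve(cov, d))
        if score > best_score:
            best, best_score = name, score
    return best

# Toy example: two classes, four-band pixels (synthetic numbers, purely illustrative).
rng = np.random.default_rng(0)
training = {
    "vegetation": rng.normal([40, 60, 30, 120], 5, size=(50, 4)),
    "water":      rng.normal([20, 15, 10, 5],   3, size=(50, 4)),
}
stats = train_gaussian_ml(training)
print(classify_gaussian_ml(np.array([38, 58, 29, 115]), stats))   # -> "vegetation"
```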

Many of the other processes now regarded as standard components in pixel labeling methodologies were also devised at LARS and applied to aircraft multispectral data over the four years 1968-1971. These include clustering for unsupervised classification [7], the use of feature selection [8], [9], and the use of the minimum-distance algorithm for supervised classification [10]. Commonly used separability measures for feature reduction were also investigated and employed around that same period [11], [12].

Thus, effectively before the launch of Landsat-1, LARS had developed the suite of techniques regularly applied to multispectral data from the spaceborne era. Based on this experience, the

2Landgrebe was not the inaugural Director of LARS. The Laboratory commenced in 1966 under Ralph Shay, then Head of Purdue University's Department of Botany and Plant Pathology, who in 1961 had chaired a National Academy of Science committee on aerial survey methods in agriculture [4]. Landgrebe took over the Directorship in 1969 and continued in that role until 1981.

3Many of the LARS technical reports will be found in downloadable form at http://www.lars.purdue.edu/home/References.html.


first book dealing with quantitative techniques for analyzing remotely sensed data was produced from LARS in 1978 [13].

A 1975 report by Fleming et al. [14] demonstrated how clustering can be used to resolve data multimodality in preparation for the use of Gaussian maximum-likelihood supervised classification.4 Fleming's hybrid methodology has remained one of the analytical mainstays for thematic mapping from optical image data. More recently, it has been seen that the method can be generalized, rendering it amenable to hyperspectral datasets [15].

During the same period, there were a number of key developments occurring outside LARS that have had a continuing influence on remote sensing image analysis. Haralick et al. [16] in 1973 provided measures of image texture that are still in use today [17]. In 1976, Hord and Brooner [18] looked at measures of thematic map accuracy and thus commenced a significant period of continuing research concerned with sampling methods for accuracy assessment [19], [20].

So, by the end of the first decade after the formation of LARS, all the foundations for digital image analysis in remote sensing had been established. It was the principal decade upon which later work was to elaborate and expand.

In the latter part of the 1970s, there were three early, important developments to do with image transformation. In 1976, Eppler used the class-sensitive transform associated with canonical analysis to effect feature reduction [21], while Jensen and Waltz [22], in a celebrated short paper in 1979, applied the principal components transformation for feature reduction and display purposes. Soha and Schwartz [23] showed how the transformation could lead to a decorrelated image display procedure. Kauth and Thomas in 1976 [24] demonstrated that application-specific transformations could be devised, and proposed their tasseled cap model. In 1979, it was shown that the principal components transformation could be used to monitor changes in land cover between images of the same region taken on different dates [25].
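The principal components transformation discussed above reduces to an eigen-decomposition of the band-to-band covariance matrix. The sketch below, on synthetic data invented for illustration, projects multispectral pixel vectors onto the leading components for feature reduction or decorrelated display.

```python
import numpy as np

def principal_components(pixels, n_keep):
    """Project pixel vectors onto the leading eigenvectors of the band covariance matrix."""
    pixels = np.asarray(pixels, dtype=float)          # shape (n_pixels, n_bands)
    mean = pixels.mean(axis=0)
    cov = np.cov(pixels - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]                 # largest variance first
    components = eigvecs[:, order[:n_keep]]
    return (pixels - mean) @ components, eigvals[order]

# Synthetic 6-band image flattened to pixel vectors (numbers are illustrative only).
rng = np.random.default_rng(1)
base = rng.normal(size=(1000, 2))
bands = base @ rng.normal(size=(2, 6)) + rng.normal(scale=0.1, size=(1000, 6))
scores, variances = principal_components(bands, n_keep=2)
print(scores.shape, variances[:3])   # most of the variance sits in the first two components
```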

B. Classification in Context

The next significant era in remote sensing thematic mapping commenced with the realization that classification should be carried out sensitive to the spatial neighborhood of a pixel and by incorporating information from other available spatial data sources.

The significance of spatial correlation among pixels at satellite spatial resolutions was recognized early on by LARS staff [26]-[28], leading to the well-known ECHO classifier described in detail in 1976 [29]. ECHO, which is still used 25 years after its introduction,5 is based on region growing to find homogeneous sets of pixels that can be characterized by group means and covariances. All the pixels of a given group are labeled in one step by assessing group similarity to each of the training classes. Pixels that do not naturally occur in groups are handled individually by regular point classification methods (such as maximum likelihood).

4Mixture models have also been employed to resolve multimodality; see Section IV-F.

5Available in the MultiSpec software package.

Hoffer and colleagues at LARS were also among the first to experiment with the incorporation of data, other than spectral, into a multispectral classification [30], [31], while Swain, who received his Purdue Ph.D. in 1970 when working at LARS, pioneered statistical approaches to the incorporation of context [32].

Building on the work of Rosenfeld and colleagues, from the then Computer Vision Laboratory at the University of Maryland, for developing consistency in scene labeling by the use of label relaxation [33], Richards, Landgrebe, and Swain showed that relaxation methods could be used to incorporate both spatial context and the effect of ancillary (topographic) data into a classification [34]-[36].

The problem of context has been handled more recently using Markov (or Gibbs) random fields. One of the earliest treatments was that of Jeon and Landgrebe [37]. Solberg et al. [38] have employed Markov random field (MRF) models to incorporate spatial and temporal context. Jung and Swain [39] have shown how the use of better estimates for class statistics, along with MRF for including spatial context, can lead to improved classification accuracy. Sarkar et al. [40] have devised an unsupervised procedure based on MRF.

Interestingly, the MRF approach essentially uses the prior probability term in the posterior probability of (2) to embed the effect of context via the Gibbs distribution energy term. Although not an MRF approach, Strahler [41], as early as 1980, also used the prior probability term to incorporate contextual information into a classification.
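One common way of realizing this idea in practice is iterated conditional modes (ICM): the spectral log-likelihood term from (2) is combined with a Potts-style neighborhood energy that plays the role of the contextual prior. The sketch below is a generic illustration of that mechanism only, not the specific formulations of [37]-[41]; the spectral log-likelihood map and the smoothing weight are assumed inputs.

```python
import numpy as np

def icm_smooth(log_likelihood, beta=1.5, n_iter=5):
    """Refine a label image by maximizing spectral log-likelihood plus a Potts
    neighborhood term (the MRF-style prior), using iterated conditional modes."""
    rows, cols, n_classes = log_likelihood.shape
    labels = log_likelihood.argmax(axis=2)            # start from the point classifier
    for _ in range(n_iter):
        for r in range(rows):
            for c in range(cols):
                neighbors = [labels[rr, cc]
                             for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                             if 0 <= rr < rows and 0 <= cc < cols]
                scores = [log_likelihood[r, c, k] + beta * sum(k == n for n in neighbors)
                          for k in range(n_classes)]
                labels[r, c] = int(np.argmax(scores))
    return labels

# Illustrative use: a noisy 2-class likelihood field favoring class 0 on the left half.
rng = np.random.default_rng(2)
ll = rng.normal(size=(20, 20, 2))
ll[:, :10, 0] += 1.0
ll[:, 10:, 1] += 1.0
print(icm_smooth(ll)[::5, ::5])   # spatially smoothed label image
```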

C. The Hyperspectral Challenge

Although they had always been aware of the so-called Hughes phenomenon, it was in the 1980s that Landgrebe and colleagues took a greater interest in problems to do with training maximum-likelihood classifiers with limited training samples [42]-[45]. Initially their focus was on feature selection, but with the advent of hyperspectral datasets, they turned their attention to the difficulty of obtaining reliable estimates of class covariance matrices in the face of limited training data [46]. The approach has been to use an approximation to the class sample covariance matrix along the general line

$\hat{\Sigma} = \epsilon\, \Sigma_1 + (1 - \epsilon)\, \Sigma_2$

where $\Sigma_1$, $\Sigma_2$ are two different estimates of the sample covariance. Typically, $\Sigma_1$ would be the (poor) class covariance estimate obtained from the available training samples, and $\Sigma_2$ would be the (global) covariance matrix computed over all the training samples treated as a single group. Sometimes the principal diagonal matrix of the covariance matrix, or a diagonal matrix with the trace of the matrix as its elements, is used instead of $\Sigma_2$, as appropriate.
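A minimal sketch of this covariance mixing idea follows. The mixing weight and data are invented for illustration; the estimators of [46]-[56] differ in how the weight is chosen (for example, by leave-one-out likelihood), which is omitted here.

```python
import numpy as np

def regularized_covariance(class_samples, all_samples, eps=0.5):
    """Blend the (possibly poorly estimated) class covariance with the global
    covariance of all training samples: Sigma_hat = eps*Sigma_1 + (1-eps)*Sigma_2."""
    sigma_class = np.cov(np.asarray(class_samples, float), rowvar=False)
    sigma_global = np.cov(np.asarray(all_samples, float), rowvar=False)
    return eps * sigma_class + (1.0 - eps) * sigma_global

# With only 8 training pixels in 10 bands the raw class covariance is singular;
# the blended estimate remains well conditioned (numbers are synthetic).
rng = np.random.default_rng(3)
all_pixels = rng.normal(size=(500, 10))
few_pixels = all_pixels[:8]
sigma = regularized_covariance(few_pixels, all_pixels, eps=0.5)
print(np.linalg.cond(np.cov(few_pixels, rowvar=False)), np.linalg.cond(sigma))
```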

This work developed into a series of major contributions by Landgrebe and colleagues through the 1990s and the early 2000s, all concerned with the problem of reliable maximum-likelihood classification with high-dimensionality image datasets [47]-[56]. Much of this work has appeared in Ph.D. theses supervised by Landgrebe and is summarized in his recent book [57].


An alternative method for handling covariance matrix estimation in the face of limited training samples has been to simplify the structure of the matrix. By recognizing that interband correlations are generally strongest in the region of the diagonal, a block diagonal simplification of the matrix can be adopted that relaxes considerably the need to gather large numbers of training samples per class [58].

Algorithm development for thematic mapping may well have taken a different path had technological limitations not restricted the spectral range and resolution of the first sensors. If hyperspectral measurements had been available from the start, it is possible that pixel labeling might have been based on spectroscopic principles, because the samples could then have been composed into a discrete version of the real reflectance spectrum of the earth cover type.

With upward of 200 spectral channels now available, the sampled pixel spectrum contains enough detail to allow spectroscopic principles to be applied for image understanding. Rather than doing that ab initio each time a pixel has to be identified, the approach normally taken is to compile reference libraries of prerecorded spectra against which newly measured data can be compared for identification. This work actually commenced in the mid 1980s [59]. Early research concentrated on coding the spectral data to make full-spectrum library searching feasible [60]. More recently, though, matching procedures have been developed around only those spectral features known to be diagnostic [61]. They are generally absorption features that can be characterized by measurements of their position, depth, and width. Expert system methods are used to assist in the feature selection process.
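At its simplest, library matching compares a measured pixel spectrum against each reference spectrum with a similarity measure. The sketch below uses the spectral angle as that measure, as a simple stand-in for the feature-based matching of [61]; the library entries and band values are invented for illustration.

```python
import numpy as np

def spectral_angle(a, b):
    """Angle between two spectra treated as vectors (smaller = more similar)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def match_to_library(pixel_spectrum, library):
    """Return the library entry with the smallest spectral angle to the pixel."""
    return min(library, key=lambda name: spectral_angle(pixel_spectrum, library[name]))

# Hypothetical 5-band reference spectra (values illustrative only).
library = {
    "dry grass": [0.18, 0.22, 0.30, 0.38, 0.42],
    "green vegetation": [0.05, 0.08, 0.05, 0.45, 0.50],
    "water": [0.06, 0.05, 0.03, 0.02, 0.01],
}
print(match_to_library([0.06, 0.09, 0.06, 0.43, 0.48], library))   # -> "green vegetation"
```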

D. Multisource Methods

In the mid to late 1980s, LARS researchers and their associates looked systematically at statistical and evidential procedures for thematic mapping from several data sources [62]-[64]. Generally, statistical approaches have been based on maximizing the multisource posterior probability

$p(\omega_i \mid \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_S)$

or, having used Bayes' rule, maximizing

$p(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_S \mid \omega_i)\, p(\omega_i) \qquad (3)$

To render the problem tractable the different sources are generally assumed to be independent, so that (3) can be written as the product of individual distribution functions. That also allows indices, $\alpha_s$, to be added to reflect confidence in the different sources that contribute to the joint decision:

$p(\omega_i) \prod_{s=1}^{S} p(\mathbf{x}_s \mid \omega_i)^{\alpha_s}$

This work was generalized by Benediktsson and Swain in devising the consensus theoretic approach to multisource classification [65].
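Under the independence assumption, the weighted product above is easily evaluated in the log domain. The sketch below illustrates only that basic mechanism (not the consensus theoretic method of [65]); the class-conditional probabilities, priors, and reliability weights are invented.

```python
import numpy as np

def multisource_decision(cond_probs, priors, alphas):
    """Choose the class maximizing p(omega_i) * prod_s p(x_s|omega_i)^alpha_s,
    evaluated as a sum of logs. cond_probs[s][i] = p(x_s | omega_i)."""
    cond_probs = np.asarray(cond_probs, float)         # shape (n_sources, n_classes)
    log_score = np.log(priors) + np.sum(
        np.asarray(alphas)[:, None] * np.log(cond_probs), axis=0)
    return int(np.argmax(log_score)), log_score

# Two sources (optical, radar), three classes; the radar source is trusted less (alpha = 0.5).
cond = [[0.6, 0.3, 0.1],     # optical: p(x_1 | omega_i)
        [0.2, 0.5, 0.3]]     # radar:   p(x_2 | omega_i)
label, scores = multisource_decision(cond, priors=[1/3, 1/3, 1/3], alphas=[1.0, 0.5])
print(label, scores)
```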

Evidential methods for handling multisource data derive from the treatment of Shafer [66]. These entail allocating measures of evidential mass or belief to each of the possible labels for a pixel (the higher the mass, the more likely the label). Mass can also be allocated to joint labeling possibilities (where we have reason to believe a pixel may be a mixture of cover types) and, explicitly, to any uncertainty in the labeling process. The masses allocated over all single and joint labeling propositions, and uncertainty, sum to unity.

Allocation of mass can be derived from any reasonable approach. Some authors use the posterior probabilities from maximum-likelihood classification for that purpose, with the mass given to uncertainty based on the confidence one has in the classified product.

The real strength of evidential reasoning for multisource classification is that the mass distributions derived from several sources can be combined through the so-called orthogonal sum [62], [67]. The outcome generally concentrates evidential mass on the preferred label(s), and reduces uncertainty. Evidential reasoning has also been combined with Markov random field methods [68].
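The orthogonal sum takes a particularly compact form when mass is assigned only to singleton labels plus a single "uncertain" proposition (the general rule also handles arbitrary compound propositions, which this sketch omits). The mass values below are invented; the example simply shows how combination concentrates mass and shrinks uncertainty.

```python
def orthogonal_sum(m1, m2, uncertainty_key="uncertain"):
    """Combine two mass distributions defined over singleton labels plus an
    'uncertain' proposition, using the orthogonal sum (Dempster's rule)."""
    labels = [k for k in m1 if k != uncertainty_key]
    u1, u2 = m1[uncertainty_key], m2[uncertainty_key]
    combined = {a: m1[a] * m2[a] + m1[a] * u2 + u1 * m2[a] for a in labels}
    combined[uncertainty_key] = u1 * u2
    # Conflict: mass assigned to incompatible singleton pairs is discarded, then renormalized.
    conflict = sum(m1[a] * m2[b] for a in labels for b in labels if a != b)
    norm = 1.0 - conflict
    return {k: v / norm for k, v in combined.items()}

# Invented masses from an optical and a radar analysis of the same pixel.
optical = {"forest": 0.6, "grass": 0.1, "uncertain": 0.3}
radar   = {"forest": 0.5, "grass": 0.2, "uncertain": 0.3}
print(orthogonal_sum(optical, radar))   # mass concentrates on "forest"; uncertainty shrinks
```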

Decision trees have long been regarded as valuable means for handling difficult classification problems, including those involving several sources, because they allow different feature sets to be used at each decision node. The application of decision trees was largely pioneered at LARS [69]-[71].

E. Neural Networks and Support Vector Machines

The earliest pattern classifiers sought to label data into two different classes by attempting to place a linear separating hyperplane between them in multidimensional (feature) space [2]. Multiple classes were handled, usually, by adopting binary decision trees. Because accuracy depended on the classes being linearly separable, the so-called perceptron approaches of the 1960s did not find great application in remote sensing. There were some notable exceptions in which piecewise linear decision surfaces were derived [72], [73] but, in general, thematic mapping of remote sensing data by perceptron-like classifiers did not become viable before the advent of the (artificial) neural network.

The great breakthrough the neural network offered was the availability of feasible training techniques for nonlinearly separable data, but at the expense of algorithmic and training complexity. Benediktsson et al. [74], [75] appear to have been the first to demonstrate the use of neural networks (using backpropagation) for remote sensing image labeling.

Perhaps the most interesting development in classification based on linear methods in recent years has been the support vector machine (SVM) and the use of kernels to transform inseparable data into a feature space where linear separability can be exploited [76].

In the simple perceptron, the linear separating hyperplane is found by iterating an initial guess into a final position such that the pixels from each of the training classes are finally on their correct side of the hyperplane. That iterative process uses all the training data to find the hyperplane. Since the objective is simply to find a surface which separates two classes of data, it is really only those pixels nearest the hyperplane that need be used to derive the surface. Moreover, there is an optimal surface (which is missed in perceptron methods) given by the hyperplane orientation that maximizes class separation, as depicted in Fig. 2. The hyperplane is found by a constrained optimization process that maximizes the separation between the classes (the margin


Fig. 2. Separation of two classes of pixel by the optimal hyperplane of a support vector machine and by maximum-likelihood classification. A typical hyperplane as implemented by a simple linear perceptron is also depicted.

in Fig. 2) subject to the condition that the training classes must be on their correct side [77]-[79].

The real advance with the SVM comes with the realization that the pixel vectors in the optimization and decision rule formulas always appear in pairs related through a scalar product. It is possible therefore to replace those products by higher order functions of the pairs of vectors, essentially projecting the pixel vectors into a higher dimensional space where the data become linearly separable. Such a projection is carried out by the use of so-called kernels, the two most common of which are polynomial and radial basis function transformations. Recent studies of the value of SVM in remote sensing image analysis will be found in [80] and [81].
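In current practice, a kernel SVM can be applied to pixel vectors with a few lines of library code. The sketch below uses scikit-learn (a tool not discussed in the paper) with a radial basis function kernel on synthetic two-class training data; the class statistics are invented.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic two-class, four-band training pixels (values are illustrative only).
rng = np.random.default_rng(4)
class_a = rng.normal([40, 60, 30, 120], 8, size=(100, 4))
class_b = rng.normal([60, 50, 45, 80],  8, size=(100, 4))
X = np.vstack([class_a, class_b])
y = np.array([0] * 100 + [1] * 100)

# The RBF kernel implicitly projects the pixel vectors into a space where a
# maximum-margin separating hyperplane can be found.
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
svm.fit(X, y)
print(svm.predict([[42, 58, 31, 118], [58, 52, 44, 82]]))   # -> [0 1]
print(len(svm.support_))   # only the support vectors define the hyperplane
```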

F. Mixtures, Fuzzy Memberships, and Bayesian Methods

While clustering is often used as a means for resolving multimodality in information classes, as noted in Section IV-A, mixture analysis has been used for this purpose by a number of authors, including Landgrebe for the case of hyperspectral data. The probability of a pixel belonging to a given class is represented as a linear combination of Gaussian distributions [57]

$p(\mathbf{x} \mid \omega_i) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_k, \Sigma_k)$

in which the $\pi_k$ are a set of mixing proportions. If such a model is to be used successfully, the number of components $K$, the mixing parameters $\pi_k$, and the means and covariances for each of the individual distributions all have to be found. Kuo and Landgrebe [82] use expectation-maximization (EM) and a range of goodness-of-fit measures to apply the model to hyperspectral data. They use feature selection based on nonparametric weighted feature extraction (NWFE) to render the approach tractable [57]. Dundar and Landgrebe [56] essentially examine the same problem, but use a range of regularization methods for estimating the class covariances while generating the model.
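Fitting such a mixture is routinely done with the EM algorithm; the sketch below uses scikit-learn's GaussianMixture on synthetic bimodal data as a generic stand-in for the more specialised estimators of [56] and [82]. The component count and the data are invented for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# A bimodal "information class", e.g. sunlit and shaded examples of the same cover type.
rng = np.random.default_rng(5)
mode_1 = rng.normal([30, 45, 25, 90],  4, size=(150, 4))
mode_2 = rng.normal([50, 70, 40, 140], 4, size=(150, 4))
pixels = np.vstack([mode_1, mode_2])

# EM estimates the mixing proportions, means, and covariances of K Gaussian components.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(pixels)
print(gmm.weights_)                    # estimated mixing proportions (about 0.5 each)
print(gmm.score_samples(pixels[:3]))   # log-likelihood of pixels under the fitted mixture
```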

Unsupervised mixture models can be devised through the AutoClass procedure based on Bayesian classification assumptions [83]. The computational demand, however, can be high, especially when a significant number of classes needs to be used to reveal the underlying data structure, sometimes requiring parallel algorithms to render the computation viable. Bayesian methods have also been applied to data mining and multisource inference problems [84] and content-based image retrieval [85].

Most classification procedures allow the user to get some feel for the relative likelihoods of a pixel belonging to each of the available classes. In the maximum-likelihood rule, a set of posterior probabilities is produced by the algorithm, from which the preferred label is generally chosen by maximum selection. The last step is not essential and, if desired, the user can accept the posterior probabilities as indications of the likelihoods of class membership. Likewise, evidential methods produce a belief distribution that suggests relative likelihoods. So, it is possible to avoid maximum selection steps and produce instead maps of likelihoods or, in some cases, abundances. In the case of optical data, this is often done explicitly through endmember analysis, in which it is assumed that a given pixel is composed of a linear combination of pure cover types. Spectral unmixing methods can then be used to assess the relative abundances present in a given pixel [86].
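Linear spectral unmixing estimates per-pixel abundances by solving a constrained least-squares problem against the endmember spectra. The sketch below enforces non-negativity with SciPy's nnls and then rescales the result so it sums to one, which is only a simple approximation to the fully constrained problem; the endmember spectra are invented.

```python
import numpy as np
from scipy.optimize import nnls

def unmix(pixel, endmembers):
    """Estimate non-negative endmember abundances for a pixel, then normalize
    them to sum to one (a simple approximation to the fully constrained problem)."""
    E = np.asarray(endmembers, float).T          # bands x endmembers
    abundances, _ = nnls(E, np.asarray(pixel, float))
    total = abundances.sum()
    return abundances / total if total > 0 else abundances

# Invented 5-band endmember spectra for soil, vegetation, and water.
endmembers = [[0.30, 0.35, 0.40, 0.45, 0.50],   # soil
              [0.05, 0.08, 0.05, 0.45, 0.50],   # vegetation
              [0.06, 0.05, 0.03, 0.02, 0.01]]   # water
mixed_pixel = 0.6 * np.array(endmembers[0]) + 0.4 * np.array(endmembers[1])
print(unmix(mixed_pixel, endmembers))   # approximately [0.6, 0.4, 0.0]
```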

Fuzzy classification procedures can also be used to assign a set of membership possibilities to a pixel instead of providing the label of the most likely class. Schowengerdt [87] provides a simple summary of the approach, for both supervised and unsupervised labeling, while Zhang and Foody [88] demonstrate fuzzy classification of a suburban region, in which fuzzy ground truth data was also used.

    V. RADAR DATA METHODS

In principle, the procedures devised for thematic mapping from visible and infrared spectral data can be applied to the analysis of radar imagery. However, there are several fundamental differences in the data types that suggest that radar should be handled differently.

The multiplicative speckle noise inherent in radar data gives an effective signal to noise ratio of 0 dB in its raw captured form, and (so-called multilook) averaging is needed to reduce the noise to a level where meaningful image analysis can be carried out. Even then, simple point classifiers, such as the maximum-likelihood rule and neural networks, will not perform well unless further filtering is performed, or so-called per-field classification is carried out [89]. Averaging over time has also been used to reduce speckle noise in preparation for classification [90].
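Multilook processing amounts to averaging the intensity of neighboring single-look samples, trading spatial resolution for reduced speckle variance. A minimal block-averaging sketch on synthetic speckled intensity data follows; the look factors and data are illustrative assumptions.

```python
import numpy as np

def multilook(intensity, looks_az=4, looks_rg=1):
    """Average blocks of single-look intensity pixels (looks_az x looks_rg) to reduce
    speckle; the output resolution is coarser by the same factors."""
    rows = (intensity.shape[0] // looks_az) * looks_az
    cols = (intensity.shape[1] // looks_rg) * looks_rg
    trimmed = intensity[:rows, :cols]
    blocks = trimmed.reshape(rows // looks_az, looks_az, cols // looks_rg, looks_rg)
    return blocks.mean(axis=(1, 3))

# Fully developed speckle: exponentially distributed intensity about a constant backscatter.
rng = np.random.default_rng(6)
single_look = rng.exponential(scale=1.0, size=(400, 400))
four_look = multilook(single_look, looks_az=4, looks_rg=1)
print(single_look.std() / single_look.mean(), four_look.std() / four_look.mean())
# coefficient of variation drops from about 1 to about 0.5 after four-look averaging
```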

The nature of the energy-matter interaction process is another major difference between radar and optical imagery. Whereas the interaction is largely with the surface elements of the landscape at visible and reflected infrared wavelengths (with some transmission and multiple reflections for vegetation and water), at the much longer wavelengths that characterize radar, diffuse surface scattering is only one of a number of backscattering


mechanisms. Surface scattering can also be specular at long wavelengths. Additionally, volume scattering can be a major element of backscatter; it occurs from forest canopies, crops, and sea ice. Subsurface scattering in hyperarid regions at very long wavelengths is also possible.

There can also be a backscattered component resulting from double-bounce strong reflection involving the right angle formed by adjacent vertical and horizontal structures. Tree trunks and the ground, buildings and the ground, and ships and the sea surface are all situations that give rise to this mechanism. On occasions, facets (such as sloped building roofs) can also give a strong specular return to the radar.

In order to interpret radar imagery effectively, the analyst needs to be aware of those scattering complexities. It is important to recognize also that there are three dimensions to radar data: wavelength, polarization, and incidence angle, in contrast to just wavelength for optical data. Further, the energy is coherent, so that the backscattered returns are complex, involving both amplitude and phase, and can have a different polarization from that transmitted. With such rich dimensionality one would expect, therefore, that scattering models could be devised for the purpose of thematic mapping by relating the received energy more directly to the biophysical variables of interest [91]-[93], rather than proceeding via a classification.

In the simplest cases, these models seek to simulate the behavior of the scattering coefficient, which relates backscattered power density to incident power density

$\sigma^0 = \sigma^0(p, \lambda, \theta)$

where $p$ is polarization, $\lambda$ wavelength, and $\theta$ the look angle. Sometimes these models are based on regression relationships.

More complex models simulate the scattering matrix, the elements of which relate the scattered electric field vector to the incident field for various polarization combinations

$\begin{bmatrix} E_H^s \\ E_V^s \end{bmatrix} = \begin{bmatrix} S_{HH} & S_{HV} \\ S_{VH} & S_{VV} \end{bmatrix} \begin{bmatrix} E_H^i \\ E_V^i \end{bmatrix}$

Inversion of scattering models, which is often nontrivial, allows the analyst to derive meaningful biophysical information (biomass, age of sea ice, crop yield, etc.) directly from the recorded data.

We now look at some of the more significant developments in backscatter modeling that have found application over the past two decades.

Moisture content is an important determinant of the strength of scattering from vegetation. One of the earliest effective, though simple, radar scattering models was the water-cloud model for vegetation canopy scattering developed by Attema and Ulaby [94]. It represents the canopy of a forest, say, as a cloud of water droplets and then uses results from cloud physics to avoid having to model the complex canopy physical structure.

Lang et al. [95]-[97] introduced dielectric disk models for foliage to give a more realistic canopy model, while Richards et al. [98] showed how simple separable structural models can be used, demonstrating the importance of the trunk component in scattering from mature forests at long wavelengths.

The first comprehensive radar scattering model (MIMICS) was devised by Ulaby et al. [99], while van Zyl [100] demonstrated how knowledge of the phase changes that occur with the number of scattering events can be used to provide simple forest thematic mapping.

Knowledge of scattering behaviors, either from prior experience or derived from image characteristics, has been used to construct expert system or knowledge-based approaches to the analysis of radar image data [101]. More recently, case-based reasoning has been proposed as a knowledge-based method using predetermined pixel attributes or features, and fuzzy measures of class membership [102], in the descriptions of radar resolution cells.

Lee and coworkers [103]-[105] have developed a classifier for radar that appeals directly to the statistical nature of the actual data. Based on the knowledge that a vector formed from the copolarized and cross-polarized elements of the scattering matrix is distributed in a Gaussian fashion with zero mean, a distance-based discriminant function can be developed for classification purposes. When multilook averaging and several wavebands are taken into account, the scattering vector is based on Wishart statistics, again leading to a distance-derived discriminant function that performs well in forest thematic mapping. In a very interesting later article, Lee et al. [106] have combined scattering models and Wishart-based maximum-likelihood classification to devise a new unsupervised classification procedure for polarimetric radar imagery.
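A minimal sketch of a Wishart-style distance of the kind used in such classifiers follows: each class is represented by a mean multilook covariance matrix, and a pixel's measured covariance is assigned to the class minimizing d = ln|Sigma_m| + Tr(Sigma_m^{-1} C). The class matrices below are invented, and the refinements of [103]-[106] are omitted.

```python
import numpy as np

def wishart_distance(C, sigma_m):
    """Wishart-type distance between a measured covariance C and a class matrix Sigma_m:
    d = ln|Sigma_m| + Tr(Sigma_m^{-1} C)  (terms independent of the class are dropped)."""
    _, logdet = np.linalg.slogdet(sigma_m)
    return logdet + np.trace(np.linalg.solve(sigma_m, C))

def classify_wishart(C, class_matrices):
    """Assign the pixel covariance to the class with minimum Wishart-type distance."""
    return min(class_matrices, key=lambda name: wishart_distance(C, class_matrices[name]))

# Invented 3x3 class covariance matrices (e.g. averaged polarimetric covariances).
classes = {
    "forest": np.diag([2.0, 1.0, 1.5]),
    "water":  np.diag([0.2, 0.05, 0.1]),
}
measured = np.diag([1.8, 0.9, 1.4])         # a multilook pixel covariance
print(classify_wishart(measured, classes))  # -> "forest"
```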

Unlike passive optical remote sensing systems, radar imaging offers additional utility because of the coherent nature of the radar returns. For two radar passes over the same region, the vector difference of the signals carries information in its phase and in the correlation of the two complex signals. The latter is generally expressed as coherence, which can be used as a feature for biophysical parameter estimation [107], [108] and as a feature in unsupervised classification of interferometric SAR (InSAR) imagery [91]. As importantly, however, the phase information obtained by interfering the two images allows topographic analysis of the landscape [109]. Topography derived in this manner has been used to supplement other radar features in decision tree analysis of radar image data [110].
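Coherence between two coregistered complex SAR images is usually estimated over a local window as the magnitude of the normalized cross-correlation. A minimal sketch on synthetic complex data follows; the window size and noise level are assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def coherence(s1, s2, window=5):
    """Estimate interferometric coherence |<s1 s2*>| / sqrt(<|s1|^2> <|s2|^2>) with
    local averages taken over a window x window neighborhood."""
    def local_mean(img):
        return uniform_filter(img, size=window, mode="nearest")
    cross = s1 * np.conj(s2)
    num = local_mean(cross.real) + 1j * local_mean(cross.imag)
    den = np.sqrt(local_mean(np.abs(s1) ** 2) * local_mean(np.abs(s2) ** 2))
    return np.abs(num) / den

# Synthetic pair: the second image is the first plus independent complex noise.
rng = np.random.default_rng(7)
s1 = rng.normal(size=(100, 100)) + 1j * rng.normal(size=(100, 100))
s2 = s1 + 0.5 * (rng.normal(size=(100, 100)) + 1j * rng.normal(size=(100, 100)))
print(coherence(s1, s2).mean())   # about 0.9 for this noise level
```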

    VI. MULTISOURCE INFERENCE

Pixel labeling from multiple, mixed data sources was recognized as important early in the history of thematic mapping, and was a feature of LARS research in the late 1970s and early 1980s [31], [36].

With the emergence of several data gathering nations and space programs in the mid to late 1980s, and the varieties of data type available, attention to forming joint inferences about the most appropriate labels to assign to equivalent ground pixels has accelerated significantly, to the extent that the IEEE Geoscience and Remote Sensing Society now has a Technical Committee on Data Fusion,6 and special issues of journals dealing with the problem have appeared [111].

    6http://ewh.ieee.org/soc/grss/


Methods such as multisource (joint) statistical modeling, evidential methods, and relaxation labeling show promise for data fusion, as do support vector machines [112] and more general credit assignment methods [113]. But from an operational viewpoint, they have a number of potentially serious practical limitations, as do all methods that rely on fusing at the data and decision levels, as discussed in the following.

It is unlikely that a given analyst would be expert in understanding more than one particular data type. Moreover, many application specialists are unlikely to be expert in techniques for handling data and, in some cases, might not even care about the primary datasets that went into deriving products of importance. While they might find it interesting that the data originated from satellite sensors, that may not be significant from the point of view of their specific application. As a simple analogue, those who are interested in weather forecasts would not normally know (or care) anything about cloud top temperatures.

In addition, given that the user most likely would now be served spatial data over a network, there are a number of fundamental requirements that an operational thematic mapping schema should satisfy. Apart from being able to cope with a variety of data types, including mixtures of categorical and numerical data, it must do the following.

1) Account for relative data quality and relevance. This has been well known since the earliest studies of multisource classification methods. Poor quality data, or that which is marginally relevant for a particular application, must not unreasonably influence the outcome.

2) Allow each data source to be analyzed separately in time and location, and by its own experts. Different data types are generally recorded by different agencies, and sometimes those data are not all available when an analysis is first performed. Also, each particular data source has its own expert analysis community, and it is unlikely that any single agency or person will be expert across the variety of relevant, contemporary data types; spectroscopic analysis of hyperspectral data and the derivation of biophysical information from multipolarization InSAR data is an illustration.

3) Allow preexisting thematic maps to be incorporated into the analysis process, and thematic map revision must be accommodated. Previously classified data may have value to a current exercise, as might existing categorical maps, and should be able to be incorporated. Likewise, map revision is important.

4) Accept that the thematic classes from a combined dataset might be different from the classes achievable with any dataset on its own. This is a particularly important reason as to why data fusion methods are limited when seeking to form operational joint inferences. Table I provides a simple illustration: the types of information class relevant to one particular dataset can be quite different from those relevant to a different data type; in other words, information classes are often source-specific. The information classes of interest to the user may be quite different again, but should be derivable from the source-specific labels.

TABLE I. CLASSES CAN BE DIFFERENT FOR DIFFERENT DATA TYPES

Meeting all these conditions is difficult when using any (fusion) technique that depends on combining data, since the data sources would then all have to be available simultaneously and the class definitions would need to be consistent over the sources.

The best prospect for an operational methodology, therefore, seems to be a technique that allows each dataset to be analyzed independently, using whatever procedure is optimally matched to the characteristics of that particular source, and then to look for a combination process that permits the thematic classes identified in the individual analyses to be combined. Not only does that offer the prospect of getting the best possible performance out of each source/classifier combination, but it also allows each data source to be analyzed where and when available. That includes the case of a primary data supplier converting recorded data to geophysical products for distribution to client agencies.

Spatial context, as appropriate, could be incorporated at the level of individual data analysis. The labeling of a pixel in spatial relation to its neighbors is probably most effective when done in conjunction with the primary form of data analysis.

An effective method for operating on labels to form joint inferences is to adopt some form of symbolic reasoning similar to that used in an expert system. For example, if analysis of a multispectral dataset reveals a vegetation type, while interpretation of coincident radar data suggests a smooth specular surface, then an expert would conclude that the cover type was most likely grassland (smooth vegetation). As a production rule that would be expressed

from spectral data the region is vegetated
from radar the region is smooth
the region is probably grassland

Since labels from each individual data source are found using the most appropriate algorithm for the task, production rules provide an easy means for forming joint inferences, i.e., for fusing the labeling outcomes from each data source. In contrast to data-level, feature-level, and decision-level fusion, this is label-level fusion, as depicted in Fig. 3: akin to fusion at the decision level, but with the possibility that the final thematic labels can be different from the source-specific labels.
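A rule base of the kind sketched above can be represented very simply in code. The example below, with invented rules and class names, maps pairs of source-specific labels to a final thematic label and defaults to "unresolved" where no rule fires; it illustrates label-level fusion only, not the system of [114], [115].

```python
# Each rule maps (spectral label, radar label) to a final thematic label.
# The rules and class names here are invented, purely to illustrate label-level fusion.
RULES = {
    ("vegetated", "smooth"):        "grassland",
    ("vegetated", "volume"):        "forest",
    ("bare",      "smooth"):        "water or bare soil",
    ("urban",     "double-bounce"): "built-up",
}

def fuse_labels(spectral_label, radar_label):
    """Label-level fusion: combine independently derived source labels via rules."""
    return RULES.get((spectral_label, radar_label), "unresolved")

# One pixel labeled 'vegetated' by the optical analysis and 'smooth' by the radar analysis.
print(fuse_labels("vegetated", "smooth"))   # -> "grassland"
print(fuse_labels("vegetated", "rough"))    # -> "unresolved"
```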


    Fig. 3. Handling mixed data types by fusion at the label level.

A rule-based expert system for handling mixed data analysis problems in remote sensing was devised by Srinivasan [114], [115]. While focused on the problem of both single source analysis and label combination, its main interest here is that it provides a ready combination tool. Analysis of single data sources can be carried out either by a rule-based process (as might be important for radar when using scattering models) or by any of the more traditional and, where appropriate, numerical and statistical methods. Provided, however, they deliver sets of thematic labels relevant to the particular data source, a joint inference can be created if symbolic combination rules are available. Alternative symbolic reasoning procedures are also possible [116].

From an operational perspective, that is only part of the story. Who will develop and apply the joint inferential symbolic processing techniques? Is it reasonable to expect that the average user will possess the knowledge and skills to work at that level? In the early decades of remote sensing, with only a single spatial data type and a few standard thematic mapping procedures available, it was reasonable to expect that all the analytical work might be carried out by the skilled end user. Now, however, the more effective operational approach might involve two broad phases as follows.

Single datasets should be analyzed by experts with those data types. The resultant products can be generated in response to a specific requirement or might be produced speculatively and archived for later use.

There will need to emerge consultant communities capable of using expert knowledge to convert single source products to the jointly inferred products required by the end user. The consultant, having taken instruction from the user, would select the products generated by single data suppliers that most appropriately match the user's requirements, and then perform label-level fusion using expert knowledge as depicted in Fig. 4. The consultant can also account, at least in a qualitative sense, for relative data quality by the manner in which symbolic reasoning is applied [115].

Of course, all such methodologies involve compromises. It is certain that there will be some combinations of data types that may well give better results through data fusion, rather than

Fig. 4. Operational spatial data analysis framework for handling mixed data types.

being handled individually. However, in an operational context, and in an era where the data types and suppliers are many, it is necessary to adopt a schema that will work over all applications and that will not require either the analyst or client to develop expertise across the breadth of the technology sector.

VII. CONCLUDING COMMENTS

The contributions to quantitative digital image interpretation in remote sensing by David Landgrebe, his students, and coworkers established the foundations for the field and have influenced the evolution of the quantitative approach ever since. It has been an important journey. From the challenges of the pre- and early spaceborne days through to the current era of the hyperspectral dataset, the Purdue group has identified the problems and proposed the solutions. The array of analytical techniques now regularly used in image analysis owes much to those contributions.

The next era will be characterized by the need for scene understanding that better matches the requirements of the client community. We have done much over the past ten years or so to attempt accurate pixel labeling with mixed data types by using techniques that refine the results obtained on one dataset by using other available data types. It is important that we now move away from methodologies that require the end user, or even a specialist in one particular dataset, to have enough expertise to create meaningful thematic products from the range of spatial data types available. Instead, we need to accept that a truly operational spatial data community will need both domain experts and those who can manipulate labels. Symbolic processing needs to become as much a part of the new era of image analysis as quantitative extraction of information from data has been in the past.

By its nature, and in its mission, LARS has always been applications driven. The analytical techniques devised by Dave Landgrebe and his collaborators have been in response, either directly or indirectly, to the perceived needs of communities as diverse as forestry, agriculture, soils mapping, and land use


generally. Remote sensing, fundamentally, is an applications-driven field, and while there is still room for the development of further thematic mapping algorithms, the requirements of the end user must drive the outcomes. Our methodologies must now be as much about choosing the most relevant primary data types as doing the actual analysis. It is to consultants, skilled in translating user requirements into tasks for data source experts, and expert in combining the single-source outputs into the product required by the client, that the field must be given. Only then will the ground-breaking and fundamental analytical work pioneered at LARS, and by Dave Landgrebe, find maturity.

REFERENCES

[1] K. R. Castleman, Digital Image Processing, 1st ed. Englewood Cliffs, NJ: Prentice-Hall, 1979.
[2] N. J. Nilsson, Learning Machines. New York: McGraw-Hill, 1965.
[3] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.
[4] D. A. Landgrebe. (1986) A brief history of the Laboratory for Applications of Remote Sensing (LARS). [Online]. Available: www.lars.purdue.edu/home/LARSHistory.html
[5] G. Cardillo and D. Landgrebe, On pattern recognition, Purdue Univ., West Lafayette, IN, LARS Tech. Note 101 866, 1966.
[6] K. S. Fu, D. A. Landgrebe, and T. L. Phillips, Information processing of remotely sensed agricultural data, Proc. IEEE, vol. 57, no. 4, pp. 639-653, Apr. 1969.
[7] A. Wacker, A cluster approach to finding spatial boundaries in multispectral imagery, Purdue Univ., West Lafayette, IN, LARS Tech. Note 122 969, 1969.
[8] K. S. Fu and P. J. Min, On feature selection in multiclass pattern recognition, Purdue Univ., West Lafayette, IN, LARS Tech. Note 080 168, 1968.
[9] P. J. Min, D. A. Landgrebe, and K. S. Fu, Feature selection in multiclass pattern recognition, Purdue Univ., West Lafayette, IN, LARS Tech. Note 050 170, 1970.
[10] A. Wacker and D. A. Landgrebe, The minimum distance approach to classification, Ph.D. dissertation, Purdue Univ., School of Elect. Eng., West Lafayette, IN, 1971.
[11] P. H. Swain and A. G. Wacker, Comparison of the divergence and B-distance in feature selection, Purdue Univ., West Lafayette, IN, LARS Tech. Note 020 871, 1971.
[12] P. H. Swain and R. C. King, Two effective feature selection criteria for multispectral remote sensing, in Proc. 1st Int. Joint Conf. Pattern Recognition, Washington, DC, Nov. 1973, pp. 536-540.
[13] P. H. Swain and S. M. Davis, Remote Sensing: The Quantitative Approach. New York: McGraw-Hill, 1978.
[14] M. D. Fleming, J. S. Berkebile, and R. M. Hoffer, Computer-aided analysis of Landsat-1 MSS data: A comparison of three approaches including a modified clustering approach, in Proc. Symp. Machine Processing of Remotely Sensed Data, West Lafayette, IN, Jun. 3-5, 1975, pp. 54-61.
[15] X. Jia and J. A. Richards, Cluster space representation for hyperspectral classification, IEEE Trans. Geosci. Remote Sens., vol. 40, no. 3, pp. 593-598, Mar. 2002.
[16] R. M. Haralick, K. Shanmugan, and I. Dinstein, Texture features for image classification, IEEE Trans. Syst., Man Cybern., vol. SMC-3, no. 6, pp. 610-621, Nov. 1973.
[17] K. R. Castleman, Digital Image Processing, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 1996.
[18] R. M. Hord and W. Brooner, Land-use map accuracy criteria, Photogramm. Eng. Remote Sens., vol. 42, no. 5, pp. 671-677, 1976.
[19] R. G. Congalton and K. Green, Assessing the Accuracy of Remotely Sensed Data: Practices and Principles. Boca Raton, FL: Lewis, 1993.
[20] G. M. Foody, Status of land cover classification accuracy assessment, Remote Sens. Environ., vol. 80, pp. 185-201, 2002.
[21] W. G. Eppler, Canonical analysis for increased classification speed and channel selection, IEEE Trans. Geosci. Electron., vol. GE-14, no. 1, pp. 26-33, Jan. 1976.
[22] S. K. Jensen and F. A. Waltz, Principal components analysis and canonical analysis in remote sensing, in Proc. Amer. Soc. Photogrammetry 45th Annu. Meeting, 1979, pp. 337-348.
[23] J. M. Soha and A. A. Schwartz, Multispectral histogram normalization contrast enhancement, in Proc. 5th Can. Symp. Remote Sensing, 1978, pp. 86-93.
[24] R. J. Kauth and G. S. Thomas, The tasseled cap: A graphic description of the spectral-temporal development of agricultural crops as seen by Landsat, in Proc. Symp. Machine Processing of Remotely Sensed Data, West Lafayette, IN, 1976.
[25] G. R. Byrne, P. F. Crapper, and K. K. Mayo, Monitoring land cover changes by principal components analysis of multitemporal Landsat data, Remote Sens. Environ., vol. 10, pp. 175-184, 1980.
[26] T. Huang, Per field classifier for agricultural applications, Purdue Univ., West Lafayette, IN, LARS Tech. Note 060 569, 1969.
[27] T. V. Robertson, Extraction and classification of objects in multispectral images, in Proc. Symp. Machine Processing of Remotely Sensed Data, West Lafayette, IN, 1973, pp. 27-34.
[28] J. N. Gupta, R. L. Kettig, D. A. Landgrebe, and P. A. Wintz, Machine boundary finding and sample classification of remotely sensed data, in Proc. Symp. Machine Processing of Remotely Sensed Data, West Lafayette, IN, 1973, pp. 25-35.
[29] R. L. Kettig and D. A. Landgrebe, Computer classification of remotely sensed multispectral image data by extraction and classification of homogeneous objects, IEEE Trans. Geosci. Electron., vol. GE-14, no. 1, pp. 19-26, Jan. 1976.
[30] R. M. Hoffer, M. D. Fleming, L. A. Bartolucci, S. M. Davis, and R. F. Nelson, Digital processing of Landsat MSS and topographic data to improve capabilities for computerized mapping of forest cover types, Purdue Univ., West Lafayette, IN, LARS Tech. Note 011 579, 1979.
[31] M. D. Fleming and R. M. Hoffer, Machine processing of Landsat MSS data and DMA topographic data for forest cover types mapping, in Proc. Symp. Machine Processing of Remotely Sensed Data, West Lafayette, IN, 1979, pp. 377-390.
[32] J. C. Tilton, P. H. Swain, and S. B. Vardeman, Context distribution estimation for contextual classification of multispectral image data, Purdue Univ., West Lafayette, IN, LARS Tech. Note 040 280, 1980.
[33] A. Rosenfeld, R. Hummel, and S. Zucker, Scene labeling by relaxation algorithms, IEEE Trans. Syst., Man Cybern., vol. SMC-6, no. 6, pp. 420-433, Jun. 1976.
[34] J. A. Richards, D. A. Landgrebe, and P. H. Swain, Pixel labeling by supervised probabilistic relaxation, IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-3, no. 2, pp. 188-191, Mar. 1981.
[35] J. A. Richards, D. A. Landgrebe, and P. H. Swain, On the accuracy of pixel relaxation labeling, IEEE Trans. Syst., Man Cybern., vol. SMC-11, no. 4, pp. 303-309, Apr. 1981.
[36] J. A. Richards, D. A. Landgrebe, and P. H. Swain, A means for utilizing ancillary information in multispectral classification, Remote Sens. Environ., vol. 12, pp. 463-477, 1982.
[37] B. Jeon and D. A. Landgrebe, Classification with spatio-temporal interpixel class dependency contexts, IEEE Trans. Geosci. Remote Sens., vol. 30, no. 4, pp. 663-672, Jul. 1992.
[38] A. H. S. Solberg, T. Taxt, and A. K. Jain, A Markov random field model for classification of multisource satellite imagery, IEEE Trans. Geosci. Remote Sens., vol. 34, no. 1, pp. 100-113, Jan. 1996.
[39] Y. Jung and P. H. Swain, Bayesian contextual classification based on modified M-estimates and Markov random fields, IEEE Trans. Geosci. Remote Sens., vol. 34, no. 1, pp. 67-75, Jan. 1996.
[40] A. Sarkar, M. K. Biswas, B. Kartikeyan, V. Kumar, K. L. Majumder, and D. K. Pal, A MRF model-based segmentation approach to classification for multispectral imagery, IEEE Trans. Geosci. Remote Sens., vol. 40, no. 5, pp. 1102-1113, May 2002.
[41] A. H. Strahler, The use of prior probabilities in maximum likelihood classification of remotely sensed data, Remote Sens. Environ., vol. 10, pp. 135-163, 1980.
[42] H. M. Kalayeh, M. J. Muasher, and D. A. Landgrebe, Feature selection with limited training samples, IEEE Trans. Geosci. Remote Sens., vol. GE-21, no. 4, pp. 434-438, Oct. 1983.
[43] M. J. Muasher and D. A. Landgrebe, A binary tree feature-selection technique for limited training sample size, in Proc. 8th Int. Symp. Machine Processing of Remotely Sensed Data, West Lafayette, IN, 1982, pp. 130-137.
[44] C. Lee and D. A. Landgrebe, Decision boundary feature selection for nonparametric classifiers, in Proc. SPIE 44th Annu. Conf., St. Paul, MN, May 1991.
[45] C. Lee and D. A. Landgrebe, Feature selection based on decision boundaries, in Proc. IGARSS, Espoo, Finland, Jun. 1991.
[46] B. M. Shahshahani and D. A. Landgrebe, Using partially labeled data for normal mixture identification with application to class definition, in Proc. IGARSS, Houston, TX, May 26-29, 1992, pp. 1603-1605.
[47] C. Lee and D. A. Landgrebe, Analyzing high-dimensional multispectral data, IEEE Trans. Geosci. Remote Sens., vol. 31, no. 4, pp. 792-800, Jul. 1993.


    [48] J. P. Hoffbeck and D. A. Landgrebe, Covariance matrix estimation andclassification with limited training data, IEEE Trans. Pattern Anal.Mach. Intell., vol. 18, no. 7, pp. 763767, Jul. 1996.

    [49] S. Tadjudin and D. A. Landgrebe, Covariance estimation for limitedtraining samples, in Proc. IGARSS, Seattle, WA, Jul. 610, 1998.

    [50] P.-F. Hsieh and D. A. Landgrebe, Statistics enhancement in hyperspec-tral data analysis using spectral-spatial labeling, the EM algorithm andthe leave-one-out covariance estimator, in Proc. SPIE Int. Symp. Op-tical Science, Engineering, and Instrumentation, San Diego, CA, Jul.1924, 1998.

    [51] S. Tadjudin and D. A. Landgrebe, Covariance estimation with limitedtraining samples, IEEE Trans. Geosci. Remote Sens., vol. 37, no. 4, pp.21132118, Jul. 1999.

    [52] Q. Jackson and D. A. Landgrebe, An adaptive classifier design for highdimensional data analysis with a limited training data set, IEEE Trans.Geosci. Remote Sens., vol. 39, no. 12, pp. 26642679, Dec. 2001.

    [53] B.-C. Kuo and D. A. Landgrebe, A covariance estimator for smallsample size classification problems and its application to feature extrac-tion, IEEE Trans. Geosci. Remote Sens., vol. 40, no. 4, pp. 814819,Apr. 2002.

    [54] Q. Jackson and D. A. Landgrebe, An adaptive method for combinedcovariance estimation and classification, IEEE Trans. Geosci. RemoteSens., vol. 40, no. 5, pp. 10821087, May 2002.

    [55] B.-C. Kuo and D. A. Landgrebe, Regularized covariance estimatorsfor hyperspectral data classification and its application to feature extrac-tion, in Proc. IGARSS, Toronto, ON, Canada, Jun. 2428, 2002.

    [56] M. M. Dundar and D. A. Landgrebe, A model-based mixture-super-vised classification approach in hyperspectral data analysis, IEEETrans. Geosci. Remote Sens. , vol. 40, no. 12, pp. 26922699, Dec.2002.

    [57] D. A. Landgrebe, Signal Theory Methods in Multispectral RemoteSensing. Hoboken, NJ: Wiley, 2003.

    [58] X. Jia and J. A. Richards, Efficient maximum likelihood classificationfor imaging spectrometer data sets, IEEE Trans. Geosci. Remote Sens.,vol. 32, no. 2, pp. 274281, Mar. 1994.

    [59] M. A. Piech and K. R. Piech, Symbolic representation of hyperspectraldata, Appl. Opt., vol. 26, pp. 40184026, 1987.

    [60] A. S. Mazer, M. Martin, M. Lee, and J. E. Solomon, Image processingsoftware for imaging spectrometry data analysis, Remote Sens. Env-iron., vol. 24, pp. 201211, 1988.

    [61] R. N. Clark, G. A. Swayze, K. E. Livio, R. F. Kokaly, S. J. Sutley, J. B.Dalton, R. R. McDougal, and C. A. Gent, Imaging spectroscopy: Earthand planetary remote sensing with the USGS tetracorder and expert sys-tems, J. Geophys. Res., vol. 108, no. E12, pp. 51315175, Dec. 2003.

    [62] T. Lee, J. A. Richards, and P. H. Swain, "Probabilistic and evidential approaches for multisource data analysis," IEEE Trans. Geosci. Remote Sens., vol. GE-25, no. 3, pp. 283–293, May 1987.

    [63] H. Kim and P. H. Swain, "Multisource data analysis in remote sensing and geographic information systems based on Shafer's theory of evidence," in Proc. IGARSS, Vancouver, BC, Canada, Jul. 1989, pp. 829–832.

    [64] H. Kim and P. H. Swain, "Evidential reasoning approach to multisource-data classification in remote sensing," IEEE Trans. Syst., Man, Cybern., vol. 25, no. 8, pp. 1257–1265, Aug. 1995.

    [65] J. A. Benediktsson and P. H. Swain, "Consensus theoretic classification methods," IEEE Trans. Syst., Man, Cybern., vol. 22, no. 4, pp. 688–704, Jul.–Aug. 1992.

    [66] G. Shafer, A Mathematical Theory of Evidence. Princeton, NJ: Princeton Univ. Press, 1976.

    [67] J. A. Richards and X. Jia, Remote Sensing Digital Image Analysis, 3rd ed. Berlin, Germany: Springer-Verlag, 1999.

    [68] A. Bendjebbour, Y. Delignon, L. Fouques, V. Samson, and W. Pieczynski, "Multisensor image segmentation using Dempster–Shafer fusion in Markov fields context," IEEE Trans. Geosci. Remote Sens., vol. 39, no. 8, pp. 1789–1798, Aug. 2001.

    [69] C. L. Wu, D. A. Landgrebe, and P. H. Swain, "The decision-tree approach to classification," Ph.D. dissertation, Purdue Univ., School of Elect. Eng., West Lafayette, IN.

    [70] P. H. Swain and H. Hauska, "The decision-tree classifier: Design and potential," IEEE Trans. Geosci. Electron., vol. GE-15, no. 3, pp. 142–147, Jul. 1977.

    [71] S. Safavian and D. A. Landgrebe, "A survey of decision tree classifier methodology," IEEE Trans. Syst., Man, Cybern., vol. 21, no. 3, pp. 660–674, May 1991.

    [72] T. Lee and J. A. Richards, "Piecewise linear classification using seniority logic committee methods with application to remote sensing," Pattern Recognit., vol. 17, no. 4, pp. 453–464, 1984.

    [73] T. Lee and J. A. Richards, "A low-cost classifier for multitemporal applications," Int. J. Remote Sens., vol. 6, pp. 1405–1417, 1985.

    [74] J. A. Benediktsson, P. H. Swain, and O. K. Ersoy, "Neural network approaches versus statistical methods in classification of multisource remote sensing data," IEEE Trans. Geosci. Remote Sens., vol. 28, no. 4, pp. 540–552, Jul. 1990.

    [75] J. A. Benediktsson and P. H. Swain, "Statistical methods and neural network approaches for classification of data from multiple sources," Ph.D. dissertation, Purdue Univ., School of Elect. Eng., West Lafayette, IN, 1990.

    [76] J. A. Gualtieri and R. F. Cromp, "Support vector machines for hyperspectral remote sensing classification," in Proc. SPIE 27th AIPR Workshop: Advances in Computer Assisted Recognition, vol. 3584, R. J. Merisko, Ed., 1998, pp. 221–232.

    [77] B. Schölkopf and A. Smola, Learning with Kernels. Cambridge, MA: MIT Press, 2002.

    [78] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining Knowl. Discov., vol. 2, pp. 121–167, 1998.

    [79] C. Huang, L. S. Davis, and J. R. G. Townshend, "An assessment of support vector machines for land cover classification," Int. J. Remote Sens., vol. 23, no. 4, pp. 725–749, 2002.

    [80] J. A. Gualtieri and S. Chettri, "Support vector machines for classification of hyperspectral data," in Proc. IGARSS, vol. 2, Honolulu, HI, Jul. 24–28, 2000, pp. 813–815.

    [81] F. Melgani and L. Bruzzone, "Classification of hyperspectral remote sensing images with support vector machines," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1796, Aug. 2004.

    [82] B.-C. Kuo and D. A. Landgrebe, "A robust classification procedure based on mixture classifiers and nonparametric weighted feature extraction," IEEE Trans. Geosci. Remote Sens., vol. 40, no. 11, pp. 2486–2494, Nov. 2002.

    [83] P. Cheeseman and J. Stutz, "Bayesian classification (AutoClass): Theory and results," in Advances in Knowledge Discovery and Data Mining, U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Eds. Menlo Park, CA: AAAI Press, 1996.

    [84] M. Datcu and K. Seidel, "Bayesian methods: Applications in information aggregation and image data mining," in Int. Arch. Photogramm. Remote Sens., vol. 34, part 7-4-3 W6, Valladolid, Spain, Jun. 3–4, 1999.

    [85] M. Schroeder, H. Rehrauer, K. Seidel, and M. Datcu, "Interactive learning and probabilistic retrieval in remote sensing image archives," IEEE Trans. Geosci. Remote Sens., vol. 38, no. 5, pp. 2288–2298, Sep. 2000.

    [86] A. Plaza, P. Martinez, R. Perez, and J. Plaza, "A quantitative and comparative analysis of endmember extraction algorithms for hyperspectral data," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 3, pp. 650–663, Mar. 2004.

    [87] R. A. Schowengerdt, Remote Sensing: Models and Methods for Image Processing. San Diego, CA: Academic, 1997.

    [88] J. Zhang and G. M. Foody, "A fuzzy classification of sub-urban land cover from remotely sensed imagery," Int. J. Remote Sens., vol. 19, no. 14, pp. 2721–2738, 1998.

    [89] F. del Frate, G. Schiavon, D. Solimini, M. Borgeaud, D. H. Hoekman, and M. A. M. Vissers, "Crop classification using multiconfiguration C-band SAR data," IEEE Trans. Geosci. Remote Sens., vol. 41, no. 7, pp. 1611–1619, Jul. 2003.

    [90] M. E. Engdahl and J. M. Hyyppa, "Land-cover classification using multitemporal ERS-1/2 InSAR data," IEEE Trans. Geosci. Remote Sens., vol. 41, no. 7, pp. 1620–1628, Jul. 2003.

    [91] X. Blaes and P. Defourny, "Retrieving crop parameters based on tandem ERS-1/2 interferometric coherence images," Remote Sens. Environ., vol. 88, pp. 374–385, 2003.

    [92] J. R. Santos, C. F. Freitas, L. S. Araujo, L. V. Dutra, J. C. Mura, F. F. Gama, L. S. Sola, and S. J. S. Sant'Anna, "Airborne P-band SAR applied to the aboveground biomass studies in the Brazilian tropical rainforest," Remote Sens. Environ., vol. 87, pp. 482–493, 2003.

    [93] Special issue on retrieval of bio- and geophysical parameters from SAR data for land applications, IEEE Trans. Geosci. Remote Sens., vol. 41, no. 7, pp. 1539–1710, Jul. 2003.

    [94] E. P. W. Attema and F. T. Ulaby, "Vegetation modeled as a water cloud," Radio Sci., vol. 13, pp. 357–364, 1978.

    [95] R. H. Lang, "Electromagnetic scattering from sparse distribution of lossy dielectric scatterers," Radio Sci., vol. 16, pp. 15–30, 1981.

    [96] R. H. Lang and J. S. Sidhu, "Electromagnetic backscattering from a layer of vegetation: A discrete approach," IEEE Trans. Geosci. Remote Sens., vol. GE-21, no. 1, pp. 62–71, Jan. 1983.

    [97] R. H. Lang and H. A. Saleh, "Microwave inversion of leaf area and inclination angle distribution from backscattered data," IEEE Trans. Geosci. Remote Sens., vol. GE-23, no. 5, pp. 685–694, Sep. 1985.


    [98] J. A. Richards, Q. Sun, and D. S. Simonett, "L-band radar backscatter modeling of forest stands," IEEE Trans. Geosci. Remote Sens., vol. GE-25, no. 4, pp. 487–498, Jul. 1987.

    [99] F. T. Ulaby, K. Sarabandi, K. McDonald, M. Whitt, and M. C. Dobson, "Michigan microwave canopy scattering model (MIMICS)," Int. J. Remote Sens., vol. 12, pp. 1223–1253, 1990.

    [100] J. J. van Zyl, "Unsupervised classification of scattering behavior using radar polarimetry data," IEEE Trans. Geosci. Remote Sens., vol. 27, no. 1, pp. 36–45, Jan. 1989.

    [101] M. C. Dobson, L. E. Pierce, and F. T. Ulaby, "Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites," IEEE Trans. Geosci. Remote Sens., vol. 34, no. 1, pp. 83–97, Jan. 1996.

    [102] X. Li and A. G. Yeh, "Multitemporal SAR images for monitoring cultivation systems using case-based reasoning," Remote Sens. Environ., vol. 90, pp. 524–534, 2004.

    [103] J. S. Lee, M. R. Grunes, and R. Kwok, "Classification of multi-look polarimetric SAR imagery based on complex Wishart distribution," Int. J. Remote Sens., vol. 15, pp. 2299–2311, 1994.

    [104] J. S. Lee, M. R. Grunes, and G. de Grandi, "Polarimetric SAR speckle filtering and its implication for classification," IEEE Trans. Geosci. Remote Sens., vol. 37, no. 5, pp. 2363–2373, Sep. 1999.

    [105] L. Ferro-Famil, E. Pottier, and J. S. Lee, "Unsupervised classification of multifrequency and fully polarimetric SAR images based on the H/A/Alpha-Wishart classifier," IEEE Trans. Geosci. Remote Sens., vol. 39, no. 11, pp. 2332–2342, Nov. 2001.

    [106] J.-S. Lee, M. R. Grunes, E. Pottier, and L. Ferro-Famil, "Unsupervised terrain classification preserving polarimetric scattering characteristics," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 4, pp. 722–731, Apr. 2004.

    [107] B. Aiazzi, L. Alparone, S. Baronti, and A. Garzelli, "Coherence estimation from multilook incoherent SAR imagery," IEEE Trans. Geosci. Remote Sens., vol. 41, no. 11, pp. 2531–2539, Nov. 2003.

    [108] J. Askne, M. Santoro, G. Smith, and J. E. S. Fransson, "Multitemporal repeat-pass SAR interferometry of boreal forests," IEEE Trans. Geosci. Remote Sens., vol. 41, no. 7, pp. 1540–1550, Jul. 2003.

    [109] S. R. Cloude and K. P. Papathanassiou, "Polarimetric SAR interferometry," IEEE Trans. Geosci. Remote Sens., vol. 36, no. 5, pp. 1551–1565, Sep. 1998.

    [110] M. M. Crawford, S. Kumar, M. R. Ricard, J. C. Gibeaut, and A. Neuenschwander, "Fusion of airborne polarimetric and interferometric SAR for classification of coastal environments," IEEE Trans. Geosci. Remote Sens., vol. 37, no. 3, pp. 1306–1315, May 1999.

    [111] Special issue on data fusion, IEEE Trans. Geosci. Remote Sens., pt. 1, vol. 37, no. 3, pp. 1187–1377, May 1999.

    [112] G. H. Halldorsson, J. A. Benediktsson, and J. R. Sveinsson, "Support vector machines in multisource classification," in Proc. IGARSS, vol. 3, Toulouse, France, Jul. 2003, pp. 2054–2056.

    [113] C. M. Bachmann, M. H. Bettenhausen, R. A. Fusina, T. F. Donato, A. L. Russ, J. W. Burke, G. M. Lamela, W. J. Rhea, B. R. Truitt, and J. H. Porter, "A credit assignment approach to fusing classifiers of multiseason hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 41, no. 11, pp. 2488–2499, Nov. 2003.

    [114] A. Srinivasan, "An artificial intelligence approach to the analysis of multiple information sources in remote sensing," Ph.D. thesis, Univ. New South Wales, School of Elect. Eng., Kensington, Australia, 1991.

    [115] A. Srinivasan and J. A. Richards, "Analysis of GIS spatial data using knowledge-based methods," Int. J. Geograph. Inf. Syst., vol. 7, no. 6, pp. 479–500, 1993.

    [116] J. Lloyd, Logic for Learning, ser. Cognitive Technologies Series. Berlin, Germany: Springer-Verlag, 2003.

    John A. Richards (S'68–M'72–SM'83–F'96) received the B.E. (Hons. 1) and Ph.D. degrees from the University of New South Wales, Kensington, Australia, in 1968 and 1972, respectively.

    He is currently Director of the Research School of Information Sciences and Engineering, The Australian National University, Canberra, Australia, and was Deputy Vice-Chancellor and Vice-President from 1998 to 2003. He previously worked for 11 years at the University College, University of New South Wales, Australian Defence Force Academy, where he served as Head of the School of Electrical Engineering from June 1987 to July 1996, becoming Deputy Rector and then Rector in July 1996. From 1981 to 1987, he was Foundation Director of the Centre for Remote Sensing at the University of New South Wales. He is the author of the textbook Remote Sensing Digital Image Analysis (Berlin, Germany: Springer-Verlag, 1986; revised 1999, with X. Jia).

    Dr. Richards is a Fellow of the Australian Academy of Technological Sciences and Engineering.
