Quantitative ethnobotany: applications of multivariate and...


PEOPLE AND PLANTS WORKING PAPER 6 - JUNE 1999

Quantitative Ethnobotany
Applications of multivariate and statistical analyses in ethnobotany

M. Höft, S.K. Barik and A.M. Lykke

This series of working papers is intended to provide information and to generate fruitful discussion on key issues in the sustainable and equitable use of plant resources. Please send comments on this paper and suggestions for future issues to People and Plants Initiative, Division of Ecological Sciences, UNESCO, 7 Place de Fontenoy,

[Cover graph: total bark dry weight of bottom 2 m of the plant [kg] (axis from 0.01 to 10.00) plotted against diameter at breast height [cm] (axis from 6 to 24).]


The designations employed and the presentation of material throughout this publication do not imply the expression of any opinion whatsoever on the part of UNESCO concerning the legal status of any country, territory, city, or area of its authorities, or concerning the delimitation of its frontiers or boundaries. The opinions expressed in this paper are entirely those of the authors and do not commit any Organization.

Authors’ addresses:

M. Höft
c/o UNESCO Office Nairobi
P.O. Box 30592
Nairobi
KENYA

S. K. Barik
Centre for Environmental Studies
North-Eastern Hill University
Shillong 793 014
INDIA

A. M. Lykke
Dept. of Systematic Botany
Nordlandsvej 68
8240 Risskov
DENMARK

Photos: all photos by R. Höft except Photo 1 by Y. Morimoto

Cover illustration: M. Höft; Callus formation and bark regeneration of Rytigynia kiwuensis (top); fruiting Rytigynia kigeziensis (bottom); graph showing relationship between bark weight and diameter of three species of Rytigynia (data from Kamatenesi 1997).

Published in 1999 by the United Nations Educational, Scientific and Cultural Organization
7, place de Fontenoy, 75352 Paris Cedex 07 SP, FRANCE
Printed by UNESCO on chlorine-free recycled paper

Edited by Robert Höft
Design: Ivette Fabbri
Layout: Martina Höft and Robert Höft

© UNESCO / M.Höft, S.K. Barik & A.M. Lykke 1999

SC-99/WS/

Recommended citation: Höft, M., Barik, S.K. & Lykke, A.M. 1999. Quantitative ethnobotany. Applications of multivariate and statistical analyses in ethnobotany. People and Plants working paper 6. UNESCO, Paris.


Quantitative ethnobotany
APPLICATIONS OF MULTIVARIATE AND STATISTICAL ANALYSES IN ETHNOBOTANY

PEOPLE AND PLANTS WORKING PAPER 6, JUNE 1999
M. HÖFT, S.K. BARIK & A.M. LYKKE

Abstract

Some wild plant resources are severely threatened by habitat loss and species-selective overexploitation. In addition, indigenous knowledge about the uses of wild plant resources is rapidly disappearing from traditional communities. In the context of conservation and sustainable and equitable use of wild plant resources, quantitative ethnobotany can contribute to the scientific base for management decisions.

In the past, most ethnobotanical studies have recorded vernacular names and uses of plant species with little emphasis on quantitative studies. In this working paper, a selection of multivariate and statistical methods particularly applicable to the analysis of ethnobotanical field data is presented. The working paper aims at assisting researchers and students to recognize the appropriate method to analyse their data and to develop management recommendations from scientifically sound conclusions.

The techniques presented include cluster and principal component analysis, regression analysis, analysis of variance, and log-linear modelling.

Multivariate and statistical analysis requires computerized statistics and graphics programs. Basic technical knowledge to use such tools as well as basic understanding of statistical terms are important requirements to get most benefit from this publication.

Photo 1. In most cases ethnobotanical data collection requires simple tools such as measuring tape or spring balance. This photo shows the Loita Ethnobotany Team quantifying amounts of ‘olorien’, Olea europaea L. ssp. africana (Mill.) P. Green (Oleaceae), used for fuel in Maasai households.


Contents

Abstract
Contents

Introduction
  Dimensions of data
  Sampling and organization of data
  Data standardization and transformation

Classification and ordination techniques
  Clustering and classification
  Ordination
  Examples of data matrices
  Matrix structure and analysis

Applications of cluster and principal component analysis
  Cluster analysis of ‘Wood identification’ task
  Principal component analysis of the ‘Paired comparison of wood species’ task

Comparisons of several means
  Hypothesis testing
  Prediction
  Linear correlation
  Cross-tabulation

Applications of general linear models
  Analysis of variance
  Regression analysis
  Correlation
  Chi-square analysis of contingency tables

References
Acknowledgements
Appendix


Introduction

In order to enhance the indicative value of ethnobotanical studies, there have been attempts in recent years to improve the traditional compilation-style approach through incorporating suitable quantitative methods of research in ethnobotanical data collection, processing and interpretation. Such quantitative approaches aim to describe the variables quantitatively and analyse the observed patterns in the study, besides testing hypotheses statistically. The concept of quantitative ethnobotany is relatively new and the term itself was coined only in 1987 by Prance and co-workers (Prance, 1991). Quantitative ethnobotany may be defined as "the application of quantitative techniques to the direct analysis of contemporary plant use data" (Phillips & Gentry 1993a and b). Quantification and associated hypothesis-testing help to generate quality information, which in turn contributes substantially to resource conservation and development. Further, the application of quantitative techniques to data analysis necessitates refinement of methodologies for data collection. Close attention to methodological issues not only improves the discipline of ethnobotany but also enhances the image of ethnobotany among other scientists (Phillips & Gentry 1993a and b).

Different approaches are taken to collect and analyse quantitative and qualitative ethnobotanical data. The approaches depend on the objectives of the researcher and the nature of study and aim at the objective evaluation of the reliability of the conclusions based on the data. Multivariate and statistical methods are typically applied to the interpretation of the following types of ethnobotanical data (the list is not exhaustive):
• relative importance of plant taxa and vegetation types to different ethnic, social or gender groups;
• knowledge and uses of plants by different ethnic, social or gender groups;
• preference information on different plant species;
• size class distribution of woody plant species;
• quantitative impact of human uses on growth and regeneration patterns;
• quantitative impact of environmental factors on certain plant traits;
• quantitative impact of agricultural or horticultural techniques on certain plant traits;
• quantitative plant morphological and pharmacological characteristics of useful plants.

The data processing techniques in ethnobotany may range from calculating a simple index to complex computational techniques of multivariate analysis such as classification and ordination. The selection of a particular technique for application to the data is based on the effectiveness of the technique for sound interpretation of the results and identification of the inter-relationships that may exist among the variables studied. In general, statistical applications may be classified into two broad categories:
1. Sets of data where the measurements are taken only on one attribute or response variable and the data so obtained are analysed through a set of techniques called univariate analysis techniques.
2. Sets of data where the measurements are taken simultaneously on more than one variable and the statistical techniques applied to such data sets are called multivariate analysis techniques.

Studies of multivariate nature are more common in ethnobotanical research, and are treated in more detail in this paper.

Dimensions of data

Because of the complexities involved in most ethnobotanical studies, it is common for ethnobotanical researchers to collect observations on many different variables. The need to understand the relationships between many variables makes multivariate analysis mathematically complex and the techniques to analyse such data invariably need a computer. Today a large number of computer packages are available for analysis of multivariate data sets. BMDP (BMDP Statistical Software Inc.), CANOCO (Ter Braak, 1988a and 1988b), NTSYS (Rohlf, 1985), PC-ORD (MjM Software Design), R-Package (Casgrain 1999), SAS (SAS Institute Inc.), SYSTAT (SPSS Inc.), SPSS (SPSS Inc.) and TWINSPAN (Hill, 1979) are some of the popular and powerful software packages widely used for a variety of multivariate and statistical data analyses. Besides their analytical features most of them include graphical functions.

Generally, multivariate and statistical methods aim at making large data sets mentally accessible, structures recognizable and patterns explicable, if not predictable. Johnson and Wichern give five basic applications for these methods (Johnson & Wichern 1988):


1. Data reduction or structural simplification: The phenomenon being studied is represented as simply as possible with reduced number of dimensions but without sacrificing valuable information. This makes interpretation easier.
2. Sorting and grouping: Groups of similar objects or variables are created.
3. Examining relationships among variables: Variables are investigated for mutual interdependency. If interdependencies are found the pattern of dependency is determined.
4. Prediction: Relationships between variables are determined for predicting the values of one or more variables on the basis of observations on the other variables.
5. Testing of hypothesis: Specific statistical hypotheses formulated in terms of the parameters of multivariate populations are tested. This may be done to validate or reject assumptions.

The different multivariate and statistical analysis techniques, which are available for the above applications are derived from one simple linear mathematical model, the Multivariate General Linear Hypothesis (MGLH). In this paper the following linear models are presented along with their applications:
1. classification and clustering;
2. ordination;
3. analysis of variance;
4. regression analysis;
5. correlation;
6. log-linear modelling.

These techniques will be demonstrated using examples from a ‘People and Plants’ workshop on species used for woodcarving in Kenya, a Ph.D. study on alkaloid patterns of Tabernaemontana pachysiphon, and two Ugandan M.Sc. studies, one on Rytigynia kiwuensis and one on medicinal plant collection habits of different specialist groups. Before getting to the practical applications, some general remarks regarding types of data, sampling size, sorting and grouping of data are presented.

First of all, the different types of quantitative and qualitative data must be distinguished (see Box 1). In the majority of cases ethnobotanical data are quantitative on an ordinal scale. Frequency and abundance are key parameters in vegetation analysis and population dynamics, ranking order reveals important information on preferences of user groups, and ordered multistate characters are data that fall into predefined hierarchical groups. Quantitative data on a ratio or interval scale may be collected to determine growth patterns of individual plant species, to assess the effectiveness of a certain remedy, or to express the impact of human uses.

Qualitative data like ‘presence/absence’ or ‘yes/no’ are often recorded during interviews when people’s knowledge of certain species or management techniques is assessed or the potential for the acceptance of substitutes for a particular resource is gauged.

Counts are obtained when numbers of people falling in a certain category, or numbers of events taking place in a pre-defined category or time span are recorded. To assist in determining relationships among and between variables, in classifying them, and in identifying appropriate analysis techniques, Box 2 lists some common data settings and research questions for which corresponding parametric and non-parametric methods exist. Not all of these techniques are discussed in this paper.

Parametric methods apply to approximately normally distributed data. In a simple linear model

Y = a + bX + e

Y is the dependent and X the independent variable. Variables are defined as quantities that can vary in the same equation. In contrast, the parameters a and b are quantities that are constant in a particular equation, but can be varied in order to produce other equations in the same general family. The parameter a is the value of Y when X = 0. This is sometimes called the Y-intercept (where the line intersects the Y-axis in a graph, at X = 0). The parameter b is the slope of the line, or the number of units Y changes when X changes by one unit; “e” is referred to as an “error” or residual, which is the departure of an actual Y from what the equation predicts. In a least-squares fit the sum of all e is zero.
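The roles of a, b and e in the linear model can be illustrated with a short calculation. The following is a minimal sketch in Python, assuming an ordinary least-squares fit; the function names and the measurements are hypothetical and were invented for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of a (intercept) and b (slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # units of Y per one unit of X
    a = mean_y - b * mean_x    # value of Y predicted at X = 0
    return a, b


def residuals(xs, ys, a, b):
    """Departure e of each observed Y from the value the equation predicts."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical measurements, not taken from the paper.
diameter_cm = [6.0, 9.0, 12.0, 15.0, 18.0, 24.0]
bark_kg = [0.4, 0.9, 1.1, 1.8, 2.1, 3.0]

a, b = fit_line(diameter_cm, bark_kg)
e = residuals(diameter_cm, bark_kg, a, b)
print(f"a = {a:.3f}, b = {b:.3f}, sum of residuals = {sum(e):.2e}")
```

Up to rounding error the residuals sum to zero, which is a property of the least-squares fit rather than of the data.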

Box 1. Qualitative and quantitative data.

Qualitative
  Binary or two-state: Yes/no or presence/absence (usually coded as 1/0 or TRUE/FALSE)
  Nominal or multi-state: Categories

Quantitative
  Discrete, ordinal scale: Frequency (number of observations for each value); Abundance (number of observations for each value per unit of space); Ranking order
  Continuous, ratio or interval scale: Units in time and space (e.g. for temperature, weight, height, circumference, etc.)

Box 2. Applications of multivariate and statistical analysis techniques based on linear models.

Research interest ⇒ Method of analysis

General relationships among variables ⇒ Correlation analysis/analysis of co-variance
Associations between variables ⇒ Detrended correspondence analysis
Quantitative relationships among variables and prediction ⇒ Regression analysis
Similarity/dissimilarity among variables or groups of variables ⇒ Cluster analysis
Variance among variables or subjects in counted observations ⇒ Principal component analysis
Testing of hypothesis regarding factorial effects on variables ⇒ Analysis of variance/Kruskal Wallis test
Relationships among categories in multi-way frequency tables and prediction of cell frequencies based on counts ⇒ Log-linear modelling
Exploring survival rates ⇒ Survival analysis

Sampling and organization of data

A well-thought-out research design, developed before going to the field, is likely to:
1. save a lot of time (and money) when analysing data;
2. enhance the expected output in terms of meaningful results;
3. make it easier to translate the results into scientifically sound recommendations;
4. leave you and others satisfied with the work.

The following reflections are crucial when planning ethnobotanical research in the field:
• How many samples need to be taken in different categories?
• How many categories can realistically be studied without cutting down the minimum number of samples to be taken in each category?
• Can equal sampling be assured for each category?
• Which are the categories that would be representative for the research question?
• Do seasonal, diurnal, or circumstantial fluctuations have to be accounted for?
• Is repeated sampling necessary (i.e. the same samples studied at different times)?
• Are the samples representative of the population?
• If processes are to be documented: can changes realistically be observed within the time-frame of the study (e.g. growth records of plants)?
• What indicators can be used to document processes?

Due to the high cost in terms of time and money, true random sampling is not practicable in many applied research situations. A stratified or systematic random sampling strategy is, therefore, usually applied to study plant use by people. Charles Peters (1996) provides a good discussion on the different approaches.

Photo 2. Mzee Ali Mwadzpea and Alex Jeremani constructing litter traps. Sixteen traps were randomly set up in a coastal forest in Kenya to study the nutrient dynamics of soil and vegetation.


True random sampling of a finite population would mean the assignment of a number to each case and then the random selection of a sample of numbers. In SYSTAT, random numbers between 1 and 73,500 can be generated with the following expression:

1 + INT (73500 * URN)

When people are interviewed and questions asked with respect to some particular knowledge, the sample should be representative and include people from different social backgrounds, age or gender. In order to allow interpretation of associations that may arise from data analysis, it is necessary to record as much information as possible from the interviewees.
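The numbering-and-drawing procedure for true random sampling can also be sketched outside SYSTAT. The Python example below is a minimal sketch, assuming that URN denotes a uniform random number between 0 and 1; the function names and the seed are hypothetical choices made for illustration.

```python
import random


def random_case_number(n_cases, rng):
    """Mirror 1 + INT(n_cases * URN), taking URN as uniform on [0, 1)."""
    return 1 + int(n_cases * rng.random())


def draw_sample(n_cases, sample_size, rng):
    """Select distinct case numbers from 1..n_cases (no case drawn twice)."""
    return sorted(rng.sample(range(1, n_cases + 1), sample_size))


rng = random.Random(1999)  # fixed seed (assumed) so the draw can be repeated
print(random_case_number(73500, rng))
print(draw_sample(73500, 10, rng))
```

Drawing without replacement, as in `draw_sample`, avoids selecting the same case twice, which repeated use of the single-number expression does not guarantee.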

If the aim for ethnobotanical applications is to predict one quantitative plant trait from another (usually easy to measure) quantitative plant trait, the sample size must be sufficiently large to include individual variation according to environmental factors at the study site (e.g. altitude, exposure, soil nutrients) and endogenous factors within the species itself (e.g. age, phenological status). While for statistical confidence and accuracy a sampling size of 10% of the total population is desirable, for practical reasons the actual sampling size may not even reach 1% of the total population. An absolute minimum of four individuals within each cell (category) is indispensable for any statistical analysis. Obviously, the predictive value of inference increases with increasing sample size.

The state of environmental factors may be described either quantitatively (e.g. concentration of nitrogen and phosphorus in the soil, average daily light sums, amount of water in the soil) or simply categorically (e.g. ‘fertile’ soil, ‘high’ light intensity, ‘dry’ site). These attributes are subjective depending on the perceptions of the researcher. For quantitative observations of the independent variable on a continuous or ratio scale, regression analysis would apply. However, if the independent variable is categorical, then analysis of variance is applied. In both cases, relationships among variables and significant interactions between the environmental factors may be expected and have to be accounted for in analysis. Sampling schemes in the field should be planned in such a manner as to allow separate accounting for environmental effects. In addition to sampling in the field, experimental designs are used to separately test effects of factors and proper planning is imperative in this respect. Linear relationships or ‘co-variances’ among variables must be tested before applying any of the standard procedures.
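For the categorical case described above, the core calculation of a one-way analysis of variance is short enough to write out. The Python sketch below is illustrative only: the function name and the measurements are hypothetical, it covers a single factor without interactions, and it stops at the F ratio without deriving a p-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical plant heights (m) at 'dry', 'moist' and 'wet' sites,
# four individuals per category; values invented for illustration.
heights = [
    [2.1, 2.4, 1.9, 2.2],
    [2.8, 3.1, 2.9, 3.3],
    [3.0, 3.6, 3.4, 3.2],
]
f_ratio, df_b, df_w = one_way_anova_f(heights)
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```

The F ratio would then be compared with the F distribution on the two degrees of freedom, which a statistics package reports directly.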

Ideally, the sample size in each category has to be equal and samples have to be taken consistently in time. Such a design is then called full factorial. Repeated sampling at different times might be done to account for seasonal and diurnal variations. Repeated sampling analysis is a special form of analysis of variance and usually computed with the ‘general linear model’ option.

The power of the test depends on the sample size. The larger the sample, the smaller the minimum detectable difference. There is no upper limit as to the number of samples, as long as one can handle them. In theory, there is also no limit to the number of factors that might be analysed simultaneously. However, the number of possible interactions becomes unwieldy and interpretation of interactions of more than three or four variables extremely difficult.
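Both statements in the preceding paragraph can be made concrete with a few lines of Python. This is a rough sketch under stated assumptions: the minimum detectable difference uses the normal approximation for comparing two group means with equal sample sizes in a two-sided test, and the default significance level (0.05) and power (0.80) are conventional choices, not values from this paper. The function names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def minimum_detectable_difference(sd, n_per_group, alpha=0.05, power=0.80):
    """Normal-approximation difference between two means detectable at given power."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)
    return (z_alpha + z_beta) * sd * sqrt(2 / n_per_group)


def interaction_count(n_factors):
    """Number of possible interaction terms (two-way and higher) among the factors."""
    return sum(comb(n_factors, order) for order in range(2, n_factors + 1))


for n in (4, 10, 40, 160):
    print(n, round(minimum_detectable_difference(1.0, n), 2))

for k in (2, 3, 4, 6):
    print(k, interaction_count(k))
```

Quadrupling the sample size halves the detectable difference, while the number of interaction terms grows as 2^k - k - 1 for k factors.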

Another group of multivariate analysis involves the application of log-linear models, also referred to as discrete analysis of variance. In this case data are counts and are arranged in two- or multi-dimensional contingency tables. Again, the number of observations should ideally be equal in each of the categories.
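For a two-way contingency table, the simplest log-linear model assumes that rows and columns are independent, and its fitted cell frequencies follow directly from the margins. The Python sketch below illustrates only this special case; the function names and the counts are hypothetical, and models with more dimensions or interaction terms need a statistics package.

```python
from math import log


def expected_counts(table):
    """Fitted cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def g_squared(table):
    """Likelihood-ratio statistic G^2 = 2 * sum(observed * ln(observed / expected))."""
    expected = expected_counts(table)
    return 2 * sum(
        obs * log(obs / exp)
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
        if obs > 0  # empty cells contribute nothing to the sum
    )


# Hypothetical counts of interviewees: rows are two user groups,
# columns are 'knows the species' / 'does not know the species'.
counts = [[18, 7], [9, 16]]
print(expected_counts(counts))
print(round(g_squared(counts), 3))
```

A G² value near zero indicates that the independence model reproduces the observed counts; larger values point to an association between the row and column categories.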

When preparing the data set it is crucial to clear ambiguous signs (e.g. numbers with question marks) or in-between categories for the final record. Usually a lot of time is wasted in cleaning data sets of such ambiguous entries. Often the whole entry is lost when the meaning of symbols one used to mark a certain entry at a certain time cannot be recalled. It is better to invest more time in the field measuring or inquiring to obtain clear data entries from the beginning.

In vegetation research extensive relevés are often produced and the crucial problem in the beginning is to decide on the right sampling method. Species distribution can be recorded using transects, whereas species abundance is recorded in plots. In stratified plot sampling, plots are arranged along imaginary lines following environmental gradients. Stratified plot sampling combines two approaches to vegetation analysis and is mainly used for investigating population dynamics. In ecosystems where woody vegetation is sparsely distributed, plotless sampling is most appropriate when wishing to derive estimates of woody species density. The simplest plotless sampling method is the nearest individual method. Random sampling points are determined in the area and the distance to the nearest individual(s) of each tree species is recorded. Successive distance measurements are taken and the procedure is repeated for a number of random points. The density of each species is then derived from the following formula:


$D_{Sp} = \sqrt{\text{mean area}} \,/\, 2$

where: $\text{mean area} = (\text{mean distance to nearest individual of a species})^2$

In forestry, another plotless sampling method, the point-centred quarter method, is used to assess the economic value of tree stands. Here, the point centre is marked by an individual tree, and four equally sized plots are delineated around the centre.

With respect to plot size or transect length the following guidelines exist. Inside forest and when dealing with large trees, subplot size should be 20 x 10 m or 20 x 20 m. For the analysis of regeneration patterns, subplot sizes of 10 x 10 m are sufficient to cover total areas between e.g. 0.1 and 1 ha. In grassland and when analysing herbaceous vegetation (including tree seedlings), subplot sizes of 1 x 1 or 5 x 5 m are usually chosen. Transects may have lengths between 100 and 1000 m and are usually between 1 and 5 m wide along each side.

Data are entered into the computer with the aid of a spreadsheet (rows and columns) that can later be imported into any statistical package, or simply as an ASCII (American Standard Code for Information Interchange) file, where entries are separated by blanks or tabs. Some statistical programs put limits on the maximum file width (i.e. number of columns) that can be imported without specification of the file width.

Data are entered either as numerical values (SI units) or character values. Character variables are also referred to as ‘string’ variables and in most programs are marked with the ‘$’ sign after the variable name (e.g. name$), while numeric variables have no special sign added (e.g. length). In many statistical packages it is not possible to interchange character and numeric variables by simple editing of the spreadsheet. Instead, a new variable has to be defined, based on the value of the variable that is to be altered. Variable names should be as simple and straightforward as possible.
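The file layout described above (entries separated by blanks or tabs, string variables marked with ‘$’) can be read with a few lines of Python. This is a minimal sketch: the function name and the file content are hypothetical, and it assumes that the first row holds the variable names and that no value contains a blank.

```python
import io


def read_ascii_table(handle):
    """Read a blank- or tab-separated table whose first row holds variable names."""
    names = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()  # splits on any run of blanks or tabs
        if not fields:
            continue  # skip empty lines
        record = {}
        for name, field in zip(names, fields):
            # Names ending in '$' are kept as strings; all others become numbers.
            record[name] = field if name.endswith("$") else float(field)
        rows.append(record)
    return rows


# Hypothetical file content, invented for illustration.
sample = io.StringIO(
    "species$\tdbh\theight\n"
    "kiwuensis\t6.5\t3.2\n"
    "kigeziensis 12.0 5.1\n"
)
for record in read_ascii_table(sample):
    print(record)
```

Because the type of each column is fixed by its name, a value such as "12?" in a numeric column raises an error on import instead of being silently stored as text.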

Photo 3. Moses Kipelian of the Loita Ethnobotany Team measuring the DBH of ‘oltarakwai’, Juniperus procera Endl. (Cupressaceae) trees, which are highly valued for the construction of stockades and fences. The resulting size class distribution curve showed a lack of regeneration which has raised serious concern and has led to the establishment of a tree nursery.

Photo 4. To study the abundance of three species of much sought after medicinal plants, ‘nyakibazi’ (Rytigynia kigeziensis Verdc., R. kiwuensis (K. Krause) Robyns, and R. bagshawei (S. Moore) Robyns, Rubiaceae), in Bwindi Impenetrable National Park, Uganda, Maud Kamatenesi set up more than 300 plots of 20 x 20 m, counted the individuals, measured DBH and height and determined the amount of bark used.


Data standardization and transformation

Classical parametric methods of inference make the assumption that the underlying population from which the sample data are drawn shows a normal distribution. Normal probability plots (Figure 1) help to visualize the distribution of one or more variables. A sample from a normal distribution results in an approximately S-shaped curve. A few of these methods, e.g. the t-test, are robust in the sense that they are not sensitive to modest departure from normality. However, the accuracy of most tests is seriously affected at large deviations from normality. In that case, data are transformed so as to approximate a normal distribution (Berenson et al. 1983). In order to meet the conditions of normality, standardization of the basic data matrix is an essential step in most techniques. Besides, standardization in certain multivariate tests (e.g. principal component analysis, factor analysis) is done in order to remove the measurement units from the basic data. Standardization or transformation is achieved by treating the data with one of the transformation functions given in Box 3, where x’ij is the transformed value, while xij and y are the original data.

Binary/two-state character data are not standardized. For combinations of two- and multi-state characters ordering should be used. For combinations of qualitative and quantitative data, one of the following options should be followed:
a. ignore the problem;
b. divide the data matrix;
c. convert the quantitative data to qualitative.

Figure 1. A quantile plot showing the standardized values of a variable Y (Fraction of Data) as a function of a variable X.
[Axes: Variable X, 0 to 50; Fraction of Data, 0.0 to 1.0.]

Box 3. Data transformations.

Logarithmic transformation:
$x'_{ij} = \log_{10}(x_{ij})$ or $x'_{ij} = \log_{10}(x_{ij} + 1)$

Square root transformation:
$x'_{ij} = \sqrt{x_{ij}}$ or $x'_{ij} = \sqrt{x_{ij} + 0.05}$

Divide by standard deviation:
$x'_{ij} = x_{ij} / \delta_1$

Standardization:
$X'_{ij} = (X_{ij} - \bar{X}) / \delta$

Proportional function:
$x'_{ij} = x_{ij} / \sum_{i=1}^{n} x_{ij}$, $0.0 \le x \le 1.0$

Divide by the range value:
$x'_{ij} = x_{ij} / (x_{max} - x_{min})$, $0.0 \le x \le 1.0$

Ordering:
$x'_{ij} = x'_{ij} / (x_{max} - x_{min})$, $0.0 \le x \le 1.0$

Linear transformation:
$Y' = (Y - a)/b + c$

SUBTRACTION OPTIONS:
$y - y_{min}$
$y - \bar{y}_i$

DIVIDE OPTIONS:
$y / y_{max}$
$y / (y_{max} - y_{min})$
$y / \delta$
$y / \sqrt{\bar{y}}$
$y / \sqrt{\sum y}$
$y / \sqrt{\sum y^2}$
$y / \sum y$
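Several of the transformations in Box 3 can be written as one-line functions. The Python sketch below covers the logarithmic, square root, standard deviation, standardization and proportional functions for a single variable; the function names and the counts are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since Box 3 does not specify which form is meant.

```python
from math import log10, sqrt
from statistics import mean, stdev


def log_transform(values, offset=0.0):
    """x' = log10(x + offset); offset=1 allows zero counts."""
    return [log10(v + offset) for v in values]


def sqrt_transform(values, offset=0.0):
    """x' = sqrt(x + offset)."""
    return [sqrt(v + offset) for v in values]


def divide_by_sd(values):
    """x' = x / standard deviation (sample form assumed)."""
    sd = stdev(values)
    return [v / sd for v in values]


def standardize(values):
    """x' = (x - mean) / standard deviation; removes the measurement units."""
    m, sd = mean(values), stdev(values)
    return [(v - m) / sd for v in values]


def proportion(values):
    """x' = x / sum of x; results lie between 0 and 1 for non-negative data."""
    total = sum(values)
    return [v / total for v in values]


# Hypothetical counts per plot, invented for illustration.
counts = [0, 3, 9, 27, 99]
print(log_transform(counts, offset=1))
print([round(v, 3) for v in standardize(counts)])
print([round(v, 3) for v in proportion(counts)])
```

After standardization the values have mean 0 and standard deviation 1, so variables measured in different units can enter the same analysis on an equal footing.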


Classification and ordination techniques

In general, multivariate techniques are used to categorize or group the objects or experimental units. The aim of classification or ordination could be:
1. to get an overview of the variance;
2. to compare groups or trends among themselves or with additional data;
3. to produce hypotheses to prepare further studies.

Clustering and classification

Classes have boundaries and hence an inner structure and relationships with external objects or other classes. Thus, algorithms have to address the problem of what to include in a particular class and what to exclude. Important criteria for judging, recognizing and testing of classifications and classes are:
• the centres (averages for elements);
• the density of classes;
• the variance of classes;
• the number of members;
• the “distinctness” of delimitation.

In different methods, different criteria are optimized. The significance of the respective criteria must be seen in relation to the objective of the study. The choice of methods depends on the objectives. Figure 2 gives an overview of the division of classification methods.

There are situations where the categorization is done in terms of groups that are themselves determined from the data. Such exploratory techniques for grouping objects (variables or items) are called ‘clustering’. In classification methods other than cluster analysis, the number of groups is known beforehand and the objective is to assign new observations (items) to one of these groups. In cluster analysis, in contrast, no assumptions are made concerning the number of groups. Grouping is done on the basis of similarities or distances. The inputs required are similarity measures or data from which similarities can be computed.
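Grouping on the basis of distances starts from a table of pairwise distances between the objects. The Python sketch below computes such a table; Euclidean distance is an assumption made for illustration (the text does not prescribe a particular measure), and the function name, object names and scores are hypothetical.

```python
from math import dist


def distance_matrix(objects):
    """Pairwise Euclidean distances between named objects (dict of name -> values)."""
    names = list(objects)
    return {(a, b): dist(objects[a], objects[b]) for a in names for b in names}


# Hypothetical scores of three wood species on two measured variables.
species = {
    "A": (0.0, 0.0),
    "B": (3.0, 4.0),
    "C": (3.0, 5.0),
}
d = distance_matrix(species)
print(d[("A", "B")], d[("B", "C")])

# The most similar pair is the one with the smallest non-zero distance.
closest = min((pair for pair in d if pair[0] < pair[1]), key=d.get)
print(closest)
```

Variables measured in different units are usually standardized before distances are computed, so that no single variable dominates the result.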

CLUSTER ANALYSIS

Cluster analysis attempts to subdivide or partition a set of heterogeneous objects into relatively homogeneous groups. The objective of cluster analysis is to develop subgroupings such that objects within a particular subgroup are more alike than those in a different subgroup. Thus, the outcome of cluster analysis is a classification scheme that provides the sequence of groupings by which a set of objects is subdivided. Box 4 lists some examples of data which are suitable for cluster analysis.

Figure 2. Classification of classification methods (after Fischer & Bemmerlein 1986). [Tree diagram not reproduced. Pattern recognition is divided into discriminant analysis and cluster analysis; cluster analysis into hierarchical and non-hierarchical methods; hierarchical methods into divisive and agglomerative methods, which are further characterized as monothetic or polythetic. Methods named in the diagram: association analysis, group analysis, divisive information analysis, nodal analysis, indicator species, serial clustering, relocated group clustering, grid analysis, single linkage, complete linkage, centroid sorting, average linkage and minimum variance. Programs named: PHYTO and TWINSPAN.]

The processes of sequencing are hierarchical or non-hierarchical clustering. In non-hierarchical clustering, objects are divided into groups without relationships being established between them, i.e. no dendrogram can be produced. Non-hierarchical clustering is particularly suitable for large data sets, since no complete similarity matrix needs to be calculated. All non-hierarchical clusters are calculated in the following way:
1. choice of number and position of initial cluster centres;
2. allocation of all objects to one respective cluster centre;
3. new calculation of cluster centres;
4. re-iteration of steps 2) and 3) until no further changes occur in the structure of clusters;
5. possibly, merging of clusters.
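The iteration in steps 1 to 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data points and the two starting centres are hypothetical, Euclidean distance is assumed for the allocation step, and the optional merging of step 5 is left out.

```python
import math

def nonhierarchical_clusters(points, centres, max_iter=100):
    """Alternate allocation and centre re-calculation until assignments stabilize."""
    centres = [tuple(c) for c in centres]   # step 1: initial centres supplied by the caller
    labels = None
    for _ in range(max_iter):
        # step 2: allocate every object to its nearest cluster centre
        new_labels = [
            min(range(len(centres)), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        if new_labels == labels:            # step 4: stop when nothing changes any more
            break
        labels = new_labels
        # step 3: recompute each centre as the mean of its current members
        for c in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                     # an empty cluster keeps its old centre
                centres[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centres

# Hypothetical objects described by two variables
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centres = nonhierarchical_clusters(points, centres=[(0.0, 0.0), (10.0, 10.0)])
print(labels, centres)
```

Note that only distances between objects and centres are computed, which is why no complete object-by-object similarity matrix is needed.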

The more widely used approach is hierarchical clustering. In this approach, once two objects are linked together at a particular stage, they cannot be separated into different clusters later on. Therefore, clustering decisions at a particular step are conditioned by the arrangement of objects at the previous step. In this approach the number of possible clustering choices decreases at each step. In hierarchical clustering, groups at any lower level of a cluster are exclusive subgroups of those groups at higher levels. In contrast to non-hierarchical clustering, statements on the relationships of classes (but not on the relationships of members in the respective classes) can be made in the hierarchical approach. The results can be depicted in the form of a dendrogram. All methods discussed in the following paragraphs are hierarchical.

Hierarchical clustering may be either divisive or agglomerative. In a divisive cluster analysis, the entire collection of objects is divided and re-divided, based on object similarities, to arrive at the final groupings (i.e. picture an inverted tree). In an agglomerative classification, as its name implies, individual objects are combined and re-combined successively to form larger groups of objects (i.e. the tree).

Divisive arrangements may be either monothetic or polythetic, whereas agglomerative methods are always polythetic. The following groups therefore exist:
• monothetic divisive,
• polythetic divisive,
• polythetic agglomerative.

In a monothetic clustering, the similarity of any two objects or groups is based on the value of a single variable, for example, preference ranking based on a single factor. In a polythetic classification, the similarity of any two objects or groups is based on their overall similarity as measured by numerous variables, for example, preference rankings based on several factors that are finally combined into an index.

Box 4. Examples of data suitable for cluster analysis.

• Similarity/dissimilarity of people’s responses to well defined questions.

• Similarity/dissimilarity of plant utilization patterns among different ethnic, social or gender groups.

• Similarity/dissimilarity of species based on people’s indication of use values.

• Similarity/dissimilarity of phenotypic characteristics (e.g. seeds) in different varieties of food plants.

• Similarity/dissimilarity of the pattern of secondary compounds (e.g. essential oils) in different varieties of medicinal or aromatic plants.

Photo 5. Woman selling herbal medicine at a market in Menglun, Yunnan Province, China. In most cases the older members of a community have a deeper knowledge of the environment and the properties and uses of plant species.


Agglomerative clustering procedures begin by considering each object as its own distinct cluster. Then two objects are placed together in a single cluster according to certain optimization criteria, while each of the remaining objects is still grouped separately. In the next step, objects are grouped into either one cluster of three or two clusters of two (with each remaining object grouped separately). This clustering procedure continues sequentially until all objects are merged into one cluster.

Another criterion for defining cluster analyses is related to the measure of distance utilized in linking the objects for cluster formation. Alternative approaches are followed, including complete linkage, single linkage and average linkage. In complete linkage, the merger of two subsets of objects is based on the maximum distance between objects. This approach is also called the farthest neighbour or diameter method and produces compact clusters of approximately equal size (unsuitable for ethnobotanical research questions). In single linkage, the merger is based on the minimum distance between objects. This approach is alternatively known as the nearest neighbour method and often produces a single large chain-like cluster and several small clusters during its sequencing process. The average linkage approach bases the merger of two subsets of objects on the average distance between objects and is considered to be intermediate between the first two approaches.
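The difference between the three linkage rules can be made concrete with a short Python sketch. The two clusters and their coordinates are hypothetical, and Euclidean distance between objects is assumed purely for illustration.

```python
import math
from itertools import product

def linkage_distance(cluster_a, cluster_b, method):
    """Distance between two clusters under a given linkage rule."""
    pair_distances = [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]
    if method == "single":      # nearest neighbour: minimum pairwise distance
        return min(pair_distances)
    if method == "complete":    # farthest neighbour: maximum pairwise distance
        return max(pair_distances)
    if method == "average":     # mean of all pairwise distances
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unknown linkage method: {method}")

# Two hypothetical clusters of two objects each
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]

single = linkage_distance(cluster_a, cluster_b, "single")
complete = linkage_distance(cluster_a, cluster_b, "complete")
average = linkage_distance(cluster_a, cluster_b, "average")
print(single, average, complete)
```

For any pair of clusters the average linkage value lies between the single and complete linkage values, which is the sense in which it is intermediate.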

The general approach to cluster analysis is to compute a normal mode resemblance matrix between the objects (also referred to as sampling units or operational taxonomic units (OTUs)) using appropriate resemblance functions. The similarities/distances between all pairwise combinations of sampling units (SUs) in a collection are summarized into a SU x SU similarity/distance matrix and the various cluster analysis strategies operate on this matrix.

The cluster analysis models described here are agglomerative: they begin with a collection of N individual SUs and progressively build groups or clusters of similar SUs. During each clustering cycle, only one pair of entities may be joined to form a new cluster. This pair may be: 1) an individual SU with another individual SU, 2) an individual SU with an existing cluster of SUs, 3) a cluster with a cluster. Hence, the term pair-group cluster analysis is applied.

The first step in all pair-group cluster analysis strategies involves searching the similarity/distance matrix for the smallest distance value between two individual SUs. These two individual SUs may be represented by the symbols j and k, respectively. Hence, the first cluster is formed at a distance D(j,k) and this can be diagrammed using a dendrogram. The initial collection of N SUs is now reduced to one cluster C1 (= SUs j and k joined) and N − 2 individual SUs. Special equations have been developed to compute the distance between this cluster and each of these N − 2 remaining SUs. A general linear combinatorial equation developed by Lance & Williams (1967) is given below:

$$D_{(j,k)(h)} = \alpha_1 D(j,h) + \alpha_2 D(k,h) + \beta D(j,k)$$

where the distance between the new cluster (j,k), formed from the jth and kth SUs, and a third (hth) SU or group of SUs is calculated from the known distances D(j,k), D(j,h) and D(k,h) and the parameters α1, α2 and β. For example, the distance between SU 3 and the cluster represented by SUs 1 and 4 is given by:

$$D_{(1,4)(3)} = \alpha_1 D(1,3) + \alpha_2 D(4,3) + \beta D(1,4)$$

The different clustering strategies differ only in their values for α1, α2 and β, which are the weights for determining the new distances.

Depending on the weighting scheme used, the resultant cluster formation varies. The group mean clustering strategy (the unweighted pair-group method with arithmetic averages, UPGMA) is most commonly used. It effectively computes the mean of all distances between the SUs of one group and the SUs of another and, hence, is unweighted (see Legendre & Legendre 1998 for weighting strategies).
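As a small worked illustration of the linear combinatorial equation, the Python sketch below updates one distance after a merger using group mean (UPGMA) weights. The parameter values used for UPGMA (α1 and α2 proportional to the sizes of the two merged groups, β = 0) are the commonly cited ones and are an assumption here, since the text does not list them; the three input distances are hypothetical.

```python
def lance_williams(d_jh, d_kh, d_jk, alpha1, alpha2, beta):
    """Distance from the merged cluster (j,k) to a third SU or cluster h."""
    return alpha1 * d_jh + alpha2 * d_kh + beta * d_jk

def upgma_parameters(n_j, n_k):
    """Group mean weights: proportional to the sizes of j and k, with beta = 0."""
    total = n_j + n_k
    return n_j / total, n_k / total, 0.0

# Hypothetical distances: SUs 1 and 4 have just been merged, SU 3 is the third SU
d_13, d_43, d_14 = 6.0, 4.0, 2.0
alpha1, alpha2, beta = upgma_parameters(n_j=1, n_k=1)
d_14_3 = lance_williams(d_13, d_43, d_14, alpha1, alpha2, beta)
print(d_14_3)
```

With two single SUs merged, the result is simply the mean of D(1,3) and D(4,3); as clusters grow, the size-proportional weights keep the update equal to the mean over all pairs of SUs.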

Photo 6. Pramoth Kheowvongsri interviewing a Palong healer in No Lai, northern Thailand, on medicinal plant use and trade. Responses from structured interviews can be analysed using cluster analysis.


Ordination

Ordination involves reduction of dimensionality. The basic objective of reducing dimensionality in analysing multi-response data is to obtain simplicity for better understanding, visualization and interpretation. While reducing the dimensions, the techniques ensure the retention of sufficient details for adequate representation. Some of the important goals of reducing the dimensionality of multiple response data are as follows (Gnanadesikan 1977):
1. to screen out redundant variables or to find more insightful ones as a preliminary step to further analysis;
2. to stabilize scales of measurement, when a similar property is described by each of several variables. Here the aim is to compound the various measurements into fewer numbers;
3. to help in assessing the significance for testing a null hypothesis by compounding the multiple information. For example, small departures from null conditions may be evidenced on each of several jointly observed responses. It is advisable to integrate these non-centralities into a smaller dimensional space wherein their existence might be more sensitively indicated;
4. to obtain the preliminary specification of a space, which may be used later on in classification and discrimination procedures;
5. to detect the possible functional dependencies among observations in high-dimensional space.

In ordination two distinctly different approaches exist: direct and indirect gradient analysis (Figure 3).

Figure 3. Classification of ordination methods. [Diagram not reproduced. Reduction of dimensionality (ordination) is divided into direct gradient analysis and indirect gradient analysis. Methods named in the diagram: principal component analysis, detrended correspondence analysis, factor analysis, multidimensional scaling, correspondence analysis, canonical correspondence analysis and Bray-Curtis ordination.]

Historically, these methods have been employed to investigate the relative importance of underlying ecological factors in vegetation analysis. In direct gradient analysis, vegetation relevés are arranged in an ecological space along axes of moisture, nutrients, altitude, etc. and the influence of the respective factors on the vegetation is determined. Indirect gradient analysis, in contrast to direct gradient analysis, focuses on the floristic composition. Five methods are distinguished:
• Bray-Curtis ordination,
• correspondence analysis,
• multidimensional scaling,
• principal component analysis, and
• factor analysis.

PRINCIPAL COMPONENT AND FACTOR ANALYSIS

The two most widely used classical linear reduction methods are principal component analysis (PCA) and factor analysis. In PCA, a d-dimensional observation (usually with correlated variables) is replaced by k linear combinations of the variables that are mutually uncorrelated, where k is much smaller than d. Biplots are used to graphically describe both relationships among the d-dimensional observations x1, x2, x3, ..., xn and relationships among the variables in two dimensions. Underlying assumptions for a data set to be analysed using PCA are:
1) data are normally distributed,
2) linear relationships exist between variables.

Only linear relationships are elaborated through PCA. The method looks at the objects (respondents) as an assembly of dots in a space whose axes represent the (plant) species in question. The aim of the method is to project the multi-dimensional dot cloud onto a two-dimensional space, such that minimum information on the distances between dots is lost. The first axis is laid through the centre of the dot cloud in the direction of largest variance. The second and following axes are perpendicular to the preceding axes, each pointing in the direction of largest remaining variance. PCA is a transformation in which the origin of the co-ordinate system is moved to the centre of the dot cloud and the axes are arranged according to variance. The problem of moving axes is mathematically solved through analysis of the eigenvectors (“Eigen” is German, meaning “self”) of the covariance or correlation matrix. Detrended correspondence analysis and reciprocal averaging are forms of PCA which were specifically developed for plant sociological analyses and are not further discussed here.
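The geometric description above (centre the dot cloud, then find the direction of largest variance as an eigenvector of the covariance matrix) can be sketched in plain Python. This is a minimal illustration, not the procedure of any particular package: the data are hypothetical, and power iteration is used here only as one simple way to obtain the dominant eigenvector.

```python
import math

def covariance_matrix(rows):
    """Centre the data on the column means and return (covariance, centred data)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[v] for r in rows) / n for v in range(d)]
    centred = [[r[v] - means[v] for v in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    return cov, centred

def first_principal_axis(cov, iterations=500):
    """Dominant eigenvector and eigenvalue of cov, found by power iteration."""
    d = len(cov)
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(vec[a] * sum(cov[a][b] * vec[b] for b in range(d))
                     for a in range(d))
    return vec, eigenvalue

# Hypothetical values given by five respondents (rows) to two species (columns)
data = [[2.0, 1.9], [4.0, 4.1], [6.0, 6.2], [8.0, 7.8], [10.0, 10.0]]
cov, centred = covariance_matrix(data)
axis, variance = first_principal_axis(cov)

# Co-ordinates of the respondents along the first principal axis
scores = [sum(c[v] * axis[v] for v in range(len(axis))) for c in centred]
total_variance = sum(cov[v][v] for v in range(len(cov)))
print(axis, variance / total_variance)
```

Because the two hypothetical variables are almost perfectly correlated, nearly all of the total variance falls on the first axis, which is the situation in which reducing the dimensions loses little information.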

Factor analysis, a method often confused with PCA, attempts to extract a lower dimensional linear structure from the data that explains the correlations between the variables. However, when one subset of variables is compared with the subset of the remaining variables in the set, the method of canonical correlation (not discussed here) is used to find suitable linear combinations within each subset. If any grouping of the observations in a lower dimension is required to be highlighted, then canonical discriminant analysis (discriminant coordinates) can be performed. Linear combinations are then chosen to highlight group separation. In Box 5 some examples for application of principal component analysis are given.

Examples of data matrices

The statistical analyses of the examples provided in this working paper are all based on matrices and matrix algebra. The following examples are drawn from an exercise where sixteen Kenyan woodcarvers were interviewed. During a workshop, three sets of data were collected:
• free listing of wood suitable for carving;
• wood identification task (yes/no; binary or two state character);
• paired comparison of wood species (ordered multistate character).

In the following paragraphs further details are provided on these data sets.

Box 5. Examples of data suitable for principal component analysis.

• People asked to rank or categorize plant use values. PCA can be carried out on the People x Species matrix (with the rank in the cell). The resulting ordination diagram (with people in plants space) will reveal if there are certain groups of people that tend to value the same species in the same way, i.e. gender, ethnic or age groups. The species vectors in the diagram will indicate which species are characteristic for which groups.

• Spot people who respond differently from the majority. If a person just gave random answers or purposely replied incorrectly, this person will be seen as an outlier on the ordination diagram, on the condition that there is a pattern in the answers in general.

• People indicating if certain species are useful (or not) for a number of purposes. A Species x Use matrix can be formed (with the number of species indicated in a certain use category). Ordination on these data will group species according to the use values assigned by people, and the vectors will indicate which uses characterize a group of species.

• Characterizing changes in e.g. floristic composition along environmental gradients. The axes would provide information on the most influential factor.

Photo 7. A vendor at a market in Menglun, Yunnan Province, China, selling spices. Market surveys can provide insight into the extent of trade and harvesting pressure on plant resources collected from the wild. Market prices tend to be good indicators of scarcity.


I. DATA SET ON THE ‘FREE LISTING OF WOOD SUITABLE FOR CARVING’

For free listing of wood species that are suitable for carving, 16 interviewees were selected representing women, old carvers, medium-aged carvers and young apprentices. The question, ‘Which trees can be used for carving?’ was asked of each interviewee, and the fourteen most preferred species, along with the frequency and position of their mention by the interviewees, were recorded (Table 1).

II. BASIC DATA MATRIX FOR THE ‘WOOD IDENTIFICATION TASK’

The data on the ‘Wood identification task’ were collected on eight species based on the responses of sixteen respondents involved in a woodcarving project in Kenya. Each artisan was asked separately, for each of the eight species, whether he or she could identify the species. In the event of a positive reply (‘Yes’), the value 1 was allotted; if the reply was ‘No’, the value 0 was assigned. In this way, the matrix for sixteen respondents and eight species was completed. The species were arranged across the rows, while the respondents were arranged across the columns (see Table 4, Appendix, p. 36).

III. BASIC DATA MATRIX FOR A ‘PAIRED COMPARISON OF WOOD SPECIES’

In order to assess species preference among the artisans, a ‘Paired comparison of wood species’ was undertaken. For this purpose, five tree species used for woodcarving were selected and the respondents (the sixteen artisans) were asked to state their preference within each pair combination of the five species. The preferences of each respondent for the ten possible species pair combinations (n(n−1)/2 with n = 5) were tabulated as shown in Table 2. The score is defined by the total number of mentions in the table and the highest rank is assigned to the species with the highest score. Pairwise rank matrices were then prepared for each respondent (R1...R16). Finally, the ranks for the five species so obtained from the responses were tabulated in matrix form. The rows of the matrix represented the species and the columns the respondents.

Table 1. The fourteen most preferred species in the Kenyan woodcarving industry with frequency and position of their mention by sixteen interviewees.

Species                        Frequency of mention (X)    Position of mention, average rank (Y)
Brachylaena huillensis         16                          1.4
Dalbergia melanoxylon          16                          1.8
Combretum schumannii           16                          4.9
Zanthoxylum chalybeum          10                          6.2
Azadirachta indica             12                          6.3
Sterculia africana             13                          8.2
Olea europaea ssp. africana    6                           9.3
Erythrina sacleuxii            12                          9.7
Commiphora baluensis           11                          9.7
Mangifera indica               10                          10.3
Albizia anthelmintica          8                           10.5
Terminalia brownii             9                           10.7
Platycelyphium voense          6                           12.0
Oldfieldia somalensis          10                          12.5

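Table 2. A pairwise ranking matrix for five tree species used in woodcarving. *

        S1    S2    S3    S4    S5    Score    Rank
S1      -     S2    S3    S4    S5    0        1
S2      -     -     S2    S2    S2    4        5
S3      -     -     -     S3    S5    2        3
S4      -     -     -     -     S5    1        2
S5      -     -     -     -     -     3        4

* The table is based on the preferences expressed by one respondent (R1). Each filled cell names the species preferred in the comparison of the row species with the column species; a dash marks a cell without a comparison.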
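The scoring and ranking behind Table 2 can be retraced with a short Python sketch. The preferences are those read from Table 2 for respondent R1; the variable names are illustrative, and ties between scores (which do not occur in this table) are not handled.

```python
from itertools import combinations

species = ["S1", "S2", "S3", "S4", "S5"]

# Preferred species for each pair, as read from Table 2 (respondent R1)
preferred = {
    ("S1", "S2"): "S2", ("S1", "S3"): "S3", ("S1", "S4"): "S4", ("S1", "S5"): "S5",
    ("S2", "S3"): "S2", ("S2", "S4"): "S2", ("S2", "S5"): "S2",
    ("S3", "S4"): "S3", ("S3", "S5"): "S5",
    ("S4", "S5"): "S5",
}

# n(n-1)/2 = 10 pair combinations for n = 5 species
pairs = list(combinations(species, 2))

# Score = number of times a species is mentioned as the preferred one
scores = {s: 0 for s in species}
for pair in pairs:
    scores[preferred[pair]] += 1

# The highest rank goes to the species with the highest score
ordered = sorted(species, key=lambda s: scores[s])
ranks = {s: position for position, s in enumerate(ordered, start=1)}
print(scores, ranks)
```

Repeating this for each of the sixteen respondents gives one column of ranks per respondent, i.e. the species x respondents matrix described above.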

Photo 8. ‘Muhuhu’ (Brachylaena huillensis O. Hoffm., Asteraceae) logs piled up outside a carving workshop in Wamunyu, Eastern Province, Kenya. Each year 40,000 indigenous trees are felled in Kenya for woodcarving.


Matrix structure and analysis

The term descriptor is used for the attributes that describe or compare the objects of the study. The objects may be the respondents, samples, locations, quadrats, observations or any other sampling units (e.g. operational taxonomic units, OTUs, in numerical taxonomy). In our example the respondents (R1 - R16) were the objects and the species were the descriptors, i.e. responses of the artisans (measures of ability to identify wood species used for woodcarving). Yes/No or positive/negative reply values were recorded in the ‘Wood identification task’, while rank values of the species were recorded in the ‘Paired comparison of wood species’ data.

NORMAL VS. INVERSE ANALYSIS

Data matrices can be viewed either down columns or across rows, i.e. one can look at relations between objects or between descriptors. For instance, one may wish to explore the relationship between respondents/objects to see whether certain groups of people gave similar responses and therefore may have similar attitudes towards carving wood. Or, one may wish to explore relations between descriptors/rows to highlight for which species people tend to give similar responses. Maximum information can often be obtained by carrying out both modes of analysis. The two modes of analysis require different measures of association, as objects are independent of each other (sampling of objects is preferably done in a way that ensures mutual independence of sampling units), whereas descriptors may be dependent. A variety of association measures are available to study the relationship of objects (e.g. Legendre & Legendre 1998). Different correlation coefficients are applied to the study of relations between descriptors.

If objects are grouped on the basis of the entire set of descriptors, it is sometimes referred to as normal analysis, whereas in an inverse analysis, descriptors are grouped on the basis of their distribution in a series of objects (Kent & Coker 1994). In connection with ordinations the two modes of analysis have been referred to as ‘objects in descriptors space’ and ‘descriptors in objects space’, e.g. ‘people in species space’ and ‘woodcarving species in peoples space’.

The two modes of analysis described above are also frequently referred to as R and Q mode. The use of the terms R- and Q-mode, however, is a possible point of confusion, as certain authors define the mode on the basis of the association matrix, whereas others define the mode on the purpose of the analysis. Authors who base the definition of the mode on the association matrix call analyses based on the relationships between descriptors ‘R-mode’, and analyses based on the relationships of objects ‘Q-mode’ (Jongmann et al. 1987, Legendre & Legendre 1998). Authors who base the definition of the mode on the purpose of the analysis do so in two contradictory ways: in some literature R-mode relates to classification/ordination of objects and Q-mode relates to species classification/ordination (Pielou 1984; Causton 1988; Kent & Coker 1994). Again, in other literature, Q-mode relates to classification/ordination of objects and R-mode relates to species classification/ordination (Romesburg 1984).

Object ordinations normally begin with a dispersion/correlation matrix of descriptors (although they can be based on an association matrix of objects). According to Legendre & Legendre (1998), object ordinations can therefore be both R- and Q-mode. Because of these confusing notations, we prefer to use the terms ‘normal’ and ‘inverse’ to describe the purpose of the analysis.

Photo 9. National Museums of Kenya researchers Mohamed Pakia, Raymond Obunga and Hamisi Mududu measuring DBH and basal diameters of standing and cut ‘mgurure’, Combretum schumannii Engl. (Combretaceae), trees in Dzombo Forest, coastal Kenya.


SIMILARITY MEASURES

The analysis is started on a resemblance matrix which is derived either from the original or a transformed/standardized data matrix. These resemblance matrices are called ‘similarity matrix’ or ‘dissimilarity matrix’ depending on the way in which resemblance functions are calculated and the matrix is derived. In this section, some resemblance functions that quantify the similarity or dissimilarity between samples are described. The more similar the objects (respondents or samples) are with respect to a particular character (variable), the greater their resemblance and the smaller the distance between them when projected into a geometric space. Resemblance functions quantify the similarity or dissimilarity between two objects (samples) based on observations over a set of descriptors (Sneath & Sokal 1973). To explore the nature of relationships or affinities that exist among the respondents, normal mode analysis is usually applied. Two types of normal mode resemblance functions are distinguished:
1. similarity coefficients and
2. distance coefficients.

Similarity coefficients vary from a minimum of 0 (when a pair of respondents are completely different) to 1 (when the respondents are identical). On the other hand, distance coefficients assume a minimum value of 0 when a pair of respondents are identical and have some maximum value (in some cases infinity) when the pair of respondents are completely different. Hence, distance coefficients are also referred to as dissimilarity coefficients. In fact, a similarity index may always be expressed as a distance just by a simple transformation such as 1 − similarity (Legendre & Legendre 1998). Thus, distance may be thought of as the complement of similarity (Sneath & Sokal 1973).

Similarity coefficients are widely used indices. The indices considered here are based solely on presence/positive reply (indicated with a ‘1’) or absence/negative reply (indicated with a ‘0’) data (see Appendix, Tables 4 to 8 for illustration).

Three indices - Ochiai, Dice and Jaccard - are useful for calculating the similarity index of presence/absence or positive/negative reply data (qualitative) (see Box 6).

These indices can be used to measure the degree of association between species (an inverse mode analysis, i.e. across the rows of the data matrix) as well as to compute a normal mode similarity between respondents. It may be mentioned here that these are the only types of functions that are used to measure both normal mode (sample similarity) and inverse mode (species association) resemblance (Ludwig & Reynolds 1988).

DISTANCE COEFFICIENTS

Measures of distance may be categorized into three groups:
1. E-group (the Euclidean distance coefficients);
2. BC-group (the Bray-Curtis dissimilarity index);
3. RE-group (the relative Euclidean distance measures).


Box 6. Indices for calculating similarity index of presence/absence or positive/negative reply data (qualitative).

Ochiai Index (OI)

$$OI = \frac{a}{\sqrt{a+b}\,\sqrt{a+c}}$$

In the above example:

$$OI_{R14,R15} = \frac{1}{\sqrt{1}\,\sqrt{3}} = 0.577$$

Dice Index (DI) (Sorensen Index)

$$DI = \frac{2a}{2a+b+c}$$

In the above example:

$$DI_{R14,R15} = \frac{2}{2+0+2} = 0.5$$

Jaccard Index (JI)

$$JI = \frac{a}{a+b+c}$$

In the above example:

$$JI_{R14,R15} = \frac{1}{1+0+2} = 0.33$$
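The three indices of Box 6 can be checked with a few lines of Python. The sketch assumes the usual meaning of the counts, which Box 6 itself does not spell out: a is the number of species with a positive reply from both respondents, b the number positive for the first respondent only, and c the number positive for the second only. The two reply vectors are hypothetical (they are not the Appendix data for R14 and R15) and are chosen only so that a = 1, b = 0 and c = 2, as in the worked example.

```python
import math

def match_counts(x, y):
    """Return (a, b, c): joint positives, positives in x only, positives in y only."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    return a, b, c

def ochiai(a, b, c):
    return a / (math.sqrt(a + b) * math.sqrt(a + c))

def dice(a, b, c):
    return 2 * a / (2 * a + b + c)

def jaccard(a, b, c):
    return a / (a + b + c)

# Hypothetical yes/no replies of two respondents for eight species
first = [1, 0, 0, 0, 0, 0, 0, 0]
second = [1, 1, 1, 0, 0, 0, 0, 0]

a, b, c = match_counts(first, second)
oi, di, ji = ochiai(a, b, c), dice(a, b, c), jaccard(a, b, c)
print(round(oi, 3), round(di, 3), round(ji, 2))
```

Joint negative replies do not enter any of the three indices, so two respondents are not counted as similar merely because neither could identify a species.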


The distances computed between all possible pairs of sampling units (SUs) based on any of the above similarity or distance measures are arranged in a SU x SU matrix. Examination of this matrix quickly reveals the distance between any two SUs. It is on this distance matrix that the clustering strategies and ordination techniques such as principal component analysis operate. The distance coefficients are explained in Box 7.


Box 7. Distance coefficients (after Ludwig & Reynolds 1988).

In the formulas below, X_ij is the value of species i in SU j, X_ik is the value of species i in SU k, and all sums are taken over the S species.

E-GROUP DISTANCES

EUCLIDEAN DISTANCE (ED)

This measure is the familiar equation for calculating the distance between two points Rj and Rk in Euclidean space:

$$ED_{jk} = \sqrt{\sum_{i=1}^{S} (X_{ij} - X_{ik})^2}$$

The value of ED ranges from zero to infinity, as do all of the E-group measures.

SQUARED EUCLIDEAN DISTANCE (SED)

This measure is the square of ED:

$$SED_{jk} = \sum_{i=1}^{S} (X_{ij} - X_{ik})^2$$

MEAN EUCLIDEAN DISTANCE (MED)

MED is similar to ED, but the final distance is on a smaller scale since the mean difference is used:

$$MED_{jk} = \sqrt{\frac{\sum_{i=1}^{S} (X_{ij} - X_{ik})^2}{S}}$$

ABSOLUTE DISTANCE (AD)

This measure is the sum of the absolute differences taken over the S species:

$$AD_{jk} = \sum_{i=1}^{S} |X_{ij} - X_{ik}|$$

This distance measure is also known as the Manhattan or City block dissimilarity coefficient. The AD measure is the character difference in numerical taxonomy (Sneath & Sokal 1973).

MEAN ABSOLUTE DISTANCE (MAD)

The MAD is similar to AD, but a mean distance is used rather than an absolute distance:

$$MAD_{jk} = \frac{\sum_{i=1}^{S} |X_{ij} - X_{ik}|}{S}$$

MAD is equivalent to the mean character difference used in numerical taxonomy (Sneath & Sokal 1973).

BC-GROUP DISTANCE

This group is represented by a single index first introduced by Bray & Curtis (1957). The first step is to compute the percent similarity (PS) between SUs j and k as

$$PS_{jk} = 100 \cdot \frac{2W}{A + B}$$

where

$$W = \sum_{i=1}^{S} \min(X_{ij}, X_{ik}), \qquad A = \sum_{i=1}^{S} X_{ij}, \qquad B = \sum_{i=1}^{S} X_{ik}$$

Percent Dissimilarity (PD):

$$PD = 100 - PS$$

PD may also be computed on a 0 − 1 scale as

$$PD = 1 - \frac{2W}{A + B}$$

RE-GROUP DISTANCE

This group contains distance indices that are expressed on standardized or relative scales.

RELATIVE EUCLIDEAN DISTANCE (RED)

$$RED_{jk} = \sqrt{\sum_{i=1}^{S} \left[ \frac{X_{ij}}{\sum_{i=1}^{S} X_{ij}} - \frac{X_{ik}}{\sum_{i=1}^{S} X_{ik}} \right]^2}$$

RED ranges from 0 to √2.

RELATIVE ABSOLUTE DISTANCE (RAD)

$$RAD_{jk} = \sum_{i=1}^{S} \left| \frac{X_{ij}}{\sum_{i=1}^{S} X_{ij}} - \frac{X_{ik}}{\sum_{i=1}^{S} X_{ik}} \right|$$

RAD has a range from 0 to 2.

CHORD DISTANCE (CRD)

This is done by projecting the SUs onto a circle of unit radius through the use of direction cosines. The measure is then the chord distance between the two SUs after such a projection.

$$CRD_{jk} = \sqrt{2\,(1 - ccos_{jk})}$$

where the chord cosine (ccos) is computed from:

$$ccos_{jk} = \frac{\sum_{i=1}^{S} X_{ij} X_{ik}}{\sqrt{\sum_{i=1}^{S} X_{ij}^2 \, \sum_{i=1}^{S} X_{ik}^2}}$$

In the case of binary data, this ccos is identical to Ochiai's coefficient. CRD, like RED, ranges from 0 to √2.

GEODESIC DISTANCE (GDD)

This measure is the distance along the arc of the unit circle (rather than the chord distance) after projection of the SUs onto a circle of unit radius:

$$GDD_{jk} = \arccos(ccos_{jk})$$

GDD has a range from 0 to π/2 (i.e. 0 to 1.57).
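To make the coefficients of Box 7 concrete, the following Python sketch computes one or two measures from each group for a pair of SUs. The two data vectors are hypothetical, the function names are illustrative, and non-negative values with non-zero totals are assumed.

```python
import math

def euclidean(x, y):
    """ED: square root of the summed squared differences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))

def absolute_distance(x, y):
    """AD (Manhattan or city block): sum of absolute differences."""
    return sum(abs(p - q) for p, q in zip(x, y))

def percent_dissimilarity(x, y):
    """Bray-Curtis PD on the 0-1 scale: 1 - 2W / (A + B)."""
    w = sum(min(p, q) for p, q in zip(x, y))
    return 1 - 2 * w / (sum(x) + sum(y))

def relative_euclidean(x, y):
    """RED: Euclidean distance after each SU is rescaled to proportions of its total."""
    total_x, total_y = sum(x), sum(y)
    return euclidean([p / total_x for p in x], [q / total_y for q in y])

def chord_cosine(x, y):
    """ccos: cosine of the angle between the two SU vectors."""
    numerator = sum(p * q for p, q in zip(x, y))
    denominator = math.sqrt(sum(p * p for p in x) * sum(q * q for q in y))
    return max(-1.0, min(1.0, numerator / denominator))  # guard against rounding

def chord_distance(x, y):
    """CRD: chord length between the SUs projected onto the unit circle."""
    return math.sqrt(2 * (1 - chord_cosine(x, y)))

def geodesic_distance(x, y):
    """GDD: arc length between the SUs projected onto the unit circle."""
    return math.acos(chord_cosine(x, y))

# Hypothetical values of three species in two sampling units
su_j = [2, 0, 2]
su_k = [0, 2, 2]

ed = euclidean(su_j, su_k)
ad = absolute_distance(su_j, su_k)
pd = percent_dissimilarity(su_j, su_k)
red = relative_euclidean(su_j, su_k)
crd = chord_distance(su_j, su_k)
gdd = geodesic_distance(su_j, su_k)
print(ed, ad, pd, red, crd, gdd)
```

Applying any one of these functions to every pair of SUs fills the SU x SU distance matrix on which the clustering strategies operate.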


Cluster analysis of the ‘Wood identification’ task

The six basic steps involved in cluster analysis are described below, taking the data sets from the Kenyan woodcarving project ‘Wood identification task’ and ‘Paired comparison of wood species’ as examples. Cluster analysis is useful for such data in two ways:

• The responses (objects) can be grouped according to their resemblances, i.e. based on the respondents’ ability to identify a particular species used for woodcarving in the case of the ‘Wood identification task’ data and on species preferences in the ‘Paired comparison of wood species’ data. The respondents in each cluster should have a number of common characteristics that set them apart from the respondents of other such clusters.

• The data sets can be reduced to homogeneous groups or clusters. The objective is to demonstrate the relationships of the respondents to each other and to simplify these relationships so that some general statements about the classes of respondents that exist can be made.

Since this is an ethnobotanical problem, where the interest is to know about the respondents through their views on the individual species, normal mode analysis will be used for both data sets. The procedure is a polythetic, agglomerative classification technique. The results are based on the output of the NTSYS package, but the basic steps are similar for any other package.

Applications of cluster and principal component analysis

STEP 1 Obtaining the basic data matrix (see Appendix, Tables 4 and 5, page 36).

STEP 2 Standardizing the basic data matrix. The basic data matrix is standardized for the following reasons:
• To make the species contribute more equally to the similarity between the respondents.
• To remove all the measuring units (not applicable to the data presented here).
The standardization is performed through a linear transformation of the original values for each character/element of the basic data matrix. Since binary data are not standardized, the basic data matrix for the ‘Wood identification task’ will be used for further analysis. The basic data matrix for ‘Paired comparison of wood species’ has been standardized by dividing the matrix elements by the standard deviation (see Appendix, Table 6).

STEP 3 Computing the resemblance matrix. The next step in cluster analysis is to compute a normal mode resemblance between the respondents (R1...R16). Although any of the numerous resemblance functions available could be used, distance measures have been used for multistate character data in the ‘Paired comparison of wood species’ because of their heuristic value in a cluster analysis (Sneath & Sokal 1973). However, for two-state data in the ‘Wood identification task’, the similarity measure is Jaccard's coefficient. The distances between all pairwise combinations of respondents are summarized into a 16 x 16 distance (D) or resemblance matrix for each data set (see Appendix, Tables 9 and 10, pages 37 and 38). The further cluster analysis strategies operate on these resemblance matrices.

STEP 4 Executing the clustering method and obtaining the tree matrix. The clustering technique used here is a hierarchical agglomerative procedure based on UPGMA (unweighted pair-group method with arithmetic averages). The clustering was executed on the resemblance matrices (see Appendix, Tables 9 and 10) to yield the tree matrices (see Appendix, Tables 11 and 12).

STEP 5 Drawing the tree or dendrogram. The tree matrix derived above produces a tree, drawn to scale, showing the clustering scheme. The dendrograms or trees for the ‘Wood identification’ and ‘Paired comparison of wood species’ data are given in Figures 4 and 5, respectively (page 19).

STEP 6 Computing the cophenetic matrix and coefficient, and plotting. A tree is not exactly like the data matrix it represents. It is necessary to know how well the tree represents the basic data matrix. The cophenetic correlation coefficient measures how well the tree and the resemblance matrix match. The values that appear in the cophenetic matrix (see Appendix, Tables 13 and 14, page 39) stem from the tree and are compared with those of the resemblance matrix either through a matrix plot or the Pearson product moment correlation coefficient. Figures 11 and 12 (Appendix, page 39) show the relationships between the cophenetic and resemblance matrices.

Box 8. Steps involved in cluster analysis.
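Steps 3 and 4 of Box 8 can be sketched as follows for binary ‘Wood identification’ type data. This is a minimal illustration under stated assumptions, not the NTSYS procedure: the four respondents and their 0/1 answers are invented, Jaccard's coefficient is converted to a distance as 1 minus the similarity, and the names `jaccard_distance` and `upgma` are hypothetical helpers.

```python
def jaccard_distance(a, b):
    """1 - Jaccard similarity for two binary (0/1) response vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either

def upgma(names, dist):
    """Return the merges (cluster_a, cluster_b, height) in the order they occur.

    dist maps frozenset({name_a, name_b}) to the distance between two objects.
    Clusters are represented as tuples of object names.
    """
    clusters = [(n,) for n in names]

    def average(c1, c2):
        # UPGMA: arithmetic mean of all object-to-object distances between clusters
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        pairs = [(average(c1, c2), c1, c2)
                 for i, c1 in enumerate(clusters) for c2 in clusters[i + 1:]]
        height, c1, c2 = min(pairs, key=lambda p: p[0])
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append((c1, c2, height))
    return merges

# Hypothetical basic data matrix: 1 = species identified correctly, 0 = not
data = {
    "R1": [1, 1, 1, 0, 0],
    "R2": [1, 1, 0, 0, 0],
    "R3": [0, 0, 0, 1, 1],
    "R4": [0, 0, 1, 1, 1],
}
names = list(data)

# Step 3: resemblance (distance) matrix for all pairs of respondents
dist = {frozenset((a, b)): jaccard_distance(data[a], data[b])
        for i, a in enumerate(names) for b in names[i + 1:]}

# Step 4: tree matrix as a list of merges
merges = upgma(names, dist)
for c1, c2, height in merges:
    print(c1, c2, round(height, 3))
```

With these invented answers, R1 joins R2 and R3 joins R4 at a distance of about 0.333, and the two pairs fuse at 0.95. The merge heights are the values from which a dendrogram (Step 5) would be drawn.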


The clustering results depicted in the dendrogram (Figure 4) for the ‘Wood identification’ task exhibit a clear separation of respondents at the eight-cluster level. The two-cluster solution separates the two groups (R16 from the rest), which may differ in their socio-economic conditions, age structure, artisan skills or ethnic composition. This may be examined using data already collected on these aspects, or a new exploratory study may be designed to test this hypothesis. Further, the tree shows that R1, R2, R3, R4, R5, R15 and R11 are similar. Based on this, the researcher can treat these respondents as similar in further experiments or in designing new studies. In addition, the factors responsible for such similarity may also be explored, which may have high ethnobotanical relevance.

Similarly, the tree in Figure 5 for the ‘Paired comparison of wood species’ reveals the existence of two groups of respondents (i.e. R3, R4, R16 and R7 in one group and the rest in another group). The underlying factors for such a grouping pattern may be explored. Further, two distinct groups of respondents exist: one group with respondents R1, R2, R10 and R14 and the other with R6, R13, R9, R11 and R12. Each group consists of a large number of similar respondents (four and five respectively). The factor(s) behind such similarity may be an interesting ethnobotanical observation.


Figure 4. Dendrogram based on the distance matrix and showing clustering of sixteen respondents in respect to eight tree species used for woodcarving according to the responses of the interviewees.

Figure 5. Dendrogram based on the similarity matrix and showing clustering of sixteen respondents in respect of five tree species used for woodcarving in the ‘Paired comparison of wood species’ according to the responses of the interviewees.


Principal component analysis of the ‘Paired comparison of wood species’ task

In the ‘Paired comparison of wood species’ data, the responses of sixteen respondents for five species have been recorded. The basic purpose of principal components is to account for the total variation among these sixteen respondents in a five-dimensional space by forming a new set of orthogonal (uncorrelated) composite variates, the ordination axes. Each of these axes is a linear combination of the original set of variables. Each successive composite variate accounts for a smaller portion of the total variation. Thus, the first principal component has the largest variance, the second has a variance smaller than the first and larger than the third, etc. The new composite variables required to account adequately for the total variation are generally fewer than the original variables.

The objectives are as follows:

• To simplify and condense data sets. If there are many species, dimensions can be reduced, such that relationships between the respondents can more easily be examined.

• To project the research participants in the space of coordinates according to their responses so that their relative positions to the axes and to each other provide maximum information about their similarities. By identifying similar informants from their position with respect to the axes, underlying factors in the observed pattern may be searched.

• The observed patterns may later be explained on the basis of social and cultural features of the research participants. Differences in the pattern may be correlated with ethnicity, relative economic conditions, family structure, etc.

The basic data matrices (Appendix, Tables 4 and 5, page 36) will be used. PCA is carried out through the following six steps (Box 9).

STEP 1. Standardization of basic data matrix. Usually, it is not necessary to standardize data before computing PCA. To work with standardized data, the correlation matrix instead of the co-variance matrix is chosen, so that data are standardized automatically. If more weight is put on commonly mentioned species (as is sometimes preferable), then the co-variance matrix is chosen to work with unstandardized data. As in cluster analysis, binary data should not be standardized. In the following, the standardized data matrix (see Appendix, Table 6, page 36) for ‘Paired comparison of wood species’ computed across the rows for cluster analysis is utilized.

STEP 2. Calculation of correlation between characters. Unlike in cluster analysis, in PCA the correlation among the variables/characters (correlation among species with respect to respondents) is computed (inverse mode analysis). Similarity/correlation measures commonly used here are: correlation, variance-covariance and matrix times its transpose (X x T). However, for two-state data, indices such as the Jaccard, Phi and Simple Matching Coefficients (SMC) are used. Thus, resemblance matrices across the rows are computed (see Appendix, Tables 15 and 16, page 40). If similarity matrices other than correlation and co-variance are used, the ordination is generally named ‘principal coordinate analysis’ (PCO or PCoA). For binary data sets, PCO is recommended instead of PCA.

STEP 3. Double decentering the resemblance matrix. An additional step in data transformation for PCA is ‘double decentering’. This is performed on the resemblance matrix (see Appendix, Tables 17 and 18).

STEP 4. Eigen-analysis for deriving principal components. Eigen-values for each ordination axis and Eigen-vectors for each variable (species) are computed. The Eigen-value is the variance of a particular principal component, while the Eigen-vector is the set of coefficients defining the principal component. In our example, the first three principal components explain more than 85% of the variance in the case of the ‘Wood identification task’ data (see Appendix, Table 19, page 41) and more than 89% in the ‘Paired comparison of wood species’ data (see Appendix, Table 20). Therefore, the first three principal components were used for further analysis. Eigen-vector matrices (U) with the loading of each variable (species) in each principal component are presented in Tables 21 and 22 in the Appendix.

STEP 5. Projection of the respondents into the ordination space. The projection matrix (Y) for each data set is computed from the basic data matrix (A) and the Eigen-vector matrix (U): Y = A x U. This operation results in projection matrices (not shown).

STEP 6. Ordination or plotting of projection matrix. The projection matrix can be plotted in both a two-dimensional (Figures 6 and 7) and a three-dimensional space (Figures 8 and 9, page 21). The former figures depict the position of respondents with respect to the species in a two principal component space (in two dimensions since only two PCs are considered), while the latter two arrange the respondents in a three principal component space (thus in three dimensions).

Box 9. Steps involved in principal component analysis.
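The core of Steps 2, 4 and 5 in Box 9 can be illustrated with a small sketch. It is deliberately simplified and rests on several assumptions: it works on the co-variance matrix among species, it skips the double decentering step, it extracts only the first principal component, and it uses power iteration in place of a full eigen-analysis as a statistical package would perform. The data matrix and the function names are hypothetical.

```python
import math

def covariance_matrix(rows):
    """Co-variance matrix among the columns (species) of a respondents x species table."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    return cov, centred

def first_component(cov, iterations=500):
    """Largest Eigen-value and its unit Eigen-vector, found by power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                     for a in range(p))
    return eigenvalue, v

# Hypothetical basic data matrix A: 5 respondents (rows) x 3 species (columns)
rows = [
    [4, 3, 1],
    [3, 3, 0],
    [1, 2, 2],
    [0, 1, 3],
    [2, 1, 4],
]

cov, centred = covariance_matrix(rows)
eigenvalue, loadings = first_component(cov)

# Share of the total variance carried by the first principal component
share = eigenvalue / sum(cov[j][j] for j in range(len(cov)))

# Step 5 (Y = A x U), restricted to the first component and to centred data
scores = [sum(c[j] * loadings[j] for j in range(len(loadings))) for c in centred]

print(round(eigenvalue, 3), [round(x, 3) for x in loadings], round(share, 3))
```

The variance of the resulting scores equals the Eigen-value, which is what is meant by saying that the Eigen-value is the variance of a principal component. For real analyses with several components, a dedicated linear algebra routine would be the more appropriate choice.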


INTERPRETING THE RESULTS OF PRINCIPAL COMPONENT ANALYSIS

In both cases, three composite variables (principal components) were derived from the original species variables: eight in the ‘Wood identification task’ and five in the ‘Paired comparison of wood species’.

The coefficients (i.e. loadings) represent the correlation of Y1 with the respective original variable. Thus, 0.2695 can be interpreted as the correlation between Y1 and the variable species S1. The principal components (composite variables) are interpreted on the basis of those variables with strong loading patterns. In the example (see Appendix, Table 21), the first principal component Y1 may be interpreted and named accordingly, for example ‘poor people’s species’ or ‘X ethnic community’s species’, depending upon information concerning the respondents, given that S1, S2, S4 and S7 have similar (positive) loading profiles within Y1. Similarly, PC 2 (Y2), in which S3 and S8 are the important defining variables, may be appropriately named, and so on.

In the ‘Paired comparison of wood species’ data, PC 1 can be defined in terms of species S1 (0.729) and S2 (0.6063), PC 2 in terms of S3 (0.4586) and PC 3 in terms of S5 (0.5808), since their loadings dominated the respective composite variables (principal components) (see Appendix, Table 22, page 41).

Further, the relationship among the informants with respect to the above identified groups of species can be depicted based on the projection plots of the informants in the space of the three principal components (Figures 6-9, pages 19 and 20) according to their responses to the queries. Such relationships can later be correlated with various underlying factors that are important from an ethnobotanical point of view.

[Figures 6 and 7: scatter plots of respondents (points labelled R1...R16); axes: Principal component 1 and Principal component 2.]

Figure 6. Projection of eight respondents in the space defined by the first and second principal components in the ‘Wood identification task’.

Figure 7. Projection of eleven respondents in the space defined by the first and second principal components in the ‘Paired comparison of wood species’.

[Figure 8: three-dimensional plot of respondents; axes: Principal component 1, Principal component 2 and Principal component 3.]

Figure 8. Projection of eight respondents in the space defined by the first, second and third principal components in the ‘Wood identification’ task.

[Figure 9: three-dimensional plot of respondents; axes: Principal component 1, Principal component 2 and Principal component 3.]

Figure 9. Projection of eleven respondents in the space defined by the first, second and third principal components in the ‘Paired comparison of wood species’.


Analysis of variance, regression analysis, cluster and principal component analysis are all based on linear mathematical equations derived from the following formula:

$$y = \alpha + \beta_i + \varepsilon_i$$

In this formula εi represents the “residual” or “error”, i.e. the departure of an actual y-value from what the regression equation predicts.

Hypothesis testing

Most computer packages for statistics provide two procedures for analysis of variance: ANOVA and general linear model (GLM). The latter is usually less automated and allows the analysis of randomized and incomplete block designs, analysis of co-variance with one or more covariates, crossover designs, split plot, repeated measurements, nesting etc., and the definition of separate error terms for factors in means models and effects models to test hypotheses in missing cells designs.

ANALYSIS OF VARIANCE (ANOVA)

Inferential multivariate techniques are generalizations of classical univariate procedures. The multivariate analogue of analysis of variance models is also known as MANOVA. Similarly, for carrying out simultaneous hypothesis tests and constructing simultaneous confidence intervals, procedures for univariate cases are usually generalized and extended for multivariate situations. Important assumptions on the population sample for application of ANOVA include the following:

1. data (standardized or otherwise transformed) are normally distributed;
2. variances are equal (condition of homogeneity of variances or homoscedasticity);
3. no significant interactions exist between variables;
4. group means and standard deviations are independent (i.e. the size of the group means is not related to the size of their standard deviations);
5. data contain no gross outliers (outliers may be excluded from analysis upon plausibility checks);
6. numbers of observations in different categories (cells) are equal (not obligatory).

If after standardization or transformation conditions 1) to 3) are not met, data can be analysed by defining alternative multivariate general linear models, or hypotheses can be tested by specifying nonparametric models. If significant interactions among variables are suspected (e.g. the influence of the level of a given plant compound on the level of its chemical derivative), analysis of co-variance is carried out to adjust or remove the variability in the dependent variable due to the covariate.

When the assumption of homogeneous variances does not hold, it is sometimes possible to adjust the degrees of freedom so that the test statistic is approximately F-distributed. In SYSTAT, a procedure based on Levene’s test for unequal variances allows residuals to be saved and an ANOVA to be performed on the transformed absolute values of the residuals, merged with the original grouping variables. If the test is significant, separate variance tests in the GLM module can be performed.

Although generalized from two-way procedures, it is invalid to perform multivariate hypothesis testing on all possible pairs of hypotheses.

$$\left.\begin{aligned} H_0 &: \mu_1 = \mu_2 \\ H_0 &: \mu_2 = \mu_3 \\ H_0 &: \mu_1 = \mu_3 \end{aligned}\right\} \text{ invalid}$$

$$H_0: \mu_1 = \mu_2 = \mu_3 \quad \text{valid}$$

The above example is for hypothesis testing of a means model. Alternatively, hypothesis testing on an effects model would read:

$$H_0: \alpha_1 = \alpha_2 = \alpha_3 = 0$$

The null hypothesis

The null hypothesis H0 assumes that there is no difference between population means. When comparing alkaloid contents in leaf samples from two different sites, the null hypothesis assumes that the contents are approximately equal (i.e., not significantly different). The F-test is used to decide whether the null hypothesis is to be accepted or rejected and at which confidence level. For example, p < 0.05 corresponds to a 95% confidence level.

If the null hypothesis of equal group means is true, then the mean squares (MS) of the groups and the error MS will each be an estimate of the common variance σ². However, if the population means are not equal, then the groups’ MS in the population will be greater than the population’s error MS. Mean square is the mean squared deviation from the population mean, and the sum of squares is the summation of these.

Comparison of several means

$$F = \frac{\text{groups' MS}}{\text{error MS}}$$

If the calculated F-ratio is at least as large as the critical value, then H0 is rejected and the alternative hypothesis HA (population means being unequal) accepted.

The critical value is computed from the degrees of freedom involving the total sum of squares (SS) of the overall mean and the sums of squares of the group means.
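The F-ratio for a one-way design can be computed by hand as sketched below. The alkaloid values for the three sites are invented for illustration, and `one_way_anova` is a hypothetical helper name; the critical value quoted in the comment is the usual tabulated value for 2 and 6 degrees of freedom at the 5% level.

```python
def one_way_anova(groups):
    """Return (F, df_groups, df_error) for a one-way analysis of variance."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]

    # Sum of squares among groups and within groups (error)
    ss_groups = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_groups = len(groups) - 1
    df_error = len(values) - len(groups)

    ms_groups = ss_groups / df_groups
    ms_error = ss_error / df_error
    return ms_groups / ms_error, df_groups, df_error

# Hypothetical alkaloid contents of leaf samples from three sites
site_1 = [2, 3, 4]
site_2 = [4, 5, 6]
site_3 = [6, 7, 8]

f_ratio, df_groups, df_error = one_way_anova([site_1, site_2, site_3])
print(f_ratio, df_groups, df_error)

# Tabulated critical value F(0.05; 2, 6) is about 5.14
reject_h0 = f_ratio >= 5.14
```

Here the groups' MS is 12 and the error MS is 1, so F = 12 with 2 and 6 degrees of freedom, and H0 would be rejected at the 5% level for these made-up data.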

A typical computer output of a two-factor analysis (A and B) tabulates the source of variation (singly: A, B, and interactions: A x B), the sum of squares (SS), degrees of freedom (df), mean squares (MS), the F statistic and the p value. From this result, however, it cannot be determined which groups differ from which other groups. To examine specific pairwise group differences, post hoc testing is used. Bonferroni, Scheffé and Tukey tests are available in most statistical packages to test pairwise differences in multi-way designs. When the number of groups is small, the Bonferroni procedure is recommended. For more groups, the Tukey test yields more sensitive results.

Linear and quadratic contrasts

Contrasts are used to test relationships among means. A contrast is a linear combination of means µi with coefficients αi. Typically, the hypothesis takes the following form:

$$H_0: \alpha_1\mu_1 + \alpha_2\mu_2 + \dots + \alpha_k\mu_k = 0$$

The test statistic for a contrast is similar to that for a two-sample t-test. The result of the contrast (a certain relation among means) appears in the numerator of the test statistic, and an estimate of within-group variability (the pooled variance estimate or the error term from the ANOVA) is part of the denominator. Specific contrast coefficients can be selected to test, for example, the following:

• pairwise comparison for testing the difference between two particular means;
• linear combinations of means (e.g. two treatment means vs. a control mean); or
• linear or quadratic increases or decreases of a certain quality in response to different categories of treatment.
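A contrast of the second kind (two treatment means against a control mean) might be computed as in the following sketch. It assumes the usual textbook form of the statistic, the contrast estimate divided by the square root of the error MS times the sum of squared coefficients over group sizes; the data, the coefficients and the name `contrast_t` are hypothetical.

```python
import math

def contrast_t(groups, coeffs):
    """t statistic and error degrees of freedom for a contrast among group means."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    means = [sum(g) / len(g) for g in groups]
    df_error = sum(len(g) for g in groups) - len(groups)

    # Pooled within-group variance (the error MS of the ANOVA)
    ms_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / df_error

    estimate = sum(c * m for c, m in zip(coeffs, means))
    std_error = math.sqrt(ms_error * sum(c * c / len(g) for c, g in zip(coeffs, groups)))
    return estimate / std_error, df_error

# Hypothetical yields: one control and two treatments
control = [2, 3, 4]
treatment_a = [4, 5, 6]
treatment_b = [6, 7, 8]

# Mean of the two treatments minus the control mean
t_value, df = contrast_t([control, treatment_a, treatment_b], [-1.0, 0.5, 0.5])
print(round(t_value, 4), df)
```

For these invented values the contrast estimate is 3, the error MS is 1 and the t statistic is about 4.24 with 6 degrees of freedom. A pairwise comparison would simply use coefficients such as (1, −1, 0).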

Block and repeated-measures experimental designs

Imagine we want to compare the alkaloid accumulation patterns in the leaves of a certain plant species under three different soil conditions (factor A) and two light regimes (factor B). Within each of the three levels of factor A, we sampled seven individuals (or blocks) with an observation for each individual at each of the two levels of factor B. The total variability would be divided into two parts: the variability among blocks and the variability within blocks (due to individual behaviour). Hypothesis testing for Factor A would take the following form:

H0: Mean alkaloid content of leaves is the same for all three soil types.

HA: Mean alkaloid content of leaves is not the same for all three soil types.

$$F = \frac{\text{soil types MS}}{\text{blocks within soil types MS}}$$

For Factor B:

H0: Mean alkaloid content of leaves is the same under two light regimes.

HA: Mean alkaloid content of leaves is not the same under two light regimes.

$$F = \frac{\text{light regimes MS}}{\text{light regimes} \times \text{(blocks within soil types) MS}}$$

For A x B interaction:

H0: Mean alkaloid content is independent of light regimes.

HA: Mean alkaloid content is not independent of light regimes.

$$F = \frac{\text{light regimes} \times \text{soil types MS}}{\text{light regimes} \times \text{(blocks within soil types) MS}}$$

Repeated measurements may be taken at different time intervals to quantify changes over time. In a repeated measures design, the same variable is measured several times for each subject. A paired-comparison t-test is the simplest form of this design (e.g. before and after measure). The following steps are involved in manually calculating a t statistic:

• For each subject the difference between two measures is computed;

• The average of the differences is calculated;

• The standard deviation of the differences is calculated;

• The test statistic using this mean and standard deviation is calculated as shown below:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$$
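The four steps listed above translate directly into code. In this sketch the denominator of the t statistic is taken to be the standard error of the mean difference, i.e. the standard deviation of the differences divided by the square root of the number of subjects, which is the usual paired-comparison form. The before and after values and the name `paired_t` are hypothetical.

```python
import math

def paired_t(before, after):
    """Paired-comparison t statistic and its degrees of freedom."""
    # Step 1: difference between the two measures for each subject
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)

    # Step 2: average of the differences
    mean_d = sum(diffs) / n

    # Step 3: standard deviation of the differences
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))

    # Step 4: mean difference divided by its standard error
    se_d = sd_d / math.sqrt(n)
    return mean_d / se_d, n - 1

# Hypothetical measurements on four subjects
before = [10, 12, 9, 11]
after = [12, 15, 10, 14]

t_value, df = paired_t(before, after)
print(round(t_value, 3), df)
```

The differences are 2, 3, 1 and 3, with a mean of 2.25, which gives t of about 4.70 with 3 degrees of freedom.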


Changes are tested within subjects and between subjects. Tests of the within-subjects values are called polynomial tests of order 1, 2, ..., k, where k is one less than the number of repeated measures. The first polynomial is used to test linear changes (e.g. do the repeated measures increase or decrease around a line with a significant slope?), the second tests if the responses fall along a quadratic curve, etc.

Types of models

1. If the levels of a factor are specifically chosen or predetermined, the design is a fixed-effects model or Model I ANOVA.

2. If we are interested in general differences between different categories, and samples are taken truly randomly, then we have a random-effects model or Model II ANOVA.

3. If we have a factorial design with both fixed and random effects, the model is called a mixed-effects model or Model III ANOVA.

GENERAL LINEAR MODEL (GLM)

Specific general linear models (means or effects models) can be defined using the general linear model option, available in standard statistical computer packages, if the data design is not fully factorial, i.e. if numbers of observations are unequal in categories or ‘missing cells’ occur. The analysis is not robust to violations of normality and of equality of variances. The latter can be tested using e.g. Levene’s test. The means model takes the following form:

Y = constant + A + B + C + A x B x C

In a means model, predictors are coded as cell means, while in the classic effects model effects are coded as differences from the grand mean. Box 10 lists some examples of data suitable for analysis of variance.

KRUSKAL-WALLIS TEST

The Kruskal-Wallis rank sum test extends the two-sample Mann-Whitney rank sum test to more than two groups. The Kruskal-Wallis test is also referred to as “analysis of variance by ranks” and is applied when the data do not meet the conditions listed on page 22.

Box 10. Examples of data suitable for analysis of variance and Kruskal-Wallis Test.

• Quantifiable effects of herbal medicines as dependent on site, growing season, preparation procedure, etc.

• Bioassay testing of ethno-medical recipes.

• Validating and quantifying the described effect of an ethno-medical recipe in different user groups.

• Assessment of quantifiable ethnobotanical knowledge as dependent on age, gender, ethnicity or other social factors.

• Storability of grains and grain quality as dependent on the quality of granaries.

• Life span of beehives as dependent on storage conditions during rainy season.

• Effect of specific agricultural methods (e.g. soil working methods, burning, mechanical treatment of fruit trees) on yields.

• Effect of reduced harvesting schemes on the regeneration potential of wild plant populations.

• Effect of harvesting season for raw material on quality of baskets or other household items.

Photo 10. Stephen Weru from Gatei, Kenya, collects the sap of ‘mwaritha’, Dalbergia lactea Vatke (Fabaceae), used locally to cure hepatitis and other liver ailments. Beyond its medicinal value this liana is also used for the production of durable tea baskets, which has led to overharvesting in tea areas.


As in parametric analysis of variance, it cannot, however, be concluded which groups differ from which other groups. The only inference to be drawn is that at least one difference among the groups exists. The test is called nonparametric because no population parameters are used in the statement of hypotheses, and neither parameters nor sample statistics are used in the test calculations. Examples of data suitable for analysis of variance and Kruskal-Wallis test are listed in Box 10 (page 24).
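As an illustration of “analysis of variance by ranks”, the Kruskal-Wallis statistic can be computed from the rank sums of the groups. The sketch assumes the standard form H = 12 / (N(N + 1)) × Σ(Ri² / ni) − 3(N + 1), where Ri is the rank sum and ni the size of group i, and applies no correction for tied values. The data and function names are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction applied)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Hypothetical knowledge scores of informants in three age classes
young = [2.1, 2.5, 3.0]
middle = [3.4, 3.9, 4.2]
old = [4.8, 5.1, 5.6]

h_value = kruskal_wallis_h([young, middle, old])
print(round(h_value, 3))
```

With these invented scores the rank sums are 6, 15 and 24, giving H = 7.2. The statistic is then compared with a critical value, commonly taken from the chi-square distribution with one degree of freedom fewer than the number of groups.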

Prediction

As for analysis of variance, two underlying assumptions with respect to the distribution of values must be true for regression analysis:

1) data must come from an approximately normally distributed population;
2) variances must be equal.

MULTIVARIATE REGRESSION ANALYSIS

The relationship between two variables may be one of functional dependence of one variable on the other. The magnitude of one variable may thus be a function of the magnitude of the second variable, whereas the reverse is not true.

Regression analysis is a statistical method for predicting values of one or more response (i.e. dependent) variables from a collection of predictor or explanatory (i.e. independent) variable values (Poole, 1974; Zar, 1996). In a simple linear regression analysis, a linear model is developed from which the values of a dependent (i.e. response) variable can be predicted based on particular values of a single independent variable. The population regression model is expressed as:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$$

where:

β0 = the true intercept, a constant factor in the regression model representing the expected or fitted value of Y when X = 0;

β1 = the true slope, representing the amount that Y changes (either positively or negatively) per unit change in X;

εi = the random error or residual in Y for observation i.

Since the entire population cannot be measured, it is not possible to compute the parameters β0 and β1 and obtain the population regression model. Therefore, the approximations b0 (for β0) and b1 (for β1) are generally estimated from the sample using the method of least squares. With this method the statistics b0 and b1 are computed from the sample in such a manner that the best possible fit within the constraints of the least squares model is achieved. Thus, the following sample regression equation is obtained, in which the residual does not figure:

$$Y = b_0 + b_1 X_1$$
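The least squares estimates for the simple linear model can be computed from sums of squares and cross-products of deviations, as in the sketch below. The household data are invented (the theme is borrowed from the fuelwood example of Box 11), and `least_squares` is a hypothetical helper name.

```python
def least_squares(x, y):
    """Return (b0, b1) of the sample regression equation Y = b0 + b1 * X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of cross-products and sum of squares of deviations from the means
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)

    b1 = s_xy / s_xx              # slope
    b0 = mean_y - b1 * mean_x     # intercept
    return b0, b1

# Hypothetical data: household size and weekly fuelwood consumption
household_size = [2, 4, 6, 8]
fuelwood = [5, 10, 12, 17]

b0, b1 = least_squares(household_size, fuelwood)
predicted = b0 + b1 * 5  # fitted value for a household of five
print(b0, b1, predicted)
```

For these made-up values the fitted line is Y = 1.5 + 1.9 X, so a household of five would be predicted to use 11 units of fuelwood.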

Multivariate models from samples can be considered as extensions of the univariate model described above. In multiple regression, at least two independent variables (X1, X2) are used to predict the value of a dependent variable (Y). As in the case of the simple linear regression model, when sample data are analysed, the sample regression coefficients (b0, b1 and b2) are used as estimates of the true parameters (β0, β1 and β2). Thus the sample regression equation for the multiple linear regression model with two independent variables would be:

$$Y = b_0 + b_1 X_1 + b_2 X_2$$

Both the models described above focus on linear relationships between the variables. However, in nature nonlinear relationships are quite often encountered. One of the common nonlinear relationships between two variables is a curvilinear polynomial wherein Y increases (or decreases) at a changing rate for various values of X. The second degree polynomial relationship between X and Y may be expressed by the following model:

$$Y = b_0 + b_1 X_1 + b_2 X_1^2$$

where: b0 = Y intercept; b1 = linear effect on Y; b2 = curvilinear effect on Y.

In addition to the above, interaction terms involving the product of independent variables also contribute to the multiple regression model. When two such independent variables are involved, the model is:

$$Y = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2$$

Regression models are also developed by transforming the values of the independent variables, the dependent variable or both, depending upon the situation.

If a reciprocal transformation is applied to the values of each of the independent variables, the multiple regression model would be:

$$Y = b_0 + b_1 \frac{1}{X_1} + b_2 \frac{1}{X_2}$$


A logarithmic transformation would result in a model:

$$Y = b_0 + b_1 \ln X_1 + b_2 \ln X_2$$

On square root transformation, the model would be:

$$Y = b_0 + b_1 \sqrt{X_1} + b_2 \sqrt{X_2}$$

In some situations, the transformations can be applied to change what appear to be nonlinear models into linear models. For example, the multiplicative model:

$$Y = b_0 X_1^{b_1} X_2^{b_2}$$

can be transformed (by taking natural logarithms of both sides of the equation) to:

$$\ln Y = \ln b_0 + b_1 \ln X_1 + b_2 \ln X_2$$

which is linear in logarithms. Similarly, the exponential model:

$$Y = \exp [b_0 + b_1 X_1 + b_2 X_2]$$

can also be transformed into one of linear form (by taking natural logarithms of both sides of the equation):

$$\ln Y = b_0 + b_1 X_1 + b_2 X_2$$
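The linearization of the multiplicative model can be sketched for the simplest case of a single independent variable, Y = b0 × X^b1, which becomes ln Y = ln b0 + b1 ln X. The restriction to one predictor is a simplification made here for brevity. The diameter and bark weight values are invented and constructed to follow Y = 0.5 X² exactly, so that the recovered coefficients can be checked by eye; `fit_power_model` is a hypothetical helper name.

```python
import math

def fit_power_model(x, y):
    """Fit Y = b0 * X**b1 by least squares on the log-transformed values."""
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n

    # Ordinary least squares on ln Y = ln b0 + b1 * ln X
    b1 = (sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
          / sum((a - mean_x) ** 2 for a in log_x))
    b0 = math.exp(mean_y - b1 * mean_x)  # back-transform the intercept
    return b0, b1

# Hypothetical tree diameters [cm] and bark dry weights [kg]
diameter = [1, 2, 4, 8]
bark_weight = [0.5, 2.0, 8.0, 32.0]

b0, b1 = fit_power_model(diameter, bark_weight)
print(round(b0, 6), round(b1, 6))
```

The fit returns b0 of about 0.5 and b1 of about 2. Note that least squares on the logarithms minimizes relative rather than absolute deviations, so the back-transformed curve is not identical to a direct nonlinear fit when the data scatter.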

For detailed methods of multivariate regression analysis, Johnson & Wichern (1988) may be consulted. Box 11 lists some examples of ethnobotanical data suitable for regression analysis.

Linear correlation

Correlations are calculated when variables cannot be designated as being either X (independent) or Y (dependent). Generally, correlations are computed between properties or quantifiable acts of the same individual which are not connected by cause and effect (see Box 12 for examples). An ethnobotanical example would be to analyse the relationship between age and daily amounts of firewood collected by women in a given area.

The simple correlation coefficient is calculatedas:

ΣXYr =

√√ΣX2ΣY2

A positive correlation implies that an increase in the value of one variable is accompanied by an increase in the second variable, while a negative correlation implies that a decrease in one variable is accompanied by an increase in the second variable. The coefficient of determination or ‘correlation index’ $r^2$ may be described as a measure of how much of the variability of one variable is accounted for by correlating it with the second variable. $r^2$, and not r, may be considered a measure of the strength of the straight-line relationship. Correlation coefficients between −0.4 and 0.4 may be regarded as indicating weak correlations.
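As a minimal sketch of the calculation, the following Python code computes r from deviations about the means and then the coefficient of determination. The five data pairs are hypothetical and serve only to show the arithmetic.

```python
import math

def pearson_r(xs, ys):
    """Simple correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Hypothetical paired measurements
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2
print(round(r, 4), round(r_squared, 4))
```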

No statistical assumptions need to be satisfied in order to compute correlation coefficients. For significance testing, however, X and Y values are assumed to stem from a bivariate normal population. Sometimes only one of the two variables comes from a normal population, and data may be transformed to alter this situation. If, as in most ethnobotanical applications, neither variable comes from a normal population, rank correlations come into play. Two widely known methods have been proposed by Spearman and Kendall, respectively.

SPEARMAN’S RANK CORRELATION COEFFICIENT

Instead of the actual values, the ranks of the measurements of each variable are used in computing Spearman’s rank correlation coefficient. The correlation index is also referred to as Spearman’s ρ:

$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n^3 - n}$$

where $d_i$ = rank of $X_i$ − rank of $Y_i$.

The value $r_s$ may range from −1 to +1 and has no units. Instead of using differences between ranks of pairs, the sums of the ranks of each pair can be used:

$$r_s = \frac{6 \sum S_i^2}{n^3 - n} - \frac{7n + 5}{n - 1}$$

where $S_i$ = rank of $X_i$ + rank of $Y_i$.
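The two ways of computing Spearman’s coefficient (from rank differences and from rank sums) can be checked against each other with a short Python sketch. The eight data pairs are hypothetical and contain no ties; the sum-based version uses the constant term (7n + 5)/(n − 1), which is the value for which the two forms agree when there are no ties. The function names are illustrative only.

```python
def ranks_without_ties(values):
    """Rank values from 1 (smallest) to n; assumes all values are distinct."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_from_differences(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n^3 - n), d_i = rank X_i - rank Y_i."""
    n = len(x)
    rx, ry = ranks_without_ties(x), ranks_without_ties(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

def spearman_from_sums(x, y):
    """r_s = 6 * sum(S_i^2) / (n^3 - n) - (7n + 5) / (n - 1), S_i = rank X_i + rank Y_i."""
    n = len(x)
    rx, ry = ranks_without_ties(x), ranks_without_ties(y)
    sum_s2 = sum((a + b) ** 2 for a, b in zip(rx, ry))
    return 6 * sum_s2 / (n ** 3 - n) - (7 * n + 5) / (n - 1)

# Hypothetical untied data: ages and headload weights
ages = [23, 29, 30, 40, 41, 45, 55, 65]
loads = [41, 37, 60, 53, 35, 44, 50, 71]

rs_d = spearman_from_differences(ages, loads)
rs_s = spearman_from_sums(ages, loads)
print(rs_d, rs_s)
```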


Box 11. Examples of data suitable for regression analysis.

• Relationship between fuelwood consumption and household size;

• Relationship between demand and harvesting activities for wild plant species;

• Relationship between tree diameter and bark yield;

• Relationship between distance of residence from forest and amount of forest products collected per unit of time;

• Relationship of distance to nearest health service and percentage of reliance on traditional medical practitioners.


If identical values occur repeatedly among the measurements of the X or Y variable, the observations are tied: “tied” means that the same value occurs more than once in a column. Tied values must be corrected for. Goodman-Kruskal’s γ, Kendall’s τ-b and Stuart’s τ-c are used to calculate correlation coefficients for tied data. The methods differ only in the way ties are treated.

KENDALL’S TAU-B RANK CORRELATION COEFFICIENT

According to Kendall, the correction for tied X and Y data, respectively, is computed as:

$$\sum \tau_{X \text{ or } Y} = \frac{\sum (\tau_i^3 - \tau_i)}{12}$$

where $\tau_i$ is the number of tied values in the i-th group of ties of the variable (X or Y).

If $\sum \tau_X$ and $\sum \tau_Y$ are zero, then the tie-corrected coefficient (see Box 16) is identical to Spearman’s rank correlation coefficient. The performances of the Spearman and Kendall coefficients are very similar. However, Spearman is recommended when n becomes large.

ANALYSIS OF VARIANCE OF MULTIPLE CORRELATION

A situation where the Y variable is associated with more than one X variable calls for multiple correlation analysis. As in multiple regression analysis, the hypothesis that no interrelationships exist among the variables,

$$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$$

is tested by:

$$F = \frac{\text{regression MS}}{\text{residual MS}}$$

The coefficient of multiple determination is:

$$R^2 = \frac{\text{regression SS}}{\text{total SS}} = 1 - \frac{\text{residual SS}}{\text{total SS}}$$

In the case of correlation, $R^2$ may be considered to be the amount of variability in any of the variables that is accounted for by correlating it with the other variables. A measure of goodness of fit is the adjusted coefficient of determination:

$$R_a^2 = 1 - \frac{\text{residual MS}}{\text{total MS}}$$

or

$$R_a^2 = 1 - \frac{n - 1}{n - m - 1} (1 - R^2)$$

where n is the number of observations and m the number of X variables in the model.

In contrast to $R^2$, which increases with each $X_j$ added to the model, $R_a^2$ will increase only if an added $X_j$ results in an improved fit of the regression equation. The square root of the coefficient of multiple determination is referred to as the multiple correlation coefficient, R. With a single X variable, R equals the absolute value of the Pearson correlation coefficient, r.

$$R = \sqrt{R^2}$$
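The relation between the coefficient of multiple determination and its adjusted form can be illustrated with the sums of squares reported for the means model in Box 15. The sketch below takes R² as the share of the total sum of squares explained by the model and assumes m = 5 predictors, inferred from the 5 model degrees of freedom in that output; the variable names are illustrative.

```python
import math

# Values taken from the means-model ANOVA table in Box 15
regression_ss = 1394.75
residual_ss = 1423.09
n = 36   # number of observations
m = 5    # assumption: number of predictors, taken from the model df

total_ss = regression_ss + residual_ss
r2 = regression_ss / total_ss

# Adjusted coefficient, first from mean squares ...
residual_ms = residual_ss / (n - m - 1)
total_ms = total_ss / (n - 1)
r2_adj_from_ms = 1 - residual_ms / total_ms

# ... and then from R^2 directly; both forms should agree
r2_adj = 1 - (n - 1) / (n - m - 1) * (1 - r2)

multiple_r = math.sqrt(r2)
print(round(r2, 2), round(multiple_r, 2), round(r2_adj, 2))
```

The adjusted value is lower than R² because it penalises each additional term in the model.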

Cross-tabulation

When variables are categorical, frequency tables (cross-tabulations) provide useful summaries. Categories may be unordered (e.g. forest, fallow, garden), ordered (high, medium, low) or formed by defining intervals on a continuous variable such as age (e.g. child, teen, adult and elderly). Such tables can be exploited in three ways:

1. purely descriptive, e.g. calculating percentages of cases falling in specified categories of cross-classifications;

2. test of independence or measure of association between two categorical variables;

3. model relationships among two or more categorical variables by fitting a log-linear model to the cell frequencies.

In a number of cases, typically arising from questionnaires, people may reply by categorizing dependent variables rather than by assigning

Box 12. Examples of data suitable for correlation analysis.

• Correlation between seed size and awn length in sorghum plants.

• Correlation between degree of sclerenchymatization (formation of sclerenchyme cells) and drought resistance in wild cereals.

• Correlation between age and knowledge.

Photo 11. This farmer depends on the collection of fodder from public land around Rurie Swamp, Meru District, Kenya, to feed his two cows. As public land is being taken by individuals for agriculture, the time he requires to fill two bags with suitable fodder increases. A regression analysis would allow a prediction of future time requirements.


exact quantities to the item in question. Even if numbers are later assigned to the categories so obtained, such data cannot be fed into standard analysis of variance procedures. Instead, discrete multivariate analysis is applied. A detailed account of discrete multivariate analysis techniques is given in Agresti (1996) and in Bishop, Fienberg and Holland (1995).

In multivariate discrete data, each individual is described by a set of attributes. All individuals with the same description are enumerated, and this count is entered into a cell of a cross-table. The resulting table is called a contingency table. The simplest contingency table is based on four cells and the categories depend on two variables. Models can be altered to describe either expected cell counts (m) or probabilities in each cell (p). Textbooks vary in the notation of cell counts, either as frequencies (f) or (m). In this paper f will be used.

Conventional methods for analysing relationships between two (or more) discrete variables are the χ² (Pearson chi-square) and G² (Goodman or likelihood-ratio chi-square) tests. The traditional χ² test has been used since the beginning of the twentieth century and is described in detail by Zar (1996). For ethnobotanical data, however, the G² test is recommended because of its better additive properties, i.e. the results of several G² tests can be summed (Sokal & Rohlf 1995; Tabachnick & Fidell 1996). The general structure of a contingency table is depicted in Box 13.

Different tests and measures exist for different table structures and also depend on whether or not the categories of the variables are ordered. The Pearson and likelihood-ratio chi-square statistics apply to tables larger than 2 × 2, i.e. r × c tables, and categories need not be ordered. Other tests include the following:

• McNemar’s test of symmetry: applies to square tables where the number of rows equals the number of columns. This structure arises when the same subject is measured twice (as in a paired comparison t test) or when subjects are paired or matched (e.g. cases and controls). In such a design, the categories of rows and columns are the same, but they are measured at different times or under different circumstances or for different groups of subjects. This test ignores the counts along the diagonal of the table and tests whether the counts above the diagonal differ from those below the diagonal. A significant result indicates a greater change in one direction than in the other.

• Phi, Cramer’s V, and contingency coefficient: like Pearson’s chi-square, these are measures for testing independence of variables in a table. They allow tables with unequal sample sizes to be compared. The three measures are scaled differently but test the same null hypothesis. For tables with two rows and two columns, phi and Cramer’s V are the same.

• Goodman-Kruskal’s gamma, Kendall’s tau-b, Stuart’s tau-c, Spearman’s rho and Somers’ d: these are appropriate when both row and column variables have ordered categories. The first three differ only in how ties are treated; the fourth is like the usual Pearson correlation except that the rank order of each value is used. Somers’ d is an asymmetric measure (in SYSTAT, the column variable is considered ‘dependent’).

• Fisher’s exact test and Yates’s corrected chi-square: these are used specifically in the analysis of 2 × 2 contingency tables with small sample sizes.

• Yule’s Q and Yule’s Y: these measure dominance in a 2 × 2 table. If either cell off the diagonal is 0, both statistics are equal to 1 (otherwise they are less than 1). These statistics are 0 only if the chi-square statistic is 0.
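The behaviour of Yule’s Q and Yule’s Y described in the last item can be illustrated with a short Python sketch. The formulas used are the standard textbook definitions, which are not spelled out in this paper, and both 2 × 2 tables are hypothetical. Cells are labelled a, b (first row) and c, d (second row).

```python
import math

def yules_q(a, b, c, d):
    """Yule's Q = (ad - bc) / (ad + bc) for a 2 x 2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / (a * d + b * c)

def yules_y(a, b, c, d):
    """Yule's Y = (sqrt(ad) - sqrt(bc)) / (sqrt(ad) + sqrt(bc))."""
    root_ad = math.sqrt(a * d)
    root_bc = math.sqrt(b * c)
    return (root_ad - root_bc) / (root_ad + root_bc)

# Hypothetical table with an empty off-diagonal cell: both statistics equal 1
q_empty_cell = yules_q(12, 0, 5, 9)
y_empty_cell = yules_y(12, 0, 5, 9)

# Hypothetical table with proportional rows (chi-square of 0): both equal 0
q_independent = yules_q(10, 20, 5, 10)
y_independent = yules_y(10, 20, 5, 10)

print(q_empty_cell, y_empty_cell, q_independent, y_independent)
```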

Because of their wide range of applicability, Pearson chi-square and likelihood-ratio chi-square are presented in more detail in the following paragraphs.

CHI-SQUARE ANALYSIS OF CONTINGENCY TABLES

With two variables (A, B) under consideration, the observed frequency is denoted as $f_{ij}$, whereby i refers to rows and j to columns of the contingency table. For the computation of chi-square, only frequencies (counts) are used, never proportions or percentages.

In the simplest case of a 2 × 2 table, the cells are:

             B
           1      2
A    1    f11    f12
     2    f21    f22

Box 13. General appearance of contingency tables

Variable A      Variable B                                    Totals
                category 1    category 2    category n
category 1
category 2
category 3
category n
Totals                                                        Grand total

The total number of observations in all cells of the table, the grand total, is

$$n = \sum_{i=1}^{\text{rows}} \sum_{j=1}^{\text{columns}} f_{ij}$$

or simply $\sum\sum f_{ij}$. For chi-square analysis of contingency tables, the following formula is used:

$$\chi^2 = \sum\sum \frac{(f_{ij} - \hat{f}_{ij})^2}{\hat{f}_{ij}}$$

or (mathematically equivalent):

$$\chi^2 = \sum\sum \frac{f_{ij}^2}{\hat{f}_{ij}} - n$$

$$\chi^2 = n \left( \sum\sum \frac{f_{ij}^2}{R_i C_j} - 1 \right)$$

In the above formulas, $\hat{f}_{ij}$ refers to the frequency expected in row i and column j if the null hypothesis is true. The expected frequency in row i and column j is calculated as:

$$\hat{f}_{ij} = \frac{\text{row } i \text{ total} \times \text{column } j \text{ total}}{\text{grand total}} = \frac{R_i C_j}{n}$$
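The equivalence of the chi-square formulas can be verified by expanding the square. The short derivation below uses only the symbols defined above and the fact that both the observed and the expected frequencies sum to the grand total n.

```latex
\begin{aligned}
\chi^2 &= \sum\sum \frac{(f_{ij} - \hat{f}_{ij})^2}{\hat{f}_{ij}}
        = \sum\sum \frac{f_{ij}^2}{\hat{f}_{ij}}
          - 2 \sum\sum f_{ij} + \sum\sum \hat{f}_{ij} \\
       &= \sum\sum \frac{f_{ij}^2}{\hat{f}_{ij}} - 2n + n
        = \sum\sum \frac{f_{ij}^2}{\hat{f}_{ij}} - n \\
       &= \sum\sum \frac{n \, f_{ij}^2}{R_i C_j} - n
        = n \left( \sum\sum \frac{f_{ij}^2}{R_i C_j} - 1 \right)
        \qquad \text{since } \hat{f}_{ij} = \frac{R_i C_j}{n}.
\end{aligned}
```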

The χ² test is intended to be used when the values of observations are small enough for sampling variation to leave some doubt as to the interpretation of data (Box 14). The lower limit to the sample size for which the method is sufficiently reliable is an expected value of 5, although the actually observed value may be lower. If some cells in the table show expected values smaller than 5, the respective rows or columns may be merged, combining two categories into one.

Apart from demonstrating a significant association among variables by a contingency table, one may want to assess the strength of that association. A measure of association is:

$$\chi^2 / n$$

where n is the total number of observations.

LIKELIHOOD-RATIO CHI-SQUARE OR GOODMAN SQUARE ANALYSIS OF CONTINGENCY TABLES

The likelihood-ratio chi-square or Goodman square test statistic is additive for nested models. Two models are nested if all the effects of the first are a subset of the second. The likelihood-ratio chi-square is additive because the statistic for the second model can be subtracted from that of the first. The difference provides a test of the additional effects, i.e. the difference between the two statistics has an asymptotic chi-square distribution with degrees of freedom equal to the difference between those for the two model chi-squares (or the difference between the number of effects in the two models). This property does not hold for the Pearson chi-square. Goodman chi-square statistics are calculated as follows:

$$G^2 = 2 \ln L = 2 \sum\sum (\text{observed}) \ln \left( \frac{\text{observed}}{\text{expected}} \right)$$

The summation is done over all cells in the table. Both chi-square and Goodman test statistics are used to examine interactions among variables by testing for goodness of fit of the observed frequency distribution to the expected frequency distribution representing H0. The greater the departure of the actual values from the expected values, the greater are the values of G² and χ². Both test statistics are distributed as χ². If the χ² or G² value is greater than the tabulated χ² at the desired α-level, then the null hypothesis of no interaction between variables is rejected.
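Both statistics can be computed directly from a table of observed frequencies. The Python sketch below uses the observed counts of Table 3 (habitat preferences by professional category) and also evaluates the shortcut form of the Pearson statistic based on row and column totals. Function names are illustrative; the p-value would still have to be looked up in a χ² table or obtained from a statistics package.

```python
import math

# Observed frequencies from Table 3 (rows: professional categories;
# columns: garden, early fallow, old fallow, bushland, forest)
observed = [
    [17, 42, 46, 39, 40],
    [15, 42, 43, 43, 36],
    [24, 47, 54, 59, 74],
    [4, 17, 14, 23, 35],
]

def expected_frequencies(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi_square(table):
    expected = expected_frequencies(table)
    return sum((f - e) ** 2 / e
               for row, exp_row in zip(table, expected)
               for f, e in zip(row, exp_row))

def chi_square_shortcut(table):
    """Equivalent form n * (sum of f^2 / (R_i * C_j) - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    s = sum(f * f / (r * c)
            for row, r in zip(table, row_totals)
            for f, c in zip(row, col_totals))
    return n * (s - 1)

def likelihood_ratio_g2(table):
    """G^2 = 2 * sum of f * ln(f / expected); empty cells contribute 0."""
    expected = expected_frequencies(table)
    return 2 * sum(f * math.log(f / e)
                   for row, exp_row in zip(table, expected)
                   for f, e in zip(row, exp_row) if f > 0)

chi2 = pearson_chi_square(observed)
chi2_short = chi_square_shortcut(observed)
g2 = likelihood_ratio_g2(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(chi2, chi2_short, g2, df)
```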

Frequency tests are non-parametric as no assumptions are made about the underlying population distribution. There are, however, requirements for independent samples, adequate sample size, and the minimum size of expected frequency in each cell. Problems may occur if there are too few cases relative to the number of categories, and the following guidelines should be borne in mind (after Tabachnick & Fidell 1996):

• the number of cases should be at least five times the number of cells;

• the expected cell frequency should be >1 in all cells and <5 in no more than 20% of the cells;

• if the contingency table contains smaller expected frequencies, Fisher’s exact test is recommended.
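The guidelines above lend themselves to a simple automated check. The following sketch applies them to the observed counts of Table 3; the function name and the keys of the returned dictionary are hypothetical and chosen only for illustration.

```python
def check_frequency_guidelines(table):
    """Evaluate the sample-size guidelines for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [r * c / n for r in row_totals for c in col_totals]
    cells = len(expected)
    return {
        # at least five cases per cell
        "cases_at_least_5x_cells": n >= 5 * cells,
        # every expected frequency above 1
        "all_expected_above_1": min(expected) > 1,
        # share of cells with expected frequency below 5 (should be <= 0.20)
        "share_expected_below_5": sum(e < 5 for e in expected) / cells,
    }

# Observed frequencies from Table 3
observed = [
    [17, 42, 46, 39, 40],
    [15, 42, 43, 43, 36],
    [24, 47, 54, 59, 74],
    [4, 17, 14, 23, 35],
]

result = check_frequency_guidelines(observed)
print(result)
```

If the share of cells with expected frequencies below 5 exceeded 0.20, merging categories or an exact test would be the options named in the text.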


Box 14. Examples of data suitable for Pearson chi-square or Goodman square analysis.

• Different user groups using different plant parts.

• Different user groups gaining different amounts of money from the sale of their products.

• Effectiveness of a drug (plant species) to combat a certain ailment.

• Number of incidents in which a plant is mentioned to cure a certain ailment.


The theory of the previous chapter is exemplified in four case studies:

• The effects of soil nutrient levels and light supply on alkaloid accumulation of Tabernaemontana pachysiphon Stapf (Apocynaceae) were studied under experimental conditions;

• Data on diameter, bark weight and bark thickness were sampled from Rytigynia trees, valued for their anthelmintic properties around Bwindi Impenetrable Forest;

• A correlation was calculated to examine whether a relationship exists between the age of women collecting firewood from Ragati forest, Mt. Kenya, and the average weight of their headloads;

• The preferences of women and men specialists and non-specialists for harvesting of medicinal plants in five different habitat types were investigated around Bwindi Impenetrable National Park.

Analysis of variance

An example of analysis of variance using the general linear model option in SYSTAT has been chosen to demonstrate the effects of three levels of fertilizer and two light intensities on the accumulation of the alkaloid apparicine in leaves of Tabernaemontana pachysiphon. The output of a full factorial analysis of variance model (all interactions included and tested automatically) is shown in Box 15. Alternatively, the general linear model option in SYSTAT and SAS allows the interactions to be tested to be specified manually. The means model is also used to test hypotheses when missing cells are encountered or to test hypotheses about specific cell means. No significant interaction between fertilizer and light intensity affects the accumulation of apparicine (p = 0.48). Thus, the effect of either factor can be interpreted independently. The Kolmogorov-Smirnov one-sample test is used to compare the shape and location of a sample distribution with a uniform, normal or chi-square distribution. The Lilliefors test uses standardized variables (mean of zero and standard deviation of 1) and tests whether data are normally distributed. A probability of p > 0.05 indicates that the standardized data do not deviate significantly from a normal distribution.
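The F-ratios reported in Box 15 follow from the sums of squares and degrees of freedom in the same table: each mean square is divided by the error mean square. The sketch below recomputes them in Python; the dictionary layout is an illustrative choice, and the probabilities would still need an F table or a statistics package.

```python
# Sums of squares and degrees of freedom as reported in Box 15
effects = {
    "F$": (426.40, 2),      # fertilizer
    "L$": (1047.85, 1),     # light
    "F$*L$": (71.94, 2),    # fertilizer x light interaction
}
error_ss, error_df = 1423.09, 30

# Error mean square: the denominator of every F-ratio
error_ms = error_ss / error_df

f_ratios = {}
for name, (ss, df) in effects.items():
    mean_square = ss / df
    f_ratios[name] = mean_square / error_ms

for name, value in f_ratios.items():
    print(f"{name:6s} F = {value:.2f}")
```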


Applications of general linear models

Box 15. Analysis of variance computation using the general linear model option (data from Höft, 1995).

Effects coding used for categorical variables in model.

Categorical values encountered during processing are:
F(ertilizer)$ (3 levels)
    F0, F1, F2
L(ight)$ (2 levels)
    L-, L+

Dep Var: Apparicine    N: 36    Multiple R: 0.70    Squared multiple R: 0.49

Analysis of Variance

Source        SS     df        MS    F-ratio      P
F$        426.40      2    213.20       4.49   0.02
L$       1047.85      1   1047.85      22.09   0.00
F$*L$      71.94      2     35.97       0.76   0.48

Error    1423.09     30     47.44

Durbin-Watson D Statistics     1.971
First Order Autocorrelation    0.009

Means Model

Dep Var: Apparicine    N: 36    Multiple R: 0.70    Squared multiple R: 0.49

H0: All means equal.

Unweighted Means Model
Analysis of Variance

Source        SS     df        MS    F-ratio      P
Model    1394.75      5    278.95       5.88   0.00
Error    1423.09     30     47.44

Durbin-Watson D Statistics     1.971
First Order Autocorrelation    0.009

COL/ROW    F$    L$
1          F0    L-
2          F0    L+
3          F1    L-
4          F1    L+
5          F2    L-
6          F2    L+


Regression analysis

Data on bark volume and diameter of Rytigynia kiwuensis (K. Krause) Robyns (Rubiaceae) were evaluated using regression analysis. A quadratic model was fitted to predict the total bark volume of the bottom 2 metres of Rytigynia plants from diameter measurements between 1 and 24 cm (from Kamatenesi 1997). The relationship is depicted in Figure 10. Bark volumes within reach for harvesting from a standing tree (i.e. up to 2 m height) can be predicted with sufficient accuracy from diameter measurements according to the following formula, where Y is the bark volume [cm³] and X the diameter at breast height [cm]:

$$Y = -105.75 + 99.11 X + 8.88 X^2 \qquad (r^2 = 0.795)$$

Likewise, a quadratic regression model can be fitted from the trees’ diameters to predict the total bark dry weight of the bottom 2 metres of the plant (see graph on cover page).
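The fitted quadratic equation for bark volume can be wrapped in a small function for field use. The sketch below is an illustration only: the function name and the range check are assumptions, the latter reflecting that the model was fitted to diameters between 1 and 24 cm and should not be extrapolated.

```python
MIN_DIAMETER_CM = 1.0    # lower end of the diameters used to fit the model
MAX_DIAMETER_CM = 24.0   # upper end of the diameters used to fit the model

def predict_bark_volume(diameter_cm):
    """Predicted bark volume of the bottom 2 m of the plant (units as in Fig. 10)."""
    if not MIN_DIAMETER_CM <= diameter_cm <= MAX_DIAMETER_CM:
        raise ValueError("diameter outside the range used to fit the model")
    return -105.75 + 99.11 * diameter_cm + 8.88 * diameter_cm ** 2

# Predictions for a few diameters within the fitted range
for d in (5, 10, 20):
    print(d, round(predict_bark_volume(d), 2))
```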

Box 15. continued

Using unweighted means.
Post Hoc test of Apparicine

Using model MSE of 47.436 with 30 df.
Matrix of pairwise mean differences:

          1        2        3        4        5        6
1      0.00
2     -7.39     0.00
3     10.08    17.48     0.00
4     -1.23     6.16   -11.31     0.00
5      8.15    15.54    -1.94     9.38     0.00
6     -6.24     1.15   -16.32    -5.01   -14.39     0.00

Tukey HSD Multiple Comparisons.
Matrix of pairwise comparison probabilities:

          1        2        3        4        5        6
1      1.00
2      0.36     1.00
3      0.16     0.00     1.00
4      1.00     0.56     0.08     1.00
5      0.43     0.01     1.00     0.28     1.00
6      0.59     1.00     0.01     0.78     0.03     1.00

Kolmogorov-Smirnov One Sample Test using Normal (0.00, 1.00) distribution

Variable      N-of-Cases    MaxDif    Lilliefors Probability (2-tail)
Apparicine         36.00      0.10    0.43

Fig. 10. Relationship between total bark volume of the bottom 2 metres of the plant and diameter at breast height in Rytigynia kiwuensis, Rubiaceae (data from Kamatenesi 1997). [Figure: x-axis, diameter at breast height [cm], 0 to 24; y-axis, total bark volume within 2 m plant height [cm³], 0 to 8000.]


Box 16. Computation of Spearman’s rank correlation coefficient (rs) using age and firewood collection data [kg/d] of women in Ragati Forest, Mt. Kenya (data from Wanja Waiganjo, 1999).

Age (X)   Headload (Y)   rank of X   rank of Y      di       di²
40        53               8          18.5        -10.5    110.25
41        51               9          16           -7       49
50        51              13.5        16           -2.5      6.25
44        71              10.5        21          -10.5    110.25
38        51               6.5        16           -9.5     90.25
28        40               2.5         7           -4.5     20.25
50        44              13.5        11            2.5      6.25
55        41              18          10            8       64
30        37               5           2            3        9
51        45              16          12.5          3.5     12.25
38        40               6.5         7           -0.5      0.25
44        38              10.5         3.5          7       49
28        53               2.5        18.5        -16      256
29        40               4           7           -3        9
45        60              12          20           -8       64
70        50              20.5        14            6.5     42.25
65        40              19           7           12      144
51        38              16           3.5         12.5    156.25
70        35              20.5         1           19.5    380.25
51        45              16          12.5          3.5     12.25
23        40               1           7           -6       36

n = 21

Σ di² = 1627

$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n^3 - n} = 1 - \frac{6 (1627)}{9240} = -0.056$$

To test H0: (rs)0.05, 21 = 0.370 (critical value)

|rs| < 0.370, therefore do not reject H0 (p = 0.05)

For the correction of ties among AGE: there are two measurements of 70, three of 51, two of 50, two of 44, two of 38 and two of 28.

$$\sum \tau_X = \frac{(2^3 - 2) + (3^3 - 3) + (2^3 - 2) + (2^3 - 2) + (2^3 - 2) + (2^3 - 2)}{12} = \frac{54}{12} = 4.5$$

For the correction of ties among HEADLOADS: there are two measurements of 53, three of 51, two of 45, five of 40 and two of 38.

$$\sum \tau_Y = \frac{(2^3 - 2) + (3^3 - 3) + (2^3 - 2) + (5^3 - 5) + (2^3 - 2)}{12} = \frac{162}{12} = 13.5$$

$$(r_s)_c = \frac{(n^3 - n)/6 - \sum d_i^2 - \sum \tau_X - \sum \tau_Y}{\sqrt{[(n^3 - n)/6 - 2 \sum \tau_X] \, [(n^3 - n)/6 - 2 \sum \tau_Y]}} = \frac{(21^3 - 21)/6 - 1627 - 4.5 - 13.5}{\sqrt{[(21^3 - 21)/6 - 2 (4.5)] \, [(21^3 - 21)/6 - 2 (13.5)]}} = \frac{-105}{1521.97} = -0.069$$

To test H0: (rs)0.05, 21 = 0.370 (critical value)

|(rs)c| < 0.370, therefore do not reject H0 (p = 0.05)


Correlation

In the following example it was tested whether there is a significant correlation between the age (X) and the amount of firewood (Y) collected by 21 women in Ragati forest, Mt. Kenya. Since the data are not normally distributed, Spearman’s rank correlation coefficient (rs) was calculated. The null hypothesis of no correlation takes the form

H0: ρ = 0.


The Greek letter rho (ρ) represents the correlation coefficient. Box 16 (page 32) details the major steps involved in hypothesis testing, with special attention paid to the correction for values which appear in a column more than once (“tied data”). The “critical value” can be obtained from statistical tables. Standard computer packages calculate these steps automatically. If the absolute value of the calculated correlation coefficient, rs, is smaller than the critical value, the null hypothesis of no correlation cannot be rejected. In this example there is thus no significant correlation between age and average amount of firewood collected.
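The steps of Box 16 can be retraced with a short Python sketch using the same 21 pairs of age and headload values. Tied values receive the mean of the rank positions they occupy, and the tie correction uses the divisor 12 given in the formula for tied data; the function names are illustrative and not taken from any package.

```python
import math
from collections import Counter

# Age (X) and headload (Y) data from Box 16
age = [40, 41, 50, 44, 38, 28, 50, 55, 30, 51, 38,
       44, 28, 29, 45, 70, 65, 51, 70, 51, 23]
load = [53, 51, 51, 71, 51, 40, 44, 41, 37, 45, 40,
        38, 53, 40, 60, 50, 40, 38, 35, 45, 40]

def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their positions."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    mean_rank = {v: sum(p) / len(p) for v, p in positions.items()}
    return [mean_rank[v] for v in values]

def tie_correction(values):
    """Sum of (t^3 - t) / 12 over groups of tied values (untied values add 0)."""
    return sum(t ** 3 - t for t in Counter(values).values()) / 12

n = len(age)
rank_x, rank_y = average_ranks(age), average_ranks(load)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))  # 1627 in Box 16

# Uncorrected Spearman coefficient
rs = 1 - 6 * sum_d2 / (n ** 3 - n)

# Coefficient corrected for ties
tau_x, tau_y = tie_correction(age), tie_correction(load)
base = (n ** 3 - n) / 6
rs_corrected = (base - sum_d2 - tau_x - tau_y) / math.sqrt(
    (base - 2 * tau_x) * (base - 2 * tau_y))

print(sum_d2, rs, tau_x, tau_y, rs_corrected)
```

Either coefficient is then compared in absolute value with the tabulated critical value.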

Chi-square analysis of contingency tables

Chi-square analysis was applied to test whether or not male and female herbalists prefer different habitat types for the collection of medicinal plants (see Table 3). The frequency of mention is shown in a two-dimensional contingency table and the expected frequencies are indicated in brackets below the actual observations. A typical computer output (SYSTAT) is given in Box 17 (see Appendix, Table 23, page 42 for a more detailed output of analysis results). Log-linear modelling is applied to predict cell frequencies in multi-way tables. Some theoretical background on the analysis of multi-dimensional contingency tables and the prediction of cell frequencies is given in the Appendix (page 44).

Alternatively, Table 3 can be analysed in a different way. The values for female and male professionals and for female and male non-professionals can be combined to yield two groups: female and male users. These groups can be tested on paired combinations of habitats. Furthermore, pairwise comparisons can be made between female and male professionals and female and male non-professionals for each possible pair of habitats.


Table 3. Two-dimensional contingency table with two variables (habitats and professional category) showing the frequency of mention of the habitat preference for the collection of medicinal plants around Bwindi Impenetrable Forest (data from Kyoshabire 1998).

                                    Habitat type
Professional category               Garden   Early fallow   Old fallow   Bushland   Forest     N
Traditional birth attendants (f)      17         42            46          39        40      184
  (expected)                        (15.5)     (38.1)        (40.5)      (42.3)    (47.8)
Women herbalists (f)                  15         42            43          43        36      179
  (expected)                        (15.0)     (37.1)        (39.4)      (41.1)    (46.4)
Herbalists (m)                        24         47            54          59        74      258
  (expected)                        (21.7)     (53.5)        (56.7)      (59.3)    (66.9)
Non-specialists (m)                    4         17            14          23        35       93
  (expected)                         (7.8)     (19.3)        (20.5)      (21.4)    (24.1)

Box 17. Cross-tabulation statistics for Table 3.

Test statistics                  Value    df    p (probability)
Pearson Chi-square               17.34    12    0.14
Likelihood ratio Chi-square      17.59    12    0.13

Coefficient                      Value    Asymptotic SE (standard error)
Phi                               0.16
Cramer V                          0.09
Contingency                       0.15
Goodman-Kruskal Gamma            -0.01    0.04
Spearman Rho                     -0.01    0.04
Lambda (column dependent)         0.02    0.02
Uncertainty (column dependent)    0.01    0.00
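The first three coefficients in Box 17 can be derived from the Pearson chi-square value and the table dimensions. The sketch below uses the standard textbook definitions of phi, Cramer’s V and the contingency coefficient (they are not spelled out in this paper) together with the chi-square value of 17.34 and the grand total of 714 mentions in Table 3.

```python
import math

chi2 = 17.34        # Pearson chi-square from Box 17
n = 714             # grand total of Table 3 (184 + 179 + 258 + 93)
rows, cols = 4, 5   # professional categories x habitat types

# Phi: square root of chi-square per observation
phi = math.sqrt(chi2 / n)

# Cramer's V: rescales by the smaller of (rows - 1) and (cols - 1)
cramers_v = math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))

# Contingency coefficient
contingency = math.sqrt(chi2 / (chi2 + n))

print(round(phi, 2), round(cramers_v, 2), round(contingency, 2))
```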

References

Agresti, A. 1996. An introduction to categorical data analysis. John Wiley & Sons, New York.

Berenson, M.L., Levine, D.M. & Goldstein, M. 1983. Intermediate statistical methods and applications. A computer package approach. Prentice-Hall, Inc., Englewood Cliffs, New Jersey.

Bishop, Y.M.M., Fienberg, S.E. & Holland, P.W. 1995. Discrete multivariate analysis. Theory and practice. The Massachusetts Institute of Technology, USA.

BMDP Statistical Software Inc. 1999. BMDP Statistical Software. Los Angeles, California, USA.

Bray, J.R. & Curtis, J.T. 1957. An ordination of the upland forest communities of southern Wisconsin. Ecol. Monographs 27: 325-349.

Casgrain, P. 1999. R-package Version 4. University of Montreal, Canada.

Causton, D.R. 1988. Introduction to vegetation analysis. Unwin Hyman, London.

Fischer, H. & Bemmerlein, F. 1986. Numerische Methoden in der Ökologie. Unpublished Manuscript. University of Erlangen, Germany.

Gnanadesikan, R. 1977. Methods for statistical data analysis of multivariate observations. John Wiley & Sons, New York, USA.

Hill, M.O. 1979. TWINSPAN, a FORTRAN program for arranging multivariate data in an ordered two-way table by classification of the individuals and attributes. Section of Ecology and Systematics, Cornell University, Ithaca, New York.

Höft, M.G. 1995. Aut-ecological studies on Tabernaemontana pachysiphon Stapf and Rauvolfia mombasiana Stapf (Apocynaceae) in the Shimba Hills (Kenya) with special reference to their alkaloid contents. Bayreuther Forum Ökologie, vol. 17, Bayreuth, Germany.

Johnson, R.A. & Wichern, D.W. 1988. Applied multivariate statistical analysis. Prentice-Hall, Inc., Englewood Cliffs, New Jersey.

Jongman, R.H.G., ter Braak, C.J.F. & van Tongeren, O.F.R. 1987. Data analysis in community and landscape ecology. Pudoc, Wageningen.

Kachigan, S.K. 1986. Statistical analysis: an interdisciplinary introduction to univariate and multivariate methods. Radius Press.

Kamatenesi, M.M. 1997. Utilization of the medicinal plant ‘nyakibazi’ (Rytigynia spp.) in the multiple use zones of Bwindi Impenetrable National Park, Uganda. Unpublished M.Sc. thesis. Makerere University, Uganda.

Kent, M. & Coker, P. 1994. Vegetation description and analysis. A practical approach. Wiley, Chichester.

Kyoshabire, M. 1998. Medicinal plants and the herbalist preferences around Bwindi Impenetrable Forest, Uganda. Unpublished M.Sc. thesis. Makerere University, Uganda.

Lance, G.N. & Williams, W.T. 1967. A general theory for classificatory sorting strategies. 1. Hierarchical systems. Computer Journal 9: 373-380.

Legendre, P. & Legendre, L. 1998. Numerical ecology. Elsevier Scientific Publishing Company, Amsterdam.

Legendre, P. & Vaudor, A. 1991. The R Package: Multidimensional analysis, spatial analysis. Département de sciences biologiques, Université de Montréal, Montreal.

Ludwig, J.A. & Reynolds, J.F. 1988. Statistical ecology. A primer on methods and computing. John Wiley & Sons, New York.

MjM Software Design. 1999. PC-ORD for Windows Version 4. Melillo Consulting Inc., Somerset, New Jersey, USA.

Orlóci, L. 1978. Multivariate analysis in vegetation research. Dr. W. Junk b. v., Publishers, The Hague.

PC-ORD. Multivariate analysis of ecological data. MjM Software Design, Gleneden Beach.

Peters, C.M. 1996. Beyond nomenclature and use: a review of ecological methods for ethnobotanists. In: M.N. Alexiades (ed.), Selected Guidelines for Ethnobotanical Research. The New York Botanical Garden: 241-276.

Phillips, O.L.B. & Gentry, A.H. 1993a. The useful plants of Tambopata, Peru. I: Statistical hypothesis tests with a new quantitative technique. Economic Botany 47: 15-32.

Phillips, O.L.B. & Gentry, A.H. 1993b. The useful plants of Tambopata, Peru. II: Additional hypothesis testing in quantitative ethnobotany. Economic Botany 47: 33-43.

Pielou, E.C. 1984. The interpretation of ecological data. A primer on classification and ordination. Wiley, New York.

Poole, R.W. 1974. An introduction to quantitative ecology. McGraw-Hill, Kogakusha Ltd., Tokyo.

Prance, G.T. 1991. What is ethnobotany today? Journal of Ethnopharmacology 32: 209-216.

Rohlf, F.J. 1985. NTSYS - Numerical Taxonomy System of Multivariate Statistical Programs. Cornell Theory Center Software Documentation, Ithaca, New York, USA.

Romesburg, H.C. 1984. Cluster analysis for researchers. Wadsworth Inc., Lifetime Learning, Belmont, California.

SAS Institute Inc. 1999. SAS for Windows, Version 6.12. SAS Institute Inc., Cary, North Carolina, USA.

Sneath, P.H.A. & Sokal, R.R. 1973. Numerical taxonomy. Freeman, San Francisco, CA.

Sokal, R.R. & Rohlf, F.J. 1995. Biometry. Freeman, New York.

SPSS Inc. 1999. SPSS Version 10. SPSS Inc., Chicago, Illinois, USA.

SPSS Inc. 1998. Systat Version 8. SPSS Inc., Chicago, Illinois, USA.

Tabachnick, B.G. & Fidell, L.S. 1996. Using multivariate statistics. Harper Collins, New York.

Ter Braak, C.J.F. 1988a. CANOCO - a Fortran program for canonical community ordination by (partial) (detrended) (canonical) correspondence analysis, principal component analysis and redundancy analysis. Agricultural Mathematics Group, Wageningen.

Ter Braak, C.J.F. 1988b. CANOCO - an extension of DECORANA to analyze species-environment relationships. Vegetatio 75: 159-160.

Wanja Waiganjo, F. 1999. Forest plants used in Ragati, Mt. Kenya: their taxonomy, exploitation, economic values and conservation status. Unpublished M.Sc. thesis, Kenyatta University, Kenya.

Zar, J.H. 1996. Biostatistical Analysis. Third edition, Prentice-Hall International Inc., New Jersey.

Further reading:

Bailey, N.T. 1981. Statistical methods in biology. Hodder and Stoughton, London, Sydney, Auckland, Toronto.

Digby, P.G.N. & Kempton, R.A. 1987. Multivariate analysis of ecological communities. Chapman & Hall, London.

Feoli, E. & Orlóci, L. 1991. Computer assisted vegetation analysis. In: H. Lieth: Handbook of vegetation science. Kluwer, Dordrecht.

Gauch, H.G. 1982. Multivariate analysis in community ecology. Cambridge University Press, Cambridge.

Greig-Smith, P. 1983. Quantitative plant ecology. Blackwell Scientific Publishers, Oxford.

Krebs, C.J. 1989. Ecological methodology. Harper & Row, New York.

Krzanowski, W.J. & Marriott, F.H.C. 1994. Multivariate analysis. Arnold, London.


Acknowledgements

Having a manuscript on statistical methods for ethnobotanists has been a wish expressed by many students supported over the years by the People and Plants Initiative. The current working paper evolved from a training course on quantitative methods in ethnobiology which took place from 20 August to 1 September in Nairobi and Kilifi, Kenya. The course was taught mainly by Javier Caballero Nieto and Gary Martin and used the Kenyan woodcarving industry as an example. From the material developed during the course, S.K. Barik wrote a manual on cluster analysis. This was complemented by a discussion of the application of several multivariate and statistical methods using various kinds of data sets, mostly from work supported by People and Plants.

The authors wish to thank Remigius Bukenya-Ziraba, Tony Cunningham, Jeremy Midgley and John Tabuti for their invaluable suggestions and comments on various versions of this Working Paper. Malcolm Hadley, Timothy Johns and Ebi Kimanani carefully read the manuscript and made useful suggestions for improvement. Nevertheless, we realize that this document still has shortcomings and possibly some errors which may have been overlooked, and the authors assume full responsibility for those.

Financial support from the multidisciplinary Sahel-Sudan Environmental Research Initiative (SEREIN) of the Danish International Development Agency (DANIDA) enabled Anne Mette Lykke to contribute to this publication.


PEOPLE AND PLANTS WORKING PAPER 6, JUNE 1999Quantitative ethnobotanyM. HÖFT, S.K. BARIK & A.M. LYKKE


Table 4. Basic data matrix for the ‘Wood identification task’.

Species Respondents

R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R13 R14 R15 R16

S1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

S2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0

S3 1 1 1 1 1 1 1 0 0 1 1 1 0 1 1 0

S4 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1

S5 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0

S6 1 1 1 1 1 1 0 1 1 1 1 0 1 0 1 0

S7 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1

S8 1 1 1 1 1 1 1 0 0 0 1 1 0 0 1 0

Table 5. Basic data matrix for the ‘Paired comparison of wood species’.

Species Respondents

R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R13 R14 R15 R16

S1 1 1 1 2 2 1 2 1 1 1 1 1 1 4 1 2

S2 5 5 5 5 4 5 4 5 5 2 5 5 5 5 5 5

S3 3 3 1 2 3 4 5 4 4 2 4 4 4 1 3 3

S4 2 2 1 1 1 2 1 3 2 5 2 2 2 1 2 1

S5 4 4 1 4 4 3 3 2 3 2 3 3 3 1 3 4

Tables 4 through 14 and Figures 11 and 12 detail the results of the matrix transformations applied in cluster analysis. The necessary transformation steps are summarized in Box 8 (page 18).

Table 4 presents the responses (yes/no) of 16 woodcarvers who were asked to identify pieces of wood from eight different species. To demonstrate the following transformations, the last three columns and rows of the basic wood identification data matrix (Table 4) have been taken separately to constitute a smaller 3 x 3 data matrix (Table 7). The matrix represents the ability of the respondents (R14...R16) to identify the wood species (S6...S8). In a normal mode analysis the degree of similarity of the responses must be determined for each pair of respondents (i.e. columns of the data matrix). In this example, R14 and R16 could each identify only one species, S7, while R15 could identify all three species. Hence, R14 and R16 are more similar to each other than either is to R15.

In order to compute the similarity index it is necessary to be familiar with the concepts and terms relating to an association or similarity contingency table. Table 8 shows a typical contingency table using binary (yes/no coded as 1/0) data for two respondents (R14 and R15). The case that both respondents recognize the species occurs once, the case that only R15 recognizes it occurs twice, and the cases that only R14 or neither of them recognizes it occur zero times. A resemblance matrix for all respondents is subsequently built from the 2 x 2 similarity contingency tables (Table 9).
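The contingency counts and Jaccard's index described above can be sketched in a few lines of Python. This is only an illustration: the function names (`respondent`, `contingency`, `jaccard`) are hypothetical, the data are copied from Table 4, and the index is taken in its usual form a / (a + b + c), where a counts joint "yes" answers and b and c count the mismatches.

```python
# Table 4: rows are species S1..S8, columns are respondents R1..R16
# (1 = wood identified, 0 = not identified).
TABLE_4 = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # S1
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],  # S2
    [1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0],  # S3
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1],  # S4
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0],  # S5
    [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0],  # S6
    [1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1],  # S7
    [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0],  # S8
]


def respondent(matrix, number):
    """Return the column of answers for respondent R<number> (1-based)."""
    return [row[number - 1] for row in matrix]


def contingency(x, y):
    """Counts (a, b, c, d): both yes, only x yes, only y yes, both no."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    return a, b, c, d


def jaccard(x, y):
    """Jaccard's index a / (a + b + c); joint absences (d) are ignored.

    Assumes at least one of the two respondents gave a 'yes' answer.
    """
    a, b, c, _ = contingency(x, y)
    return a / (a + b + c)


r14, r15, r16 = (respondent(TABLE_4, n) for n in (14, 15, 16))

# Last three species only (S6..S8), as in Tables 7 and 8.
print(contingency(r14[-3:], r15[-3:]))  # (1, 0, 2, 0)

# All eight species, as in Table 9.
print(round(jaccard(r14, r15), 3))  # 0.625
print(round(jaccard(r14, r16), 3))  # 0.6
```

Applied to every pair of columns, the same function yields the lower-triangular resemblance matrix of Table 9.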

Appendix

Table 6. The standardized data matrix for the ‘Paired comparison of wood species’ data matrix (columns R1 to R5).

R1 R2 R3 R4 R5

S1 0.538 0.538 0.538 0.691 0.691

S2 0.394 0.394 0.394 0.394 0.867

S3 0.109 0.109 1.852 0.980 0.109

S4 0.122 0.122 0.854 0.854 0.854

S5 1.065 1.065 1.942 1.065 1.065


Table 7. A 3 x 3 data matrix constructed from the wood identification data matrix (Table 4) by taking its last 3 rows and columns for demonstration of the computation of similarity functions.

Respondents

Species R14 R15 R16

S6 0 1 0

S7 1 1 1

S8 0 1 0

Table 8. A 2 x 2 contingency similarity table based on the data from Table 7 for the Respondents R14 and R15.

                       Respondent R15
                       Yes          No           Total
Respondent R14   Yes   1            0            1 + 0 = 1
                 No    2            0            2 + 0 = 2
Total                  1 + 2 = 3    0 + 0 = 0    1 + 0 + 2 + 0 = 3

R1 1.000

R2 1.000 1.000

R3 1.000 1.000 1.000

R4 1.000 1.000 1.000 1.000

R5 1.000 1.000 1.000 1.000 1.000

R6 0.875 0.875 0.875 0.875 0.875 1.000

R7 0.875 0.875 0.875 0.875 0.875 0.750 1.000

R8 0.750 0.750 0.750 0.750 0.750 0.625 0.625 1.000

R9 0.750 0.750 0.750 0.750 0.750 0.625 0.625 1.000 1.000

R10 0.625 0.625 0.625 0.625 0.625 0.714 0.500 0.571 0.571 1.000

R11 1.000 1.000 1.000 1.000 1.000 0.875 0.875 0.750 0.750 0.625 1.000

R12 0.875 0.875 0.875 0.875 0.875 0.750 1.000 0.625 0.625 0.500 0.875 1.000

R13 0.625 0.625 0.625 0.625 0.625 0.500 0.500 0.833 0.833 0.429 0.625 0.500 1.000

R14 0.625 0.625 0.625 0.625 0.625 0.500 0.714 0.571 0.571 0.429 0.625 0.714 0.667 1.000

R15 1.000 1.000 1.000 1.000 1.000 0.875 0.875 0.750 0.750 0.625 1.000 0.875 0.625 0.625 1.000

R16 0.375 0.375 0.375 0.375 0.375 0.250 0.429 0.500 0.500 0.143 0.375 0.429 0.600 0.600 0.375 1.000

Table 9. Resemblance matrix based on Jaccard's index (measure of similarity) for the ‘Wood identification task’ data matrix. The matrix is derived from the basic data matrix (Table 4).

Table 6 (continued). The standardized data matrix for the ‘Paired comparison of wood species’ data matrix (columns R6 to R16).

R6 R7 R8 R9 R10 R11 R12 R13 R14 R15 R16

S1 0.538 0.691 0.538 0.538 0.538 0.538 0.538 0.538 3.148 0.538 0.691

S2 0.394 0.867 0.394 0.394 3.388 0.394 0.394 0.394 0.394 0.394 0.394

S3 0.763 1.634 0.763 0.763 0.980 0.763 0.763 0.763 1.852 0.109 0.109

S4 0.122 0.854 1.098 0.122 3.050 0.122 0.122 0.122 0.854 0.122 0.854

S5 0.063 0.063 0.939 0.063 0.939 0.063 0.063 0.063 1.942 0.063 1.065


R1 0.000

R2 0.000 0.000

R3 0.000 0.000 0.000

R4 2.461 2.461 0.000 0.000

R5 4.027 4.027 0.000 8.824 0.000

R6 1.078 1.078 0.000 4.423 7.315 0.000

R7 3.878 3.878 0.000 4.964 4.627 2.949 0.000

R8 2.250 2.250 0.000 6.339 10.398 1.251 4.371 0.000

R9 1.078 1.078 0.000 4.423 7.315 0.000 2.949 1.251 0.000

R10 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

R11 1.078 1.078 0.000 4.423 7.315 0.000 2.949 1.251 0.000 0.000 0.000

R12 1.078 1.078 0.000 4.423 7.315 0.000 2.949 1.251 0.000 0.000 0.000 0.000

R13 1.078 1.078 0.000 4.423 7.315 0.000 2.949 1.251 0.000 0.000 0.000 0.000 0.000

R14 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

R15 1.157 1.157 0.000 16.471 0.000 1.184 8.691 4.016 1.184 0.000 1.184 1.184 1.184 0.000 0.000

R16 1.039 1.039 0.000 0.580 1.133 2.049 2.161 3.083 2.049 0.000 2.049 2.049 2.049 88.112 2.865 0.000

Table 10. Resemblance matrix based on the Bray-Curtis index (measure of dissimilarity) for the ‘Paired comparison of wood species’. The resemblance matrix was derived from the standardized data matrix (Table 6).

Table 11. Tree matrix for the ‘Wood identification task’ data. The clustering was performed on the resemblance matrix presented in Table 9.

R1 1.000
R2 1.000
R3 1.000
R4 1.000
R5 1.000
R15 1.000
R11 0.875
R6 0.859
R7 1.000
R12 0.671
R8 1.000
R9 0.833
R13 0.624
R14 0.578
R10 0.405
R16 -----

Table 12. Tree matrix for the ‘Paired comparison of wood species’ data. The clustering was performed on the resemblance matrix presented in Table 10.

R1 0.0000000
R2 0.0000000
R10 0.0000000
R14 0.5390000
R6 0.0000000
R13 0.0000000
R9 0.0000000
R11 0.0000000
R12 1.1950000
R8 3.3638500
R5 0.0000000
R15 4.2995208
R3 0.0000000
R4 0.2900000
R16 2.3750000
R7 ---

The tree matrix in Table 11 shows similarities linearly transformed into distances. Since the tree does not exactly represent the resemblance matrix from which it was built, a correlation coefficient is calculated to measure how well the tree matches the resemblance matrix. This relationship is graphically depicted in Figures 11 and 12 (page 39). Tables 13 and 14 show the results of cophenetic matching between each pair of respondents.
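The matching between tree and resemblance matrix can be sketched as a Pearson correlation over the below-diagonal entries of the two matrices. The sketch below assumes that this is the correlation coefficient meant in the text; the function names are hypothetical, and the 4 x 4 sub-matrices (respondents R13 to R16, copied from Tables 9 and 13) serve only as a small illustration, so the result is not the coefficient for the full 16-respondent tree.

```python
from math import sqrt


def lower_triangle(matrix):
    """Flatten the below-diagonal entries of a square matrix, row by row."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i)]


def pearson(x, y):
    """Pearson product-moment correlation of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cophenetic_correlation(resemblance, cophenetic):
    """Correlate resemblance values with the values implied by the tree."""
    return pearson(lower_triangle(resemblance), lower_triangle(cophenetic))


# Rows and columns: R13, R14, R15, R16.
resemblance = [  # Jaccard similarities from Table 9
    [1.000, 0.667, 0.625, 0.600],
    [0.667, 1.000, 0.625, 0.600],
    [0.625, 0.625, 1.000, 0.375],
    [0.600, 0.600, 0.375, 1.000],
]
cophenetic = [  # cophenetic values from Table 13
    [1.000, 0.624, 0.671, 0.405],
    [0.624, 1.000, 0.624, 0.405],
    [0.671, 0.624, 1.000, 0.405],
    [0.405, 0.405, 0.405, 1.000],
]

r = cophenetic_correlation(resemblance, cophenetic)
print(round(r, 3))
```

A value close to 1 would indicate that the tree distorts the original resemblances only slightly.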


R1 1

R2 1.000 1

R3 1.000 1.000 1

R4 1.000 1.000 1.000 1

R5 1.000 1.000 1.000 1.000 1

R6 0.875 0.875 0.875 0.875 0.875 1

R7 0.859 0.859 0.859 0.859 0.859 0.859 1

R8 0.671 0.671 0.671 0.671 0.671 0.671 0.671 1

R9 0.671 0.671 0.671 0.671 0.671 0.671 0.671 1.000 1

R10 0.578 0.578 0.578 0.578 0.578 0.578 0.578 0.578 0.578 1

R11 1.000 1.000 1.000 1.000 1.000 0.875 0.859 0.671 0.671 0.578 1

R12 0.859 0.859 0.859 0.859 0.859 0.859 1.000 0.671 0.671 0.578 0.859 1

R13 0.671 0.671 0.671 0.671 0.671 0.671 0.671 0.833 0.833 0.578 0.671 0.671 1

R14 0.624 0.624 0.624 0.624 0.624 0.624 0.624 0.624 0.624 0.578 0.624 0.624 0.624 1

R15 1.000 1.000 1.000 1.000 1.000 0.875 0.859 0.671 0.671 0.578 1.000 0.859 0.671 0.624 1

R16 0.405 0.405 0.405 0.405 0.405 0.405 0.405 0.405 0.405 0.405 0.405 0.405 0.405 0.405 0.405 1

Table 13. Cophenetic matrix for the ‘Wood identification task’ data. The cophenetic matrix was derived from the tree matrix presented in Table 11.

R1 0.000

R2 0.000 0

R3 4.300 4.300 0

R4 4.300 4.300 0.000 0

R5 3.364 3.364 4.300 4.300 0

R6 0.539 0.539 4.300 4.300 3.364 0

R7 4.300 4.300 2.375 2.375 4.300 4.300 0

R8 1.195 1.195 4.300 4.300 3.364 1.195 4.300 0

R9 0.539 0.539 4.300 4.300 3.364 0.000 4.300 1.195 0

R10 0.000 0.000 4.300 4.300 3.364 0.539 4.300 1.195 0.539 0

R11 0.539 0.539 4.300 4.300 3.364 0.000 4.300 1.195 0.000 0.539 0

R12 0.539 0.539 4.300 4.300 3.364 0.000 4.300 1.195 0.000 0.539 0.000 0

R13 0.539 0.539 4.300 4.300 3.364 0.000 4.300 1.195 0.000 0.539 0.000 0.000 0

R14 0.000 0.000 4.300 4.300 3.364 0.539 4.300 1.195 0.539 0.000 0.539 0.539 0.539 0

R15 3.364 3.364 4.300 4.300 0.000 3.364 4.300 3.364 3.364 3.364 3.364 3.364 3.364 3.364 0

R16 4.300 4.300 0.290 0.290 4.300 4.300 2.375 4.300 4.300 4.300 4.300 4.300 4.300 4.300 4.300 0

Table 14. Cophenetic matrix for the ‘Paired comparison of wood species’ data. The cophenetic matrix was derived from the tree matrix presented in Table 12.

Figure 11. Matrix comparison plot showing the relationship between cophenetic matrix and resemblance matrix in the ‘Wood identification’ task.

Figure 12. Matrix comparison plot showing the relationship between cophenetic matrix and resemblance matrix in ‘Paired comparison of wood species’.


Tables 15 through 22 detail the matrix transformations necessary for principal component analysis. The steps involved are summarised in Box 9 (page 20). In contrast to cluster analysis, the focus in this exercise is on the correlation between the species used for woodcarving and not on the correlation between the respondents themselves. Hence, all derived matrices are computed across rows and not across columns. The resemblance matrix for the wood identification task (Table 15) is computed from the basic data matrix (Table 4), and the resemblance matrix for the paired comparison of wood species (Table 16) from the standardised data matrix (Table 6). Decentering is subsequently done to distribute the distances more equally among the variables. Tables 17 and 18 show the results of decentering the respective resemblance matrices.
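The two steps for the wood identification data can be sketched as follows. The simple matching coefficient is the share of respondents for whom two species received the same answer. For the decentering step the text gives no formula; the sketch assumes double-centering (subtracting row and column means and adding back the grand mean), an assumption that reproduces the entries of Table 17 that were checked to three decimals. All function names are hypothetical.

```python
# Table 4: rows are species S1..S8, columns are respondents R1..R16.
TABLE_4 = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # S1
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],  # S2
    [1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0],  # S3
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1],  # S4
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0],  # S5
    [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0],  # S6
    [1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1],  # S7
    [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0],  # S8
]


def simple_matching(x, y):
    """Share of positions where both rows carry the same value."""
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)


def resemblance_matrix(rows):
    """Symmetric matrix of simple matching coefficients between rows."""
    return [[simple_matching(r, s) for s in rows] for r in rows]


def double_centre(m):
    """Assumed decentering: m[i][j] - row mean - column mean + grand mean."""
    n = len(m)
    row_means = [sum(row) / n for row in m]
    col_means = [sum(m[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [m[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


S = resemblance_matrix(TABLE_4)
B = double_centre(S)

print(S[1][0])            # S2 vs S1 in Table 15: 0.9375
print(round(B[0][0], 3))  # S1 vs S1 in Table 17: 0.123
print(round(B[1][0], 3))  # S2 vs S1 in Table 17: 0.045
```

After double-centering every row and column sums to zero, which is why the last eigen-value in Table 19 is zero.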

Table 15. Resemblance matrix based on simple matching coefficients for the ‘Wood identification’ task data.*

S1 1.0000000

S2 0.9375000 1.0000000

S3 0.7500000 0.8125000 1.0000000

S4 0.9375000 0.8750000 0.6875000 1.0000000

S5 0.8125000 0.8750000 0.8125000 0.7500000 1.0000000

S6 0.7500000 0.8125000 0.6250000 0.6875000 0.8125000 1.0000000

S7 0.8750000 0.8125000 0.6250000 0.9375000 0.6875000 0.6250000 1.0000000

S8 0.6250000 0.6875000 0.8750000 0.6875000 0.8125000 0.6250000 0.6250000 1.0000000

*The analysis is across the rows and was performed on the basic data matrix in Table 4.

Table 16. Resemblance matrix based on variance-covariance across the rows for the ‘Paired comparison of wood species’ data. The resemblance matrix was derived from the standardized data matrix (Table 6).

S1 1.0000000

S2 0.0193613 1.0000000

S3 -0.4193588 0.1190259 1.0000000

S4 -0.4895840 -0.6254190 0.0708739 1.0000000

S5 -0.2103228 0.1421269 0.4148225 -0.2037374 1.0000000

Table 17. The transformed matrix after decentering of the resemblance matrix (Table 15) for the ‘Wood identification task’.

S1 0.123

S2 0.045 0.092

S3 -0.064 -0.018 0.248

S4 0.076 -0.002 -0.111 0.154

S5 -0.049 -0.002 0.014 -0.096 0.154

S6 -0.033 0.014 -0.096 -0.080 0.045 0.311

S7 0.061 -0.018 -0.127 0.139 -0.111 -0.096 0.248

S8 -0.158 -0.111 0.154 -0.080 0.045 -0.064 -0.096 0.311


Table 19. Eigen-values (λ) for the principal components for the data on the ‘Wood identification task’. The values are derived from the decentred matrix presented in Table 17.

i Eigen value Percent Cumulative

1 0.70775 43.14 43.14

2 0.45658 27.83 70.97

3 0.23233 14.16 85.13

4 0.12500 7.62 92.75

5 0.06686 4.08 96.82

6 0.04334 2.64 99.47

7 0.00877 0.53 100.00

8 0.00000 0.00 100.00

Table 18. The transformed matrix after decentering of the resemblance matrix (Table 16) for the ‘Paired comparison of wood species’.

S1 1.145

S2 0.014 0.843

S3 -0.531 -0.144 0.631

S4 -0.315 -0.601 -0.011 1.205

S5 -0.313 -0.112 0.055 -0.277 0.648

Table 22. Eigen-vector matrix (U) with the loading of each character in each principal component for the data on the ‘Paired comparison of wood species’.

i PC1 PC2 PC3

1 0.7290292 -0.7280290 0.2209689

2 0.6063664 0.3678820 -0.5407599

3 -0.3844371 0.4585559 -0.0635704

4 -0.8926225 -0.5732191 -0.1974509

5 -0.0583360 0.4748102 0.5808122

Table 20. Eigen-values (λ) for the principal components for the data on ‘Paired comparison of wood species’. The values are derived from the decentred matrix presented in Table 18.

i Eigen-value Percent Cumulative

1 1.84713 41.30 41.30

2 1.42966 31.96 73.26

3 0.72162 16.13 89.39

4 0.47447 10.61 100.00

5 0.00000 0.00 100.00

Table 21. Eigen-vector matrix (U) with the loading of each character in each principal component for the data on ‘Wood identification task’.

i PC1 PC2 PC3

1 0.2695094 -0.0062328 -0.1723919

2 0.1000234 -0.1082403 -0.2080325

3 -0.3723312 0.1829196 -0.2211452

4 0.3116364 0.1593744 0.0754053

5 -0.2026852 -0.1634770 -0.0267113

6 -0.0330392 -0.5098736 0.1677226

7 0.3784826 0.2194557 0.1632280

8 -0.4515962 0.2260740 0.2219250

Eigen-values are then computed for the principal components (sources of variation for each species) to arrive at a measure of the variance accounted for by each principal component (Tables 19 and 20). Eigen-vectors pertain to the variables (species) themselves and define the principal components (Tables 21 and 22). From Tables 19 and 20 it can be seen that the first three principal components account for 85.13% and 89.39% (cumulative percentages) of the variance in the ‘Wood identification’ and the ‘Paired comparison of wood species’ tasks, respectively. Principal components accounting for the remaining percentages can be left aside in further analysis. Tables 21 and 22 show the ‘loading’ (Eigen-vectors) that each species is assigned by each of the three most important principal components. A higher (more positive) loading indicates a greater importance of the principal component for the respective species i. In both examples the loading by principal component 1 is high for species 1 and 2.
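The ‘Percent’ and ‘Cumulative’ columns of Tables 19 and 20 follow directly from the eigen-values: each eigen-value is divided by the sum of all eigen-values. A minimal sketch, with a hypothetical function name and the eigen-values copied from the two tables:

```python
EIGENVALUES_TABLE_19 = [0.70775, 0.45658, 0.23233, 0.12500,
                        0.06686, 0.04334, 0.00877, 0.00000]
EIGENVALUES_TABLE_20 = [1.84713, 1.42966, 0.72162, 0.47447, 0.00000]


def explained_variance(eigenvalues):
    """Return (percent, cumulative percent) of variance per component."""
    total = sum(eigenvalues)
    percent = [100.0 * value / total for value in eigenvalues]
    cumulative = []
    running = 0.0
    for share in percent:
        running += share
        cumulative.append(running)
    return percent, cumulative


percent_19, cumulative_19 = explained_variance(EIGENVALUES_TABLE_19)
percent_20, cumulative_20 = explained_variance(EIGENVALUES_TABLE_20)

print(round(percent_19[0], 2), round(cumulative_19[2], 2))  # 43.14 85.13
print(round(percent_20[0], 2), round(cumulative_20[2], 2))  # 41.3 89.39
```

The cumulative percentage after the third component is the figure used in the text to justify dropping the remaining components.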


Table 23 details the steps in constructing the log-linear model. While the example below is two-dimensional, log-linear models may be used to identify multi-dimensional structures in ethnobotanical data. Contingency tables with three or more dimensions (three or more variables) are often analysed by employing log-linear models. The term ‘model’ is an expression for how the observed frequencies (or counts) are affected by variables and combinations of variables. ‘Log-linear’ refers to a procedure whereby a multiplicative relationship is transformed to a linear relationship by the use of logarithms. In the terminology of log-linear models, interactions of variables are tested. A null hypothesis of no interactions between variables implies that all the variables are independent. If the null hypothesis is rejected, the variables are said to be associated. In the example below professional category and habitat preference are highly associated.
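The per-cell quantities reported in Table 23 can be sketched for a two-way table as follows. The formulas are the ones printed in the headings of Table 23; the expected counts assume the main-effects (independence) model, in which each expected count is row total times column total divided by the grand total. The counts used here are hypothetical and are not the data of Table 3; the function names are illustrative only.

```python
from math import log, sqrt


def expected_independence(obs):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def cell_statistics(obs, exp):
    """Per-cell statistics, following the headings of Table 23."""
    table = []
    for obs_row, exp_row in zip(obs, exp):
        out_row = []
        for o, e in zip(obs_row, exp_row):
            log_term = o * log(o / e) if o > 0 else 0.0
            out_row.append({
                "deviation": o - e,
                "standardized": (o - e) / sqrt(e),
                "pearson": (o - e) ** 2 / e,
                "lr_deviance": 2 * (e - o + log_term),
                "freeman_tukey": sqrt(o) + sqrt(o + 1) - sqrt(4 * e + 1),
            })
        table.append(out_row)
    return table


# Hypothetical counts: 2 professional categories x 3 habitats.
observed = [[12, 30, 8],
            [20, 15, 25]]
expected = expected_independence(observed)
stats = cell_statistics(observed, expected)

pearson_total = sum(cell["pearson"] for row in stats for cell in row)
g2_total = sum(cell["lr_deviance"] for row in stats for cell in row)
print(round(pearson_total, 2), round(g2_total, 2))
```

Summing the Pearson and likelihood ratio contributions over all cells gives the overall χ2 and G2 statistics for the independence model; large individual contributions point to the cells responsible for the association.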


Table 23. Output for log-linear model estimates of Table 3 (data after Kyoshabire, 1999; see text page 33).

Deviations = Observed-Expected
==============================

CATEGORY                      HABITAT
                                Garden  Early fallow  Old fallow  Bushland    Forest
Traditional birth attendants     1.546          3.86        5.54     -3.26     -7.78
Women herbalists                 -0.04          4.90        3.64      1.89    -10.38
Male specialists                 -0.26         -6.48       -2.73     -0.26      7.15
Male non-specialists             -3.82         -2.28       -6.45      1.64     10.90

Standardized Deviates = (Obs-Exp)/sqrt(Exp)
===========================================

CATEGORY                      HABITAT
                                Garden  Early fallow  Old fallow  Bushland    Forest
Traditional birth attendants      0.39          0.63        0.87     -0.50     -1.11
Women herbalists                 -0.01          0.80        0.58      0.29     -1.52
Male specialists                  0.50         -0.89       -0.36     -0.03      0.87
Male non-specialists             -1.36         -0.52       -1.43      0.35      2.22

Pearson Chi-square = (Obs-Exp)^2/Exp
====================================

CATEGORY                      HABITAT
                                Garden  Early fallow  Old fallow  Bushland    Forest
Traditional birth attendants      0.15          0.39        0.76      0.25      1.24
Women herbalists                  0.0           0.65        0.34      0.09      2.32
Male specialists                  0.25          0.78        0.13      0.00      0.77
Male non-specialists              1.86          0.27        2.03      0.13      4.93

Likelihood Ratio Deviance = 2*(Exp-Obs+Obs*log(Obs/Exp))
========================================================

CATEGORY                      HABITAT
                                Garden  Early fallow  Old fallow  Bushland    Forest
Traditional birth attendants      0.15          0.38        0.73      0.26      1.31
Women herbalists                  0.0           0.62        0.33      0.09      2.52
Male specialists                  0.24          0.82        0.13      0.00      0.74
Male non-specialists              2.27          0.28        2.29      0.12      4.32


Table 23. continued.

Freeman-Tukey Deviates = sqrt(Obs)+sqrt(Obs+1)-sqrt(4*Exp+1)
============================================================

CATEGORY                      HABITAT
                                Garden  Early fallow  Old fallow  Bushland    Forest
Traditional birth attendants      0.44          0.65        0.88     -0.47     -1.12
Women herbalists                  0.05          0.81        0.60      0.33     -1.57
Male specialists                  0.53         -0.88       -0.33     -0.00      0.88
Male non-specialists             -1.44         -0.47       -1.48      0.40      2.05

Contribution to log(likelihood) = -Exp+Obs*log(Exp)-log(gamma(Obs+1))
=====================================================================

CATEGORY$                     HABITAT$
                                Garden  Early fallow  Old fallow  Bushland    Forest
Traditional birth attendants     -2.41         -2.98       -3.20     -2.88     -3.42
Women herbalists                 -2.28         -3.10       -2.96     -2.84     -3.97
Male specialists                 -2.63         -3.26       -2.98     -2.96     -3.44
Male non-specialists             -2.77         -2.48       -3.39     -2.55     -4.86

Log-Linear Effects (Lambda)
===========================

THETA                             3.44

CATEGORY$
Traditional birth attendants      0.09
Women herbalists                  0.07
Male specialists                  0.43
Male non-specialists             -0.59

HABITAT$
Garden                           -0.80
Early fallow                      0.10
Old fallow                        0.16
Bushland                          0.21
Forest                            0.33

Standard Error of Lambda
========================

THETA                             3.44

CATEGORY$
Traditional birth attendants      0.07
Women herbalists                  0.07
Male specialists                  0.06
Male non-specialists              0.08

HABITAT$
Garden                            0.11
Early fallow                      0.08
Old fallow                        0.07
Bushland                          0.07
Forest                            0.07


The following explanations start with a two-dimensional design (after Bishop et al., 1995). The logarithm of the relative odds (the cross-product ratio α of the cells of the contingency table) can also be expressed as a linear contrast of the log-probabilities of the elementary cells:

$$\log \alpha = \log p_{11} - \log p_{12} - \log p_{21} + \log p_{22}$$

A linear model in the natural logarithms of the cell probabilities can be constructed by analogy with analysis of variance models:

$$\log p_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(ij)}, \qquad i = 1,2; \; j = 1,2$$

where u is the grand mean of the logarithms of the probabilities:

$$u = \tfrac{1}{4}\left(\log p_{11} + \log p_{12} + \log p_{21} + \log p_{22}\right)$$

and u + u1(i) is the mean of the logarithms of the probabilities at level i of the first variable:

$$u + u_{1(i)} = \tfrac{1}{2}\left(\log p_{i1} + \log p_{i2}\right), \qquad i = 1,2$$

Similarly for the jth level of the second variable:

$$u + u_{2(j)} = \tfrac{1}{2}\left(\log p_{1j} + \log p_{2j}\right), \qquad j = 1,2$$

Since u1(i) and u2(j) represent deviations from the grand mean u:

$$u_{1(1)} + u_{1(2)} = u_{2(1)} + u_{2(2)} = 0$$

Similarly, u12(ij) represents a deviation from u + u1(i) + u2(j), so that:

$$u_{12(11)} = -u_{12(12)} = -u_{12(21)} = u_{12(22)}$$
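The u-terms of the two-dimensional design can be computed numerically to check the constraints stated above. The sketch below uses hypothetical cell probabilities and hypothetical function names. It also illustrates a consequence of the definitions: the logarithm of the cross-product ratio equals four times u12(11).

```python
from math import log


def u_terms(p):
    """u, u1(i), u2(j) and u12(ij) for a 2 x 2 table of probabilities."""
    logs = [[log(p[i][j]) for j in range(2)] for i in range(2)]
    u = sum(logs[i][j] for i in range(2) for j in range(2)) / 4
    u1 = [(logs[i][0] + logs[i][1]) / 2 - u for i in range(2)]
    u2 = [(logs[0][j] + logs[1][j]) / 2 - u for j in range(2)]
    u12 = [[logs[i][j] - u - u1[i] - u2[j] for j in range(2)]
           for i in range(2)]
    return u, u1, u2, u12


def log_cross_product_ratio(p):
    """log(alpha) as a linear contrast of the log-probabilities."""
    return log(p[0][0]) - log(p[0][1]) - log(p[1][0]) + log(p[1][1])


# Hypothetical cell probabilities (they sum to 1).
p = [[0.4, 0.1],
     [0.2, 0.3]]

u, u1, u2, u12 = u_terms(p)
print(round(sum(u1), 12), round(sum(u2), 12))    # both sums are zero
print(round(log_cross_product_ratio(p), 6))
print(round(4 * u12[0][0], 6))                   # same value as the line above
```

When the two variables are independent the cross-product ratio is 1, its logarithm is 0, and all u12 terms vanish.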


Table 23. continued.

Lambda / SE(Lambda)
===================

THETA                             3.44

CATEGORY$
Traditional birth attendants      1.41
Women herbalists                  0.99
Male specialists                  7.24
Male non-specialists             -7.05

HABITAT$
Garden                            0.11
Early fallow                      1.37
Old fallow                        2.20
Bushland                          2.83
Forest                            4.67

Multiplicative Effects = exp(Lambda)
====================================

THETA                            31.33

CATEGORY$
Traditional birth attendants      1.10
Women herbalists                  1.07
Male specialists                  1.07
Male non-specialists              1.54

HABITAT$
Garden                            0.45
Early fallow                      1.11
Old fallow                        1.18
Bushland                          1.23
Forest                            1.39

Model ln(MLE): -61.373
======================

Term tested   The model without the term            Removal of term from model
              ln(MLE)     Chi-Sq   df   p-value     Chi-Sq   df   p-value
CATEGORY$     -101.862    98.57    15   0.0000      80.98     3   0.0000
HABITAT$      -100.124    95.09    16   0.0000      77.50     4   0.0000


The additive properties imply that each u-term has one absolute value for dichotomous variables. By analogy with ANOVA models the grand mean can be written:

$$u = \frac{(\log p)_{++}}{4} = \frac{1}{4}\sum_{i}\sum_{j}\log p_{ij}$$

In contrast to ANOVA, where the focus is on main effects, the interest in log-linear models is to test for interactions of factors. Instead of probabilities, the actual counts or frequencies are referred to as m in the following. A three-dimensional model in which the first and second variables do not interact directly (no u12 and u123 terms) would look like this:

$$m_{ijk} = \exp\left[u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{13(ik)} + u_{23(jk)}\right]$$

For this model the direct estimate, computed from the observed counts x, would read:

$$\hat{m}_{ijk} = \frac{x_{i+k}\, x_{+jk}}{x_{++k}}$$
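The direct estimate can be computed from the margins of a three-way table of observed counts. The sketch below uses a hypothetical 2 x 2 x 2 table indexed as x[i][j][k]; the function name is illustrative. It reads the formula as applying to the model in which the first two variables are independent within each level of the third, which is an interpretation rather than something the text states explicitly.

```python
def direct_estimate(x):
    """Fitted counts m[i][j][k] = x[i][+][k] * x[+][j][k] / x[+][+][k]."""
    n_i, n_j, n_k = len(x), len(x[0]), len(x[0][0])
    # Two-way margins obtained by summing over j and over i, respectively.
    x_ik = [[sum(x[i][j][k] for j in range(n_j)) for k in range(n_k)]
            for i in range(n_i)]
    x_jk = [[sum(x[i][j][k] for i in range(n_i)) for k in range(n_k)]
            for j in range(n_j)]
    # One-way margin of the third variable.
    x_k = [sum(x_ik[i][k] for i in range(n_i)) for k in range(n_k)]
    return [[[x_ik[i][k] * x_jk[j][k] / x_k[k] for k in range(n_k)]
             for j in range(n_j)]
            for i in range(n_i)]


# Hypothetical observed counts x[i][j][k].
x = [[[10, 4], [6, 8]],
     [[5, 12], [9, 6]]]

m_hat = direct_estimate(x)
for i in range(2):
    for j in range(2):
        print([round(value, 2) for value in m_hat[i][j]])
```

The fitted counts reproduce the observed two-way margins x(i+k) and x(+jk) exactly, which is the defining property of this kind of direct estimate.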

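Other notations in publications for log-linear model parameters include the following:

$$\log m_{ij} = \mu + \lambda_i^{F} + \lambda_j^{S} + \lambda_{ij}^{FS}$$

$$\xi_{ij} = \Theta + \lambda_i^{A} + \lambda_j^{B} + \lambda_{ij}^{AB}$$

$$\log m_{ij} = \mu + \lambda_i^{A} + \lambda_j^{B} + \lambda_{ij}^{AB}$$

$$G_{ij} = \Theta + \lambda_i^{F} + \lambda_j^{S} + \lambda_{ij}^{FS}$$

or in multiplicative form:

$$F_{ij} = \eta \, r_i^{A} \, r_j^{B} \, r_{ij}^{AB}$$

where: $\xi_{ij} = \log(F_{ij})$, $\Theta = \log \eta$, $\lambda_i^{A} = \log(r_i^{A})$ etc.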

Like χ² and G² tests, log-linear models are based on contingency tables; however, higher order interactions are incorporated. A model involves a linear combination of parameters that are calculated on the basis of the contingency table. The natural logarithm of the expected cell frequencies (ln f) is described by the following linear function:

$$\ln(f_{ijk}) = \Theta + \lambda_i^{A} + \lambda_j^{B} + \lambda_k^{C} + \lambda_{ij}^{AB} + \lambda_{ik}^{AC} + \lambda_{jk}^{BC} + \lambda_{ijk}^{ABC}$$

where:

$f_{ijk}$ = expected frequency in row i, column j and depth k of a three-way contingency table;

$\Theta$ = an overall mean effect, calculated from the logarithms of the expected frequencies;

$\lambda$ = parameters, summing to zero over the levels of the row factors and the column factors;

$\lambda_i^{A}$, $\lambda_j^{B}$ and $\lambda_k^{C}$ = main effects of the categories i, j and k of variables A, B and C;

$\lambda_{ij}^{AB}$, $\lambda_{ik}^{AC}$ and $\lambda_{jk}^{BC}$ = second order interaction terms expressing the dependence of a category of one factor on a category of another factor;

$\lambda_{ijk}^{ABC}$ = third order interaction term expressing the mutual dependency of all three variables on each other.

For each cell, the logarithm of the expected frequency is the sum of Θ and λ's. Each cell therefore has its own combination of parameters that are used to predict the cell frequency. The observed cell frequencies are reproduced exactly by a saturated model that contains all main effects and interactions. Three steps are suggested for building the model:

1. screening through frequency analysis: variables that are not found to be significant in simple χ² and G² tests are excluded from the model;

2. choice of an appropriate model;

3. evaluation and interpretation of the selected model.

One aim of modelling is to determine the minimum number of parameters necessary to adequately describe the data set. An incomplete model with the fewest effects may be preferred over the complete model, since a smaller number of parameters eases the interpretation of the data. Ideally, one uses one's knowledge of the subject matter of the study to determine which effects to include in the model. In practice, stepwise exclusion of parameters from log-linear models is done while maintaining an adequate fit of expected to observed cell frequencies. However, if many models are tested in search of the ideal model, one must bear in mind that the p value associated with a specific test is valid for one test only. P values may be used as relative measures when testing several models. In the end the model includes only those parameters necessary to reproduce the observed frequencies.

Testing for significance of a parameter in a model is done by comparing two models, one which includes the parameter in question and one which excludes it. The G² statistic is computed for each model, and the difference between the G² values is calculated. The calculated difference (a G² statistic itself) tests the (goodness of) fit between observed and expected frequencies, and a good model thus is one where G² is not significant and H0 is retained. In order not to have too many good models, Tabachnick and Fidell (1996) propose to use a significance level α higher than 0.05, e.g. 0.1 or 0.25.
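The comparison of two nested models can be sketched for a two-way table, mirroring the ‘Removal of term from model’ columns of Table 23. The sketch fits a model with row and column main effects and a reduced model without the row term, then takes the difference of the two G² values. The counts are hypothetical, the function names are illustrative, and the p value is deliberately left out because it requires a chi-square distribution function that the Python standard library does not provide.

```python
from math import log


def g2(obs, exp):
    """Likelihood ratio statistic G2 = 2 * sum(obs * ln(obs / exp))."""
    return 2 * sum(o * log(o / e)
                   for obs_row, exp_row in zip(obs, exp)
                   for o, e in zip(obs_row, exp_row) if o > 0)


def fit_rows_and_columns(obs):
    """Main-effects model: expected = row total * column total / N."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def fit_columns_only(obs):
    """Model without the row term: each column total is split evenly."""
    n_rows = len(obs)
    col_totals = [sum(col) for col in zip(*obs)]
    return [[c / n_rows for c in col_totals] for _ in obs]


# Hypothetical counts: 3 categories (rows) x 3 habitats (columns).
observed = [[12, 30, 8],
            [20, 15, 25],
            [5, 9, 11]]

g2_with_term = g2(observed, fit_rows_and_columns(observed))
g2_without_term = g2(observed, fit_columns_only(observed))

difference = g2_without_term - g2_with_term   # G2 for removing the row term
df_difference = len(observed) - 1             # number of rows minus one
print(round(difference, 2), df_difference)
```

The difference is compared with a chi-square distribution with `df_difference` degrees of freedom; a significant value means that the removed term is needed in the model.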

However, testing of hypotheses in log-linear models is not a goal in itself. The purpose of modelling in ethnobotany is to identify interpretable structures in the data, and the significance values of hypotheses are used to guide the modelling process.

Types of models

Generally, two types of models exist, hierarchical and non-hierarchical models. Where significant interactions are present, lower order terms for all possible combinations of variables involved in the interaction must be included in hierarchical models. In contrast, non-hierarchical models can be built without this restriction.

Hierarchical modelling starts with the most complex models. The three-way interaction in a three-way contingency table is tested for first. A non-significant G² value indicates little evidence for three-factor interaction, and the $\lambda_{ijk}^{ABC}$ term is omitted from the model. A significant three-factor interaction term implies that the degree of association between any pair of variables depends upon the different categories of the third variable. In such a case reduced models should not be used (Sokal and Rohlf 1995). Analogous to classical analysis of variance, lower order effects cannot be interpreted unambiguously if there are higher order effects.

Non-hierarchical models are mainly of interest in testing pre-specified models. As there are no clear statistical criteria for choosing among non-hierarchical models, they are not recommended for model building in general.

The parameters in the model (λ's) represent the increase or decrease of m for a particular combination of row, column and depth variables. Investigation of these parameters yields information about the effect of different categories and interactions on the cell frequencies. A positive λ for a main effect indicates that the frequency in this category is above average. The interaction parameters indicate how much difference there is between the individually and collectively taken sums of effects of variables. They represent the "boost" or interference associated with a particular combination of variables (SAS).

Effects with larger standardized parameter estimates are more important in predicting a cell's frequency than effects with smaller standardized parameter estimates.


Already published in this series:

1. Cunningham, A. B. 1993. African medicinal plants: Setting priorities at the interface between conservation and primary healthcare. (This publication is also available in Spanish.)

2. Cunningham, A. B. and Mbenkum, F.T. 1993. Sustainability of harvesting Prunus africana bark in Cameroon: A medicinal plant in international trade.

3. Aumeeruddy, Y. 1994. Local representations and management of agroforests on the periphery of Kerinci Seblat National Park, Sumatra, Indonesia. (This publication is also available in French and Spanish.)

4. Cunningham, A. B. 1996. People, park and plant use: Recommendations for multiple-use zones and development alternatives around Bwindi Impenetrable National Park, Uganda. (This publication is also available in French.)

5. Wild, R. and Mutebi, J. 1996. Conservation through community use of plant resources. Establishing collaborative management at Bwindi Impenetrable and Mgahinga Gorilla National Parks, Uganda. (This publication is also available in French.)


Contact addresses:

WWF InternationalPlant Conservation OfficerPanda House, Weyside ParkGodalming, Surrey GU7 1XRUNITED KINGDOMFax: 44 1483 426409

Division of Ecological SciencesMan and the Biosphere ProgrammeUNESCO, 7 Place de Fontenoy75352 Paris Cedex 07 SP FRANCEFax: 33 1 45685804

The Director
Royal Botanic Gardens, Kew
Richmond Surrey TW9 3AB
UNITED KINGDOM
Fax: 44 181 3325278

The People and Plants Initiative

was started in July 1992 by WWF, UNESCO and the Royal Botanic Gardens, Kew to promote the sustainable and equitable use of plant resources through providing support to ethnobotanists from developing countries.

The initiative stems from the recognition that people in rural communities often have detailed and profound knowledge of the properties and ecology of locally occurring plants, and rely on them for many of their foods, medicines, fuel, building materials and other products. However, much of this knowledge is being lost with the transformation of local ecosystems and local cultures. Over-harvesting of non-cultivated plants is increasingly common, caused by loss of habitat, increase in local use and the growing demands of trade. Long-term conservation of plant resources and the knowledge associated with them is needed for the benefit of the local people and for their potential use to local communities in other places.

The diversity of traditional plant-resource management practices runs through a spectrum from “cultivation” through to gathering “wild” plants, all of which are included in the People and Plants approach.

Ethnobotanists can work together with local people to study and record the uses of plant resources, identify cases of over-harvesting of non-cultivated plants, find sustainable harvesting methods and investigate alternatives such as cultivation.

The People and Plants initiative is building support for ethnobotanists from developing countries who work with local people on issues related to the conservation of both plant resources and traditional ecological knowledge. Key participants organize participatory workshops, undertake discussion and advisory visits to field projects and provide literature on ethnobotany, traditional ecological knowledge and sustainable plant resource use. It is hoped that a network of ethnobotanists working on these issues in different countries and regions can be developed to exchange information, share experience and collaborate on field projects.