
Bivariate Models for the Analysis of Internal Nitrogen Use

Efficiency: Mixture Models as an Exploratory Tool

Isabel Munoz Santa (Masters of Applied Science in Biometrics)

A thesis submitted for

the degree of Masters of Applied Science in Biometrics

in the University of Adelaide

School of Agriculture, Food and Wine

July 2014

Acknowledgments

I would like to express my deepest gratitude to my advisor Dr. Olena Kravchuck for

her expertise, guidance, support and encouragement. This thesis would not have been

possible without her. Thank you for your efforts to make me a better professional and

give me the opportunity to study here!

I would also like to thank my second supervisor Dr. Petra Marschner for her guidance

in the biological aspects of my thesis and Dr. Stephan Haefele who kindly provided

the data for the case study of this thesis and provided support in the interpretation of

the results.

I would like to acknowledge the Faculty of Science for providing the Turner Family

Scholarship which supported this research and provided travel assistance to attend

the Australian Statistical Conference, July 2014 Adelaide, Young Statistician Con-

ference, February 2013 Melbourne, International Biometrics Society Conference, De-

cember 2014 Mandurah and Australian Statistical Conference in conjunction with the

Institute of Mathematical Statistics Annual Meeting, July 2014 Sydney.

I would like to thank all the people at the Biometry Hub: Bev, David, Jules, Paul,

Stephen and Wayne for their big smiles and for creating a positive working environ-

ment as well as the wonderful group of statisticians at Waite, meeting regularly for the

professional development discussions.

Thank you to all the friends I have met in Adelaide, where I have had one of the best

professional, cultural and personal experiences of my life. Thanks to Negar, Mohsen,

Casey, Amanda, Rodrigo, Mariana, Fien, Daniela, Diego, Kanch, Antonio, Maria,

Ruben, Alfonso, Lidia, Pablo, Ana, Antonija, Roey, Chris, Konrad, Diana, Luis and

Lorinda.

Finally, with deep love and admiration, I thank my family for their love and support


from thousands of kilometres away and Martin for his support, love and contagious

positive vision of life.


Contents

1 Motivations and thesis outline
1.1 Feeding the world requires an efficient use of nitrogen fertilisers
1.2 Nitrogen efficiency measures
1.3 Strategies for improving the uptake and utilisation of nitrogen by cereals
1.4 Review of grain yield and nitrogen uptake analyses in agricultural research
1.4.1 Studies selected for the review
1.4.2 Amount and format of grain yield and nitrogen uptake data
1.4.3 Relationship between grain yield and nitrogen uptake
1.4.4 Common methods of analysis of grain yield and nitrogen uptake field data and their limitations
1.5 Objectives of the thesis
1.6 Thesis outline

2 Ratios of jointly normal variables
2.1 Introduction to the ratio of jointly normal variables
2.2 Distribution of the ratio: history and properties
2.2.1 Geary (1930) and Fieller (1932) expressions of the pdf
2.2.2 Marsaglia (1965, 2006) expression of the pdf
2.2.3 Pham-Gia et al. (2006) expression of the pdf
2.3 Normal approximation of the pdf of the ratio
2.4 Estimators of the ratio
2.4.1 Point estimators: average of ratios, ratio of averages
2.4.2 Confidence sets of the ratio of expected values
2.5 On the distributional properties of internal nitrogen use efficiency in rice
2.6 Summary

3 Fundamentals of finite mixture models
3.1 Non-technical introduction
3.2 Common use of mixture models
3.3 Mathematical definition
3.4 Classifying data into groups: label random vectors and posterior probabilities
3.5 Maximum likelihood estimation of the mixture parameters
3.6 The EM algorithm for the estimation of mixture parameters
3.7 The EM algorithm for the estimation of parameters of mixtures of multivariate Gaussian distributions
3.7.1 Illustration of the EM algorithm on simulated data
3.8 Difficulties in selecting the MLE of mixture models of Gaussian distributions with heteroscedastic components
3.8.1 Unboundedness of the likelihood function
3.8.2 Multiple local maxima
3.8.3 Spuriosities
3.9 Strategy to select the MLE of mixtures with heteroscedastic Gaussian components
3.9.1 Starting strategies for the EM algorithm
3.10 Bayesian approach to estimating parameters of mixture models of multivariate Gaussian distributions
3.10.1 The Gibbs sampler
3.10.2 The Gibbs sampler for a mixture of multivariate Gaussian distributions
3.10.3 Label switching problem
3.11 Selecting the number of mixture components
3.11.1 Information Criteria
3.11.2 Likelihood ratio test for selecting the number of clusters
3.12 Summary

4 Bivariate models for internal nitrogen use efficiency: mixture models as an exploratory tool

5 Conclusions and future lines of research

Appendix A List of studies in the review
Appendix B Journal information of the studies in the review
Appendix C Equivalence between Pham-Gia et al. (2006) and Marsaglia (1965, 2006) expressions of the pdf of the ratio
Appendix D Application of the EM algorithm for estimating the parameters of a mixture of multivariate Gaussian distributions
Appendix E R code for fitting mixture models of univariate Gaussian distributions
Appendix F R code for fitting mixture models of bivariate Gaussian distributions


Abstract

Ratios are commonly used among plant and soil scientists, in particular to express the plant nutrient utilisation efficiency of macro- and micro-nutrients. The internal nutrient efficiency can be understood in terms of maximising yield per unit of nutrient in the plant. At present, internal nitrogen use efficiency (IEN) data are usually collected from designed field trials where different treatments are applied (e.g. fertiliser treatments) and analysed by univariate linear mixed models. However, univariate linear models on the ratio do not maintain information on the original traits, including their correlation, which presents a challenge when interpreting the effect of agronomic practices or environmental conditions on the process of nutrient conversion into grain. Moreover, the distributional properties of ratios do not comply with the assumptions of these linear models favoured in the area of soil and plant science research. A more suitable approach is to collect the traits of interest and to use bivariate analyses. These analyses preserve the information on the original traits and avoid issues associated with the distributional properties of the ratio.

If the data come from field studies, different experimental and environmental conditions may lead to the presence of patterns (groups) in the data in addition to, or concurrently with, designed treatments. Researchers in plant and soil sciences may be interested in identifying those conditions, for example to understand the nature of genotype-by-environment interactions. The inspection of the groups may reveal the factors defining them, thus giving insight into the experimental or environmental drivers of the biological traits. Among bivariate analyses, bivariate mixture models of Gaussian distributions are an appropriate methodology for identifying clusters in the nutrient efficiency data, assuming that the traits are jointly normal. Studying this methodology for the analysis of the internal nitrogen use efficiency traits is the focus of the present thesis.

The application of bivariate mixture models is suggested here as a complementary

analysis to bivariate mixed models in designed field trials and for exploratory purposes

only. The exploratory and supplementary character of the mixture analysis is due to

the potential violation of the independence assumption when the data are collected

from designed field trials.

In this project, bivariate mixed and mixture models are applied to a real-life designed

field trial on non-irrigated rice in Thailand for the analysis of grain yield (GY)

and plant nitrogen uptake (NU) data. The univariate counterparts of these analyses

are also applied on the ratio of these two traits (the internal nitrogen use efficiency).

The advantages of the bivariate analyses are discussed in comparison to the univari-

ate analyses on the ratio. In this case study, the bivariate mixture approach revealed

that soil water availability post-flowering and N supply in soil are the potential factors

defining the mixture groups.

The present work can be readily extended to the analysis of other similar traits in

agriculture when the objective is to explore potential environmental conditions affecting

the traits under study. In order to fully exploit the proposed methodology, a field survey

is suggested as a more appropriate sampling procedure for the application of mixture

models than collecting data from designed field trials.


Declaration of originality

I certify that this work contains no material which has been accepted for the award of

any other degree or diploma in my name in any university or other tertiary institution

and, to the best of my knowledge and belief, contains no material previously published

or written by another person, except where reference has been made in the text. In

addition, I certify that no part of this work will, in the future, be used in a submission

in my name for any other degree or diploma in any university or other tertiary insti-

tution without the prior approval of the University of Adelaide and where applicable,

any partner institution responsible for the joint award of this degree.

I give consent to this copy of my thesis, when deposited in the University Library,

being made available for loan and photocopying, subject to the provisions of the Copy-

right Act 1968.

I also give permission for the digital version of my thesis to be made available on

the web, via the University digital research repository, the Library Search and also

through web search engines, unless permission has been granted by the University to

restrict access for a period of time.


Common abbreviations in this thesis

BVN bivariate normal distribution

χ² chi-square distribution

CV coefficient of variation

ρ correlation coefficient

σxy covariance between random variables X and Y

Cov() covariance operator

cdf cumulative distribution function

CVx CV of a random variable X

p dimension of a random vector

D Dirichlet distribution

∼ distributed as

EM Expectation and Maximisation

µx expected value of a random variable X

µ expected value of a random vector

µi expected value of the i-th component of a mixture of multivariate normal distributions

E() expected value operator

exp exponential function

F F-distribution

Γ gamma function

GY grain yield

↔, ⇔ if and only if

Gi i-th group of a mixture of distributions


i.i.d. independent and identically distributed

IEN internal nitrogen use efficiency

L() log likelihood function

MLE maximum likelihood estimate

π mixing proportions

ψ mixture parameters

Mult multinomial distribution

MVN multivariate normal distribution

NU nitrogen uptake

g number of groups in a mixture

τij posterior probabilities of the j-th observation in Gi

pdf probability density function

∝ proportional

tr trace of a matrix

n sample size

Φ standard univariate normal cdf

ϕ standard univariate normal pdf

S straw yield

T Student’s t-distribution

N univariate normal

σ variance

σx variance of a random variable X

Σ variance-covariance matrix

Σi variance-covariance matrix of the i-th component of a mixture of multivariate normal distributions

Var() variance operator

θi vector of parameters of the i-th component

W Wishart distribution


List of Tables

1.1 Causes of N losses and associated environmental impacts
1.2 Main Nitrogen Use Efficiency (NUE) indices
2.1 Conditions for the four scenarios of Fieller’s confidence sets
B.1 Journals and their impact factor for studies in the review


List of Figures

1.1 Nitrogen pathway from soil to grain
1.2 Distribution of the studies in the review by continents
1.3 Distribution of the studies in the review by the year of publication (left) and the paper citation index (right)
1.4 Typical scatter plots of grain yield and nitrogen uptake in wheat
2.1 Different shapes of the probability density function of the ratio of two jointly normal variables
2.2 Probability density function of the ratio of two jointly normal variables for different values of the coefficient of variation of the denominator (CVx)
2.3 Probability density function of the ratio of two jointly normal variables for different values of the coefficient of variation of the numerator (CVy) but same CVx
2.4 Bootstrap distribution of the estimators defined in Eq. 2.11 (left) and Eq. 2.12 (right)
2.5 Feasible cases of Fieller’s confidence set of the ratio of expected values
2.6 Construction of a wedge given the confidence interval of the ratio of means (left) and vice versa (right)
2.7 Confidence sets of the ratio of expected values in Von Luxburg & Franz (2009)
2.8 Grain yield versus nitrogen uptake from a sample of non-irrigated rice in northeast Thailand (Naklang et al., 2006)
2.9 […] with parameters given in Eq. 2.21
2.10 Bootstrap distribution of the estimators defined in Eq. 2.11 (left) and Eq. 2.12 (right) for the data in Fig. 2.8
3.1 Solutions of three clusters obtained by the EM algorithm for the case study (Chapter 4)
3.2 Spurious solution obtained by the EM algorithm when fitting 7 components to the case study data (Chapter 4)
3.3 Bimodal distribution generated from a mixture of three univariate normal components
3.4 Scatter plot of a sample from a mixture of two bivariate normal distributions
E.1 Histogram of the internal nitrogen use efficiency data and the mixture components found by the EM algorithm initiated from random starts
E.2 Histogram of the internal nitrogen use efficiency data and the mixture components found by the EM algorithm initiated from a partition provided by the K-means algorithm
E.3 Histogram of the internal nitrogen use efficiency data and the mixture components found by the EM algorithm initiated on a random subsample
E.4 Histogram of the internal nitrogen use efficiency data and the mixture components found after running several short runs of the EM algorithm
F.1 Cluster partition found by the EM algorithm initiated from random starts
F.2 Cluster partition found by the EM algorithm initiated from the partition obtained by the K-means algorithm
F.3 Cluster partition found by the EM algorithm initiated from simulated means
F.4 Cluster partition found by the EM algorithm initiated from the mixture estimates obtained by running the EM algorithm on a random subsample of 200 observations
F.5 Cluster partition found by the EM algorithm initiated from the mixture estimates after running several short runs of the EM algorithm


Chapter 1

Motivations and thesis outline

1.1 Feeding the world requires an efficient use of nitrogen

fertilisers

The world is experiencing a rapid increase in its population, from the current 7.2 billion to an expected 9.6 billion by 2050 (UN, 2013). This growth trajectory implies an increase in the demand for cereals, both directly, as the main staple food, and indirectly, since the demand for meat is also increasing and cereals are used for feeding animals (Cassman et al., 2002). According to the FAO (2009), feeding the expected 9.6 billion people will require producing an additional one billion tonnes of cereals per year. However, grain production is constrained by environmental conditions, especially by the availability of water and nitrogen (N) (Andrews et al., 2009).

Nitrogen is, after carbon (C), the nutrient required by plants in the largest quantities (Marschner & Marschner, 2012, p. 135). Plants take up N from soil, mainly as nitrate and ammonium, to produce their proteins and nucleotides (Xu et al., 2012). A fraction of this N is stored in the grain and removed from the field at harvest. If the straw is also removed, the N depletion of the soil is greater.

To replace this nutrient and maintain the soil fertility status, N fertilisers are ap-

plied to provide plant nutrients in addition to the indigenous N in the soil. Nitrogen

fertilisers have been essential for the steady increase of the production of grain over

the last decades (Follett, 2001; Hirel et al., 2007), which explains the high demand for


this commodity (≈ 169 Tg/year; Smil, 1999).

Not all the N applied to soil is taken up by plants. Although variable across coun-

tries (Table 5 in Bouwman et al. (2005)), it is estimated that more than half of the N

fertiliser applied is lost into the environment (Tilman et al., 2002). The losses of N are

due to the fact that nitrates are not easily retained by the soil matrix (Olarewaju et al.,

2009) and can be transported into water bodies through leaching and surface runoff

(Raun & Johnson, 1999). Additionally, N can be lost to the atmosphere through deni-

trification, ammonia volatilization and gaseous emission from leaves (Fageria & Baligar,

2005).

Nitrogen losses result in large economic, energy and environmental costs. The economic

loss is estimated at 15.9 billion US dollars per year (Raun & Johnson, 1999). The

energy cost is related to the fertiliser manufacturing process (Haber-Bosch) in which

high pressure and temperature are needed to convert N2 in the atmosphere into am-

monia (NH3) (Galloway et al., 2003). Finally, the environmental impacts (Table 1.1)

are related to the pollution of soil, water and atmosphere (Ladha et al., 2005).

Minimising the losses of N is thus essential to satisfy the economic and population

growth needs without compromising the future of our planet. This is the basis

of sustainable agriculture (Tilman et al., 2002), stated by Spiertz (2010) as the 3-P

principle, which consists of meeting ‘population’, ‘profit’ and ‘planet’ requirements.

To meet the 3-P principle, it is necessary to be efficient by achieving the maximum

possible yields with an optimum and responsible use of fertilisers and natural resources.

To quantify the N use efficiency, scientists have defined a variety of indices (Table 1.2)

and identified strategies to increase them (Section 1.3). However, the expected effects

of these strategies are not always achieved since the N utilisation depends on a myriad

of complex interactions between plant, soil, climate and microorganisms affecting the

physiological processes in plants (Figure 1.1).


Table 1.1: Causes of N losses and associated environmental impacts.

Cause of N loss | Description | Environmental impact
Denitrification | Conversion of nitrate (NO3−) into N gases (N2, NO, N2O) | Global warming and atmospheric ozone depletion
Ammonia volatilization | Conversion of urea (CO(NH2)2) into N gases (NH3) | Acid rain, which leads to soil acidification
Gaseous plant emission | Release of NH3 from leaves | Acid rain, which leads to soil acidification
Leaching | N is transported downward after irrigation or heavy precipitation | Groundwater contamination and eutrophication∗
Surface runoff | Water flows over the soil surface and moves N into streams and lakes | Eutrophication

∗ An excessive amount of nitrates in water results in a great increase of algae, causing oxygen depletion and putrefaction of the water.

Information has been compiled from Fageria & Baligar (2005); Ladha et al. (2005); Raun & Johnson (1999).


1. N in soil: total amount of N in soil, i.e. indigenous N and external sources from organic and inorganic fertilisers.

2. Available N: N potentially taken up by plants. The N availability is subject to a range of factors such as the amount and source of N in soil, the availability of other nutrients (e.g. C:N ratio), the activity of microorganisms (mineralisation and immobilisation processes), climate and soil characteristics (e.g. pH and moisture content).

3. N absorbed: N taken up by roots. N absorption depends on the availability of nutrients, climate factors (e.g. availability of water), the crop demand for nutrients, which varies with the type of crop, the growth stage of the plant, the plant genotype and the root distribution.

4. N taken up: concentration of N multiplied by the dry weight of plants. It differs from N absorbed due to leaf senescence and fall, insect attacks, root turnover and gaseous plant emissions through the plant canopy.

5. N in grain: concentration of N in grain multiplied by the grain yield. It can be affected by temperature and drought stresses, variety, diseases etc.

Figure 1.1: Nitrogen pathway from soil to grain. [Compiled from Fageria & Baligar (2005); Marschner & Marschner (2012); Marschner (2013, pers. comm., 3 January); Xu et al. (2012)]


1.2 Nitrogen efficiency measures

Nitrogen in soil has to undergo different steps before being used in grain production

(Figure 1.1). To quantify the efficiency of N utilisation in these steps, plant and soil

scientists have suggested several Nitrogen Use Efficiency (NUE) indices. Table 1.2 dis-

plays the most common NUE indices found in plant and soil science literature.

Among various NUE indices, this thesis focuses on the statistical analysis of internal

N use efficiency, IEN (Eq. 1.1). This index measures the ability of plants to convert

N content in aboveground biomass into grain yield.

$$IE_N = \frac{GY}{NU} \qquad (1.1)$$

where GY (kg/ha) is grain yield and NU (kg/ha) is N uptake, i.e. the content of N in aboveground plant parts. In field trials, NU is commonly measured (e.g. Naklang et al., 2006) as follows¹:

$$NU = \frac{m\,[N]_G\,GY + [N]_S\,S}{1000} \ \text{(kg/ha)} \qquad (1.2)$$

where $[N]_G$ is N concentration in dry grain (g/kg), $[N]_S$ is N concentration in straw (g/kg), GY is grain yield (kg/ha), S is straw yield (kg/ha) and m is the standard moisture correction factor, equal to 0.86.

Internal nitrogen use efficiency can be directly linked to another important agricultural efficiency measure, the so-called harvest index, HI (Eq. 1.3):

$$HI = \frac{GY}{GY + S} \qquad (1.3)$$

The pressure of the last decades to increase GY has made HI an important trait for variety selection in plant breeding. Although HI emphasises the partition of carbon (C) in the plant (Sinclair, 1998), the importance of N for grain production induces a close association between HI and IEN (Sinclair, 1998). In particular, dividing Eq. 1.2 by GY and noting from Eq. 1.3 that $S/GY = HI^{-1} - 1$, the following relationship is derived from Eq. 1.1, 1.2 and 1.3:

$$IE_N^{-1} = \frac{1}{1000}\left[ m\,[N]_G + [N]_S\left(HI^{-1} - 1\right) \right]$$

¹ Notice that no direct measures of NU are available. However, the fact that NU is a variable derived from others is not investigated in this thesis.
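The relationship between IEN and HI can be checked numerically. The following is a minimal Python sketch, not part of the thesis code (the appendices use R). The function names and the plot values (grain yield, straw yield and N concentrations) are hypothetical and chosen only for illustration; only the formulas and the moisture correction factor of 0.86 come from Eq. 1.1 to 1.3.

```python
import math

M = 0.86  # standard moisture correction factor quoted with Eq. 1.2


def nitrogen_uptake(gy, s, n_grain, n_straw, m=M):
    """N uptake in kg/ha (Eq. 1.2); yields in kg/ha, concentrations in g/kg."""
    return (m * n_grain * gy + n_straw * s) / 1000.0


def internal_efficiency(gy, nu):
    """Internal N use efficiency IE_N = GY / NU (Eq. 1.1)."""
    return gy / nu


def harvest_index(gy, s):
    """Harvest index HI = GY / (GY + S) (Eq. 1.3)."""
    return gy / (gy + s)


def inverse_ie_from_hi(hi, n_grain, n_straw, m=M):
    """Right-hand side of the derived relationship between 1/IE_N and HI."""
    return (m * n_grain + n_straw * (1.0 / hi - 1.0)) / 1000.0


# Hypothetical plot values, chosen only for illustration (not thesis data).
gy, s = 3000.0, 4000.0          # grain and straw yield, kg/ha
n_grain, n_straw = 11.0, 5.0    # N concentration in grain and straw, g/kg

nu = nitrogen_uptake(gy, s, n_grain, n_straw)
ie_n = internal_efficiency(gy, nu)
hi = harvest_index(gy, s)

lhs = 1.0 / ie_n                                 # computed from Eq. 1.1 and 1.2
rhs = inverse_ie_from_hi(hi, n_grain, n_straw)   # computed from HI only
print(f"NU = {nu:.2f} kg/ha, IE_N = {ie_n:.2f} kg/kg, HI = {hi:.3f}")
print(f"1/IE_N = {lhs:.6f}, expression in HI = {rhs:.6f}")
```

Both routes give the same value of 1/IEN, which illustrates that, for fixed N concentrations in grain and straw, IEN is a function of HI alone.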


Table 1.2: Main Nitrogen Use Efficiency (NUE) indices.

NUE index | Definition | Formula
Partial factor productivity (PFP_N) | Ratio of the total grain yield to the level of N fertiliser applied. | PFP_N = GY / F_N (kg kg⁻¹)
Agronomic efficiency (AE_N) | Ratio of the increment in grain yield to the level of N fertiliser applied. It differs from PFP_N in that it provides information about the increase of grain due to fertilising. | AE_N = (GY − G_0) / F_N (kg kg⁻¹)
Recovery efficiency (RE_N) | Ratio of the increment of N uptake to the level of N fertiliser applied. It measures the ability of plants to take up N from applied fertilisers. | RE_N = (NU − N_0) / F_N (kg kg⁻¹)
Internal N use efficiency (IE_N) | Ratio of the total grain yield to the total N uptake. It measures the ability of plants to utilise N content in biomass for producing grain. | IE_N = GY / NU (kg kg⁻¹)
Physiological efficiency (PE_N) | Ratio of the increment in grain yield to the increment in N uptake due to fertilising. It measures the ability of plants to utilise the increment in N uptake due to fertilising to produce grain. | PE_N = (GY − G_0) / (NU − N_0) (kg kg⁻¹)

NUE: Nitrogen Use Efficiency. GY: grain yield. F_N: level of fertiliser applied. G_0: grain yield in plots with no fertiliser (indigenous N only). NU: N uptake measured in aboveground biomass. N_0: N uptake measured in aboveground biomass in a plot with no fertiliser. Compiled from Dobermann (2005).
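As an illustration of how the indices in Table 1.2 relate to one another, the following minimal Python sketch computes all five from one fertilised plot and its unfertilised control. The function name `nue_indices`, the dictionary keys and the input values are hypothetical and are not taken from the thesis or from Dobermann (2005); only the formulas follow the table.

```python
def nue_indices(gy, g0, nu, n0, fn):
    """Return the five NUE indices of Table 1.2, all in kg/kg.

    gy, g0: grain yield with and without fertiliser (kg/ha)
    nu, n0: N uptake with and without fertiliser (kg/ha)
    fn:     level of N fertiliser applied (kg/ha)
    """
    if fn <= 0 or nu <= 0 or nu == n0:
        raise ValueError("indices are undefined for these inputs")
    return {
        "PFP_N": gy / fn,               # partial factor productivity
        "AE_N": (gy - g0) / fn,         # agronomic efficiency
        "RE_N": (nu - n0) / fn,         # recovery efficiency
        "IE_N": gy / nu,                # internal N use efficiency
        "PE_N": (gy - g0) / (nu - n0),  # physiological efficiency
    }


# Hypothetical plot values, chosen only for illustration (not thesis data).
idx = nue_indices(gy=4000.0, g0=2500.0, nu=70.0, n0=40.0, fn=60.0)
for name, value in idx.items():
    print(f"{name} = {value:.2f} kg/kg")
```

It follows directly from the definitions that AE_N = RE_N × PE_N, so the agronomic efficiency can be read as the product of how much applied N is recovered and how well the recovered N is converted into grain.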


1.3 Strategies for improving the uptake and utilisation of ni-

trogen by cereals

Maximising the production of grain and optimising the use of N fertiliser are major

priorities for soil and plant scientists. In the last 70 years, substantial effort has been

undertaken to develop strategies which enable an increase in the uptake and utilisation

of N for grain production (Evenson & Gollin, 2003). These strategies have been focused

on the improvement of agronomic practices and plant genotypes (Cassman et al., 2002;

Raun & Johnson, 1999; Spiertz, 2010; Tilman et al., 2002):

• Improvement of agronomic practices: better matching of the fertiliser ap-

plication to the crop demand, crop choice, crop rotation with legumes, utilisation

of slow release fertilisers, conservation tillage systems, cover crops with legumes,

utilisation of organic compost and green manures, water management and sowing

time management (Fageria & Baligar, 2005). These practices may differ depend-

ing on the climatic region. For instance, in arid regions the amount of N applied

is lower than in wetter regions because plants tend to grow less. Furthermore, in

wetter regions N loss via leaching or denitrification is high so more N is applied

although in small but frequent doses (Marschner 2014, pers. comm., 31 October).

• Improvement of plant genotypes: plant breeding or genetic engineering (see Marschner & Marschner (2012, Section 6.1.6)). At present, several families of genes which can potentially increase NUE have been identified in rice (Vinod & Heuer, 2012) and wheat (Foulkes et al., 2009; Hawkesford, 2014). However, these genotypes have been mostly tested in laboratory conditions and there are not enough field studies to assess the genotype-by-environment interactions (Hawkesford, 2014). The need for more research and the opposition of part of the population to genetically modified food make this strategy a long-term prospect.


Improving N efficiency has also been a priority at the Waite Campus where extensive

research across disciplines has been undertaken to develop strategies for increasing

the efficiency in nitrogen utilisation by cereals (e.g. Australian Centre for Plant Func-

tional Genomics, 2014; Waite Institute, 2014a; Waite Institute, 2014b). A comprehen-

sive discussion about the best approaches to increase the efficiency is beyond the scope

of this thesis. This thesis is focused on implementing new statistical methods which

can provide more insight into the conversion of NU into GY and how this conversion

is affected by environmental factors. The identification of such factors can help plant

and soil scientists to better understand the conversion process of NU into GY and to

develop specific management practices for increasing IEN.

With these objectives in mind, a literature search for current statistical methods

in the analysis of GY and NU in published agricultural research was carried out.

The objectives of this literature search were to determine: 1) the amount and format

of published GY and NU data, 2) typical trends between the two traits and 3) the

most common statistical techniques currently employed to analyse these data.

1.4 Review of grain yield and nitrogen uptake analyses in agri-

cultural research

1.4.1 Studies selected for the review

1.4.1.1 Search criteria

A selection of 100 studies (see Appendix A) from plant and soil science literature was

collected using the search engines of Web of Science and Google Scholar and employing

the following three search criteria:

1. A combination of keywords such as yield, nitrogen, uptake, internal efficiency,

physiological efficiency, utilisation efficiency, rice or wheat was used for topic

searching, e.g. yield and nitrogen and (rice or wheat). The search was restricted

by research affiliation, in particular to the International Rice Research Institute,


Indian Agricultural Research Institute and all the Australian affiliations (CSIRO,

University of Adelaide, etc.). These are internationally recognised institutions in

cereal research and hence a source of reliable data. Furthermore, studies performed

before 1980 were not included in the selection.

2. Articles which were referenced by the previously selected articles and articles

which cited them. These manuscripts were included to widen the search framework.

3. Studies containing both GY and NU data and performed in the field (farmers’ fields or research stations).

1.4.1.2 Main features of the studies selected

In this section, I present the most important features of the studies evaluated in rela-

tion to: 1) journals in which the studies were published, 2) places where the studies

were conducted and 3) year of publication and manuscript citation index.

Most of the journals in which the studies were published belong to the category of Agronomy, followed by Agriculture Multidisciplinary (Table B.1). Within these categories, an impact factor close to or higher than three is considered to be high. Therefore, 51% of the studies selected are considered to be published in high impact factor journals (Table B.1): Agriculture, Ecosystems and Environment (1%), Molecular Breeding (1%), European Journal of Agronomy (8%), Plant and Soil (13%) and Field Crops Research (28%). These journals are placed in the top ten of their categories, out of a total of 78 journals in Agronomy, 57 in Agriculture Multidisciplinary and 195 in Plant Science.

Most of the studies selected were carried out in Asia (72%), mainly in China (30%), India (17%) and the Philippines (10%). About 10%, 9%, 4% and 3% of the studies selected were performed in Oceania, Europe, America and Africa, respectively. The remaining 2% of the studies were carried out in several countries belonging to at least two continents (Others); see Figure 1.2. This distribution is explained by our search criteria, which gave priority to well-recognised Asian and Australasian institutions.

The majority of the studies were published in the last 8 years (Figure 1.3, left). This partially explains why 31% of the manuscripts have been cited fewer than 3 times and 59% have fewer than 20 citations (Figure 1.3, right).

Figure 1.2: Distribution of the studies in the review by continents. [Others refers to

studies which present experiments carried out in at least two countries from different continents.]

Figure 1.3: Distribution of the studies in the review by the year of publication (left)

and the paper citation index (right).

1.4.2 Amount and format of grain yield and nitrogen uptake

data

There is a vast amount of experimental data on GY and NU collected from designed field trials. The data are mostly presented in 1) tables, which display the means of GY, NU or IEN across treatments, or 2) scatter plots in which GY is plotted against NU. In the latter, least-squares exponential or polynomial regressions are commonly fitted to model the trend of GY on NU (Figure 1.4, left). Alternatively, two straight envelope lines delimiting the points of the scatter plot are displayed (Figure 1.4, right).

1.4.3 Relationship between grain yield and nitrogen uptake

The trend of GY on NU usually follows a saturation curve. At low levels of NU, a small increment of the latter sharply increases GY; however, this effect flattens out at high levels of NU (Figure 1.4, left). This pattern has been shown in Dobermann & Cassman (2002), Naklang et al. (2006) and Witt et al. (1999). The shape of the trend is due to the fact that low levels of N content in the plant limit GY (Xu et al., 2012). However, once the plant has accumulated enough N, other factors (e.g. other major or minor nutrients, water or a low rate of photosynthesis (Foulkes et al., 2009)) may be more limiting than N. The flattening effect is also produced because there is a maximum quantity of grain that plants can produce (Fowler, 2003; Otteson et al., 2007). If the amount of N in the plant is excessive, N may have a negative effect, decreasing grain production (Goyal & Huffaker, 1984, Chapter 6). Therefore, there is a change in the correlation between GY and NU (Figure 1.4, left). The correlation reveals how the N taken up is utilised for grain, and it changes depending on the environmental conditions, agronomic practices and plant genotypes. We will return to this issue in Chapters 3 and 4.

Figure 1.4: Typical scatter plots of grain yield and nitrogen uptake in wheat. [A curvilinear regression has been fitted in the graph to the left (Takahashi et al., 2007). The open and closed points refer to plots with and without compost application, respectively. In the graph to the right, two lines delimiting the points of the scatter plot are presented (Liu et al., 2006). The circles and triangles refer to plots with and without fertiliser application, respectively.]

1.4.4 Common methods of analysis of grain yield and nitrogen

uptake field data and their limitations

The techniques most commonly used to analyse GY and NU data in agricultural science journals can be divided into 1) statistical techniques and 2) biological quantitative models. The limitations of these techniques in providing additional understanding of the conversion of NU into GY are discussed below. The standard statistical software packages are also detailed.

Statistical techniques:

• Least-squares regressions

This technique is used to model the trend of GY on NU. Around 16% of the studies in our literature selection used simple least-squares polynomial or exponential regressions. The main objective is to check whether both variables are related and what type of relationship they have, e.g. linear, curvilinear or linear-plateau. However, it is not clear whether the observed scatter (e.g. Figure 1.4) is a direct functional response of GY to NU and/or an overlay of growth processes across different conditions. Furthermore, the least-squares regressions of GY on NU ignore the external factors affecting the relationship between both variables, which are of great interest to farmers and have to be taken into account to understand the utilisation of NU for GY. In addition, one of the necessary conditions for simple least-squares regressions is that the variance along the fitted line is constant (homoscedasticity of the error). However, as observed in Figure 1.4 (left), this requirement is not necessarily fulfilled.

• Marginal analyses on GY and NU data

These analyses are mainly used to test for differences between the means of GY and NU across experimental treatments. Around 80% of the studies in our selection performed marginal analyses of GY and NU. In the majority of these studies the technique used is the univariate analysis of variance followed by a post-hoc analysis, such as the Least Significant Difference test or Duncan's Multiple Range test. However, such analyses ignore the joint behaviour of the variables, which poses a serious limitation on gaining an in-depth understanding of the conversion of NU into GY.

• Analyses on IEN

These analyses are mainly applied to test whether there is any statistical difference between the means of the ratio across treatments. Internal N use efficiency is computed at each sampling unit (at which NU and GY are measured) and commonly analysed by the univariate analysis of variance. Since the distribution of the ratio of two jointly normal variables is a mixture² of non-normal heavy-tailed distributions (Marsaglia, 1965, 2006), the analysis of the ratio may violate the requirements for normality and homogeneity of error variances; thus, inferential conclusions may not be reliable. Around 40% of the studies in our literature selection carried out analyses on the ratio. Just four of them (Albrizio et al., 2010; Giambalvo et al., 2010; Liu et al., 2011; Tetard-Jones et al., 2013) stated that they had checked for deviations from the normality assumption and three (Albrizio et al., 2010; Giambalvo et al., 2010; Liu et al., 2011) for homogeneity of error variances.

Analyses of variance were mostly employed to assess the effect of different fertiliser

treatments on GY , NU or IEN . However, N utilisation for grain production depends

on the available N in soil rather than on the amount of N applied (Figure 1.1). The

fraction of available N can substantially differ from the applied N due to complex envi-

ronmental interactions between climate, soil and plant (Marschner & Marschner, 2012,

p. 315). Thus, plots receiving the same fertiliser treatment may present different levels of available N, resulting in non-uniform (ill-defined) realisations of the treatments. Special care is therefore required when designing such field trials, together with tight control of the agronomic practices, to ensure that treatment applications are as consistent as possible. In addition, in order to better understand the interactions of these fertiliser treatments with non-controlled environmental conditions in the field, complementary analyses to the ones commonly used (e.g. linear mixed models) may be required.

Biological quantitative models:

Biological models like QUantitative Evaluation of the Fertility of Tropical Soils (QUEFTS) (Janssen et al., 1990) or crop growth simulation models (see Cassman et al. (1998, Section 6.5) for a review) describe internal processes in plants and are based on theoretical equations derived from physical principles. These models are deterministic, with statistics used only to validate them, and are therefore beyond the scope of this thesis. In our literature selection, 8% of the studies used the QUEFTS model and 3%³ employed crop growth simulation models.

² See Chapter 3.

Statistical software

The statistical software packages used to analyse GY and NU, together with the percentage of studies in our literature search employing them, are as follows: Statistical Analysis System, SAS (34%) (SAS Institute, 2013), SPSS (8%) (SPSS, 1997), Genstat (6%) (VSN International, 2012), IRRISTAT (5%) (IRRI, 1994), Excel (2%) (Excel, 2010), R (2%) (R Core Team, 2012) and others (2%). The remaining 41% of the studies did not provide any details on the statistical software used. This percentage corresponds to studies which developed biological quantitative models (11%) or did not mention the software employed (30%).

1.5 Objectives of the thesis

Statistical techniques currently used in plant and soil science publications to model GY and NU data have serious limitations in contributing to a fundamental understanding of the conversion process of NU into GY. In this thesis, I argue that a better approach is to analyse GY and NU data jointly with bivariate analyses. For instance, bivariate linear mixed models on (GY, NU) are more appropriate for comparing treatment effects than univariate linear models on GY, NU or IEN. Recently, Ganesalingam et al. (2013) proposed bivariate mixed models as an alternative to univariate mixed models on a ratio for the analysis of plant survival data. As they showed, the bivariate approach preserved the information on the original variables, allowed modelling the spatial correlation for each trait, better utilised the experimental data and increased the accuracy of predictions for variety survival.

³ The percentages presented in this section do not sum to 100% because some studies performed more than one type of analysis.

In designed field experiments, both GY and NU can be greatly affected by non-

controlled environmental conditions. These environmental conditions may often be

confounded with designed factors or interact with them resulting in non-uniform treat-

ments. This may complicate the interpretation of bivariate mixed model analyses. Our

hypothesis is that GY and NU field data collected across a range of environments can reflect a heterogeneous population composed of subpopulations. Each of these subpopulations (clusters) can be considered a separate environment. The identification

of such clusters and the close inspection of potential factors defining them can shed

additional light on the nitrogen utilisation process.

Finite mixture models of bivariate Gaussian distributions can be used to identify

such clusters and are proposed here as a complementary analysis to bivariate mixed

models for data collected in field experiments. The main benefits envisaged for the

analysis of the internal N efficiency traits are as follows. This approach 1) preserves information on GY and NU, 2) acknowledges the change in the correlation between GY and NU across environments (clusters), 3) avoids dealing with the ratio distribution, 4) allows estimation of IEN within each cluster and 5) provides additional insight into potential environmental conditions affecting the mechanism of N utilisation for grain production. In terms of agronomic studies, the identification of clusters

is useful to determine samples which belong to the same environment. The identifi-

cation of the factors defining each of the environments could be useful to implement

agronomic practices which counteract potential adverse environmental conditions. For

instance, the lack of rain at a certain growth stage of the plant could be identified as

one of the potential factors affecting N utilisation. To counteract drought stress in

non-irrigated systems, farmers could cover the crops with straw to retain the moisture

of the soil. The identification of these environmental conditions is also important to improve further experimental designs or to include them in further statistical models, for instance to predict the best sowing time according to the local conditions of the field.

The objective of this thesis is to investigate the benefits of the bivariate mixture methodology as a complementary analysis to bivariate mixed models for field trials in the presence of strong environmental conditions. At this stage, mixture models should not be applied alone because GY and NU data are mostly collected from designed field trials, and data from such trials may violate the independence assumption of mixture models⁴. Despite this limitation, I believe that the technique

can be useful for identifying environmental factors which may overshadow, or interact

with, treatment effects. Thus, the combination of bivariate mixed and mixture models

can provide a more insightful interpretation of the conversion process of NU into GY

in field trials.

1.6 Thesis outline

In this chapter I have presented the biological background and the motivation for this

project. The remainder of the thesis is presented in four chapters.

Chapter 2 presents the distributional properties of ratios of jointly normal variables.

The objective of this chapter is to draw the attention of the reader to the pitfalls of

analysing ratios with simple statistical techniques. The expressions of the probability

density function (pdf) of these types of ratios are reviewed, highlighting the mixture nature and possible heavy-tailedness of their distribution. I also review the main estimators of the ratio of expected values and their properties. Finally, analytical and graphical procedures for calculating confidence sets of the ratio of expected values are reviewed.

Chapter 3 presents the theoretical fundamentals of finite mixture models of bivariate Gaussian distributions and provides the theoretical framework of the methodology of this thesis. I review the frequentist and Bayesian approaches for mixtures as well as the difficulties encountered when fitting mixture models of Gaussian distributions with heteroscedastic components and common recommendations for overcoming them.

⁴ See Chapter 3.

In Chapter 4, the methodology of bivariate mixture models together with bivariate

mixed models for the analysis of (NU , GY ) is demonstrated for a field study reported

in Naklang et al. (2006). The benefits of the bivariate analyses (mixture and mixed

models) are discussed in comparison with their respective univariate counterparts on

IEN . This chapter is presented as a manuscript, following the required format for

submission to the Australian and New Zealand Journal of Statistics.

Finally, Chapter 5 presents the general conclusions of the thesis. I argue that

ideally the methodology of bivariate mixture models should be applied in field surveys

to fully exploit the potential of the technique as a means for identifying environmental

factors affecting NU and GY. Simulation studies to assess the coverage of Fieller's confidence intervals (Fieller, 1954) of the ratio of expected values for each component of the mixture are proposed as a future line of research. Other research gaps identified during the development of the present project are discussed.

Chapter 2

Ratios of jointly normal variables

2.1 Introduction to the ratio of jointly normal variables

Ratios are commonly used in agricultural research to quantify one variable with respect

to another. For instance, the efficiency of N utilisation by plants is quantified by several indices (Table 1.2). However, researchers in agriculture and biometricians are not always aware of the distributional properties of ratios and choose to apply simple statistical methods, which assume normality and homoscedasticity of error variance, to the ratio observations. Violations of these assumptions may lead to unreliable inferences.

A ratio can take atypically large values if its denominator is close to zero. This results in higher probabilities of outliers and heavy-tailedness. In particular, if (X, Y) are jointly normal, the probability density function (pdf) of the ratio R = Y/X is a mixture¹ of two heavy-tailed distributions, one of the components being Cauchy distributed (Marsaglia, 1965, 2006)².

¹ See Chapter 3.

² Further details on the study of Marsaglia (1965) were given in Marsaglia (2006).

The presence of a Cauchy component in the pdf of R results in the non-existence of the expected value, E(Y/X), and the variance, Var(Y/X). Since E(Y/X) does not exist, E(Y)/E(X) is used for inference purposes (e.g. Lai et al., 2004). The confidence sets of E(Y)/E(X) comprise asymmetric bounded intervals, unbounded intervals or the entire real line (Fieller, 1954; Von Luxburg & Franz, 2009). These solutions are derived in the following situations (Von Luxburg & Franz, 2009):

• The denominator is far from zero. Then, there are no problems in computing the

ratio and the confidence sets of E(Y )/E(X) are bounded intervals.

• The numerator is far from zero but the denominator is not. Then, the denomina-

tor can take positive or negative values, and E(Y )/E(X) can result in arbitrarily

large values of unknown sign. In this case, the confidence sets are in the form of

]−∞, q1] ∪ [q2,∞[ with q1 < 0 and q2 > 0. The only possibility for E(Y )/E(X)

to be in ]q1, q2[ is by having a small numerator or a large denominator, which is

not possible in this assumed situation.

• The denominator and the numerator can both take values close to zero. Then,

we have an indeterminacy of the type 0/0 and E(Y )/E(X) can take any value.

Thus, the confidence set is the entire real line.
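The three situations above can be sketched in code. The following is a minimal illustration, assuming the standard quadratic form of Fieller's construction, which this section does not spell out: the set contains every value ρ with (ȳ − ρx̄)² ≤ crit²(v_yy − 2ρv_xy + ρ²v_xx). The function name `fieller_set`, its arguments (estimated variances and covariance of the two sample means, and a critical value `crit`) and the labels it returns are hypothetical choices made for this example.

```python
import math


def fieller_set(xbar, ybar, vxx, vyy, vxy, crit):
    """Classify the confidence set for E(Y)/E(X) (illustrative sketch).

    vxx, vyy, vxy: estimated variances and covariance of the sample means.
    crit: critical value, e.g. 1.96 for a large-sample 95% set (assumption).
    The set solves A*rho^2 - 2*B*rho + C <= 0.
    """
    A = xbar ** 2 - crit ** 2 * vxx
    B = xbar * ybar - crit ** 2 * vxy
    C = ybar ** 2 - crit ** 2 * vyy
    disc = B * B - A * C
    if A == 0:
        raise ValueError("boundary case A = 0 is not handled in this sketch")
    if A > 0:
        # Denominator far from zero: bounded interval [q1, q2].
        root = math.sqrt(max(disc, 0.0))
        return "bounded", (B - root) / A, (B + root) / A
    if disc > 0:
        # Denominator close to zero, numerator not: ]-inf, q1] U [q2, inf[.
        root = math.sqrt(disc)
        return "two_rays", (B + root) / A, (B - root) / A
    # Numerator and denominator both close to zero: entire real line.
    return "real_line", None, None
```

For example, with unit variances, zero covariance and `crit = 1.96`, means (x̄, ȳ) = (10, 20) give a bounded interval around 2, (1, 20) give two rays and (0.5, 0.5) give the entire real line.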

This chapter reviews the distributional properties of the ratio R, where (X, Y) is a jointly normally distributed (BVN) variable with the following expected value (µ) and variance-covariance matrix (Σ):

\[
\mu = (\mu_x, \mu_y); \qquad
\Sigma = \begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix}
\tag{2.1}
\]

The properties that will be reviewed in this chapter are readily applied to the IEN

distribution. Recall from Section 1.2 that IEN is defined as the ratio of GY to NU .

In field trials, GY and NU are measured at harvest and cumulatively on each experi-

mental plot. Conditional on major factors and assuming that the plant measurements

are weakly correlated, it can be assumed that at the plot level both GY and NU are

normally distributed (see Central Limit Theorem in Cramer, 1946, p. 219). Under

the same assumptions, the Central Limit Theorem can be extended to the bivariate

case and one can consider (NU , GY ) to be jointly normal (see Cramer, 1946, p. 286).

The chapter is structured as follows. Firstly, I present the main results regarding

the pdf of R and review the cases for which this distribution is approximately normal.

Then, I discuss the most typical estimators of E(Y )/E(X) and their performance.

Finally, I review analytical and graphical procedures (Fieller’s rule and a geometric

approach) for deriving the confidence sets of E(Y )/E(X).

2.2 Distribution of the ratio: history and properties

2.2.1 Geary (1930) and Fieller (1932) expressions of the pdf

The research on ratios of two jointly normal variables goes back to the 1930s when Geary (1930) formulated the analytical expression of the pdf of R for the particular case of µ = (0, 0):

\[
f_R(r) = \frac{\sigma_x\sigma_y\sqrt{1-\rho^2}}{\pi\left(\sigma_x^2 r^2 - 2\rho\sigma_x\sigma_y r + \sigma_y^2\right)}
\tag{2.2}
\]

where ρ is the correlation coefficient.

Since then, several authors have provided an expression of the pdf of R without re-

strictions on the means. The first to suggest a solution with µ being an arbitrary real

vector was Fieller (1932). The solution in Fieller (1932) was re-expressed in Hinkley

(1969):

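\[
f_R(r) = \frac{b(r)\,d(r)}{\sqrt{2\pi}\,\sigma_y\sigma_x\,a^3(r)}
\left[\Phi\!\left(\frac{b(r)}{\sqrt{1-\rho^2}\,a(r)}\right) - \Phi\!\left(-\frac{b(r)}{\sqrt{1-\rho^2}\,a(r)}\right)\right]
+ \frac{\sqrt{1-\rho^2}}{\pi\sigma_x\sigma_y\,a^2(r)}\exp\!\left(-\frac{c}{2(1-\rho^2)}\right)
\tag{2.3}
\]

where:

\[
\begin{aligned}
a(r) &= \sqrt{\frac{r^2}{\sigma_y^2} - \frac{2\rho r}{\sigma_x\sigma_y} + \frac{1}{\sigma_x^2}} \\
b(r) &= \frac{\mu_y r}{\sigma_y^2} - \frac{\rho(\mu_y + \mu_x r)}{\sigma_x\sigma_y} + \frac{\mu_x}{\sigma_x^2} \\
c &= \frac{\mu_y^2}{\sigma_y^2} - \frac{2\rho\mu_x\mu_y}{\sigma_x\sigma_y} + \frac{\mu_x^2}{\sigma_x^2} \\
d(r) &= \exp\!\left(\frac{b^2(r) - c\,a^2(r)}{2(1-\rho^2)\,a^2(r)}\right)
\end{aligned}
\]

and Φ and exp are the cumulative distribution function (cdf) of the standard univariate normal and the exponential function, respectively.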

Fieller (1932), Geary (1930) and Hinkley (1969) derive the pdf of R by performing

the following change of variable:

X = X and Y = RX,

fX,R(x, r) = φX,Y (x, rx|µ, Σ)|x|.

The marginal density fR(r) is calculated by integrating fX,R with respect to x. Notice

that if µx = µy = 0, Eq. 2.3 reduces to Eq. 2.2. Furthermore, if ρ = 0 and σx = σy, Eq.

2.2 is the standard Cauchy distribution (Pham-Gia et al., 2006). The latter example

shows how much the distribution of R can differ from a normal.
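The reductions mentioned above can be checked numerically. The sketch below evaluates the Hinkley (1969) form (Eq. 2.3) using only the Python standard library and compares it with the Geary form (Eq. 2.2) when both means are zero. The function names `hinkley_pdf`, `geary_pdf` and `std_normal_cdf` are hypothetical, and the code is an illustration rather than the implementation used in the thesis.

```python
import math


def std_normal_cdf(z):
    """Standard normal cdf written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def hinkley_pdf(r, mu_x, mu_y, sd_x, sd_y, rho):
    """pdf of R = Y/X for jointly normal (X, Y), following Eq. 2.3."""
    one_m = 1.0 - rho ** 2
    a = math.sqrt(r * r / sd_y ** 2 - 2 * rho * r / (sd_x * sd_y) + 1 / sd_x ** 2)
    b = mu_y * r / sd_y ** 2 - rho * (mu_y + mu_x * r) / (sd_x * sd_y) + mu_x / sd_x ** 2
    c = mu_y ** 2 / sd_y ** 2 - 2 * rho * mu_x * mu_y / (sd_x * sd_y) + mu_x ** 2 / sd_x ** 2
    d = math.exp((b * b - c * a * a) / (2 * one_m * a * a))
    u = b / (math.sqrt(one_m) * a)
    term1 = (b * d / (math.sqrt(2 * math.pi) * sd_x * sd_y * a ** 3)
             * (std_normal_cdf(u) - std_normal_cdf(-u)))
    term2 = math.sqrt(one_m) / (math.pi * sd_x * sd_y * a * a) * math.exp(-c / (2 * one_m))
    return term1 + term2


def geary_pdf(r, sd_x, sd_y, rho):
    """pdf of R when both means are zero, following Eq. 2.2."""
    quad = sd_x ** 2 * r * r - 2 * rho * sd_x * sd_y * r + sd_y ** 2
    return sd_x * sd_y * math.sqrt(1 - rho ** 2) / (math.pi * quad)
```

With zero means the first term of Eq. 2.3 vanishes because b(r) = 0, so `hinkley_pdf(r, 0, 0, sd_x, sd_y, rho)` should agree with `geary_pdf(r, sd_x, sd_y, rho)` up to rounding; with ρ = 0 and σx = σy both reduce to 1/(π(1 + r²)).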

The expressions of the pdf of R in Fieller (1932) and Hinkley (1969) are complex and do not give much insight into its distributional shape. Marsaglia (1965, 2006) expressed this pdf as a mixture of two heavy-tailed distributions. Marsaglia's expression provides more intuition about the shape of the distribution, which can differ greatly from the normal and can present skewness, bimodality and heavy tails.

2.2.2 Marsaglia (1965, 2006) expression of the pdf

The motivating problem for Marsaglia (1965) was a regression analysis to model the number of red cells (y) in blood against time (r), y = α + βr. The r-intercept (r = −α/β) was used to estimate the red cell life span. Thus, the r-intercept distribution and its expected value were of medical interest for detecting anomalies in patients with a red cell life span shorter than expected. For modelling purposes, Marsaglia (1965) considered the r-intercept to be the ratio of two normal variables.

Marsaglia (1965, 2006) proved that R can always be transformed, by a translation and a change of scale, into a ratio of the form $T = (a + U)/(b + V)$, where U and V are independent standard normal variables and a and b are non-negative constants. Thus, instead of R, it is sufficient to study ratios of the form $T = (a + U)/(b + V)$. The results in Marsaglia (1965, 2006) are summarised in Theorems 1 and 2.

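Theorem 1. Let (X, Y) ∼ BVN(µ, Σ), Eq. 2.1. Then, ∃ s, w, a and b such that:

\[
w\left(\frac{Y}{X} - s\right) = w\left(\frac{Y - sX}{X}\right) \sim \frac{a + U}{b + V}
\;\leftrightarrow\;
\frac{Y}{X} \sim \frac{1}{w}\left(\frac{a + U}{b + V}\right) + s
\]

where U and V are two independent standard normal variables. The values of w, s, a and b are given by

\[
s = \rho\sigma_y/\sigma_x \qquad w = \frac{\sigma_x}{\mp\sigma_y\sqrt{1-\rho^2}}
\tag{2.4}
\]

\[
a = \mp\frac{\mu_y/\sigma_y - \rho\mu_x/\sigma_x}{\sqrt{1-\rho^2}} \qquad b = \mu_x/\sigma_x
\tag{2.5}
\]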

Proof. (Sketch) Firstly, the value of s (translation) is selected such that (Y − sX)/X is the ratio of two independent normal variables. Therefore,

\[
\begin{aligned}
E((Y - sX)X) = E(Y - sX)E(X) &\leftrightarrow E(YX) - sE(X^2) = E(Y - sX)E(X) \\
&\leftrightarrow \rho\sigma_x\sigma_y + \mu_x\mu_y - s(\sigma_x^2 + \mu_x^2) = \mu_x\mu_y - s\mu_x^2 \\
&\leftrightarrow \rho\sigma_x\sigma_y - s\sigma_x^2 = 0 \leftrightarrow s = \frac{\rho\sigma_y}{\sigma_x}
\end{aligned}
\]

Secondly, the value of w (change of scale) is chosen such that, after rescaling, Y − sX and X both have standard deviations equal to 1. Thus, $w = \pm\sqrt{Var(X)}/\sqrt{Var(Y - sX)}$. Finally, the choice of a and b is straightforward by considering that U ∼ N(0, 1) ↔ a + U ∼ N(a, 1), where N denotes the univariate normal distribution. Consequently, a and b are chosen to be the expected values of $(Y - sX)/\sqrt{Var(Y - sX)}$ and $X/\sqrt{Var(X)}$, respectively. For more details on the proof, refer to Marsaglia (2006).
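As a small worked illustration of Theorem 1, the sketch below computes s, w, a and b from the five parameters of the bivariate normal. It assumes that the ∓ sign in Eqs. 2.4 and 2.5 is resolved so that a is non-negative and that µx > 0 (so that b is positive); the function name `marsaglia_constants` is hypothetical.

```python
import math


def marsaglia_constants(mu_x, mu_y, sd_x, sd_y, rho):
    """Return (s, w, a, b) such that w * (Y/X - s) ~ (a + U) / (b + V).

    Assumptions of this sketch: mu_x > 0, and the sign of w is picked
    so that a >= 0.
    """
    root = math.sqrt(1.0 - rho ** 2)
    s = rho * sd_y / sd_x                          # translation
    w = sd_x / (sd_y * root)                       # change of scale
    a = (mu_y / sd_y - rho * mu_x / sd_x) / root   # mean of scaled numerator
    b = mu_x / sd_x                                # mean of scaled denominator
    if a < 0:
        # Flipping the sign of the numerator keeps U standard normal.
        a, w = -a, -w
    return s, w, a, b
```

For the parameters of the right panel of Figure 2.1, (µx, µy, σx, σy, ρ) = (30, 31, 4, 5, 0.8), this gives s = 1, w ≈ 1.333, a ≈ 0.333 and b = 7.5. The covariance between Y − sX and X is ρσxσy − sσx² = 16 − 16 = 0, as required by the first step of the proof.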

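Theorem 2. The pdf of the ratio $T = (a + U)/(b + V)$ is a mixture of two heavy-tailed distributions given by:

\[
f(t) = p f_1(t) + (1 - p) f_2(t)
\tag{2.6}
\]

where:

\[
f_1(t) = \frac{1}{\pi(1 + t^2)}
\quad\text{and}\quad
f_2(t) = \frac{q\int_0^q \exp\!\left(-\frac{x^2 - q^2}{2}\right)dx}{\pi(1 + t^2)\left(\exp\!\left(\frac{a^2 + b^2}{2}\right) - 1\right)}
\]

with:

\[
p = \exp\!\left(-\frac{a^2 + b^2}{2}\right)
\quad\text{and}\quad
q = \frac{b + at}{\sqrt{1 + t^2}}
\]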

Proof. (Sketch) The cdf of $T = (a + U)/(b + V)$, F(), is related to the function of Nicholson (Nicholson, 1943), denoted here by H(), as follows:

\[
F(t) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}(t) + 2H\!\left(\frac{bt - a}{\sqrt{1 + t^2}}, \frac{b + at}{\sqrt{1 + t^2}}\right) - 2H(b, a)
\tag{2.7}
\]

where:

\[
H(q, h) = \int_0^q \int_0^{hx/q} \varphi(x)\varphi(y)\,dy\,dx
\]

and φ is the standard normal pdf.

Then, the pdf of $T = (a + U)/(b + V)$ is obtained by taking the derivative of F(t) (Eq. 2.7) with respect to t. For more details, refer to Marsaglia (1965).

The pdf f(t) (Eq. 2.6) can differ substantially from a normal. Notice that f1(t) is the standard Cauchy density. Thus, the moments of f(t) do not exist. However, if b > 4, Marsaglia (2006) provides approximate values of the mean (µ) and the variance (σ²) for practical purposes:

\[
\mu = a/(1.01b - 0.2713) \qquad \sigma^2 = (a^2 + 1)/(b^2 + 0.108b - 3.795) - \mu^2
\]

The pdf f(t) can be skewed or even bimodal depending on the values of a and b. Once the pdf of T is obtained, it is straightforward to calculate the pdf of R by performing the change of variable t = w(r − s), which leads to fR(r) = |w| fT(w(r − s)). An illustration of possible shapes of the pdf of R is shown in Figure 2.1.
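A minimal numerical sketch of Theorem 2 follows. It uses an algebraically equivalent single-expression form of Eq. 2.6, f(t) = p[1 + q exp(q²/2) ∫₀^q exp(−x²/2) dx] / (π(1 + t²)), which follows from Eq. 2.6 because (1 − p)/(exp((a² + b²)/2) − 1) = p, and which avoids a division by zero when a = b = 0. The inner integral is written with the error function. The function names `ratio_pdf_T` and `marsaglia_moments` are hypothetical.

```python
import math


def ratio_pdf_T(t, a, b):
    """pdf of T = (a + U)/(b + V), combined form of Eq. 2.6."""
    p = math.exp(-(a * a + b * b) / 2.0)
    q = (b + a * t) / math.sqrt(1.0 + t * t)
    # int_0^q exp(-x^2/2) dx = sqrt(pi/2) * erf(q / sqrt(2))
    inner = q * math.exp(q * q / 2.0) * math.sqrt(math.pi / 2.0) * math.erf(q / math.sqrt(2.0))
    return p * (1.0 + inner) / (math.pi * (1.0 + t * t))


def marsaglia_moments(a, b):
    """Approximate mean and variance quoted from Marsaglia (2006), for b > 4."""
    if b <= 4:
        raise ValueError("the approximation is stated only for b > 4")
    mean = a / (1.01 * b - 0.2713)
    var = (a * a + 1.0) / (b * b + 0.108 * b - 3.795) - mean ** 2
    return mean, var
```

With a = b = 0 the function returns the standard Cauchy density. For a = 1/3 and b = 7.5 (the values implied by the right panel of Figure 2.1) the density should integrate numerically to approximately one over a moderate range, and the practical mean from `marsaglia_moments` can be compared with a numerically integrated mean over the same range.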

Figure 2.1: Different shapes of the probability density function of the ratio of two jointly normal variables. [The parameters (µx, µy, σx, σy, ρ) used are (2, 38, 8, 24, 0.16) (left); (2, 20, 8, 24, 0.66) (middle) and (30, 31, 4, 5, 0.8) (right)]

2.2.3 Pham-Gia et al. (2006) expression of the pdf

The pdf of R has been studied for the last 80 years and it is still an active field of

research. Recently, Pham-Gia et al. (2006) have provided a closed form of the pdf of R

in terms of Hermite functions. The pdf is firstly formulated for ratios of independent

normal variables (Theorem 3). Then, by using Theorem 1, it is possible to derive the

pdf of the ratio of two dependent jointly normal variables (Theorem 4).

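Theorem 3. Let (X, Y) be a random vector of two independent normal variables with parameters as in Eq. 2.1 with σxy = 0. The pdf of R = Y/X is given by:

\[
f_R(r) = \frac{C_1}{\sigma_y^2 + r^2\sigma_x^2}\left[H_{-2}(s(r)) + H_{-2}(-s(r))\right]
\tag{2.8}
\]

where:

\[
\begin{aligned}
C_1 &= \frac{\sigma_x\sigma_y}{\pi}\exp\!\left(-\frac{1}{2}\left(\frac{\mu_y^2}{\sigma_y^2} + \frac{\mu_x^2}{\sigma_x^2}\right)\right) \\
s(r) &= \frac{1}{\sigma_x\sigma_y}\,\frac{\sigma_x^2\mu_y r + \sigma_y^2\mu_x}{\sqrt{2(\sigma_x^2 r^2 + \sigma_y^2)}} \\
H_{-2}(z) &= \int_0^\infty t\exp\!\left(-t^2 - 2tz\right)dt \quad \text{(a particular type of Hermite function)}
\end{aligned}
\]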

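Proof. Firstly, the change of variable X = X and Y = RX is performed. Then, $f_{X,R}(x, r) = |x| f_X(x) f_Y(rx)$, where $f_X$ and $f_Y$ are pdfs of univariate normal distributions. Secondly, $f_{X,R}$ is reparametrized taking $\varepsilon_1 = (2\sigma_x^2)^{-1}$, $\eta_1 = -\mu_x/\sigma_x^2$, $\varepsilon_2 = (2\sigma_y^2)^{-1}$ and $\eta_2 = -\mu_y/\sigma_y^2$, which yields the following expressions:

\[
\begin{aligned}
f_X(x) &= \frac{1}{\sqrt{2\pi\sigma_x^2}}\exp\!\left(-\frac{x^2 - 2x\mu_x + \mu_x^2}{2\sigma_x^2}\right)
= \sqrt{\frac{\varepsilon_1}{\pi}}\exp\!\left(-\varepsilon_1 x^2 - x\eta_1\right)\exp\!\left(-\frac{\eta_1^2}{4\varepsilon_1}\right) \\
f_Y(rx) &= \frac{1}{\sqrt{2\pi\sigma_y^2}}\exp\!\left(-\frac{r^2x^2 - 2xr\mu_y + \mu_y^2}{2\sigma_y^2}\right)
= \sqrt{\frac{\varepsilon_2}{\pi}}\exp\!\left(-\varepsilon_2 r^2 x^2 - xr\eta_2\right)\exp\!\left(-\frac{\eta_2^2}{4\varepsilon_2}\right) \\
f_{X,R}(x, r) &= |x| f_X(x) f_Y(rx)
= |x|\frac{\sqrt{\varepsilon_1\varepsilon_2}}{\pi}\exp\!\left(-\frac{\eta_1^2}{4\varepsilon_1} - \frac{\eta_2^2}{4\varepsilon_2}\right)\exp\!\left(-(\varepsilon_1 + \varepsilon_2 r^2)x^2 - x(\eta_1 + \eta_2 r)\right) \\
&= |x| K \exp\!\left(-(\varepsilon_1 + \varepsilon_2 r^2)x^2 - x(\eta_1 + \eta_2 r)\right)
\end{aligned}
\]

where:

\[
K = \frac{\sqrt{\varepsilon_1\varepsilon_2}}{\pi}\exp\!\left(-\frac{\eta_1^2}{4\varepsilon_1} - \frac{\eta_2^2}{4\varepsilon_2}\right)
\]

The next step is to integrate $f_{X,R}$ with respect to x:

\[
\begin{aligned}
f_R(r) &= \int_{-\infty}^{\infty} |x| f_X(x) f_Y(rx)\,dx \\
&= \int_{-\infty}^{0} -x f_X(x) f_Y(rx)\,dx + \int_0^\infty x f_X(x) f_Y(rx)\,dx \\
&= \int_0^\infty x f_X(-x) f_Y(-rx)\,dx + \int_0^\infty x f_X(x) f_Y(rx)\,dx \\
&= \int_0^\infty x K \exp\!\left(-(\varepsilon_1 + \varepsilon_2 r^2)x^2 + x(\eta_1 + \eta_2 r)\right)dx
 + \int_0^\infty x K \exp\!\left(-(\varepsilon_1 + \varepsilon_2 r^2)x^2 - x(\eta_1 + \eta_2 r)\right)dx
\end{aligned}
\]

Taking $t = x\sqrt{\varepsilon_1 + \varepsilon_2 r^2}$ yields:

\[
\begin{aligned}
f_R(r) &= \int_0^\infty \frac{Kt}{\varepsilon_1 + \varepsilon_2 r^2}\exp\!\left(-t^2 + 2t\,\frac{\eta_1 + \eta_2 r}{2\sqrt{\varepsilon_1 + \varepsilon_2 r^2}}\right)dt \\
&\quad + \int_0^\infty \frac{Kt}{\varepsilon_1 + \varepsilon_2 r^2}\exp\!\left(-t^2 - 2t\,\frac{\eta_1 + \eta_2 r}{2\sqrt{\varepsilon_1 + \varepsilon_2 r^2}}\right)dt
\end{aligned}
\]

Considering the definition of $H_{-2}(z)$, it becomes evident that:

\[
f_R(r) = \frac{K}{\varepsilon_1 + \varepsilon_2 r^2}\left[H_{-2}(-s(r)) + H_{-2}(s(r))\right]
\]

with:

\[
s(r) = \frac{1}{2}\,\frac{\eta_1 + r\eta_2}{\sqrt{\varepsilon_1 + r^2\varepsilon_2}}
\]

Substituting ε1, ε2, η1 and η2 by their expressions in terms of σx, σy, µx and µy, the result follows (Pham-Gia et al., 2006).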

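Theorem 4. Let (X, Y) ∼ BVN(µ, Σ), Eq. 2.1. The pdf of R = Y/X is given by:

\[
f_R(r) = \frac{C_2}{\sigma_x^2 r^2 - 2\rho\sigma_x\sigma_y r + \sigma_y^2}\left[H_{-2}(l(r)) + H_{-2}(-l(r))\right]
\]

where:

\[
\begin{aligned}
C_2 &= \frac{\sigma_x\sigma_y\sqrt{1-\rho^2}}{\pi}\exp\!\left(-\frac{\sigma_x^2\mu_y^2 - 2\rho\sigma_x\sigma_y\mu_x\mu_y + \mu_x^2\sigma_y^2}{2(1-\rho^2)\sigma_x^2\sigma_y^2}\right) \\
l(r) &= \frac{-\sigma_x^2\mu_y r + \rho\sigma_y\sigma_x(\mu_x r + \mu_y) - \mu_x\sigma_y^2}{\sqrt{2\sigma_x^2\sigma_y^2(1-\rho^2)(\sigma_x^2 r^2 - 2\rho\sigma_y\sigma_x r + \sigma_y^2)}}
\end{aligned}
\]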

Proof. (Sketch) Pham-Gia et al. (2006) suggested following the same steps as in Theorem 3. Alternatively, the proof can be done considering $T = (a + U)/(b + V)$ (see Theorem 1) and its pdf given by Theorem 3:

\[
f_T(t) = \frac{1}{\pi(1 + t^2)}\exp\!\left(-\frac{a^2 + b^2}{2}\right)\left(H_{-2}\!\left(\frac{at + b}{\sqrt{2(t^2 + 1)}}\right) + H_{-2}\!\left(-\frac{at + b}{\sqrt{2(t^2 + 1)}}\right)\right)
\]

By Theorem 1, R is distributed as T/w + s, with w and s given in Eq. 2.4. Then, performing the change of variable t = (r − s)w, the pdf of R is given by:

\[
\begin{aligned}
f_R(r) &= |w| f_T[(r - s)w] \\
&= \frac{|w|\exp\!\left(-\frac{a^2 + b^2}{2}\right)}{\pi(1 + (r - s)^2 w^2)}\left(H_{-2}\!\left(\frac{a(r - s)w + b}{\sqrt{2((r - s)^2 w^2 + 1)}}\right) + H_{-2}\!\left(-\frac{a(r - s)w + b}{\sqrt{2((r - s)^2 w^2 + 1)}}\right)\right)
\end{aligned}
\]

Substituting a, b, s and w by their expressions with respect to µx, µy, σx, σy and ρ, the result follows.
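The expression in Theorem 4 can be evaluated directly once H₋₂ is available. The sketch below relies on a closed form obtained by completing the square in the integrand, H₋₂(z) = 1/2 − (√π/2) z exp(z²) erfc(z); this closed form is a derivation made for this example and is not taken from the source. The function names `h_minus2` and `pham_gia_pdf` are hypothetical.

```python
import math


def h_minus2(z):
    """H_{-2}(z) = int_0^inf t*exp(-t^2 - 2tz) dt, via completing the square.

    Note: exp(z*z) overflows for roughly |z| > 26, so this sketch is only
    meant for moderate arguments.
    """
    return 0.5 - 0.5 * math.sqrt(math.pi) * z * math.exp(z * z) * math.erfc(z)


def pham_gia_pdf(r, mu_x, mu_y, sd_x, sd_y, rho):
    """pdf of R = Y/X for jointly normal (X, Y), following Theorem 4."""
    one_m = 1.0 - rho ** 2
    quad = sd_x ** 2 * r * r - 2 * rho * sd_x * sd_y * r + sd_y ** 2
    expo = ((sd_x ** 2 * mu_y ** 2 - 2 * rho * sd_x * sd_y * mu_x * mu_y
             + mu_x ** 2 * sd_y ** 2) / (2 * one_m * sd_x ** 2 * sd_y ** 2))
    c2 = sd_x * sd_y * math.sqrt(one_m) / math.pi * math.exp(-expo)
    l = ((-sd_x ** 2 * mu_y * r + rho * sd_y * sd_x * (mu_x * r + mu_y)
          - mu_x * sd_y ** 2)
         / math.sqrt(2 * sd_x ** 2 * sd_y ** 2 * one_m * quad))
    return c2 / quad * (h_minus2(l) + h_minus2(-l))
```

When both means are zero, l(r) = 0 and H₋₂(0) = 1/2, so the expression reduces to the Geary form of Eq. 2.2. For the parameters of the right panel of Figure 2.1 the density should integrate numerically to approximately one over the interval [0, 2].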

It can be shown that the pdf in Eq. 2.8 is equivalent to the one in Eq. 2.6 (see Appendix C). Furthermore, Lemma 1 in Pham-Gia et al. (2006) proves that $H_{-2}(z) + H_{-2}(-z) = {}_1F_1(1, 1/2, z^2)$, where ${}_1F_1(\alpha, \gamma, z)$ is defined as:

\[
{}_1F_1(\alpha, \gamma, z) = \sum_{k=0}^{\infty}\frac{(\alpha, k)}{(\gamma, k)}\frac{z^k}{k!}, \qquad \gamma \neq 0, -1, -2, \ldots
\tag{2.9}
\]

and (α, k) = α(α + 1) . . . (α + k − 1).
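The identity of Lemma 1 can be checked numerically with a short sketch: the series of Eq. 2.9 is summed term by term, and the sum H₋₂(z) + H₋₂(−z) is approximated by a trapezoidal rule applied to its defining integral. The function names, the truncation of the series at 200 terms and the integration range [0, 12] are assumptions made for this example, suitable only for moderate values of z.

```python
import math


def kummer_1f1(alpha, gamma, z, terms=200):
    """Partial sum of the series in Eq. 2.9."""
    total, term = 0.0, 1.0  # term holds (alpha,k)/(gamma,k) * z^k / k!
    for k in range(terms):
        total += term
        term *= (alpha + k) / (gamma + k) * z / (k + 1)
    return total


def h_sum_quadrature(z, upper=12.0, steps=20000):
    """Trapezoidal approximation of H_{-2}(z) + H_{-2}(-z)."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        f = t * (math.exp(-t * t - 2 * t * z) + math.exp(-t * t + 2 * t * z))
        total += f if 0 < i < steps else 0.5 * f
    return h * total
```

For instance, `h_sum_quadrature(0.8)` and `kummer_1f1(1, 0.5, 0.8 ** 2)` should agree to several decimal places, and both equal one at z = 0.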

The advantage of expressing the pdf of $T = (a + U)/(b + V)$ as

\[
f_T(t) = \frac{C_1}{1 + t^2}\,{}_1F_1(1, 1/2, s^2(t)),
\qquad C_1 = \frac{\exp\!\left(-(a^2 + b^2)/2\right)}{\pi},
\qquad s(t) = \frac{at + b}{\sqrt{2(t^2 + 1)}},
\]

is that it allows one to calculate analytically the first derivative of fT(t) and study its sign for different positive values of a and b. By doing so, Pham-Gia et al. (2006) obtained the following results:

1. fT (t) always has a mode in ]0, a/b]

2. If a = 0, or b = 0 and a ≤ 1, fT (t) is unimodal

3. If b = 0 and a > 1, fT (t) is bimodal with symmetric modes.

4. If b > 0 and a > 0, fT (t) can be bimodal or unimodal. In the case of bimodality,

the second mode belongs to ]−∞,−b/a[.

For further details, refer to Section 5 in Pham-Gia et al. (2006).
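Results 2 and 3 above can be illustrated by counting local maxima of fT(t) on a grid. The sketch below is a numerical illustration only, not the analytical argument of Pham-Gia et al. (2006): it evaluates the density in the combined Marsaglia form, p[1 + q exp(q²/2) ∫₀^q exp(−x²/2) dx] / (π(1 + t²)), and counts strict interior maxima. The function names, the grid limits and the grid step are assumptions made for this example.

```python
import math


def ratio_pdf_T(t, a, b):
    """pdf of T = (a + U)/(b + V), combined form of Marsaglia's mixture."""
    p = math.exp(-(a * a + b * b) / 2.0)
    q = (b + a * t) / math.sqrt(1.0 + t * t)
    inner = q * math.exp(q * q / 2.0) * math.sqrt(math.pi / 2.0) * math.erf(q / math.sqrt(2.0))
    return p * (1.0 + inner) / (math.pi * (1.0 + t * t))


def count_modes(a, b, lo=-20.0, hi=20.0, steps=4000):
    """Count strict local maxima of the density on an equally spaced grid."""
    h = (hi - lo) / steps
    dens = [ratio_pdf_T(lo + i * h, a, b) for i in range(steps + 1)]
    return sum(1 for i in range(1, steps)
               if dens[i] > dens[i - 1] and dens[i] > dens[i + 1])
```

On this grid, b = 0 with a = 0.5 and a = 0 with b = 2 should each give a single mode, whereas b = 0 with a = 2 should give two modes placed symmetrically around zero. A grid search of this kind can miss modes lying outside the chosen range, so it complements rather than replaces the analytical results.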

2.3 Normal approximation of the pdf of the ratio

The shape of the pdf of R can differ greatly from the normal in terms of skewness, bimodality and/or heavy tails (see Figure 2.1). However, if the coefficient of variation of the denominator (CVx = σx/µx) tends to zero, the cdf of R, F(r), can be approximated by the cdf of a normal (Fieller, 1932; Hinkley, 1969) as in Eq. 2.10:

\[
\left| F(r) - \Phi\!\left(\frac{\mu_x r - \mu_y}{\sigma_y\sigma_x a(r)}\right) \right| \le \Phi\!\left(-\frac{\mu_x}{\sigma_x}\right)
\tag{2.10}
\]

with a(r) defined in Eq. 2.3. This is illustrated in Figure 2.2. However, the limiting value of CVx required for a satisfactory normal approximation cannot be determined exactly because it depends on CVy and ρ (Shanmugalingam, 1982). For instance, Figure 2.3 displays two different shapes of the pdf of R with the same value of CVx (0.13) and two different values of CVy (0.12 and 0.33). The ratio with CVy = 0.12 has an approximately normal shape whereas the ratio with CVy = 0.33 presents negative skewness.
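To give a sense of the size of the bound in Eq. 2.10, the following sketch computes the approximating normal cdf and the bound Φ(−µx/σx) for given parameters. The function names are hypothetical and the code only restates Eq. 2.10 with a(r) from Eq. 2.3.

```python
import math


def std_normal_cdf(z):
    """Standard normal cdf written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def normal_approx_cdf(r, mu_x, mu_y, sd_x, sd_y, rho):
    """Normal approximation to the cdf of R = Y/X used in Eq. 2.10."""
    a = math.sqrt(r * r / sd_y ** 2 - 2 * rho * r / (sd_x * sd_y) + 1 / sd_x ** 2)
    return std_normal_cdf((mu_x * r - mu_y) / (sd_x * sd_y * a))


def approx_error_bound(mu_x, sd_x):
    """Right-hand side of Eq. 2.10: Phi(-mu_x / sd_x)."""
    return std_normal_cdf(-mu_x / sd_x)
```

For CVx = 0.13 (for example µx = 1 and σx = 0.13) the bound is far below any practical tolerance, whereas for CVx = 0.5 it is about 0.023. The bound controls only the distance between the two cdfs, which is consistent with the remark above that the shape of the pdf also depends on CVy and ρ.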

Marsaglia (2006) provides a list of specific conditions for the normal approximation of the pdf of R. Marsaglia (2006) suggests that ratios of the form $T = (a + U)/(b + V)$ are approximately normal only when a < 2.25 and b > 4 (Eq. 2.5); see Figure 2.1, right.

The study in Pham-Gia et al. (2006) is focused on discerning the cases for which

the pdf of R is unimodal or bimodal, rather than on investigating the goodness of its

approximation by a normal distribution.

Figure 2.2: Probability density function of the ratio of two jointly normal variables for

different values of the coefficient of variation of the denominator (CVx)

Figure 2.3: Probability density function of the ratio of two jointly normal variables for

different values of the coefficient of variation of the numerator (CVy) but same CVx

2.4 Estimators of the ratio

2.4.1 Point estimators: average of ratios, ratio of averages

As previously stated, the Cauchy component in the distribution of R leads to the

non-existence of E(Y/X) (Marsaglia, 1965, 2006). Instead, E(Y )/E(X) is usually

considered for inference purposes (e.g. Lai et al., 2004). In particular, in agriculture,

the two main estimators of E(Y )/E(X) are (Qiao et al., 2006):

\[
R_A = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{x_i}
\tag{2.11}
\]

\[
R_W = \bar{y}/\bar{x} = \sum_{i=1}^{n} y_i \Big/ \sum_{i=1}^{n} x_i
\tag{2.12}
\]

The estimator RA is greatly affected by atypically large values of yi/xi, produced when xi is close to zero. The considerable variation of RA in comparison to RW is illustrated in Figure 2.4, which shows the distribution of 60 bootstrap samples of size 100 for RA (left) and RW (right). Each bootstrap sample was generated from a bivariate normal with µx = µy = 2, σx = σy = 1 and ρ = 0.7.

Figure 2.4: Bootstrap distribution of the estimators defined in Eq. 2.11 (left) and Eq.

2.12 (right). [Sixty bootstrap samples of size 100 with µx = µy = 2, σx = σy = 1 and ρ = 0.7.]
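The comparison behind Figure 2.4 can be sketched in a few lines. The following is a minimal illustration, not the code used for the thesis figure: it assumes the samples are simulated directly from the stated bivariate normal, uses only the Python standard library, and the helper names (`draw_pair`, `ratio_estimators`) and the seed are hypothetical choices.

```python
import math
import random
import statistics

def draw_pair(rng, mu_x=2.0, mu_y=2.0, sd_x=1.0, sd_y=1.0, rho=0.7):
    """Draw one (x, y) from a bivariate normal built from two independent N(0, 1)."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x = mu_x + sd_x * z1
    y = mu_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x, y

def ratio_estimators(sample):
    """Return (R_A, R_W): average of ratios (Eq. 2.11) and ratio of averages (Eq. 2.12)."""
    r_a = sum(y / x for x, y in sample) / len(sample)
    r_w = sum(y for _, y in sample) / sum(x for x, _ in sample)
    return r_a, r_w

rng = random.Random(2014)  # arbitrary seed, chosen only for reproducibility
estimates = [
    ratio_estimators([draw_pair(rng) for _ in range(100)])  # one sample of size 100
    for _ in range(60)                                      # sixty samples
]
sd_r_a = statistics.pstdev([r_a for r_a, _ in estimates])
sd_r_w = statistics.pstdev([r_w for _, r_w in estimates])
print(f"sd(R_A) = {sd_r_a:.3f}, sd(R_W) = {sd_r_w:.3f}")
```

With CVx = 0.5, some xi fall close to zero in almost every run, so the spread of R_A across samples is expected to be much larger than that of R_W.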


Notice that if (xi, yi), i = 1 . . . n are i.i.d. according to BV N(µ,Σ), Eq. 2.1, RA

and RW are the sum of ratios of jointly normal variables and the ratio of jointly normal

variables, respectively. By Marsaglia (1965, 2006), the expected values and the vari-

ances of RA and RW do not exist and no direct comparison of their bias and efficiency

can be made. Although theoretically neither the expected value nor the variance of either estimator exists, for practical purposes RW is suggested as less sensitive to values xi → 0, which also makes it a less variable estimator than RA (see Figure 2.4 and Qiao et al. (2006)).

The performance of RA and RW in estimating E(Y )/E(X) when X and Y are independent and normally distributed is studied in Qiao et al. (2006). Their numerical experiments showed that RA and RW can be used when $CV_X = \sigma_x/\mu_x < 0.2$ and $CV_{\bar{X}} = (\sigma_x/\sqrt{n})/\mu_x < 0.2$, respectively. The advantage of using RW is that $CV_{\bar{X}} = (\sigma_x/\sqrt{n})/\mu_x$ depends on the sample size (n). Thus, by choosing $n > 25\sigma_x^2/\mu_x^2$, $CV_{\bar{X}}$ can always be made less than 0.2 (Qiao et al., 2006). Numerical experiments similar to the ones in Qiao et al. (2006) could be carried out when ρ ≠ 0 but, to the best of our knowledge, this has not been done yet.
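The sample-size bound follows in one line from the condition on the coefficient of variation of the sample mean. The worked step below assumes µx > 0, and the numerical values (µx = 2, σx = 1, as in Figure 2.4) are used purely as an illustration:

```latex
\[
\frac{\sigma_x/\sqrt{n}}{\mu_x} < 0.2
\;\Longleftrightarrow\;
\sqrt{n} > \frac{5\,\sigma_x}{\mu_x}
\;\Longleftrightarrow\;
n > \frac{25\,\sigma_x^2}{\mu_x^2},
\qquad \text{e.g. } \mu_x = 2,\ \sigma_x = 1 \;\Rightarrow\; n > 6.25 .
\]
```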

The estimation of ratios is also of interest in sample surveys. If it is reasonable

to assume a linear relationship (Y = RX) between Y , the variable of interest, and

X, an auxiliary variable, we can estimate the mean or the total of Y with greater

precision by taking advantage of the correlation between both variables. The use of

ratio estimators in survey problems is discussed in Chapter 6 of Cochran (1977). In

particular, we refer to p. 175 for a discussion of the bias correction of RA when used to

estimate E(Y )/E(X) and E(X) is known. However, the estimation of ratios in sample

surveys is quite different to the objective of this thesis where we are interested in the

estimation of the ratio per se rather than in Y .


2.4.2 Confidence sets of the ratio of expected values

In this section I review two equivalent procedures for calculating the confidence sets

of E(Y )/E(X). The first procedure was derived by Fieller (1954) and is the most

common among statisticians (Jarret 2012, pers. comm., December). The second one

was derived in Von Luxburg & Franz (2009) and provides a geometrical insight to

Fieller’s rule.

2.4.2.1 Fieller’s Theorem

Theorem 5. Let (X, Y ) ∼ BV N(µ,Σ), Eq. 2.1. The exact confidence set (S) of

α = E(Y )/E(X) has three possible configurations: 1) completely unbounded confidence

sets, 2) exclusive unbounded confidence sets and 3) bounded intervals (Fieller, 1954).

Proof. Here we develop the proof given in Casella & Berger (2002, p.464) in detail.

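Let the paired variables (Xi, Yi), i = 1 . . . n, be independent and identically distributed (i.i.d.) according to BV N(µ,Σ), Eq. 2.1, and let $\hat\mu_x$ and $\hat\mu_y$ denote the sample means. Let us define $\hat\theta = \hat\mu_y - \alpha\hat\mu_x$. It is easy to verify that $\hat\theta$ is normally distributed with:

$$E(\hat\theta) = E(\hat\mu_y) - \alpha E(\hat\mu_x) = \mu_y - \frac{\mu_y}{\mu_x}\mu_x = 0$$

$$Var(\hat\theta) = Var(\hat\mu_y) + \alpha^2 Var(\hat\mu_x) - 2\alpha\, Cov(\hat\mu_y, \hat\mu_x) = (\sigma_y^2 + \alpha^2\sigma_x^2 - 2\alpha\sigma_{xy})/n$$

where Cov denotes the covariance.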

By applying the two well-known results given by Eq. 2.13 and 2.14 (see Wackerly et al.,

1996, p.294-297), we get Eq.2.15.

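$$\frac{(n-1)\,\widehat{Var}(\hat\theta)}{Var(\hat\theta)} \sim \chi^2_{n-1} \tag{2.13}$$

$$\frac{N(0,1)}{\sqrt{\chi^2_{n-1}/(n-1)}} \sim T_{n-1} \tag{2.14}$$

$$\frac{\hat\theta/\sqrt{Var(\hat\theta)}}{\sqrt{[(n-1)\,\widehat{Var}(\hat\theta)/Var(\hat\theta)]/(n-1)}} = \frac{\hat\theta}{\sqrt{\widehat{Var}(\hat\theta)}} = \frac{\hat\mu_y - \alpha\hat\mu_x}{\sqrt{(\hat\sigma_y^2 + \alpha^2\hat\sigma_x^2 - 2\alpha\hat\sigma_{xy})/n}} \sim T_{n-1} \tag{2.15}$$

where hats denote sample estimates, and $\chi^2_{n-1}$ and $T_{n-1}$ refer to a chi-square distribution and a Student's t-distribution with n − 1 degrees of freedom, respectively.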

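By Eq. 2.15, the exact confidence set of α at confidence level 1 − γ consists of the values of α which satisfy Eq. 2.16

$$\frac{(\hat\mu_y - \alpha\hat\mu_x)^2}{(\hat\sigma_y^2 + \alpha^2\hat\sigma_x^2 - 2\alpha\hat\sigma_{xy})/n} \le t^2_{n-1,\,1-\gamma/2} \tag{2.16}$$

where $t_{n-1,\,1-\gamma/2}$ is the 1 − γ/2 quantile of $T_{n-1}$. For convenience, we refer to $t^2_{n-1,\,1-\gamma/2}$ as $t^2$ further in the text. Inequality 2.16 can be rewritten as:

$$\hat\mu_y^2 + \alpha^2\hat\mu_x^2 - 2\alpha\hat\mu_x\hat\mu_y - t^2(\hat\sigma_y^2 + \alpha^2\hat\sigma_x^2 - 2\alpha\hat\sigma_{xy})/n \le 0$$

$$(\hat\mu_x^2 - t^2\hat\sigma_x^2/n)\,\alpha^2 + (-2\hat\mu_x\hat\mu_y + 2t^2\hat\sigma_{xy}/n)\,\alpha + (\hat\mu_y^2 - t^2\hat\sigma_y^2/n) \le 0 \tag{2.17}$$

Assuming $\hat\mu_x^2 - t^2\hat\sigma_x^2/n \ne 0$, the left-hand side of Eq. 2.17 is a parabola ($a\alpha^2 + b\alpha + c$) with respect to α where:

$$a = \hat\mu_x^2 - t^2\hat\sigma_x^2/n, \qquad b = 2(-\hat\mu_x\hat\mu_y + t^2\hat\sigma_{xy}/n), \qquad c = \hat\mu_y^2 - t^2\hat\sigma_y^2/n \tag{2.18}$$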

The solution of inequality 2.16 has four scenarios, one of which (number 3 in the

list below) is not feasible for the values a, b and c in Eq. 2.18:

• a < 0

1. If $b^2 - 4ac < 0$, we obtain a completely unbounded interval, $S = \mathbb{R}$.

2. If $b^2 - 4ac \ge 0$, we get an exclusive unbounded interval, $S = \,]-\infty, q_1] \cup [q_2, +\infty[$.

• a > 0

3. If $b^2 - 4ac < 0$, there is no real solution.

4. If $b^2 - 4ac \ge 0$, we obtain a bounded interval, $S = [q_1, q_2]$.


Figure 2.5: Feasible cases of Fieller’s confidence set of the ratio of expected values

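where:

$$q_{1,2} = \frac{(\hat\mu_x\hat\mu_y - t^2\hat\sigma_{xy}/n) \mp \sqrt{(-\hat\mu_x\hat\mu_y + t^2\hat\sigma_{xy}/n)^2 - (\hat\mu_y^2 - t^2\hat\sigma_y^2/n)(\hat\mu_x^2 - t^2\hat\sigma_x^2/n)}}{\hat\mu_x^2 - t^2\hat\sigma_x^2/n}$$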

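It is straightforward to see that:

$$a > 0 \equiv \frac{\hat\mu_x^2}{\hat\sigma_x^2/n} > t^2$$

$$\begin{aligned}
b^2 - 4ac \ge 0 &\equiv (-\hat\mu_x\hat\mu_y + t^2\hat\sigma_{xy}/n)^2 - (\hat\mu_y^2 - t^2\hat\sigma_y^2/n)(\hat\mu_x^2 - t^2\hat\sigma_x^2/n) \ge 0 \\
&\equiv t^4(\hat\sigma_{xy}^2/n^2 - \hat\sigma_x^2\hat\sigma_y^2/n^2) + t^2(-2\hat\mu_x\hat\mu_y\hat\sigma_{xy}/n + \hat\mu_y^2\hat\sigma_x^2/n + \hat\mu_x^2\hat\sigma_y^2/n) \ge 0 \\
&\equiv t^2(\hat\sigma_{xy}^2/n^2 - \hat\sigma_x^2\hat\sigma_y^2/n^2) + (-2\hat\mu_x\hat\mu_y\hat\sigma_{xy}/n + \hat\mu_y^2\hat\sigma_x^2/n + \hat\mu_x^2\hat\sigma_y^2/n) \ge 0 \\
&\equiv \frac{-2\hat\mu_x\hat\mu_y\hat\sigma_{xy}/n + \hat\mu_y^2\hat\sigma_x^2/n + \hat\mu_x^2\hat\sigma_y^2/n}{\hat\sigma_x^2\hat\sigma_y^2/n^2 - \hat\sigma_{xy}^2/n^2} = n\left[\frac{-2\hat\mu_x\hat\mu_y\hat\sigma_{xy} + \hat\mu_y^2\hat\sigma_x^2 + \hat\mu_x^2\hat\sigma_y^2}{\hat\sigma_x^2\hat\sigma_y^2 - \hat\sigma_{xy}^2}\right] \ge t^2
\end{aligned}$$

Let us denote:

$$q_{exclusive} = \frac{\hat\mu_x^2}{\hat\sigma_x^2/n}, \qquad q_{complete} = n\left[\frac{-2\hat\mu_x\hat\mu_y\hat\sigma_{xy} + \hat\mu_y^2\hat\sigma_x^2 + \hat\mu_x^2\hat\sigma_y^2}{\hat\sigma_x^2\hat\sigma_y^2 - \hat\sigma_{xy}^2}\right] \tag{2.19}$$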

The conditions presented in terms of a, b and c can be rewritten in terms of $q_{complete}$ and $q_{exclusive}$ (Table 2.1).

Table 2.1: Conditions for the four scenarios of Fieller’s confidence sets

Conditions | Fieller’s solutions

$a < 0$ and $b^2 - 4ac < 0$ ≡ $q_{exclusive} < t^2$ and $q_{complete} < t^2$ | $S = \mathbb{R}$

$a < 0$ and $b^2 - 4ac \ge 0$ ≡ $q_{complete} \ge t^2 > q_{exclusive}$ | $S = \,]-\infty, q_1] \cup [q_2, +\infty[$

$a > 0$ and $b^2 - 4ac < 0$ ≡ $q_{exclusive} > t^2 > q_{complete}$ | Imaginary case (non-feasible)

$a > 0$ and $b^2 - 4ac \ge 0$ ≡ $q_{complete} \ge t^2$ and $q_{exclusive} > t^2$ | $S = [q_1, q_2]$

∗The values of a, b, c and $q_{complete}$, $q_{exclusive}$ are defined in Eq. 2.18 and Eq. 2.19

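Now we verify that the imaginary case is not feasible for the defined values of a, b and c or, equivalently, for the values of $q_{complete}$ and $q_{exclusive}$. The proof is by contradiction. The imaginary case implies that $q_{exclusive} > q_{complete}$ or, equivalently, $q_{exclusive} - q_{complete} > 0$. However,

$$\begin{aligned}
q_{exclusive} - q_{complete} &= \frac{\hat\mu_x^2}{\hat\sigma_x^2/n} - n\left[\frac{-2\hat\mu_x\hat\mu_y\hat\sigma_{xy} + \hat\mu_y^2\hat\sigma_x^2 + \hat\mu_x^2\hat\sigma_y^2}{\hat\sigma_x^2\hat\sigma_y^2 - \hat\sigma_{xy}^2}\right] \\
&= n\left[\frac{\hat\mu_x^2(\hat\sigma_x^2\hat\sigma_y^2 - \hat\sigma_{xy}^2) - (-2\hat\mu_x\hat\mu_y\hat\sigma_{xy} + \hat\mu_y^2\hat\sigma_x^2 + \hat\mu_x^2\hat\sigma_y^2)\hat\sigma_x^2}{\hat\sigma_x^2(\hat\sigma_x^2\hat\sigma_y^2 - \hat\sigma_{xy}^2)}\right] \\
&= n\left[\frac{\hat\mu_x^2\hat\sigma_x^2\hat\sigma_y^2 - \hat\mu_x^2\hat\sigma_{xy}^2 + 2\hat\mu_x\hat\mu_y\hat\sigma_{xy}\hat\sigma_x^2 - \hat\mu_y^2\hat\sigma_x^4 - \hat\mu_x^2\hat\sigma_x^2\hat\sigma_y^2}{\hat\sigma_x^2(\hat\sigma_x^2\hat\sigma_y^2 - \hat\sigma_{xy}^2)}\right] \\
&= n\left[\frac{-(\hat\mu_y\hat\sigma_x^2 - \hat\mu_x\hat\sigma_{xy})^2}{\hat\sigma_x^2(\hat\sigma_x^2\hat\sigma_y^2 - \hat\sigma_{xy}^2)}\right] \le 0
\end{aligned}$$

where the last inequality holds because the denominator is positive for a positive definite variance-covariance estimate. This contradicts $q_{exclusive} - q_{complete} > 0$.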

If a = 0, inequality 2.17 is of the form bα + c ≤ 0 with the solution given by:

$$S = \begin{cases} [-c/b, +\infty[ & \text{if } b < 0 \\ ]-\infty, -c/b] & \text{if } b > 0 \end{cases} \tag{2.20}$$

with b and c defined as in Eq. 2.18.

Finally, let us highlight that confidence sets of the ratio of expected values may be

unbounded. This can never be inferred when treating the ratio as a normal variable and

calculating the confidence intervals by the normal theory. Thus, such normal-theory

intervals have low coverage3 even when the denominator is significantly different from

zero and/or for large sample sizes (Franz, 2007).

2.4.2.2 Von Luxburg & Franz (2009) geometric solution

Recently, Von Luxburg & Franz (2009) have presented a geometric construction for

confidence sets of E(Y )/E(X) which coincide with the ones in Fieller (1954). The

intuitive idea in Von Luxburg & Franz (2009) is outlined as follows. Let (X, Y ) be BV N(µ,Σ), Eq. 2.1. Let us define $L_r = \{(x, y) \in \mathbb{R}^2 : y = rx\}$, the line of slope r which passes through the origin. If S = [l, u] is the confidence interval of E(Y )/E(X) at the confidence level 1 − γ, then we can construct a wedge (W ) given by the area of the plane enclosed by $L_l$ and $L_u$. The wedge comprises all the lines $L_r$ such that r ∈ [l, u] (see Figure 2.6, left). Conversely, we can construct S by intersecting W with the vertical line $\{(x, y) \in \mathbb{R}^2 : x = 1\}$ (see Figure 2.6, right). Thus, providing an appropriate W in $\mathbb{R}^2$ is equivalent to calculating the confidence set of E(Y )/E(X). Von Luxburg & Franz (2009) prove that such W is given by the area enclosed by the tangent lines to the ellipse (E) through the origin, where $E = \{z \in \mathbb{R}^2 : (z - \hat\mu)\hat\Sigma^{-1}(z - \hat\mu)^\top = t^2/n\}$, $\hat\mu$ and $\hat\Sigma$ are the sample estimates of µ and Σ, and t is the 1 − γ/2 quantile of $T_{n-1}$. The results in Von Luxburg & Franz (2009) are presented in Theorems 6 and 7.

Theorem 6. Let (X, Y ) be BV N(µ,Σ) distributed, Eq. 2.1. The exact confidence set

for the ratio α = E(Y )/E(X) can be constructed in the following steps.

1. Estimate µx, µy and Σ.

2. Take t, the 1 − γ/2 quantile of $T_{n-1}$.

3. Plot E.

4. Study the position of E:

• If (0, 0) ∉ E, construct W by taking the area enclosed by the tangents to E through the origin. Then, take $S = W \cap \{(x, y) \in \mathbb{R}^2 : x = 1\}$.

• If (0, 0) ∈ E, take $S = \mathbb{R}$.

Figure 2.6: Construction of a wedge given the confidence interval of the ratio of means (left) and vice versa (right). [The lower and upper confidence limits are denoted by l and u, respectively.]

3The probability that the interval contains the true parameter

The procedure presented above yields the three different cases displayed in Figure 2.7.

Proof. (Sketch) Let us define $\pi_a : \mathbb{R}^2 \to \mathbb{R}$, $(x, y) \mapsto a_1x + a_2y$, an orthogonal projection on the line $L_{a_2/a_1}$. The proof is in two steps. Firstly, it is demonstrated that $\pi_a(E)$ is a confidence interval of level 1 − γ for $\pi_a(\mu)$, ∀a. In particular, the projection of E on the line $L_{\alpha^\perp} = \{(x, y) \in \mathbb{R}^2 : y = -x/\alpha\}$ is a confidence interval for $\pi_{\alpha^\perp}(\mu)$. Secondly, it is proved that $\pi_{\alpha^\perp}(\mu) \in \pi_{\alpha^\perp}(E) \Leftrightarrow \alpha \in S$. Then, it immediately follows that $1 - \gamma = P(\pi_{\alpha^\perp}(\mu) \in \pi_{\alpha^\perp}(E)) = P(\alpha \in S)$. For more details, refer to Theorem 1 of Von Luxburg & Franz (2009).

Figure 2.7: Confidence sets of the ratio of expected values in Von Luxburg & Franz

(2009).

Notice that if n increases, the area of E decreases and then S shrinks. Furthermore,

if E intersects the y-axis in only one point, then the y-axis is one of the tangents of E

through the origin and S is given by the solution in Eq. 2.20.

Theorem 7. The confidence sets obtained by the geometric construction in Von Luxburg

& Franz (2009) coincide with the ones in Fieller (1954).

Proof. (Sketch) The proof is in two parts. In the first one, it is demonstrated that the

feasible cases of Fieller’s solutions (Table 2.1) are equivalent to the three cases derived

in Von Luxburg & Franz (2009). The equivalence for the completely unbounded interval

is demonstrated showing that:

$$(0, 0) \in E \Leftrightarrow (0 - \hat\mu)\hat\Sigma^{-1}(0 - \hat\mu)^\top \le t^2/n \Leftrightarrow q_{complete} \le t^2$$

The equivalences for the exclusive unbounded set and the bounded interval are demonstrated by projecting E onto the x-axis. In the exclusive unbounded case, the y-axis is intersected by E. Thus, 0 belongs to the projection of E onto the x-axis, given by $[\hat\mu_x - t\hat\sigma_x/\sqrt{n},\ \hat\mu_x + t\hat\sigma_x/\sqrt{n}]$. Then, the extremes of the latter interval have opposite signs and $(\hat\mu_x - t\hat\sigma_x/\sqrt{n})(\hat\mu_x + t\hat\sigma_x/\sqrt{n}) < 0 \Leftrightarrow q_{exclusive} < t^2$. The same idea is used for the bounded interval, in which case 0 is not contained in $[\hat\mu_x - t\hat\sigma_x/\sqrt{n},\ \hat\mu_x + t\hat\sigma_x/\sqrt{n}]$ and then $(\hat\mu_x - t\hat\sigma_x/\sqrt{n})(\hat\mu_x + t\hat\sigma_x/\sqrt{n}) > 0 \Leftrightarrow q_{exclusive} > t^2$. The second part of the proof shows that q1 and q2 are equal to the slopes of the tangent lines to E through the origin. This can be demonstrated by using Eq. 3 in Walter et al. (2008). For more details on the proof, refer to Von Luxburg & Franz (2004).

The study of the position of E is a quick procedure to determine whether X or Y

are significantly different from zero and then which of the three situations in Section

2.1 applies. The key to this idea is that $\pi_a(E)$ is a confidence interval of $\pi_a(\mu)$ ∀a, in particular for $a_1 = (1, 0)$ and $a_2 = (0, 1)$, which represent the projections onto the x-axis and y-axis, respectively. For instance, let us consider that E intersects the y-axis. By projecting E onto the x-axis, it is clear that $0 \in \pi_{a_1}(E) = [\hat\mu_x - t\hat\sigma_x/\sqrt{n},\ \hat\mu_x + t\hat\sigma_x/\sqrt{n}]$. Thus, µx is not significantly different from zero. The same can be applied to the numerator by projecting E onto the y-axis.

2.5 On the distributional properties of internal nitrogen use

efficiency in rice.

In this section I apply the previously described theory to a real data set. The data set comprises 24 plot observations of NU and GY from non-irrigated rice in northeast Thailand (Figure 2.8) from the experiments carried out by Naklang et al. (2006). All the plots received a fertiliser dose of 0 kg nitrogen (N) ha⁻¹, 21.8 kg phosphorus (P) ha⁻¹ and 41.5 kg potassium (K) ha⁻¹ and were wet after the flowering period of the rice. Using this data set, I provide examples of the pdf of IEN, the bootstrap distribution of estimators of E(GY )/E(NU) and confidence intervals of E(GY )/E(NU) for this sample.

1. The pdf of IEN

Figure 2.8: Grain yield versus nitrogen uptake from a sample of non-irrigated rice in northeast Thailand (Naklang et al., 2006)

Let us assume that the true parameters of the population of non-irrigated rice under the fertiliser and water conditions described above coincide with the sample mean and variance-covariance (Eq. 2.1) given by:

$$\mu = (49.47,\ 2135.42); \qquad \Sigma = \begin{pmatrix} 416.86 & 10986 \\ 10986 & 609382.4 \end{pmatrix} \tag{2.21}$$

For these values of the parameters, the pdf of IEN (Figure 2.9) is clearly skewed

to the right with tails heavier than the normal ones. The values of a and b (Eq.

2.5) are equal to 1.48 and 2.42, respectively. Thus, the distribution of IEN for

this population does not have a satisfactory normal approximation (Section 2.3).

2. Bootstrap distribution of estimators of E(GY )/E(NU).

The arithmetic mean of the observations of IEN (RA, Eq. 2.11) is usually em-

ployed to estimate E(GY )/E(NU). However, the bootstrap distribution of this

estimator presents much more variation than the distribution of the ratio of the

arithmetic means of GY and NU (RW , Eq. 2.12). This fact is illustrated in

Figure 2.10 which displays the bootstrap distribution of these estimators when

each bootstrap sample is generated from a BV N with parameters given in Eq.

2.21.

Figure 2.9: Probability density function of the ratio of two jointly normal variables with parameters given in Eq. 2.21

Figure 2.10: Bootstrap distribution of the estimators defined in Eq. 2.11 (left) and Eq. 2.12 (right) for the data in Fig. 2.8. [Sixty bootstrap samples of size 100 with parameters given in Eq. 2.21]

3. Confidence interval of E(GY )/E(NU). Fieller’s confidence set of E(GY )/E(NU) calculated from the sample in Figure 2.8 is a bounded interval equal to [37.93, 49.45]. However, even when Fieller’s confidence sets are bounded, they can differ substantially from the ones obtained when assuming that IEN has a normal distribution (in this case [39.93, 52.56]). Additionally, confidence intervals calculated according to the normal theory may have low coverage even when NU is far from zero (Franz, 2007).
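The classification of Fieller’s confidence set through the coefficients a, b and c (Eq. 2.18) can be sketched as below. This is a minimal illustration rather than the code used in the thesis: the function name `fieller_set` is hypothetical, the Student’s t quantile is hard-coded as 2.0687 (the 0.975 quantile with 23 degrees of freedom), and a 95% confidence level is an assumption, since the level is not stated alongside the reported interval.

```python
import math

def fieller_set(mu_x, mu_y, var_x, cov_xy, var_y, n, t):
    """Classify Fieller's confidence set for E(Y)/E(X) and return its limits.

    Inputs are the sample means, variances and covariance; t is the
    1 - gamma/2 quantile of the Student's t-distribution with n - 1 df.
    """
    t2 = t * t
    a = mu_x ** 2 - t2 * var_x / n                 # Eq. 2.18
    b = 2.0 * (-mu_x * mu_y + t2 * cov_xy / n)
    c = mu_y ** 2 - t2 * var_y / n
    if a == 0:
        raise ValueError("a = 0: use the linear case of Eq. 2.20")
    disc = b * b - 4.0 * a * c
    if disc < 0:
        # a < 0: the parabola is negative everywhere, so every alpha qualifies;
        # a > 0 with a negative discriminant is the non-feasible case of Table 2.1.
        return ("completely unbounded", None) if a < 0 else ("non-feasible", None)
    root = math.sqrt(disc)
    q1, q2 = sorted(((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)))
    return ("bounded" if a > 0 else "exclusive unbounded"), (q1, q2)

T_23 = 2.0687  # assumed: 0.975 quantile of Student's t with 23 degrees of freedom
kind, (q1, q2) = fieller_set(49.47, 2135.42, 416.86, 10986.0, 609382.4, n=24, t=T_23)
print(kind, round(q1, 2), round(q2, 2))
```

Under these assumptions, the values of Eq. 2.21 with n = 24 give a bounded set whose limits agree, to two decimals, with the interval reported above.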

2.6 Summary

Many of the issues presented in this chapter (the mixture nature of the ratio distribution, heavy-tailedness, non-existence of moments and uninformative unbounded confidence sets) are not considered in published agricultural research when analysing ratios. In particular, most of the studies in our literature search (Section 1.4) which analysed IEN computed it for each experimental plot (IENi = GYi/NUi, where the subindex i refers to observations on the i-th plot) and applied univariate linear models to the IENi, i = 1 . . . n, observations. Standard univariate linear models assume normality and homogeneity of variance for the error distribution. Of the studies selected, only four (Albrizio et al., 2010; Giambalvo et al., 2010; Liu et al., 2011; Tetard-Jones et al., 2013) reported having checked for departure from normality, and three (Albrizio et al., 2010; Giambalvo et al., 2010; Liu et al., 2011) for homogeneity of error variances.

Bivariate analyses avoid dealing with the distributional issues of the ratio and maintain the complete information on GY and NU and their joint behaviour, which sheds more light on the utilisation of NU for GY . As discussed in Chapter 1, I propose to use finite mixture models of bivariate Gaussian distributions as a complementary analysis to bivariate mixed models for the GY and NU field data collected across a range of environments. In the next chapter, I review the fundamentals of finite mixture models of multivariate Gaussian distributions with the objective of presenting the theoretical framework of the methodology of this thesis.

Chapter 3

Fundamentals of finite mixture models

3.1 Non-technical introduction

Finite mixture models are commonly used to classify observations from a heterogeneous

population into subpopulations1 (e.g. Crossa & Franco, 2004; Di Zio et al., 2005; Pear-

son, 1894). For a detailed explanation of the technique, refer to Fruhwirth-Schnatter

(2006); Lindsay (1995); McLachlan & Peel (2000); Titterington et al. (1985).

To classify observations into subpopulations, it is required to estimate a set of

mixture parameters, ψ (Eq. 3.1), such as the mean, variance-covariance and the pro-

portion of individuals for each subpopulation. The maximum likelihood method (see

Wackerly et al., 1996, p.398-400) has been the most popular approach to estimating

ψ (Fruhwirth-Schnatter, 2006, p.49). This method is based on maximising the likeli-

hood, or the log likelihood, function. The likelihood function is defined from the joint

probability density function when a realisation of the random sample is given. The

maximisation of the likelihood function will give us the values of the parameters which

maximise the chance of having observed this sample. However, in the case of mixtures,

the log likelihood function, L(ψ), can be complex and its global maximum cannot be

calculated explicitly by analytical mathematical methods (Redner & Walker, 1984).

Thus, the search for the global maximum is performed numerically via an algorithm called the Expectation-Maximisation (EM) algorithm (Dempster et al., 1977). The EM algorithm is an iterative procedure which is initiated with a starting estimate (ψ0) and then, at each iteration, searches for a value of ψ which increases the value of L(ψ) in comparison to the previous iteration. The algorithm stops when a stopping criterion is fulfilled, e.g. when the difference in the value of L(ψ) between two consecutive iterations is smaller than a chosen threshold (Ng, 2013).

1In this thesis, clusters, groups and subpopulations are used interchangeably.

Fitting mixture models with the EM algorithm involves dealing with some challenges. The difficulties are due to the nature of L(ψ) rather than to a failure of the algorithm (McLachlan & Peel, 2000, p.99). The main difficulties are listed as follows.

1. The likelihood function may be unbounded: thus, its global maximum does not

exist (Kiefer & Wolfowitz, 1956). The values of ψ for which L(ψ) is unbounded

are called singularities. The non-existence of the global maximum implies that a

local maximum needs to be chosen as the maximum likelihood estimate (MLE)

of ψ (Redner & Walker, 1984).

2. The likelihood function may have multiple local maxima (Seidel et al., 2000).

This causes sensitivity of the algorithm to starting and stopping strategies (Seidel

et al., 2000) and creates the dilemma of what local maximum to choose as the

MLE (Figure 3.1).

3. Some local maxima do not produce meaningful cluster partitions. For instance, when the clusters are non-realistic for biological reasons (see Section 3.8.3) or a cluster is fitted to a ‘small localised random pattern’ (McLachlan & Peel, 2000, p.99), for example to a few points which randomly lie on a line. Such local maxima are known as spuriosities (Figure 3.2).
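The first difficulty in the list above can be illustrated numerically. The sketch below is a toy univariate example, not part of the thesis analysis: the data values and parameter choices are hypothetical. One component is centred exactly on an observation and its standard deviation is shrunk towards zero, which makes the log likelihood grow without bound.

```python
import math

def normal_pdf(y, mu, sd):
    """Density of N(mu, sd^2) at y."""
    return math.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def log_likelihood(data, pis, mus, sds):
    """Log likelihood of a univariate Gaussian mixture (the form of Eq. 3.4)."""
    return sum(
        math.log(sum(p * normal_pdf(y, m, s) for p, m, s in zip(pis, mus, sds)))
        for y in data
    )

data = [0.0, 1.0, 2.0, 3.0]            # hypothetical toy sample
shrinking_sds = (1.0, 0.1, 0.01, 0.001)
values = [
    # Component 1 sits on the first observation; component 2 stays fixed.
    log_likelihood(data, pis=(0.5, 0.5), mus=(data[0], 1.5), sds=(sd_1, 1.0))
    for sd_1 in shrinking_sds
]
for sd_1, value in zip(shrinking_sds, values):
    print(f"sd_1 = {sd_1:<6} L = {value:.3f}")
```

Each tenfold reduction of the first standard deviation adds roughly ln(10) to the log likelihood, because the contribution of the observation on which the component is centred diverges while the other contributions stay bounded below by the second component.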

The most common procedure to deal with the issues mentioned above is to decide on the maximum number of components to fit, and then to perform the following steps for each number of components, decreasing until just one component is fitted (Fraley & Raftery, 1998; McLachlan & Peel, 2000; Melnykov & Maitra, 2010; Ng, 2013):

1. Initiate the EM algorithm from different starting points to widen the search for local maximisers.

2. Stop the EM algorithm when the change in L(ψ) between two consecutive iterations is less than a chosen threshold.

3. Check for and remove spuriosities and singularities.

4. Select the local maximum with the highest value of L(ψ) from the remaining solutions.

Having an optimal solution for each number of clusters, another important decision is

how to select the number of clusters. Several information criteria are commonly used

for this purpose (Melnykov & Maitra, 2010). These criteria favour the models which

fit the data well, while at the same time penalise for the model complexity.

A detailed explanation of finite mixture models is presented in Section 3.2 to Section

3.11 and references therein. In order to fully understand the rest of the chapter, the

reader needs to have a basic knowledge of probability theory and statistics (see Samuels

et al. (2012) for an introduction on the subject for biologists and Wackerly et al. (1996)

as a second and deeper reading on the topic) and be familiar with the mathematical

notations.


Figure 3.1: Solutions of three clusters obtained by the EM algorithm for the case study

(Chapter 4)

Figure 3.2: Spurious solution obtained by the EM algorithm when fitting 7 components

to the case study data (Chapter 4). [The green component corresponds to few points which

randomly lie on a line and the dark blue and red ones correspond to groups with negative correlations

and therefore are biologically meaningless]


3.2 Common use of mixture models

Mixture models are used in a large number of fields, such as agriculture, medicine,

engineering, genetics, weather forecast and image segmentation and are applied for

modelling very different types of data (Titterington et al., 1985, Table 2.1.3). Mixture

models are used for two main purposes (Figueiredo & Jain, 2002; Fruhwirth-Schnatter,

2006, p.5-6):

1. For cluster analysis where the researcher has some scientific justification in as-

suming that the data come from different groups but there is little or no infor-

mation on their classification (e.g. Crossa & Franco, 2004; Di Zio et al., 2005).

Although there are other clustering techniques, finite mixture modelling is the only one which: 1) assumes a well characterised mathematical model and 2) allows for parameter estimation by the maximum likelihood method and hypothesis testing on the number of clusters (McLachlan et al., 1996).

2. For approximating distributional shapes whose analytical expression is unknown

(e.g. Marron & Wand, 1992; Xu et al., 2010). For instance, Figure 3.3 shows

the histogram of a univariate random variable whose distribution is bimodal and

asymmetric around the first mode. These types of non-standard distributions

can be modelled well using mixture models.

In this thesis, finite mixture models are used for clustering purposes to identify

groups in (NU,GY ) field data.


Figure 3.3: Bimodal distribution generated from a mixture of three univariate normal components. [The parameters employed to generate the sample were n = 100, π = (0.25, 0.25, 0.50), θ1 = (0, 1), θ2 = (3, 2) and θ3 = (7, 0.5) where θi = (µi, σi)]

3.3 Mathematical definition

Let $\mathbf{Y} = [\mathbf{Y}_1, \mathbf{Y}_2, \ldots, \mathbf{Y}_n]$ be a random sample of n p-dimensional random vectors $\mathbf{Y}_j$ (of dimension 1 × p), j = 1 . . . n. Let $\mathbf{y} = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n]$ be the observed values of the random sample $\mathbf{Y}$. Finite mixture models assume that the $\mathbf{Y}_j$ are i.i.d. according to the finite mixture density (f):

$$f(\mathbf{y}_j|\psi) = \sum_{i=1}^{g} \pi_i f_i(\mathbf{y}_j|\theta_i) \tag{3.1}$$

where:

g is the number of components.

$f_i$ is the i-th component density.

$\theta_i$ is the parameter vector of $f_i$.

$\pi = (\pi_1, \ldots, \pi_g)$ denotes the mixing proportions, $\pi_i \ge 0\ \forall i$ and $\sum_{i=1}^{g} \pi_i = 1$.

$\psi = (\pi_1, \ldots, \pi_{g-1}, \theta_1, \ldots, \theta_g, g)$ is the parameter vector of f.
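The definition in Eq. 3.1 can be made concrete with the univariate three-component mixture used for Figure 3.3. The sketch below is illustrative only: the function names are hypothetical, the seed is arbitrary, and only the Python standard library is used. It evaluates the mixture density, checks numerically that it integrates to one, and draws a sample by first choosing a component with probabilities π and then sampling from that component.

```python
import math
import random

PIS = (0.25, 0.25, 0.50)                          # mixing proportions
THETAS = ((0.0, 1.0), (3.0, 2.0), (7.0, 0.5))     # (mean, standard deviation)

def normal_pdf(y, mu, sd):
    """Density of N(mu, sd^2) at y."""
    return math.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(y, pis=PIS, thetas=THETAS):
    """Finite mixture density of Eq. 3.1 with univariate normal components."""
    return sum(p * normal_pdf(y, mu, sd) for p, (mu, sd) in zip(pis, thetas))

def sample_mixture(n, rng, pis=PIS, thetas=THETAS):
    """Draw n values: pick a component label, then draw from that component."""
    draws = []
    for _ in range(n):
        i = rng.choices(range(len(pis)), weights=pis)[0]
        mu, sd = thetas[i]
        draws.append(rng.gauss(mu, sd))
    return draws

# Riemann-sum check on [-20, 30] that the mixture density integrates to one.
STEP, N_STEPS = 0.01, 5000
area = sum(mixture_pdf(-20.0 + k * STEP) * STEP for k in range(N_STEPS))
sample = sample_mixture(100, random.Random(1))    # arbitrary seed
print(f"area = {area:.6f}, sample size = {len(sample)}")
```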

For the bivariate mixture analysis of (NU , GY ), it is assumed that every mixture component is a bivariate Gaussian with the pdf:

$$\phi(\mathbf{y}_j|\mu_i, \Sigma_i) = \frac{1}{2\pi|\Sigma_i|^{1/2}} \exp\left\{-(\mathbf{y}_j - \mu_i)\Sigma_i^{-1}(\mathbf{y}_j - \mu_i)^\top/2\right\}$$

where:

$\mu_i$ is the mean vector of the i-th component density.

$\Sigma_i$ is the variance-covariance matrix of the i-th component density.

3.4 Classifying data into groups: label random vectors and

posterior probabilities

Mixture models assume that the information which classifies Yj into its group is miss-

ing. Therefore, a label random variable Zj is introduced to classify Yj into g groups

(Dempster et al., 1977) . The random variable Zj models the probability of a random

draw to have one of g possible outcomes. As a consequence, Zj has a multinomial

random distribution (Mult) with parameters m = 1, where m is the number of draws,

and π = (π1, . . . , πg) the probability of each outcome.

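$$Z_j \sim Mult(1, \pi)$$

For convenience, $Z_j$ is considered as a g-dimensional vector $Z_j = [Z_{ij}]$, i = 1 . . . g and j = 1 . . . n, defined as follows.

$$Z_{ij} = \begin{cases} 1, & \text{if } \mathbf{Y}_j \text{ belongs to the } i\text{-th group } (G_i), \\ 0, & \text{otherwise.} \end{cases}$$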

In this thesis, it is assumed that each group is generated by one mixture compo-

nent. Under this assumption, πi is interpreted as the proportion of individuals in Gi

or the probability of Yj to belong to Gi, i = 1 . . . g (McLachlan & Peel, 2000, p.7).

The component densities are viewed as the pdf of $\mathbf{Y}_j$ conditional on the group of origin, $f_i(\mathbf{y}_j|\theta_i) = f(\mathbf{y}_j|Z_{ij} = 1, \theta_i)$ (McLachlan & Peel, 2000, p.7). Under this interpretation, it becomes easy to verify the mixture model density using the Total Probability Theorem:

$$f(\mathbf{y}_j|\psi) = \sum_{i=1}^{g} f(\mathbf{y}_j, z_{ij} = 1|\psi) = \sum_{i=1}^{g} f(\mathbf{y}_j|z_{ij} = 1, \theta_i)P(z_{ij} = 1) = \sum_{i=1}^{g} f_i(\mathbf{y}_j|\theta_i)\pi_i$$

The probability that Yj belongs to Gi, once yj has been observed, is called the

posterior probability and denoted by τij.

$$\tau_{ij} = Pr(z_{ij} = 1|\mathbf{y}_j) = \frac{f(\mathbf{y}_j|z_{ij} = 1)P(Z_{ij} = 1)}{f(\mathbf{y}_j)} = \frac{\pi_i f_i(\mathbf{y}_j|\theta_i)}{\sum_{h=1}^{g} \pi_h f_h(\mathbf{y}_j|\theta_h)} \tag{3.2}$$

The posterior probabilities are used to classify the data into groups by assigning $\mathbf{y}_j$ to $G_i$ if $\tau_{ij} = \max_{r=1 \ldots g} \tau_{rj}$ (McLachlan & Peel, 2000, p.31). The calculation of the posterior probabilities requires estimating the parameters of the mixture.
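The classification rule based on Eq. 3.2 can be sketched for a univariate mixture with known parameters. The example below is only an illustration: it reuses the parameters quoted for Figure 3.3 as if they were known, and the function names `posterior` and `classify` are hypothetical.

```python
import math

PIS = (0.25, 0.25, 0.50)                          # mixing proportions
THETAS = ((0.0, 1.0), (3.0, 2.0), (7.0, 0.5))     # (mean, standard deviation)

def normal_pdf(y, mu, sd):
    """Density of N(mu, sd^2) at y."""
    return math.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def posterior(y, pis=PIS, thetas=THETAS):
    """Posterior probabilities tau_i of Eq. 3.2 for a single observation y."""
    weighted = [p * normal_pdf(y, mu, sd) for p, (mu, sd) in zip(pis, thetas)]
    total = sum(weighted)
    return [w / total for w in weighted]

def classify(y, pis=PIS, thetas=THETAS):
    """Index (0-based) of the component with the largest posterior probability."""
    tau = posterior(y, pis, thetas)
    return max(range(len(tau)), key=tau.__getitem__)

for y in (0.0, 3.5, 7.0):
    print(y, [round(t, 3) for t in posterior(y)], "-> component", classify(y) + 1)
```

In practice the parameters are unknown and are replaced by their estimates, which is why the estimation step discussed next is needed before any observation can be classified.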

3.5 Maximum likelihood estimation of the mixture parame-

ters

Since the first publications on finite mixture models (Newcomb, 1886; Pearson, 1894)

several methods have been proposed for estimating the mixture parameters (see Everitt,

1996). The maximum likelihood method has been the most successful approach so far

(Fruhwirth-Schnatter, 2006, p.49). This method aims to find the global maximum of

the likelihood function (Eq. 3.3) or the log likelihood function2 (Eq. 3.4). The natural

way to proceed is to calculate the roots of the first derivative of L(ψ) and then check which one is the global maximum. However, in the case of mixture models, L(ψ) is usually quite complex and the roots of the log likelihood equations cannot be calculated explicitly (Redner & Walker, 1984).

$$l(\psi) = \prod_{j=1}^{n} \sum_{i=1}^{g} \pi_i f_i(\mathbf{y}_j; \theta_i) \tag{3.3}$$

$$L(\psi) = \sum_{j=1}^{n} \ln\left(\sum_{i=1}^{g} \pi_i f_i(\mathbf{y}_j; \theta_i)\right) \tag{3.4}$$

2Notice that the logarithm is a strictly monotone increasing function, so it preserves the maximisers of the likelihood function.

The estimation of the mixture parameters has become much easier due to the implementation of iterative algorithms and the development of fast computers. Special mention must be made of what is probably the most widespread algorithm: the Expectation-Maximisation algorithm, also known as the EM algorithm (Dempster et al., 1977).

3.6 The EM algorithm for the estimation of mixture param-

eters

The EM algorithm is an iterative procedure which searches for the global maximum of

L(ψ) in the framework of missing data. Although it is not clear who first developed

the EM algorithm, it has been generally attributed to Dempster et al. (1977), who

generalised the algorithm to an extensive number of applications and discerned for the

first time the two well-known Expectation and Maximisation steps (Meng & Van Dyk,

1997). For a brief review of the history of the EM algorithm, refer to Meng & Van Dyk

(1997); Redner & Walker (1984).

In the context of mixture models, the realisations of $Z_j$, j = 1 . . . n, are considered missing (Dempster et al., 1977). Before explaining the algorithm computations, let us introduce the necessary notation. Let $(Z_1, \mathbf{Y}_1, \ldots, Z_n, \mathbf{Y}_n)$ be the complete random sample and $(\mathbf{Y}_1, \ldots, \mathbf{Y}_n)$ the incomplete random sample, with their realisations denoted by $(z_1, \mathbf{y}_1, \ldots, z_n, \mathbf{y}_n)$ and $(\mathbf{y}_1, \ldots, \mathbf{y}_n)$, respectively. The log likelihood function of the complete random sample ($L_c$) is given by:

$$L_c(z, \mathbf{y}, \psi) = \sum_{j=1}^{n} \log f(z_j, \mathbf{y}_j|\psi)$$

The log likelihood function of the incomplete random sample (L) is given by:

$$L(\mathbf{y}, \psi) = \sum_{j=1}^{n} \log f(\mathbf{y}_j|\psi) \tag{3.5}$$

The EM algorithm approaches the problem of finding $\hat\psi$, a local maximiser3 of $L(\mathbf{y}, \psi)$, by applying a numerical procedure which aims to maximise $L_c(z, \mathbf{y}, \psi)$. However, $L_c(z, \mathbf{y}, \psi)$ cannot be maximised directly because it depends on the values of $z_j$, j = 1 . . . n, which are missing. Instead, Dempster et al. (1977) suggested maximising the expected value of $L_c(z, \mathbf{y}, \psi)$ given $\mathbf{y}$, a realisation of the incomplete random sample, and the current estimate of ψ. Therefore, each iteration of the EM algorithm has two steps.

Let $\psi^{(r)}$ be the estimate produced in the r-th iteration of the algorithm.

1. Compute $E(L_c(z, \mathbf{y}, \psi)|\psi^{(r)}, \mathbf{y})$, denoted as $Q(\psi|\psi^{(r)})$ in the literature (Expectation-step or E-step).

2. Find $\psi^{(r+1)}$ that maximises $Q(\psi|\psi^{(r)})$ (Maximisation-step or M-step).

Dempster et al. (1977) proved that, given an initial estimate $\psi^{(0)}$ and the values of the observed sample $\mathbf{y}$, the EM algorithm produces a sequence $\psi^{(0)}, \psi^{(1)}, \ldots, \psi^{(s)}$ with the property given in Theorem 8.

Theorem 8. The sequence $\{L(\psi^{(r)})\}_{r=1}^{s}$, where $\psi^{(r+1)}$ maximises $Q(\psi|\psi^{(r)})$, is non-decreasing.

Proof. Theorem 1 in Dempster et al. (1977). A more detailed proof can be found in

Borman (2004).

3In the case of mixture models L(y,ψ) may be unbounded, thus the global maximum does not

exist, see Section 3.17


The convergence properties of the EM algorithm are discussed in Wu (1983). Wu

(1983) stated that if Q(ψ|η) is continuous in ψ and η, then the algorithm converges

to a local/global maximum or a saddle point.
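The monotonicity stated in Theorem 8, together with a stopping rule based on the change in L(ψ), can be checked numerically on a small example. The sketch below is a minimal univariate EM under stated assumptions: the simulated data, starting values, tolerance and the function name `em_univariate` are all hypothetical, and no safeguard against singularities is included.

```python
import math
import random

def normal_pdf(y, mu, sd):
    """Density of N(mu, sd^2) at y."""
    return math.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def em_univariate(data, pis, mus, sds, tol=1e-8, max_iter=200):
    """EM for a univariate Gaussian mixture; returns estimates and the L trace."""
    g, n = len(pis), len(data)
    pis, mus, sds = list(pis), list(mus), list(sds)
    trace = []
    for _ in range(max_iter):
        # E-step: posterior probabilities and log likelihood at the current estimate.
        tau, loglik = [], 0.0
        for y in data:
            w = [pis[i] * normal_pdf(y, mus[i], sds[i]) for i in range(g)]
            total = sum(w)
            loglik += math.log(total)
            tau.append([wi / total for wi in w])
        trace.append(loglik)
        if len(trace) > 1 and trace[-1] - trace[-2] < tol:
            break  # stopping rule: negligible change in the log likelihood
        # M-step: weighted proportions, means and standard deviations.
        for i in range(g):
            n_i = sum(t[i] for t in tau)
            pis[i] = n_i / n
            mus[i] = sum(t[i] * y for t, y in zip(tau, data)) / n_i
            sds[i] = math.sqrt(sum(t[i] * (y - mus[i]) ** 2 for t, y in zip(tau, data)) / n_i)
    return pis, mus, sds, trace

rng = random.Random(3)  # arbitrary seed
data = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(5.0, 1.0) for _ in range(100)]
pis, mus, sds, trace = em_univariate(data, (0.5, 0.5), (min(data), max(data)), (1.0, 1.0))
print(f"iterations = {len(trace)}, means = {[round(m, 2) for m in mus]}")
```

The recorded trace of log likelihood values should never decrease from one iteration to the next, up to floating-point error, which is what the theorem guarantees for exact M-steps.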

3.7 The EM algorithm for the estimation of parameters of

mixtures of multivariate Gaussian distributions

In this section I specify the EM steps for mixtures of multivariate Gaussian distri-

butions. A recommended reference addressing this topic is McLachlan & Peel (2000,

Section 3.2-3.3). For details on the calculations of the EM steps, refer to Appendix D.

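Let $\psi^{(r)}$ be the estimate at the r-th iteration of the algorithm.

1. E-step. The algorithm updates the posterior probabilities:

$$\tau_{ij}^{(r+1)} = \frac{\pi_i^{(r)} \phi(\mathbf{y}_j|\mu_i^{(r)}, \Sigma_i^{(r)})}{\sum_{h=1}^{g} \pi_h^{(r)} \phi(\mathbf{y}_j|\mu_h^{(r)}, \Sigma_h^{(r)})} \tag{3.6}$$

2. M-step. The algorithm updates the estimates of the mixture parameters:

$$\pi_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \Big/ n \tag{3.7}$$

$$\mu_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \mathbf{y}_j \Big/ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \tag{3.8}$$

$$\Sigma_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} (\mathbf{y}_j - \mu_i^{(r+1)})^\top (\mathbf{y}_j - \mu_i^{(r+1)}) \Big/ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \tag{3.9}$$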

3.7.1 Illustration of the EM algorithm on simulated data

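Let us consider a sample of 5 observations from a mixture of two bivariate normals with parameters:

$$\begin{aligned}
\pi_1 &= 0.5, & \mu_1^\top &= \begin{pmatrix} 5 \\ 10 \end{pmatrix}, & \Sigma_1 &= \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \\
\pi_2 &= 0.5, & \mu_2^\top &= \begin{pmatrix} 20 \\ 20 \end{pmatrix}, & \Sigma_2 &= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\end{aligned} \tag{3.10}$$

The data sample is displayed below in matrix M and plotted in Figure 3.4.

$$M = \begin{pmatrix} 2.92 & 8.47 \\ 19.22 & 20.48 \\ 19.67 & 20.21 \\ 5.38 & 9.16 \\ 19.12 & 19.61 \end{pmatrix}$$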

Figure 3.4: Scatter plot of a sample from a mixture of two bivariate normal with

parameters given in Eq. 3.10. [The points in different clusters are plotted in different colours].

For the purpose of this illustration, the EM algorithm is initiated with the actual

value of the parameters ψ(0) = ψ. The algorithm repeats the same operations in each

iteration so I only present the calculations in the first iteration.

1. E-STEP (first iteration):

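Let us calculate $\tau_{11}^{(1)}$:

\[ \tau_{11}^{(1)} = \frac{0.5\,\phi((2.92, 8.47)|\mu_1,\Sigma_1)}{0.5\,\phi((2.92, 8.47)|\mu_1,\Sigma_1) + 0.5\,\phi((2.92, 8.47)|\mu_2,\Sigma_2)} \approx 1 \]

The remaining $\tau_{ij}^{(1)}$, i = 1, 2 and j = 1, ..., 5, are calculated similarly and presented, to the displayed precision, in the matrix $\tau^{(1)}$. The j-th row of $\tau^{(1)}$ contains the posterior probabilities that the observation $y_j$ belongs to the first cluster, $\tau_{1j}^{(1)}$, or to the second one, $\tau_{2j}^{(1)}$.

\[ \tau^{(1)} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \]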

2. M-STEP (first iteration):

Let us compute $\pi_1^{(1)}$, $\mu_1^{(1)}$ and $\Sigma_1^{(1)}$ by using Eqs. 3.7, 3.8 and 3.9.

\[ \pi_1^{(1)} = \frac{2}{5} = 0.4 \]

\[ \mu_1^{(1)} = \frac{1\,(2.92, 8.47) + 1\,(5.38, 9.16)}{2} = (4.15, 8.82) \]

\[ \Sigma_1^{(1)} = \frac{[(2.92, 8.47) - (4.15, 8.82)]^\top [(2.92, 8.47) - (4.15, 8.82)]}{2} + \frac{[(5.38, 9.16) - (4.15, 8.82)]^\top [(5.38, 9.16) - (4.15, 8.82)]}{2} = \begin{pmatrix} 1.51 & 0.42 \\ 0.42 & 0.12 \end{pmatrix} \]

The calculations of $\pi_2^{(1)}$, $\mu_2^{(1)}$ and $\Sigma_2^{(1)}$ are performed using the same procedure.
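The first iteration above can be reproduced with a few lines of code. The following is a minimal sketch in Python using only the standard library; it is not the R code used in this thesis, the function names `bvn_pdf`, `e_step` and `m_step` are illustrative, and the closed-form 2x2 inverse restricts it to the bivariate case.

```python
import math

def bvn_pdf(y, mu, S):
    """Bivariate normal density; S is a 2x2 covariance given as nested tuples."""
    (a, b), (c, d) = S
    det = a * d - b * c
    dx, dy = y[0] - mu[0], y[1] - mu[1]
    # (y - mu) S^{-1} (y - mu)^T using the closed-form inverse of a 2x2 matrix
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def e_step(data, pis, mus, covs):
    """Eq. 3.6: tau[j][i] is the posterior probability that y_j belongs to component i."""
    tau = []
    for y in data:
        w = [p * bvn_pdf(y, m, S) for p, m, S in zip(pis, mus, covs)]
        total = sum(w)
        tau.append([wi / total for wi in w])
    return tau

def m_step(data, tau):
    """Eqs. 3.7 to 3.9: update mixing proportions, means and covariance matrices."""
    n, g = len(data), len(tau[0])
    pis, mus, covs = [], [], []
    for i in range(g):
        t = [tau[j][i] for j in range(n)]
        ti = sum(t)
        mu = tuple(sum(t[j] * data[j][k] for j in range(n)) / ti for k in range(2))
        dev = [(data[j][0] - mu[0], data[j][1] - mu[1]) for j in range(n)]
        cov = tuple(
            tuple(sum(t[j] * dev[j][r] * dev[j][c] for j in range(n)) / ti
                  for c in range(2))
            for r in range(2)
        )
        pis.append(ti / n)
        mus.append(mu)
        covs.append(cov)
    return pis, mus, covs

# Data matrix M and starting values psi^(0) = psi from Eq. 3.10
M = [(2.92, 8.47), (19.22, 20.48), (19.67, 20.21), (5.38, 9.16), (19.12, 19.61)]
pis0 = [0.5, 0.5]
mus0 = [(5.0, 10.0), (20.0, 20.0)]
covs0 = [((1.0, 1.0), (1.0, 2.0)), ((1.0, 0.0), (0.0, 1.0))]

tau1 = e_step(M, pis0, mus0, covs0)
pis1, mus1, covs1 = m_step(M, tau1)
```

Rounded to two decimals, `pis1[0]`, `mus1[0]` and `covs1[0]` should agree with the hand calculation. Note that the updated covariance matrix of the first component is built from only two points and is therefore numerically singular, which anticipates the difficulties discussed in Section 3.8.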

3.8 Difficulties in selecting the MLE of mixture models of

Gaussian distributions with heteroscedastic components

An important problem when fitting mixture models of Gaussian distributions with

unrestricted covariance matrices is how to select the MLE of the mixture parameters

(Everitt et al., 2011; Figueiredo & Jain, 2002; Melnykov, 2013; Seo & Kim, 2012). The

difficulties encountered are not due to a failure of the EM algorithm but to the char-

acteristics of L(ψ) for this type of mixture (McLachlan & Peel, 2000, p.99). The log

likelihood function for mixtures of Gaussian distributions with heteroscedastic compo-

nents presents:


1. Unboundedness, which leads to the non-existence of the global maximiser. The points where the log likelihood function is unbounded are called singularities. In this case, a local maximiser is chosen as a substitute for the MLE (Redner & Walker, 1984).

2. Multiple maximisers, which creates the dilemma of which local maximum to

choose as the MLE and produces sensitivity to the initial values of the parameters

and stopping rules (Seidel et al., 2000).

3. Spuriosities, local maxima which do not correspond to a feasible cluster partition (McLachlan & Peel, 2000, p.99) and which need to be identified by inspection.

3.8.1 Unboundedness of the likelihood function

Kiefer & Wolfowitz (1956) were the first to report the unboundedness of L(ψ).

They used the following mixture of univariate Gaussian distributions to illustrate this

property:

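\[ \begin{aligned} \pi_1 &= 0.5, & \pi_2 &= 0.5 \\ \mu_1 &= \mu, & \mu_2 &= \mu, & &\mu \text{ unknown} \\ \sigma_1 &= 1, & \sigma_2 &= \sigma, & &\sigma \text{ unknown} \end{aligned} \]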

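The log likelihood function for this mixture is given by:

\[ L(\psi) = \sum_{j=1}^{n} \log\left[ \frac{1}{2(2\pi)^{1/2}} \exp\left\{-\frac{(y_j - \mu)^2}{2}\right\} + \frac{1}{2(2\pi)^{1/2}\sigma} \exp\left\{-\frac{(y_j - \mu)^2}{2\sigma^2}\right\} \right] \]

If $y_j = \mu$ for some j and $\sigma \to 0$, then $\exp\{-(y_j - \mu)^2/(2\sigma^2)\} = 1$ and $1/(2(2\pi)^{1/2}\sigma) \to \infty$, so that L(ψ) is unbounded.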
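The unboundedness can be seen numerically. The sketch below, in Python with the standard library only, evaluates the log likelihood of the Kiefer & Wolfowitz mixture on a small sample while σ shrinks and µ is fixed at the first observation. The sample `y` and the grid of σ values are hypothetical and chosen only for illustration.

```python
import math

def log_lik(y, mu, sigma):
    """Log likelihood of 0.5*N(mu, 1) + 0.5*N(mu, sigma^2) for the sample y."""
    c = 1.0 / (2.0 * math.sqrt(2.0 * math.pi))
    total = 0.0
    for yj in y:
        d2 = (yj - mu) ** 2
        # The first component keeps every term strictly positive, so the log is defined
        total += math.log(c * math.exp(-d2 / 2.0)
                          + (c / sigma) * math.exp(-d2 / (2.0 * sigma ** 2)))
    return total

y = [1.3, 2.1, 2.8, 3.6, 4.4]      # hypothetical sample
mu = y[0]                          # place the common mean on one observation
sigmas = (1.0, 1e-2, 1e-4, 1e-8)
values = [log_lik(y, mu, s) for s in sigmas]
```

For this sample the entries of `values` grow as σ decreases, because the term of the observation that coincides with µ grows like log(1/σ) while the terms of the other observations remain bounded below by the first component.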

Similarly, in the case of multivariate Gaussian distributions, the unboundedness of

L(ψ) (Eq. 3.11) is produced when yj = µi and |Σi| → 0.

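\[ L(\psi) = \sum_{j=1}^{n} \log \sum_{i=1}^{g} \pi_i \frac{1}{(\sqrt{2\pi})^{p}|\Sigma_i|^{1/2}} \exp\left\{-(y_j - \mu_i)\Sigma_i^{-1}(y_j - \mu_i)^\top/2\right\} \tag{3.11} \]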


The values of the parameters at which L(ψ) becomes unbounded are called singularities. The unboundedness of L(ψ), and therefore the non-existence of the MLE, may seem puzzling at first. Nonetheless, according to Redner & Walker (1984) and McLachlan & Peel (2000, p.43), it is sufficient to find a local maximum of L(ψ) which satisfies certain properties such as efficiency⁴ and consistency⁵.

3.8.2 Multiple local maxima

The fact that L(ψ) may present multiple local maxima creates the dilemma of which

one to choose as the MLE (Redner & Walker, 1984). Furthermore, even if we widen

the search for local maximisers (see Section 3.9.1), we cannot be sure that all have been

found (McLachlan & Peel, 2000, p.44). The presence of multiple maximisers results

in sensitivity of the EM algorithm to the choice of the initial values of the parameters

(seeds) and stopping rules. This fact was shown by Seidel et al. (2000), who aimed to

approximate the distribution of the likelihood ratio test statistic, λ, by bootstrapping

(see Section 3.11). Seidel et al. (2000) wanted to test if a sample came from a single

exponential distribution with parameter θ or from a mixture of two with a parameter

vector P. The bootstrap distribution of −2 log λ = 2 logL(P)− 2 logL(θ) depends on

P, which is obtained by the EM algorithm. Seidel et al. (2000) showed on examples

that by using different starting values and stopping rules for this algorithm the values

of specific quantiles of −2 log λ can vary substantially between runs. The sensitivity to

the starting positions is also illustrated in Figure 3.1, which shows how two different

seeds for the EM algorithm can lead to very different cluster partitions of the same data.

⁴ An estimator $\hat\theta$ is said to be efficient if it is unbiased ($E(\hat\theta) = \theta$) and $Var(\hat\theta)$ is minimum (Wackerly et al., 1996, p.373).

⁵ An estimator $\hat\theta_n$ is said to be consistent if it gets closer to the parameter that we want to estimate as the sample size n increases. Formally, $P(|\hat\theta_n - \theta| < \varepsilon) \to 1$ as $n \to \infty$ (Wackerly et al., 1996, p.374).


3.8.3 Spuriosities

There may be some local maxima which do not correspond to feasible cluster parti-

tions. These solutions are called spuriosities. A solution is considered spurious if it

is biologically meaningless, denoted here as a biological spuriosity, or if a component

has been fitted to very few data points and the determinant of its covariance matrix

is small but not zero (McLachlan & Peel, 2000, p.99), denoted here as a mathematical

spuriosity.

Biological spuriosities in nitrogen efficiency

Several biological constraints need to be considered in order to identify unfeasible clus-

ter partitions. Firstly, the values of GY and NU are always positive and finite within

a range which varies depending on the environmental conditions and the cultivar used.

This information can be supplied by biologists. For example, for the case-study in

Chapter 4, the values of GY and NU are within a range of 0 to 5000 kg/ha and 0 to

120 kg/ha, respectively. Notice that values close to zero, although possible, will indi-

cate biological outliers, e.g. the crops in the plots do not perform well due to flooding.

Secondly, the correlation of GY and NU is expected to be positive or zero. This is due

to the fact that N is a major limiting factor for the grain production (Xu et al., 2012).

Therefore, an increment in NU is expected to increase GY , resulting in a positive

correlation between both variables. However, once a plant has accumulated enough

N, GY becomes unresponsive to an increment in NU and no correlation between GY

and NU is observed (see Figure 1.4, left). Situations in which the amount of N in plants is excessive, arguably because of an oversupply of N fertilisers, so that it adversely affects the synthesis of proteins as well as the growth pattern and health of the plants and thereby reduces grain production (Goyal & Huffaker, 1984, p.111), are not common in sustainable agriculture. Thus, this scenario is not considered in this work. Due to these biological constraints, cluster partitions which imply negative values of GY and NU or negative correlations between these two traits are considered spurious. Examples are the dark blue and red components in Figure 3.2.


In the case of applying the mixture approach to the univariate analysis of IEN ,

biological spuriosities are solutions which allow negative values of the ratio.

Mathematical spuriosities

A mathematical spuriosity is a local maximum of L(ψ) (Eq. 3.5) for which one cluster

has been allocated to a ‘small and localised random pattern’ (McLachlan & Peel, 2000,

p.99). This cluster contains few data points and the determinant of the covariance matrix of its corresponding component is small but not zero (McLachlan & Peel, 2000, p.99). Accordingly, a procedure to detect spurious local maxima is to inspect the mixing proportion and the eigenvalues of the covariance matrix of each component (McLachlan & Peel, 2000, p.103). Notice that if one of the eigenvalues of the covariance matrix tends to zero, $|\Sigma_i|$ is expected to be small. Recall that if $\Sigma_i$ is a real symmetric matrix, we can factorise $\Sigma_i$ as follows:

\[ \Sigma_i = Q_i D_i Q_i^\top \tag{3.12} \]

where $Q_i$ is an orthogonal matrix whose columns are the eigenvectors of $\Sigma_i$, and $D_i$ is a diagonal matrix with the eigenvalues, denoted as $(\lambda_{i1}, \ldots, \lambda_{ip})$. Thus, $|\Sigma_i| = |Q_i||D_i||Q_i^\top|$. As $Q_i$ is an orthogonal matrix, its determinant is equal to $\pm 1$, so that $|Q_i||Q_i^\top| = 1$. Therefore, $|\Sigma_i| = \lambda_{i1} \cdots \lambda_{ip}$.
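The inspection of mixing proportions and eigenvalues can be sketched as follows for the bivariate case, in Python with the standard library only. The function names and the thresholds `min_prop` and `min_eig` are assumptions made for this illustration; the source does not prescribe numerical cut-offs, and in practice they would be chosen with the data and the biology in mind.

```python
import math

def eigenvalues_2x2(S):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix."""
    (a, b), (_, d) = S
    half_trace = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return half_trace + radius, half_trace - radius

def looks_spurious(pi_i, S, min_prop=0.05, min_eig=1e-3):
    """Flag a component with a tiny mixing proportion or a near-zero eigenvalue.
    Both thresholds are arbitrary illustrations, not recommended values."""
    lam_max, lam_min = eigenvalues_2x2(S)
    return pi_i < min_prop or lam_min < min_eig

S = ((1.0, 1.0), (1.0, 2.0))                 # Sigma_1 of Eq. 3.10
lam = eigenvalues_2x2(S)
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]  # equals the product of the eigenvalues
flat = ((1.0, 0.999999), (0.999999, 1.0))    # hypothetical, almost degenerate component
```

Here `lam[0] * lam[1]` matches `det`, in line with the factorisation in Eq. 3.12, and `looks_spurious(0.5, flat)` flags the nearly degenerate component while `looks_spurious(0.5, S)` does not.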

3.8.3.1 Visual methods to detect spuriosities

For bivariate Gaussian distributions, spuriosities can also be detected visually by plotting the prediction ellipses⁶ which depict the means and variance-covariance matrices of the mixture components (Eq. 3.13) (McLachlan & Peel, 2000, p.103). This visual method is in accordance with those suggested by Friendly (2006) for multivariate analysis of variance. For instance, a mathematical spuriosity corresponds to a cluster whose prediction ellipse contains few observations and has one or two axes of small length.

\[ (y - \mu_i)\Sigma_i^{-1}(y - \mu_i)^\top = \frac{(n-1)p}{n-p}\,\frac{n+1}{n}\,F(1-\gamma, p, n-p) \tag{3.13} \]

⁶ The region given in Eq. 3.13 has 100(1 − γ)% probability of containing a new observation of the sample, conditional on its belonging to the i-th group (see Chew, 1966). In the bivariate case p = 2.


where $F(1-\gamma, p, n-p)$ is the $1-\gamma$ quantile of the F distribution with p and n − p degrees of freedom. Spuriosities can be detected from the length of the axes because the lengths of the ellipse axes are proportional to the square roots of the eigenvalues of $\Sigma_i$. A comprehensive explanation of this fact can be found in Rencher (1998), and I summarise it here. Considering that $\Sigma_i = Q_i D_i Q_i^\top$, so that $\Sigma_i^{-1} = Q_i D_i^{-1} Q_i^\top$, and substituting this factorisation in Eq. 3.13, it follows that:

\[ (y - \mu_i) Q_i D_i^{-1} Q_i^\top (y - \mu_i)^\top = \frac{(n-1)p}{n-p}\,\frac{n+1}{n}\,F(1-\gamma, p, n-p) \]

Taking $z_i = Q_i^\top (y - \mu_i)^\top$, we arrive at:

\[ z_i^\top D_i^{-1} z_i = \frac{(n-1)p}{n-p}\,\frac{n+1}{n}\,F(1-\gamma, p, n-p) \]

\[ \sum_{t=1}^{2} \frac{z_{it}^2}{\lambda_{it}} = \frac{(n-1)p}{n-p}\,\frac{n+1}{n}\,F(1-\gamma, p, n-p) \tag{3.14} \]

Eq. 3.14 is a canonical ellipse with semi-axes of length:

\[ \sqrt{\frac{(n-1)p}{n-p}\,\frac{n+1}{n}\,F(1-\gamma, p, n-p)\,\lambda_{it}}, \quad t = 1, 2. \tag{3.15} \]

The multiplication of a vector by an orthogonal matrix acts as a rotation (possibly combined with a reflection) of the vector. Thus, Eq. 3.13 is an ellipse centred at $\mu_i$, with the axes directions given by the eigenvectors of $\Sigma_i$ and the semi-axis lengths given by Eq. 3.15.
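As a rough sketch of how Eq. 3.15 could be evaluated for p = 2, the Python code below (standard library only) computes the semi-axis lengths and the orientation of the major axis. It relies on the fact that the F distribution with 2 numerator degrees of freedom has the closed-form quantile (m/2)((1 − q)^(−2/m) − 1), so no statistical library is needed. The function names, the value n = 50 and the reading of n as the sample size entering Eq. 3.13 are assumptions made for this illustration.

```python
import math

def f_quantile_2(q, m):
    """q-quantile of F(2, m); closed form because the numerator df equals 2."""
    return (m / 2.0) * ((1.0 - q) ** (-2.0 / m) - 1.0)

def ellipse_axes(S, n, gamma=0.05):
    """Semi-axis lengths (Eq. 3.15) and major-axis angle for a 2x2 covariance S."""
    p = 2
    # Right-hand side of Eq. 3.13
    c = (n - 1) * p / (n - p) * (n + 1) / n * f_quantile_2(1.0 - gamma, n - p)
    (a, b), (_, d) = S
    half_trace = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    lam = (half_trace + radius, half_trace - radius)   # eigenvalues of S
    angle = 0.5 * math.atan2(2.0 * b, a - d)           # direction of leading eigenvector
    return tuple(math.sqrt(c * l) for l in lam), angle

# Sigma_1 and Sigma_2 of Eq. 3.10 with a hypothetical n = 50
axes1, angle1 = ellipse_axes(((1.0, 1.0), (1.0, 2.0)), n=50)
axes2, angle2 = ellipse_axes(((1.0, 0.0), (0.0, 1.0)), n=50)
```

The ratio `axes1[0] / axes1[1]` equals the square root of the ratio of the eigenvalues, so a component with one eigenvalue close to zero would show up as a very thin ellipse, while `axes2` describes a circle.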

For mixture models of univariate Gaussian components, spuriosities can be detected

by plotting the pdf of each component. For instance, a mathematical spuriosity will

present a component with small variance (e.g. McLachlan & Peel, 2000, p.100).

3.9 Strategy to select the MLE of mixtures with heteroscedastic Gaussian components

Selecting the MLE for mixtures of unrestricted covariance matrices is a difficult task

due to the presence of singularities, multiple local maxima and spuriosities. Let us

consider a mixture of g components. The most common strategy for selecting the

MLE is to perform the following steps (Biernacki et al., 2006; McLachlan & Peel, 2000;

Melnykov & Maitra, 2010; Ng, 2013):


1. Initiate the EM algorithm with different starting strategies to identify as many

local maximisers as possible (see Section 3.9.1).

2. Stop the EM algorithm when a stopping criterion is fulfilled, e.g. the change in

L(ψ) between two consecutive iterations is less than a chosen threshold.

3. Check for and delete spuriosities and singularities.

Then, the MLE is taken to be the remaining solution which gives the highest value of L(ψ).
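The selection logic of the steps above can be summarised in a short sketch, written in Python with the standard library only. The candidate solutions, their summary fields (`loglik`, `min_prop`, `min_eig`) and the thresholds used to flag spurious solutions are all hypothetical; they stand in for the output of EM runs started from different positions.

```python
def has_converged(prev_loglik, curr_loglik, tol=1e-6):
    """Stopping rule of step 2: change in L(psi) between iterations below tol."""
    return abs(curr_loglik - prev_loglik) < tol

def select_mle(solutions, is_spurious):
    """Step 3 and the final choice: drop flagged solutions, keep the highest L(psi)."""
    kept = [s for s in solutions if not is_spurious(s)]
    return max(kept, key=lambda s: s["loglik"]) if kept else None

# Hypothetical end points of three EM runs (not results from the thesis)
runs = [
    {"start": "random", "loglik": -410.2, "min_prop": 0.30, "min_eig": 0.8},
    {"start": "k-means", "loglik": -395.7, "min_prop": 0.02, "min_eig": 1e-7},
    {"start": "short EM", "loglik": -402.9, "min_prop": 0.25, "min_eig": 0.5},
]

def flag(s):
    # Arbitrary illustrative thresholds for a mathematical spuriosity
    return s["min_prop"] < 0.05 or s["min_eig"] < 1e-3

best = select_mle(runs, flag)
```

In this made-up example the run with the highest log likelihood is discarded because it has a tiny component with a nearly singular covariance matrix, and the solution started from the short EM run is retained instead.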

The logic of this procedure is based on the theoretical results obtained by Hath-

away (1985), who constrained the parameter space to avoid singularities and spurious

solutions for mixtures of univariate normal distributions. The constrained parameter

space was:

\[ \Omega_{\varepsilon c} = \{\psi \in \Omega \mid \pi_i \ge \varepsilon,\ \sigma_i \ge c\,\sigma_k,\ 1 \le i \ne k \le g\}, \quad c > 0 \]

Hathaway (1985) demonstrated that the constrained global maximiser was consistent,

given that ψ belongs to Ωεc. Assuming that the same result applies for mixtures of

multivariate Gaussian distributions, the same strategy could be performed (McLachlan

& Peel, 2000, p.97). The constrained parameter space for the multivariate case was

suggested by Hathaway (1985):

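\[ \Omega_{\varepsilon c} = \{\psi \in \Omega \mid \pi_i \ge \varepsilon,\ \lambda_m(\Sigma_i\Sigma_k^{-1}) \ge c,\ 1 \le i \ne k \le g\}, \quad c > 0 \]

where $\lambda_m$ refers to the minimum eigenvalue of a matrix. However, satisfying these constraints involves a difficult optimisation problem.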

Ingrassia & Rocci (2007) suggested a simpler approach. They observed that:

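\[ \lambda_m(\Sigma_h\Sigma_j^{-1}) \ge \frac{\lambda_m(\Sigma_h)}{\lambda_s(\Sigma_j)} \]

where $\lambda_s$ refers to the maximum eigenvalue of a matrix.

Then, by imposing the constraints $a \le \lambda_i(\Sigma_j) \le b$, where $\lambda_i(\Sigma_j)$ is the i-th eigenvalue of the matrix $\Sigma_j$, and taking $a/b \ge c$:

\[ \lambda_m(\Sigma_h\Sigma_j^{-1}) \ge \frac{\lambda_m(\Sigma_h)}{\lambda_s(\Sigma_j)} \ge \frac{a}{b} \ge c \]

which defines the constrained parameter space $\Omega_{ab}$ (Ingrassia & Rocci, 2007):

\[ \Omega_{ab} = \{\psi \mid \pi_i \ge \varepsilon,\ a \le \lambda_k(\Sigma_i) \le b,\ k = 1, \ldots, p \text{ and } i = 1, \ldots, g\} \]

Belonging to this parameter space implies belonging to $\Omega_{\varepsilon c}$, but it results in a simpler optimisation problem.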
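The eigenvalue bound used by Ingrassia & Rocci (2007) and the membership check for the constrained space can be verified numerically in the bivariate case. The sketch below uses Python with the standard library only; the function names and the tuning values `eps`, `a` and `b` are assumptions chosen for illustration.

```python
import math

def eig_sym_2x2(S):
    """(minimum, maximum) eigenvalue of a symmetric 2x2 matrix."""
    (a, b), (_, d) = S
    h = (a + d) / 2.0
    r = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return h - r, h + r

def min_eig_of_ratio(Sh, Sj):
    """Smallest eigenvalue of Sh Sj^{-1} for 2x2 positive definite matrices."""
    (a, b), (_, d) = Sj
    det_j = a * d - b * b
    inv = ((d / det_j, -b / det_j), (-b / det_j, a / det_j))
    A = [[sum(Sh[r][k] * inv[k][c] for k in range(2)) for c in range(2)]
         for r in range(2)]
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    # The eigenvalues are real and positive, so the discriminant is non-negative
    return tr / 2.0 - math.sqrt(max(tr * tr / 4.0 - det, 0.0))

def in_omega_ab(pis, covs, eps, a, b):
    """Check pi_i >= eps and a <= every eigenvalue of every Sigma_i <= b."""
    return all(p >= eps for p in pis) and all(
        a <= eig_sym_2x2(S)[0] and eig_sym_2x2(S)[1] <= b for S in covs)

S1 = ((1.0, 1.0), (1.0, 2.0))      # Sigma_1 of Eq. 3.10
S3 = ((2.0, 0.5), (0.5, 1.0))      # hypothetical second covariance matrix
lhs = min_eig_of_ratio(S1, S3)
rhs = eig_sym_2x2(S1)[0] / eig_sym_2x2(S3)[1]
inside = in_omega_ab([0.5, 0.5], [S1, S3], eps=0.05, a=0.3, b=3.0)
```

For these matrices `lhs` is at least `rhs`, as the inequality states, and `inside` is true because all eigenvalues lie between the chosen a and b.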

The main difficulty is how to choose the tuning parameters c and ε (univariate case)

or a, b and ε (multivariate case) to ensure that ψ belongs to the constrained parameter

space (Hathaway, 1985). For this reason, some authors (e.g. McLachlan & Peel, 2000;

Melnykov, 2013) recommend not imposing constraints and deleting the singularities

and spuriosities post-hoc. I have also decided to delete spuriosities and singularities

rather than to restrict the domain of the parameters.

3.9.1 Starting strategies for the EM algorithm

Different strategies have been proposed for initiating the EM algorithm (e.g. Biernacki

et al., 2003; Karlis & Xekalaki, 2003; Maitra, 2009); a comprehensive review is given

in McLachlan & Peel (2000, p.54-55). However, none of them outperforms the others

for all the applications (Meila & Heckerman, 2001; Melnykov, 2013). The starting

strategies used in this current project are listed below.

1. Random starts. This strategy is based on constructing a random partition of the data. For each $y_j$, a random value, r, is generated from the set R = {1, ..., g}. Then, $y_j$ is assigned to $G_r$ by fixing $\tau_{rj} = 1$ and $\tau_{sj} = 0$ for s = 1, ..., g and $s \ne r$ (McLachlan & Peel, 2000, p.55).

2. Solution provided by the k-means algorithm. This procedure produces a partition of the data set based on minimising the distance between the observations and the means of the clusters. The objective function to minimise is $\sum_{i=1}^{g} \sum_{y_j \in G_i} d(y_j - \mu_i)$, where d denotes the Euclidean distance (Forgy, 1965, as cited in Omran et al., 2007). Then, $\tau_{ij} = 1$ if $y_j$ is allocated to $G_i$ by the k-means algorithm and $\tau_{sj} = 0$ for s = 1, ..., g and $s \ne i$ (McLachlan & Peel, 2000, p.54).


3. Simulated starts. The initial values of the joint means, $\mu_i^{(0)}$, i = 1, ..., g, are generated from a bivariate normal distribution with mean $\bar{y} = \sum_{j=1}^{n} y_j / n$ and covariance matrix $S = \sum_{j=1}^{n} (y_j - \bar{y})^\top (y_j - \bar{y}) / n$. The initial values of the mixing proportions and the covariance matrices of the component densities are given by $\pi_i^{(0)} = 1/g$ and $\Sigma_i^{(0)} = S$, respectively (McLachlan & Peel, 2000, p.55).

4. Subsample solution. The EM algorithm is run from random starts on a subsample of the data. Then, the solution provided by the EM algorithm after a few runs on the subsample is used to initiate the EM algorithm on the entire sample. The subsample size needs to be large enough to avoid degenerate solutions (McLachlan & Peel, 2000, p.55).

5. Short run of the EM algorithm. This strategy is based on running several

‘short runs’ of the EM algorithm when the latter has been initiated with random

starts. Then, the solution with the highest value of L(ψ) is used to initiate
a ‘long run’ of the EM algorithm (Biernacki et al., 2003, 2006). A ‘short run’

indicates that the threshold chosen to stop the EM algorithm is larger than that

for a ‘long run’.

Strategies 1 and 2 are available in the R (R Core Team, 2012) package EMMIX (McLachlan et al., 1999); strategies 3, 4 and 5 have been programmed by the author (the code can be found in Appendices E and F).
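To make strategies 1 and 3 concrete, the following is a small Python sketch using only the standard library. It is an illustration under stated assumptions and neither the EMMIX implementation nor the R code of the appendices: the function names are hypothetical, the bivariate normal draws use a hand-written Cholesky factor, and the data are the five observations of Section 3.7.1.

```python
import math
import random

def random_start(n, g, rng):
    """Strategy 1: hard random partition, tau[j][i] = 1 for exactly one i."""
    tau = []
    for _ in range(n):
        r = rng.randrange(g)
        tau.append([1.0 if i == r else 0.0 for i in range(g)])
    return tau

def simulated_start(data, g, rng):
    """Strategy 3: means drawn from N(ybar, S), equal proportions, covariances S."""
    n = len(data)
    ybar = [sum(y[k] for y in data) / n for k in range(2)]
    S = [[sum((y[r] - ybar[r]) * (y[c] - ybar[c]) for y in data) / n
          for c in range(2)] for r in range(2)]
    # Cholesky factor of S, used to turn independent N(0, 1) draws into N(ybar, S)
    l11 = math.sqrt(S[0][0])
    l21 = S[1][0] / l11
    l22 = math.sqrt(S[1][1] - l21 * l21)
    mus = []
    for _ in range(g):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        mus.append((ybar[0] + l11 * z1, ybar[1] + l21 * z1 + l22 * z2))
    return [1.0 / g] * g, mus, [S] * g

M = [(2.92, 8.47), (19.22, 20.48), (19.67, 20.21), (5.38, 9.16), (19.12, 19.61)]
rng = random.Random(1)                 # fixed seed so the sketch is repeatable
tau0 = random_start(len(M), 2, rng)
pis0, mus0, covs0 = simulated_start(M, 2, rng)
```

A random partition such as `tau0` would be passed to an M-step to obtain the first parameter estimates, whereas `pis0`, `mus0` and `covs0` can be passed directly to an E-step.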

3.10 Bayesian approach to estimating parameters of mixture

models of multivariate Gaussian distributions

In this section I review the Bayesian approach to estimating the parameters of mixture models of multivariate normal distributions. Refer to Lee (2012) and Pena (2002, Chapter 11) for an introduction to Bayesian statistics and Fruhwirth-Schnatter (2006) for a

detailed explanation of the Bayesian approach in the context of mixture models.


Bayesian statistics considers the parameters of a population to be random variables

for which a pdf, called the prior distribution, is specified. The prior distribution is

chosen depending on the previous knowledge of the researcher. Then, the researcher

updates his/her knowledge once the data are recorded. The combination of the researcher’s previous knowledge and his/her learning from the data is expressed by the conditional pdf of the parameters given the observed data, the so-called posterior distribution. The prior and posterior distributions are related by Bayes’ Theorem:

\[ p(\psi|Y) = \frac{f(Y|\psi)\,p(\psi)}{\int f(Y|\psi)\,p(\psi)\,d\psi} \propto l(\psi|Y)\,p(\psi) \tag{3.16} \]

where:

ψ is the parameter vector.

Y is the random sample.

p(ψ|Y) is the posterior pdf.

f(Y|ψ) is the joint pdf.

p(ψ) is the prior pdf.

l(ψ|Y) is the likelihood function.

Inference in Bayesian statistics is based on the posterior distribution (Pena, 2002, p.329). For instance, ψ is estimated by taking the expected value of p(ψ|Y), denoted as E(ψ|Y) (Pena, 2002, p.329). In the case of mixture models, E(ψ|Y) always exists in closed form, given that priors are proper⁷ and conjugate⁸ (Diebolt & Robert, 1994). However, calculating E(ψ|Y) requires expanding the likelihood function (Eq. 3.3), which involves $g^n$ operations corresponding to all possible allocations of the data into g clusters (Lee et al., 2008). This is computationally prohibitive for large sample sizes (Lee et al., 2008). Thus, ψ is estimated by calculating the mean of a random sample generated from the posterior distribution (Diebolt & Robert, 1994). The most common algorithm to generate the random sample is the Gibbs sampler (Lee et al., 2008). The fundamentals of this algorithm, its application and the issues associated with the estimation of ψ are briefly reviewed below.

⁷ A proper prior is a prior which integrates to one (Pena, 2002, p.330).

⁸ The prior distribution is called a conjugate prior if the posterior distribution belongs to the same parametric family as the prior (Pena, 2002, p.332).

3.10.1 The Gibbs sampler

The Gibbs sampler (Geman & Geman, 1984) is a Markov chain Monte Carlo (MCMC) method for generating a random sample from a joint pdf, e.g. $f_{xy}(x, y)$, by sampling from the conditional pdfs, $f_{x|y}(x|y)$ and $f_{y|x}(y|x)$. The steps of the Gibbs sampler in the bivariate case are detailed as follows (Robert & Casella, 2004, Chapter 9).

Let x(0) be an initial value of the random variable X. The following steps are

repeated for r = 1 . . . N , where r indicates the iteration number of the algorithm.

1. Generate y(r) from the distribution fy|x(·|x(r−1))

2. Generate x(r) from the distribution fx|y(·|y(r))

After n initial steps, with n large and 1 ≤ n ≤ N, $(x^{(n)}, y^{(n)})$ is approximately a random draw from the pdf $f_{xy}(x, y)$ (Diebolt & Robert, 1994).
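The two steps above can be illustrated on a target for which the conditional distributions are known. The sketch below, in Python with the standard library only, samples from a standard bivariate normal with correlation ρ, for which X given Y = y is N(ρy, 1 − ρ²) and Y given X = x is N(ρx, 1 − ρ²). The target distribution, the function name `gibbs_bvn`, the seed and the numbers of iterations are choices made for this illustration and are not taken from the cited sources.

```python
import math
import random

def gibbs_bvn(rho, n_iter, burn_in, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x = 0.0                              # initial value x^(0)
    draws = []
    for r in range(1, n_iter + 1):
        y = rng.gauss(rho * x, sd)       # step 1: y^(r) from f_{y|x}(. | x^(r-1))
        x = rng.gauss(rho * y, sd)       # step 2: x^(r) from f_{x|y}(. | y^(r))
        if r > burn_in:                  # discard the initial iterations
            draws.append((x, y))
    return draws

draws = gibbs_bvn(rho=0.5, n_iter=20000, burn_in=1000, seed=1)
n = len(draws)
mean_x = sum(x for x, _ in draws) / n
mean_y = sum(y for _, y in draws) / n
var_x = sum((x - mean_x) ** 2 for x, _ in draws) / n
var_y = sum((y - mean_y) ** 2 for _, y in draws) / n
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in draws) / n
corr = cov_xy / math.sqrt(var_x * var_y)
```

After the burn-in, the sample means should be close to zero and `corr` close to the chosen ρ, which is the sense in which the retained draws behave like a sample from the joint pdf. Consecutive draws are correlated, so the effective sample size is smaller than the number of retained iterations.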

3.10.2 The Gibbs sampler for a mixture of multivariate Gaussian distributions

The Gibbs sampler is used to generate $(z^{(1)}, \ldots, z^{(N)}, \psi_g^{(1)}, \ldots, \psi_g^{(N)})$, a sample of size N from the posterior distribution $p(Z, \psi_g|Y)$ (Pena, 2002, p.476). The subindex g is used to indicate that the number of mixture components is equal to g. This sample is obtained by generating $(z^{(1)}, \ldots, z^{(N)})$, a sample from $p(Z|\psi_g, Y)$, and $(\psi_g^{(1)}, \ldots, \psi_g^{(N)})$, a sample from $p(\psi_g|Z, Y)$ (Pena, 2002, p.476). After n initial iterations of the algorithm, n being sufficiently large and 1 ≤ n ≤ N, $(z^{(n)}, \ldots, z^{(N)}, \psi_g^{(n)}, \ldots, \psi_g^{(N)})$ is a random sample from $p(Z, \psi_g|Y)$ (Diebolt & Robert, 1994). In addition, $(\psi_g^{(n)}, \ldots, \psi_g^{(N)})$ is a random sample from $p(\psi_g|Y)$ (Robert & Casella, 2004, p.339). As pointed out by Diebolt & Robert (1994), $(\psi_g^{(n)}, \ldots, \psi_g^{(N)})$ can be used to approximate the posterior expected value, $E_p(\psi_g|Y)$, as follows:

\[ E_p(\psi_g|Y) \approx \frac{1}{N-n} \sum_{k=n+1}^{N} \psi_g^{(k)} \]

The computation of posterior pdfs requires specifying the priors for the mixture

parameters (see Eq. 3.16). Bensmail et al. (1997) suggested the use of conjugate

priors:

\[ \pi \sim D(\alpha_1, \ldots, \alpha_g) \]

\[ \mu_i \sim MVN(\xi_i, \Sigma_i/k_i) \]

\[ \Sigma_i^{-1} \sim W_p(m_i, C_i) \]

where D refers to the Dirichlet distribution (Eq. 3.17), MVN to the multivariate

normal and Wp to the Wishart distribution (Eq. 3.18).

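\[ f_D(x|\alpha_1, \ldots, \alpha_g) = c\,x_1^{\alpha_1 - 1} \cdots x_g^{\alpha_g - 1} \tag{3.17} \]

where:

\[ \sum_{i=1}^{g} x_i = 1 \quad \text{and} \quad c = \frac{\Gamma(\sum_{i=1}^{g} \alpha_i)}{\prod_{i=1}^{g} \Gamma(\alpha_i)} \]

\[ f_W(X|m, C) = \frac{|X|^{(m-p-1)/2}\,|C|^{m}\,\exp\{-\mathrm{tr}(CX)\}}{\Gamma_p(m)} \tag{3.18} \]

with:

\[ \Gamma_p(m) = \pi^{p(p-1)/4} \prod_{k=1}^{p} \Gamma\left(\frac{2m + 1 - k}{2}\right) \]

where Γ is the gamma function, p is the dimension of the matrix $X_{p \times p}$ and tr is the trace function.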

In particular, Bensmail et al. (1997) used αi = 1/g, ki = 1, mi = 5, ξi and Ci as

the mean and covariance matrix of the entire sample, respectively. Under these priors,

the Gibbs sampler for mixtures of multivariate normal distributions is set up as follows

(Bensmail, 1997; McLachlan & Peel, 2000, p.123).


Let z(0) be an initial partition of the data into g clusters. The following steps are

repeated for r iterations, r = 1 . . . N .

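1. Generate:

\[ \pi^{(r)} \sim D(\alpha_1 + n_1^{(r-1)}, \ldots, \alpha_g + n_g^{(r-1)}) \]

\[ \mu_i^{(r)} \sim MVN\left(\xi_i^{*(r-1)},\ \frac{1}{n_i^{(r-1)} + k_i}\,\Sigma_i^{(r-1)}\right) \]

\[ \Sigma_i^{-1(r)} \sim W_p\left(n_i^{(r-1)} + m_i,\ C_i^{*(r-1)}\right) \]

where:

\[ n_i = \sum_{j=1}^{n} z_{ij} \]

\[ \xi_i^{*} = \frac{n_i \bar{y}_i + k_i \xi_i}{n_i + k_i} \]

\[ C_i^{*} = \left\{ C_i^{-1} + n_i S_i + \frac{n_i k_i}{n_i + k_i} (\bar{y}_i - \xi_i)^\top (\bar{y}_i - \xi_i) \right\}^{-1} \]

\[ \bar{y}_i = \frac{1}{n_i} \sum_{j=1}^{n} z_{ij} y_j \]

\[ S_i = \frac{1}{n_i} \sum_{j=1}^{n} z_{ij} (y_j - \bar{y}_i)^\top (y_j - \bar{y}_i) \]

2. Generate:

\[ Z_j^{(r)} \sim Mult_g(1, \tau_j^{(r)}) \]

where $Mult_g$ is a multinomial distribution with 1 trial and g outcomes, and $\tau_j$ is the probability vector of the outcomes (Eq. 3.2).

3. Calculate $n_i^{(r)}$, $\bar{y}_i^{(r)}$ and $S_i^{(r)}$.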

The estimation of the parameters by the Gibbs sampler involves dealing with an

important pitfall, the so-called ‘label switching problem’ (Redner & Walker, 1984).


3.10.3 Label switching problem

A well-known mathematical feature of mixture models is their non-identifiability (Marin

et al., 2005; McLachlan & Peel, 2000; Redner & Walker, 1984). A parametric family F

is said to be identifiable, if given ψ1 and ψ2, two values of the parameter vector, and

f(y|ψ1), f(y|ψ2) ∈ F such that, f(y|ψ1) = f(y|ψ2); then, ψ1 = ψ2. The parametric

family of mixture models is non-identifiable because the mixture density is invariant to

a permutation on the parameters (Marin et al., 2005; McLachlan & Peel, 2000; Redner

& Walker, 1984). This type of non-identifiability was denoted by Redner & Walker

(1984) as the ‘label switching problem’. For instance, let us observe that:

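\[ \pi_1 f_1(y|\theta_1) + \pi_2 f_2(y|\theta_2) = \pi_2 f_2(y|\theta_2) + \pi_1 f_1(y|\theta_1), \quad \text{but} \quad (\pi_1, \theta_1, \pi_2, \theta_2) \ne (\pi_2, \theta_2, \pi_1, \theta_1) \]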
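The invariance can be checked numerically with a small sketch in Python (standard library only). The two-component univariate Gaussian mixture and its parameter values are hypothetical and serve only to show that two different parameter vectors give the same density.

```python
import math

def normal_pdf(y, mu, sd):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(y, psi):
    """psi is a list of (pi_i, mu_i, sd_i) triples, one per component."""
    return sum(p * normal_pdf(y, m, s) for p, m, s in psi)

psi = [(0.3, 0.0, 1.0), (0.7, 4.0, 2.0)]   # hypothetical parameter values
psi_perm = [psi[1], psi[0]]                # same components, labels swapped
grid = [-3.0 + 0.5 * k for k in range(21)]
max_gap = max(abs(mixture_pdf(y, psi) - mixture_pdf(y, psi_perm)) for y in grid)
```

The parameter lists `psi` and `psi_perm` differ, yet `max_gap` is zero up to floating-point error, so the data alone cannot distinguish the two labellings.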

The ‘label switching problem’ has serious implications for the estimation of ψ by

the Bayesian approach. The invariance of the mixture to permutations results in a posterior distribution with g! modes, which is thus difficult to explore with the Gibbs sampler or other Monte Carlo methods (Lee et al., 2008). Furthermore, if there is no previous

knowledge available for discerning the components of the mixture, it is common to

take a prior invariant to a permutation of the parameters (Stephens, 2000). However,

the latter practice results in posterior expected values of πi and θi equal for all the

components (Marin et al., 2005; Stephens, 2000). A comprehensive explanation of this

fact is given by Fruhwirth-Schnatter (2006, p.64), which I present here.

Firstly, let us observe that the likelihood function is invariant to a permutation of the parameters. In the simple case of two components:

\[ l(y|\psi) = \prod_{j=1}^{n} [\pi_1 f_1(y_j|\theta_1) + \pi_2 f_2(y_j|\theta_2)] = \prod_{j=1}^{n} [\pi_2 f_2(y_j|\theta_2) + \pi_1 f_1(y_j|\theta_1)] \]

Let us now consider a fixed number of components g and a permutation η of the set G = {1, ..., g}. If we define $\psi_\eta = (\pi_{\eta(1)}, \ldots, \pi_{\eta(g)}, \theta_{\eta(1)}, \ldots, \theta_{\eta(g)})$ and take a prior such that $p(\psi_\eta) = p(\psi)$, then it is easy to see that:

\[ p(\psi|y) = k\,l(y|\psi)\,p(\psi) = k\,l(y|\psi_\eta)\,p(\psi_\eta) = p(\psi_\eta|y) \tag{3.19} \]

where k is the normalising constant of the posterior distribution.


Furthermore, the marginal posterior distributions are also invariant to a permutation

on the parameters, p(θi|y) = p(θη(i)|y). This is demonstrated as follows.

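\[ p(\theta_i|y) = \int_{\Theta^{g-1} \times [0,1]^{g}} p(\psi|y)\, d(\theta_1, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_g, \pi_1, \ldots, \pi_g) \tag{3.20} \]

where Θ is the parameter space of $\theta_i$, for all i = 1, ..., g. Let us transform the previous integral by applying the permutation η. Taking into account that the absolute value of the determinant of the Jacobian matrix of a permutation is equal to one and that the parameter space does not change under a permutation:

\[ p(\theta_i|y) = \int_{\Theta^{g-1} \times [0,1]^{g}} p(\psi_\eta|y)\, d(\theta_{\eta(1)}, \ldots, \theta_{\eta(i-1)}, \theta_{\eta(i+1)}, \ldots, \theta_{\eta(g)}, \pi_{\eta(1)}, \ldots, \pi_{\eta(g)}) = p(\theta_{\eta(i)}|y) \tag{3.21} \]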

The fact that $p(\theta_i|y) = p(\theta_{\eta(i)}|y)$ implies that $E(\theta_i|y) = E(\theta_{\eta(i)}|y)$. The latter equality holds for all the permutations of the set G. Hence, the estimates of the component parameters are identical across components for any data set, which is an unsatisfactory result for the estimation problem (Marin et al., 2005).

Some authors imposed constraints on the parameter space to ensure the uniqueness

of labelling (e.g. Lenk & DeSarbo, 2000). However, defining valid constraints can be

a challenge when p > 1 (Fruhwirth-Schnatter, 2006, p.20). Furthermore, these con-

straints can interfere with the ability of the algorithm to explore the posterior surface

or with the algorithm convergence (Lee et al., 2008). The latest efforts have been fo-

cused on constructing relabelling algorithms which reorder the chain (ψ(n), . . . ,ψ(N))

‘a posteriori’ (e.g. Marin et al., 2005; Stephens, 2000).

Even if the approach of reordering the chain of estimates ‘a posteriori’ is able to solve the ‘label switching problem’, the Gibbs sampler, like other Monte Carlo methods, may become trapped in some local modes of the posterior distribution and then be unable to reproduce the posterior surface (Lee et al., 2008). In addition, the Bayesian approach is computationally more costly than the frequentist procedure because it requires a post-treatment of the Monte Carlo chain. Due to these difficulties, I have chosen to use the EM algorithm, which does not need to deal with the issue of exchangeable components (Jasra et al., 2005; McLachlan & Peel, 2000, p.27).

3.11 Selecting the number of mixture components

Selecting the number of components is a difficult statistical problem, which has been

broadly studied with no unequivocal method outperforming the others for all the ap-

plications (McLachlan & Peel, 2000, p.175). The main approaches are based on 1)

information criteria, and 2) testing procedures (Melnykov & Maitra, 2010). Both al-

ternatives are briefly described here and a full discussion can be found in McLachlan

& Peel (2000, Chapter 6).

3.11.1 Information Criteria

Including more components in the mixture increases the value of L(ψ) but leads to

more complex models (Figueiredo & Jain, 2002). Information criteria choose the model

which provides the highest value L(ψ) while penalising for the complexity of the model.

Therefore, all the information criteria have a generic expression:

−2L(ψ) + 2C (3.22)

where L(ψ) is the value of the log likelihood function at ψ and C is a complexity

penalty term.

There are many information criteria, for instance: Akaike’s Information Criterion (AIC) (Akaike, 1973, 1974); Bayesian Information Criterion (BIC) (Schwarz, 1978); Integrated Classification Likelihood Criterion (ICL) (Biernacki et al., 2000); Bayesian Information Criterion type approximation of the Integrated Classification Likelihood Criterion (ICL-BIC) (Biernacki et al., 2000); Normalized Entropy Criterion (NEC) (Celeux & Soromenho, 1996); Bootstrap-Based Information Criterion (EIC) (Ishiguro et al., 1997); Cross-Validation-Based Information Criterion (CVIC) (Smyth, 2000); Informational Complexity Criterion (ICOMP) (Bozdogan, 1990, 1993); Approximate Weight of Evidence (AWE) (Banfield & Raftery, 1993) and Minimum Message Length Criterion for mixtures (L) (Figueiredo & Jain, 2002).

Regardless of the information criterion used, the selection of the number of components is performed as follows (e.g. Fraley & Raftery, 1998; McLachlan & Peel, 2000, p. 219).

Decide on the maximum number of components to fit. Then, for each number of components g from this maximum down to one, perform the following steps:

• Choose ψg according to the procedure detailed in Section 3.9. The subindex g

indicates that the mixture has g components.

• Compute the information criterion chosen.

The number of components selected is the one which minimises the information crite-

rion chosen.

In this thesis, the information criteria used to select the number of components

were:

• AIC (Akaike, 1973, 1974):

AIC = −2L(ψ) + 2d

• BIC (Schwarz, 1978):

BIC = −2L(ψ) + d log n

where d is the number of parameters in the mixture and n is the sample size. Fonseca

& Cardoso (2007) showed that BIC had the best performance to select the number of

clusters for mixtures of multivariate Gaussian distributions in an experiment carried

out with 42 data sets with the EM algorithm.
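The computation of both criteria and the choice of the number of components can be sketched in Python (standard library only) as follows. The count of free parameters used here, (g − 1) + gp + gp(p + 1)/2, is the standard one for a g-component mixture of p-variate Gaussian distributions with unrestricted covariance matrices. The maximised log likelihood values in `logliks` and the sample size are hypothetical numbers, not results from this thesis.

```python
import math

def n_params(g, p):
    """Free parameters: (g - 1) proportions, g*p means, g*p*(p+1)/2 covariance terms."""
    return (g - 1) + g * p + g * p * (p + 1) // 2

def aic(loglik, d):
    return -2.0 * loglik + 2.0 * d

def bic(loglik, d, n):
    return -2.0 * loglik + d * math.log(n)

n, p = 100, 2
# Hypothetical maximised log likelihoods for g = 1, ..., 4 components
logliks = {1: -520.0, 2: -480.0, 3: -472.0, 4: -470.0}

aic_values = {g: aic(L, n_params(g, p)) for g, L in logliks.items()}
bic_values = {g: bic(L, n_params(g, p), n) for g, L in logliks.items()}
best_aic = min(aic_values, key=aic_values.get)
best_bic = min(bic_values, key=bic_values.get)
```

With these made-up values AIC selects three components and BIC selects two, which illustrates that the heavier penalty of BIC (d log n instead of 2d) favours more parsimonious mixtures.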

The main inconvenience of all the information criteria is that they do not provide a measure of confidence for choosing a particular number of components (McLachlan & Peel, 2000, p.184). Such a measure can be obtained by applying bootstrap techniques to the likelihood ratio test statistic (McLachlan, 1987).

3.11.2 Likelihood ratio test for selecting the number of clusters

The selection of the number of components can be performed using the likelihood ratio

test statistic (λ). The null hypothesis H0 : g = g0 is tested against the alternative one

HA : g = g0 + 1.

\[ \lambda = \frac{\sup_{\theta \in H_0} L(\psi)}{\sup_{\theta \in H_A} L(\psi)} = \frac{L(\hat\psi_{g_0})}{L(\hat\psi_{g_0+1})} \]

where, in this section, L denotes the likelihood rather than its logarithm. H0 is rejected when λ is small or, equivalently, when −2 log λ is large. The statistic −2 log λ usually has an asymptotic $\chi^2_d$ distribution, where d is the difference between the number of parameters under HA and H0 (Wilks, 1938). This result assumes that the parameter space of the null hypothesis (Θ0) is in the interior⁹ of the parameter space (Θ) and is an identifiable¹⁰ subset of it, which is not fulfilled for mixture models (Ghosh & Sen, 1985; McLachlan & Peel, 2000, p.185-186).

Let us consider a simple example given in Ghosh & Sen (1985) which I detail here:

Imagine we want to test:

H0 : f(y, θ0), with θ0 fixed, against

HA : πf1(y, θ1) + (1− π)f(y, θ2)

In this example, Θ and Θ0 are given by:

\[ \Theta = [0, 1] \times S_1 \times S_2 \]

where S1 and S2 are the parameter spaces of θ1 and θ2, respectively.

\[ \Theta_0 = (\{1\} \times \{\theta_0\} \times S_2) \cup (\{0\} \times S_1 \times \{\theta_0\}) \cup ([0, 1] \times \{\theta_0\} \times \{\theta_0\}) \]

⁹ Interior(S) = S − Boundary(S), S being a set.

¹⁰ Non-identifiable: the null hypothesis holds for two different values of the parameters belonging to the parameter space of the alternative hypothesis (McLachlan & Peel, 2000, p.185).


Note that Θ0 is on the boundary of Θ (Ghosh & Sen, 1985). Furthermore, if H0 is true,

f(y, θ0) is represented by three densities of the form πf1(y, θ1) + (1 − π)f(y, θ2): 1)

π = 1 and θ1 = θ0, 2) π = 0 and θ2 = θ0 and 3) π ∈]0, 1[ and θ1 = θ2 = θ0 (Ghosh

& Sen, 1985). Thus, Θ0 is in a non-identifiable subset of Θ (Ghosh & Sen, 1985).

The breakdown of the regularity conditions implies that the distribution of −2 log λ

is in general unknown. McLachlan (1987) proposed to use bootstrap techniques to

approximate this distribution as follows.

1. Apply the EM algorithm on the recorded data to find the MLE

ψg0 = (π1, . . . , πg0 , θ1, . . . , θg0) by the procedure in Section 3.9.

2. Generate B bootstrap samples of size n from the finite mixture model given by:

\[
y^b_1, \ldots, y^b_n \sim \sum_{i=1}^{g_0} \pi_i f_i(y \mid \theta_i), \qquad b = 1, \ldots, B
\]

3. Apply the EM algorithm to the bootstrap sample $y^b_1, \ldots, y^b_n$ with g0 and g1 = g0 + 1 components and calculate:

\[
-2\log\lambda_b = 2\log L(\psi_{g_1})_b - 2\log L(\psi_{g_0})_b, \qquad b = 1, \ldots, B.
\]

4. Order the values of −2 log λb from the smallest to the largest to obtain the order

statistics.

5. Calculate the value of the likelihood ratio test statistic for the original sample, $-2\log\lambda_s = 2\log L(\psi_{g_1}) - 2\log L(\psi_{g_0})$, and compare it with the k-th order statistic, $-2\log\lambda_{(k)}$, k = 1, . . . , B, to get a significance level of $\alpha = 1 - k/(B + 1)$.

6. The null hypothesis is rejected if $-2\log\lambda_s$ is larger than $-2\log\lambda_{(k)}$.
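Steps 4 to 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bootstrap_lrt_decision` and its arguments are hypothetical, and the B bootstrap values of −2 log λ are assumed to have been computed already by fitting the g0 and g0 + 1 component models to each bootstrap sample.

```python
def bootstrap_lrt_decision(observed, boot_stats, k):
    """Compare the observed -2 log(lambda) with the k-th bootstrap order statistic.

    observed   : -2 log(lambda) computed on the original sample
    boot_stats : the B values of -2 log(lambda_b) from the bootstrap samples
    k          : 1-based index of the order statistic used as critical value
    """
    B = len(boot_stats)
    if not 1 <= k <= B:
        raise ValueError("k must lie between 1 and B")
    ordered = sorted(boot_stats)      # step 4: order statistics
    critical = ordered[k - 1]         # k-th smallest bootstrap value
    alpha = 1.0 - k / (B + 1)         # step 5: approximate significance level
    reject = observed > critical      # step 6: decision rule
    return reject, critical, alpha
```

For example, with B = 99 bootstrap samples the choice k = 95 corresponds to a level of approximately 0.05.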

The main disadvantages of the bootstrap approach are: a) its computational cost, b) its dependence on the strategies used to handle spurious cluster partitions and to initiate and stop the algorithm, and c) the fact that it provides only an approximation of the true distribution of −2 log λ (McLachlan & Peel, 2000, p.193).


3.12 Summary

In this chapter I have reviewed the fundamentals of the methodology of finite mixture

models. Finite mixture models are used to model non-standard distributional shapes.

For instance, Xu et al. (2010) used finite mixture models of univariate Gaussian distri-

butions to model the pdf of abortion time from randomly selected dairy cows. Another

important application is the classification of observations into groups. A very early ex-

ample of this application can be found in Pearson (1894) who applied mixture models

to discover subspecies among a population of blue crabs.

The modelling of pdf shapes and the classification of observations into groups re-

quire estimating the mixture parameters. These estimates can be obtained by frequen-

tist or Bayesian procedures. Under the frequentist approach, the estimation of the

parameters is based on the maximum likelihood method and the application of the

EM algorithm. The log likelihood function is multimodal so the EM algorithm needs

to be initiated from different starting values to widen the search for local maximisers.

Then, the local maximum with the highest value of the log likelihood, after the dele-

tion of spuriosities and singularities, is chosen as the maximum likelihood estimate.

The Bayesian approach requires applying the Gibbs sampler algorithm to generate a

random sample from the posterior distribution of the parameters. Then, the estimates

of the mixture are obtained by taking the mean of the random sample. In this the-

sis, I use the frequentist approach because it is computationally less costly than the

Bayesian one and does not require dealing with the problem of interchange of mixture

components. The frequentist approach for fitting mixtures can be implemented with

the R packages mixtools (Benaglia et al., 2009) and Mclust (Fraley & Raftery, 1999)

(for mixtures of univariate Gaussian distributions) or EMMIX (McLachlan et al., 1999)

(for mixtures of multivariate Gaussian distribution).

After having introduced all the necessary details of the technique, in the next chap-

ter I apply the proposed methodology for clustering the (NU , GY ) field data collected


across environments in a real-life case study. The inspection of the mixture groups

could reveal environmental conditions affecting (NU , GY ). By fitting mixture models

one can estimate the expected value (mean) and the variance (degree of dispersion

around the mean) of each NU and GY as well as their correlation (degree of associa-

tion between both traits) for each group (environment). Furthermore, if the researcher

still wants to estimate IEN , bivariate mixture models allow one to do so by taking the

ratio of each of the estimated means and calculating confidence sets according to the

procedures detailed in Chapter 2. Chapter 4 is written as a manuscript according to

the guidelines of the Australian and New Zealand Journal of Statistics.


Chapter 4

Bivariate models for internal nitrogen use efficiency:

mixture models as an exploratory tool

This chapter is presented according to the requirements of submission to the Australian

and New Zealand Journal of Statistics.

Statement of authorship and author contributions

Title of Paper: Bivariate models for internal nitrogen use efficiency: mixture models as an exploratory tool

Publication status: In preparation for submission

By signing the Statement of Authorship, each author certifies that their stated

contribution to the publication is accurate and that permission is granted for the pub-

lication to be included in the candidate’s thesis.


Name of the Principal Author: Isabel Munoz-Santa

Contribution to the paper: Originated the idea of the analysis, developed the methodology, undertook critical review of the relevant literature, analysed and interpreted the case-study data, developed the computational code, wrote and edited the manuscript and will act as corresponding author

Signature and date:

Name of Co-Author: Petra Marschner

Contribution to the paper: Supervised the development of the work in its biological aspects, provided expertise for the interpretation of biological consequences of the analyses, contributed to the editing of the manuscript as appropriate

Signature and date:

Name of Co-Author: S.M. Haefele

Contribution to the paper: Supplied the data set, provided expertise for the interpretation of biological consequences of the analysis, contributed to the editing of the manuscript as appropriate

Signature and date:

Name of Co-Author: O. Kravchuk

Contribution to the paper: Trained the first author in interpreting statistical aspects as appropriate, contributed to the original discussion of the research ideas, contributed to the editing of the manuscript as appropriate

Signature and date:

Aust. N. Z. J. Stat. 2014 doi: 10.1111/j.1467-842X.XXX

Bivariate models for internal nitrogen use efficiency: mixture models as an exploratory tool

I. MUNOZ-SANTA1, P. MARSCHNER2, S.M. HAEFELE3, O. KRAVCHUK1

University of Adelaide, Australian Centre for Plant Functional Genomics

Summary

Internal nitrogen use efficiency (IEN) in cereals, defined as the ratio of grain yield (GY) to nitrogen uptake (NU), is an important trait in agronomy and plant and soil science research. In this study we discuss the application of bivariate mixed and mixture models to the analysis of GY and NU field data and compare them to the univariate mixed and mixture models for IEN. The bivariate analyses preserve the information on the GY and NU traits, and avoid dealing with the abnormalities issues of ratios. Bivariate mixture models are proposed as a classification tool for identifying field conditions affecting the utilisation of nitrogen. Due to the design constraints on the collection of data in agricultural field trials, the bivariate mixture technique is suggested as supplementary to bivariate mixed models. The mixture methodology is demonstrated on a case-study of rice research in northeast Thailand, for which the technique is proven useful for exploring environmental conditions, in particular, soil fertility and water availability. The bivariate mixture methodology may be applicable for other efficiency indices in agricultural research.

Key words: bivariate analysis; classification and discrimination; cluster analysis; EM algorithm; internal nitrogen use efficiency; mixed model; mixtures

1. Introduction

Nitrogen (N) is an elementary constituent of the nucleotides and proteins of cereals

(Xu et al. 2012), but cereal plant roots have available only a fraction of the N in soil. The

availability of N depends on complex interactions between soil, plant and environmental

1 The University of Adelaide, School of Agriculture Food and Wine. Waite Building (Waite Campus), Waite Rd, Urrbrae SA 5064, Australia

2 The University of Adelaide, School of Agriculture Food and Wine. Prescott Building (Waite Campus), Glen Osmond SA 5064, Australia

3 Australian Centre for Plant Functional Genomics (Waite Campus), Hartley Grove, Urrbrae, SA 5064, Australia

∗ Author for correspondence: Munoz-Santa, I., e-mail: [email protected], telephone: +61406815920, facsimile: +61(0)883137109

Acknowledgment. The authors acknowledge the Faculty of Science of the University of Adelaide for the Masters of Research scholarship for the first author and Paul Eckermann for his comments on the draft of the manuscript and help with ASReml-R.

© 2014 Australian Statistical Publishing Association Inc. Published by Wiley Publishing Asia Pty Ltd.

Prepared using anzsauth.cls [Version: 2014/01/06 v1.01]


factors (Marschner 2012, p. 315). Once N is absorbed by roots, it is transported to the rest

of the plant, and a portion of this N is stored in the grain (Xu et al. 2012). To quantify the

efficiency of the N utilisation by plants, several N efficiency indices have been defined. In

the context of growing demand for cereals worldwide and limited agricultural land, a better

understanding of N efficiency is a research priority among plant and soil scientists.

Among several N efficiency indices, this study is focused on internal nitrogen use

efficiency (IEN ). Internal nitrogen use efficiency is defined as the ratio of grain yield (GY )

to nitrogen uptake (NU ), i.e. the content of N in the aboveground biomass. In agricultural

research, this index expresses the ability of plants to utilise NU for grain production.

However, how NU is utilised for grain is a complex process, governed by environmental

factors, plant genetics and agronomic practices (Cassman et al. 2002).

In agricultural field trials, GY and NU are measured at plot level at harvest. At this

stage, a typical scatter of the GY and NU data for a particular cultivar grown under a range

of conditions exhibits an increasing monotone linear-plateau shape (e.g. Witt et al. 1999;

Naklang et al. 2006). However, this shape may result from an overlay of growth processes

under the different conditions rather than from a direct functional response of GY to NU .

As environmental conditions vary, the process of NU utilisation for GY changes, and

so does the degree of association (correlation) between these two traits. Since a fraction of

NU is utilised for seed formation (Xu et al. 2012), one would expect the correlation between

NU and GY to be positive or zero. A negative correlation may occur when an excess of N

in plants adversely affects the synthesis of proteins as well as plant health and growth pattern

(Hauck 1984, p. 111). However, an excess of N, arguably caused by an oversupply of N

fertiliser (Hauck 1984, p. 97), is not common in sustainable agriculture nowadays and will

not be considered in this work.

At harvest, conditional on major environmental factors and in the absence of a strong

competition among plants, NU and GY at plot level can be seen as a cumulative effect of

a large number of independent random variables with finite variances. By the Central Limit

Theorem (see Cramer 1946, p. 219), each NU and GY is thus expected to follow a normal

distribution. Expanding this result to the bivariate case (see Cramer 1946, p. 286), the joint

distribution of (NU , GY ) is expected to be bivariate normal. Then, the probability density

function of IEN is a mixture of two heavy-tailed distributions (Marsaglia 1965, 2006) and

the shape of the IEN distribution can vary from normal-like to skewed or bimodal depending

on the means and variances of NU , GY and their correlation (Marsaglia 1965, 2006). The

effects of these parameters on the distributional shape of the ratio cannot be easily untangled.

Despite their intrinsic abnormalities, ratios are commonly used among plant and

soil scientists. In published research in agriculture, IEN is commonly computed for each

experimental plot and analysed by univariate linear models (e.g. Peng et al. 1996; Delogu


et al. 1998; Fang et al. 2006; Naklang et al. 2006). However, univariate linear models on the ratio do not retain information on the original traits, including their correlation, which limits the interpretability of these models.

A more adequate approach is a bivariate analysis of (NU, GY). Bivariate analyses preserve the information on the original traits, thus giving more insight into the mechanism of NU utilisation for GY. Furthermore, bivariate analyses avoid the abnormality issues of the ratio. Among bivariate analyses, estimating and testing effects of treatments

can be done algebraically with multivariate analyses of variance or numerically through

residual maximum likelihood analyses. Evidence for the advantages of bivariate analyses

over the univariate analyses on a ratio is provided by a recent paper by Ganesalingam et al.

(2013) who analysed data on canola survival of blackleg disease in randomised complete

block experiments. Their bivariate linear mixed model better utilised the experimental data,

allowed greater flexibility in modelling spatial correlations and increased the accuracy of

variety survival predictions.

At present, IEN data are predominantly collected in designed field trials where

treatments are different fertiliser applications (e.g. Peng et al. 1996; Delogu et al. 1998; Fang

et al. 2006; Naklang et al. 2006). However, as outlined earlier, the utilisation of NU for GY

depends on nutrient availability, which can differ substantially from the amount of nutrients

applied. Availability of nutrients, including N, depends on complex processes governed by

many environmental factors such as local microbial activity, soil characteristics and water

availability (Marschner 2012, p. 315). In the field, these factors are beyond the investigator’s

control and may induce different levels of available nutrients even for plots receiving the

same fertiliser treatment.

The effect of environmental conditions may override, interact with or be confounded with the treatments, complicating the interpretation of treatment-based analyses. In this

study, we argue that such non-controlled conditions may lead to very different patterns

of NU utilisation for GY even for the same treatment and thus to groups in the data

different to treatment groups. Identifying such groups in data collected across different

environments may complement treatment-based analyses when the objective is to gain insight

into environmental drivers of NU and GY . Thus, it is important to investigate the benefits of

finite mixture models of bivariate Gaussian distributions as a clustering technique for (NU ,

GY ) field data.

Finite mixture models are a flexible statistical tool used in a large number of fields

to model data sampled from different groups or to approximate unusual distributional

shapes (Figueiredo & Jain 2002). In agriculture, finite mixture models have been used only

occasionally. For instance, univariate mixture models have been used to model the distribution

of abortion time in randomly selected dairy cows (Xu et al. 2010). Bivariate mixture models



have been employed to classify soybean genotypes with respect to seed yield and seed protein

in randomised complete block designs (McLachlan & Basford 1988). To the best of the

authors’ knowledge, however, finite mixture models of bivariate normal distributions have

not been used for analysing NU and GY field data.

The analysis of (NU, GY) data by bivariate mixture models has three advantages: 1) avoidance of the abnormalities of the IEN ratio, 2) identification of groups in the presence of strong environmental factors and 3) ability to consider changes in the correlation between NU and GY. Furthermore, if the researcher still wants to estimate the ratio within each group, bivariate mixture models allow this through taking the ratio of the estimated means of GY and NU. The confidence set of the ratio of expectations can also be derived by straightforward

calculations (Fieller 1954). However, inference with mixture models is a difficult task (Chen

& Tan 2009), and we thus use the technique here for exploratory purposes only.

This study aims to demonstrate the usefulness of the bivariate mixture methodology,

as a complementary analysis to treatment-based analyses in designed field trials when the

objective is to identify potential environmental factors driving GY and NU . We discuss

the benefits of the bivariate mixture and mixed analyses in comparison with the univariate

counterparts for IEN . The proposed methodology is applied to a particular designed field

experiment on non-irrigated rice reported in Naklang et al. (2006). In that study the design of

the field experiment was treatment-based. However, the objective of the study was to analyse

IEN across a range of environmental conditions without focusing exclusively on the actual

treatments. This was intended to contribute to a better understanding of non-irrigated systems

and to improve current management practices.

This paper is structured as follows. The fundamentals of finite mixture models

are first reviewed. The case study is then described and analysed by both mixed and mixture

models on IEN and (NU ,GY ). The final section discusses the advantages of using bivariate

models instead of the univariate models on IEN as well as the benefits of the mixture model

methodology for identifying environmental conditions affecting (NU , GY ).

2. Finite mixture models of bivariate Gaussian distributions

2.1. Definitions and notations

Let $Y_j^{(2 \times 1)}$ be a random vector, j = 1, . . . , n, where n is the sample size. Let $y_j$ be the observed value of the random vector $Y_j$. Mixture models of bivariate Gaussian distributions consider the observations to be independent and identically distributed (i.i.d.) according to the finite mixture density:

\[
f(y_j \mid \psi) = \sum_{i=1}^{g} \pi_i \, \phi(y_j \mid \mu_i, \Sigma_i) \tag{1}
\]

where g is the number of components, πi is the mixing proportion of the i-th component and

φ is the bivariate normal density function. The parameters of the component densities are

µi, the joint mean, and Σi, the covariance matrix. The parameter vector of the mixture is

ψ = (π1, . . . , πg,µ1, . . . ,µg,Σ1, . . . ,Σg).

Finite mixture models are commonly used for clustering data (Everitt 1996), in which

case components correspond to groups. Clustering is achieved by assigning each observation

to the group (G) with the maximum posterior probability:

\[
\tau_{ij} = P(y_j \in G_i \mid y_j) = \frac{\pi_i \, \phi(y_j \mid \mu_i, \Sigma_i)}{\sum_{h=1}^{g} \pi_h \, \phi(y_j \mid \mu_h, \Sigma_h)} \tag{2}
\]

Calculating the posterior probabilities requires estimating ψ. This is achieved via the Expectation-Maximization (EM) algorithm (Dempster et al. 1977), an iterative procedure for calculating maximum likelihood estimates in a framework of missing values. The fundamentals of this approach are briefly reviewed below.
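The assignment rule in Eq. 2 can be illustrated with a minimal Python sketch. It assumes that the weighted densities π_i φ(y_j | µ_i, Σ_i) for one observation have already been evaluated; the function names are hypothetical and serve only this illustration.

```python
def posterior_probabilities(weighted_densities):
    """Normalise the values pi_i * phi(y_j | mu_i, Sigma_i), i = 1..g,
    computed for a single observation y_j (Eq. 2)."""
    total = sum(weighted_densities)
    return [w / total for w in weighted_densities]


def assign_group(weighted_densities):
    """Return the 0-based index of the component with the largest
    posterior probability for this observation."""
    tau = posterior_probabilities(weighted_densities)
    return max(range(len(tau)), key=lambda i: tau[i])
```

For instance, weighted densities of 0.03 and 0.01 give posterior probabilities of 0.75 and 0.25, so the observation is allocated to the first group.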

2.2. The EM algorithm

The EM algorithm estimates the mixture parameters by interpreting the data as a missing data problem. The data are considered to be incomplete by introducing a g-dimensional random vector $Z_j$, which matches $Y_j$ with its group of origin in the following way:

\[
Z_{ij} =
\begin{cases}
1, & \text{if } Y_j \text{ belongs to } G_i; \\
0, & \text{otherwise,}
\end{cases}
\qquad i = 1, \ldots, g; \; j = 1, \ldots, n.
\]

The $Z_j$ are i.i.d. according to a multinomial distribution with g possible outcomes. The i-th outcome occurs with probability $\pi_i$. The realisations of $Z_j$, denoted as $z_j$, are treated as missing observations, so that the complete data are $[z_1, \ldots, z_n, y_1, \ldots, y_n]$.

The log likelihood function (L) of the incomplete data $[y_1, \ldots, y_n]$ is mathematically intractable, and the root of the log likelihood equations ($\partial L/\partial \psi = 0$) cannot be calculated explicitly (Redner & Walker 1984). The EM algorithm approaches finding a local maximum of L by maximising the expectation of the log likelihood function of the complete data for given values of $[y_1, \ldots, y_n]$ and ψ. This results in a simpler optimisation problem with the advantage of producing, from a given initial estimate, $\psi^{(0)}$, a sequence of estimates, $\{\psi^{(r)}\}_{r=0}^{s}$, such that $L(\psi^{(0)}) \le L(\psi^{(1)}) \le \ldots \le L(\psi^{(s)})$ (Dempster et al. 1977). Under very weak conditions, the sequence $\{\psi^{(r)}\}_{r=0}^{s}$ converges to a stationary point of L(ψ), which can be a local or global maximum, if the algorithm is not trapped in a saddle point (Wu 1983).

For bivariate mixture models of Gaussian distributions (Eq. 1), the sequence of estimates, $\{\psi^{(r)}\}_{r=0}^{s}$, is generated starting from a chosen initial estimate and iteratively performing the two EM steps, which are outlined below for the (r+1)-th iteration.

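1. Update the posterior probabilities (E-step):

\[
\tau_{ij}^{(r+1)} = \frac{\pi_i^{(r)} \, \phi(y_j \mid \mu_i^{(r)}, \Sigma_i^{(r)})}{\sum_{h=1}^{g} \pi_h^{(r)} \, \phi(y_j \mid \mu_h^{(r)}, \Sigma_h^{(r)})}
\]

2. Update the estimates of the mixture parameters (Eq. 1) by using the new posterior probabilities (M-step):

\[
\pi_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \Big/ n
\]

\[
\mu_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} y_j \Big/ \sum_{j=1}^{n} \tau_{ij}^{(r+1)}
\]

\[
\Sigma_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} (y_j - \mu_i^{(r+1)})(y_j - \mu_i^{(r+1)})^{\top} \Big/ \sum_{j=1}^{n} \tau_{ij}^{(r+1)}
\]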

The algorithm stops when the difference between the values of the log likelihood function at

two consecutive iterations is smaller than a chosen threshold.
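The two EM steps and the stopping rule can be sketched in plain Python as follows. This is a minimal illustration, not the code used in the study: all function names are hypothetical, observations are assumed to be pairs of numbers, covariance matrices are nested 2×2 tuples, and no safeguards against singularities or numerical underflow are included.

```python
import math


def bvn_pdf(y, mu, S):
    """Bivariate normal density phi(y | mu, S) for a 2x2 covariance matrix S."""
    (a, b), (c, d) = S
    det = a * d - b * c
    dx, dy = y[0] - mu[0], y[1] - mu[1]
    # Quadratic form (y - mu)^T S^{-1} (y - mu) via the closed-form 2x2 inverse.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def log_likelihood(data, pis, mus, Ss):
    """Incomplete-data log likelihood of the mixture in Eq. 1."""
    return sum(
        math.log(sum(p * bvn_pdf(y, m, S) for p, m, S in zip(pis, mus, Ss)))
        for y in data
    )


def em_iteration(data, pis, mus, Ss):
    """One E-step followed by one M-step; returns (pis, mus, Ss, tau)."""
    g, n = len(pis), len(data)
    # E-step: tau[i][j] is the posterior probability that y_j belongs to G_i.
    tau = [[0.0] * n for _ in range(g)]
    for j, y in enumerate(data):
        w = [pis[i] * bvn_pdf(y, mus[i], Ss[i]) for i in range(g)]
        total = sum(w)
        for i in range(g):
            tau[i][j] = w[i] / total
    # M-step: weighted proportions, means and covariance matrices.
    new_pis, new_mus, new_Ss = [], [], []
    for i in range(g):
        t_sum = sum(tau[i])
        m0 = sum(t * y[0] for t, y in zip(tau[i], data)) / t_sum
        m1 = sum(t * y[1] for t, y in zip(tau[i], data)) / t_sum
        s00 = sum(t * (y[0] - m0) ** 2 for t, y in zip(tau[i], data)) / t_sum
        s01 = sum(t * (y[0] - m0) * (y[1] - m1) for t, y in zip(tau[i], data)) / t_sum
        s11 = sum(t * (y[1] - m1) ** 2 for t, y in zip(tau[i], data)) / t_sum
        new_pis.append(t_sum / n)
        new_mus.append((m0, m1))
        new_Ss.append(((s00, s01), (s01, s11)))
    return new_pis, new_mus, new_Ss, tau


def fit_em(data, pis, mus, Ss, tol=1e-8, max_iter=500):
    """Iterate EM until the gain in log likelihood falls below tol."""
    history = [log_likelihood(data, pis, mus, Ss)]
    for _ in range(max_iter):
        pis, mus, Ss, _tau = em_iteration(data, pis, mus, Ss)
        history.append(log_likelihood(data, pis, mus, Ss))
        if history[-1] - history[-2] < tol:
            break
    return pis, mus, Ss, history
```

The returned `history` holds the log likelihood at each iteration, which makes the monotone increase guaranteed by the EM algorithm easy to inspect.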

2.3. Fitting mixture models

Fitting mixtures of multivariate Gaussian distributions with unconstrained covariance

matrices is a challenging statistical and computational task widely discussed in the statistical

literature (e.g. Figueiredo & Jain 2002; Melnykov & Maitra 2010). The issues encountered

are related to the nature of L(ψ) rather than to a failure of the EM algorithm (McLachlan &

Peel 2000, p. 99). Firstly, L(ψ) is unbounded in the points (called singularities) where the

estimated mean of one of the mixture components is equal to a yj and the determinant of the

covariance matrix of this component tends to zero (Kiefer & Wolfowitz 1956; McLachlan &

Peel 2000, p. 94). Thus, the global maximum of L(ψ) does not exist and a local maximum

needs to be chosen as the maximum likelihood estimate (Redner & Walker 1984). Secondly,

L(ψ) can present multiple local maxima, bringing the dilemma of which particular local



maximum to choose and causing sensitivity of the algorithm to starting and stopping rules

(Seidel et al. 2000). Finally, L(ψ) can present a large local maximum when fitting a

component to few data points whose covariance matrix has at least one eigenvalue that is

very small. These solutions are known as spuriosities (McLachlan & Peel 2000, p. 99), and

we refer to them as mathematical spuriosities. It is also possible that some solutions fall

outside the biologically meaningful range of parameters. Instead of restricting the parameter

space, one can additionally consider such solutions as biological spuriosities.

A recommended strategy [see McLachlan & Peel (2000, p. 97), Melnykov & Maitra

(2010) and Ng (2013)] for selecting the estimates of the mixture parameters is to decide

on the meaningful maximum number of components to fit, and to perform the following

steps, starting with fitting the maximum number of components and decreasing to just one

component:

1. Initiate the EM algorithm from different starting values of the parameters ψ(0) or

group partitions z(0)j .

2. Run the EM algorithm on the unrestricted parameter space until the difference

between the log likelihood function of two consecutive iterations is less than a chosen

threshold (c).

3. Examine and remove singularities and mathematical or biological spuriosities.

4. From the remaining solutions, select the one which gives the highest value of the log

likelihood function.
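Steps 3 and 4 can be sketched as a simple filter followed by a maximisation. The sketch below is an assumption-laden illustration: the thresholds `min_eig` and `min_prop` are arbitrary placeholder values, the dictionary keys are hypothetical, and only mathematical spuriosities are screened, since biological spuriosities require subject-matter judgement.

```python
import math


def smallest_eigenvalue(S):
    """Smallest eigenvalue of a symmetric 2x2 covariance matrix."""
    (a, b), (_, d) = S
    half_trace = 0.5 * (a + d)
    root = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return half_trace - root


def select_solution(solutions, min_eig=1e-6, min_prop=0.01):
    """Drop candidate fits flagged as degenerate, then keep the one with the
    highest log likelihood. Each solution is a dict with keys 'loglik',
    'pis' and 'covs'. Thresholds are illustrative assumptions only."""
    kept = [
        s for s in solutions
        if all(smallest_eigenvalue(S) > min_eig for S in s["covs"])
        and all(p > min_prop for p in s["pis"])
    ]
    if not kept:
        return None
    return max(kept, key=lambda s: s["loglik"])
```

A candidate with a very high log likelihood but a nearly singular covariance matrix is discarded before the comparison, which mirrors the removal of singularities and spuriosities in step 3.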

Multiple starting strategies have been proposed to initiate the EM algorithm (e.g. Biernacki et al. 2003; Karlis & Xekalaki 2003; Maitra 2009) and a comprehensive review can be found in McLachlan & Peel (2000, p. 54-55). However, none of the starting strategies has been shown to outperform the others in all cases (Melnykov & Maitra 2010). For the purpose of our study, we have selected the following five strategies:

1. Random starts: the initial partition is constructed by randomly allocating each

observation to one of the groups (McLachlan & Peel 2000, p. 55).

2. K-means: the initial partition is provided by the k-means algorithm (Forgy 1965 cited

in Omran et al. 2007), which produces a group partition by minimising the distance

from observations to the means of the groups (McLachlan & Peel 2000, p. 54).

3. Simulated means: the EM algorithm is initiated with g simulated means from a bivariate normal distribution with the mean and covariance matrix calculated from the sample. The mixing proportion and covariance matrix for the i-th component are given by $\pi_i^{(0)} = 1/g$ and $\Sigma_i^{(0)} = S$, where S is the entire data covariance matrix (McLachlan & Peel 2000, p. 55).

4. Subsample solution: the initial value of the parameters is the solution obtained by the

EM algorithm when the latter is applied to a random subsample and initiated from

random starts. The size of the subsample needs to be large enough not to produce

degenerate estimates (McLachlan & Peel 2000, p. 55).

5. Short runs of the EM algorithm: the initial value of the parameters is the solution with

the highest value of the likelihood function obtained after running several short runs

of the EM algorithm when the latter is initiated from random starts (Biernacki et al.

2003).
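As an illustration of the first strategy, the sketch below builds initial parameter values from a random partition. It is a hypothetical helper, not the EMMIX implementation: the function name, the `seed` argument and the requirement of at least three points per group (so that each covariance matrix can be of full rank) are assumptions made for this example.

```python
import random


def random_start(data, g, seed=None):
    """Randomly allocate each bivariate observation to one of g groups and
    return the proportions, means and covariance matrices of that partition."""
    n = len(data)
    if n < 3 * g:
        raise ValueError("need at least three observations per group")
    rng = random.Random(seed)
    while True:
        labels = [rng.randrange(g) for _ in range(n)]
        # Redraw until every group holds at least three points.
        if all(labels.count(i) >= 3 for i in range(g)):
            break
    pis, mus, covs = [], [], []
    for i in range(g):
        pts = [y for y, lab in zip(data, labels) if lab == i]
        k = len(pts)
        m0 = sum(p[0] for p in pts) / k
        m1 = sum(p[1] for p in pts) / k
        s00 = sum((p[0] - m0) ** 2 for p in pts) / k
        s01 = sum((p[0] - m0) * (p[1] - m1) for p in pts) / k
        s11 = sum((p[1] - m1) ** 2 for p in pts) / k
        pis.append(k / n)
        mus.append((m0, m1))
        covs.append(((s00, s01), (s01, s11)))
    return pis, mus, covs
```

Calling the function with different seeds gives different starting values, which is how repeated random starts widen the search over local maxima.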

The first two strategies are available in the package EMMIX (McLachlan et al. 1999).

Strategies 3-5 were programmed by the first author in R (R Core Team 2012) (the code is

available upon request).

Another important decision is the choice of the number of groups. A common approach

is to select the model which minimises some information criteria (McLachlan & Peel 2000,

p. 184) such as the Bayesian Information Criterion, BIC (Schwarz 1978), or Akaike’s

Information Criterion, AIC (Akaike 1973, 1974).

BIC = −2L(ψ) + d log(n)

AIC = −2L(ψ) + 2d

where L(ψ) is the log likelihood value of the selected local maximum (ψ); d is the number

of parameters to estimate, in the case of mixtures of bivariate Gaussian distributions with

unrestricted covariance matrices d = 6g − 1 (Eq.1), and n is the sample size.
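The two criteria are straightforward to compute once the log likelihood of the selected local maximum is available. The following Python sketch uses hypothetical function names and assumes unrestricted covariance matrices, so that d = 6g − 1 (g − 1 mixing proportions, 2g means and 3g covariance terms).

```python
import math


def n_parameters(g):
    """Free parameters of a g-component bivariate Gaussian mixture with
    unrestricted covariance matrices: (g - 1) + 2g + 3g = 6g - 1."""
    return 6 * g - 1


def bic(loglik, g, n):
    """Bayesian Information Criterion: -2 L + d log(n)."""
    return -2.0 * loglik + n_parameters(g) * math.log(n)


def aic(loglik, g):
    """Akaike's Information Criterion: -2 L + 2 d."""
    return -2.0 * loglik + 2 * n_parameters(g)


def choose_g(logliks, n):
    """Given a dict mapping g to its log likelihood, return the g that
    minimises BIC."""
    return min(logliks, key=lambda g: bic(logliks[g], g, n))
```

Because the BIC penalty grows with log(n), it tends to favour fewer components than AIC for all but very small samples.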

The methodology described above is easily translated to the case of mixtures of univariate Gaussian distributions. In this case, the parameters to estimate are the mixing proportions, $\pi_i$, the means, $\mu_i$, and the variances, $\sigma_i^2$, i = 1, . . . , g. For a review of mixture models of univariate Gaussian distributions, refer to Frühwirth-Schnatter (2006, p. 169-190).

2.4. Visual guides for mixture models

As a visual guide for mixtures of bivariate Gaussian distributions, the data are usually plotted in a scatter plot together with prediction/coverage ellipses of the mixture groups (e.g. Figueiredo & Jain 2002; McLachlan & Peel 2000, p. 103). This visual method is in



accordance with those suggested by Friendly (2006) for multivariate analyses of variance.

The ellipses depict the means, variances and correlations of each group and are thus useful

for identifying both biological and mathematical spuriosities. For instance, ellipses with

elongated or small circular shapes correspond to mathematical spuriosities (McLachlan &

Peel 2000, p. 103).
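The geometry of a coverage ellipse follows directly from the estimated mean and covariance matrix of a group. The sketch below is an illustration under the assumption of bivariate normality within the group; the function name and the returned dictionary keys are hypothetical. It uses the fact that the chi-square quantile with 2 degrees of freedom has the closed form −2 ln(1 − p).

```python
import math


def coverage_ellipse(mu, S, coverage=0.95):
    """Centre, semi-axes and orientation of the ellipse expected to contain
    a fraction `coverage` of a bivariate normal group with mean mu and
    symmetric 2x2 covariance matrix S."""
    (a, b), (_, d) = S
    half_trace = 0.5 * (a + d)
    root = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)   # guard against rounding error
    # Chi-square quantile with 2 degrees of freedom.
    c = -2.0 * math.log(1.0 - coverage)
    # Angle (radians) between the major axis and the first coordinate axis.
    angle = 0.5 * math.atan2(2.0 * b, a - d)
    return {
        "centre": mu,
        "semi_major": math.sqrt(c * lam_major),
        "semi_minor": math.sqrt(c * lam_minor),
        "angle": angle,
    }
```

The ratio of the two semi-axes gives a simple numerical indication of how elongated an ellipse is, which can complement the visual inspection for spuriosities.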

In the univariate case, the histogram of the data together with the probability density

functions of the fitted components is commonly used for visualisation purposes (e.g. Benaglia

et al. 2009). Spuriosities are detected by inspecting the component densities. For instance,

components with small variances correspond to mathematical spuriosities (e.g. McLachlan

& Peel 2000, p. 104).

3. Case study

The data set considered (Haefele, S.M., pers. comm., 2013) comprises 624 plot

observations of NU and GY of non-irrigated rice from the field experiments reported in

Naklang et al. (2006). The experiments were part of the field trials conducted by Wade

et al. (1999). Neither Wade et al. (1999) nor Naklang et al. (2006) formulated their research

questions exclusively in terms of treatment-based analyses; both also aimed to improve

understanding of non-irrigated rice systems across different environmental conditions. As in

our approach, Wade et al. (1999) performed a cluster analysis using Ward’s agglomerative

hierarchical algorithm (Delacy et al. 1996) to identify environmental groups. However, the

cluster analysis was conducted exclusively on GY after removing the site and treatment

effects.

The field experiments were carried out at eight sites in the northeast of Thailand: Udon

Thani, Sakhon Nakhon, Khon Kaen, Chum Phae, Tung Kula Ronghai, Phi Mai, Surin and

Ubon Ratchathani (see Figure 1 in Naklang et al. (2006) for their locations). At each site, a randomised complete block design was implemented with three blocks and eight fertiliser treatments (Table 1) applied on plots of 20 m². The experimental layout was maintained over

the wet seasons of 1995, 1996 and 1997. For each site and year, the soil water status was

visually assessed every week and rated according to three categories: dry soil surface, wet

soil surface or ponded water. However, only the dominant water conditions at pre-flowering,

flowering and post-flowering were reported. This procedure resulted in the water conditions

being completely confounded with the site.year effect in the design (Table 2, some sites

excluded as explained later). The experiment was repeated fully irrigated only at the Ubon

Ratchathani site.

Grain and straw yield samples were collected from an area of 8 m² in the centre of

each plot and analysed for N concentration. Grain yield was adjusted to a standard moisture



content of 14%. Nitrogen uptake was estimated by the following equation:

\[
NU = \frac{[N]_S \, S + m \, [N]_G \, GY}{1000} \quad \text{(kg/ha)}
\]

where $[N]_S$ is N concentration in straw (g/kg), $[N]_G$ is N concentration in dry grain (g/kg), S is straw yield (kg/ha), GY is grain yield (kg/ha) and m is the moisture correction factor, equal to 0.86.
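The calculation of NU, and of the ratio IEN = GY/NU derived from it, can be written as two small Python functions. The function and argument names are hypothetical, and the numbers in the usage note are invented for illustration only; they are not taken from the case-study data.

```python
def nitrogen_uptake(n_straw, straw_yield, n_grain, grain_yield, m=0.86):
    """Nitrogen uptake NU (kg/ha).

    n_straw, n_grain         : N concentrations in straw and dry grain (g/kg)
    straw_yield, grain_yield : straw and grain yields (kg/ha)
    m                        : moisture correction factor for grain
    """
    return (n_straw * straw_yield + m * n_grain * grain_yield) / 1000.0


def internal_efficiency(grain_yield, nu):
    """Internal nitrogen use efficiency IEN = GY / NU (kg grain per kg N)."""
    return grain_yield / nu
```

With a straw N concentration of 5 g/kg, a straw yield of 4000 kg/ha, a grain N concentration of 10 g/kg and a grain yield of 3000 kg/ha, the function returns NU = (20000 + 25800)/1000 = 45.8 kg/ha, giving an IEN of about 65.5.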

TABLE 1 ABOUT HERE

TABLE 2 ABOUT HERE

The observations from Chum Phae, Phi Mai and Tung Kula Ronghai were excluded for

the following reasons. Chum Phae and Phi Mai had a large number of missing values of [N]G or [N]S. In Tung Kula Ronghai, unlike the remaining sites, rice was direct-seeded in 1996, which resulted in observations substantially different from those coming from seedlings.

All the (NU , GY ) observations from the remaining sites over the three years were included

and presented the typical linear-plateau scatter (Figure 1).

FIGURE 1 ABOUT HERE

4. Linear mixed model analyses

4.1. Univariate mixed model

Internal nitrogen use efficiency was analysed with a univariate mixed model in Genstat

(VSN International 2012). At each site, the design considered was a strip plot with three

blocks and treatments and years treated as the strip factors (Table 3). Let s, t, y, b and n denote

the number of sites, fertiliser treatments, years, blocks and the total number of observations,

respectively. For this case study, s = 6, t = 8, y = 3, b = 3 and n = 432.

Let $y^{(n\times 1)}$ be the vector of IEN observations; $y$ was modelled as:

$$y = X\tau + Z_a u_a + Z_b u_b + Z_c u_c + Z_d u_d + Z_e u_e + \varepsilon$$

where $\tau^{(k\times 1)}$ is the vector of fixed effects containing the overall mean, year, treatment and treatment.year effects, $k = 1 + y + t + yt$ (with $y$ here denoting the number of years); and $X^{(n\times k)}$ is the corresponding design matrix. The vectors $u_a^{(s\times 1)}$, $u_b^{(sb\times 1)}$, $u_c^{(st\times 1)}$, $u_d^{(sy\times 1)}$ and $u_e^{(sty\times 1)}$ are the site, site.block, site.treatment, site.year and site.treatment.year random effects with design matrices $Z_a^{(n\times s)}$, $Z_b^{(n\times sb)}$, $Z_c^{(n\times st)}$, $Z_d^{(n\times sy)}$ and $Z_e^{(n\times sty)}$, respectively. The vector $\varepsilon^{(n\times 1)}$ contains the plot errors. Site was considered a random term because the locations were randomly selected from northeast Thailand. The random effects and plot errors were assumed to be independent and normally distributed with mean zero and

$$\mathrm{var}(u_a) = \sigma_a^2 I_s,\quad \mathrm{var}(u_b) = \sigma_b^2 I_{sb},\quad \mathrm{var}(u_c) = \sigma_c^2 I_{st},\quad \mathrm{var}(u_d) = \sigma_d^2 I_{sy},\quad \mathrm{var}(u_e) = \sigma_e^2 I_{sty},\quad \mathrm{var}(\varepsilon) = \sigma_\varepsilon^2 I_n,$$

where var() denotes the covariance matrix and $I_r$ the identity matrix of dimension $r$.
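As a quick bookkeeping aid, the dimensions implied by the model above can be tallied from the design constants given in the text. This is only a sketch in Python (the analysis itself was run in Genstat); the dictionary name `random_term_levels` is introduced here for illustration and is not part of the original analysis.

```python
# Design constants reported in the text for this case study.
s, t, y, b = 6, 8, 3, 3  # sites, fertiliser treatments, years, blocks

# One observation per site x treatment x year x block combination.
n = s * t * y * b

# Fixed effects: overall mean + year + treatment + treatment.year.
k = 1 + y + t + y * t

# Number of levels (columns of each Z matrix) for the random terms.
random_term_levels = {
    "site": s,
    "site.block": s * b,
    "site.treatment": s * t,
    "site.year": s * y,
    "site.treatment.year": s * t * y,
}

print(f"n = {n}, k = {k}")
for term, levels in random_term_levels.items():
    print(f"{term}: Z is {n} x {levels}")
```

The count n = s·t·y·b reproduces the 432 observations stated in the text, which is a useful sanity check before fitting.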

TABLE 3 ABOUT HERE

In the univariate mixed model analysis of IEN , treatment was significant (p-value

≤ 0.01), whereas year and treatment.year were not (p-values of 0.45 and 0.86, respectively).

Control and PK presented the largest means of IEN , and FYM NPK and ALL presented the

lowest (Table 4).

The random terms site.year and site.treatment.year had very large variance components

(Table 5).

The IEN residuals did not violate the normality and homoscedasticity assumptions in

any obvious way, except for some slight heavy-tailedness (Figure 2).

TABLE 4 ABOUT HERE

TABLE 5 ABOUT HERE

FIGURE 2 ABOUT HERE

4.2. Bivariate mixed model

The (NU , GY ) data were analysed by a bivariate mixed model conducted with

ASReml-R (Butler et al. 2007). As in the univariate case, at each site, the design considered

was a strip plot with three blocks and treatments and years treated as strip factors (Table 3).

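Let $y^{(2n\times 1)} = [y_{NU}^\top, y_{GY}^\top]^\top$ be the vector containing the observations of NU and GY; $y^{(2n\times 1)}$ was modelled as:

$$y = X\tau + Z_a u_a + Z_b u_b + Z_c u_c + Z_d u_d + Z_e u_e + \varepsilon$$

where $\tau^{(2k\times 1)}$ is the vector of fixed effects containing the overall joint mean, year, treatment and treatment.year effects for both NU and GY, and $X^{(2n\times 2k)}$ is the corresponding design matrix. The vectors $u_a^{(2s\times 1)}$, $u_b^{(2sb\times 1)}$, $u_c^{(2st\times 1)}$, $u_d^{(2sy\times 1)}$ and $u_e^{(2sty\times 1)}$ are the site, site.block, site.treatment, site.year and site.treatment.year random effects for NU and GY with design matrices $Z_a^{(2n\times 2s)}$, $Z_b^{(2n\times 2sb)}$, $Z_c^{(2n\times 2st)}$, $Z_d^{(2n\times 2sy)}$ and $Z_e^{(2n\times 2sty)}$, respectively. The vector $\varepsilon = (\varepsilon_1^\top, \ldots, \varepsilon_n^\top)^\top$ contains the plot errors. As in the univariate analysis, site was considered a random term. The random effects and plot errors were assumed to be independent and normally distributed with mean zero and the following covariance matrices:

$$\mathrm{var}(u_a) = \begin{bmatrix} \sigma_{a1}^2 & \sigma_{a12} \\ \sigma_{a12} & \sigma_{a2}^2 \end{bmatrix} \otimes I_s, \qquad \mathrm{var}(u_b) = \begin{bmatrix} \sigma_{b1}^2 & \sigma_{b12} \\ \sigma_{b12} & \sigma_{b2}^2 \end{bmatrix} \otimes I_{sb},$$

$$\mathrm{var}(u_c) = \begin{bmatrix} \sigma_{c1}^2 & \sigma_{c12} \\ \sigma_{c12} & \sigma_{c2}^2 \end{bmatrix} \otimes I_{st}, \qquad \mathrm{var}(u_d) = \begin{bmatrix} \sigma_{d1}^2 & \sigma_{d12} \\ \sigma_{d12} & \sigma_{d2}^2 \end{bmatrix} \otimes I_{sy},$$

$$\mathrm{var}(u_e) = \begin{bmatrix} \sigma_{e1}^2 & \sigma_{e12} \\ \sigma_{e12} & \sigma_{e2}^2 \end{bmatrix} \otimes I_{sty}, \qquad \mathrm{var}(\varepsilon) = \begin{bmatrix} \sigma_{\varepsilon 1}^2 & 0 \\ 0 & \sigma_{\varepsilon 2}^2 \end{bmatrix} \otimes I_n,$$

where $\otimes$ denotes the Kronecker product, $\sigma_{r1}^2$ and $\sigma_{r2}^2$ are the $u_r$ variances for NU and GY, respectively, and $\sigma_{r12}$ is the covariance of $u_r$ between NU and GY.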
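To make the Kronecker structure concrete, the following Python sketch builds var(u_a) for the site term. It uses the site variances and correlation reported in Table 6 as illustrative inputs; the helper functions `kron` and `identity` are written here for the example and are not taken from ASReml-R.

```python
import math

def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]

def identity(r):
    """Identity matrix of dimension r."""
    return [[1.0 if i == j else 0.0 for j in range(r)] for i in range(r)]

s = 6  # number of sites
# Site row of Table 6: variance for NU, variance for GY, correlation.
var_nu, var_gy, corr = 245.44, 318818.03, 0.96
cov = corr * math.sqrt(var_nu * var_gy)  # covariance implied by the correlation

G_site = [[var_nu, cov], [cov, var_gy]]
var_ua = kron(G_site, identity(s))  # 2s x 2s covariance matrix of u_a

print(len(var_ua), "x", len(var_ua[0]))
```

With this ordering the first s entries of u_a are the site effects for NU and the last s are those for GY, so the NU and GY effects of the same site are correlated while different sites remain independent.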

In the bivariate mixed model analysis, treatment was significant (p-value ≤ 0.001 for

GY and NU ) whereas year and treatment.year were not (p-value of 0.77 and 0.12 for GY

and NU for year, and 0.86 and 0.12 for GY and NU for treatment.year). Control and PK

presented the lowest means for both NU and GY , and FYM NPK and ALL the largest (Table

4).

A large variance component for site was observed indicating great heterogeneity for

both NU and GY among sites (Table 6). The high correlation between GY and NU (Table

6) in all the random effects is partially due to the fact that NU is a variable derived from GY .

Thus, it is likely this correlation is spurious.

TABLE 6 ABOUT HERE

4.3. Limitations of the linear mixed models

In agricultural research, field trials are designed to compare IEN across different

fertiliser treatments. However, the utilisation of NU for GY depends on the levels of nutrients

available in soil rather than the amount of nutrients applied. The availability of nutrients

is conditioned by complex environmental interactions between climate, plants and soil and

varies greatly with time and space (Marschner 2012, p. 136). This may result in very different

levels of available nutrients, even in plots under the same fertiliser application and agronomy

practice. Thus, such trials may present non-uniform (ill-defined) treatments. For instance,

Control was clearly ill-defined in our case study. This is evident even though there were no

direct measures of available N, as NU can be taken as a good surrogate indicator of N

available in Control plots (Dobermann et al. 2003). Since there was considerable variation



of NU among the Control plots at different sites and in different years (see Figure 1),

considerable variation of available N was expected.

Additionally, for this case study there was no design factor which could explain

the effect of water. Furthermore, site and year created a range of different environmental

circumstances but did not shed much light on the relationship between NU and GY .

5. Mixture model analyses

5.1. Univariate mixture model

For this case study, the residuals of IEN from the univariate linear mixed model did

not violate the normality and homoscedasticity assumptions in any obvious way (Figure 2).

Thus, the probability density of the ratio can be approximated by a mixture of univariate

Gaussian components. The univariate mixture analysis was performed using the R (R Core

Team 2012) mixtools package (Benaglia et al. 2009), which implements the EM algorithm

for fitting mixtures of univariate Gaussian distributions, and settings specified in Table 7. The

R (R Core Team 2012) code is available upon request.

TABLE 7 ABOUT HERE

The information criteria AIC and BIC selected two as the optimal number of groups

(Table 8). The first group included the plots with IEN between 24 and 44 and the second all

the others. The second group had a larger mean and standard deviation than the first (Table

9).
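The allocation of plots to the two groups can be illustrated with a short Python sketch that applies the usual maximum-posterior rule to the estimates in Table 9. This is an illustration only: the original analysis used the R package mixtools, and the function names below are introduced for this example.

```python
import math

# Estimates from Table 9: (mixing proportion, mean, standard deviation).
GROUPS = {1: (0.45, 38.36, 6.14), 2: (0.55, 49.01, 11.60)}

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior(x):
    """Posterior probability of each group for an IEN value x."""
    weighted = {g: p * normal_pdf(x, m, sd) for g, (p, m, sd) in GROUPS.items()}
    total = sum(weighted.values())
    return {g: w / total for g, w in weighted.items()}

def classify(x):
    """Assign x to the group with the largest posterior probability."""
    post = posterior(x)
    return max(post, key=post.get)

for ien in (20, 30, 40, 60):
    print(ien, classify(ien))
```

Because the second group has the larger standard deviation, it captures both tails of the IEN distribution, which is why the first group corresponds to a bounded interval of IEN values and the second to everything else.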

TABLE 8 ABOUT HERE

TABLE 9 ABOUT HERE

The first group contained 70% of the plots which received FYM NPK (Figure 3 a). For

the other fertiliser treatments, the proportions of observations falling into the two groups

did not differ as much as for FYM NPK (e.g. Figure 3 b). Specifically, the proportions

of observations classified in the first and second group for each fertiliser treatment are as

follows: Control (42% vs 58%), PK (41% vs 59%), N (64% vs 36%), FYM (42% vs 58%),

NPK (60% vs 40%), CR NPK (50% vs 50%) and ALL (66% vs 34%). The proportions of

observations classified in the groups for each soil water status at post-flowering were: dry

(45% vs 55%), wet (62% vs 38%) and ponded water (52% vs 48%) (Figure 3 c).

FIGURE 3 ABOUT HERE



5.2. Bivariate mixture model

The (NU,GY ) data were analysed using the R (R Core Team 2012) package EMMIX

(McLachlan et al. 1999), which implements the EM algorithm for fitting mixtures of

multivariate Gaussian distributions (the settings are specified in Table 7). The R code is

available upon request.

The information criterion BIC selected three groups, whereas AIC selected five (Table

10). BIC has been shown to perform better than AIC (Fonseca & Cardoso 2007). Thus, three

was chosen as the optimal number of groups. The first and second groups presented lower

means of NU and GY than the third (Table 11). In terms of the estimated correlations, NU

and GY were tightly correlated in the first group (Table 11). The mean of IEN of each group

was estimated from the bivariate analysis by taking the ratio of the estimated joint mean,

defined in Table 11 as $\widehat{IE}_{N_i}$. The confidence sets of $IE_{N_i}$ were derived by straightforward

calculations according to the confidence rule in Fieller (1954) (Figure 4).
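A minimal Python sketch of a Fieller-type confidence set for a ratio of two means is given below, using the first group in Table 11 as input. Several ingredients are assumptions made for illustration and are not stated in the text: the 95% level, the normal critical value, and in particular the correlation between the two mean estimates (`assumed_corr`), since Table 11 reports only their standard errors.

```python
import math
from statistics import NormalDist

def fieller_interval(num, den, var_num, var_den, cov, level=0.95):
    """Fieller confidence set for num/den; None when it is not a bounded interval."""
    z2 = NormalDist().inv_cdf(0.5 + level / 2.0) ** 2
    # The set is {theta : (num - theta*den)^2 <= z2 * Var(num - theta*den)},
    # i.e. qa*theta^2 + qb*theta + qc <= 0 with the coefficients below.
    qa = den ** 2 - z2 * var_den
    qb = -2.0 * (num * den - z2 * cov)
    qc = num ** 2 - z2 * var_num
    disc = qb ** 2 - 4.0 * qa * qc
    if qa <= 0.0 or disc < 0.0:
        return None
    root = math.sqrt(disc)
    return (-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)

# First group in Table 11: joint mean (NU, GY) and bootstrapped standard errors.
mean_nu, mean_gy = 20.42, 1070.22
se_nu, se_gy = 2.09, 157.00
assumed_corr = 0.90  # ASSUMPTION: correlation between the two mean estimates
cov = assumed_corr * se_nu * se_gy

ratio = mean_gy / mean_nu
interval = fieller_interval(mean_gy, mean_nu, se_gy ** 2, se_nu ** 2, cov)
print(round(ratio, 2), interval)
```

The interval is bounded only when the denominator mean is clearly different from zero (qa > 0), which is the practical reason the construction behaves well for NU in this data set.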

TABLE 10 ABOUT HERE

TABLE 11 ABOUT HERE

FIGURE 4 ABOUT HERE

It appears that the soil water status post-flowering and the soil N supply are the main

factors (from the measurements recorded for the analysis) defining the mixture groups (Figure

5). Most of the plots (62%) classified in the first group did not receive any added N fertiliser

(Control or PK). Soil N supply is the limiting factor for grain production (Xu et al. 2012),

which explains the low means of NU and GY and the strong correlation between NU and

GY in this group (Table 11). The first group presented the largest mean of IEN (Table

11). This result was in agreement with those provided by the univariate linear mixed model

analysis, in which Control and PK plots utilised N more efficiently (Table 4). There were

also plots with no N added in the second group but at a lower proportion (21%) than in the

first (62%). The plots with dry soil (86%) or ponded water (73%) post-flowering were mostly

classified in the first or second groups (Figure 5). Plots with dry soil had lower values of NU

and GY due to the fact that many physiological processes related to the uptake of nutrients

are impaired under water stress (Tanguilig et al. 1987). The low GY in plots with ponded

water post-flowering may have been due to the fact that rice remained green for longer,

which affected grain filling period, translocation of N from green biomass into grain, and

grain ripening (Ntanos & Koutroubas 2002). The plots in the third group were characterised

by having N, P and K added as well as by being wet post-flowering – 71% of the plots with

wet soil post-flowering were classified in the third group. This group had the largest mean

GY .



FIGURE 5 ABOUT HERE

5.3. Limitations of the mixture model approach

Finite mixture models assume independence of observations. However, this assumption

may be violated in designed field trials. Recent developments in mixture models allow

handling correlated observations by mixtures of linear mixed models (Ng 2013). Conditional

on the mixture components, the observations on the experimental units are modelled by a

mixed model, which provides a means to estimate correlations (Ng 2013). In our case,

modelling the correlation will be a complex task. For instance, the correlation between

plots depends on N availability, which is conditioned by environmental factors (spatial soil

variability, microbial activity) and agronomic practices (re-application of fertiliser over the

3 years and the presence of straw residues or losses of nutrients from the previous harvest).

Therefore, the extension of mixture models to include this type of correlation is challenging

and may not even be possible. Broadly speaking, as pointed out by Ng et al. (2006), adapting

clustering techniques to a wide variety of experimental designs is an open research question.

6. Discussion

In published agricultural research, field trials are commonly designed to compare IEN across

different fertiliser treatments and the analysis is often done by univariate linear models.

However, environmental factors may cause plots under the same fertiliser treatment and

agricultural practice to present different levels of available nutrients, resulting in a lack of

consistency in treatment replications in field trials. Furthermore, univariate linear models of

the ratio do not maintain the information on the original traits, and are often applied without

checking the normality and homogeneity of error variance assumptions.

Sampling across a range of environments may lead to different patterns (groups) of

NU for GY in field data. In this study, we have investigated the use of bivariate mixture

models for identifying groups. Once the groups are identified, their close investigation could

reveal the underlying defining factors, which may not necessarily coincide with experimental

treatments.

The benefits of using bivariate mixture models in nitrogen efficiency research have been

clearly demonstrated in our analysis of the case study on non-irrigated rice. Soil water status

post-flowering has been revealed as an environmental factor defining the mixture groups. In

terms of fertiliser treatments, both bivariate mixture and mixed models indicate that plots

with no added N produced less grain and showed less N uptake, which supports the fact

that soil N supply is the limiting factor for grain production (Xu et al. 2012).



This study has shown several benefits of using bivariate analyses on NU and GY in

comparison with their univariate counterparts for IEN . Firstly, bivariate analyses preserve

information on NU and GY , which is lost when the data are analysed as a ratio. For instance,

different levels of a factor may affect both NU and GY proportionally. Thus, even if NU

and GY values change considerably, only minor changes may be observed in the ratio. This

can be seen in our case study. The means of NU and GY increased considerably when CR

NPK was added to the soil in comparison with plots under Control treatment (Table 4). A

difference of 16.31 kg/ha (SE = 4.17) was observed for NU and 630 kg/ha (SE = 127.9)

for GY . However, the change in the ratio was minor, 3.09 (SE = 2.32). Similarly, the loss of

information is illustrated in the univariate mixture analysis of IEN which fails to identify soil

water status as a factor defining the mixture groups (Figure 3 c). Secondly, bivariate analyses

avoid dealing with the mixture and possible heavy-tailedness of the IEN distribution. The

distribution of the ratio may violate the assumptions of normality and homogeneity of error

variances, leading to unreliable inferences from linear mixed models. The departure from

normality of the ratio distribution complicates the interpretation of components as physical

groups in univariate mixture analyses with univariate Gaussian components. For instance, for

the same physical environment, if the ratio distribution is skewed, the EM algorithm may

detect more than one component in its attempt to approximate a non-Gaussian density.
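The CR NPK versus Control comparison quoted in this paragraph can be reproduced from the treatment means in Table 4. The Python sketch below does only that arithmetic and adds the relative changes; the helper names are introduced for the example.

```python
# Treatment means transcribed from Table 4.
table4 = {
    "Control": {"IEN": 48.87, "NU": 36.22, "GY": 1677},
    "CR NPK": {"IEN": 45.78, "NU": 52.53, "GY": 2307},
}

def diff(trait, a="CR NPK", b="Control"):
    """Difference in means between treatments a and b for one trait."""
    return round(table4[a][trait] - table4[b][trait], 2)

def relative_change(trait, a="CR NPK", b="Control"):
    """Difference expressed as a fraction of the mean under treatment b."""
    return diff(trait, a, b) / table4[b][trait]

for trait in ("NU", "GY", "IEN"):
    print(trait, diff(trait), f"{100 * relative_change(trait):.1f}%")
```

NU and GY both rise by roughly 40% relative to Control, while the ratio moves by only a few percent, which is the loss of information the paragraph describes.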

There are some limitations that need to be considered when applying mixture models

to designed field trials. Mixture models assume that the observations are independent, but

possible correlations may arise as a result of the experimental design. Thus, in our opinion, the

application of bivariate mixture models in designed field trials should be used for exploratory

purposes only, and complementary to bivariate mixed models. Furthermore, for this case

study, utilising our approach most effectively would have required the recording of data on

other potential environmental factors affecting nitrogen utilisation such as temperature, the

presence of diseases or the indigenous levels of nutrients in the soil.

Consequently, designs should be adopted which avoid correlated observations and are

able to provide a more complete picture of the factors defining the mixture groups. Data

collection through surveys is a more appropriate sampling procedure for the application of

mixture models for clustering (e.g. Di Zio et al. 2005; Genge 2013). However, to the best of

our knowledge, there is a lack of research on field survey design for efficient clustering with

bivariate mixture models. Thus, how to best design a field survey for (GY , NU ) to apply

finite mixture models is an open research question.

In conclusion, bivariate mixture models of Gaussian distributions are a useful

exploratory tool for identifying potential environmental factors driving NU and GY and

effectively complement bivariate mixed models in the analysis of designed field trials.

Bivariate mixture models can also be used for analysing other similar bivariate traits in



agriculture or natural resource research but to fully exploit their potential they should be applied

on designed field surveys.

References

AKAIKE, H. (1973). Information theory and an extension of the maximum likelihood

principle. In B. N. Petrov & F. Csaki, editors, Second International Symposium on

Information Theory. Akademia Kiado, Budapest. pp. 267–281.

AKAIKE, H. (1974). A new look at the statistical model identification. IEEE Trans. Automat.

Control 19, 716–723.

BENAGLIA, T., CHAUVEAU, D., HUNTER, D.R. & YOUNG, D.S. (2009). mixtools: An R

package for analyzing finite mixture models. J. Stat. Softw. 32, 1–29.

BIERNACKI, C., CELEUX, G. & GOVAERT, G. (2003). Choosing starting values for the

EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models.

Comput. Statist. Data Anal. 41, 561–575.

BUTLER, D. G., CULLIS, B. R., GILMOUR, A. R. & GOGEL, B. J. (2007). ASReml-R

reference manual. Queensland Department of Primary Industries and Fisheries, Australia.

CASSMAN, K.G., DOBERMANN, A. & WALTERS, D.T. (2002). Agroecosystems, nitrogen-

use efficiency, and nitrogen management. Ambio 31, 132–140.

CHEN, J. & TAN, X. (2009). Inference for multivariate normal mixtures. J. Multivariate

Anal. 100, 1367–1383.

CHEW, V. (1966). Confidence, prediction, and tolerance regions for the multivariate normal

distribution. J. Amer. Statist. Assoc. 61, 605–617.

CRAMER, H. (1946). Mathematical Methods of Statistics. Princeton: Princeton University

Press.

DELACY, I.H., BASFORD, K.E., COOPER, M., BULL, J.K. & MCLAREN, C.G. (1996).

Analysis of multi-environment trials: an historical perspective. In Cooper, M. &

Hammer, G. L. (eds.) Plant adaptation and crop improvement. CAB International, 39–

124. Wallingford.

DELOGU, G., CATTIVELLI, L., PECCHIONI, N., DE FALCIS, D., MAGGIORE, T. &

STANCA, A.M. (1998). Uptake and agronomic efficiency of nitrogen in winter barley

and winter wheat. Eur. J. Agron. 9, 11–20.

DEMPSTER, A.P., LAIRD, N.M. & RUBIN, D.B. (1977). Maximum likelihood from

incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39, 1–38.

DI ZIO, M., GUARNERA, U. & LUZI, O. (2005). Editing systematic unity measure errors

through mixture modelling. Surv. Methodol. 31, 53–63.



DOBERMANN, A., WITT, C., ABDULRACHMAN, S. et al. (2003). Estimating indigenous

nutrient supplies for site-specific nutrient management in irrigated rice. Agron. J. 95,

924–935.

EVERITT, B.S. (1996). An introduction to finite mixture distributions. Stat. Methods Med.

Res. 5, 107–127.

FANG, Q., YU, Q., WANG, E. et al. (2006). Soil nitrate accumulation, leaching and crop

nitrogen use as influenced by fertilization and irrigation in an intensive wheat-maize

double cropping system in the North China Plain. Plant Soil 284, 335–350.

FIELLER, E.C. (1954). Some problems in interval estimation. J. Roy. Statist. Soc. Ser. B 16,

175–185.

FIGUEIREDO, M.A.T. & JAIN, A.K. (2002). Unsupervised learning of finite mixture

models. IEEE Trans. Pattern Anal. Mach. Intell. 24, 381–396.

FONSECA, J.R.S. & CARDOSO, M.G.M.S. (2007). Mixture-model cluster analysis using

information theoretical criteria. Intell. Data Anal. 11, 155–173.

FRIENDLY, M. (2006). Data ellipses, HE plots and reduced-rank displays for multivariate

linear models: SAS software and examples. J. Stat. Softw. 17, 1–43.

FRUHWIRTH-SCHNATTER, S. (2006). Finite mixture and Markov switching models.

Springer: New York.

GANESALINGAM, A., SMITH, A.B., BEECK, C.P., COWLING, W.A., THOMPSON, R. &

CULLIS, B.R. (2013). A bivariate mixed model approach for the analysis of plant

survival data. Euphytica 190, 371–383.

GENGE, E. (2013). A latent class analysis of the public attitude towards the euro adoption in

Poland. Adv. Data Anal. Classif. (in press).

HAUCK, R.D. (1984). Nitrogen in Crop Production. Madison: American Society of

Agronomy- Crop Science Society of America- Soil Science Society of America.

KARLIS, D. & XEKALAKI, E. (2003). Choosing initial values for the EM algorithm for

finite mixtures. Comput. Statist. Data Anal. 41, 577–590.

KIEFER, J. & WOLFOWITZ, J. (1956). Consistency of the maximum likelihood estimator in

the presence of infinitely many incidental parameters. Ann. Math. Statist. 27, 887–906.

MAITRA, R. (2009). Initializing partition-optimization algorithms. IEEE/ACM Trans.

Comput. Biol. Bioinf. 6, 144–157.

MARSAGLIA, G. (1965). Ratios of normal variables and ratios of sums of uniform variables.

J. Amer. Statist. Assoc. 60, 193–204.

MARSAGLIA, G. (2006). Ratios of normal variables. J. Stat. Softw. 16, 1–10.

MARSCHNER, H. & MARSCHNER, P. (2012). Marschner’s mineral nutrition of higher

plants. London : Elsevier Science.



MCLACHLAN, G.J. & BASFORD, K.E. (1988). Mixture models. Inference and applications

to clustering. New York: Dekker.

MCLACHLAN, G.J. & PEEL, D. (2000). Finite mixture models. New York: Wiley.

MCLACHLAN, G.J., PEEL, D., BASFORD, K.E. & ADAMS, P. (1999). The EMMIX

software for the fitting of mixtures of normal and t-components. J. Stat. Softw. 4. URL

http://www.stat.ucla.edu/journals/jss.

MELNYKOV, V. & MAITRA, R. (2010). Finite mixture models and model-based clustering.

Stat. Surv. 4, 80–116.

NAKLANG, K., HARNPICHITVITAYA, D., AMARANTE, S.T., WADE, L.J. & HAEFELE,

S.M. (2006). Internal efficiency, nutrient uptake, and the relation to field water resources

in rainfed lowland rice of northeast Thailand. Plant Soil 286, 193–208.

NG, S.K. (2013). Recent developments in expectation-maximization methods for analyzing

complex data. WIREs Comput Stat 5, 415–431.

NG, S.K., MCLACHLAN, G.J., WANG, K., JONES, L.B.T. & NG, S.W. (2006). A mixture

model with random-effects components for clustering correlated gene-expression

profiles. Bioinformatics 22, 1745–1752.

NTANOS, D.A. & KOUTROUBAS, S.D. (2002). Dry matter and N accumulation and

translocation for Indica and Japonica rice under Mediterranean conditions. Field Crop.

Res. 74, 93–101.

OMRAN, M.G.H., ENGELBRECHT, A.P. & SALMAN, A. (2007). An overview of clustering

methods. Intell. Data Anal. 11, 583–605.

PENG, S., GARCIA, F.V., LAZA, R.C., SANICO, A.L., VISPERAS, R.M. & CASSMAN,

K.G. (1996). Increased N-use efficiency using a chlorophyll meter on high-yielding

irrigated rice. Field Crop. Res. 47, 243–252.

R CORE TEAM (2012). R: A Language and Environment for Statistical Computing.

R Foundation for Statistical Computing, Vienna, Austria. URL http://www.

R-project.org/. ISBN 3-900051-07-0.

REDNER, R.A. & WALKER, H. F. (1984). Mixture densities, maximum likelihood and the

EM algorithm. SIAM Rev. 26, 195–239.

SCHWARZ, G. (1978). Estimating the dimension of a model. Ann. Statist. 6, 461–464.

SEIDEL, W., MOSLER, K. & ALKER, M. (2000). A cautionary note on likelihood ratio tests

in mixture models. Ann. Inst. Statist. Math. 52, 481–487.

TANGUILIG, V.C., YAMBAO, E.B., O’TOOLE, J.C. & DE DATTA, S.K. (1987). Water

stress effects on leaf elongation, leaf water potential, transpiration, and nutrient uptake

of rice, maize, and soybean. Plant Soil 103, 155–168.



VSN INTERNATIONAL (2012). Genstat for Windows 15th Edition. VSN International,

Hemel Hempstead UK. URL http://www.vsni.co.uk/.

WADE, L.J., AMARANTE, S.T., OLEA, A. et al. (1999). Nutrient requirements in rainfed

lowland rice. Field Crop. Res. 64, 91–107.

WITT, C., DOBERMANN, A., ABDULRACHMAN, S. et al. (1999). Internal nutrient

efficiencies of irrigated lowland rice in tropical and subtropical Asia. Field Crop. Res.

63, 113–138.

WU, C.F.J. (1983). On the convergence properties of the EM algorithm. Ann. Statist. 11,

95–103.

XU, G., FAN, X. & MILLER, A.J. (2012). Plant nitrogen assimilation and use efficiency.

Annu. Rev. Plant Biol. 63, 153–182.

XU, L., HANSON, T., BEDRICK, E. J. & RESTREPO, C. (2010). Hypothesis tests on mixture

model components with applications in ecology and agriculture. J. Agric. Biol. Environ.

Stat. 15, 308–326.



TABLE 1
Fertiliser treatments applied in the experiments in Naklang et al. (2006)

Treatment   Description
Control     No fertiliser applied
PK          0 N, 21.8 kg Phosphorus (P) ha−1, and 41.5 kg Potassium (K) ha−1
N           50 kg N ha−1
FYM         Farmyard manure (cattle manure) at 10 t ha−1 (fresh weight)
NPK         50 kg N ha−1, 21.8 kg P ha−1 and 41.5 kg K ha−1
CR NPK      50 kg N ha−1 (controlled-release), 21.8 kg P ha−1, and 41.5 kg K ha−1
FYM NPK     Combined application of the treatments FYM and NPK
ALL         NPK as in the NPK treatment + lime and trace elements

TABLE 2
Number of sites selected from the experiment in Naklang et al. (2006) over the three years with soils reported to be dry, wet or having ponded water at three developmental stages of the plant

                Plant developmental stages
Soil status     Pre-flowering   Flowering   Post-flowering
Dry             0               1           6
Wet             8               3           8
Ponded water    10              14          4

TABLE 3
Diagram of the strip plot design for two blocks at one site for the experiment in the case study (Naklang et al. 2006)

Block 1   NPK   FYM      PK        ALL   N     CR NPK   Control   FYM NPK
1995
1996
1997

Block 2   ALL   CR NPK   Control   PK    NPK   FYM      FYM NPK   N
1995
1996
1997



TABLE 4
Estimated means (mean standard error) of internal nitrogen use efficiency (IEN ), nitrogen uptake (NU ) and grain yield (GY ) for the fertiliser treatments described in Table 1 across sites, years and blocks

Treatment   Means of IEN    Means of NU     Means of GY
Control     48.87 (2.39)    36.22 (7.34)    1677 (266.50)
PK          48.69 (2.39)    35.53 (7.34)    1629 (266.50)
N           42.71 (2.39)    52.94 (7.34)    2192 (266.50)
FYM         46.29 (2.39)    49.66 (7.34)    2191 (266.50)
NPK         43.16 (2.39)    54.51 (7.34)    2305 (266.50)
CR NPK      45.78 (2.39)    52.53 (7.34)    2307 (266.50)
FYM NPK     37.92 (2.39)    69.39 (7.34)    2552 (266.50)
ALL         40.76 (2.39)    61.18 (7.34)    2445 (266.50)

TABLE 5
Estimated variance components (standard error) of the linear mixed model of internal nitrogen use efficiency

Random term            Estimated variance component
site                   5.52 (14.18)
site.block             3.18 (1.96)
site.year              34.92 (18.09)
site.treatment         1.57 (4.58)
site.treatment.year    30.93 (7.48)

TABLE 6
Estimated variance covariance components (standard error) of the bivariate mixed model of nitrogen uptake (NU ) and grain yield (GY )

Random term            Variances for NU    Variances for GY          Correlation
site                   245.44 (176.43)     318818.03 (243873.43)     0.96 (0.05)
site.block             3.95 (2.86)         1481.32 (1895.60)         0.99 (NA)
site.year              74.95 (39.69)       173236.37 (81415.70)      0.76 (0.14)
site.treatment         33.47 (12.90)       25672.31 (12386.48)       0.93 (0.12)
site.treatment.year    31.96 (9.76)        37150.76 (12200.00)       0.92 (0.15)



TABLE 7
Settings and rationales for the mixture analyses

Setting: Maximum number of components fixed at 6.
Rationale: Larger numbers resulted in frequent cases of spuriosities.

Setting: Starting strategy i-ii. Number of random and k-means partitions set at 100.
Rationale: Ensure to have a good initial partition. However, similar mixture estimates were observed when reducing this number.

Setting: Starting strategy iii. Not used for IEN analysis.
Rationale: May produce negative initial values of the means of IEN . Biologically impossible.

Setting: Starting strategy iv. Subsamples of 200 observations.
Rationale: Subsample size sufficient to not produce degenerate solutions.

Setting: Starting strategy v. A short run indicated that the threshold to stop the EM algorithm was c = 10^−2. The number of short runs employed was three.
Rationale: When initiating the short run with a large number of random starts, similar mixture estimates were obtained with thresholds between [10^−4, 10^−1] or larger number of short runs.

Setting: Stopping criteria. The threshold to stop the EM algorithm was c = 10^−6. Standard setting.
Rationale: NA

Setting: Biological spuriosities. Components with negative mean for IEN , and components with negative joint mean or correlation for (NU , GY ).
Rationale: Biologically impossible.

TABLE 8
Starting strategy and AIC and BIC values for the solution with the highest value of the log likelihood function when fitting g = 1 . . . 6 groups for the analysis of IEN . The minimum values of the information criteria are marked with an asterisk.

Number of groups   Starting strategy     AIC        BIC
6                  K-means               3287.43    3356.60
5                  K-means               3281.43    3338.39
4                  subsample solution    3275.66    3320.42
3                  K-means               3272.12    3304.67
2                  short runs            3270.13*   3290.47*
1                  random starts         3296.07    3304.20

TABLE 9
Parameter estimates (bootstrapped standard error) of the mixture model selected in Table 8 for the analysis of internal nitrogen use efficiency (πi is the estimate of the mixing proportion, µi the estimate of the mean of the ratio and σi is the estimate of the standard deviation for the i-th group)

                πi             µi              σi
First group     0.45 (0.14)    38.36 (1.12)    6.14 (1.02)
Second group    0.55 (0.14)    49.01 (3.26)    11.60 (1.25)



TABLE 10
Starting strategy and AIC and BIC values of the solution with the highest value of the log likelihood function when fitting g = 1 . . . 6 for the joint analysis of grain yield and nitrogen uptake. The minimum values of the information criteria are marked with an asterisk.

Number of groups   Starting strategy     AIC          BIC
6                  random starts         10335.46     10477.85
5                  subsample solution    10331.07*    10449.06
4                  short runs            10336.34     10429.91
3                  simulated means       10333.03     10402.19*
2                  K-means               10363.44     10408.19
1                  K-means               10444.21     10464.55
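The AIC and BIC columns of Table 10 are linked by the number of free parameters, which gives a simple consistency check. The Python sketch below assumes an unrestricted covariance matrix per component and n = 432 observations; both are assumptions for this illustration (the exact EMMIX settings are given in Table 7), and the function names are introduced here.

```python
import math

n = 432  # number of plots in the case study (assumed sample size for the fit)

# (AIC, BIC) values transcribed from Table 10, keyed by number of groups g.
table10 = {
    1: (10444.21, 10464.55),
    2: (10363.44, 10408.19),
    3: (10333.03, 10402.19),
    4: (10336.34, 10429.91),
    5: (10331.07, 10449.06),
    6: (10335.46, 10477.85),
}

def n_params(g, dim=2):
    """Free parameters of an unrestricted g-component Gaussian mixture."""
    per_component = dim + dim * (dim + 1) // 2  # mean vector + covariance matrix
    return (g - 1) + g * per_component

def bic_from_aic(aic, g):
    # AIC = -2logL + 2p and BIC = -2logL + p*log(n), so BIC = AIC + p*(log(n) - 2).
    return aic + n_params(g) * (math.log(n) - 2.0)

best_aic = min(table10, key=lambda g: table10[g][0])
best_bic = min(table10, key=lambda g: table10[g][1])
print("AIC selects", best_aic, "groups; BIC selects", best_bic)
```

Each extra bivariate component adds six parameters, and with n = 432 BIC charges about 6.07 per parameter against 2 for AIC, which is why BIC settles on fewer groups here.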

TABLE 11
Parameter estimates of the mixture model (bootstrapped standard errors listed below the estimates) selected by the BIC criterion (Table 10) for the joint analysis of grain yield and nitrogen uptake (πi is the estimate of the mixing proportion, µi the estimate of the joint mean, $\widehat{IE}_{N_i}$ is the ratio of the estimated joint mean, Σi is the estimate of the covariance variance matrix and ρi is the estimate of the correlation for the i-th group). Vectors are ordered (NU , GY ); matrices are written row by row, with rows separated by a semicolon.

Estimates
                πi      µi                   $\widehat{IE}_{N_i}$   Σi                                        ρi
First group     0.15    (20.42, 1070.22)     52.41    [37.65, 2607.11; 2607.11, 225208.08]      0.90
Second group    0.41    (42.43, 1866.97)     44.00    [118.25, 3917.71; 3917.71, 253491.45]     0.72
Third group     0.44    (70.66, 2814.10)     39.82    [322.31, 5272.20; 5272.20, 308423.50]     0.52

Bootstrapped standard errors
                πi      µi                   $\widehat{IE}_{N_i}$   Σi                                        ρi
First group     0.03    (2.09, 157.00)       3.53     [15.30, 1030.00; 1030.00, 75700.00]       0.05
Second group    0.06    (2.27, 114.00)       1.67     [24.00, 1021.00; 1021.00, 69200.00]       0.06
Third group     0.05    (2.68, 100.00)       1.27     [42.80, 1540.00; 1540.00, 61300.00]       0.09



Figure 1. Typical trend of grain yield on nitrogen uptake from the experiments considered in Section 3. The closed and open circles represent plots with no fertiliser added and plots with a source of fertiliser, respectively.

Figure 2. The Q-Q plot for the residuals of internal nitrogen use efficiency versus the expected quantiles of a normal distribution (left) and the plot of residuals of internal nitrogen use efficiency versus estimated means (right).



Figure 3. Classification of the internal nitrogen use efficiency observations presented against the experimental sites for plots receiving a) FYM NPK, b) N and Control and c) all plots together with the water status post-flowering. The closed and open symbols refer to observations classified in the first and second group, respectively.



Figure 4. Mixture groups for grain yield and nitrogen uptake data considered in Section 3. The black dots represent the estimated joint means and the ellipses the 90% prediction regions of the groups (Eq. 4.4 in Chew (1966)). The intervals are the confidence sets of the ratios of expectations of the groups calculated according to Fieller (1954).
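As an illustration of what a 90% prediction ellipse encodes, the Python sketch below tests whether a (NU, GY) point falls inside the region for a bivariate normal with the first-group estimates from Table 11. It treats the parameters as known and uses the chi-square quantile with two degrees of freedom; whether this coincides exactly with Eq. 4.4 in Chew (1966) is an assumption, and the function name is introduced for the example.

```python
import math

def inside_prediction_region(point, mean, cov, level=0.90):
    """True if point lies inside the level-prediction ellipse of N(mean, cov)."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    # Squared Mahalanobis distance using the closed-form 2 x 2 inverse.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    # Chi-square quantile with 2 degrees of freedom has the closed form -2*log(1 - level).
    threshold = -2.0 * math.log(1.0 - level)
    return d2 <= threshold

# First group in Table 11, ordered (NU, GY).
mean_1 = (20.42, 1070.22)
cov_1 = ((37.65, 2607.11), (2607.11, 225208.08))

print(inside_prediction_region(mean_1, mean_1, cov_1))            # the group mean itself
print(inside_prediction_region((70.66, 2814.10), mean_1, cov_1))  # the third-group mean
```

The elongated shape of the first-group ellipse reflects its high correlation (0.90): points are tolerated far along the NU-GY trend but not far across it.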



Figure 5. Mixture groups for the grain yield and nitrogen uptake data considered in Section 3 together with a) the water status of the plots post-flowering and b) N fertiliser status. The ellipses are the 90% prediction regions of the groups (Eq. 4.4 in Chew (1966)).


Chapter 5

Conclusions and future lines of research

In this thesis I have proposed to perform bivariate mixed and mixture analyses on

(NU,GY ) instead of their univariate counterparts on IEN . Bivariate analyses maintain

the complete information on NU and GY , including their correlation, and avoid

dealing with the distributional properties of the ratio.

The IEN data may violate the assumption of normality and homogeneity of error

variances (Marsaglia, 1965, 2006) required in linear mixed models. As for the application

of mixtures of univariate Gaussian components, the potential non-normality of

the ratio distribution (Marsaglia, 1965, 2006) also complicates its analysis due to the

following reasons. Firstly, the ratio may present heavy-tailedness; thus, its distribution

may not be satisfactorily approximated by a mixture of univariate Gaussian components.

Atypical observations of the ratio, produced when the denominator takes values

close to zero, can affect the estimates of the univariate Gaussian components (Peel

& McLachlan, 2000). Secondly, if in a physical group the ratio distribution presents

skewness or bimodality, more than one Gaussian component will be required to approximate

a non-symmetric shape. This fact breaks the one-to-one correspondence between

mixture components and physical groups and requires dealing with mixtures of mixtures.

Mixtures of mixtures arise when each component of the mixture is a mixture

itself. That is:

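$$f(y_j \mid \psi) = \sum_{i=1}^{g} \pi_i f_i(y_j \mid \theta_i) = \sum_{i=1}^{g} \pi_i \left( \sum_{k=1}^{r_i} \alpha_{ik}\, g_{ik}(y_j \mid \xi_{ik}) \right) \qquad (5.1)$$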

where the gik can be assumed to be univariate normal. Mixtures of mixtures have

problems of non-identifiability (Di Zio et al., 2007). One of them is to choose which

components of the second level of the mixture (gik) are used to model the non-Gaussian

pdfs of the first level of the mixture (fi). Constraints on the parameters are needed to

impose identifiability (see Willse & Boik, 1999)
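The identifiability problem can be made concrete with a small numerical sketch. The code below evaluates a density of the form of Eq. 5.1 with univariate normal second-level components, and shows that two different allocations of the same three normal components to two first-level groups give the same overall density. The function names and all parameter values are assumptions made for this illustration only.

```python
import math


def normal_pdf(y, mean, sd):
    """Univariate normal density."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def mixture_of_mixtures_pdf(y, groups):
    """Evaluate Eq. 5.1.

    groups is a list of (pi_i, subcomponents), where subcomponents is a
    list of (alpha_ik, mean_ik, sd_ik) with the alpha_ik summing to one.
    """
    total = 0.0
    for pi_i, subcomponents in groups:
        f_i = sum(alpha * normal_pdf(y, mean, sd) for alpha, mean, sd in subcomponents)
        total += pi_i * f_i
    return total


# Three hypothetical normal components A, B, C with overall weights 0.3, 0.3, 0.4.
A, B, C = (10.0, 1.0), (13.0, 1.5), (20.0, 2.0)

# Allocation 1: group 1 = {A, B}, group 2 = {C}.
allocation_1 = [
    (0.6, [(0.5, *A), (0.5, *B)]),
    (0.4, [(1.0, *C)]),
]
# Allocation 2: group 1 = {A}, group 2 = {B, C}.
allocation_2 = [
    (0.3, [(1.0, *A)]),
    (0.7, [(3.0 / 7.0, *B), (4.0 / 7.0, *C)]),
]

if __name__ == "__main__":
    for y in (9.0, 12.0, 15.0, 21.0):
        print(y, mixture_of_mixtures_pdf(y, allocation_1), mixture_of_mixtures_pdf(y, allocation_2))
```

Both allocations describe the data equally well, so the likelihood alone cannot decide which second-level components belong to which first-level group; this is why constraints on the parameters are required.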

Bivariate mixed models on (NU,GY ) are appropriate for the analysis of designed

field trials, in which the researcher aims to test the performance of different experi-

mental treatments. In our literature study of recent publications in nitrogen efficiency

in cereals (Section 1.4), the experimental treatments mostly corresponded to different

levels of nutrients added to the soil. However, the level of available nutrients (from indigenous as well as added sources) may differ substantially from the level of applied nutrients. For instance, heavy rainfall after an application of nitrates can result in considerable losses of these nutrients through leaching or surface runoff (Raun & Johnson, 1999). Thus, in order to apply mixed models, special care must be taken in the design of such trials, and agronomic practices must be tightly controlled, to ensure that treatment applications are as uniform as possible.

There are other types of studies where fertilisers are applied to create different fer-

tility conditions rather than to test for their performance. This is the case of Naklang

et al. (2006) whose research objectives were not formulated in terms of treatment-based

analyses. Naklang et al. (2006) aimed to improve the understanding of how different

environmental conditions affect IEN in rice. With this objective in mind several ex-

periments across different sites and years were carried out to widen the range of soil

characteristics and climatic and fertility conditions. In this thesis I have argued that

different experimental and field conditions may lead to different patterns of the conver-

sion of NU into GY , and thus to the presence of clusters in the field data. Such clusters

can be identified by bivariate mixture models. The inspection of the mixture groups

may reveal the environmental and fertility conditions defining them, assuming that the

appropriate information has been recorded. Thus, mixture models are recommended

for studies with similar objectives to the ones in Naklang et al. (2006).
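Once a bivariate Gaussian mixture has been fitted, observations are usually inspected group by group through their posterior probabilities of membership. The sketch below computes these probabilities for a single (NU, GY) observation, assuming the mixing proportions, means and covariance matrices are already available. The two components and all numerical values are hypothetical and are not the estimates reported in Chapter 4.

```python
import math


def bivariate_normal_pdf(x, y, mean, cov):
    """Bivariate normal density; cov = ((sxx, sxy), (sxy, syy))."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x - mean[0], y - mean[1]
    # Quadratic form (x - mu)' Sigma^{-1} (x - mu) using the explicit 2x2 inverse.
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def posterior_probabilities(x, y, components):
    """Posterior membership probabilities for one observation.

    components is a list of (pi_i, mean_i, cov_i).
    """
    weighted = [p * bivariate_normal_pdf(x, y, m, c) for p, m, c in components]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical two-group mixture on (NU, GY).
components = [
    (0.5, (40.0, 2.0), ((100.0, 4.0), (4.0, 0.25))),
    (0.5, (90.0, 5.0), ((225.0, 9.0), (9.0, 0.64))),
]

if __name__ == "__main__":
    print(posterior_probabilities(85.0, 4.8, components))
```

Each observation can then be allocated to the group with the largest posterior probability, and the recorded environmental and fertility conditions compared across groups.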

The current methodology for mixture models assumes the data to be independent.

However, NU and GY field data are commonly collected from designed field trials,

with correlation between observations induced by the design. In order to reduce the

correlation, one can apply mixture models on the observations adjusted for the spatial

trend of the field. However, in our case study the information of the spatial layout of

the designed field trial, required for modelling the spatial trends, was missing. Despite

the potential violation of the independence assumption, mixture models were useful in

the analysis of the case-study for the identification of different groups and allowed us

to highlight the effect of N availability and soil water conditions at post-flowering on

NU and GY .

The results obtained by applying mixture models can complement those obtained by mixed models in designed field trials. For instance, if there are uncontrolled factors which override, or interact with, the designed treatments, the inspection of the mixture groups may help to reveal them. On the other hand, if the fertiliser treatments are the main cause of the different patterns of (NU, GY), one would expect these groups to be revealed by the mixture approach and significant differences to be found when treatment contrasts are tested in the mixed models.

In conclusion, finite mixture models of bivariate Gaussian distributions are useful

models for identifying potential experimental and environmental conditions segregat-

ing the NU and GY data into clusters. In designed field trials and due to the potential

violation of the independence assumption, we recommend using mixture models for

exploratory purposes only and as a complement to mixed models. In order to fully

exploit the potential of the technique, it is important to implement sampling proce-

dures which avoid correlated observations, e.g. data collection from simple random

field surveys.

A number of interesting questions have arisen during the development of this research. One immediate future line of research is to carry out simulation studies to assess the coverage of Fieller's confidence intervals of E(GY)/E(NU) (Fieller, 1954) estimated jointly for each mixture group (Figure 4, Chapter 4). Samples from mixtures with known parameters will be generated with the R package EMMIX (McLachlan et al., 1999). Then, Fieller's confidence intervals will be calculated, and the percentage of times that these intervals contain the true ratio of means for each group will provide a measure of their coverage. In particular, it will be interesting to investigate how the coverage is affected by changing the area of intersection between the clusters.
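The logic of such a coverage study can be sketched in a simplified setting. The code below is written in Python rather than R, simulates a single bivariate normal group instead of a mixture, and uses a normal quantile in place of the Student t quantile; all three are simplifying assumptions, as are the parameter values. It relies on the fact that Fieller's confidence set contains a candidate ratio ρ exactly when (mean(GY) − ρ·mean(NU))² does not exceed the squared critical value times the estimated variance of mean(GY) − ρ·mean(NU).

```python
import math
import random
from statistics import NormalDist


def fieller_covers(nu, gy, ratio, crit):
    """True if Fieller's confidence set for E(GY)/E(NU) contains `ratio`."""
    n = len(nu)
    m_nu = sum(nu) / n
    m_gy = sum(gy) / n
    s_nn = sum((x - m_nu) ** 2 for x in nu) / (n - 1)
    s_gg = sum((y - m_gy) ** 2 for y in gy) / (n - 1)
    s_ng = sum((x - m_nu) * (y - m_gy) for x, y in zip(nu, gy)) / (n - 1)
    # Estimated variance of mean(GY) - ratio * mean(NU).
    var_d = (s_gg - 2.0 * ratio * s_ng + ratio * ratio * s_nn) / n
    return (m_gy - ratio * m_nu) ** 2 <= crit * crit * var_d


def simulate_coverage(n_rep, n, mean_nu, mean_gy, sd_nu, sd_gy, corr, level=0.95, seed=0):
    """Proportion of simulated samples whose Fieller set contains the true ratio."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # normal approximation to the t quantile
    true_ratio = mean_gy / mean_nu
    hits = 0
    for _ in range(n_rep):
        nu, gy = [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            nu.append(mean_nu + sd_nu * z1)
            gy.append(mean_gy + sd_gy * (corr * z1 + math.sqrt(1.0 - corr * corr) * z2))
        hits += fieller_covers(nu, gy, true_ratio, crit)
    return hits / n_rep


if __name__ == "__main__":
    # Hypothetical group: NU ~ 80 kg/ha, GY ~ 4 t/ha, correlation 0.7.
    print(simulate_coverage(1000, 50, 80.0, 4.0, 15.0, 0.8, 0.7, seed=3))
```

Extending this sketch to the mixture setting would require generating data from several overlapping groups and re-estimating the group parameters in every replicate, which is where the degree of overlap between clusters is expected to matter.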

A research gap identified during the development of this research is how to design

field surveys with the aim of applying mixture models. Although mixture models have

been widely applied in surveys (e.g. Di Zio et al., 2005; Genge, 2013), to the best of our

knowledge and in agreement with Cressie (2014, pers. comm., 27 February), there are

insufficient details in the literature on how to carry out surveys with post-stratification

objectives. The simplest method would be to collect a large sample from the field by simple random sampling. However, it is not clear how large the sample size needs to be, or how to collect the data so as not to introduce bias in the mixing proportions.

Not only the data collection protocol but also the way NU is measured by biologists has been identified as an issue in this research. Nitrogen uptake is a derived variable calculated from GY (Eq. 1.1), which induces a spurious correlation between the two variables. This is clearly shown in Table 6 (Chapter 4), where the estimated correlation approaches the boundary of the parameter space.
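The induced correlation can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that uptake is obtained as the product of grain yield and a nitrogen concentration that is generated independently of yield; this is a simplified stand-in and not necessarily the exact form of Eq. 1.1. All numerical values are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_derived_uptake(n, seed=0):
    """Simulate yield, an independent N concentration and the derived uptake."""
    rng = random.Random(seed)
    gy = [rng.gauss(4.0, 1.0) for _ in range(n)]      # grain yield (hypothetical, t/ha)
    conc = [rng.gauss(12.0, 1.0) for _ in range(n)]   # N concentration (hypothetical, kg N per t)
    nu = [g * c for g, c in zip(gy, conc)]            # derived uptake, assumed product form
    return gy, conc, nu


if __name__ == "__main__":
    gy, conc, nu = simulate_derived_uptake(5000, seed=2)
    print("corr(GY, concentration):", pearson(gy, conc))
    print("corr(GY, derived NU):   ", pearson(gy, nu))
```

Even though the concentration carries no information about yield in this simulation, the derived uptake is strongly correlated with yield simply because yield enters its calculation.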

Finally, it would be interesting to further explore mixtures of mixtures for modelling IEN, in particular the identification of valid constraints on the mixture parameters to impose identifiability. Willse & Boik (1999) proposed constraints of the type $\mu_{is} = \mu_i + \theta_s$, where $\mu_{is}$ is the mean of the s-th component of the second level of the mixture in the i-th component of the first level of the mixture, $\mu_i$ is the mean of the first component of the second level of the mixture in the i-th component of the first level of the mixture, and $\theta_s$ is the deviation between the means of the components of the second level of the mixture. Even with these constraints, we may expect that different starting values $\theta_s^{(0)}$ for initiating the EM algorithm will result in different components of the second level of the mixture modelling the non-Gaussian pdfs of the first level of the mixture. Alternatively, I suggest the use of mixtures of skew-normal or skew-t distributions (Frühwirth-Schnatter & Pyne, 2010) to model the distribution of the ratio. This is another topic that would be interesting to study in the future.

Appendix A

List of the studies in the review

C. Adhikari, K.F. Bronson, G.M. Panuallah, A.P. Regmi, P.K. Saha, A. Dobermann,

D. C. Olk, P.R. Hobbs, and E. Pasuquin. On-farm soil N supply and N nutrition in

the rice-wheat system of Nepal and Bangladesh. Field Crops Research, 64:273–286,

1999.

R. Albrizio, M. Todorovic, T. Matic, and A. M. Stellacci. Comparing the interactive

effects of water and nitrogen on durum wheat and barley grown in a Mediterranean

environment. Field Crops Research, 115:179–190, 2010.

W.K. Anderson and F.C. Hoyle. Nitrogen efficiency of wheat cultivars in a Mediterranean environment. Australian Journal of Experimental Agriculture, 39:957–965, 1999.

M.S. Aulakh, T.S. Khera, J.W. Doran, Kuldip-Singh, and Bijay-Singh. Yields and

nitrogen dynamics in a rice-wheat system using green manure and inorganic fertilizer.

Soil Science Society of America Journal, 64:1867–1876, 2000.

P. Belder, B.A.M. Bouman, R. Cabangon, L. Guoan, E.J.P. Quilang, L. Yuanhua,

J.H.J. Spiertz, and T.P. Tuong. Effect of water-saving irrigation on rice yield and

water use in typical lowland conditions in Asia. Agricultural Water Management,

65:193–210, 2004.

P. Belder, B.A.M. Bouman, J.H.J. Spiertz, S. Peng, A.R. Castaneda, and R.M. Vis-

peras. Crop performance, nitrogen and water use in flooded and aerobic rice. Plant

and Soil, 273:167–182, 2005.

Bijay-Singh, K.F. Bronson, Yadvinder-Singh, T.S. Khera, and E. Pasuquin. Nitrogen-15

balance as affected by rice straw management in a rice-wheat rotation in northwest

India. Nutrient Cycling in Agroecosystems, 59:227–237, 2001.

Bijay-Singh, Varinderpal-Singh, Yadvinder-Singh, H.S. Thind, A. Kumar, R.K. Gupta,

A. Kaul, and M. Vashistha. Fixed-time adjustable dose site-specific fertilizer nitrogen

management in transplanted irrigated rice (Oryza sativa L.) in South Asia. Field

Crops Research, 126:63–69, 2012.

A.K. Borrell, A.L. Garside, S. Fukai, and D.J. Reid. Season, nitrogen rate, and plant

type affect nitrogen uptake and nitrogen use efficiency in rice. Australian Journal of

Agricultural Research, 49:829–843, 1998.

R.J. Cabangon, T. P. Tuong, E. G. Castillo, L. X. Bao, G. Lu, G. Wang, Y. Cui, B.A.M.

Bouman, Y. Li, C. Chen, and J. Wang. Effect of irrigation method and N-fertilizer

management on rice yield, water productivity and nutrient-use efficiencies in typical

lowland rice conditions in China. Paddy and Water Environment, 2:195–206, 2004.

K.G. Cassman, M.J. Kropff, J. Gaunt, and S. Peng. Nitrogen use efficiency of rice

reconsidered: What are the key constraints? Plant and Soil, 155-156:359–362, 1993.

K.G. Cassman, A. Dobermann, P.C. Sta Cruz, G.C. Gines, M.I. Samson, J.P. Descal-

sota, J.M. Alcantara, M.A. Dizon, and D.C. Olk. Soil organic matter and the in-

digenous nitrogen supply of intensive irrigated rice systems in the tropics. Plant and

Soil, 182:267–278, 1996a.

K.G. Cassman, G.C. Gines, M.A. Dizon, M.I. Samson, and J.M. Alcantara. Nitrogen-

use efficiency in tropical lowland rice systems: contributions from indigenous and

applied nitrogen. Field Crops Research, 47:1–12, 1996b.

D. Chakraborty, R.N. Garg, R.K. Tomar, R. Singh, S.K. Sharma, R.K. Singh, S.M.

Trivedi, R.B. Mittal, P.K. Sharma, and K.H. Kamble. Synthetic and organic

mulching and nitrogen effect on winter wheat (Triticum aestivum L.) in a semi-arid

environment. Agricultural Water Management, 97:738–748, 2010.

X. Chen, J. Zhou, X. Wang, A.M. Blackmer, and F. Zhang. Optimal rates of nitrogen

fertilization for a winter wheat-corn cropping system in northern China. Communi-

cations in Soil Science and Plant Analysis, 35:583–597, 2004.

L. Chuan, P. He, J. Jin, S. Li, C. Grant, X. Xu, S. Qiu, S. Zhao, and W. Zhou.

Estimating nutrient uptake requirements for wheat in China. Field Crops Research,

146:96–104, 2013.

J.M. Clarke, C.A. Campbell, H.W. Cutforth, R.M. DePauw, and G.E. Winkleman.

Nitrogen and phosphorus uptake, translocation, and utilization efficiency of wheat

in relation to environment and cultivar yield and protein levels. Canadian Journal

of Plant Science, 70:965–977, 1990.

M.K. Conyers, C. Tang, G.J. Poile, D.L. Liu, D. Chen, and Z. Nuruzzaman. A combina-

tion of biological activity and the nitrate form of nitrogen can be used to ameliorate

subsurface soil acidity under dryland wheat farming. Plant and Soil, 348:155–166,

2011.

C. M. Cossani, C. Thabet, H. J. Mellouli, and G.A. Slafer. Improving wheat yields

through N fertilization in Mediterranean Tunisia. Experimental Agriculture, 47:459–

475, 2011.

C.M. Cossani, G.A. Slafer, and R. Savin. Nitrogen and water use efficiencies of wheat

and barley under a Mediterranean environment in Catalonia. Field Crops Research,

128:109–118, 2012.

Z. Cui, F. Zhang, X. Chen, Y. Miao, J. Li, L. Shi, J. Xu, Y. Ye, C. Liu, Z. Yang,

Q. Zhang, S. Huang, and D. Bao. On-farm evaluation of an in-season nitrogen

management strategy based on soil N min test. Field Crops Research, 105:48–55,

2008.

Z. Cui, F. Zhang, X. Chen, F. Li, and Y. Tong. Using in-season nitrogen manage-

ment and wheat cultivars to improve nitrogen use efficiency. Soil Science Society of

America Journal, 75:976–983, 2011.

X.Q. Dai, H.Y. Zhang, J.H.J. Spiertz, J. Yu, G.H. Xie, and B.A.M. Bouman. Crop

response of aerobic rice and winter wheat to nitrogen, phosphorus and potassium in

a double cropping system. Nutrient Cycling in Agroecosystems, 86:301–315, 2010.

D.K. Das, D. Maiti, and H. Pathak. Site-specific nutrient management in rice in

Eastern India using a modeling approach. Nutrient Cycling in Agroecosystems, 83:

85–94, 2009.

S.K. De Datta, R.J. Buresh, M.I. Samson, and Kai-Rong Wang. Nitrogen use efficiency

and nitrogen-15 balances in broadcast-seeded flooded and transplanted rice. Soil

Science Society of America Journal, 52:849–855, 1988.

G. Delogu, L. Cattivelli, N. Pecchioni, D. De Falcis, T. Maggiore, and A.M. Stanca.

Uptake and agronomic efficiency of nitrogen in winter barley and winter wheat.

European Journal of Agronomy, 9:11–20, 1998.

A. Dobermann, C. Witt, D. Dawe, S. Abdulrachman, H.C. Gines, R. Nagarajan, S. Sa-

tawathananont, T.T. Son, P.S. Tan, G.H. Wang, N.V. Chien, V.T.K. Thoa, C.V.

Phung, P. Stalin, P. Muthukrishnan, V. Ravi, M. Babu, S. Chatuporn, J. Sook-

thongsa, Q. Sun, R. Fu, G.C. Simbahan, and M.A.A. Adviento. Site-specific nutrient

management for intensive rice cropping systems in Asia. Field Crops Research, 74:

37–66, 2002.

A. Dobermann, C. Witt, S. Abdulrachman, H.C. Gines, R. Nagarajan, T.T. Son, P.S.

Tan, G.H. Wang, N.V. Chien, V.T.K. Thoa, C.V. Phung, P. Stalin, P. Muthukrish-

nan, V. Ravi, M. Babu, G.C. Simbahan, and M.A.A. Adviento. Soil fertility and

indigenous nutrient supply in irrigated rice domains of Asia. Agronomy Journal, 95:

913–923, 2003.

A.D. Doyle and I.C.R. Holford. The uptake of nitrogen by wheat, its agronomic efficiency and their relationship to soil and fertilizer nitrogen. Australian Journal of Agricultural Research, 44:1245–1258, 1993.

A.D. Doyle and C.C. Leckie. Recovery of fertiliser nitrogen in wheat grain and its implications for economic fertiliser use. Australian Journal of Experimental Agriculture, 32:383–387, 1992.

Y. Duan, M. Xu, X. He, S. Li, and X. Sun. Long-term pig manure application reduces

the requirement of chemical phosphorus and potassium in two rice-wheat sites in

subtropical China. Soil Use and Management, 27:427–436, 2011.

Q. Fang, Q. Yu, E. Wang, Y. Chen, G. Zhang, J. Wang, and L. Li. Soil nitrate accumu-

lation, leaching and crop nitrogen use as influenced by fertilization and irrigation in

an intensive wheat–maize double cropping system in the North China Plain. Plant

and Soil, pages 335–350, 2006.

R.A. Fischer. Irrigated spring wheat and timing and amount of nitrogen fertilizer. II.

physiology of grain yield response. Field Crops Research, 33:57–80, 1993.

R.A. Fischer, G.N. Howe, and Z. Ibrahim. Irrigated spring wheat and timing and

amount of nitrogen fertilizer. I. grain yield and protein content. Field Crops Research,

33:37–56, 1993.

L.E. Gauer, C.A. Grant, D.T. Gehl, and L.D. Bailey. Effects of nitrogen fertilization

on grain protein content, nitrogen uptake, and nitrogen use efficiency of six spring

wheat (Triticum aestivum L.) cultivars, in relation to estimated moisture supply.

Canadian Journal of Plant Science, 72:235–241, 1992.

B.B. Ghaley, H. Høgh-Jensen, and J.L. Christiansen. Recovery of nitrogen fertilizer by

traditional and improved rice cultivars in the Bhutan Highlands. Plant and Soil,

332:233–246, 2010.

D. Giambalvo, P. Ruisi, G. Di Miceli, A. S. Frenda, and G. Amato. Nitrogen use

efficiency and nitrogen fertilizer recovery of durum wheat genotypes as affected by

interspecific competition. Agronomy Journal, 102:707–715, 2010.

G. Guarda, S. Padovan, and G. Delogu. Grain yield, nitrogen-use efficiency and baking

quality of old and modern Italian bread-wheat cultivars grown at different nitrogen

levels. European Journal of Agronomy, 21:181–192, 2004.

S.M. Haefele, M.C.S. Wopereis, M.K. Ndiaye, S.E. Barro, and M. Ould Isselmou. In-

ternal nutrient efficiencies, fertilizer recovery rates and indigenous nutrient supply

of irrigated lowland rice in Sahelian West Africa. Field Crops Research, 80:19–32,

2003.

S.M. Haefele, S.M.A. Jabbar, J.D.L.C. Siopongco, A. Tirol-Padre, S.T. Amarante, P.C.

Sta Cruz, and W.C. Cosico. Nitrogen use efficiency in selected rice (Oryza sativa L.)

genotypes under different water regimes and nitrogen levels. Field Crops Research,

107:137–146, 2008.

T. Horie, M. Ohnishi, J.F. Angus, L.G. Lewin, T. Tsukaguchi, and T. Matano. Physi-

ological characteristics of high-yielding rice inferred from cross-location experiments.

Field Crops Research, 52:55–67, 1997.

M.F. Hossain, S.K. White, S.F. Elahi, N. Sultana, M.H.K. Choudhury, Q.K. Alam, J.A.

Rother, and J.L. Gaunt. The efficiency of nitrogen fertiliser for rice in Bangladeshi

farmers' fields. Field Crops Research, 93:94–107, 2005.

J. Huang, F. He, K. Cui, R. J. Buresh, B. Xu, W. Gong, and S. Peng. Determination

of optimal nitrogen rate for rice varieties using a chlorophyll meter. Field Crops

Research, 105:70–80, 2008.

L. Jiang, D. Dong, X. Gan, and S. Wei. Photosynthetic efficiency and nitrogen dis-

tribution under different nitrogen management and relationship with physiological

N-use efficiency in three rice genotypes. Plant and Soil, 271:321–328, 2005.

Q. Jing, B.A.M. Bouman, H. Hengsdijk, H. Van Keulen, and W. Cao. Exploring

options to combine high yields with high nitrogen use efficiencies in irrigated rice in

China. European Journal of Agronomy, 26:166–177, 2007.

Q. Jing, B. Bouman, H. van Keulen, H. Hengsdijk, W. Cao, and T. Dai. Disentangling

the effect of environmental factors on yield and nitrogen uptake of irrigated rice in

Asia. Agricultural Systems, 98:177–188, 2008.

Q. Jing, H. Van Keulen, H. Hengsdijk, W. Cao, P.S. Bindraban, T. Dai, and D. Jiang.

Quantifying N response and N use efficiency in rice–wheat (RW) cropping systems

under different water management. The Journal of Agricultural Science, 147:303–

312, 2009.

N. Kalra, D. Chakraborty, P. Ramesh Kumar, M. Jolly, and P.K. Sharma. An approach

to bridging yield gaps, combining response to water and other resource inputs for

wheat in northern India, using research trials and farmers’ fields data. Agricultural

Water Management, 93:54–64, 2007.

C.S. Khind and F.N. Ponnamperuma. Effects of water regime on growth, yield, and

nitrogen uptake of rice. Plant and Soil, 59:287–298, 1981.

H.S. Khurana, S.B. Phillips, Bijay-Singh, M.M. Alley, A. Dobermann, A.S. Sidhu,

Yadvinder-Singh, and S. Peng. Agronomic and economic evaluation of site-specific

nutrient management for irrigated wheat in northwest India. Nutrient Cycling in

Agroecosystems, 82:15–31, 2008.

D. Kumar, C. Devakumar, R. Kumar, A. Das, P. Panneerselvam, and Y.S. Shivay.

Effect of neem-oil coated prilled urea with varying thickness of neem-oil coating and

nitrogen rates on productivity and nitrogen-use efficiency of lowland irrigated rice

under Indo-Gangetic Plains. Journal of Plant Nutrition, 33:1939–1959, 2010.

K. Kumar and K.M. Goh. Management practices of antecedent leguminous and non-

leguminous crop residues in relation to winter wheat yields, nitrogen uptake, soil

nitrogen mineralization and simple nitrogen balance. European Journal of Agronomy,

16:295–308, 2002.

J.K. Ladha, D. Dawe, T.S. Ventura, U. Singh, W. Ventura, and I. Watanabe. Long-

term effects of urea and green manure on rice yields and nitrogen balance. Soil

Science Society of America Journal, 64:1993–2001, 2000.

J. Le Gouis, D. Beghin, E. Heumez, and P. Pluchard. Genetic differences for nitrogen

uptake and nitrogen utilisation efficiencies in winter wheat. European Journal of

Agronomy, 12:163–173, 2000.

J. Liu, H. Liu, S. Huang, X. Yang, B. Wang, X. Li, and Y. Ma. Nitrogen efficiency

in long-term wheat-maize cropping systems under diverse field sites in China. Field

Crops Research, 118:145–151, 2010.

M. Liu, Z. Yu, Y. Liu, and N.T. Konijn. Fertilizer requirements for wheat and maize

in China: The QUEFTS approach. Nutrient Cycling in Agroecosystems, 74:245–258,

2006.

X. Liu, P. He, J. Jin, W. Zhou, G. Sulewski, and S. Phillips. Yield gaps, indigenous

nutrient supply, and nutrient use efficiency of wheat in China. Agronomy Journal,

103:1452–1463, 2011.

L. Lopez-Bellido, R. J. Lopez-Bellido, and F.J. Lopez-Bellido. Fertilizer nitrogen efficiency in durum wheat under rainfed Mediterranean conditions: Effect of split application. Agronomy Journal, 98:55–62, 2006.

A. J. Macdonald and R.J. Gutteridge. Effects of take-all (Gaeumannomyces graminis

var. tritici) on crop N uptake and residual mineral N in soil at harvest of winter

wheat. Plant and Soil, 350:253–260, 2012.

D. Maiti, D.K. Das, and H. Pathak. Fertilizer requirement for irrigated wheat in

eastern India using the QUEFTS simulation model. TheScientificWorldJOURNAL,

6:231–245, 2006.

A.M. McGuire, D.C. Bryant, and R.F. Denison. Wheat yields, nitrogen uptake, and

soil moisture following winter legume cover crop vs. fallow. Agronomy Journal, 90:

404–410, 1998.

K. Naklang, D. Harnpichitvitaya, S.T. Amarante, L.J. Wade, and S.M. Haefele. In-

ternal efficiency, nutrient uptake, and the relation to field water resources in rainfed

lowland rice of northeast Thailand. Plant and Soil, 286:193–208, 2006.

C. Noulas, I. Alexiou, J. M. Herrera, and P. Stamp. Course of dry matter and nitrogen

accumulation of spring wheat genotypes known to vary in parameters of nitrogen

use efficiency. Journal of Plant Nutrition, 36:1201–1218, 2013.

S.E. Ockerby, S.W. Adkins, and A.L. Garside. The uptake and use of nitrogen by

paddy rice in fallow, cereal, and legume cropping systems. Australian Journal of

Agricultural Research, 50:945–952, 1999.

D.C. Olk, K.G. Cassman, G. Simbahan, P.C. Sta. Cruz, S. Abdulrachman, R. Na-

garajan, P.S. Tan, and S. Satawathananont. Interpreting fertilizer-use efficiency in

relation to soil nutrient-supplying capacity, factor productivity, and agronomic effi-

ciency. Nutrient Cycling in Agroecosystems, 53:35–41, 1999.

R. Ortiz-Monasterio, K.D. Sayre, S. Rajaram, and M. McMahon. Genetic progress in

wheat yield and nitrogen use efficiency under four nitrogen rates. Crop Science, 37:

898–904, 1997.

H. Pathak, P.K. Aggarwal, R. Roetter, N. Kalra, S.K. Bandyopadhaya, S. Prasad,

and H. Van Keulen. Modelling the quantitative evaluation of soil nutrient supply,

nutrient use efficiency, and fertilizer requirements of wheat in India. Nutrient Cycling

in Agroecosystems, 65:105–113, 2003.

S.K. Patil, U. Singh, V.P. Singh, V.N. Mishra, R.O. Das, and J. Henao. Nitrogen

dynamics and crop growth on an Alfisol and a Vertisol under a direct-seeded rainfed

lowland rice-based system. Field Crops Research, 70:185–199, 2001.

S. Peng, F.V. Garcia, R.C. Laza, A.L. Sanico, R.M. Visperas, and K.G. Cassman.

Increased N-use efficiency using a chlorophyll meter on high-yielding irrigated rice.


S. Peng, R. J. Buresh, J. Huang, J. Yang, Y. Zou, X. Zhong, G. Wang, and F. Zhang.

Strategies for overcoming low agronomic nitrogen use efficiency in irrigated rice sys-

tems in China. Field Crops Research, 96:37–47, 2006.

V. Pooniya and Y. S. Shivay. Enrichment of basmati rice grain and straw with zinc and

nitrogen through ferti-fortification and summer green manuring under Indo-Gangetic

plains of India. Journal of Plant Nutrition, 36:91–117, 2013.

V. Pooniya, Y. S. Shivay, A. Rana, L. Nain, and R. Prasanna. Enhancing soil nutrient

dynamics and productivity of Basmati rice through residue incorporation and zinc

fertilization. European Journal of Agronomy, 41:28–37, 2012.

J. Qiao, L. Yang, T. Yan, F. Xue, and D. Zhao. Nitrogen fertilizer reduction in rice

production for two consecutive years in the Taihu Lake area. Agriculture, Ecosystems

& Environment, 146:103–112, 2012.

J. Qin, S.M. Impa, Q. Tang, S. Yang, J. Yang, Y. Tao, and K.S.V. Jagadish. Integrated

nutrient, water and other agronomic options to enhance rice grain yield and N use

efficiency in double-season rice crop. Field Crops Research, 148:15–23, 2013.

S. Qiu, X. Ju, X. Lu, L. Li, J. Ingwersen, T. Streck, P. Christie, and F. Zhang. Improved

nitrogen management for an intensive winter wheat/summer maize double-cropping

system. Soil Science Society of America Journal, 76:286–297, 2012.

V.O. Sadras and C. Lawson. Nitrogen and water-use efficiency of Australian wheat

varieties released between 1958 and 2007. European Journal of Agronomy, 46:34–41,

2013.

R. Setia, K.N. Sharma, P. Marschner, and H. Singh. Changes in nitrogen, phosphorus,

and potassium in a long-term continuous maize-wheat cropping system in India.

Communications in Soil Science and Plant Analysis, 40:3348–3366, 2009.

A.R. Sharma and U.K. Behera. Nitrogen contribution through Sesbania green manure

and dual-purpose legumes in maize–wheat cropping system: agronomic and economic

considerations. Plant and Soil, 325:289–304, 2009.

Z. Shi, Q. Jing, J. Cai, D. Jiang, W. Cao, and T. Dai. The fates of 15N fertilizer

in relation to root distributions of winter wheat under different N splits. European

Journal of Agronomy, 40:86–93, 2012a.

Z. Shi, D. Li, Q. Jing, J. Cai, D. Jiang, W. Cao, and T. Dai. Effects of nitrogen

applications on soil nitrogen balance and nitrogen utilization of winter wheat in a

rice-wheat rotation. Field Crops Research, 127:241–247, 2012b.

U. Singh, J.K. Ladha, E.G. Castillo, G. Punzalan, A. Tirol-Padre, and M. Duqueza.

Genotypic variation in nitrogen use efficiency in medium-and long-duration rice.


V.K. Singh and B.S. Dwivedi. Yield and nitrogen use efficiency in wheat, and soil

fertility status as influenced by substitution of rice with pigeon pea in a rice–wheat

cropping system. Australian Journal of Experimental Agriculture, 46:1185–1194,

2006.

E.M.A. Smaling and B.H. Janssen. Calibration of QUEFTS, a model predicting nu-

trient uptake and yields from chemical soil fertility indices. Geoderma, 59:21–44,

1993.

P. Suriyakup, A. Polthanee, K. Pannangpetch, R. Katawatin, J.C. Mouret, and

C. Clermont-Dauphin. Introducing mungbean as a preceding crop to enhance ni-

trogen uptake and yield of rainfed rice in the north-east of Thailand. Australian

Journal of Agricultural Research, 58:1059–1067, 2007.

S. Takahashi, M. R. Anwar, and S. G de Vera. Effects of compost and nitrogen fertilizer

on wheat nitrogen use in Japanese soils. Agronomy Journal, 99:1151–1157, 2007.

C. Tetard-Jones, P. N. Shotton, L. Rempelos, J. Cooper, M. Eyre, C. H. Orr, C. Leifert,

and A.M.R. Gatehouse. Quantitative proteomics to study the response of wheat to

contrasting fertilisation regimes. Molecular Breeding, 31:379–393, 2013.

J. Timsina, U. Singh, M. Badaruddin, C. Meisner, and M.R. Amin. Cultivar, nitrogen,

and water effects on productivity, and nitrogen-use efficiency and balance for rice–

wheat sequences of Bangladesh. Field Crops Research, 72:143–161, 2001.

G. Wang, A. Dobermann, C. Witt, Q. Sun, and R. Fu. Performance of site-specific

nutrient management for irrigated rice in southeast China. Agronomy Journal, 93:

869–878, 2001.

G. Wang, Q.C. Zhang, C. Witt, and R.J. Buresh. Opportunities for yield increases and

environmental benefits through site-specific nutrient management in rice systems of

Zhejiang province, China. Agricultural Systems, 94:801–806, 2007.

Y. Wang, E. Wang, D. Wang, S. Huang, Y. Ma, C. J. Smith, and L. Wang. Crop

productivity and nutrient use efficiency as affected by long-term fertilisation in North

China Plain. Nutrient Cycling in Agroecosystems, 86:105–119, 2010.

D. Wei, K. Cui, J. Pan, G. Ye, J. Xiang, L. Nie, and J. Huang. Genetic dissection

of grain nitrogen use efficiency and grain yield and their relationship in rice. Field

Crops Research, 124:340–346, 2011.

C. Witt, A. Dobermann, S. Abdulrachman, H.C. Gines, W. Guanghuo, R. Nagarajan,

S. Satawatananont, T. Thuc Son, P. Sy Tan, L. Van Tiem, and D. C. Olk. Internal

nutrient efficiencies of irrigated lowland rice in tropical and subtropical Asia. Field

Crops Research, 63:113–138, 1999.

C. Witt, K.G. Cassman, D.C. Olk, U. Biker, S.P. Liboon, M.I. Samson, and J.C.G.

Ottow. Crop rotation and residue management effects on carbon sequestration,

nitrogen cycling and productivity of irrigated rice systems. Plant and Soil, 225:

263–278, 2000.

Y. Xu, L. Nie, R.J. Buresh, J. Huang, K. Cui, B. Xu, W. Gong, and S. Peng. Agro-

nomic performance of late-season rice under different tillage, straw, and nitrogen

management. Field Crops Research, 115:79–84, 2010.

C.Y. Xue, X.G. Yang, B.A.M. Bouman, W. Deng, Q.P. Zhang, J. Yang, W.X. Yan,

T.Y. Zhang, A.J. Rouzi, H.Q. Wang, and P. Wang. Effects of irrigation and nitrogen

on the performance of aerobic rice in northern China. Journal of Integrative Plant

Biology, 50:1589–1600, 2008.

Y. Yang, M. Zhang, L. Zheng, D. Cheng, M. Liu, Y. Geng, and J. Chen. Controlled-

release urea for rice production and its environmental implications. Journal of Plant

Nutrition, 36:781–794, 2013.

Yadvinder-Singh, J.K. Ladha, C.S. Khind, R.K. Gupta, O.P. Meelu, and E. Pasuquin.

Long-term effects of organic inputs on yield and soil fertility in the rice–wheat rota-

tion. Soil Science Society of America Journal, 68:845–853, 2004.

Y. Ye, X. Liang, Y. Chen, J. Liu, J. Gu, R. Guo, and L. Li. Alternate wetting and

drying irrigation and controlled-release nitrogen fertilizer in late-season rice. Effects

on dry matter accumulation, yield, water and nitrogen use. Field Crops Research,

144:212–224, 2013.

J. Ying, S. Peng, G. Yang, N. Zhou, R.M. Visperas, and K.G. Cassman. Comparison

of high-yield rice in tropical and subtropical environments II. Nitrogen accumulation

and utilization efficiency. Field Crops Research, 57:85–93, 1998.

L. Zhang, J.H.J. Spiertz, S. Zhang, B. Li, and W. Van der Werf. Nitrogen economy in

relay intercropping systems of wheat and cotton. Plant and Soil, 303:55–68, 2008.

Appendix B

Journal information of the studies in the review

Table B.1: Journals and their impact factor for studies in the review

| Name | Category | 5-year impact factor | Number of manuscripts selected |
|---|---|---|---|
| Agricultural Systems | Agriculture Multidisciplinary | 2.837 | 2 |
| Agricultural Water Management | Agronomy | 2.552 | 3 |
| Agriculture Ecosystems and Environment | Agriculture Multidisciplinary | 3.673 | 1 |
| Agronomy Journal | Agronomy | 1.989 | 7 |
| Animal Production Science (before Australian Journal of Experimental Agriculture) | Agriculture Multidisciplinary | 1.228 | 3 |
| Canadian Journal of Plant Science | Agronomy | 0.764 | 2 |
| Communications in Soil Science and Plant Analysis | Agronomy | 0.612 | 2 |
| Crop and Pasture Science (before Australian Journal of Agricultural Research) | Agriculture Multidisciplinary | 1.439 | 4 |
| Crop Science | Agronomy | 2.096 | 1 |
| European Journal of Agronomy | Agronomy | 3.311 | 8 |
| Field Crops Research | Agronomy | 2.984 | 28 |
| Geoderma | Soil Science | 2.904 | 1 |
| Journal of Agricultural Science | Agriculture Multidisciplinary | 2.604 | 1 |
| Journal of Integrative Plant Biology | Biochemistry and Molecular | 2.429 | 1 |
| Journal of Plant Nutrition | Plant Science | 0.851 | 4 |
| Molecular Breeding | Agronomy | 3.304 | 1 |
| Nutrient Cycling in Agroecosystems | Soil Science | 1.966 | 8 |
| Paddy Water and Environment | Agronomy | 0.889 | 1 |
| Plant and Soil | Agronomy | 3.108 | 13 |
| Soil Science Society of America Journal | Soil Science | 2.232 | 6 |
| Soil Use and Management | Soil Science | 2.219 | 1 |
| Scientific World Journal | Multidisciplinary Science | 1.603 | 1 |

132

Appendix C

Equivalence between Pham-Gia et al. (2006) and

(Marsaglia, 1965, 2006) expressions of the pdf of the

ratio

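In this appendix we demonstrate that the expression of the pdf of $T = \frac{a+U}{b+V}$ given in Pham-Gia et al. (2006), where U and V are two independent standard normal variables and a and b are defined as in Eq. 2.5, is equivalent to the one in Marsaglia (1965, 2006).

Firstly, let us calculate the integral:

\[ H_{-2}(z) = \int_0^{\infty} t\, e^{-t^2-2tz}\, dt \]

Notice that:

\[ -1 = \int_0^{\infty} (-2t-2z)\, e^{-t^2-2tz}\, dt = -2H_{-2}(z) - 2z \int_0^{\infty} e^{-t^2-2tz}\, dt \]

If we define:

\[ A = \int_0^{\infty} e^{-t^2-2tz}\, dt = \int_0^{\infty} e^{-t^2-2tz+z^2-z^2}\, dt = e^{z^2} \int_0^{\infty} e^{-(t+z)^2}\, dt \]

By performing the following change of variable, we obtain:

\[ t + z = \frac{r}{\sqrt{2}} \Rightarrow dt = \frac{dr}{\sqrt{2}} \]

\[ A = \frac{e^{z^2}}{\sqrt{2}} \int_{\sqrt{2}z}^{\infty} e^{-r^2/2}\, dr = \frac{e^{z^2}}{\sqrt{2}} \sqrt{2\pi} \int_{\sqrt{2}z}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-r^2/2}\, dr = e^{z^2} \sqrt{\pi} \left(1 - \phi(\sqrt{2}z)\right) \tag{C.1} \]

Therefore, $H_{-2}(z) = \frac{1}{2} - z e^{z^2} \sqrt{\pi} \left(1 - \phi(\sqrt{2}z)\right)$. If we substitute the latter expression in Eq. 2.8 with $\sigma_x = \sigma_y = 1$, $\mu_x = b$ and $\mu_y = a$, we arrive at:

\[ f_r(r) = \frac{e^{-(a^2+b^2)/2}}{\pi(1+r^2)} \left[ H_{-2}(s(r)) + H_{-2}(-s(r)) \right] \]

where:

\[ s(r) = \frac{ar+b}{\sqrt{2(r^2+1)}} = \frac{q}{\sqrt{2}}, \quad \text{with } q \text{ defined as in Eq. 2.6} \]

Then, writing u for the variable of integration,

\[
\begin{aligned}
f_r(r) &= \frac{e^{-(a^2+b^2)/2}}{\pi(1+r^2)} \left[ 1 + \frac{q}{\sqrt{2}} e^{q^2/2} \sqrt{\pi} \left( -1 + \phi(q) + 1 - \phi(-q) \right) \right] \\
&= \frac{e^{-(a^2+b^2)/2}}{\pi(1+r^2)} \left[ 1 + \frac{q}{\sqrt{2}} e^{q^2/2} \sqrt{\pi} \int_{-q}^{q} \frac{e^{-u^2/2}}{\sqrt{2\pi}}\, du \right] \\
&= \frac{e^{-(a^2+b^2)/2}}{\pi(1+r^2)} \left[ 1 + \frac{q}{2} e^{q^2/2}\, 2 \int_0^{q} e^{-u^2/2}\, du \right] \\
&= \frac{e^{-(a^2+b^2)/2}}{\pi(1+r^2)} \left[ 1 + q e^{q^2/2} \int_0^{q} e^{-u^2/2}\, du \right] \\
&= \frac{e^{-(a^2+b^2)/2}}{\pi(1+r^2)} + q \frac{e^{-(a^2+b^2)/2}}{\pi(1+r^2)} \int_0^{q} e^{-(u^2-q^2)/2}\, du \\
&= \left( e^{-(a^2+b^2)/2} \right) \frac{1}{\pi(1+r^2)} + \left( 1 - e^{-(a^2+b^2)/2} \right) \frac{q \int_0^{q} e^{-(u^2-q^2)/2}\, du}{\pi(1+r^2)\left( e^{(a^2+b^2)/2} - 1 \right)}
\end{aligned}
\]

The last equality is obtained by noticing that

\[ e^{-(a^2+b^2)/2} = \frac{1 - e^{-(a^2+b^2)/2}}{e^{(a^2+b^2)/2} - 1} \]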
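The closed form for H−2(z) and the final expression for the density can be checked numerically. The sketch below is only an illustration in base R: the function names (h_closed, h_numeric, ratio_pdf) and the test values of z, a and b are assumptions made here, not part of the thesis, and ϕ is taken to be the standard normal cdf (pnorm), as Eq. C.1 implies.

```r
# Closed form for H_{-2}(z) obtained from Eq. C.1 (pnorm = standard normal cdf).
h_closed <- function(z) {
  0.5 - z * exp(z^2) * sqrt(pi) * (1 - pnorm(sqrt(2) * z))
}

# Direct numerical evaluation of the defining integral of H_{-2}(z).
h_numeric <- function(z) {
  integrate(function(t) t * exp(-t^2 - 2 * t * z),
            lower = 0, upper = Inf,
            rel.tol = 1e-8, subdivisions = 500L)$value
}

# Density of the ratio in the form reached at the end of the appendix.
ratio_pdf <- function(r, a, b) {
  q <- (a * r + b) / sqrt(1 + r^2)
  # integral of exp(-u^2 / 2) over [0, q], expressed through pnorm
  inner <- sqrt(2 * pi) * (pnorm(q) - 0.5)
  exp(-(a^2 + b^2) / 2) / (pi * (1 + r^2)) * (1 + q * exp(q^2 / 2) * inner)
}

# Hypothetical test values.
z_values <- c(-1, 0, 0.5, 2)
max_gap <- max(abs(sapply(z_values, h_numeric) - sapply(z_values, h_closed)))

# A proper density should integrate to one.
total_mass <- integrate(ratio_pdf, -Inf, Inf, a = 1, b = 2)$value
```

If the derivation holds, max_gap should be negligible and total_mass should be close to 1 for any choice of a and b.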


Appendix D

Application of the EM algorithm for estimating the

parameters of a mixture of multivariate Gaussian

distributions

In this appendix we detail the application of the EM algorithm to estimate the parameter

vector of a mixture of multivariate Gaussian distributions. Following the approach of

Dempster et al. (1977), the first step is to compute Lc(z,y,ψ).

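The log likelihood function of the complete random sample is given by (McLachlan & Peel, 2007 cited in Figueiredo & Jain, 2002):

\[
\begin{aligned}
L_c(z,y,\psi) &= \sum_{j=1}^{n} \sum_{i=1}^{g} z_{ij} \ln\left( \pi_i\, \varphi(y_j|\mu_i,\Sigma_i) \right) \\
&= \sum_{j=1}^{n} \sum_{i=1}^{g} z_{ij} \left( \ln(\pi_i) + \ln \varphi(y_j|\mu_i,\Sigma_i) \right)
\end{aligned}
\]

where $\varphi$ is the pdf of a MVN.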

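The two steps of the EM algorithm are:

1. Compute $E(L_c(z,y,\psi)|y,\psi^{(r)})$, where $\psi^{(r)} = (\pi_1^{(r)}, \ldots, \pi_{g-1}^{(r)}, \mu_1^{(r)}, \ldots, \mu_g^{(r)}, \Sigma_1^{(r)}, \ldots, \Sigma_g^{(r)})$, for a fixed g.

\[ E(L_c(z,y,\psi)|y,\psi^{(r)}) = \sum_{j=1}^{n} \sum_{i=1}^{g} E(z_{ij}|y_j,\psi^{(r)}) \left[ \ln \pi_i + \ln \varphi(y_j|\mu_i,\Sigma_i) \right] \]

\[ E(z_{ij}|y_j,\psi^{(r)}) = 1 \cdot P(z_{ij}=1|y_j,\psi^{(r)}) + 0 \cdot P(z_{ij}=0|y_j,\psi^{(r)}) = P(z_{ij}=1|y_j,\psi^{(r)}) \tag{D.1} \]

By Eq. 3.2 and D.1, it follows that:

\[ E(z_{ij}|y_j,\psi^{(r)}) = P(z_{ij}=1|y_j,\psi^{(r)}) = \tau_{ij}^{(r+1)} = \frac{\pi_i^{(r)} \varphi(y_j|\mu_i^{(r)},\Sigma_i^{(r)})}{\sum_{k=1}^{g} \pi_k^{(r)} \varphi(y_j|\mu_k^{(r)},\Sigma_k^{(r)})} \]

2. Find $\psi^{(r+1)}$ that maximises $E(L_c(z,y,\psi)|y,\psi^{(r)})$

\[ E(L_c(z,y,\psi)|y,\psi^{(r)}) = \sum_{j=1}^{n} \sum_{i=1}^{g} \left[ \tau_{ij}^{(r+1)} \ln(\pi_i) + \tau_{ij}^{(r+1)} \ln \varphi(y_j|\mu_i,\Sigma_i) \right] \tag{D.2} \]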

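Firstly, let us find $\pi_i^{(r+1)}$, $i = 1, \ldots, g$. The second addend of Eq. D.2 does not depend on $\pi$, so it is enough to differentiate:

\[ h(\pi) = \sum_{j=1}^{n} \sum_{i=1}^{g} \tau_{ij}^{(r+1)} \ln(\pi_i) \quad \text{and} \quad \frac{\partial h(\pi)}{\partial \pi_i} = \frac{\sum_{j=1}^{n} \tau_{ij}^{(r+1)}}{\pi_i} - \frac{\sum_{j=1}^{n} \tau_{gj}^{(r+1)}}{(1 - \pi_1 - \ldots - \pi_{g-1})} \]

Denoting $n_i = \sum_{j=1}^{n} \tau_{ij}^{(r+1)}$, we get

\[ \frac{\partial h(\pi)}{\partial \pi_i} = \frac{n_i}{\pi_i} - \frac{n_g}{(1 - \pi_1 - \ldots - \pi_{g-1})} = \frac{n_i (1 - \pi_1 - \ldots - \pi_{g-1}) - n_g \pi_i}{\pi_i (1 - \pi_1 - \ldots - \pi_{g-1})} = 0 \tag{D.3} \]

Note that $\forall s$, $s \neq i$, by subtracting $\frac{\partial h(\pi)}{\partial \pi_i} - \frac{\partial h(\pi)}{\partial \pi_s}$, we get that

\[ \pi_s = \frac{n_s \pi_i}{n_i} \quad \forall s = 1, \ldots, g, \; s \neq i \]

If we substitute $\pi_s$, $s = 1, \ldots, g$, into the expression D.3 we obtain:

\[ n_i \left( 1 - \frac{n_1 \pi_i}{n_i} - \frac{n_2 \pi_i}{n_i} - \ldots - \frac{n_{g-1} \pi_i}{n_i} \right) - n_g \pi_i = 0 \]

\[ n_i - \pi_i (n_1 + n_2 + \ldots + n_g) = 0 \tag{D.4} \]

Notice that:

\[ n_1 + n_2 + \ldots + n_g = \sum_{j=1}^{n} \left( \tau_{1j}^{(r+1)} + \tau_{2j}^{(r+1)} + \ldots + \tau_{gj}^{(r+1)} \right) = n \tag{D.5} \]

Therefore, by Eq. D.4 and D.5, it is concluded that:

\[ \pi_i^{(r+1)} = \frac{n_i}{n} = \frac{\sum_{j=1}^{n} \tau_{ij}^{(r+1)}}{n} \]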

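Taking derivatives with respect to $\mu_i$ and $\Sigma_i$, $\forall i = 1, \ldots, g$, in Eq. D.2:

\[
\begin{aligned}
\sum_{k=1}^{g} \sum_{j=1}^{n} \tau_{kj}^{(r+1)} \frac{\partial}{\partial \mu_i} \ln \varphi(y_j|\mu_k,\Sigma_k) &= \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \mu_i} \ln \varphi(y_j|\mu_i,\Sigma_i) = 0 \\
\sum_{k=1}^{g} \sum_{j=1}^{n} \tau_{kj}^{(r+1)} \frac{\partial}{\partial \Sigma_i} \ln \varphi(y_j|\mu_k,\Sigma_k) &= \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \Sigma_i} \ln \varphi(y_j|\mu_i,\Sigma_i) = 0
\end{aligned}
\tag{D.6}
\]

Note that $\frac{\partial}{\partial \mu_i} \ln \varphi(y_j|\mu_k,\Sigma_k) = 0$ and $\frac{\partial}{\partial \Sigma_i} \ln \varphi(y_j|\mu_k,\Sigma_k) = 0$ if $i \neq k$. Using some properties detailed in Rencher (1998, p. 416), it is straightforward to show that $\mu_i^{(r+1)}$ and $\Sigma_i^{(r+1)}$ satisfying D.6 exist in a closed form given by:

\[ \mu_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} y_j \Big/ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \]

\[ \Sigma_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} (y_j - \mu_i^{(r+1)})^{\top} (y_j - \mu_i^{(r+1)}) \Big/ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \]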

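We first calculate $\mu_i^{(r+1)}$:

\[
\begin{aligned}
\sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \mu_i^{\top}} \ln \varphi(y_j|\mu_i,\Sigma_i)
&= \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \mu_i^{\top}} \left[ -p \ln(\sqrt{2\pi}) - \frac{1}{2} \ln|\Sigma_i| - \frac{1}{2} (y_j - \mu_i) \Sigma_i^{-1} (y_j - \mu_i)^{\top} \right] \\
&= \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \mu_i^{\top}} \left[ -\frac{1}{2} (y_j - \mu_i) \Sigma_i^{-1} (y_j - \mu_i)^{\top} \right] \\
&= \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \mu_i^{\top}} \left[ -\frac{1}{2} \left( y_j \Sigma_i^{-1} y_j^{\top} - y_j \Sigma_i^{-1} \mu_i^{\top} - \mu_i \Sigma_i^{-1} y_j^{\top} + \mu_i \Sigma_i^{-1} \mu_i^{\top} \right) \right] \\
&= \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \mu_i^{\top}} \left[ -\frac{1}{2} \left( -y_j \Sigma_i^{-1} \mu_i^{\top} - (\mu_i^{\top})^{\top} \Sigma_i^{-1} y_j^{\top} + (\mu_i^{\top})^{\top} \Sigma_i^{-1} \mu_i^{\top} \right) \right]
\end{aligned}
\]

Using A.13.2 and A.13.3 in Rencher (1998), which state that $\frac{\partial (a^{\top}x)}{\partial x} = \frac{\partial (x^{\top}a)}{\partial x} = a$ and $\frac{\partial (x^{\top}Ax)}{\partial x} = 2Ax$ respectively, this expression equals

\[ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \left( -\frac{1}{2} \right) \left[ -(y_j \Sigma_i^{-1})^{\top} - \Sigma_i^{-1} y_j^{\top} + 2 \Sigma_i^{-1} \mu_i^{\top} \right] = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \left[ \Sigma_i^{-1} (y_j^{\top} - \mu_i^{\top}) \right] \]

Setting this expression equal to 0 gives

\[ \mu_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} y_j \Big/ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \]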

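Now we calculate $\Sigma_i^{(r+1)}$. Instead of taking derivatives with respect to $\Sigma_i$, we do so with respect to $\Sigma_i^{-1}$, with the mean fixed at its updated value $\mu_i^{(r+1)}$.

\[
\begin{aligned}
\sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \Sigma_i^{-1}} \ln \varphi(y_j|\mu_i^{(r+1)},\Sigma_i)
&= \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \Sigma_i^{-1}} \left[ -p \ln \sqrt{2\pi} - \frac{1}{2} \ln|\Sigma_i| - \frac{1}{2} (y_j - \mu_i^{(r+1)}) \Sigma_i^{-1} (y_j - \mu_i^{(r+1)})^{\top} \right] \\
&= \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \Sigma_i^{-1}} \left[ \frac{1}{2} \ln|\Sigma_i^{-1}| - \frac{1}{2} (y_j - \mu_i^{(r+1)}) \Sigma_i^{-1} (y_j - \mu_i^{(r+1)})^{\top} \right]
\end{aligned}
\]

But

\[ (y_j - \mu_i^{(r+1)}) \Sigma_i^{-1} (y_j - \mu_i^{(r+1)})^{\top} = \mathrm{tr}\left( (y_j - \mu_i^{(r+1)}) \Sigma_i^{-1} (y_j - \mu_i^{(r+1)})^{\top} \right) \]

and $\mathrm{tr}(CD) = \mathrm{tr}(DC)$, so the derivative above equals

\[ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \Sigma_i^{-1}} \left[ \frac{1}{2} \ln|\Sigma_i^{-1}| - \frac{1}{2} \mathrm{tr}\left( \Sigma_i^{-1} (y_j - \mu_i^{(r+1)})^{\top} (y_j - \mu_i^{(r+1)}) \right) \right] \]

Let us take $D_j = (y_j - \mu_i^{(r+1)})^{\top} (y_j - \mu_i^{(r+1)})$.

Applying $\frac{\partial\, \mathrm{tr}(CD)}{\partial C} = D + D^{\top} - \mathrm{diag}(D)$ (A.13.5 Rencher, 1998) and $\frac{\partial \ln|C|}{\partial C} = 2C^{-1} - \mathrm{diag}(C^{-1})$ (A.13.6 Rencher, 1998) we get:

\[ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{1}{2} \left( 2\Sigma_i - \mathrm{diag}(\Sigma_i) - D_j - D_j^{\top} + \mathrm{diag}(D_j) \right) \]

$D_j$ is symmetric, so this is

\[ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \left[ \Sigma_i - \frac{1}{2} \mathrm{diag}(\Sigma_i) - D_j + \frac{1}{2} \mathrm{diag}(D_j) \right] \]

If we set the previous expression equal to 0, we get

\[ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \left[ \Sigma_i - \frac{1}{2} \mathrm{diag}(\Sigma_i) \right] = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \left[ D_j - \frac{1}{2} \mathrm{diag}(D_j) \right] \Rightarrow \]

\[ \Sigma_i^{(r+1)} = \frac{\sum_{j=1}^{n} \tau_{ij}^{(r+1)} D_j}{\sum_{j=1}^{n} \tau_{ij}^{(r+1)}} = \frac{\sum_{j=1}^{n} \tau_{ij}^{(r+1)} (y_j - \mu_i^{(r+1)})^{\top} (y_j - \mu_i^{(r+1)})}{\sum_{j=1}^{n} \tau_{ij}^{(r+1)}} \]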

To show the second order condition for $(\pi_1^{(r+1)}, \ldots, \pi_g^{(r+1)}, \mu_1^{(r+1)}, \ldots, \mu_g^{(r+1)}, \Sigma_1^{(r+1)}, \ldots, \Sigma_g^{(r+1)})$ to be a maximum, we first demonstrate the following property:

Property 1:

If $g(y|\theta_1, \ldots, \theta_n) = \sum_{i=1}^{n} g_i(y|\theta_i)$ and $\theta_i$ is a local maximum of $g_i(y|\theta_i)$ $\forall i$, then $(\theta_1, \ldots, \theta_n)$ is a local maximum of $g(y|\theta_1, \ldots, \theta_n)$.

Proof. The sufficient conditions to show that $(\theta_1, \ldots, \theta_n)$ is a maximum of $g(y|\theta_1, \ldots, \theta_n)$ are (see Snyman, 2005):

1. $Dg(\theta_1, \ldots, \theta_n) = 0$, where $Dg$ is the gradient of g.

2. $H_g(\theta_1, \ldots, \theta_n)$ is negative definite, where $H_g$ is the Hessian matrix of g.

The first condition is straightforward to show by taking into account that $\frac{\partial g}{\partial \theta_i} = \frac{\partial g_i}{\partial \theta_i} = 0$ $\forall i$. The second condition follows by considering that $H_g$ is a block diagonal matrix where all the blocks are negative definite matrices.

\[
H_g = \begin{pmatrix}
H_{g_1} & 0 & \cdots & 0 \\
0 & H_{g_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & H_{g_n}
\end{pmatrix}
\]

Thus, $|H_g - \lambda I| = |H_{g_1} - \lambda I| \cdots |H_{g_n} - \lambda I|$ and this implies that all the eigenvalues of $H_g$ are negative.
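The block-diagonal argument used in the proof of Property 1 can be illustrated numerically. The following base R sketch uses two small matrices (h1 and h2) that are made up for this illustration only; it checks that the eigenvalues of the assembled block diagonal matrix are exactly those of its blocks, so negative definite blocks give a negative definite matrix.

```r
# Two hypothetical negative definite blocks (symmetric, negative eigenvalues).
h1 <- matrix(c(-2, 1, 1, -3), 2, 2)
h2 <- matrix(c(-1, 0.5, 0.5, -4), 2, 2)

# Assemble the block diagonal matrix with zero off-diagonal blocks.
zero <- matrix(0, 2, 2)
hg <- rbind(cbind(h1, zero), cbind(zero, h2))

# Eigenvalues of the blocks, pooled, versus eigenvalues of the full matrix.
block_eigs <- sort(c(eigen(h1, symmetric = TRUE)$values,
                     eigen(h2, symmetric = TRUE)$values))
full_eigs <- sort(eigen(hg, symmetric = TRUE)$values)

print(block_eigs)
print(full_eigs)
```

Both printed vectors should coincide, and every entry should be negative.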

Now we want to prove that $(\pi_1, \ldots, \pi_g, \mu_1, \ldots, \mu_g, \Sigma_1, \ldots, \Sigma_g)$ is the global maximum of $E(L_c(z,y,\psi)|y,\psi^{(r)})$.

We can express

\[ E(L_c(z,y,\psi)|y,\psi^{(r)}) = h(\pi) + \sum_{i=1}^{g} g_i(y|\mu_i,\Sigma_i) \]

with:

\[ h(\pi) = \sum_{j=1}^{n} \sum_{i=1}^{g} \tau_{ij}^{(r+1)} \ln(\pi_i) \tag{D.7} \]

\[ g_i(y|\mu_i,\Sigma_i) = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \ln \varphi(y_j|\mu_i,\Sigma_i) \tag{D.8} \]

It is straightforward to show that $\pi^{(r+1)}$ is the global maximum of $h(\pi)$ by calculating the second derivative and applying Property 1. To show that $(\mu_i^{(r+1)}, \Sigma_i^{(r+1)})$ is the global maximum of $g_i(y|\mu_i,\Sigma_i)$ we follow the argument in Anderson & Olkin (1985) for the MLE of the multivariate normal distribution. The function $g_i(y|\mu_i,\Sigma_i)$ is continuously differentiable and $g_i(y|\mu_i,\Sigma_i) \to -\infty$ when the parameters approach the boundary of the parameter space. Then, its only critical point is the global maximum.

By Property 1, $(\pi_1^{(r+1)}, \ldots, \pi_g^{(r+1)}, \mu_1^{(r+1)}, \ldots, \mu_g^{(r+1)}, \Sigma_1^{(r+1)}, \ldots, \Sigma_g^{(r+1)})$ is the global maximum of $E(L_c(z,y,\psi)|y,\psi^{(r)})$.
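The E-step and the closed-form M-step updates derived in this appendix can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the code used in the thesis: the helper names (dmvn, em_step), the layout of the parameter list (pro, mu, sigma) and the simulated data are all hypothetical.

```r
# Density of a p-variate normal evaluated at each row of y (base R only).
dmvn <- function(y, mu, sigma) {
  p <- ncol(y)
  centred <- sweep(y, 2, mu)
  maha <- rowSums((centred %*% solve(sigma)) * centred)
  exp(-0.5 * maha) / sqrt((2 * pi)^p * det(sigma))
}

# One EM iteration. psi holds pro (length g), mu (g x p) and sigma (p x p x g).
em_step <- function(y, psi) {
  g <- length(psi$pro)
  n <- nrow(y)
  # E-step: tau[j, i] is the posterior probability of component i for row j
  dens <- sapply(seq_len(g), function(i) {
    psi$pro[i] * dmvn(y, psi$mu[i, ], psi$sigma[, , i])
  })
  tau <- dens / rowSums(dens)
  # M-step: closed-form updates of proportions, means and covariances
  n_i <- colSums(tau)
  pro <- n_i / n
  mu <- t(tau) %*% y / n_i
  sigma <- psi$sigma
  for (i in seq_len(g)) {
    centred <- sweep(y, 2, mu[i, ])
    sigma[, , i] <- t(centred * tau[, i]) %*% centred / n_i[i]
  }
  # loglik is the observed-data log likelihood at the incoming parameters
  list(pro = pro, mu = mu, sigma = sigma, loglik = sum(log(rowSums(dens))))
}

# Simulated two-component data (hypothetical, for illustration only).
set.seed(1)
y <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
           matrix(rnorm(200, mean = 4), ncol = 2))
psi <- list(pro = c(0.5, 0.5),
            mu = rbind(c(-1, -1), c(5, 5)),
            sigma = array(diag(2), c(2, 2, 2)))
loglik_trace <- numeric(20)
for (r in 1:20) {
  psi <- em_step(y, psi)
  loglik_trace[r] <- psi$loglik
}
```

The trace of log likelihood values should be non-decreasing from one iteration to the next, which is the monotonicity property of the EM algorithm.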


Appendix E

R code for fitting mixture models of univariate Gaussian distributions

Univariate mixture models can be fitted using the R packages mclust (Fraley & Raftery, 1999) or mixtools (Benaglia et al., 2009). For our analysis, we used mixtools because it is more convenient for implementing the starting strategies described in Section 2.3 of the manuscript. Starting strategy iii was not used because it tends to produce negative initial estimates of the means of the ratio, which are not biologically possible. The procedure for fitting mixture models is detailed in Section 2.3 of the manuscript. In this complementary material, we display the code employed to perform the univariate mixture analyses. For the purpose of illustrating the analysis, we consider the number of groups (g) fixed and equal to 3.

First, load the packages mixtools and mclust.

library(mclust)
library(mixtools)

Then, read the csv file containing the data set.

dataset <- read.table(file.choose(), header = TRUE, sep = ",")
IEN <- dataset$IEN
g = 3
# g is the number of groups
size = length(dataset$IEN)
# size is the sample size

A function called initial was created to calculate the initial estimates of the mixture

when a cluster partition is provided. The inputs needed for this function are: the

data (dat); the sample size (size); a vector containing the classification of each of the

observations into groups (clust); and the number of groups (g).

initial <- function(dat, size, clust, g) {
    dat <- as.matrix(dat)
    class <- unmap(clust)
    # unmap converts the vector clust into a matrix
    # with number of rows and columns given by the
    # sample size and the number of groups,
    # respectively. The entry class[i, k] has two
    # possible values: 1, if the i-th observation is
    # classified in the k-th group, or 0 otherwise.
    sums <- t(dat) %*% class
    # Multiplying the data by class we get the sums
    # of the observation values classified in each
    # of the groups
    k <- rep(0, g)
    mu <- rep(0, g)
    lambda <- rep(0, g)
    aux <- matrix(nrow = g, ncol = size)
    for (i in 1:g) {
        k[i] <- sum(class[, i])
        # k denotes the number of observations classified
        # in each of the groups
        mu[i] <- sums[, i]/k[i]
        lambda[i] <- k[i]/size
        aux[i, ] <- rep(mu[i], size)
    }
    diff <- matrix(nrow = g, ncol = size)
    sumvariance <- rep(0, g)
    variance <- rep(0, g)
    for (i in 1:g) {
        diff[i, ] <- (aux[i, ] - dat)^2
        sumvariance[i] <- t(diff[i, ]) %*% class[, i]
        variance[i] <- sumvariance[i]/k[i]
    }
    # lambda, mu and variance are vectors containing
    # the initial estimates of the mixing
    # proportions, means and variances, respectively.
    # These values are stored in a list called init,
    # which is returned by the function initial.
    init <- list()
    init$lambda <- lambda
    init$mean <- mu
    init$variance <- variance
    return(init)
}
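The initial estimates produced by the function initial can also be obtained in base R without mclust. The sketch below is an assumed alternative written for illustration (the name initial_base and the toy values are hypothetical); it uses the same divisor k for the variance as the function above.

```r
# Initial mixing proportions, means and variances from a given partition.
initial_base <- function(dat, clust, g) {
  groups <- factor(clust, levels = seq_len(g))
  k <- as.vector(table(groups))               # observations per group
  mu <- as.vector(tapply(dat, groups, mean))  # group means
  # variance with divisor k (not k - 1), matching the function initial
  variance <- as.vector(tapply(dat, groups,
                               function(x) mean((x - mean(x))^2)))
  list(lambda = k / length(dat), mean = mu, variance = variance)
}

# Toy data and partition, made up for illustration.
toy <- c(10, 12, 14, 30, 34, 50)
toy_init <- initial_base(toy, clust = c(1, 1, 1, 2, 2, 3), g = 3)
print(toy_init$lambda)  # mixing proportions 1/2, 1/3 and 1/6
print(toy_init$mean)    # group means 12, 32 and 50
```

A group containing a single observation, like the third group here, gets an initial variance of zero, which is one reason to inspect a random partition before passing it to the EM algorithm.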

Now, proceed to fit univariate mixture models initiating the EM algorithm with the strategies detailed in Section 2.3 of the manuscript.

i) Random starts

clust1 <- sample(1:g, size, replace = TRUE)
init1 <- initial(dat = IEN, size = size, clust = clust1,
    g = g)
# normalmixEM expects standard deviations in sigma,
# hence the square root of the initial variances
model1 <- normalmixEM(IEN, lambda = init1$lambda, mu = init1$mean,
    sigma = sqrt(init1$variance), epsilon = 1e-06, k = g,
    maxit = 10000)
## number of iterations= 789
plot(model1, whichplot = 2)

[Plot "Density Curves": x-axis Data, y-axis Density.]

Figure E.1: Histogram of the internal nitrogen use efficiency data and the mixture components found by the EM algorithm initiated from random starts.

ii) K-means

clust2 <- kmeans(IEN, g, nstart = 5)$cluster
init2 <- initial(dat = IEN, size = size, clust = clust2,
    g = g)
# normalmixEM expects standard deviations in sigma,
# hence the square root of the initial variances
model2 <- normalmixEM(IEN, lambda = init2$lambda, mu = init2$mean,
    sigma = sqrt(init2$variance), epsilon = 1e-06, k = g,
    maxit = 10000)
plot(model2, whichplot = 2)

[Plot "Density Curves": x-axis Data, y-axis Density.]

Figure E.2: Histogram of the internal nitrogen use efficiency data and the mixture components found by the EM algorithm initiated from the partition provided by the K-means algorithm.

iv) Subsample solution

sequence <- seq(1, size, 1)
index <- sample(sequence, size = 200, replace = FALSE)
subsample <- IEN[index]
clustsub <- sample(1:g, 200, replace = TRUE)
initsub <- initial(dat = subsample, size = 200, clust = clustsub,
    g = g)
# normalmixEM expects standard deviations in sigma,
# hence the square root of the initial variances
modelsub <- normalmixEM(IEN, lambda = initsub$lambda,
    mu = initsub$mean, sigma = sqrt(initsub$variance), k = g,
    maxit = 10, epsilon = 0.01)
# normalmixEM returns the fitted means and standard
# deviations in the elements mu and sigma
model3 <- normalmixEM(IEN, lambda = modelsub$lambda,
    mu = modelsub$mu, sigma = modelsub$sigma,
    epsilon = 1e-06, k = g, maxit = 10000)
plot(model3, whichplot = 2)

[Plot "Density Curves": x-axis Data, y-axis Density.]

Figure E.3: Histogram of the internal nitrogen use efficiency data and the mixture components found by the EM algorithm initiated on a random subsample.

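v) Short runs

# normalmixEM expects standard deviations in sigma,
# hence the square root of the initial variances
clustshort1 <- sample(1:g, size, replace = TRUE)
initshort1 <- initial(dat = IEN, size = size, clust = clustshort1,
    g = g)
modelshort1 <- normalmixEM(IEN, lambda = initshort1$lambda,
    mu = initshort1$mean, sigma = sqrt(initshort1$variance),
    epsilon = 0.01, k = g, maxit = 10000)
clustshort2 <- sample(1:g, size, replace = TRUE)
initshort2 <- initial(dat = IEN, size = size, clust = clustshort2,
    g = g)
modelshort2 <- normalmixEM(IEN, lambda = initshort2$lambda,
    mu = initshort2$mean, sigma = sqrt(initshort2$variance),
    epsilon = 0.01, k = g, maxit = 10000)
clustshort3 <- sample(1:g, size, replace = TRUE)
initshort3 <- initial(dat = IEN, size = size, clust = clustshort3,
    g = g)
modelshort3 <- normalmixEM(IEN, lambda = initshort3$lambda,
    mu = initshort3$mean, sigma = sqrt(initshort3$variance),
    epsilon = 0.01, k = g, maxit = 10000)

modelshort1$loglik
## [1] -1646
modelshort2$loglik
## [1] -1646
modelshort3$loglik
## [1] -1646
max(modelshort1$loglik, modelshort2$loglik, modelshort3$loglik)
## [1] -1646

If modelshort3 provides the solution with the highest value of the log likelihood function among the short runs of the algorithm, use this solution to initiate a complete run of the algorithm.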

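# normalmixEM returns the fitted means and standard
# deviations in the elements mu and sigma
modellong <- normalmixEM(IEN, lambda = modelshort3$lambda,
    mu = modelshort3$mu, sigma = modelshort3$sigma,
    epsilon = 1e-06, k = g, maxit = 10000)
plot(modellong, whichplot = 2)

[Plot "Density Curves": x-axis Data, y-axis Density.]

Figure E.4: Histogram of the internal nitrogen use efficiency data and the mixture components found after running several short runs of the EM algorithm.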

model1$loglik

## [1] -1628

model2$loglik

## [1] -1628

model3$loglik

## [1] -1629


modellong$loglik

## [1] -1629

Delete spurious solutions. From the remaining solutions, select the one with the highest value of the log likelihood function. If there are no spurious solutions, proceed as follows:

max(model1$loglik, model2$loglik, model3$loglik, modellong$loglik)
## [1] -1628

If model1 is the solution with the highest value of the log likelihood function, record its AIC and BIC values.

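aic <- function(loglik, g, size) {
    # a univariate mixture with g components has 3 * g - 1
    # free parameters: g means, g variances and g - 1
    # mixing proportions
    aic = -2 * loglik + 2 * (3 * g - 1)
    return(aic)
}
bic <- function(loglik, g, size) {
    bic = -2 * loglik + (3 * g - 1) * log(size)
    return(bic)
}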

aic(loglik = model1$loglik, g, size)

## [1] 3272

bic(loglik = model1$loglik, g, size)

## [1] 3305
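The arithmetic behind these two values can be reproduced with a small helper. The sketch below is illustrative only: the function name info_criteria is hypothetical, the log likelihood is the rounded value printed above, and the sample size of 432 is an assumption taken from the bivariate analysis in Appendix F.

```r
# AIC and BIC from a log likelihood, a parameter count and a sample size.
info_criteria <- function(loglik, n_par, n_obs) {
  c(AIC = -2 * loglik + 2 * n_par,
    BIC = -2 * loglik + n_par * log(n_obs))
}

g <- 3
n_par <- 3 * g - 1  # g means, g variances, g - 1 mixing proportions
ic <- info_criteria(loglik = -1628, n_par = n_par, n_obs = 432)
print(round(ic))  # AIC 3272, BIC 3305
```

BIC penalises each parameter by log(n) instead of 2, so with several hundred observations it favours smaller mixtures than AIC does.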

This procedure is repeated starting with the maximum number of components until fitting just one component. However, mixtools does not allow fitting a single component, so mclust is needed when g = 1. The commands used for fitting one component with mclust are:

model1 = Mclust(dataset$IEN, G = 1, modelNames = "V")

model1$parameters

## $pro

## [1] 1

##

## $mean

## [1] 44.27

##

## $variance

## $variance$modelName

## [1] "X"

##

## $variance$d

## [1] 1

##

## $variance$G

## [1] 1

##

## $variance$sigmasq

## [1] 119.4

Finally, the model selected is the one which minimises the information criteria.


Appendix F

R code for fitting mixture models of bivariate Gaussian distributions

Bivariate mixture models were fitted using the R package EMMIX (McLachlan et al., 1999). This package can be downloaded from the Web site: http://www.maths.uq.edu.au/~gjm/mix_soft/EMMIX_R/index.html. The procedure for fitting mixture models is detailed in Section 2.3 of the manuscript. Here, we display the code employed to fit mixture models of bivariate Gaussian distributions. For the purpose of illustrating the analysis, we consider the number of components (g) fixed and equal to 3.

First, set the RStudio working directory to the root folder in which the package EMMIX has been downloaded. Then, load EMMIX and the package mnormt. The latter is needed for starting strategy iii (see Section 2.3 of the manuscript).

source("EMMIX.R")

library(mnormt)

Read the csv file containing the data set.

dataset <- read.table(file.choose(), header = T, sep = ",")

For the bivariate analysis, the EM algorithm operates with the measurements of grain yield (dataset$YGR14) and nitrogen uptake (dataset$NUPT). These measurements are stored in the matrix named dat.

size = 432

# size is the sample size

dat <- matrix(nrow = size, ncol = 2)

dat[, 1] <- dataset$NUPT

dat[, 2] <- dataset$YGR14

distr = "mvn"

# distr refers to the type of component densities

# of the mixture. In our case, multivariate

# Gaussian distributions (mvn)

ncov = 3

# ncov=3 indicates that each group is allowed to

# have a different covariance matrix.

g = 3

# g is the number of groups

A function called classcolor was created to visualise the solutions provided by the EM algorithm. The inputs needed for this function are: the data set (y); a logical parameter (ellipse), which can be TRUE or FALSE depending on whether the user wants to draw the prediction ellipses; the mixture model fitted by EMMIX (obj); the significance level for drawing the prediction ellipses (a), so that a = 0.1 gives 90% prediction regions; the size of the sample (n); and the number of groups (g). The output of this function is a plot which displays the observations in different colours depending on their classification. Furthermore, if ellipse=TRUE, the prediction ellipses for a new point of the population to belong to each of the groups are drawn. The ellipses are drawn by modifying the function ellipse implemented in the R package ellipse (Murdoch et al., 2007) according to Eq. 4.4 in Chew (1966).

classcolor <- function(y, ellipse, obj, a, n, g) {
    p = 2
    # p is the dimension of the random vector
    aux = matrix(nrow = n, ncol = 3)
    # aux is a matrix whose first two columns contain
    # the data of nitrogen uptake and grain yield, and the third
    # column contains the number of the group in which the
    # observations have been classified.
    tau <- obj$tau
    for (i in 1:n) {
        aux[i, 1] <- y[i, 1]
        aux[i, 2] <- y[i, 2]
        aux[i, 3] <- which(tau[i, ] == max(tau[i, ]))
    }
    # Next, the measurements of grain yield and nitrogen uptake
    # are coloured according to the classification given by the
    # EM algorithm
    xmin = 0
    xmax = max(y[, 1]) + 20
    ymin = 0
    ymax = max(y[, 2]) + 20
    plot(aux[, 1][aux[, 3] == 1], aux[, 2][aux[, 3] == 1],
        col = "red", xlim = c(xmin, xmax), ylim = c(ymin, ymax),
        ylab = "Grain Yield (kg/ha)",
        xlab = "Nitrogen Uptake (kg/ha)", cex.lab = 1.2,
        cex.axis = 1.2)
    for (i in 2:g) {
        points(aux[, 1][aux[, 3] == i], aux[, 2][aux[, 3] == i],
            col = i + 1)
    }
    # If ellipse=TRUE, the function ellipse2 draws the prediction
    # ellipses
    ellipse2 <- function(mu, sigma, alpha, npoints, newplot,
        r1, draw, ...) {
        es <- eigen(sigma)
        e1 <- es$vec %*% diag(sqrt(es$val))
        theta <- seq(0, 2 * pi, len = npoints)
        v1 <- cbind(r1 * cos(theta), r1 * sin(theta))
        pts = t(mu - (e1 %*% t(v1)))
        if (newplot && draw) {
            plot(pts, ...)
        } else if (!newplot && draw) {
            lines(pts, ...)
        }
        invisible(pts)
    }  # end function
    if (ellipse == "TRUE") {
        for (i in 1:g) {
            # n1 is the number of observations classified in group i
            n1 = length(aux[aux[, 3] == i])/3
            q1 = qf(1 - a, 2, n1 - 2)
            def <- ((n1 - 1) * (n1 + 1) * p * q1)/((n1 - p) * n1)
            ellipse2(mu = obj$mu[, i], sigma = obj$sigma[, , i], alpha = a,
                r1 = sqrt(def), npoints = 1000, newplot = FALSE,
                draw = TRUE, type = "l", lwd = 2, col = i + 1)
            points(obj$mu[, i][1], obj$mu[, i][2], col = "black", pch = 16)
        }  # end for
    }  # end if
}  # end function
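The geometry behind the prediction ellipses can be checked in isolation. The sketch below is a hedged illustration in base R, assuming the same radius scaling as in classcolor; the function name prediction_ellipse and the mean, covariance matrix and group size are hypothetical values chosen only to resemble the scale of the data. Every point generated this way should lie at the same squared Mahalanobis distance from the group mean.

```r
# Points on the prediction ellipse of a bivariate normal group.
prediction_ellipse <- function(mu, sigma, n1, a = 0.1, npoints = 200) {
  p <- length(mu)  # the construction below assumes p = 2
  # squared radius with the same scaling as in classcolor
  r2 <- ((n1 - 1) * (n1 + 1) * p * qf(1 - a, p, n1 - p)) / ((n1 - p) * n1)
  es <- eigen(sigma)
  e1 <- es$vectors %*% diag(sqrt(es$values))
  theta <- seq(0, 2 * pi, length.out = npoints)
  circle <- sqrt(r2) * rbind(cos(theta), sin(theta))
  # map the circle of radius sqrt(r2) onto the ellipse centred at mu
  pts <- t(mu + e1 %*% circle)
  list(points = pts, r2 = r2)
}

# Hypothetical group mean, covariance matrix and group size.
mu <- c(50, 2000)
sigma <- matrix(c(300, 11000, 11000, 470000), 2, 2)
ell <- prediction_ellipse(mu, sigma, n1 = 130)

# Squared Mahalanobis distance of each ellipse point from the mean.
d2 <- mahalanobis(ell$points, center = mu, cov = sigma)
print(range(d2))
print(ell$r2)
```

Both ends of the printed range should equal r2 up to rounding error, which confirms that the eigen decomposition maps a circle of radius sqrt(r2) onto the intended contour.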

Now, fit bivariate mixture models initiating the EM algorithm with the starting strategies described in Section 2.3 of the manuscript.

i) Random starts

initobj1 <- init.mix(dat, g, distr, ncov, nkmeans = 0,

nrandom = 100, nhclust = FALSE)

obj1 <- EMMIX(dat, g, distr, ncov, init = initobj1,

itmax = 1000, epsilon = 1e-06)

##

## -----------------------

##

## 3 - Component Multivariate Normal Mixture Model

##

## -----------------------

##

## $distr

## [1] "mvn"

##

## $error

## [1] 0

##

## $loglik

## [1] -5160

##


## $bic

## [1] 10424

##

## $aic

## [1] 10355

##

## $pro

## [1] 0.3037 0.3775 0.3188

##

## $mu

## [,1] [,2] [,3]

## [1,] 51.42 33.88 72.43

## [2,] 1993.46 1752.30 2808.45

##

## $sigma

## , , 1

##

## [,1] [,2]

## [1,] 301.8 11121

## [2,] 11121.0 473365

##

## , , 2

##

## [,1] [,2]

## [1,] 187.4 9557

## [2,] 9557.0 562741

##

## , , 3

##

## [,1] [,2]

## [1,] 368.2 6279
## [2,] 6279.4 359595
##
##
## $ICL
## [1] -5349
##
##
## -----------------------

obj1$loglik
## [1] -5160

classcolor(dat, ellipse = "TRUE", obj1, a = 0.1, n = size,
    g = g)

[Scatter plot: x-axis Nitrogen Uptake (kg/ha), y-axis Grain Yield (kg/ha).]

Figure F.1: Cluster partition found by the EM algorithm initiated from random starts. [Observations in different colours have been classified as belonging to different groups. The dots represent the joint means and the ellipses are the 90% prediction regions for each group].

ii) K-means


initobj2 <- init.mix(dat, g, distr, ncov, nkmeans = 100,




##

## -----------------------

##


##

## -----------------------

##

## $distr

## [1] "mvn"

##

## $error

## [1] 0

##

## $loglik

## [1] -5150

##

## $bic

## [1] 10402

##

## $aic

## [1] 10333

##

## $pro

## [1] 0.4096 0.4390 0.1513

##

## $mu


## [,1] [,2] [,3]

## [1,] 42.43 70.66 20.42

## [2,] 1866.97 2814.10 1070.22

##

## $sigma

## , , 1

##

## [,1] [,2]

## [1,] 118.2 3918

## [2,] 3917.7 253491

##

## , , 2

##

## [,1] [,2]

## [1,] 322.3 5272

## [2,] 5272.2 308424

##

## , , 3

##

## [,1] [,2]

## [1,] 37.65 2607

## [2,] 2607.11 225208

##

##

## $ICL

## [1] -5284

##

##

## -----------------------

obj2$loglik
## [1] -5150

classcolor(dat, ellipse = "TRUE", obj2, a = 0.1, n = size,
    g = g)

[Scatter plot: x-axis Nitrogen Uptake (kg/ha), y-axis Grain Yield (kg/ha).]

Figure F.2: Cluster partition found by the EM algorithm initiated from the partition obtained by the K-means algorithm. [Observations in different colours have been classified as belonging to different groups. The dots represent the joint means and the ellipses are the 90% prediction regions for each group].

iii) Simulated means

mu <- c()

mu[1] <- mean(dat[, 1])

mu[2] <- mean(dat[, 2])

S <- cov(dat)

meaninit <- rmnorm(n = g, mean = mu, S)

# the initial values of the means

sigmainit <- array(S, c(2, 2, g))

# the initial values of the covariance matrices

initobj1$pro <- rep(1/g, g)

# the initial values of the mixing proportions


initobj1$mu <- t(meaninit)

initobj1$sigma <- sigmainit



##

## -----------------------

##


##

## -----------------------

##

## $distr

## [1] "mvn"

##

## $error

## [1] 0

##

## $loglik

## [1] -5150

##

## $bic

## [1] 10402

##

## $aic

## [1] 10333

##

## $pro

## [1] 0.1524 0.4032 0.4444

##


## $mu

## [,1] [,2] [,3]

## [1,] 20.46 42.36 70.43

## [2,] 1073.41 1861.97 2808.18

##

## $sigma

## , , 1

##

## [,1] [,2]

## [1,] 37.85 2621

## [2,] 2621.27 226479

##

## , , 2

##

## [,1] [,2]

## [1,] 116.4 3866

## [2,] 3866.0 250825

##

## , , 3

##

## [,1] [,2]

## [1,] 324.3 5353

## [2,] 5352.7 310420

##

##

## $ICL

## [1] -5284

##

##

## -----------------------

obj3$loglik
## [1] -5150

classcolor(dat, ellipse = "TRUE", obj3, a = 0.1, n = size,
    g = g)

[Scatter plot: x-axis Nitrogen Uptake (kg/ha), y-axis Grain Yield (kg/ha).]

Figure F.3: Cluster partition found by the EM algorithm initiated from simulated means. [Observations in different colours have been classified as belonging to different groups. The dots represent the joint means and the ellipses are the 90% prediction regions for each group].

iv) Subsample solution

sequence <- seq(1, size, 1)

index <- sample(sequence, size = 200, replace = FALSE)

subsample <- dat[index, ]

initobjsub <- init.mix(subsample, g, distr, ncov, nkmeans = 0,

nrandom = 100, nhclust = TRUE)

objsub <- EMMIX(subsample, g, distr, ncov, init = initobjsub,

itmax = 10)

##

## -----------------------


##


##

## -----------------------

##

## $distr

## [1] "mvn"

##

## $error

## [1] 1

##

## $loglik

## [1] -2397

##

## $bic

## [1] 4885

##

## $aic

## [1] 4829

##

## $pro

## [1] 0.3471 0.3572 0.2957

##

## $mu

## [,1] [,2] [,3]

## [1,] 47.43 72.27 28.24

## [2,] 2020.94 2880.57 1359.73

##

## $sigma

## , , 1


##

## [,1] [,2]

## [1,] 171.2 4978

## [2,] 4977.6 284157

##

## , , 2

##

## [,1] [,2]

## [1,] 353.6 4516

## [2,] 4516.3 309734

##

## , , 3

##

## [,1] [,2]

## [1,] 105 4747

## [2,] 4747 312637

##

##

## $ICL

## [1] -2500

##

##

## -----------------------

initobj4 <- objsub



##

## -----------------------

##



##

## -----------------------

##

## $distr

## [1] "mvn"

##

## $error

## [1] 0

##

## $loglik

## [1] -5150

##

## $bic

## [1] 10402

##

## $aic

## [1] 10333

##

## $pro

## [1] 0.4059 0.4417 0.1524

##

## $mu

## [,1] [,2] [,3]

## [1,] 42.41 70.54 20.46

## [2,] 1864.94 2811.11 1073.26

##

## $sigma

## , , 1

##


## [,1] [,2]

## [1,] 117.1 3887

## [2,] 3887.5 252103

##

## , , 2

##

## [,1] [,2]

## [1,] 323.3 5313

## [2,] 5313.0 309438

##

## , , 3

##

## [,1] [,2]

## [1,] 37.87 2621

## [2,] 2621.16 226382

##

##

## $ICL

## [1] -5284

##

##

## -----------------------

obj4$loglik
## [1] -5150

classcolor(dat, ellipse = "TRUE", obj4, a = 0.1, n = size,
    g = g)

[Scatter plot: x-axis Nitrogen Uptake (kg/ha), y-axis Grain Yield (kg/ha).]

Figure F.4: Cluster partition found by the EM algorithm initiated from the mixture estimates obtained by running the EM algorithm on a random subsample of 200 observations. [Observations in different colours have been classified as belonging to different groups. The dots represent the joint means and the ellipses are the 90% prediction regions for each group].

v) Short runs

initobjshort1 <- init.mix(dat, g, distr, ncov, nkmeans = 0,


objshort1 <- EMMIX(dat, g, distr, ncov, init = initobjshort1,

epsilon = 0.01)

##

## -----------------------

##


##

## -----------------------

##

## $distr

## [1] "mvn"

##

## $error

## [1] 0


##

## $loglik

## [1] -5160

##

## $bic

## [1] 10424

##

## $aic

## [1] 10355

##

## $pro

## [1] 0.3109 0.3822 0.3069

##

## $mu

## [,1] [,2] [,3]

## [1,] 72.46 34.16 51.85

## [2,] 2803.79 1765.15 2006.83

##

## $sigma

## , , 1

##

## [,1] [,2]

## [1,] 371.9 6341

## [2,] 6341.4 363120

##

## , , 2

##

## [,1] [,2]

## [1,] 190.3 9673

## [2,] 9673.0 567372


##

## , , 3

##

## [,1] [,2]

## [1,] 314.9 11613

## [2,] 11613.4 490879

##

##

## $ICL

## [1] -5351

##

##

## -----------------------




epsilon = 0.01)

##

## -----------------------

##


##

## -----------------------

##

## $distr

## [1] "mvn"

##

## $error

## [1] 0


##

## $loglik

## [1] -5160

##

## $bic

## [1] 10424

##

## $aic

## [1] 10355

##

## $pro

## [1] 0.3071 0.3878 0.3051

##

## $mu

## [,1] [,2] [,3]

## [1,] 52.29 34.33 72.52

## [2,] 2020.33 1771.45 2801.91

##

## $sigma

## , , 1

##

## [,1] [,2]

## [1,] 321.3 11862

## [2,] 11862.2 499889

##

## , , 2

##

## [,1] [,2]

## [1,] 192.2 9737

## [2,] 9737.0 569729


##

## , , 3

##

## [,1] [,2]

## [1,] 374 6359

## [2,] 6359 364909

##

##

## $ICL

## [1] -5351

##

##

## -----------------------




epsilon = 0.01)

##

## -----------------------

##


##

## -----------------------

##

## $distr

## [1] "mvn"

##

## $error

## [1] 0


##

## $loglik

## [1] -5160

##

## $bic

## [1] 10424

##

## $aic

## [1] 10355

##

## $pro

## [1] 0.3025 0.3099 0.3877

##

## $mu

## [,1] [,2] [,3]

## [1,] 72.6 52.32 34.37

## [2,] 2803.0 2022.39 1774.12

##

## $sigma

## , , 1

##

## [,1] [,2]

## [1,] 373.9 6338

## [2,] 6338.4 364670

##

## , , 2

##

## [,1] [,2]

## [1,] 323.3 11922

## [2,] 11921.7 501949


##

## , , 3

##

## [,1] [,2]

## [1,] 192.7 9768

## [2,] 9767.7 571508

##

##

## $ICL

## [1] -5351

##

##

## -----------------------

objshort1$loglik

## [1] -5160

objshort2$loglik

## [1] -5160

objshort3$loglik

## [1] -5160

max(objshort1$loglik, objshort2$loglik, objshort3$loglik)

## [1] -5160

If objshort3 is the model which provides the solution with the highest value of the log likelihood function among the short runs of the algorithm, use this solution to initiate a complete run of the algorithm.

objlong <- EMMIX(dat, g, distr, ncov, init = objshort3,


##

## -----------------------

##


##

## -----------------------

##

## $distr

## [1] "mvn"

##

## $error

## [1] 0

##

## $loglik

## [1] -5160

##

## $bic

## [1] 10424

##

## $aic

## [1] 10355

##

## $pro

## [1] 0.3189 0.3037 0.3774

##

## $mu

## [,1] [,2] [,3]

## [1,] 72.42 51.41 33.87


## [2,] 2808.44 1993.19 1752.20

##

## $sigma

## , , 1

##

## [,1] [,2]

## [1,] 368.2 6280

## [2,] 6279.6 359576

##

## , , 2

##

## [,1] [,2]

## [1,] 301.7 11117

## [2,] 11117.0 473221

##

## , , 3

##

## [,1] [,2]

## [1,] 187.4 9556

## [2,] 9556.0 562705

##

##

## $ICL

## [1] -5349

##

##

## -----------------------

objlong$loglik

## [1] -5160

classcolor(dat, ellipse = "TRUE", objlong, a = 0.1,
    n = size, g = g)

[Scatter plot: x-axis Nitrogen Uptake (kg/ha), y-axis Grain Yield (kg/ha).]

Figure F.5: Cluster partition found by the EM algorithm initiated from the mixture estimates after running several short runs of the EM algorithm. [Observations in different colours have been classified as belonging to different groups. The dots represent the joint means and the ellipses are the 90% prediction regions for each group].

obj1$loglik

## [1] -5160

obj2$loglik

## [1] -5150

obj3$loglik

## [1] -5150

obj4$loglik

## [1] -5150

objlong$loglik

## [1] -5160

Delete spurious solutions. From the remaining solutions, select the one with the highest value of the log likelihood function. If there are no spurious solutions, proceed as follows:

max(obj1$loglik, obj2$loglik, obj3$loglik, obj4$loglik,

objlong$loglik)

## [1] -5150

If obj2 is the solution with the highest value of the log likelihood function, record its AIC, ICL and BIC values.

obj2$aic

## [1] 10333

obj2$ICL

## [1] -5284

obj2$bic

## [1] 10402

This procedure is repeated starting with the maximum number of groups until

fitting just one group. Finally, the model selected is the one which minimises the

information criteria.


Bibliography

H. Akaike. Information theory and an extension of the maximum likelihood principle. In

B. N. Petrov and F. Csaki, editors, Second International Symposium on Information

Theory., pages 267–281. Budapest: Akademia Kiado, 1973.

H. Akaike. A new look at the statistical model identification. IEEE Transactions on

Automatic Control, 19:716–723, 1974.

R. Albrizio, M. Todorovic, T. Matic, and A. M. Stellacci. Comparing the interactive

effects of water and nitrogen on durum wheat and barley grown in a Mediterranean

environment. Field Crops Research, 115:179–190, 2010.

T. W. Anderson and I. Olkin. Maximum-likelihood estimation of the parameters of

a multivariate normal distribution. Linear algebra and its applications, 70:147–171,

1985.

M. Andrews, P. J. Lea, J.A. Raven, and R.A. Azevedo. Nitrogen use efficiency. 3.

Nitrogen fixation: genes and costs. Annals of Applied Biology, 155:1–13, 2009.

Australian Centre for Plant Functional Genomics. ACPFG Web site. http://

www.acpfg.com.au/search.php?q=Nitrogen%20Use%20Efficiency. Last accessed:

2014-06-12.

J. D. Banfield and A. E. Raftery. Model-based Gaussian and non-Gaussian clustering.

Biometrics, 49:803–821, 1993.

T. Benaglia, D. Chauveau, D. R. Hunter, and D. S. Young. mixtools: An R package

for analyzing finite mixture models. Journal of Statistical Software, 32:1–29, 2009.


H. Bensmail, G. Celeux, A. E. Raftery, and C. P. Robert. Inference in model-based

cluster analysis. Statistics and Computing, 7:1–10, 1997.

C. Biernacki, G. Celeux, and G. Govaert. Assessing a mixture model for clustering

with the Integrated Completed Likelihood. IEEE Transactions on Pattern Analysis

and Machine Intelligence, 22:719–725, 2000.

C. Biernacki, G. Celeux, and G. Govaert. Choosing starting values for the EM algo-

rithm for getting the highest likelihood in multivariate Gaussian mixture models.

Computational Statistics & Data Analysis, 41:561–575, 2003.

C. Biernacki, G. Celeux, G. Govaert, and F. Langrognet. Model-based cluster and

discriminant analysis with the MIXMOD software. Computational Statistics & Data

Analysis, 51:587–600, 2006.

S. Borman. Topics in multiframe superresolution restoration. PhD thesis, the Univer-

sity of Notre Dame, 2004.

A.F. Bouwman, G. Van Drecht, and K.W. Van der Hoek. Global and regional surface

nitrogen balances in intensive agricultural production systems for the period 1970-

2030. Pedosphere, 15:137–155, 2005.

H. Bozdogan. On the information-based measure of covariance complexity and its

application to the evaluation of multivariate linear models. Communications in

Statistics-Theory and Methods, 19:221–278, 1990.

H. Bozdogan. Choosing the number of component clusters in the mixture-model using

a new informational complexity criterion of the inverse-Fisher information matrix.

In O. Opitz, B. Lausen, and R. Klar, editors, Information and classification, pages

40–54. Heidelberg: Springer, 1993.

G. Casella and R. L. Berger. Statistical inference. Pacific Grove, CA: Duxbury, 2nd

edition, 2002.

K. G. Cassman, A. Dobermann, and D. T. Walters. Agroecosystems, nitrogen-use

efficiency, and nitrogen management. Ambio, 31:132–140, 2002.

182

K.G. Cassman, S. Peng, D. C. Olk, J.K. Ladha, W. Reichardt, A. Dobermann, and

U. Singh. Opportunities for increased nitrogen-use efficiency from improved resource

management in irrigated rice systems. Field Crops Research, 56:7–39, 1998.

G. Celeux and G. Soromenho. An entropy criterion for assessing the number of clusters

in a mixture model. Journal of classification, 13:195–212, 1996.

V. Chew. Confidence, prediction, and tolerance regions for the multivariate normal

distribution. Journal of the American Statistical Association, 61:605–617, 1966.

G. W. Cochran. Sampling Techniques. New York: Wiley, 1977.

H. Cramer. Mathematical Methods of Statistics. Princeton: Princeton University Press,

1946.

J. Crossa and J. Franco. Statistical methods for classifying genotypes. Euphytica, 137:

19–37, 2004.

A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete

data via the EM algorithm. Journal of the Royal Statistical Society. Series B

(Methodological), 39:1–38, 1977.

M. Di Zio, U. Guarnera, and O. Luzi. Editing systematic unity measure errors through

mixture modelling. Survey Methodology, 31:53–63, 2005.

M. Di Zio, U. Guarnera, and R. Rocci. A mixture of mixture models for a classification

problem: The unity measure error. Computational statistics & data analysis, 51:

2573–2585, 2007.

J. Diebolt and C. P. Robert. Estimation of finite mixture distributions through bayesian

sampling. Journal of the Royal Statistical Society. Series B (Methodological), 56:363–

375, 1994.

A. Dobermann and K.G. Cassman. Plant nutrient management for enhanced produc-

tivity in intensive grain production systems of the United States and Asia. Plant

and Soil, 247:153–175, 2002.

183

A. R. Dobermann. Nitrogen use efficiency–state of the art. In IFA International

Workshop on Enhanced Efficiency Fertilizers. Frankfurt, 2005.

R.E. Evenson and D. Gollin. Assessing the impact of the green revolution, 1960 to

2000. Science, 300:758–762, 2003.

B. S. Everitt. An introduction to finite mixture distributions. Statistical Methods in

Medical Research, 5:107–127, 1996.

B. S. Everitt, S. Landau, M. Leese, and D. Stahl. Cluster analysis. Chicester: Wiley,

5th edition, 2011.

Microsoft Excel. Microsoft Excel. Computer Software. Microsoft Corporation, Red-

mond, Washington, 2010.

N.K. Fageria and V.C. Baligar. Enhancing nitrogen use efficiency in crop plants. Ad-

vances in agronomy, 88:97–185, 2005.

FAO. Global agriculture towards 2050. In High Level Expert Forum. How to feed the

world 2005. Rome: Food and Agriculture Organization of the United Nations, 2009.

E. C. Fieller. The distribution of the index in a normal bivariate population.

Biometrika, 24:428–440, 1932.

E. C. Fieller. Some problems in interval estimation. Journal of the Royal Statistical

Society. Series B (Methodological), 16:175–185, 1954.

M.A.T. Figueiredo and A.K. Jain. Unsupervised learning of finite mixture models.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:381–396, 2002.

R. F. Follett. Innovative 15n microplot research techniques to study nitrogen use

efficiency under different ecosystems. Communications in soil science and plant

analysis, 32:951–979, 2001.

J.R.S. Fonseca and M.G.M.S. Cardoso. Mixture-model cluster analysis using informa-

tion theoretical criteria. Intelligent Data Analysis, 11:155–173, 2007.

184

M.J. Foulkes, M.J. Hawkesford, P.B. Barraclough, M.J. Holdsworth, S. Kerr, S. Kight-

ley, and P.R. Shewry. Identifying traits to improve the nitrogen economy of wheat:

Recent advances and future prospects. Field Crops Research, 114:329–342, 2009.

D.B. Fowler. Crop nitrogen demand and grain protein concentration of spring and

winter wheat. Agronomy Journal, 95:260–265, 2003.

C. Fraley and A. E. Raftery. How many clusters? Which clustering method? Answers

via model-based cluster analysis. The computer journal, 41:578–588, 1998.

C. Fraley and A. E. Raftery. MCLUST: Software for model-based cluster analysis.

Journal of Classification, 16:297–306, 1999.

V. H. Franz. Ratios: A short guide to confidence limits and proper use. arXiv preprint

arXiv:0710.2024, 2007.

M. Friendly. Data ellipses, HE plots and reduced-rank displays for multivariate linear

models: SAS software and examples. Journal of Statistical Software, 17:1–43, 2006.

S. Fruhwirth-Schnatter. Finite mixture and Markov switching models. Springer: New

York, 2006.

S. Fruhwirth-Schnatter and S. Pyne. Bayesian inference for finite mixtures of univariate

and multivariate skew-normal and skew-t distributions. Biostatistics, 11:317–336,

2010.

J. N. Galloway, J. D. Aber, J. W. Erisman, S. P. Seitzinger, R. W. Howarth, E. B.

Cowling, and B. J. Cosby. The nitrogen cascade. Bioscience, 53:341–356, 2003.

A. Ganesalingam, A. B. Smith, C. P. Beeck, W. A. Cowling, R. Thompson, and B. R.

Cullis. A bivariate mixed model approach for the analysis of plant survival data.

Euphytica, 190:371–383, 2013.

R. C. Geary. The frecuency distribution of the quotient of two normal variates. Journal

of the Royal Statistical Society, 93:442–446, 1930.

185

S. Geman and D. Geman. Stochastic relaxation, gibbs distributions, and the bayesian

restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelli-

gence, 6:721–741, 1984.

E. Genge. A latent class analysis of the public attitude towards the euro adoption in

Poland. Adv. Data Anal. Classif. (in press), 2013.

J. K. Ghosh and P. K. Sen. On the asymptotic performance of the log likelihood ratio

statistic for the mixture model and related results. In Proceedings of the Berkeley

Conference in Honor of Jerzy Neyman and Jack Kiefer, pages 789–806. Monterey:

Wadsworth, 1985.

D. Giambalvo, P. Ruisi, G. Di Miceli, A. S. Frenda, and G. Amato. Nitrogen use

efficiency and nitrogen fertilizer recovery of durum wheat genotypes as affected by

interspecific competition. Agronomy Journal, 102:707–715, 2010.

S. S. Goyal and R. C. Huffaker. Nitrogen in crop production. In R.D. Hauck, editor,

Nitrogen toxicity in plants. Madison: American Society of Agronomy- Crop Science

Society of America- Soil Science Society of America, 1984.

R. J. Hathaway. A constrained formulation of maximum-likelihood estimation for

normal mixture distributions. The Annals of Statistics, 13:795–800, 1985.

M. J. Hawkesford. Reducing the reliance on nitrogen fertilizer for wheat production.

Journal of Cereal Science, 59:276–283, 2014.

D. V. Hinkley. On the ratio of two correlated normal random variables. Biometrika,

56:635–639, 1969.

B. Hirel, J. Le Gouis, B. Ney, and A. Gallais. The challenge of improving nitrogen

use efficiency in crop plants: towards a more central role for genetic variability and

quantitative genetics within integrated approaches. Journal of Experimental Botany,

58:2369–2387, 2007.

186

S. Ingrassia and R. Rocci. Constrained monotone EM algorithms for finite mixture

of multivariate Gaussians. Computational Statistics & Data Analysis, 51:5339–5351,

2007.

IRRI. IRRISTAT User’s Manual Version 3. 1994.

M. Ishiguro, Y. Sakamoto, and G. Kitagawa. Bootstrapping log likelihood and EIC,

an extension of AIC. Annals of the Institute of Statistical Mathematics, 49:411–434,

1997.

B.H. Janssen, F.C.T. Guiking, D. Van der Eijk, E.M.A. Smaling, J. Wolf, and

H. Van Reuler. A system for quantitative evaluation of the fertility of tropical soils

(QUEFTS). Geoderma, 46:299–318, 1990.

A Jasra, C. C. Holmes, and D. A. Stephens. Markov chain Monte Carlo methods and

the label switching problem in bayesian mixture modeling. Statistical Science, 20:

50–67, 2005.

D. Karlis and E. Xekalaki. Choosing initial values for the EM algorithm for finite

mixtures. Computational Statistics & Data Analysis, 41:577–590, 2003.

J. Kiefer and J. Wolfowitz. Consistency of the maximum likelihood estimator in the

presence of infinitely many incidental parameters. The Annals of Mathematical

Statistics, 27:887–906, 1956.

J. K. Ladha, H. Pathak, T. J. Krupnik, J. Six, and C. van Kessel. Efficiency of fertilizer

nitrogen in cereal production: retrospects and prospects. Advances in Agronomy,

87:85–156, 2005.

C. D. Lai, G. R. Wood, and C. G. Qiao. The mean of the inverse of a punctured normal

distribution and its application. Biometrical Journal, 46:420–429, 2004.

K. Lee, J. M. Marin, K. Mengersen, and C. Robert. In Proceedings of the Platinum

Jubilee of the Indian Statistical Institute, chapter Bayesian inference on mixtures of

distributions. Bangalore: Indian Statistical Institute, 2008.

187

P. M. Lee. Bayesian statistics: an introduction. Hoboken: Wiley, 4th edition, 2012.

P. J. Lenk and W. S. DeSarbo. Bayesian inference for finite mixtures of generalized

linear models with random effects. Psychometrika, 65:93–119, 2000.

Bruce G Lindsay. Mixture models: theory, geometry and applications. In NSF-CBMS

regional conference series in probability and statistics, pages 1–163. Hayward: Insti-

tute of Mathematical Statistics- Alexandria: American Statistical Association, 1995.

M. Liu, Z. Yu, Y. Liu, and N.T. Konijn. Fertilizer requirements for wheat and maize

in China: The QUEFTS approach. Nutrient Cycling in Agroecosystems, 74:245–258,

2006.

X. Liu, P. He, J. Jin, W. Zhou, G. Sulewski, and S. Phillips. Yield gaps, indigenous

nutrient supply, and nutrient use efficiency of wheat in China. Agronomy Journal,

103:1452–1463, 2011.

R. Maitra. Initializing partition-optimization algorithms. IEEE/ACM Transactions on

Computational Biology and Bioinformatics, 6:144–157, 2009.

J. M. Marin, K. Mengersen, and C. P. Robert. Bayesian modelling and inference on

mixtures of distributions. In C. Rao and D. Dey, editors, Handbook of Statistics,

volume 25, pages 459–507. Elsevier: Amsterdan, 2005.

J. S. Marron and M. P. Wand. Exact mean integrated squared error. The Annals of

Statistics, 20:712–736, 1992.

G. Marsaglia. Ratios of normal variables and ratios of sums of uniform variables.

Journal of the American Statistical Association, 60:193–204, 1965.

G. Marsaglia. Ratios of normal variables. Journal of Statistical Software, 16:1–10,

2006.

H. Marschner and P. Marschner. Marschner’s mineral nutrition of higher plants. Lon-

don: Elsevier, 2012.

188

G. J. McLachlan. On bootstrapping the likelihood ratio test stastistic for the number

of components in a normal mixture. Journal of the Royal Statistical Society. Series

C. (Applied Statistics), 36:318–324, 1987.

G. J. McLachlan and D. Peel. Finite mixture models. New York: Wiley, 2000.

G. J. McLachlan, D. Peel, K. E. Basford, and P. Adams. The EMMIX software for the

fitting of mixtures of normal and t-components. Journal of Statistical Software, 4,

1999.

G.J. McLachlan, D. Peel, and W.J. Whiten. Maximum likelihood clustering via normal

mixture models. Signal Processing: Image Communication, 8:105–111, 1996.

M. Meila and D. Heckerman. An experimental comparison of model-based clustering

methods. Machine Learning, 42:9–29, 2001.

V. Melnykov. Challenges in model-based clustering. Wiley Interdisciplinary Reviews:

Computational Statistics, 5:135–148, 2013.

V. Melnykov and R. Maitra. Finite mixture models and model-based clustering. Statis-

tics Surveys, 4:80–116, 2010.

X.L. Meng and D. Van Dyk. The EM algorithm - an old folk-song sung to a fast new

tune. Journal of the Royal Statistical Society. Series B: Statistical Methodology, 59:

511–567, 1997.

D. Murdoch, E.D. Chow, and J.M. Frias Celayeta. ellipse: Functions for drawing

ellipses and ellipse-like confidence regions. R package version 0.3-5, 2007.

K. Naklang, D. Harnpichitvitaya, S.T. Amarante, L.J. Wade, and S.M. Haefele. In-

ternal efficiency, nutrient uptake, and the relation to field water resources in rainfed

lowland rice of northeast Thailand. Plant and Soil, 286:193–208, 2006.

S. Newcomb. A generalized theory of the combination of observations so as to obtain

the best result. American Journal of Mathematics, 8:343–366, 1886.

189

S. Ng. Recent developments in expectation-maximization methods for analyzing com-

plex data. Wiley Interdisciplinary Reviews: Computational Statistics, 5:415–431,

2013.

C. Nicholson. The probability integral for two variables. Biometrika, 33:59–72, 1943.

O.E. Olarewaju, M.T. Adetunji, C.O. Adeofun, and I.M. Adekunle. Nitrate and phos-

phorus loss from agricultural land: implications for nonpoint pollution. Nutrient

Cycling in Agroecosystems, 85:79–85, 2009.

M.G.H. Omran, A. P. Engelbrecht, and A. Salman. An overview of clustering methods.

Intelligent Data Analysis, 11:583–605, 2007.

B.N. Otteson, M. Mergoum, and J.K. Ransom. Seeding rate and nitrogen management

effects on spring wheat yield and yield components. Agronomy Journal, 99:1615–

1621, 2007.

K. Pearson. Contributions to the mathematical theory of evolution. Philosophical

Transactions of the Royal Society of London. A, 185:71–110, 1894.

D. Peel and G. J. McLachlan. Robust mixture modelling using the t distribution.

Statistics and computing, 10:339–348, 2000.

D. Pena. Analisis de datos multivariantes. McGraw-Hill: Madrid, 2002.

T. Pham-Gia, N. Turkkan, and E. Marchand. Density of the ratio of two normal random

variables and applications. Communications in StatisticsTheory and Methods, 35:

1569–1591, 2006.

C. G. Qiao, G. R. Wood, C. D. Lai, and D. W. Luo. Comparison of two common

estimators of the ratio of the means of independent normal variables in agricultural

research. Journal of Applied Mathematics and Decision Sciences, 2006:1–14, 2006.

R Core Team. R: A Language and Environment for Statistical Computing. R

Foundation for Statistical Computing, Vienna, Austria, 2012. URL http://www.

R-project.org/. ISBN 3-900051-07-0.

190

W. R. Raun and G. V. Johnson. Improving nitrogen use efficiency for cereal production.

Agronomy Journal, 91:357–363, 1999.

R. A. Redner and H. F. Walker. Mixture densities, maximum likelihood and the EM

algorithm. SIAM Review, 26:195–239, 1984.

A. C. Rencher. Multivariate statistical inference and applications. New York: Wiley,

1998.

C. P. Robert and G. Casella. Monte Carlo statistical methods. New York: Springer,

2nd edition, 2004.

M. L. Samuels, J. A. Witmer, and A Schaffner. Statistics for the life sciences. Boston:

Pearson Education, 2012.

SAS Institute. SAS Institute version 9.4. 2013.

G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6:461–464,

1978.

W. Seidel, K. Mosler, and M. Alker. A cautionary note on likelihood ratio tests in

mixture models. Annals of the Institute of Statistical Mathematics, 52:481–487, 2000.

B. Seo and D. Kim. Root selection in normal mixture models. Computational Statistics

& Data Analysis, 56:2454–2470, 2012.

S. Shanmugalingam. On the analysis of the ratio of two correlated normal variables.

Journal of the Royal Statistical Society. Series D (The Statistician), 31:251–258,

1982.

T. R. Sinclair. Historical changes in harvest index and crop nitrogen accumulation.

Crop Science, 38:638–643, 1998.

V. Smil. Nitrogen in crop production: An account of global flows. Global Biogeochemical

Cycles, 13:647–662, 1999.

P. Smyth. Model selection for probabilistic clustering using cross-validated likelihood.

Statistics and Computing, 9:63–72, 2000.

191

J. A. Snyman. Practical mathematical optimization: an introduction to basic opti-

mization theory and classical and new gradient-based algorithms. Boston: Springer,

2005.

J.H.J. Spiertz. Nitrogen, sustainable agriculture and food security. A review. Agronomy

for Sustainable Development, 30:43–55, 2010.

SPSS. Systat user’s guide: Statistics, version 7.0. spss. Inc., Chicago, IL, 1997.

M. Stephens. Dealing with label switching in mixture models. Journal of the Royal

Statistical Society: Series B (Statistical Methodology), 62:795–809, 2000.

S. Takahashi, M. R. Anwar, and S. G de Vera. Effects of compost and nitrogen fertilizer

on wheat nitrogen use in Japanese soils. Agronomy Journal, 99:1151–1157, 2007.

C. Tetard-Jones, P. N. Shotton, L. Rempelos, J. Cooper, M. Eyre, C. H. Orr, C. Leifert,

and A.M.R. Gatehouse. Quantitative proteomics to study the response of wheat to

contrasting fertilisation regimes. Molecular Breeding, 31:379–393, 2013.

D. Tilman, K. G. Cassman, P. A. Matson, R. Naylor, and S. Polasky. Agricultural

sustainability and intensive production practices. Nature, 418:671–677, 2002.

D. M. Titterington, A.F.M. Smith, and U.E. Makov. Statistical analysis of finite

mixture distributions. New York: Wiley, 1985.

UN. Word population prospects: The 2012 revision, highlights and advance tables.

United Nations, 2013.

K.K. Vinod and S. Heuer. Approaches towards nitrogen-and phosphorus-efficient rice.

AoB Plants, pls 028, 2012.

U. Von Luxburg and V. H. Franz. Confidence sets for ratios: a purely geometric

approach to Fieller’s theorem. Technical Report TR-133, Max Planck Institute for

Biological Cybernetics, Giessen, Germany, 2004.

192

U. Von Luxburg and V. H. Franz. A geometric approach to confidence sets for ratios:

Fieller’s theorem, generalizations, and bootstrap. Statistica Sinica, 19:1095–1117,

2009.

VSN International. Genstat for Windows 15th Edition. VSN International, Hemel

Hempstead UK. 2012. URL http://www.vsni.co.uk/.

D. D. Wackerly, W. Mendenhall, and R. L. Scheaffer. Mathematical statistics with

applications. Belmont:Duxbury, 5th edition, 1996.

Waite Institute. Waite Research Institute Web site. http://

waiteresearchinstitute.wordpress.com/tag/use-efficiency/, a. Last

accessed: 2014-06-12.

Waite Institute. Waite Research Institute Web site.

https://waiteresearchinstitute.wordpress.com/tag/

australian-centre-for-plant-functional-genomics/, b. Last accessed:

2014-06-12.

S. D. Walter, A. Gafni, and S. Birch. A geometric confidence ellipse approach to the

estimation of the ratio of two variables. Statistics in Medicine, 27:5956–5974, 2008.

S.S. Wilks. The large-sample distribution of the likelihood ratio for testing composite

hypotheses. The Annals of Mathematical Statistics, 9:60–62, 1938.

A. Willse and R. J. Boik. Identifiable finite mixtures of location models for clustering

mixed-mode data. Statistics and Computing, 9:111–121, 1999.

C. Witt, A Dobermann, S. Abdulrachman, H.C. Gines, W. Guanghuo, R. Nagarajan,

S. Satawatananont, T. Thuc Son, P. Sy Tan, L. Van Tiem, and D. C. Olk. Internal

nutrient efficiencies of irrigated lowland rice in tropical and subtropical Asia. Field

Crops Research, 63:113–138, 1999.

C. F. Wu. On the convergence properties of the EM algorithm. The Annals of Statistics,

11:95–103, 1983.

193

G. Xu, X. Fan, and A. J. Miller. Plant nitrogen assimilation and use efficiency. Annual

review of plant biology, 63:153–182, 2012.

L. Xu, T. Hanson, E. J. Bedrick, and C. Restrepo. Hypothesis tests on mixture model

components with applications in ecology and agriculture. Journal of Agricultural,

Biological, and Environmental statistics, 15:308–326, 2010.
