Bivariate Models for the Analysis of Internal Nitrogen Use
Efficiency: Mixture Models as an Exploratory Tool
Isabel Munoz Santa (Masters of Applied Science in Biometrics)
A thesis submitted for
the degree of Masters of Applied Science in Biometrics
in the University of Adelaide
School of Agriculture Food and Wine
July 2014
Acknowledgments
I would like to express my deepest gratitude to my advisor Dr. Olena Kravchuck for
her expertise, guidance, support and encouragement. This thesis would not have been
possible without her. Thank you for your efforts to make me a better professional and
give me the opportunity to study here!
I would also like to thank my second supervisor Dr. Petra Marschner for her guidance
in the biological aspects of my thesis and Dr. Stephan Haefele who kindly provided
the data for the case study of this thesis and provided support in the interpretation of
the results.
I would like to acknowledge the Faculty of Science for providing the Turner Family
Scholarship which supported this research and provided travel assistance to attend
the Australian Statistical Conference, July 2014 Adelaide, Young Statistician Con-
ference, February 2013 Melbourne, International Biometrics Society Conference, De-
cember 2014 Mandurah and Australian Statistical Conference in conjunction with the
Institute of Mathematical Statistics Annual Meeting, July 2014 Sydney.
I would like to thank all the people at the Biometry Hub: Bev, David, Jules, Paul,
Stephen and Wayne for their big smiles and for creating a positive working environ-
ment as well as the wonderful group of statisticians at Waite, meeting regularly for the
professional development discussions.
Thank you to all the friends I have met in Adelaide, where I have had one of the best
professional, cultural and personal experiences of my life. Thanks to Negar, Mohsen,
Casey, Amanda, Rodrigo, Mariana, Fien, Daniela, Diego, Kanch, Antonio, Maria,
Ruben, Alfonso, Lidia, Pablo, Ana, Antonija, Roey, Chris, Konrad, Diana, Luis and
Lorinda.
Finally, with deep love and admiration, I thank my family for their love and support
from thousands of kilometres away and Martin for his support, love and contagious
positive vision of life.
Contents
1 Motivations and thesis outline
1.1 Feeding the world requires an efficient use of nitrogen fertilisers
1.2 Nitrogen efficiency measures
1.3 Strategies for improving the uptake and utilisation of nitrogen by cereals
1.4 Review of grain yield and nitrogen uptake analyses in agricultural research
1.4.1 Studies selected for the review
1.4.2 Amount and format of grain yield and nitrogen uptake data
1.4.3 Relationship between grain yield and nitrogen uptake
1.4.4 Common methods of analysis of grain yield and nitrogen uptake field data and their limitations
1.5 Objectives of the thesis
1.6 Thesis outline
2 Ratios of jointly normal variables
2.1 Introduction to the ratio of jointly normal variables
2.2 Distribution of the ratio: history and properties
2.2.1 Geary (1930) and Fieller (1932) expressions of the pdf
2.2.2 Marsaglia (1965, 2006) expression of the pdf
2.2.3 Pham-Gia et al. (2006) expression of the pdf
2.3 Normal approximation of the pdf of the ratio
2.4 Estimators of the ratio
2.4.1 Point estimators: average of ratios, ratio of averages
2.4.2 Confidence sets of the ratio of expected values
2.5 On the distributional properties of internal nitrogen use efficiency in rice
2.6 Summary
3 Fundamentals of finite mixture models
3.1 Non-technical introduction
3.2 Common use of mixture models
3.3 Mathematical definition
3.4 Classifying data into groups: label random vectors and posterior probabilities
3.5 Maximum likelihood estimation of the mixture parameters
3.6 The EM algorithm for the estimation of mixture parameters
3.7 The EM algorithm for the estimation of parameters of mixtures of multivariate Gaussian distributions
3.7.1 Illustration of the EM algorithm on simulated data
3.8 Difficulties in selecting the MLE of mixture models of Gaussian distributions with heteroscedastic components
3.8.1 Unboundedness of the likelihood function
3.8.2 Multiple local maxima
3.8.3 Spuriosities
3.9 Strategy to select the MLE of mixtures with heteroscedastic Gaussian components
3.9.1 Starting strategies for the EM algorithm
3.10 Bayesian approach to estimating parameters of mixture models of multivariate Gaussian distributions
3.10.1 The Gibbs sampler
3.10.2 The Gibbs sampler for a mixture of multivariate Gaussian distributions
3.10.3 Label switching problem
3.11 Selecting the number of mixture components
3.11.1 Information Criteria
3.11.2 Likelihood ratio test for selecting the number of clusters
3.12 Summary
4 Bivariate models for internal nitrogen use efficiency: mixture models as an exploratory tool
5 Conclusions and future lines of research
Appendix A List of studies in the review
Appendix B Journal information of the studies in the review
Appendix C Equivalence between Pham-Gia et al. (2006) and Marsaglia (1965, 2006) expressions of the pdf of the ratio
Appendix D Application of the EM algorithm for estimating the parameters of a mixture of multivariate Gaussian distributions
Appendix E R code for fitting mixture models of univariate Gaussian distributions
Appendix F R code for fitting mixture models of bivariate Gaussian distributions
Abstract
Ratios are commonly used among plant and soil scientists, in particular to express the
plant nutrient utilisation efficiency of macro- and micro-nutrients. The internal nutrient
efficiency can be understood in terms of maximising yield per unit of nutrient in
the plant. At present, internal nitrogen use efficiency (IEN) data are usually collected
from designed field trials where
different treatments are applied (e.g. fertiliser treatments) and analysed by univariate
linear mixed models. However, univariate linear models on the ratio do not maintain
information on the original traits, including their correlation, which presents a chal-
lenge when interpreting the effect of agronomic practices or environmental conditions
on the process of nutrient conversion into grain. Moreover, the distributional proper-
ties of ratios do not comply with the assumptions of these linear models favoured in
the area of soil and plant science research. A more suitable approach is to collect the
traits of interest and to use bivariate analyses. These analyses preserve the information
on the original traits and avoid issues associated with the ratio distributional properties.
If the data come from field studies, different experimental and environmental conditions
may lead to the presence of patterns (groups) in the data in addition to, or concurrently
with, designed treatments. Researchers in plant and soil sciences may be
interested in identifying those conditions, for example to understand the nature of
genotype-by-environment interactions. The inspection of the groups may reveal the
factors defining them, thus gaining insight into the experimental or environmental
drivers of the biological traits. Among bivariate analyses, bivariate mixture models
of Gaussian distributions are an appropriate methodology for identifying clusters in
the nutrient efficiency data, assuming that the traits are jointly normal. Studying this
methodology for the analysis of the internal nitrogen use efficiency traits is the focus
of the present thesis.
The application of bivariate mixture models is suggested here as a complementary
analysis to bivariate mixed models in designed field trials and for exploratory purposes
only. The exploratory and supplementary character of the mixture analysis is due to
the potential violation of the independence assumption when the data are collected
from designed field trials.
In this project, bivariate mixed and mixture models are applied to a real-life designed
field trial on non-irrigated rice in Thailand for the analysis of grain yield (GY)
and plant nitrogen uptake (NU) data. The univariate counterparts of these analyses
are also applied on the ratio of these two traits (the internal nitrogen use efficiency).
The advantages of the bivariate analyses are discussed in comparison to the univari-
ate analyses on the ratio. In this case study, the bivariate mixture approach revealed
that soil water availability post-flowering and N supply in soil are the potential factors
defining the mixture groups.
The present work can be readily extended to the analysis of other similar traits in
agriculture when the objective is to explore potential environmental conditions affecting
the traits under study. In order to fully exploit the proposed methodology, a field survey
is suggested as a more appropriate sampling procedure for the application of mixture
models than collecting data from designed field trials.
Declaration of originality
I certify that this work contains no material which has been accepted for the award of
any other degree or diploma in my name in any university or other tertiary institution
and, to the best of my knowledge and belief, contains no material previously published
or written by another person, except where reference has been made in the text. In
addition, I certify that no part of this work will, in the future, be used in a submission
in my name for any other degree or diploma in any university or other tertiary insti-
tution without the prior approval of the University of Adelaide and where applicable,
any partner institution responsible for the joint award of this degree.
I give consent to this copy of my thesis, when deposited in the University Library,
being made available for loan and photocopying, subject to the provisions of the Copy-
right Act 1968.
I also give permission for the digital version of my thesis to be made available on
the web, via the University digital research repository, the Library Search and also
through web search engines, unless permission has been granted by the University to
restrict access for a period of time.
Common abbreviations in this thesis
BVN bivariate normal distribution
χ² chi-square distribution
CV coefficient of variation
ρ correlation coefficient
σxy covariance between random variables X and Y
Cov() covariance operator
cdf cumulative distribution function
CVx CV of a random variable X
p dimension of a random vector
D Dirichlet distribution
∼ distributed as
EM Expectation and Maximisation
µx expected value of a random variable X
µ expected value of a random vector
µi expected value of the i-th component of a mixture of
multivariate normal distributions
E() expected value operator
exp exponential function
F F-distribution
Γ gamma function
GY grain yield
↔, ⇔ if and only if
Gi i-th group of a mixture of distributions
i.i.d. independent and identically distributed
IEN internal nitrogen use efficiency
L() log likelihood function
MLE maximum likelihood estimate
π mixing proportions
ψ mixture parameters
Mult multinomial distribution
MVN multivariate normal distribution
NU nitrogen uptake
g number of groups in a mixture
τij posterior probabilities of the j-th observation in Gi
pdf probability density function
∝ proportional
tr trace of a matrix
n sample size
Φ standard univariate normal cdf
ϕ standard univariate normal pdf
S straw yield
T Student’s t-distribution
N univariate normal
σ² variance
σ²x variance of a random variable X
Σ variance-covariance matrix
Σi variance-covariance matrix of the i-th component of a
mixture of multivariate normal distributions
Var() variance operator
θi vector of parameters of the i-th component
W Wishart distribution
List of Tables
1.1 Causes of N losses and associated environmental impacts
1.2 Main Nitrogen Use Efficiency (NUE) indices
2.1 Conditions for the four scenarios of Fieller’s confidence sets
B.1 Journals and their impact factor for studies in the review
List of Figures
1.1 Nitrogen pathway from soil to grain
1.2 Distribution of the studies in the review by continents
1.3 Distribution of the studies in the review by the year of publication (left) and the paper citation index (right)
1.4 Typical scatter plots of grain yield and nitrogen uptake in wheat
2.1 Different shapes of the probability density function of the ratio of two jointly normal variables
2.2 Probability density function of the ratio of two jointly normal variables for different values of the coefficient of variation of the denominator (CVx)
2.3 Probability density function of the ratio of two jointly normal variables for different values of the coefficient of variation of the numerator (CVy) but same CVx
2.4 Bootstrap distribution of the estimators defined in Eq. 2.11 (left) and Eq. 2.12 (right)
2.5 Feasible cases of Fieller’s confidence set of the ratio of expected values
2.6 Construction of a wedge given the confidence interval of the ratio of means (left) and vice versa (right)
2.7 Confidence sets of the ratio of expected values in Von Luxburg & Franz (2009)
2.8 Grain yield versus nitrogen uptake from a sample of non-irrigated rice in northeast Thailand (Naklang et al., 2006)
2.9 Probability density function of the ratio of two jointly normal variables with parameters given in Eq. 2.21
2.10 Bootstrap distribution of the estimators defined in Eq. 2.11 (left) and Eq. 2.12 (right) for the data in Fig. 2.8
3.1 Solutions of three clusters obtained by the EM algorithm for the case study (Chapter 4)
3.2 Spurious solution obtained by the EM algorithm when fitting 7 components to the case study data (Chapter 4)
3.3 Bimodal distribution generated from a mixture of three univariate normal components
3.4 Scatter plot of a sample from a mixture of two bivariate normal distributions
E.1 Histogram of the internal nitrogen use efficiency data and the mixture components found by the EM algorithm initiated from random starts
E.2 Histogram of the internal nitrogen use efficiency data and the mixture components found by the EM algorithm initiated from a partition provided by the K-means algorithm
E.3 Histogram of the internal nitrogen use efficiency data and the mixture components found by the EM algorithm initiated on a random subsample
E.4 Histogram of the internal nitrogen use efficiency data and the mixture components found after running several short runs of the EM algorithm
F.1 Cluster partition found by the EM algorithm initiated from random starts
F.2 Cluster partition found by the EM algorithm initiated from the partition obtained by the K-means algorithm
F.3 Cluster partition found by the EM algorithm initiated from simulated means
F.4 Cluster partition found by the EM algorithm initiated from the mixture estimates obtained by running the EM algorithm on a random subsample of 200 observations
F.5 Cluster partition found by the EM algorithm initiated from the mixture estimates after running several short runs of the EM algorithm
Chapter 1
Motivations and thesis outline
1.1 Feeding the world requires an efficient use of nitrogen
fertilisers
The world is experiencing a rapid increase of its population, from the current 7.2 billion
to the expected 9.6 billion by 2050 (UN, 2013). This growth trajectory implies
an increase in the demand for cereals, both directly, for consumption as the main staple
food, and indirectly, since the demand for meat is also increasing and cereals are used
for feeding animals (Cassman et al., 2002). According to the FAO (2009), feeding the
expected 9.6 billion people will require producing an additional one billion tonnes of
cereals per year. However, grain production is constrained by environmental conditions,
especially by the availability of water and nitrogen (N) (Andrews et al., 2009).
Nitrogen is, after carbon (C), the nutrient required by plants in the largest quantities
(Marschner & Marschner, 2012, p.135). Plants take up N from soil, mainly as
nitrate and ammonium, to produce their proteins and nucleotides (Xu et al., 2012).
A fraction of this N is stored in the grain and removed from the field at harvest. If
additionally the straw is removed, the N depletion of the soil is greater.
To replace this nutrient and maintain the soil fertility status, N fertilisers are ap-
plied to provide plant nutrients in addition to the indigenous N in the soil. Nitrogen
fertilisers have been essential for the steady increase of the production of grain over
the last decades (Follett, 2001; Hirel et al., 2007), which explains the high demand for
this commodity (≈ 169 Tg/year; Smil, 1999).
Not all the N applied to soil is taken up by plants. Although variable across coun-
tries (Table 5 in Bouwman et al. (2005)), it is estimated that more than half of the N
fertiliser applied is lost into the environment (Tilman et al., 2002). The losses of N are
due to the fact that nitrates are not easily retained by the soil matrix (Olarewaju et al.,
2009) and can be transported into water bodies through leaching and surface runoff
(Raun & Johnson, 1999). Additionally, N can be lost to the atmosphere through deni-
trification, ammonia volatilization and gaseous emission from leaves (Fageria & Baligar,
2005).
Nitrogen losses result in large economic, energy and environmental costs. The economic
loss is estimated at 15.9 billion US dollars per year (Raun & Johnson, 1999). The
energy cost is related to the fertiliser manufacturing process (Haber-Bosch) in which
high pressure and temperature are needed to convert N2 in the atmosphere into am-
monia (NH3) (Galloway et al., 2003). Finally, the environmental impacts (Table 1.1)
are related to the pollution of soil, water and atmosphere (Ladha et al., 2005).
Minimising the losses of N is thus essential to satisfy the economic and population
growth needs without compromising the future of our planet. This is the basis
of sustainable agriculture (Tilman et al., 2002), stated by Spiertz (2010) as the 3-P
principle, which consists of meeting ‘population’, ‘profit’ and ‘planet’ requirements.
To meet the 3-P principle, it is necessary to be efficient by achieving the maximum
possible yields with an optimum and responsible use of fertilisers and natural resources.
To quantify the N use efficiency, scientists have defined a variety of indices (Table 1.2)
and identified strategies to increase them (Section 1.3). However, the expected effects
of these strategies are not always achieved since the N utilisation depends on a myriad
of complex interactions between plant, soil, climate and microorganisms affecting the
physiological processes in plants (Figure 1.1).
Table 1.1: Causes of N losses and associated environmental impacts.

| Cause of N loss | Description | Environmental impact |
|---|---|---|
| Denitrification | Conversion of nitrate (NO3−) into N gases (N2, NO, N2O) | Global warming and atmospheric ozone depletion |
| Ammonia volatilization | Conversion of urea (CO(NH2)2) into N gases (NH3) | Acid rain, which leads to soil acidification |
| Gaseous plant emission | Release of NH3 from leaves | Acid rain, which leads to soil acidification |
| Leaching | N is transported downward after irrigation or heavy precipitation | Groundwater contamination and eutrophication∗ |
| Surface runoff | Water flows over the soil surface and moves N into streams and lakes | Eutrophication |

∗An excessive amount of nitrates in water results in a great increase of algae, causing oxygen depletion
and putrefaction of the water.
Information has been compiled from Fageria & Baligar (2005); Ladha et al. (2005); Raun & Johnson
(1999).
N in soil → available N → N absorbed → N taken up → N in grain

N in soil: total amount of N in soil, comprising indigenous N and external sources from
organic and inorganic fertilisers.
Available N: N potentially taken up by plants. The N availability is subject to a range
of factors such as the amount and source of N in soil, the availability of other
nutrients (e.g. C:N ratio), the activity of microorganisms (mineralization and
immobilization processes), climate and soil characteristics (e.g. pH and moisture
content).
N absorbed: N taken up by roots. N absorption depends on the availability of nutrients,
climate factors (e.g. availability of water), the crop demand for nutrients, which
varies with the type of crop, the growth stage of the plant, the plant genotype
and the root distribution.
N taken up: concentration of N multiplied by the dry weight of plants. It differs from N
absorbed due to leaf senescence and fall, insect attacks, root turnover and gaseous
plant emissions through the plant canopy.
N in grain: concentration of N in grain multiplied by the grain yield. It can be affected by
temperature and drought stresses, variety, diseases etc.
Figure 1.1: Nitrogen pathway from soil to grain. [Compiled from Fageria & Baligar (2005);
Marschner & Marschner (2012); Marschner (2013, pers. comm., 3 January); Xu et al. (2012)]
1.2 Nitrogen efficiency measures
Nitrogen in soil has to undergo different steps before being used in grain production
(Figure 1.1). To quantify the efficiency of N utilisation in these steps, plant and soil
scientists have suggested several Nitrogen Use Efficiency (NUE) indices. Table 1.2 dis-
plays the most common NUE indices found in plant and soil science literature.
Among the various NUE indices, this thesis focuses on the statistical analysis of internal
N use efficiency, IEN (Eq. 1.1). This index measures the ability of plants to convert
N content in aboveground biomass into grain yield.
$$ IE_N = \frac{GY}{NU} \tag{1.1} $$
where GY (kg/ha) is grain yield and NU (kg/ha) is N uptake, i.e. the content of N in
aboveground plant parts. In field trials, NU is commonly measured (e.g. Naklang
et al., 2006) as follows¹:
$$ NU = \frac{m\,[N]_G\,GY + [N]_S\,S}{1000} \quad \text{(kg/ha)} \tag{1.2} $$
where [N]G is N concentration in dry grain (g/kg), [N]S is N concentration in straw
(g/kg), GY is grain yield (kg/ha), S is straw yield (kg/ha) and m is the standard
moisture correction factor, equal to 0.86.
Internal nitrogen use efficiency can be directly linked to another important agricultural
efficiency measure, the so-called harvest index, HI (Eq. 1.3):
$$ HI = \frac{GY}{GY + S} \tag{1.3} $$
The pressure of the last decades to increase GY has made HI an important trait for
variety selection in plant breeding. Although HI emphasises the partition of carbon
(C) in plant (Sinclair, 1998), the importance of N for grain production induces a close
association between HI and IEN (Sinclair, 1998). In particular, by Eq. 1.1, 1.2 and
1.3, the following relationship is derived:
$$ IE_N^{-1} = \frac{1}{1000} \left[ m\,[N]_G + [N]_S \left( HI^{-1} - 1 \right) \right] $$
¹ Notice that no direct measures of NU are available. However, the fact that NU is a derived
variable from others is not investigated in this thesis.
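The relationship between IEN and HI follows in two steps: dividing Eq. 1.2 by GY gives 1/IEN in terms of the ratio S/GY, and Eq. 1.3 gives S/GY = 1/HI − 1. The short Python sketch below is only an illustrative check of that algebra. The function names and the plot values are hypothetical and are not taken from the case study; only the formulas and m = 0.86 come from the text.

```python
M = 0.86  # standard moisture correction factor m from Eq. 1.2


def nitrogen_uptake(n_grain, n_straw, grain_yield, straw_yield, m=M):
    """N uptake NU (kg/ha), Eq. 1.2; concentrations in g/kg, yields in kg/ha."""
    return (m * n_grain * grain_yield + n_straw * straw_yield) / 1000.0


def internal_efficiency(grain_yield, n_uptake):
    """Internal N use efficiency IE_N = GY / NU, Eq. 1.1 (kg grain per kg N)."""
    return grain_yield / n_uptake


def harvest_index(grain_yield, straw_yield):
    """Harvest index HI = GY / (GY + S), Eq. 1.3 (dimensionless)."""
    return grain_yield / (grain_yield + straw_yield)


def inverse_efficiency_from_hi(n_grain, n_straw, hi, m=M):
    """Right-hand side of the derived relationship for 1 / IE_N."""
    return (m * n_grain + n_straw * (1.0 / hi - 1.0)) / 1000.0


# Hypothetical plot values, chosen only to exercise the formulas.
gy, s = 3000.0, 4000.0   # grain and straw yield (kg/ha)
n_g, n_s = 11.0, 5.0     # N concentration in grain and straw (g/kg)

nu = nitrogen_uptake(n_g, n_s, gy, s)
ie = internal_efficiency(gy, nu)
hi = harvest_index(gy, s)

# The two routes to 1 / IE_N should agree up to floating-point error.
direct = 1.0 / ie
via_hi = inverse_efficiency_from_hi(n_g, n_s, hi)
print(f"NU = {nu:.2f} kg/ha, IE_N = {ie:.2f} kg/kg, HI = {hi:.3f}")
print(f"1/IE_N direct = {direct:.6f}, via HI = {via_hi:.6f}")
```

With the N concentrations held fixed, a higher HI lowers 1/IEN and therefore raises IEN, which is the close association between the two traits noted above.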
Table 1.2: Main Nitrogen Use Efficiency (NUE) indices.

| NUE index | Definition | Formula |
|---|---|---|
| Partial factor productivity (PFPN) | Ratio of the total grain yield to the level of N fertiliser applied. | $PFP_N = \frac{GY}{F_N}$ (kg kg−1) |
| Agronomic efficiency (AEN) | Ratio of the increment in grain yield to the level of N fertiliser applied. It differs from PFPN in that it provides information about the increase of grain due to fertilising. | $AE_N = \frac{GY - G_0}{F_N}$ (kg kg−1) |
| Recovery efficiency (REN) | Ratio of the increment of N uptake to the level of N fertiliser applied. It measures the ability of plants to take up N from applied fertilisers. | $RE_N = \frac{NU - N_0}{F_N}$ (kg kg−1) |
| Internal N use efficiency (IEN) | Ratio of the total grain yield to the total N uptake. It measures the ability of plants to utilise N content in biomass for producing grain. | $IE_N = \frac{GY}{NU}$ (kg kg−1) |
| Physiological efficiency (PEN) | Ratio of the increment in grain yield to the increment in N uptake due to fertilising. It measures the ability of plants to utilise the increment in N uptake due to fertilising to produce grain. | $PE_N = \frac{GY - G_0}{NU - N_0}$ (kg kg−1) |

NUE: Nitrogen Use Efficiency. GY: grain yield. FN: level of fertiliser applied. G0: grain yield in
plots with no fertiliser (indigenous N only). NU: N uptake measured in aboveground biomass. N0:
N uptake measured in aboveground biomass in a plot with no fertiliser. Compiled from Dobermann
(2005).
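As an illustration of how the indices in Table 1.2 relate to one another, the following Python sketch computes all five from the quantities defined in the table notes. The function name, the argument names and the numerical values are hypothetical and serve only to show the arithmetic; they are not data from any study.

```python
def nue_indices(gy, nu, fn, g0, n0):
    """Return the five NUE indices of Table 1.2, all in kg/kg.

    gy, g0: grain yield in the fertilised and unfertilised plot (kg/ha)
    nu, n0: N uptake in the fertilised and unfertilised plot (kg/ha)
    fn:     level of N fertiliser applied (kg/ha)
    """
    if fn <= 0:
        raise ValueError("fn must be positive: PFP, AE and RE need applied fertiliser")
    if nu <= 0 or nu == n0:
        raise ValueError("nu must be positive and differ from n0")
    return {
        "PFP": gy / fn,               # partial factor productivity
        "AE": (gy - g0) / fn,         # agronomic efficiency
        "RE": (nu - n0) / fn,         # recovery efficiency
        "IE": gy / nu,                # internal N use efficiency
        "PE": (gy - g0) / (nu - n0),  # physiological efficiency
    }


# Hypothetical fertilised and unfertilised plots.
indices = nue_indices(gy=4000.0, nu=80.0, fn=100.0, g0=2500.0, n0=40.0)
for name, value in indices.items():
    print(f"{name}_N = {value:.3f} kg/kg")
```

Directly from the definitions, AEN equals the product of REN and PEN. IEN is the only index of the five that needs neither the fertiliser level nor an unfertilised control plot.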
1.3 Strategies for improving the uptake and utilisation of ni-
trogen by cereals
Maximising the production of grain and optimising the use of N fertiliser are major
priorities for soil and plant scientists. In the last 70 years, substantial effort has been
made to develop strategies which enable an increase in the uptake and utilisation
of N for grain production (Evenson & Gollin, 2003). These strategies have focused
on the improvement of agronomic practices and plant genotypes (Cassman et al., 2002;
Raun & Johnson, 1999; Spiertz, 2010; Tilman et al., 2002):
• Improvement of agronomic practices: better matching of the fertiliser ap-
plication to the crop demand, crop choice, crop rotation with legumes, utilisation
of slow release fertilisers, conservation tillage systems, cover crops with legumes,
utilisation of organic compost and green manures, water management and sowing
time management (Fageria & Baligar, 2005). These practices may differ depend-
ing on the climatic region. For instance, in arid regions the amount of N applied
is lower than in wetter regions because plants tend to grow less. Furthermore, in
wetter regions N loss via leaching or denitrification is high, so more N is applied,
although in small but frequent doses (Marschner 2014, pers. comm., 31 October).
• Improvement of plant genotypes: plant breeding or genetic engineering
(see Marschner & Marschner (2012, Section 6.1.6)). At present, several families
of genes which can potentially increase NUE have been identified in rice
(Vinod & Heuer, 2012) and wheat (Foulkes et al., 2009; Hawkesford, 2014). However,
these genotypes have been mostly tested in laboratory conditions and there
are not enough field studies to assess the genotype by environment interactions
(Hawkesford, 2014). The need for more research and the opposition of part of the
population to genetically modified food make this strategy a long-term
prospect.
Improving N efficiency has also been a priority at the Waite Campus, where extensive
research across disciplines has been undertaken to develop strategies for increasing
the efficiency in nitrogen utilisation by cereals (e.g. Australian Centre for Plant Func-
tional Genomics, 2014; Waite Institute, 2014a; Waite Institute, 2014b). A comprehen-
sive discussion about the best approaches to increase the efficiency is beyond the scope
of this thesis. This thesis is focused on implementing new statistical methods which
can provide more insight into the conversion of NU into GY and how this conversion
is affected by environmental factors. The identification of such factors can help plant
and soil scientists to better understand the conversion process of NU into GY and to
develop specific management practices for increasing IEN .
With these objectives in mind, a literature search for current statistical methods
in the analysis of GY and NU in published agricultural research was carried out.
The objectives of this literature search were to determine: 1) the amount and format
of published GY and NU data, 2) typical trends between the two traits and 3) the
most common statistical techniques currently employed to analyse these data.
1.4 Review of grain yield and nitrogen uptake analyses in agri-
cultural research
1.4.1 Studies selected for the review
1.4.1.1 Search criteria
A selection of 100 studies (see Appendix A) from plant and soil science literature was
collected using the search engines of Web of Science and Google Scholar and employing
the following three search criteria:
1. A combination of keywords such as yield, nitrogen, uptake, internal efficiency,
physiological efficiency, utilisation efficiency, rice or wheat was used for topic
searching, e.g. yield and nitrogen and (rice or wheat). The search was restricted
by research affiliation; in particular, to the International Rice Research Institute,
Indian Agricultural Research Institute and all the Australian affiliations (CSIRO,
University of Adelaide, etc.). These are internationally recognised institutions in
cereal research and hence a source of reliable data. Furthermore, studies performed
before 1980 were not included in the selection.
2. Articles which were referenced by the previously selected articles and articles
which cited them. These manuscripts were included to widen the search framework.
3. Studies containing both GY and NU data and performed in the field (farmers’ fields
or research station fields).
1.4.1.2 Main features of the studies selected
In this section, I present the most important features of the studies evaluated in rela-
tion to: 1) journals in which the studies were published, 2) places where the studies
were conducted and 3) year of publication and manuscript citation index.
Most of the journals in which the studies were published belong to the category of
Agronomy followed by Agriculture Multidisciplinary (Table B.1). Within these categories,
an impact factor close to or higher than three is considered to be high. Therefore,
51% of the studies selected are considered to be published in high impact factor journals
(Table B.1): Agriculture, Ecosystems and Environment (1%), Molecular Breeding
(1%), European Journal of Agronomy (8%), Plant and Soil (13%) and Field Crops
Research (28%). These journals are placed in the top ten of their categories out of a
total of 78 journals in Agronomy, 57 in Agriculture Multidisciplinary and 195 in Plant
Science.
Most of the studies selected were carried out in Asia (72%); mainly China (30%),
India (17%) and the Philippines (10%). About 10%, 9%, 4% and 3% of the studies selected
were performed in Oceania, Europe, America and Africa, respectively. The remaining 2%
of the studies were carried out in several countries belonging to at least two continents
(Others), see Figure 1.2. This distribution is explained by our searching criteria, which
gave priority to well-recognised Asian and Australasian institutions.
The majority of the studies were published in the last 8 years (Figure 1.3, left).
This partially explains why 31% of the manuscripts have been cited fewer than 3 times
and 59% have fewer than 20 citations (Figure 1.3, right).
Figure 1.2: Distribution of the studies in the review by continents. [Others refers to
studies which present experiments carried out in at least two countries from different continents.]
Figure 1.3: Distribution of the studies in the review by the year of publication (left)
and the paper citation index (right).
1.4.2 Amount and format of grain yield and nitrogen uptake
data
There is a vast amount of experimental data on GY and NU collected from designed
field trials. The data are mostly presented in: 1) tables, which display the means of
GY , NU or IEN across treatments, or 2) scatter plots in which GY is plotted against
NU. In the latter, least-squares exponential or polynomial regressions are commonly
fitted to model the trend of GY on NU (Figure 1.4, left). Alternatively, two straight
envelope lines delimiting the points of the scatter plot are displayed (Figure 1.4, right).
1.4.3 Relationship between grain yield and nitrogen uptake
The trend of GY on NU usually follows a saturation curve. At low levels of NU , a small
increment of the latter sharply increases GY ; however, this effect flattens out at high
levels of NU (Figure 1.4, left). This pattern has been shown in Dobermann & Cassman
(2002); Naklang et al. (2006); Witt et al. (1999). This trend shape is due to the fact
that low levels of the N content in the plant limit GY (Xu et al., 2012). However, once
Figure 1.4: Typical scatter plots of grain yield and nitrogen uptake in wheat. [A
curvilinear regression has been fitted in the graph to the left (Takahashi et al., 2007). The open and
closed points refer to plots with and without compost application, respectively. In the graph to the right,
two lines delimiting the points of the scatter plot are presented (Liu et al., 2006). The circles and
triangles refer to plots with and without fertiliser application, respectively.]
the plant has accumulated enough N, other factors (e.g. other major or minor nutrients,
water or low rate of photosynthesis (Foulkes et al., 2009)) may be more limiting than
N. The flattening effect also arises because there is a maximum quantity of grain
that plants can produce (Fowler, 2003; Otteson et al., 2007). If the amount of N in
the plant is excessive, N may have a negative effect, decreasing grain production (Goyal
& Huffaker, 1984, Chapter 6). Therefore, the correlation between GY and NU changes
along the curve (Figure 1.4, left). The correlation reveals how the N taken
up is utilised for grain, and it changes depending on the environmental conditions,
agronomic practices and plant genotypes. We will return to this issue later in Chapters
3 and 4.
1.4.4 Common methods of analysis of grain yield and nitrogen
uptake field data and their limitations
The techniques most commonly used to analyse GY and NU data in agricultural
science journals can be divided into 1) statistical techniques and 2) biological quantitative
models. The limitations of these techniques in providing additional understanding of
the conversion of NU into GY are discussed below. The standard statistical software
packages are also detailed.
Statistical techniques:
• Least-squares regressions
This technique is used to model the trend of GY on NU. Around 16% of the studies
in our literature selection used simple least-squares polynomial or exponential
regressions. The main objective is to check whether both variables are related and
what type of relationship they have, e.g. linear, curvilinear or linear-plateau. However,
it is not clear whether the observed scatter (e.g. Figure 1.4) is a direct function of
the response of GY to NU and/or an overlay of growth processes across different
conditions. Furthermore, least-squares regressions of GY on NU ignore the
external factors affecting the relationship between both variables, which are of
great interest to farmers and have to be taken into account to understand the
utilisation of NU for GY. In addition, one of the necessary conditions for simple
least-squares regression is that the variance along the fitted line is constant
(homoscedasticity of the error). However, as observed in Figure 1.4 (left-hand side),
this requirement is not necessarily fulfilled.
• Marginal analyses on GY and NU data
These analyses are mainly used to test for differences between the means of GY
and NU across experimental treatments. Around 80% of the studies in our
selection performed marginal analyses of GY and NU. In the majority of these
studies the technique used is the univariate analysis of variance followed by a
post-hoc analysis, such as the Least Significant Difference test or Duncan’s
Multiple Range test. However, such analyses ignore the joint behaviour of the
variables, which poses a serious limitation on gaining an in-depth understanding
of the conversion of NU into GY.
• Analyses on IEN
These analyses are mainly applied to test whether there is any statistical difference
between the means of the ratio across treatments. Internal N use efficiency is
computed at each sampling unit (at which NU and GY are measured) and commonly
analysed by the univariate analysis of variance. Since the distribution
of the ratio of two jointly normal variables is a mixture² of non-normal heavy-tailed
distributions (Marsaglia, 1965, 2006), the analysis of the ratio may violate
the requirements for normality and homogeneity of error variances; thus,
inferential conclusions may not be reliable. Around 40% of the studies in our
literature selection carried out analyses on the ratio. Just four of them (Albrizio
et al., 2010; Giambalvo et al., 2010; Liu et al., 2011; Tetard-Jones et al., 2013)
reported having checked for deviations from the normality assumption and three
(Albrizio et al., 2010; Giambalvo et al., 2010; Liu et al., 2011) for homogeneity
of error variances.
Analyses of variance were mostly employed to assess the effect of different fertiliser
treatments on GY , NU or IEN . However, N utilisation for grain production depends
on the available N in soil rather than on the amount of N applied (Figure 1.1). The
fraction of available N can substantially differ from the applied N due to complex envi-
ronmental interactions between climate, soil and plant (Marschner & Marschner, 2012,
p. 315). Thus, plots receiving the same fertiliser treatment may present different levels
of available N, resulting in non-uniform (ill-defined) realisations of the treatments.
Special care is therefore required when designing such field trials, as well as tight
control of the agronomic practices, to ensure that treatment applications are as
consistent as possible. In addition, in order to better understand the interactions of
these fertiliser treatments with non-controlled environmental conditions in the field,
analyses complementary to the ones commonly used (e.g. linear mixed models) may be required.
Biological quantitative models:
Biological models like the Quantitative Evaluation of the Fertility of Tropical Soils
(QUEFTS) model (Janssen et al., 1990) or crop growth simulation models (see Cassman
et al. (1998, Section 6.5) for a review) describe internal processes in plants and are
based on theoretical equations derived from physical principles. Statistics is used only
to validate such models. Consequently, these models are regarded as deterministic and
are beyond the scope of this thesis. In our literature selection, 8% of the studies used
the QUEFTS model and 3%³ employed crop growth simulation models.
² See Chapter 3.
Statistical software
The statistical software packages used to analyse GY and NU, together with the
percentage of studies in our literature search employing them, are as follows: Statistical
Analysis System, SAS (34%) (SAS Institute, 2013), SPSS (8%) (SPSS, 1997),
Genstat (6%) (VSN International, 2012), IRRISTAT (5%) (IRRI, 1994), Excel (2%)
(Excel, 2010), R (2%) (R Core Team, 2012) and others (2%). The remaining 41% of the
studies did not provide any details on the statistical software used. This percentage
corresponds to studies which developed biological quantitative models (11%) or did
not mention the software employed (30%).
1.5 Objectives of the thesis
Statistical techniques currently used in plant and soil science publications to model GY
and NU data have serious limitations in contributing to a fundamental understanding
of the conversion process of NU into GY . In this thesis, I argue that a better approach
is to analyse GY and NU data jointly with bivariate analyses. For instance, bivariate
linear mixed models on (GY, NU) are more appropriate for comparing treatment
effects than univariate linear models on GY, NU or IEN. Recently, Ganesalingam
et al. (2013) proposed bivariate mixed models as an alternative to univariate mixed
models on a ratio for the analysis of plant survival data. As they showed, the bivariate
approach preserved the information on the original variables, allowed modelling of the
spatial correlation for each trait, better utilised the experimental data and increased
the accuracy of predictions for variety survival.
³ The percentages presented in this section do not sum to 100% because some studies
performed more than one type of analysis.
In designed field experiments, both GY and NU can be greatly affected by non-
controlled environmental conditions. These environmental conditions may often be
confounded with designed factors or interact with them resulting in non-uniform treat-
ments. This may complicate the interpretation of bivariate mixed model analyses. Our
hypothesis is that GY and NU field data collected across a range of environments can
reflect a heterogeneous population composed of subpopulations. Each of these
subpopulations (clusters) can be considered a separate environment. The identification
of such clusters and the close inspection of potential factors defining them can shed
additional light on the nitrogen utilisation process.
Finite mixture models of bivariate Gaussian distributions can be used to identify
such clusters and are proposed here as a complementary analysis to bivariate mixed
models for data collected in field experiments. The main benefits envisaged for the
analysis of the internal N efficiency traits are stated as follows. This approach 1)
preserves information on GY and NU , 2) acknowledges the change in the correlation
between GY and NU across environments (clusters), 3) avoids dealing with the ratio
distribution, 4) allows estimation of IEN within each cluster and 5) provides additional
insight into potential environmental conditions affecting the mechanism of N utilisation
for grain production. In terms of agronomic studies, the identification of clusters
is useful to determine samples which belong to the same environment. The identification
of the factors defining each of the environments could be useful to implement
agronomic practices which counteract potential adverse environmental conditions. For
instance, the lack of rain at a certain growth stage of the plant could be identified as
one of the potential factors affecting N utilisation. To counteract drought stress in
non-irrigated systems, farmers could cover the crops with straw to retain the moisture
of the soil. The identification of these environmental conditions is also important to
improve further experimental designs or to include them in further statistical models,
for instance, to predict the best sowing time according to the local conditions of the
field.
The objective of this thesis is to investigate the benefits of the bivariate mixture
methodology as a complementary analysis to bivariate mixed models for field trials
in the presence of strong environmental conditions. At this current stage, mixture
models should not be applied alone because GY and NU data are mostly
collected from designed field trials. Data from such trials may violate the independence
assumption of mixture models⁴. Despite this limitation, I believe that the technique
can be useful for identifying environmental factors which may overshadow, or interact
with, treatment effects. Thus, the combination of bivariate mixed and mixture models
can provide a more insightful interpretation of the conversion process of NU into GY
in field trials.
1.6 Thesis outline
In this chapter I have presented the biological background and the motivation for this
project. The remainder of the thesis is presented in four chapters.
Chapter 2 presents the distributional properties of ratios of jointly normal variables.
The objective of this chapter is to draw the attention of the reader to the pitfalls of
analysing ratios with simple statistical techniques. The expressions of the probability
density function (pdf) of these types of ratios are reviewed, highlighting the mixture
nature and possible heavy-tailedness of their distribution. I also review the main
estimators of the ratio of expected values and their properties. Finally, analytical and
graphical procedures for calculating confidence sets of the ratio of expected values are reviewed.
Chapter 3 sets out the theoretical fundamentals of finite mixture models of bivariate
Gaussian distributions and provides the theoretical framework of the methodology
of this thesis. I review the frequentist and Bayesian approaches for mixtures as well as
the difficulties encountered when fitting mixture models of Gaussian distributions with
heteroscedastic components and common recommendations for overcoming them.
⁴ See Chapter 3.
In Chapter 4, the methodology of bivariate mixture models together with bivariate
mixed models for the analysis of (NU , GY ) is demonstrated for a field study reported
in Naklang et al. (2006). The benefits of the bivariate analyses (mixture and mixed
models) are discussed in comparison with their respective univariate counterparts on
IEN . This chapter is presented as a manuscript, following the required format for
submission to the Australian and New Zealand Journal of Statistics.
Finally, Chapter 5 presents the general conclusions of the thesis. I argue that
ideally the methodology of bivariate mixture models should be applied in field surveys
to fully exploit the potential of the technique as a means for identifying environmental
factors affecting NU and GY. Simulation studies to assess the coverage of Fieller’s
confidence intervals (Fieller, 1954) of the ratio of expected values for each component
of the mixture are proposed as a future line of research. Other research gaps identified
during the development of the present project are discussed.
Chapter 2
Ratios of jointly normal variables
2.1 Introduction to the ratio of jointly normal variables
Ratios are commonly used in agricultural research to quantify one variable with respect
to another. For instance, the efficiency of N utilisation by plants is quantified by several
indices (Table 1.2). However, researchers in agriculture and biometricians are not
always aware of the distributional properties of ratios and choose to apply simple
statistical methods, which assume normality and homoscedasticity of error variance, to the
ratio observations. Violations of these assumptions may lead to unreliable inferences.
A ratio can take atypically large values if its denominator is close to zero. This
results in higher probabilities of outliers and in heavy-tailedness. In particular, if
(X, Y) are jointly normal, the probability density function (pdf) of the ratio R = Y/X
is a mixture¹ of two heavy-tailed distributions, one of the components being Cauchy
distributed (Marsaglia, 1965, 2006)².
¹ See Chapter 3.
² Further details on the study of Marsaglia (1965) were given in Marsaglia (2006).
The presence of a Cauchy component in the pdf of R results in the non-existence
of the expected value, E(Y/X), and the variance, Var(Y/X). Since E(Y/X) does not
exist, E(Y)/E(X) is used for inference purposes (e.g. Lai et al., 2004). The confidence
sets of E(Y)/E(X) consist of asymmetric bounded intervals, unbounded intervals or
the entire real line (Fieller, 1954; Von Luxburg & Franz, 2009). These solutions arise
in the following situations (Von Luxburg & Franz, 2009):
• The denominator is far from zero. Then, there are no problems in computing the
ratio and the confidence sets of E(Y )/E(X) are bounded intervals.
• The numerator is far from zero but the denominator is not. Then, the denomina-
tor can take positive or negative values, and E(Y )/E(X) can result in arbitrarily
large values of unknown sign. In this case, the confidence sets are in the form of
]−∞, q1] ∪ [q2,∞[ with q1 < 0 and q2 > 0. The only possibility for E(Y )/E(X)
to be in ]q1, q2[ is by having a small numerator or a large denominator, which is
not possible in this assumed situation.
• The denominator and the numerator can both take values close to zero. Then,
we have an indeterminacy of the type 0/0 and E(Y )/E(X) can take any value.
Thus, the confidence set is the entire real line.
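The three situations above can be reproduced numerically. The following is a minimal Python
sketch, assuming the confidence set is obtained in the usual Fieller form, i.e. as the set of
values θ satisfying (ȳ − θx̄)² ≤ c² Var(ȳ − θx̄). The function name, its arguments and the
critical value c are illustrative assumptions and are not notation taken from Fieller (1954)
or Von Luxburg & Franz (2009).

```python
import math


def fieller_confidence_set(xbar, ybar, vxx, vyy, vxy, crit):
    """Solve (ybar - theta*xbar)^2 <= crit^2 * Var(ybar - theta*xbar) for theta.

    vxx, vyy and vxy are the variances and covariance of the two sample means
    and crit is the chosen critical value (all hypothetical inputs).
    """
    # The inequality is the quadratic A*theta^2 - 2*B*theta + C <= 0.
    A = xbar ** 2 - crit ** 2 * vxx
    B = xbar * ybar - crit ** 2 * vxy
    C = ybar ** 2 - crit ** 2 * vyy
    disc = B ** 2 - A * C
    if A == 0:
        raise ValueError("degenerate case A == 0 is not covered by this sketch")
    if A > 0:
        # Denominator clearly away from zero: bounded interval [lower, upper].
        root = math.sqrt(max(disc, 0.0))
        return ("bounded", (B - root) / A, (B + root) / A)
    if disc > 0:
        # Denominator not away from zero, numerator is: ]-inf, q1] U [q2, inf[.
        root = math.sqrt(disc)
        return ("exclusive unbounded", (B + root) / A, (B - root) / A)
    # Neither numerator nor denominator away from zero: every theta is accepted.
    return ("entire real line", -math.inf, math.inf)


print(fieller_confidence_set(10, 20, 1, 4, 0, 2))  # denominator far from zero
print(fieller_confidence_set(1, 20, 1, 4, 0, 2))   # only the numerator far from zero
print(fieller_confidence_set(1, 1, 1, 1, 0, 2))    # both close to zero
```

In the first call the interval is (1.5, 2.67), which is not symmetric around the point
estimate 20/10 = 2, in line with the asymmetric bounded intervals mentioned above.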
This chapter reviews the distributional properties of the ratio R = Y/X, where (X, Y) is
a bivariate normal (BVN) random vector with the following expected value (µ)
and variance-covariance matrix (Σ):
$$\mu = (\mu_x, \mu_y); \qquad \Sigma = \begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix} \tag{2.1}$$
The properties that will be reviewed in this chapter are readily applied to the IEN
distribution. Recall from Section 1.2 that IEN is defined as the ratio of GY to NU .
In field trials, GY and NU are measured at harvest and cumulatively on each experi-
mental plot. Conditional on major factors and assuming that the plant measurements
are weakly correlated, it can be assumed that at the plot level both GY and NU are
normally distributed (see Central Limit Theorem in Cramer, 1946, p. 219). Under
the same assumptions, the Central Limit Theorem can be extended to the bivariate
case and one can consider (NU , GY ) to be jointly normal (see Cramer, 1946, p. 286).
The chapter is structured as follows. Firstly, I present the main results regarding
the pdf of R and review the cases for which this distribution is approximately normal.
Then, I discuss the most typical estimators of E(Y )/E(X) and their performance.
Finally, I review analytical and graphical procedures (Fieller’s rule and a geometric
approach) for deriving the confidence sets of E(Y )/E(X).
2.2 Distribution of the ratio: history and properties
2.2.1 Geary (1930) and Fieller (1932) expressions of the pdf
The research on ratios of two jointly normal variables goes back to the 1930s when
Geary (1930) formulated the analytical expression of the pdf of R for the particular
case of µ = (0, 0).
$$f_R(r) = \frac{\sigma_x\sigma_y\sqrt{1-\rho^2}}{\pi\left(\sigma_x^2 r^2 - 2\rho\sigma_x\sigma_y r + \sigma_y^2\right)} \tag{2.2}$$
where ρ is the correlation coefficient.
Since then, several authors have provided an expression of the pdf of R without re-
strictions on the means. The first to suggest a solution with µ being an arbitrary real
vector was Fieller (1932). The solution in Fieller (1932) was re-expressed in Hinkley
(1969):
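$$f_R(r) = \frac{b(r)\,d(r)}{\sqrt{2\pi}\,\sigma_y\sigma_x\,a^3(r)}\left[\Phi\!\left(\frac{b(r)}{\sqrt{1-\rho^2}\,a(r)}\right) - \Phi\!\left(-\frac{b(r)}{\sqrt{1-\rho^2}\,a(r)}\right)\right] + \frac{\sqrt{1-\rho^2}}{\pi\sigma_x\sigma_y a^2(r)}\exp\!\left(-\frac{c}{2(1-\rho^2)}\right) \tag{2.3}$$
where:
$$a(r) = \sqrt{\frac{r^2}{\sigma_y^2} - \frac{2\rho r}{\sigma_x\sigma_y} + \frac{1}{\sigma_x^2}}$$
$$b(r) = \frac{\mu_y r}{\sigma_y^2} - \frac{\rho(\mu_y + \mu_x r)}{\sigma_x\sigma_y} + \frac{\mu_x}{\sigma_x^2}$$
$$c = \frac{\mu_y^2}{\sigma_y^2} - \frac{2\rho\mu_x\mu_y}{\sigma_x\sigma_y} + \frac{\mu_x^2}{\sigma_x^2}$$
$$d(r) = \exp\!\left(\frac{b^2(r) - c\,a^2(r)}{2(1-\rho^2)\,a^2(r)}\right)$$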
and Φ and exp are the cumulative distribution function (cdf) of the standard univariate
normal and the exponential function, respectively.
Fieller (1932), Geary (1930) and Hinkley (1969) derived the pdf of R by performing
the following change of variable:
$$X = X \quad\text{and}\quad Y = RX, \qquad f_{X,R}(x, r) = \phi_{X,Y}(x, rx \mid \mu, \Sigma)\,|x|.$$
The marginal density fR(r) is calculated by integrating fX,R with respect to x. Notice
that if µx = µy = 0, Eq. 2.3 reduces to Eq. 2.2. Furthermore, if ρ = 0 and σx = σy, Eq.
2.2 is the standard Cauchy distribution (Pham-Gia et al., 2006). The latter example
shows how much the distribution of R can differ from a normal.
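The Cauchy special case can be checked directly from Eq. 2.2. The sketch below is a small
Python illustration with hypothetical function names. It also uses the observation (not
stated in the text, but obtained by completing the square in the denominator of Eq. 2.2)
that Eq. 2.2 is a Cauchy density with location ρσy/σx and scale (σy/σx)√(1 − ρ²).

```python
import math


def geary_pdf(r, sigma_x, sigma_y, rho):
    """Eq. 2.2: pdf of R = Y/X for a zero-mean bivariate normal (X, Y)."""
    num = sigma_x * sigma_y * math.sqrt(1.0 - rho ** 2)
    den = math.pi * (sigma_x ** 2 * r ** 2
                     - 2.0 * rho * sigma_x * sigma_y * r
                     + sigma_y ** 2)
    return num / den


def cauchy_pdf(r, location=0.0, scale=1.0):
    """Cauchy pdf with the given location and scale."""
    z = (r - location) / scale
    return 1.0 / (math.pi * scale * (1.0 + z ** 2))


# rho = 0 and sigma_x = sigma_y: Eq. 2.2 coincides with the standard Cauchy.
for r in (-3.0, 0.0, 0.5, 10.0):
    assert math.isclose(geary_pdf(r, 2.0, 2.0, 0.0), cauchy_pdf(r))

# General zero-mean case: shifted and rescaled Cauchy (illustrative parameter values).
sx, sy, rho = 4.0, 5.0, 0.8
loc, scale = rho * sy / sx, (sy / sx) * math.sqrt(1.0 - rho ** 2)
for r in (-3.0, 0.0, 0.5, 10.0):
    assert math.isclose(geary_pdf(r, sx, sy, rho), cauchy_pdf(r, loc, scale))
```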
The expressions of the pdf of R in Fieller (1932) and Hinkley (1969) are complex
and do not give much insight into its distributional shape. Marsaglia (1965,
2006) expressed this pdf as a mixture of two heavy-tailed distributions. Marsaglia’s
expression provides more intuition about the shape of the distribution, which can differ
greatly from the normal and can present skewness, bimodality and heavy tails.
2.2.2 Marsaglia (1965, 2006) expression of the pdf
The motivating problem for Marsaglia (1965) was a regression analysis to model the
number of red cells (y) in blood against time (r), y = α + βr. The r-intercept
(r = −α/β) was used to estimate the red cell life span. Thus, the distribution of the
r-intercept and its expected value were of medical interest for detecting anomalies in
patients with a red cell life span shorter than expected. For modelling purposes,
Marsaglia (1965) considered the r-intercept to be the ratio of two normal variables.
Marsaglia (1965, 2006) proved that R can always be transformed, by a translation
and a change of scale, into a ratio of the form T = (a + U)/(b + V), where U and V are
independent standard normal variables and a and b are non-negative constants. Thus,
instead of R, it is sufficient to study ratios of the form T = (a + U)/(b + V). The results in
Marsaglia (1965, 2006) are summarised in Theorems 1 and 2.
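Theorem 1. Let (X, Y) ∼ BVN(µ, Σ), Eq. 2.1. Then, ∃ s, w, a and b such that:
$$w\left(\frac{Y}{X} - s\right) = w\left(\frac{Y - sX}{X}\right) \sim \frac{a+U}{b+V} \;\leftrightarrow\; \frac{Y}{X} \sim \frac{1}{w}\left(\frac{a+U}{b+V}\right) + s$$
where U and V are two independent standard normal variables. The values of w, s, a
and b are given by
$$s = \rho\sigma_y/\sigma_x, \qquad w = \frac{\sigma_x}{\mp\sigma_y\sqrt{1-\rho^2}} \tag{2.4}$$
$$a = \mp\frac{\mu_y/\sigma_y - \rho\mu_x/\sigma_x}{\sqrt{1-\rho^2}}, \qquad b = \mu_x/\sigma_x \tag{2.5}$$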
Proof. (Sketch) Firstly, the value of s (translation) is selected such that (Y − sX)/X
is the ratio of two independent normal variables. Therefore,
$$E((Y - sX)X) = E(Y - sX)E(X) \;\leftrightarrow\; E(YX) - sE(X^2) = E(Y - sX)E(X) \;\leftrightarrow$$
$$\rho\sigma_x\sigma_y + \mu_x\mu_y - s(\sigma_x^2 + \mu_x^2) = \mu_x\mu_y - s\mu_x^2 \;\leftrightarrow\; \rho\sigma_x\sigma_y - s\sigma_x^2 = 0 \;\leftrightarrow\; s = \frac{\rho\sigma_y}{\sigma_x}$$
Secondly, the value of w (change of scale) is chosen such that Y − sX and X have
standard deviations equal to 1. Thus, w = ±√(Var(X))/√(Var(Y − sX)). Finally, the
choice of a and b is straightforward by considering that U ∼ N(0, 1) ↔ a + U ∼ N(a, 1),
where N denotes the univariate normal distribution. Consequently, a and b are chosen
to be the expected values of (Y − sX)/√(Var(Y − sX)) and X/√(Var(X)), respectively.
For more details on the proof, refer to Marsaglia (2006).
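The constants of Theorem 1 are simple to compute. The following Python sketch is an
illustration only: the function name is hypothetical, the common sign in Eq. 2.4 and Eq. 2.5
is chosen so that a is non-negative, and µx ≥ 0 is assumed so that b is non-negative as well.

```python
import math


def marsaglia_constants(mu_x, mu_y, sigma_x, sigma_y, rho):
    """Return (s, w, a, b) of Eq. 2.4 and Eq. 2.5 for R = Y/X."""
    root = math.sqrt(1.0 - rho ** 2)
    s = rho * sigma_y / sigma_x            # translation: makes Y - sX independent of X
    w = sigma_x / (sigma_y * root)         # change of scale
    a = (mu_y / sigma_y - rho * mu_x / sigma_x) / root
    if a < 0:                              # flip both signs together so that a >= 0
        a, w = -a, -w
    b = mu_x / sigma_x                     # assumes mu_x >= 0 (assumption of this sketch)
    return s, w, a, b


# Parameters of the right-hand panel of Figure 2.1: (mu_x, mu_y, sigma_x, sigma_y, rho).
s, w, a, b = marsaglia_constants(30.0, 31.0, 4.0, 5.0, 0.8)
print(s, w, a, b)

# Independence condition used in the proof: Cov(Y - sX, X) = rho*sx*sy - s*sx^2 = 0.
print(0.8 * 4.0 * 5.0 - s * 4.0 ** 2)
```

For these parameters the sketch gives s = 1, w = 4/3, a = 1/3 and b = 7.5, up to floating
point rounding.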
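Theorem 2. The pdf of the ratio T = (a + U)/(b + V) is a mixture of two heavy-tailed
distributions given by:
$$f(t) = p f_1(t) + (1-p) f_2(t) \tag{2.6}$$
where:
$$f_1(t) = \frac{1}{\pi(1+t^2)} \quad\text{and}\quad f_2(t) = \frac{q\int_0^q \exp\!\left(-\frac{x^2 - q^2}{2}\right)dx}{\pi(1+t^2)\left(\exp\!\left(\frac{a^2+b^2}{2}\right) - 1\right)}$$
with:
$$p = \exp\!\left(-\frac{a^2+b^2}{2}\right) \quad\text{and}\quad q = \frac{b + at}{\sqrt{1+t^2}}$$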
Proof. (Sketch) The cdf of T = (a + U)/(b + V), F(), is related to the function of Nicholson
(Nicholson, 1943), denoted here by H(), as follows:
$$F(t) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}(t) + 2H\!\left(\frac{bt - a}{\sqrt{1+t^2}}, \frac{b + at}{\sqrt{1+t^2}}\right) - 2H(b, a) \tag{2.7}$$
where:
$$H(q, h) = \int_0^q\int_0^{hx/q}\varphi(x)\varphi(y)\,dy\,dx$$
and ϕ is the standard normal pdf.
Then, the pdf of T = (a + U)/(b + V) is obtained by taking the derivative of F(t) (Eq. 2.7) with
respect to t. For more details, refer to Marsaglia (1965).
The pdf f(t) (Eq. 2.6) can substantially differ from a normal. Notice that f1(t)
is the standard Cauchy pdf. Thus, the moments of f(t) do not exist. However, if b > 4,
Marsaglia (2006) provides approximate values of the mean (µ) and the variance (σ²) for
practical purposes:
$$\mu = \frac{a}{1.01b - 0.2713}, \qquad \sigma^2 = \frac{a^2 + 1}{b^2 + 0.108b - 3.795} - \mu^2$$
The pdf f(t) can be skewed or even bimodal depending on the values of a and b. Once
the pdf of T is obtained, it is straightforward to calculate the pdf of R by performing
the change of variable t = w(r − s), which leads to fR(r) = |w|fT (w(r − s)). An
illustration of possible shapes of the pdf of R is shown in Figure 2.1.
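The density of Theorem 2 and the change of variable to R can be evaluated with a few lines
of code. The following is a minimal Python sketch with hypothetical function names. It
assumes the two mixture terms of Eq. 2.6 are added into a single expression and that the
integral in f2 is written through the error function; the trapezoid rule and the integration
range are illustrative choices only.

```python
import math


def pdf_T(t, a, b):
    """Eq. 2.6, p*f1(t) + (1 - p)*f2(t), for T = (a + U)/(b + V)."""
    q = (b + a * t) / math.sqrt(1.0 + t * t)
    # Integral of exp(-x^2/2) over [0, q], written with the error function.
    integral = math.sqrt(math.pi / 2.0) * math.erf(q / math.sqrt(2.0))
    cauchy_term = math.exp(-(a * a + b * b) / 2.0)
    # q^2 <= a^2 + b^2, so the exponent below is never positive (no overflow).
    other_term = q * math.exp((q * q - a * a - b * b) / 2.0) * integral
    return (cauchy_term + other_term) / (math.pi * (1.0 + t * t))


def pdf_R(r, s, w, a, b):
    """Change of variable t = w(r - s): f_R(r) = |w| f_T(w(r - s))."""
    return abs(w) * pdf_T(w * (r - s), a, b)


def trapezoid(f, lo, hi, n):
    """Composite trapezoid rule with n sub-intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


# Constants of the right-hand panel of Figure 2.1 (a = 1/3, b = 7.5, so b > 4).
a, b = 1.0 / 3.0, 7.5
area = trapezoid(lambda t: pdf_T(t, a, b), -5.0, 5.0, 20000)
mean_numeric = trapezoid(lambda t: t * pdf_T(t, a, b), -5.0, 5.0, 20000)
mean_approx = a / (1.01 * b - 0.2713)   # approximation quoted from Marsaglia (2006)
print(area, mean_numeric, mean_approx)
```

The numerical mean is computed over a finite range only, since the exact mean does not
exist; it is meant as a practical counterpart of the approximation above.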
Figure 2.1: Different shapes of the probability density function of the ratio of two
jointly normal variables. [The parameters (µx, µy, σx, σy, ρ) used are (2, 38, 8, 24, 0.16) (left),
(2, 20, 8, 24, 0.66) (middle) and (30, 31, 4, 5, 0.8) (right).]
2.2.3 Pham-Gia et al. (2006) expression of the pdf
The pdf of R has been studied for the last 80 years and it is still an active field of
research. Recently, Pham-Gia et al. (2006) have provided a closed form of the pdf of R
in terms of Hermite functions. The pdf is firstly formulated for ratios of independent
normal variables (Theorem 3). Then, by using Theorem 1, it is possible to derive the
pdf of the ratio of two dependent jointly normal variables (Theorem 4).
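Theorem 3. Let (X, Y) be a random vector of two independent normal variables with
parameters as in Eq. 2.1 with σxy = 0. The pdf of R = Y/X is given by:
$$f_R(r) = \frac{C_1}{\sigma_y^2 + r^2\sigma_x^2}\left[H_{-2}(s(r)) + H_{-2}(-s(r))\right] \tag{2.8}$$
where:
$$C_1 = \frac{\sigma_x\sigma_y}{\pi}\exp\!\left(-\frac{1}{2}\left(\frac{\mu_y^2}{\sigma_y^2} + \frac{\mu_x^2}{\sigma_x^2}\right)\right)$$
$$s(r) = \frac{1}{\sigma_x\sigma_y}\,\frac{\sigma_x^2\mu_y r + \sigma_y^2\mu_x}{\sqrt{2(\sigma_x^2 r^2 + \sigma_y^2)}}$$
$$H_{-2}(z) = \int_0^\infty t\exp\!\left(-t^2 - 2tz\right)dt \quad\text{(a particular type of Hermite function)}$$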
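Proof. Firstly, the change of variable X = X and Y = RX is performed. Then,
$f_{X,R}(x, r) = |x| f_X(x) f_Y(rx)$, where $f_X$ and $f_Y$ are pdfs of univariate normal
distributions. Secondly, $f_{X,R}$ is reparametrized taking $\varepsilon_1 = (2\sigma_x^2)^{-1}$,
$\eta_1 = -\mu_x/\sigma_x^2$, $\varepsilon_2 = (2\sigma_y^2)^{-1}$ and $\eta_2 = -\mu_y/\sigma_y^2$,
which yields the following expressions:
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma_x^2}}\exp\!\left(-\frac{x^2 - 2x\mu_x + \mu_x^2}{2\sigma_x^2}\right) = \sqrt{\frac{\varepsilon_1}{\pi}}\exp\!\left(-\varepsilon_1 x^2 - x\eta_1\right)\exp\!\left(-\frac{\eta_1^2}{4\varepsilon_1}\right)$$
$$f_Y(rx) = \frac{1}{\sqrt{2\pi\sigma_y^2}}\exp\!\left(-\frac{r^2x^2 - 2xr\mu_y + \mu_y^2}{2\sigma_y^2}\right) = \sqrt{\frac{\varepsilon_2}{\pi}}\exp\!\left(-\varepsilon_2 r^2x^2 - xr\eta_2\right)\exp\!\left(-\frac{\eta_2^2}{4\varepsilon_2}\right)$$
$$f_{X,R}(x, r) = |x| f_X(x) f_Y(rx) = |x|\frac{\sqrt{\varepsilon_1\varepsilon_2}}{\pi}\exp\!\left(-\frac{\eta_1^2}{4\varepsilon_1} - \frac{\eta_2^2}{4\varepsilon_2}\right)\exp\!\left(-(\varepsilon_1 + \varepsilon_2 r^2)x^2 - x(\eta_1 + \eta_2 r)\right)$$
$$= |x| K\exp\!\left(-(\varepsilon_1 + \varepsilon_2 r^2)x^2 - x(\eta_1 + \eta_2 r)\right)$$
where:
$$K = \frac{\sqrt{\varepsilon_1\varepsilon_2}}{\pi}\exp\!\left(-\frac{\eta_1^2}{4\varepsilon_1} - \frac{\eta_2^2}{4\varepsilon_2}\right)$$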
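The next step is to integrate $f_{X,R}$ with respect to x.
$$\begin{aligned}
f_R(r) &= \int_{-\infty}^{\infty}|x| f_X(x) f_Y(rx)\,dx \\
&= \int_{-\infty}^{0} -x f_X(x) f_Y(rx)\,dx + \int_0^\infty x f_X(x) f_Y(rx)\,dx \\
&= \int_0^\infty x f_X(-x) f_Y(-rx)\,dx + \int_0^\infty x f_X(x) f_Y(rx)\,dx \\
&= \int_0^\infty xK\exp\!\left(-(\varepsilon_1 + \varepsilon_2 r^2)x^2 + x(\eta_1 + \eta_2 r)\right)dx
 + \int_0^\infty xK\exp\!\left(-(\varepsilon_1 + \varepsilon_2 r^2)x^2 - x(\eta_1 + \eta_2 r)\right)dx
\end{aligned}$$
Taking $t = x\sqrt{\varepsilon_1 + \varepsilon_2 r^2}$ yields:
$$f_R(r) = \int_0^\infty \frac{Kt}{\varepsilon_1 + \varepsilon_2 r^2}\exp\!\left(-t^2 + 2t\,\frac{\eta_1 + \eta_2 r}{2\sqrt{\varepsilon_1 + \varepsilon_2 r^2}}\right)dt
 + \int_0^\infty \frac{Kt}{\varepsilon_1 + \varepsilon_2 r^2}\exp\!\left(-t^2 - 2t\,\frac{\eta_1 + \eta_2 r}{2\sqrt{\varepsilon_1 + \varepsilon_2 r^2}}\right)dt$$
Considering the definition of $H_{-2}(z)$, it becomes evident that:
$$f_R(r) = \frac{K}{\varepsilon_1 + \varepsilon_2 r^2}\left[H_{-2}(-s(r)) + H_{-2}(s(r))\right]$$
with:
$$s(r) = \frac{1}{2}\,\frac{\eta_1 + r\eta_2}{\sqrt{\varepsilon_1 + r^2\varepsilon_2}}$$
Substituting ε1, ε2, η1 and η2 by their expressions in terms of σx, σy, µx and µy, the
result follows (Pham-Gia et al., 2006).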
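Theorem 4. Let (X, Y) ∼ BVN(µ, Σ), Eq. 2.1. The pdf of R = Y/X is given by:
$$f_R(r) = \frac{C_2}{\sigma_x^2 r^2 - 2\rho\sigma_x\sigma_y r + \sigma_y^2}\left[H_{-2}(l(r)) + H_{-2}(-l(r))\right]$$
where:
$$C_2 = \frac{\sigma_x\sigma_y\sqrt{1-\rho^2}}{\pi}\exp\!\left(-\frac{\sigma_x^2\mu_y^2 - 2\rho\sigma_x\sigma_y\mu_x\mu_y + \mu_x^2\sigma_y^2}{2(1-\rho^2)\sigma_x^2\sigma_y^2}\right)$$
$$l(r) = \frac{-\sigma_x^2\mu_y r + \rho\sigma_y\sigma_x(\mu_x r + \mu_y) - \mu_x\sigma_y^2}{\sqrt{2\sigma_x^2\sigma_y^2(1-\rho^2)\left(\sigma_x^2 r^2 - 2\rho\sigma_y\sigma_x r + \sigma_y^2\right)}}$$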
Proof. (Sketch) Pham-Gia et al. (2006) suggested following the same steps as in Theorem 3.
Alternatively, the proof can be done by considering T = (a + U)/(b + V) (see Theorem 1)
and its pdf as given by Theorem 3:
$$f_T(t) = \frac{1}{\pi(1+t^2)}\exp\!\left(-\frac{a^2+b^2}{2}\right)\left(H_{-2}\!\left(\frac{at+b}{\sqrt{2(t^2+1)}}\right) + H_{-2}\!\left(-\frac{at+b}{\sqrt{2(t^2+1)}}\right)\right)$$
By Theorem 1, R is distributed as T/w + s, with w and s given in Eq. 2.4. Then,
performing the change of variable t = (r − s)w, the pdf of R is given by:
$$\begin{aligned}
f_R(r) &= |w| f_T[(r-s)w] \\
&= \frac{|w|\exp\!\left(-\frac{a^2+b^2}{2}\right)}{\pi\left(1 + (r-s)^2w^2\right)}\left(H_{-2}\!\left(\frac{a(r-s)w + b}{\sqrt{2\left((r-s)^2w^2 + 1\right)}}\right) + H_{-2}\!\left(-\frac{a(r-s)w + b}{\sqrt{2\left((r-s)^2w^2 + 1\right)}}\right)\right)
\end{aligned}$$
Substituting a, b, s and w by their expressions with respect to µx, µy, σx, σy and ρ,
the result follows.
It can be shown that the pdf in Eq. 2.8 is equivalent to the one in Eq. 2.6 (see
Appendix C). Furthermore, Lemma 1 in Pham-Gia et al. (2006) proves that
H−2(z) + H−2(−z) = F1(1, 1/2, z²), where F1(α, γ, z) is defined as:
$$F_1(\alpha, \gamma, z) = \sum_{k=0}^{\infty}\frac{(\alpha, k)}{(\gamma, k)}\frac{z^k}{k!}, \qquad \gamma \neq 0, -1, -2, \ldots \tag{2.9}$$
and (α, k) = α(α + 1) . . . (α + k − 1).
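The identity of Lemma 1 can be checked numerically. The sketch below is an illustration in
Python with hypothetical function names; it assumes the convention (α, 0) = 1 for the k = 0
term of Eq. 2.9, truncates the series after a fixed number of terms, and evaluates H−2 with
a plain trapezoid rule on a finite range, which is adequate only for moderate values of z.

```python
import math


def kummer_f1(alpha, gamma, z, terms=200):
    """Truncated series of Eq. 2.9: sum of (alpha, k)/(gamma, k) * z^k / k!."""
    total, term = 0.0, 1.0                 # term for k = 0, taking (alpha, 0) = 1
    for k in range(terms):
        total += term
        # Ratio of consecutive terms: (alpha + k)/(gamma + k) * z/(k + 1).
        term *= (alpha + k) / (gamma + k) * z / (k + 1)
    return total


def hermite_minus2(z, upper=12.0, n=24000):
    """H_{-2}(z): integral over [0, inf) of t*exp(-t^2 - 2tz), truncated at `upper`."""
    h = upper / n
    total = 0.0
    for i in range(1, n):                  # integrand is 0 at t = 0 and negligible at upper
        t = i * h
        total += t * math.exp(-t * t - 2.0 * t * z)
    return total * h


for z in (0.0, 0.7, 1.5):
    lhs = hermite_minus2(z) + hermite_minus2(-z)
    rhs = kummer_f1(1.0, 0.5, z * z)
    print(z, lhs, rhs)
```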
The advantage of expressing the pdf of T = (a + U)/(b + V) as
$$f_T(t) = \frac{C_1}{1+t^2}F_1\!\left(1, 1/2, s^2(t)\right), \quad\text{with}\quad C_1 = \frac{\exp\!\left(-(a^2+b^2)/2\right)}{\pi} \quad\text{and}\quad s(t) = \frac{at+b}{\sqrt{2(t^2+1)}},$$
is that it allows one to calculate analytically the first derivative of fT(t) and study
its sign for different positive values of a and b. By doing so, Pham-Gia et al. (2006)
obtained the following results:
1. fT(t) always has a mode in ]0, a/b].
2. If a = 0, or b = 0 and a ≤ 1, fT(t) is unimodal.
3. If b = 0 and a > 1, fT(t) is bimodal with symmetric modes.
4. If b > 0 and a > 0, fT(t) can be bimodal or unimodal. In the case of bimodality,
the second mode belongs to ]−∞, −b/a[.
For further details, refer to Section 5 in Pham-Gia et al. (2006).
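Results 2 and 3 above can be illustrated by counting the local maxima of fT(t) on a grid.
The following Python sketch is only a numerical illustration, not the analytical argument
of Pham-Gia et al. (2006); the function names, the grid range and the grid step are
arbitrary assumptions.

```python
import math


def pdf_T(t, a, b, terms=100):
    """f_T(t) = C1/(1 + t^2) * F1(1, 1/2, s(t)^2), using the series of Eq. 2.9."""
    z = (a * t + b) ** 2 / (2.0 * (t * t + 1.0))     # z = s(t)^2
    series, term = 0.0, 1.0
    for k in range(terms):
        series += term
        term *= z / (0.5 + k)      # (1, k)/k! = 1, so the k-th term is z^k/(1/2, k)
    c1 = math.exp(-(a * a + b * b) / 2.0) / math.pi
    return c1 * series / (1.0 + t * t)


def count_modes(a, b, half_width=20.0, step=0.01):
    """Count local maxima of f_T on a symmetric grid around zero."""
    n = int(round(half_width / step))
    values = [pdf_T(i * step, a, b) for i in range(-n, n + 1)]
    modes = 0
    for i in range(1, len(values) - 1):
        if values[i] > values[i - 1] and values[i] >= values[i + 1]:
            modes += 1
    return modes


print(count_modes(0.5, 0.0))   # b = 0 and a <= 1
print(count_modes(2.0, 0.0))   # b = 0 and a > 1
print(count_modes(0.0, 1.0))   # a = 0
```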
2.3 Normal approximation of the pdf of the ratio
The shape of the pdf of R can greatly differ from the normal in terms of skewness,
bimodality and/or heavy tails (see Figure 2.1). However, if the coefficient of variation of
the denominator (CVx = σx/µx) tends to zero, the cdf of R, F(r), can be approximated
by the cdf of a normal (Fieller, 1932; Hinkley, 1969) as in Eq. 2.10.
$$\left|F(r) - \Phi\!\left(\frac{\mu_x r - \mu_y}{\sigma_y\sigma_x a(r)}\right)\right| \le \Phi\!\left(-\frac{\mu_x}{\sigma_x}\right) \tag{2.10}$$
with a(r) defined in Eq. 2.3. This is illustrated in Figure 2.2. However, the limit-
ing value of CVx required for having a satisfactory normal approximation cannot be
determined exactly because it depends on CVy and ρ (Shanmugalingam, 1982). For
instance, Figure 2.3 displays two different shapes of the pdf of R with the same value of
CVx (0.13) and two different values of CVy (0.12 and 0.33). The ratio with CVy = 0.12
has an approximate normal shape whereas the ratio with CVy = 0.33 presents negative
skewness.
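The bound in Eq. 2.10 can be illustrated by simulation. The sketch below, in Python, is a
hedged illustration: the parameter values, the sample size and the seed are arbitrary
assumptions, and the product σxσy a(r) is written out as
√(σx²r² − 2ρσxσy r + σy²), which follows from the definition of a(r) in Eq. 2.3.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cdf through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def approx_cdf_ratio(r, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Normal approximation of F(r) appearing in Eq. 2.10."""
    scale = math.sqrt(sigma_x ** 2 * r ** 2
                      - 2.0 * rho * sigma_x * sigma_y * r
                      + sigma_y ** 2)
    return normal_cdf((mu_x * r - mu_y) / scale)


def empirical_cdf_ratio(r, mu_x, mu_y, sigma_x, sigma_y, rho, n, seed):
    """Monte Carlo estimate of P(Y/X <= r) for a bivariate normal (X, Y)."""
    rng = random.Random(seed)
    root = math.sqrt(1.0 - rho ** 2)
    hits = 0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mu_x + sigma_x * z1
        y = mu_y + sigma_y * (rho * z1 + root * z2)
        if y / x <= r:
            hits += 1
    return hits / n


params = (3.0, 2.0, 1.0, 1.0, 0.5)          # (mu_x, mu_y, sigma_x, sigma_y, rho), CVx = 1/3
approx = approx_cdf_ratio(0.8, *params)
empirical = empirical_cdf_ratio(0.8, *params, n=100000, seed=1)
bound = normal_cdf(-params[0] / params[2])  # right-hand side of Eq. 2.10
print(approx, empirical, bound)
```

The empirical value carries Monte Carlo error, so the comparison with the bound should allow
for a small simulation margin.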
Marsaglia (2006) provides a list of specific conditions for the normal approximation
of the pdf of R. Marsaglia (2006) suggests that ratios of the form T = (a + U)/(b + V)
are approximately normal only when a < 2.25 and b > 4 (Eq. 2.5) (see Figure 2.1, right).
The study in Pham-Gia et al. (2006) is focused on discerning the cases for which
the pdf of R is unimodal or bimodal, rather than on investigating the goodness of its
approximation by a normal distribution.
Figure 2.2: Probability density function of the ratio of two jointly normal variables for
different values of the coefficient of variation of the denominator (CVx)
Figure 2.3: Probability density function of the ratio of two jointly normal variables for
different values of the coefficient of variation of the numerator (CVy) but same CVx
2.4 Estimators of the ratio
2.4.1 Point estimators: average of ratios, ratio of averages
As previously stated, the Cauchy component in the distribution of R leads to the
non-existence of E(Y/X) (Marsaglia, 1965, 2006). Instead, E(Y )/E(X) is usually
considered for inference purposes (e.g. Lai et al., 2004). In particular, in agriculture,
the two main estimators of E(Y )/E(X) are (Qiao et al., 2006):
$$R_A = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{x_i} \tag{2.11}$$
$$R_W = \bar{y}/\bar{x} = \sum_{i=1}^{n} y_i \Big/ \sum_{i=1}^{n} x_i \tag{2.12}$$
The estimator RA is greatly affected by atypically large values of yi/xi, produced when
xi is close to zero. The considerable variation of RA in comparison to RW is illustrated
in Figure 2.4, which shows the distribution of 60 bootstrap samples of size 100 for RA
(left) and RW (right). Each bootstrap sample was generated from a bivariate normal
with µx = µy = 2, σx = σy = 1 and ρ = 0.7.
Figure 2.4: Bootstrap distribution of the estimators defined in Eq. 2.11 (left) and Eq.
2.12 (right). [Sixty bootstrap samples of size 100 with µx = µy = 2, σx = σy = 1 and ρ = 0.7.]
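The comparison behind Figure 2.4 can be sketched in a few lines. The following Python code is
a minimal illustration, assuming that each of the 60 samples of size 100 is drawn directly
from the stated bivariate normal; the function names and the seed are hypothetical and the
output is not meant to reproduce the figure exactly.

```python
import math
import random


def simulate_estimators(n_rep=60, n=100, mu=2.0, sigma=1.0, rho=0.7, seed=2014):
    """Return the values of R_A (Eq. 2.11) and R_W (Eq. 2.12) over n_rep samples."""
    rng = random.Random(seed)
    root = math.sqrt(1.0 - rho ** 2)
    ra_values, rw_values = [], []
    for _ in range(n_rep):
        xs, ys = [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            xs.append(mu + sigma * z1)
            ys.append(mu + sigma * (rho * z1 + root * z2))
        ra_values.append(sum(y / x for x, y in zip(xs, ys)) / n)   # average of ratios
        rw_values.append(sum(ys) / sum(xs))                        # ratio of averages
    return ra_values, rw_values


def spread(values):
    """Range of the simulated values, used here as a crude measure of variability."""
    return max(values) - min(values)


ra, rw = simulate_estimators()
print(spread(ra), spread(rw))
```

The range is used rather than the variance because, as discussed in this section, the
variance of the estimators does not exist in theory.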
Notice that if (xi, yi), i = 1, . . . , n, are i.i.d. according to BVN(µ, Σ), Eq. 2.1, RA
and RW are a (scaled) sum of ratios of jointly normal variables and a ratio of jointly normal
variables, respectively. By Marsaglia (1965, 2006), the expected values and the variances
of RA and RW do not exist and no direct comparison of their bias and efficiency
can be made. Although theoretically neither the expected value nor the variance of
either estimator exists, for practical purposes RW is suggested as being less sensitive to
values xi → 0, which also results in a less variable estimator in comparison to RA (see Figure
2.4 and Qiao et al. (2006)).
The performance of RA and RW in estimating E(Y )/E(X) when X and Y are independent
and normally distributed is studied in Qiao et al. (2006). Their numerical
experiments showed that RA and RW can be used when $CV_X = \sigma_x/\mu_x < 0.2$ and
$CV_{\bar{X}} = (\sigma_x/\sqrt{n})/\mu_x < 0.2$, respectively. The advantage of using RW is that
$CV_{\bar{X}}$ depends on the sample size (n). Thus, by choosing $n > 25\sigma_x^2/\mu_x^2$,
$CV_{\bar{X}}$ can always be made less than 0.2 (Qiao et al., 2006). Similar numerical
experiments to the ones in Qiao et al. (2006) could be carried out when ρ ≠ 0, but to
the best of our knowledge this has not been done yet.
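The sample-size rule above is simple arithmetic, and a short sketch makes it concrete. The function name and the `cv_max` argument are hypothetical; the threshold of 0.2 is the one quoted from Qiao et al. (2006).

```python
import math

def min_sample_size(mu_x, var_x, cv_max=0.2):
    """Smallest integer n for which (sigma_x / sqrt(n)) / |mu_x| falls below cv_max.

    Uses n > var_x / (cv_max * mu_x)^2, which is n > 25 * var_x / mu_x^2 for cv_max = 0.2.
    Values that land exactly on the boundary may be affected by floating-point rounding.
    """
    bound = var_x / (cv_max * mu_x) ** 2
    return math.floor(bound) + 1

print(min_sample_size(2.0, 1.0))  # 7, since 25 * 1 / 4 = 6.25
```

For the simulation settings of Figure 2.4 (mean 2, standard deviation 1), seven observations are therefore enough to bring the coefficient of variation of the mean of X below 0.2.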
The estimation of ratios is also of interest in sample surveys. If it is reasonable
to assume a linear relationship (Y = RX) between Y , the variable of interest, and
X, an auxiliary variable, we can estimate the mean or the total of Y with greater
precision by taking advantage of the correlation between both variables. The use of
ratio estimators in survey problems is discussed in Chapter 6 of Cochran (1977). In
particular, we refer to p. 175 for a discussion of the bias correction of RA when used to
estimate E(Y )/E(X) and E(X) is known. However, the estimation of ratios in sample
surveys is quite different to the objective of this thesis where we are interested in the
estimation of the ratio per se rather than in Y .
2.4.2 Confidence sets of the ratio of expected values
In this section I review two equivalent procedures for calculating the confidence sets
of E(Y )/E(X). The first procedure was derived by Fieller (1954) and is the most
common among statisticians (Jarret 2012, pers. comm., December). The second one
was derived in Von Luxburg & Franz (2009) and provides a geometrical insight to
Fieller’s rule.
2.4.2.1 Fieller’s Theorem
Theorem 5. Let (X, Y ) ∼ BV N(µ,Σ), Eq. 2.1. The exact confidence set (S) of
α = E(Y )/E(X) has three possible configurations: 1) completely unbounded confidence
sets, 2) exclusive unbounded confidence sets and 3) bounded intervals (Fieller, 1954).
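Proof. Here we develop the proof given in Casella & Berger (2002, p.464) in detail.
Let the paired variables (Xi, Yi), i = 1 . . . n, be independent and identically distributed
(i.i.d.) according to BV N(µ,Σ), Eq. 2.1, and let $\hat{\mu}_x$ and $\hat{\mu}_y$ denote the sample means.
Let us define $\hat{\theta} = \hat{\mu}_y - \alpha\hat{\mu}_x$. It is easy to verify that $\hat{\theta}$ is normally distributed with:
\[ E(\hat{\theta}) = E(\hat{\mu}_y) - \alpha E(\hat{\mu}_x) = \mu_y - \frac{\mu_y}{\mu_x}\mu_x = 0 \]
\[ Var(\hat{\theta}) = Var(\hat{\mu}_y) + \alpha^2 Var(\hat{\mu}_x) - 2\alpha\, Cov(\hat{\mu}_y, \hat{\mu}_x) = (\sigma_y^2 + \alpha^2\sigma_x^2 - 2\alpha\sigma_{xy})/n \]
where Cov denotes the covariance.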
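By applying the two well-known results given by Eq. 2.13 and 2.14 (see Wackerly et al.,
1996, p.294-297), we get Eq. 2.15.
\[ \frac{(n-1)\widehat{Var}(\hat{\theta})}{Var(\hat{\theta})} \sim \chi^2_{n-1} \qquad (2.13) \]
\[ \frac{N(0,1)}{\sqrt{\chi^2_{n-1}/(n-1)}} \sim T_{n-1} \qquad (2.14) \]
\[ \frac{\hat{\theta}/\sqrt{Var(\hat{\theta})}}{\sqrt{[(n-1)\widehat{Var}(\hat{\theta})/Var(\hat{\theta})]/(n-1)}} = \frac{\hat{\theta}}{\sqrt{\widehat{Var}(\hat{\theta})}} = \frac{\hat{\mu}_y - \alpha\hat{\mu}_x}{\sqrt{(\hat{\sigma}_y^2 + \alpha^2\hat{\sigma}_x^2 - 2\alpha\hat{\sigma}_{xy})/n}} \sim T_{n-1} \qquad (2.15) \]
where hats denote sample estimates, and $\chi^2_{n-1}$ and $T_{n-1}$ refer to a chi-square distribution
and a Student's t-distribution with n − 1 degrees of freedom, respectively.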
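By Eq. 2.15, the exact confidence set of α is calculated to satisfy Eq. 2.16
\[ \frac{(\hat{\mu}_y - \alpha\hat{\mu}_x)^2}{(\hat{\sigma}_y^2 + \alpha^2\hat{\sigma}_x^2 - 2\alpha\hat{\sigma}_{xy})/n} \le t^2_{n-1,\,1-\gamma/2} \qquad (2.16) \]
where $t_{n-1,\,1-\gamma/2}$ is the 1 − γ/2 quantile of $T_{n-1}$. For convenience, we refer to $t^2_{n-1,\,1-\gamma/2}$
as $t^2$ further in the text. Inequality 2.16 can be rewritten as:
\[ \hat{\mu}_y^2 + \alpha^2\hat{\mu}_x^2 - 2\alpha\hat{\mu}_x\hat{\mu}_y - t^2(\hat{\sigma}_y^2 + \alpha^2\hat{\sigma}_x^2 - 2\alpha\hat{\sigma}_{xy})/n \le 0 \]
\[ (\hat{\mu}_x^2 - t^2\hat{\sigma}_x^2/n)\alpha^2 + (-2\hat{\mu}_x\hat{\mu}_y + 2t^2\hat{\sigma}_{xy}/n)\alpha + (\hat{\mu}_y^2 - t^2\hat{\sigma}_y^2/n) \le 0 \qquad (2.17) \]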
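Assuming $\hat{\mu}_x^2 - t^2\hat{\sigma}_x^2/n \ne 0$, the left hand side of Eq. 2.17 is a parabola ($a\alpha^2 + b\alpha + c$)
with respect to α where:
\[ a = \hat{\mu}_x^2 - t^2\hat{\sigma}_x^2/n \]
\[ b = 2(-\hat{\mu}_x\hat{\mu}_y + t^2\hat{\sigma}_{xy}/n) \]
\[ c = \hat{\mu}_y^2 - t^2\hat{\sigma}_y^2/n \qquad (2.18) \]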
The solution of inequality 2.16 has four scenarios, one of which (number 3 in the
list below) is not feasible for the values a, b and c in Eq. 2.18:
• a < 0
1. If $b^2 - 4ac < 0$; we obtain a completely unbounded interval, S = ℝ
2. If $b^2 - 4ac \ge 0$; we get an exclusive unbounded interval,
S = ]−∞, q1] ∪ [q2, +∞[
• a > 0
3. If $b^2 - 4ac < 0$; there is no real solution.
4. If $b^2 - 4ac \ge 0$; we obtain a bounded interval, S = [q1, q2]
Figure 2.5: Feasible cases of Fieller’s confidence set of the ratio of expected values
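where:
\[ q_{1,2} = \frac{(\hat{\mu}_x\hat{\mu}_y - t^2\hat{\sigma}_{xy}/n) \mp \sqrt{(-\hat{\mu}_x\hat{\mu}_y + t^2\hat{\sigma}_{xy}/n)^2 - (\hat{\mu}_y^2 - t^2\hat{\sigma}_y^2/n)(\hat{\mu}_x^2 - t^2\hat{\sigma}_x^2/n)}}{\hat{\mu}_x^2 - t^2\hat{\sigma}_x^2/n} \]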
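It is straightforward to see that:
\[ a > 0 \equiv \frac{\hat{\mu}_x^2}{\hat{\sigma}_x^2/n} > t^2 \]
\begin{align*}
b^2 - 4ac \ge 0 &\equiv (-\hat{\mu}_x\hat{\mu}_y + t^2\hat{\sigma}_{xy}/n)^2 - (\hat{\mu}_y^2 - t^2\hat{\sigma}_y^2/n)(\hat{\mu}_x^2 - t^2\hat{\sigma}_x^2/n) \ge 0 \\
&\equiv t^4(\hat{\sigma}_{xy}^2/n^2 - \hat{\sigma}_x^2\hat{\sigma}_y^2/n^2) + t^2(-2\hat{\mu}_x\hat{\mu}_y\hat{\sigma}_{xy}/n + \hat{\mu}_y^2\hat{\sigma}_x^2/n + \hat{\mu}_x^2\hat{\sigma}_y^2/n) \ge 0 \\
&\equiv t^2(\hat{\sigma}_{xy}^2/n^2 - \hat{\sigma}_x^2\hat{\sigma}_y^2/n^2) + (-2\hat{\mu}_x\hat{\mu}_y\hat{\sigma}_{xy}/n + \hat{\mu}_y^2\hat{\sigma}_x^2/n + \hat{\mu}_x^2\hat{\sigma}_y^2/n) \ge 0 \\
&\equiv \frac{-2\hat{\mu}_x\hat{\mu}_y\hat{\sigma}_{xy}/n + \hat{\mu}_y^2\hat{\sigma}_x^2/n + \hat{\mu}_x^2\hat{\sigma}_y^2/n}{\hat{\sigma}_x^2\hat{\sigma}_y^2/n^2 - \hat{\sigma}_{xy}^2/n^2} = n\left[\frac{-2\hat{\mu}_x\hat{\mu}_y\hat{\sigma}_{xy} + \hat{\mu}_y^2\hat{\sigma}_x^2 + \hat{\mu}_x^2\hat{\sigma}_y^2}{\hat{\sigma}_x^2\hat{\sigma}_y^2 - \hat{\sigma}_{xy}^2}\right] \ge t^2
\end{align*}
Let us denote:
\[ q_{exclusive} = \frac{\hat{\mu}_x^2}{\hat{\sigma}_x^2/n}, \qquad q_{complete} = n\left[\frac{-2\hat{\mu}_x\hat{\mu}_y\hat{\sigma}_{xy} + \hat{\mu}_y^2\hat{\sigma}_x^2 + \hat{\mu}_x^2\hat{\sigma}_y^2}{\hat{\sigma}_x^2\hat{\sigma}_y^2 - \hat{\sigma}_{xy}^2}\right] \qquad (2.19) \]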
The conditions presented in terms of a, b and c can be rewritten in terms of qcomplete
and qexclusive (Table 2.1).
Table 2.1: Conditions for the four scenarios of Fieller’s confidence sets
Conditions in a, b, c∗ | Equivalent conditions∗ | Fieller’s solutions
a < 0 and b² − 4ac < 0 | qexclusive < t² and qcomplete < t² | S = ℝ
a < 0 and b² − 4ac ≥ 0 | qcomplete ≥ t² > qexclusive | S = ]−∞, q1] ∪ [q2, +∞[
a > 0 and b² − 4ac < 0 | qexclusive > t² > qcomplete | Imaginary case (non-feasible)
a > 0 and b² − 4ac ≥ 0 | qcomplete ≥ t² and qexclusive > t² | S = [q1, q2]
∗The values of a, b, c and qcomplete, qexclusive are defined in Eq. 2.18 and Eq. 2.19
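Now we verify that the imaginary case is not feasible for the defined values of a,
b and c or equivalently for the values of qcomplete and qexclusive. The proof is done by
contradiction. The imaginary case implies that qexclusive > qcomplete or equivalently
qexclusive − qcomplete > 0. However,
\begin{align*}
q_{exclusive} - q_{complete} &= \frac{\hat{\mu}_x^2}{\hat{\sigma}_x^2/n} - n\left[\frac{-2\hat{\mu}_x\hat{\mu}_y\hat{\sigma}_{xy} + \hat{\mu}_y^2\hat{\sigma}_x^2 + \hat{\mu}_x^2\hat{\sigma}_y^2}{\hat{\sigma}_x^2\hat{\sigma}_y^2 - \hat{\sigma}_{xy}^2}\right] \\
&= n\left[\frac{\hat{\mu}_x^2(\hat{\sigma}_x^2\hat{\sigma}_y^2 - \hat{\sigma}_{xy}^2) - (-2\hat{\mu}_x\hat{\mu}_y\hat{\sigma}_{xy} + \hat{\mu}_y^2\hat{\sigma}_x^2 + \hat{\mu}_x^2\hat{\sigma}_y^2)\hat{\sigma}_x^2}{\hat{\sigma}_x^2(\hat{\sigma}_x^2\hat{\sigma}_y^2 - \hat{\sigma}_{xy}^2)}\right] \\
&= n\left[\frac{\hat{\mu}_x^2\hat{\sigma}_x^2\hat{\sigma}_y^2 - \hat{\mu}_x^2\hat{\sigma}_{xy}^2 + 2\hat{\mu}_x\hat{\mu}_y\hat{\sigma}_{xy}\hat{\sigma}_x^2 - \hat{\mu}_y^2\hat{\sigma}_x^4 - \hat{\mu}_x^2\hat{\sigma}_x^2\hat{\sigma}_y^2}{\hat{\sigma}_x^2(\hat{\sigma}_x^2\hat{\sigma}_y^2 - \hat{\sigma}_{xy}^2)}\right] \\
&= n\left[\frac{-(\hat{\mu}_y\hat{\sigma}_x^2 - \hat{\mu}_x\hat{\sigma}_{xy})^2}{\hat{\sigma}_x^2(\hat{\sigma}_x^2\hat{\sigma}_y^2 - \hat{\sigma}_{xy}^2)}\right] \le 0
\end{align*}
where the denominator is positive because the estimated variance-covariance matrix is
positive definite. This contradicts qexclusive − qcomplete > 0.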
If a = 0, inequality 2.17 is of the form bα + c ≤ 0 with the solution given by:
\[ S = \begin{cases} [-c/b, \infty[ & \text{if } b < 0 \\ ]-\infty, -c/b] & \text{if } b > 0 \end{cases} \qquad (2.20) \]
with b and c defined as in Eq. 2.18.
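The case analysis of Theorem 5 translates directly into a short routine. The sketch below is a hypothetical helper, not the code used in the thesis: the function name and return convention are assumptions, and the t quantile must be supplied by the caller because the Python standard library has no Student's t quantile function.

```python
import math

def fieller_set(mx, my, sxx, syy, sxy, n, t):
    """Classify and compute Fieller's confidence set for E(Y)/E(X).

    mx, my: sample means; sxx, syy, sxy: sample variances and covariance;
    n: sample size; t: the 1 - gamma/2 quantile of T_{n-1} (caller-supplied).
    Returns (kind, q1, q2); entries that are not defined are None.
    """
    t2 = t * t
    a = mx ** 2 - t2 * sxx / n            # Eq. 2.18
    b = 2.0 * (-mx * my + t2 * sxy / n)
    c = my ** 2 - t2 * syy / n
    if a == 0.0:
        # Linear case of Eq. 2.20; b == 0 is not handled in this sketch
        kind = "right half-line" if b < 0 else "left half-line"
        return kind, -c / b, None
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        # a > 0 with a negative discriminant is the infeasible case, so a < 0 here
        return "completely unbounded", None, None
    root = math.sqrt(disc)
    q1, q2 = sorted(((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)))
    if a < 0.0:
        return "exclusive unbounded", q1, q2   # ]-inf, q1] U [q2, +inf[
    return "bounded", q1, q2                   # [q1, q2]

print(fieller_set(0.5, 5.0, 4.0, 1.0, 0.0, 4, 2.0)[0])  # exclusive unbounded
```

With the sample values of Eq. 2.21, n = 24 and t taken as 2.0687 (an assumed table value for the 0.975 quantile with 23 degrees of freedom), the routine should return a bounded interval of roughly [37.93, 49.45], in line with the interval reported in Section 2.5.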
Finally, let us highlight that confidence sets of the ratio of expected values may be
unbounded. This can never be inferred when treating the ratio as a normal variable and
calculating the confidence intervals by the normal theory. Thus, such normal-theory
intervals have low coverage³ even when the denominator is significantly different from
zero and/or for large sample sizes (Franz, 2007).
2.4.2.2 Von Luxburg & Franz (2009) geometric solution
Recently, Von Luxburg & Franz (2009) have presented a geometric construction for
confidence sets of E(Y )/E(X) which coincide with the ones in Fieller (1954). The
intuitive idea in Von Luxburg & Franz (2009) is outlined as follows. Let (X, Y ) be
BV N(µ,Σ), Eq. 2.1. Let us define $L_r = \{(x, y) \in \mathbb{R}^2 : y = rx\}$, the line of slope r
which passes through the origin. If S = [l, u] is the confidence interval of E(Y )/E(X)
at the confidence level 1 − γ, then we can construct a wedge (W ) given by the area
of the plane enclosed by $L_l$ and $L_u$. The wedge comprises all the lines $L_r$ such
that r ∈ [l, u] (see Figure 2.6, left). On the other hand, we can construct S by
intersecting W with the vertical line $\{(x, y) \in \mathbb{R}^2 : x = 1\}$ (see Figure 2.6, right).
Thus, providing an appropriate W in ℝ² is equivalent to calculating the confidence
set of E(Y )/E(X). Von Luxburg & Franz (2009) prove that such W is given by
the area enclosed by the tangent lines to the ellipse (E) through the origin, where
$E = \{z \in \mathbb{R}^2 : (z - \hat{\mu})\hat{\Sigma}^{-1}(z - \hat{\mu})^\top = t^2/n\}$ and t is the 1 − γ/2 quantile of $T_{n-1}$. The
results in Von Luxburg & Franz (2009) are presented in Theorems 6 and 7.
³ The probability that the interval contains the true parameter
Figure 2.6: Construction of a wedge given the confidence interval of the ratio of means
(left) and vice versa (right). [The lower and upper confidence limits are denoted by l and u,
respectively.]
Theorem 6. Let (X, Y ) be BV N(µ,Σ) distributed, Eq. 2.1. The exact confidence set
for the ratio α = E(Y )/E(X) can be constructed in the following steps.
1. Estimate µx, µy and Σ
2. Take t, the 1 − γ/2 quantile of $T_{n-1}$
3. Plot E
4. Study the position of E
• If (0, 0) ∉ E, construct W by taking the area enclosed by the tangents to E
through the origin. Then, take $S = W \cap \{(x, y) \in \mathbb{R}^2 : x = 1\}$
• If (0, 0) ∈ E, take S = ℝ
The procedure presented above yields the three different cases displayed in Figure 2.7.
Proof. (Sketch) Let us define $\pi_a : \mathbb{R}^2 \to \mathbb{R}$, $(x, y) \mapsto a_1x + a_2y$, an orthogonal projection
on the line $L_{a_2/a_1}$. The proof is in two steps. Firstly, it is demonstrated that $\pi_a(E)$
is a confidence interval of level 1 − γ of $\pi_a(\mu)$, ∀a. In particular, the projection of
E on the line $L_{\alpha^\perp} = \{(x, y) \in \mathbb{R}^2 : y = -x/\alpha\}$ is a confidence interval of $\pi_{\alpha^\perp}(\mu)$.
Secondly, it is proved that $\pi_{\alpha^\perp}(\mu) \in \pi_{\alpha^\perp}(E) \Leftrightarrow \alpha \in S$. Then, it immediately follows
that $1 - \gamma = P(\pi_{\alpha^\perp}(\mu) \in \pi_{\alpha^\perp}(E)) = P(\alpha \in S)$. For more details, refer to Theorem 1
of Von Luxburg & Franz (2009).
Figure 2.7: Confidence sets of the ratio of expected values in Von Luxburg & Franz
(2009).
Notice that if n increases, the area of E decreases and then S shrinks. Furthermore,
if E intersects the y-axis in only one point, then the y-axis is one of the tangents of E
through the origin and S is given by the solution in Eq. 2.20.
Theorem 7. The confidence sets obtained by the geometric construction in Von Luxburg
& Franz (2009) coincide with the ones in Fieller (1954).
Proof. (Sketch) The proof is in two parts. In the first one, it is demonstrated that the
feasible cases of Fieller’s solutions (Table 2.1) are equivalent to the three cases derived
in Von Luxburg & Franz (2009). The equivalence for the completely unbounded interval
is demonstrated by showing that:
\[ (0, 0) \in E \Leftrightarrow (0 - \hat{\mu})\hat{\Sigma}^{-1}(0 - \hat{\mu})^\top \le t^2/n \Leftrightarrow q_{complete} \le t^2 \]
The equivalences for the exclusive unbounded set and the bounded interval are demonstrated
by projecting E onto the x-axis. In the exclusive unbounded case, the y-axis
is intersected by E. Thus, 0 belongs to the projection of E onto the x-axis, given by
$[\hat{\mu}_x - t\hat{\sigma}_x/\sqrt{n},\ \hat{\mu}_x + t\hat{\sigma}_x/\sqrt{n}]$. Then, the extremes of the latter interval have opposite
signs and $(\hat{\mu}_x - t\hat{\sigma}_x/\sqrt{n})(\hat{\mu}_x + t\hat{\sigma}_x/\sqrt{n}) < 0 \Leftrightarrow q_{exclusive} < t^2$. The same idea is used for
the bounded interval, in which case 0 is not contained in $[\hat{\mu}_x - t\hat{\sigma}_x/\sqrt{n},\ \hat{\mu}_x + t\hat{\sigma}_x/\sqrt{n}]$
and then $(\hat{\mu}_x - t\hat{\sigma}_x/\sqrt{n})(\hat{\mu}_x + t\hat{\sigma}_x/\sqrt{n}) > 0 \Leftrightarrow q_{exclusive} > t^2$. The second part of the
proof shows that q1 and q2 are equal to the slopes of the tangent lines to E through
the origin. This can be demonstrated by using Eq. 3 in Walter et al. (2008). For more
details on the proof, refer to Von Luxburg & Franz (2004).
The study of the position of E is a quick procedure to determine whether X or Y
are significantly different from zero and then which of the three situations in Section
2.1 applies. The key to this idea is that $\pi_a(E)$ is a confidence interval of $\pi_a(\mu)$ ∀a.
In particular, this holds for a1 = (1, 0) and a2 = (0, 1), which represent the projections onto
the x-axis and y-axis, respectively. For instance, let us consider that E intersects
the y-axis. By projecting E onto the x-axis, it is clear that
$0 \in \pi_{a_1}(E) = [\hat{\mu}_x - t\hat{\sigma}_x/\sqrt{n},\ \hat{\mu}_x + t\hat{\sigma}_x/\sqrt{n}]$. Thus, µx is not significantly different from zero. The same can
be applied to the numerator by projecting E onto the y-axis.
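The projection check described above amounts to two interval calculations. The following sketch uses hypothetical function names and again takes t as a caller-supplied quantile; it only illustrates the idea and is not part of the original analysis.

```python
import math

def projection_intervals(mx, my, sxx, syy, n, t):
    """Project the ellipse E onto the axes: mean +/- t * sd / sqrt(n) for X and for Y."""
    hx = t * math.sqrt(sxx / n)
    hy = t * math.sqrt(syy / n)
    return (mx - hx, mx + hx), (my - hy, my + hy)

def differs_from_zero(interval):
    """True when 0 lies outside the interval, i.e. E does not meet the other axis."""
    lo, hi = interval
    return lo > 0.0 or hi < 0.0

# Made-up values: a denominator whose projection interval covers zero
x_int, y_int = projection_intervals(0.5, 5.0, 4.0, 1.0, 4, 2.0)
print(differs_from_zero(x_int), differs_from_zero(y_int))  # False True
```

In this made-up example the denominator is not significantly different from zero while the numerator is, which corresponds to an ellipse that crosses the y-axis but not the x-axis.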
2.5 On the distributional properties of internal nitrogen use
efficiency in rice.
In this section I apply the previously described theory to a real data set. The data
set comprises 24 plot observations of NU and GY from non-irrigated rice in northeast
Thailand (Figure 2.8) from the experiments carried out by Naklang et al. (2006). All
the plots received a dose of fertiliser of 0 kg Nitrogen (N) ha⁻¹, 21.8 kg Phosphorus (P)
ha⁻¹ and 41.5 kg Potassium (K) ha⁻¹ and were wet after the flowering period of the rice.
Using this data set, I provide examples of the pdf of IEN , the bootstrap distribution
of estimators of E(GY )/E(NU) and confidence intervals of E(GY )/E(NU) for this
sample.
Figure 2.8: Grain yield versus nitrogen uptake from a sample of non-irrigated rice in
northeast Thailand (Naklang et al., 2006)
1. The pdf of IEN
Let us assume that the true parameters of the population of non-irrigated rice
under the fertiliser and water conditions described above coincide with the sample
mean and variance-covariance (Eq. 2.1) given by:
\[ \mu = (49.47,\ 2135.42); \qquad \Sigma = \begin{pmatrix} 416.86 & 10986 \\ 10986 & 609382.4 \end{pmatrix} \qquad (2.21) \]
For these values of the parameters, the pdf of IEN (Figure 2.9) is clearly skewed
to the right with tails heavier than the normal ones. The values of a and b (Eq.
2.5) are equal to 1.48 and 2.42, respectively. Thus, the distribution of IEN for
this population does not have a satisfactory normal approximation (Section 2.3).
2. Bootstrap distribution of estimators of E(GY )/E(NU).
The arithmetic mean of the observations of IEN (RA, Eq. 2.11) is usually em-
ployed to estimate E(GY )/E(NU). However, the bootstrap distribution of this
estimator presents much more variation than the distribution of the ratio of the
arithmetic means of GY and NU (RW , Eq. 2.12). This fact is illustrated in
Figure 2.10 which displays the bootstrap distribution of these estimators when
each bootstrap sample is generated from a BV N with parameters given in Eq.
2.21.
3. Confidence interval of E(GY )/E(NU). Fieller’s confidence set of
E(GY )/E(NU) calculated from the sample in Figure 2.8 is a bounded interval
equal to [37.93, 49.45]. However, even if Fieller’s confidence sets are bounded,
they can differ substantially from the ones obtained when assuming that IEN has a
normal distribution (in this case [39.93, 52.56]). Additionally, confidence intervals
calculated according to the normal theory may have low coverage even when NU
is far from zero (Franz, 2007).
Figure 2.9: Probability density function of the ratio of two jointly normal variables
with parameters given in Eq. 2.21
Figure 2.10: Bootstrap distribution of the estimators defined in Eq. 2.11 (left) and Eq.
2.12 (right) for the data in Fig. 2.8. [Sixty bootstrap samples of size 100 with parameters given
in Eq. 2.21]
2.6 Summary
Many of the issues presented in this chapter (the mixture nature of the ratio distribution,
heavy-tailedness, non-existence of moments and uninformative unbounded confidence
sets) are not considered in published agricultural research when analysing ratios. In
particular, most of the studies in our literature search (Section 1.4) which analysed
IEN computed it for each experimental plot (IENi = GYi/NUi, where the subindex i
refers to observations on the i-th plot) and applied univariate linear models to the IENi,
i = 1 . . . n, observations. Standard univariate linear models assume normality
and homogeneity of variance for the error distribution. From the studies selected, only
four (Albrizio et al., 2010; Giambalvo et al., 2010; Liu et al., 2011; Tetard-Jones et al.,
2013) reported checking for departures from normality, and three (Albrizio
et al., 2010; Giambalvo et al., 2010; Liu et al., 2011) for homogeneity of error variances.
Bivariate analyses avoid dealing with the distributional issues of the ratio and maintain
the complete information on GY and NU and their joint behaviour, which sheds
more light on the utilisation of NU for GY . As discussed in Chapter 1, I propose
to use finite mixture models of bivariate Gaussian distributions as a complementary
analysis to bivariate mixed models for the GY and NU field data collected across a
range of environments. In the next chapter, I review the fundamentals of finite mixture
models of multivariate Gaussian distributions with the objective of presenting the
theoretical framework of the methodology of this thesis.
Chapter 3
Fundamentals of finite mixture models
3.1 Non-technical introduction
Finite mixture models are commonly used to classify observations from a heterogeneous
population into subpopulations¹ (e.g. Crossa & Franco, 2004; Di Zio et al., 2005; Pearson,
1894). For a detailed explanation of the technique, refer to Fruhwirth-Schnatter
(2006); Lindsay (1995); McLachlan & Peel (2000); Titterington et al. (1985).
To classify observations into subpopulations, it is required to estimate a set of
mixture parameters, ψ (Eq. 3.1), such as the mean, variance-covariance and the pro-
portion of individuals for each subpopulation. The maximum likelihood method (see
Wackerly et al., 1996, p.398-400) has been the most popular approach to estimating
ψ (Fruhwirth-Schnatter, 2006, p.49). This method is based on maximising the likeli-
hood, or the log likelihood, function. The likelihood function is defined from the joint
probability density function when a realisation of the random sample is given. The
maximisation of the likelihood function will give us the values of the parameters which
maximise the chance of having observed this sample. However, in the case of mixtures,
the log likelihood function, L(ψ), can be complex and its global maximum cannot be
calculated explicitly by analytical mathematical methods (Redner & Walker, 1984).
Thus, the search for the global maximum is performed numerically via an algorithm
called the Expectation and Maximisation (EM) algorithm (Dempster et al., 1977). The
EM algorithm is an iterative procedure which is initiated with a starting estimate (ψ0)
and then, at each iteration, searches for a value of ψ which increases the value of L(ψ) in
comparison to the previous iteration. The algorithm stops when a stopping criterion
is fulfilled, e.g. when the difference in the value of L(ψ) between two consecutive
iterations is smaller than a chosen threshold (Ng, 2013).
¹ In this thesis, clusters, groups and subpopulations are used interchangeably.
Fitting mixture models with the EM algorithm involves dealing with some challenges.
The difficulties are due to the nature of L(ψ) rather than to a failure of the
algorithm (McLachlan & Peel, 2000, p.99). The main difficulties are listed as follows.
1. The likelihood function may be unbounded: thus, its global maximum does not
exist (Kiefer & Wolfowitz, 1956). The values of ψ for which L(ψ) is unbounded
are called singularities. The non-existence of the global maximum implies that a
local maximum needs to be chosen as the maximum likelihood estimate (MLE)
of ψ (Redner & Walker, 1984).
2. The likelihood function may have multiple local maxima (Seidel et al., 2000).
This causes sensitivity of the algorithm to starting and stopping strategies (Seidel
et al., 2000) and creates the dilemma of which local maximum to choose as the
MLE (Figure 3.1).
3. Some local maxima do not produce meaningful cluster partitions. For instance,
when the clusters are non-realistic for biological reasons (see Section 3.8.3) or a
cluster is fitted to a ‘small localised random pattern’ (McLachlan & Peel, 2000,
p.99), for example to a few points which randomly lie on a line. Such local maxima
are known as spuriosities (Figure 3.2).
The most common procedure to deal with the issues mentioned above is to decide
on the maximum number of components to fit, and then to perform the following steps
for each number of components, decreasing until just one component is fitted (Fraley &
Raftery, 1998; McLachlan & Peel, 2000; Melnykov & Maitra, 2010; Ng, 2013):
1. Initiate the EM algorithm from different starting points to widen the search for
local maximisers.
2. Stop the EM algorithm when the change in L(ψ) between two consecutive iterations
is less than a chosen threshold.
3. Check for and remove spuriosities and singularities.
4. Select the local maximum with the highest value of L(ψ) from the remaining
solutions.
Having an optimal solution for each number of clusters, another important decision is
how to select the number of clusters. Several information criteria are commonly used
for this purpose (Melnykov & Maitra, 2010). These criteria favour models which
fit the data well while penalising model complexity.
A detailed explanation of finite mixture models is presented in Section 3.2 to Section
3.11 and references therein. In order to fully understand the rest of the chapter, the
reader needs to have a basic knowledge of probability theory and statistics (see Samuels
et al. (2012) for an introduction on the subject for biologists and Wackerly et al. (1996)
as a second and deeper reading on the topic) and be familiar with the mathematical
notations.
Figure 3.1: Solutions of three clusters obtained by the EM algorithm for the case study
(Chapter 4)
Figure 3.2: Spurious solution obtained by the EM algorithm when fitting 7 components
to the case study data (Chapter 4). [The green component corresponds to few points which
randomly lie on a line and the dark blue and red ones correspond to groups with negative correlations
and therefore are biologically meaningless]
3.2 Common use of mixture models
Mixture models are used in a large number of fields, such as agriculture, medicine,
engineering, genetics, weather forecast and image segmentation and are applied for
modelling very different types of data (Titterington et al., 1985, Table 2.1.3). Mixture
models are used for two main purposes (Figueiredo & Jain, 2002; Fruhwirth-Schnatter,
2006, p.5-6):
1. For cluster analysis where the researcher has some scientific justification in as-
suming that the data come from different groups but there is little or no infor-
mation on their classification (e.g. Crossa & Franco, 2004; Di Zio et al., 2005).
Although there are other clustering techniques, finite mixture modelling is the
only one which: 1) assumes a well characterised mathematical model and 2) al-
lows for parameter estimation by the maximum likelihood method and hypothesis
testing on the number of clusters (McLachlan et al., 1996).
2. For approximating distributional shapes whose analytical expression is unknown
(e.g. Marron & Wand, 1992; Xu et al., 2010). For instance, Figure 3.3 shows
the histogram of a univariate random variable whose distribution is bimodal and
asymmetric around the first mode. These types of non-standard distributions
can be modelled well using mixture models.
In this thesis, finite mixture models are used for clustering purposes to identify
groups in (NU,GY ) field data.
Figure 3.3: Bimodal distribution generated from a mixture of three univariate normal
components. [The parameters employed to generate the sample were n = 100, π = (0.25, 0.25, 0.50),
θ1 = (0, 1), θ2 = (3, 2) and θ3 = (7, 0.5) where θi = (µi, σi)]
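A sample like the one summarised in Figure 3.3 can be generated in two stages: draw a component label with probabilities π, then draw from the selected normal component. The sketch below assumes the parameters stated in the caption; the function name and the seed are hypothetical.

```python
import random

def sample_mixture(n, weights, params, rng):
    """Draw n values from a univariate normal mixture; params[i] = (mean_i, sd_i)."""
    labels = rng.choices(range(len(weights)), weights=weights, k=n)
    values = [rng.gauss(*params[i]) for i in labels]
    return values, labels

rng = random.Random(3)  # arbitrary seed
values, labels = sample_mixture(
    100, [0.25, 0.25, 0.50], [(0.0, 1.0), (3.0, 2.0), (7.0, 0.5)], rng
)
print(len(values), [labels.count(i) for i in range(3)])
```

A histogram of `values` should show a shape of the kind described in the text, although the exact picture depends on the random draw.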
3.3 Mathematical definition
Let Y = [Y1, Y2, . . . , Yn] be a random sample of n p-dimensional random vectors
Yj of dimension 1 × p, j = 1 . . . n. Let y = [y1, y2, . . . , yn] be the observed values of the random
sample Y. Finite mixture models assume that Yj are i.i.d. according to the finite
mixture density (f):
\[ f(y_j|\psi) = \sum_{i=1}^{g} \pi_i f_i(y_j|\theta_i) \qquad (3.1) \]
where:
g is the number of components.
fi is the i-th component density.
θi is the parameter vector of fi.
π = (π1, . . . , πg) denotes the mixing proportions, πi ≥ 0 ∀ i and $\sum_{i=1}^{g} \pi_i = 1$.
ψ = (π1, . . . , πg−1, θ1, . . . , θg, g) is the parameter vector of f .
For the bivariate mixture analysis of (NU , GY ), it is assumed that every mixture
component is a bivariate Gaussian with the pdf:
\[ \phi(y_j|\mu_i, \Sigma_i) = \frac{1}{2\pi|\Sigma_i|^{1/2}} \exp\left\{-(y_j - \mu_i)\Sigma_i^{-1}(y_j - \mu_i)^\top/2\right\} \]
where:
µi is the mean vector of the i-th component density.
Σi is the variance-covariance matrix of the i-th component density.
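To make Eq. 3.1 concrete for the bivariate Gaussian case, the density can be evaluated with a few lines of code. This is a minimal sketch: the function names are hypothetical and the 2 × 2 inverse is written out by hand to avoid external libraries.

```python
import math

def bvn_pdf(y, mu, sigma):
    """Bivariate normal density; y and mu are pairs, sigma is a 2x2 nested list."""
    (s11, s12), (s21, s22) = sigma
    det = s11 * s22 - s12 * s21
    d1, d2 = y[0] - mu[0], y[1] - mu[1]
    # Quadratic form (y - mu) * inverse(sigma) * (y - mu)^T for the 2x2 case
    quad = (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def mixture_pdf(y, weights, mus, sigmas):
    """Finite mixture density of Eq. 3.1 with bivariate Gaussian components."""
    return sum(w * bvn_pdf(y, m, s) for w, m, s in zip(weights, mus, sigmas))

identity = [[1.0, 0.0], [0.0, 1.0]]
print(mixture_pdf((0.0, 0.0), [0.5, 0.5], [(0.0, 0.0), (3.0, 3.0)], [identity, identity]))
```

The printed value is the height of a made-up two-component mixture at the origin; the component means and covariances are illustrative only.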
3.4 Classifying data into groups: label random vectors and
posterior probabilities
Mixture models assume that the information which classifies Yj into its group is missing.
Therefore, a label random variable Zj is introduced to classify Yj into g groups
(Dempster et al., 1977). The random variable Zj models the probability of a random
draw to have one of g possible outcomes. As a consequence, Zj has a multinomial
distribution (Mult) with parameters m = 1, where m is the number of draws,
and π = (π1, . . . , πg), the probability of each outcome.
\[ Z_j \sim Mult(1, \pi) \]
For convenience, Zj is considered as a g-dimensional vector Zj = [Zij], i = 1 . . . g and
j = 1 . . . n, defined as follows.
\[ Z_{ij} = \begin{cases} 1, & \text{if } Y_j \text{ belongs to the } i\text{-th group } (G_i), \\ 0, & \text{otherwise.} \end{cases} \]
In this thesis, it is assumed that each group is generated by one mixture component.
Under this assumption, πi is interpreted as the proportion of individuals in Gi
or the probability of Yj to belong to Gi, i = 1 . . . g (McLachlan & Peel, 2000, p.7).
The component densities are viewed as the pdf of Yj conditional on the group of origin,
$f_i(y_j|\theta_i) = f(y_j|Z_{ij} = 1, \theta_i)$ (McLachlan & Peel, 2000, p.7). Under this interpretation,
it becomes easy to verify the mixture model density using the Total Probability
Theorem:
\[ f(y_j|\psi) = \sum_{i=1}^{g} f(y_j, z_j = i|\psi) = \sum_{i=1}^{g} f(y_j|z_j = i, \theta_i)P(z_j = i) = \sum_{i=1}^{g} f_i(y_j|\theta_i)\pi_i \]
The probability that Yj belongs to Gi, once yj has been observed, is called the
posterior probability and denoted by τij.
\[ \tau_{ij} = Pr(Z_{ij} = 1|y_j) = \frac{f(y_j|Z_{ij} = 1)P(Z_{ij} = 1)}{f(y_j)} = \frac{\pi_i f_i(y_j|\theta_i)}{\sum_{h=1}^{g} \pi_h f_h(y_j|\theta_h)} \qquad (3.2) \]
The posterior probabilities are used to classify the data into groups by assigning yj to
Gi if $\tau_{ij} = \max_{r=1 \ldots g} \tau_{rj}$ (McLachlan & Peel, 2000, p.31). The calculation of the posterior
probabilities requires estimating the parameters of the mixture.
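Eq. 3.2 and the classification rule can be illustrated with a univariate example. The sketch below assumes known mixture parameters and uses hypothetical function names; it is not the implementation used for the case study.

```python
import math

def normal_pdf(y, mean, sd):
    """Univariate normal density."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def posterior_probabilities(y, weights, params):
    """tau_i = pi_i f_i(y) / sum_h pi_h f_h(y), with params[i] = (mean_i, sd_i)."""
    joint = [w * normal_pdf(y, m, s) for w, (m, s) in zip(weights, params)]
    total = sum(joint)  # underflows to zero only if y is extremely far from every component
    return [v / total for v in joint]

def classify(y, weights, params):
    """Index of the component with the largest posterior probability."""
    tau = posterior_probabilities(y, weights, params)
    return max(range(len(tau)), key=tau.__getitem__)

weights, params = [0.5, 0.5], [(0.0, 1.0), (4.0, 1.0)]  # made-up mixture
print(posterior_probabilities(2.0, weights, params))    # [0.5, 0.5]
print(classify(0.0, weights, params), classify(5.0, weights, params))  # 0 1
```

The point y = 2 lies midway between two equally weighted components with equal spread, so both posterior probabilities equal 0.5; observations closer to one mean are assigned to that component.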
3.5 Maximum likelihood estimation of the mixture parameters
Since the first publications on finite mixture models (Newcomb, 1886; Pearson, 1894)
several methods have been proposed for estimating the mixture parameters (see Everitt,
1996). The maximum likelihood method has been the most successful approach so far
(Fruhwirth-Schnatter, 2006, p.49). This method aims to find the global maximum of
the likelihood function (Eq. 3.3) or the log likelihood function² (Eq. 3.4). The natural
way to proceed is to calculate the roots of the first derivative of L(ψ) and then check
which one is the global maximum. However, in the case of mixture models, L(ψ) is
usually quite complex and the roots of the log likelihood equations cannot be calculated
explicitly (Redner & Walker, 1984).
\[ l(\psi) = \prod_{j=1}^{n} \sum_{i=1}^{g} \pi_i f_i(y_j; \theta_i) \qquad (3.3) \]
\[ L(\psi) = \sum_{j=1}^{n} \ln\left(\sum_{i=1}^{g} \pi_i f_i(y_j; \theta_i)\right) \qquad (3.4) \]
² Notice that the logarithm is a strictly monotone increasing function, so it preserves the maximisers of
the likelihood function.
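Eq. 3.4 is easy to evaluate numerically even though it cannot be maximised in closed form. A minimal sketch for univariate normal components follows; the function name and the toy data are assumptions made for illustration.

```python
import math

def mixture_log_likelihood(ys, weights, params):
    """Log likelihood of Eq. 3.4 for normal components; params[i] = (mean_i, sd_i)."""
    total = 0.0
    for y in ys:
        density = sum(
            w * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
            for w, (m, s) in zip(weights, params)
        )
        total += math.log(density)
    return total

ys = [0.2, -0.5, 3.9, 4.4]  # made-up observations
print(mixture_log_likelihood(ys, [0.5, 0.5], [(0.0, 1.0), (4.0, 1.0)]))
print(mixture_log_likelihood(ys, [0.5, 0.5], [(1.0, 1.0), (2.0, 1.0)]))
```

Comparing the two printed values shows how the log likelihood ranks candidate parameter values; an iterative optimiser repeats this kind of comparison along a search path.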
The estimation of the mixture parameters has become much easier due to the
implementation of iterative algorithms and the development of fast computers. Special
mention needs to be made of what is probably the most widespread algorithm: the
Expectation Maximisation algorithm, also known as the EM algorithm (Dempster et al., 1977).
3.6 The EM algorithm for the estimation of mixture parameters
The EM algorithm is an iterative procedure which searches for the global maximum of
L(ψ) in the framework of missing data. Although it is not clear who first developed
the EM algorithm, it has been generally attributed to Dempster et al. (1977), who
generalised the algorithm to an extensive number of applications and discerned for the
first time the two well-known Expectation and Maximisation steps (Meng & Van Dyk,
1997). For a brief review of the history of the EM algorithm, refer to Meng & Van Dyk
(1997); Redner & Walker (1984).
In the context of mixture models, the realisations of Zj, j = 1 . . . n (Dempster
et al., 1977) are considered missing. Before explaining the algorithm computations, let
us introduce all necessary notations. Let (Z1, Y1, . . . , Zn, Yn) be the complete random
sample and (Y1, . . . , Yn) the incomplete random sample with their realisations denoted
by (z1, y1, . . . , zn, yn) and (y1, . . . , yn), respectively. The log likelihood function of the
complete random sample (Lc) is given by:
\[ L_c(z, y, \psi) = \sum_{j=1}^{n} \log f(z_j, y_j|\psi) \]
The log likelihood function of the incomplete random sample (L) is given by:
\[ L(y, \psi) = \sum_{j=1}^{n} \log f(y_j|\psi) \qquad (3.5) \]
The EM algorithm approaches the problem of finding $\hat{\psi}$, a local maximiser³ of
L(y,ψ), by applying a numerical procedure which aims to maximise Lc(z,y,ψ). However,
Lc(z,y,ψ) cannot be maximised directly because it depends on the values of
zj, j = 1 . . . n, which are missing. Instead of this, Dempster et al. (1977) suggested
maximising the expected value of Lc(z,y,ψ) given y, a realisation of the incomplete
random sample, and $\hat{\psi}$, an estimate of ψ. Therefore, each iteration of the EM algorithm
has two steps:
Let $\psi^{(r)}$ be the estimate produced in the r-th iteration of the algorithm.
1. Compute $E(L_c(z, y, \psi)|\psi^{(r)}, y)$, denoted as $Q(\psi|\psi^{(r)})$ in the literature
(Expectation-step or E-step).
2. Find $\psi^{(r+1)}$ that maximises $Q(\psi|\psi^{(r)})$ (Maximisation-step or M-step).
Dempster et al. (1977) proved that given an initial estimate $\psi^{(0)}$ and the values of the
observed sample y, the EM algorithm produces a sequence $\psi^{(0)}, \psi^{(1)}, \ldots, \psi^{(s)}$ with the
property given in Theorem 8.
Theorem 8. The sequence $\{L(\psi^{(r)})\}_{r=1}^{s}$, where $\psi^{(r+1)}$ maximises $Q(\psi|\psi^{(r)})$, is
non-decreasing.
Proof. Theorem 1 in Dempster et al. (1977). A more detailed proof can be found in
Borman (2004).
³ In the case of mixture models L(y,ψ) may be unbounded, thus the global maximum does not
exist, see Section 3.17
The convergence properties of the EM algorithm are discussed in Wu (1983). Wu
(1983) stated that if Q(ψ|η) is continuous in ψ and η, then the algorithm converges
to a local/global maximum or a saddle point.
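The monotonicity property of Theorem 8 can be checked numerically on a small example. The sketch below runs EM for a two-component univariate normal mixture on made-up data and records the log likelihood after each iteration; the data, starting values and function names are all assumptions for illustration.

```python
import math

def normal_pdf(y, mean, sd):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def log_likelihood(ys, weights, means, sds):
    """Incomplete-data log likelihood L(psi) of a univariate normal mixture."""
    return sum(
        math.log(sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds)))
        for y in ys
    )

def em_step(ys, weights, means, sds):
    """One EM iteration: posterior probabilities (E-step), weighted updates (M-step)."""
    g, n = len(weights), len(ys)
    tau = []
    for y in ys:
        joint = [weights[i] * normal_pdf(y, means[i], sds[i]) for i in range(g)]
        total = sum(joint)
        tau.append([v / total for v in joint])
    new_weights, new_means, new_sds = [], [], []
    for i in range(g):
        mass = sum(t[i] for t in tau)
        mean = sum(t[i] * y for t, y in zip(tau, ys)) / mass
        var = sum(t[i] * (y - mean) ** 2 for t, y in zip(tau, ys)) / mass
        new_weights.append(mass / n)
        new_means.append(mean)
        new_sds.append(math.sqrt(var))
    return new_weights, new_means, new_sds

ys = [0.1, -0.4, 0.8, 1.2, -0.9, 5.1, 4.6, 5.8, 6.3, 4.9]  # made-up data
weights, means, sds = [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]    # arbitrary start
trace = [log_likelihood(ys, weights, means, sds)]
for _ in range(20):
    weights, means, sds = em_step(ys, weights, means, sds)
    trace.append(log_likelihood(ys, weights, means, sds))
print(trace[0], trace[-1])
```

The recorded values in `trace` should never decrease from one iteration to the next, up to floating-point rounding. The sketch does not guard against a component variance collapsing to zero, which is the singularity issue discussed in Section 3.8.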
3.7 The EM algorithm for the estimation of parameters of
mixtures of multivariate Gaussian distributions
In this section I specify the EM steps for mixtures of multivariate Gaussian distri-
butions. A recommended reference addressing this topic is McLachlan & Peel (2000,
Section 3.2-3.3). For details on the calculations of the EM steps, refer to Appendix D.
Let ψ(r) be the estimate at the r -th iteration of the algorithm.
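1. E-step. The algorithm updates the posterior probabilities:
\[ \tau_{ij}^{(r+1)} = \frac{\pi_i^{(r)} \phi(y_j|\mu_i^{(r)}, \Sigma_i^{(r)})}{\sum_{h=1}^{g} \pi_h^{(r)} \phi(y_j|\mu_h^{(r)}, \Sigma_h^{(r)})} \qquad (3.6) \]
2. M-step. The algorithm updates the estimates of the mixture parameters:
\[ \pi_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \Big/ n \qquad (3.7) \]
\[ \mu_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} y_j \Big/ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \qquad (3.8) \]
\[ \Sigma_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} (y_j - \mu_i^{(r+1)})^\top (y_j - \mu_i^{(r+1)}) \Big/ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \qquad (3.9) \]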
3.7.1 Illustration of the EM algorithm on simulated data
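Let us consider a sample of 5 observations from a mixture of two bivariate normals
with parameters:
\[ \pi_1 = 0.5, \quad \mu_1^\top = \begin{pmatrix} 5 \\ 10 \end{pmatrix}, \quad \Sigma_1 = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}; \qquad \pi_2 = 0.5, \quad \mu_2^\top = \begin{pmatrix} 20 \\ 20 \end{pmatrix}, \quad \Sigma_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \qquad (3.10) \]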
The data sample is displayed below in matrix M and plotted in Figure 3.4.
\[ M = \begin{pmatrix} 2.92 & 8.47 \\ 19.22 & 20.48 \\ 19.67 & 20.21 \\ 5.38 & 9.16 \\ 19.12 & 19.61 \end{pmatrix} \]
Figure 3.4: Scatter plot of a sample from a mixture of two bivariate normal with
parameters given in Eq. 3.10. [The points in different clusters are plotted in different colours].
For the purpose of this illustration, the EM algorithm is initiated with the actual
value of the parameters, $\psi^{(0)} = \psi$. The algorithm repeats the same operations in each
iteration so I only present the calculations in the first iteration.
1. E-STEP (first iteration):
Let us calculate $\tau_{11}^{(1)}$:
\[ \tau_{11}^{(1)} = \frac{0.5\,\phi((2.92, 8.47)|\mu_1, \Sigma_1)}{0.5\,\phi((2.92, 8.47)|\mu_1, \Sigma_1) + 0.5\,\phi((2.92, 8.47)|\mu_2, \Sigma_2)} \approx 1 \]
The rest of $\tau_{ij}^{(1)}$, i = 1, 2 and j = 1 . . . 5, are calculated similarly and presented in
the matrix $\tau^{(1)}$. The j-th row of $\tau^{(1)}$ contains the posterior probabilities of the
observation yj to belong to the first cluster, $\tau_{1j}^{(1)}$, or to the second one, $\tau_{2j}^{(1)}$.
\[ \tau^{(1)} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \]
2. M-STEP (first iteration):
Let us compute $\pi_1^{(1)}$, $\mu_1^{(1)}$ and $\Sigma_1^{(1)}$ by using Eq. 3.7, 3.8 and 3.9.
\[ \pi_1^{(1)} = \frac{2}{5} = 0.4 \]
\[ \mu_1^{(1)} = \frac{1(2.92, 8.47) + 1(5.38, 9.16)}{2} = (4.15, 8.82) \]
\[ \Sigma_1^{(1)} = \frac{[(2.92, 8.47) - (4.15, 8.82)]^\top[(2.92, 8.47) - (4.15, 8.82)]}{2} + \frac{[(5.38, 9.16) - (4.15, 8.82)]^\top[(5.38, 9.16) - (4.15, 8.82)]}{2} = \begin{pmatrix} 1.51 & 0.42 \\ 0.42 & 0.12 \end{pmatrix} \]
The calculations of $\pi_2^{(1)}$, $\mu_2^{(1)}$ and $\Sigma_2^{(1)}$ are performed using the same procedure.
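The first iteration worked through above can be reproduced with a short script that applies Eq. 3.6 to 3.9 to the five observations in M. This is a sketch with hypothetical function names, written for the bivariate case only.

```python
import math

def bvn_pdf(y, mu, sigma):
    """Bivariate normal density for row vectors y and mu and a 2x2 matrix sigma."""
    (s11, s12), (s21, s22) = sigma
    det = s11 * s22 - s12 * s21
    d1, d2 = y[0] - mu[0], y[1] - mu[1]
    quad = (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def em_iteration(data, weights, mus, sigmas):
    """One EM iteration for a bivariate Gaussian mixture (Eq. 3.6 to 3.9)."""
    g, n = len(weights), len(data)
    tau = []
    for y in data:  # E-step, Eq. 3.6
        joint = [weights[i] * bvn_pdf(y, mus[i], sigmas[i]) for i in range(g)]
        total = sum(joint)
        tau.append([v / total for v in joint])
    new_w, new_mu, new_sigma = [], [], []
    for i in range(g):  # M-step, Eq. 3.7 to 3.9
        mass = sum(t[i] for t in tau)
        mu = [sum(t[i] * y[k] for t, y in zip(tau, data)) / mass for k in range(2)]
        sigma = [
            [sum(t[i] * (y[r] - mu[r]) * (y[c] - mu[c]) for t, y in zip(tau, data)) / mass
             for c in range(2)]
            for r in range(2)
        ]
        new_w.append(mass / n)
        new_mu.append(mu)
        new_sigma.append(sigma)
    return tau, new_w, new_mu, new_sigma

M = [(2.92, 8.47), (19.22, 20.48), (19.67, 20.21), (5.38, 9.16), (19.12, 19.61)]
tau, w, mu, sigma = em_iteration(
    M, [0.5, 0.5], [(5.0, 10.0), (20.0, 20.0)],
    [[[1.0, 1.0], [1.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]]],
)
print(round(w[0], 2), [round(v, 2) for v in mu[0]])
print([[round(v, 2) for v in row] for row in sigma[0]])
```

Up to rounding, the printed quantities should agree with the values of the first mixing proportion, mean vector and variance-covariance matrix computed by hand above.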
3.8 Difficulties in selecting the MLE of mixture models of
Gaussian distributions with heteroscedastic components
An important problem when fitting mixture models of Gaussian distributions with
unrestricted covariance matrices is how to select the MLE of the mixture parameters
(Everitt et al., 2011; Figueiredo & Jain, 2002; Melnykov, 2013; Seo & Kim, 2012). The
difficulties encountered are not due to a failure of the EM algorithm but to the
characteristics of L(ψ) for this type of mixture (McLachlan & Peel, 2000, p.99). The log
likelihood function for mixtures of Gaussian distributions with heteroscedastic
components presents:
1. Unboundedness, which leads to the non-existence of the global maximiser.
The points where the log likelihood function is unbounded are called singularities.
A local maximiser is then chosen as a substitute for the MLE (Redner & Walker, 1984).
2. Multiple maximisers, which creates the dilemma of which local maximum to
choose as the MLE and produces sensitivity to the initial values of the parameters
and stopping rules (Seidel et al., 2000).
3. Spuriosities, local maxima which do not correspond to a feasible cluster
partition (McLachlan & Peel, 2000, p.99) and which need to be inspected.
3.8.1 Unboundedness of the likelihood function
Kiefer & Wolfowitz (1956) were the first to report the unboundedness of L(ψ).
They used the following mixture of univariate Gaussian distributions to illustrate this
property:
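$$\pi_1 = 0.5, \quad \pi_2 = 0.5; \qquad \mu_1 = \mu, \quad \mu_2 = \mu, \quad \mu \text{ unknown}; \qquad \sigma_1 = 1, \quad \sigma_2 = \sigma, \quad \sigma \text{ unknown}$$
The log likelihood function for this mixture is given by:
$$L(\psi) = \sum_{j=1}^{n} \log\left[ \frac{1}{2(2\pi)^{1/2}} \exp\left\{-\frac{(y_j - \mu)^2}{2}\right\} + \frac{1}{2(2\pi)^{1/2}\sigma} \exp\left\{-\frac{(y_j - \mu)^2}{2\sigma^2}\right\} \right]$$
If $y_j = \mu$ for some observation and $\sigma \to 0$, then $\exp\{-(y_j - \mu)^2 / (2\sigma^2)\} = 1$ and
$1 / (2(2\pi)^{1/2}\sigma) \to \infty$, so that L(ψ) is unbounded.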
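The unboundedness can be checked numerically. The following is a minimal sketch in Python, assuming the two-component mixture above with equal mixing proportions; the sample y is a hypothetical set of four values chosen only for illustration, and the function name log_lik is an assumption of this sketch.

```python
import math


def log_lik(y, mu, sigma):
    """Log likelihood of the mixture 0.5 N(mu, 1) + 0.5 N(mu, sigma^2) for the sample y."""
    total = 0.0
    for yj in y:
        z2 = (yj - mu) ** 2
        f1 = math.exp(-z2 / 2.0) / (2.0 * math.sqrt(2.0 * math.pi))
        f2 = math.exp(-z2 / (2.0 * sigma ** 2)) / (2.0 * math.sqrt(2.0 * math.pi) * sigma)
        total += math.log(f1 + f2)
    return total


y = [-1.3, 0.2, 0.9, 2.1]   # hypothetical observations, for illustration only
mu = y[1]                   # place the common mean exactly on one observation
values = [log_lik(y, mu, s) for s in (1.0, 1e-2, 1e-4, 1e-8)]
print(values)               # grows without bound as sigma shrinks towards zero
```

The first component keeps the contribution of the remaining observations finite, while the observation that coincides with µ contributes a term of order −log σ, which is why the log likelihood keeps increasing.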
Similarly, in the case of multivariate Gaussian distributions, the unboundedness of
L(ψ) (Eq. 3.11) is produced when $y_j = \mu_i$ and $|\Sigma_i| \to 0$.
$$L(\psi) = \sum_{j=1}^{n} \log \sum_{i=1}^{g} \pi_i \frac{1}{(\sqrt{2\pi})^{p}\,|\Sigma_i|^{1/2}} \exp\left\{-(y_j - \mu_i)\Sigma_i^{-1}(y_j - \mu_i)^{\top}/2\right\} \tag{3.11}$$
The values of the parameters at which L(ψ) becomes unbounded are called singularities.
The unboundedness of L(ψ), and therefore the non-existence of the MLE,
may seem puzzling at first. Nonetheless, according to Redner & Walker (1984) and
McLachlan & Peel (2000, p.43), it is sufficient to find a local maximum of L(ψ) which
satisfies certain properties such as efficiency⁴ and consistency⁵.
3.8.2 Multiple local maxima
The fact that L(ψ) may present multiple local maxima creates the dilemma of which
one to choose as the MLE (Redner & Walker, 1984). Furthermore, even if we widen
the search for local maximisers (see Section 3.9.1), we cannot be sure that all have been
found (McLachlan & Peel, 2000, p.44). The presence of multiple maximisers results
in sensitivity of the EM algorithm to the choice of the initial values of the parameters
(seeds) and stopping rules. This fact was shown by Seidel et al. (2000), who aimed to
approximate the distribution of the likelihood ratio test statistic, λ, by bootstrapping
(see Section 3.11). Seidel et al. (2000) wanted to test if a sample came from a single
exponential distribution with parameter θ or from a mixture of two with a parameter
vector P. The bootstrap distribution of −2 log λ = 2 logL(P)− 2 logL(θ) depends on
P, which is obtained by the EM algorithm. Seidel et al. (2000) showed on examples
that by using different starting values and stopping rules for this algorithm the values
of specific quantiles of −2 log λ can vary substantially between runs. The sensitivity to
the starting positions is also illustrated in Figure 3.1, which shows how two different
seeds for the EM algorithm can lead to very different cluster partitions of the same data.
⁴ An estimator $\hat{\theta}$ is said to be efficient if it is unbiased ($E(\hat{\theta}) = \theta$) and $Var(\hat{\theta})$ is minimum (Wackerly
et al., 1996, p.373).
⁵ An estimator $\hat{\theta}_n$ is said to be consistent if, as the sample size n increases, it gets closer to the
parameter that we want to estimate. Formally, $P(|\hat{\theta}_n - \theta| < \varepsilon) \to 1$ (Wackerly et al., 1996, p.374).
3.8.3 Spuriosities
There may be some local maxima which do not correspond to feasible cluster parti-
tions. These solutions are called spuriosities. A solution is considered spurious if it
is biologically meaningless, denoted here as a biological spuriosity, or if a component
has been fitted to very few data points and the determinant of its covariance matrix
is small but not zero (McLachlan & Peel, 2000, p.99), denoted here as a mathematical
spuriosity.
Biological spuriosities in nitrogen efficiency
Several biological constraints need to be considered in order to identify unfeasible clus-
ter partitions. Firstly, the values of GY and NU are always positive and finite within
a range which varies depending on the environmental conditions and the cultivar used.
This information can be supplied by biologists. For example, for the case-study in
Chapter 4, the values of GY and NU are within a range of 0 to 5000 kg/ha and 0 to
120 kg/ha, respectively. Notice that values close to zero, although possible, will indi-
cate biological outliers, e.g. the crops in the plots do not perform well due to flooding.
Secondly, the correlation of GY and NU is expected to be positive or zero. This is due
to the fact that N is a major limiting factor for the grain production (Xu et al., 2012).
Therefore, an increment in NU is expected to increase GY , resulting in a positive
correlation between both variables. However, once a plant has accumulated enough
N, GY becomes unresponsive to an increment in NU and no correlation between GY
and NU is observed (see Figure 1.4, left). Situations in which the amount of N in plants
is excessive, arguably because of an oversupply of N fertilisers, so that it adversely
affects the synthesis of proteins as well as the growth pattern and health of the plants
and thereby reduces grain yield (Goyal & Huffaker, 1984, p.111), are not common in
sustainable agriculture. Thus, this scenario will not be considered in this work. Due to
these biological constraints, cluster partitions which contemplate negative values for
GY and NU or negative correlations between these two traits are considered spurious.
Examples are the dark blue and red components in Figure 3.2.
In the case of applying the mixture approach to the univariate analysis of IEN ,
biological spuriosities are solutions which allow negative values of the ratio.
Mathematical spuriosities
A mathematical spuriosity is a local maximum of L(ψ) (Eq. 3.5) for which one cluster
has been allocated to a ‘small and localised random pattern’ (McLachlan & Peel, 2000,
p.99). This cluster contains few data points and the determinant of the covariance
matrix of its corresponding component is small but not zero (McLachlan & Peel, 2000,
p.99). Equivalently, a procedure to detect spurious local maxima is to investigate the
mixing proportion and the eigenvalues of the covariance matrix for each component
(McLachlan & Peel, 2000, p.103). Notice that if one of the eigenvalues of the covariance
matrix tends to zero, $|\Sigma_i|$ is expected to be small. Recall that if $\Sigma_i$ is a real
symmetric matrix, we can factorise $\Sigma_i$ as follows.
$$\Sigma_i = Q_i D_i Q_i^{\top} \tag{3.12}$$
where $Q_i$ is an orthogonal matrix whose columns are the eigenvectors of $\Sigma_i$, and $D_i$ is a
diagonal matrix with the eigenvalues denoted as $(\lambda_{i1}, \dots, \lambda_{ip})$. Thus, $|\Sigma_i| = |Q_i||D_i||Q_i^{\top}|$.
As $Q_i$ is an orthogonal matrix, its determinant is equal to ±1, so that $|Q_i||Q_i^{\top}| = 1$.
Therefore, $|\Sigma_i| = \lambda_{i1} \cdots \lambda_{ip}$.
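The inspection of the mixing proportion and the eigenvalues can be sketched as follows. This is a minimal Python illustration for the bivariate case only; the function names and, in particular, the cut-off values min_points and min_eigenvalue are hypothetical choices made for this sketch and are not recommendations from the literature cited above.

```python
import math


def eigenvalues_2x2(s):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix s."""
    a, b, c = s[0][0], s[0][1], s[1][1]
    half_trace = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return half_trace + radius, half_trace - radius


def determinant_2x2(s):
    return s[0][0] * s[1][1] - s[0][1] * s[1][0]


def looks_spurious(pi_i, sigma_i, n, min_points=3.0, min_eigenvalue=0.01):
    """Flag a component with few expected points or a nearly degenerate covariance.

    Both thresholds are hypothetical and would need to be chosen for each data set.
    """
    _, lam_min = eigenvalues_2x2(sigma_i)
    return pi_i * n < min_points or lam_min < min_eigenvalue


SIGMA_1 = [[1.0, 1.0], [1.0, 2.0]]        # Sigma_1 from Eq. 3.10
FITTED = [[1.51, 0.42], [0.42, 0.12]]     # Sigma_1 after the first M-step in Section 3.7.1

lam = eigenvalues_2x2(SIGMA_1)
print(lam, lam[0] * lam[1], determinant_2x2(SIGMA_1))   # product of eigenvalues = determinant
print(looks_spurious(0.5, SIGMA_1, n=100))
print(looks_spurious(0.4, FITTED, n=5))
```

The covariance matrix fitted to only two observations in Section 3.7.1 has one eigenvalue close to zero, which is the pattern this check is meant to detect.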
3.8.3.1 Visual methods to detect spuriosities
For bivariate Gaussian distributions, spuriosities can also be detected visually by
plotting the prediction ellipses⁶, which depict the means and variance-covariance matrices
of the mixture components (Eq. 3.13) (McLachlan & Peel, 2000, p.103). This visual
method is in accordance with the ones suggested by Friendly (2006) for multivariate
analysis of variance. For instance, a mathematical spuriosity corresponds to a cluster whose
prediction ellipse contains few observations and has one or two axes of small length.
$$(y - \mu_i)\Sigma_i^{-1}(y - \mu_i)^{\top} = \frac{(n-1)p}{n-p}\,\frac{n+1}{n}\,F(1-\gamma, p, n-p) \tag{3.13}$$
⁶ The region given in Eq. 3.13 has 100(1 − γ)% probability of containing a new observation of the
sample conditional on belonging to the i-th group (see Chew, 1966). In the bivariate case p = 2.
In Eq. 3.13, F(1 − γ, p, n − p) is the 1 − γ quantile of the F distribution with p and n − p
degrees of freedom. Spuriosities can be detected from the length of the axes because
the lengths of the ellipse axes are proportional to the square roots of the eigenvalues
of $\Sigma_i$. A comprehensive explanation of the latter fact can be found in Rencher (1998),
and I summarise it here. Considering that $\Sigma_i = Q_i D_i Q_i^{\top}$, and substituting $\Sigma_i$ by its
factorisation in Eq. 3.13, it follows that:
$$(y - \mu_i) Q_i D_i^{-1} Q_i^{\top} (y - \mu_i)^{\top} = \frac{(n-1)p}{n-p}\,\frac{n+1}{n}\,F(1-\gamma, p, n-p)$$
Taking $z_i = Q_i^{\top}(y - \mu_i)^{\top}$, we arrive at:
$$z_i^{\top} D_i^{-1} z_i = \frac{(n-1)p}{n-p}\,\frac{n+1}{n}\,F(1-\gamma, p, n-p)$$
$$\sum_{t=1}^{2} \frac{z_{it}^2}{\lambda_{it}} = \frac{(n-1)p}{n-p}\,\frac{n+1}{n}\,F(1-\gamma, p, n-p) \tag{3.14}$$
Eq. 3.14 is a canonical ellipse with the length of axes:
$$\sqrt{\frac{(n-1)p}{n-p}\,\frac{n+1}{n}\,F(1-\gamma, p, n-p)\,\lambda_{it}}. \tag{3.15}$$
The multiplication of a vector by an orthogonal matrix acts as a rotation of the vector.
Thus, Eq. 3.13 is an ellipse with its origin at $\mu_i$, the axes directions given by the
eigenvectors of $\Sigma_i$, and the axes lengths given by Eq. 3.15.
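The axis lengths in Eq. 3.15 can be computed directly from the eigenvalues. The sketch below is a Python illustration for p = 2 only. It relies on the closed-form quantile of the F distribution that is available when the numerator degrees of freedom equal 2; the function names and the value n = 50 are assumptions made for this example, and n is used exactly as it appears in Eq. 3.13.

```python
import math


def f_quantile_df1_2(prob, df2):
    """Quantile of the F distribution with 2 and df2 degrees of freedom.

    Closed form that holds only when the numerator degrees of freedom equal 2.
    """
    return (df2 / 2.0) * ((1.0 - prob) ** (-2.0 / df2) - 1.0)


def eigenvalues_2x2(s):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix s."""
    a, b, c = s[0][0], s[0][1], s[1][1]
    half_trace = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return half_trace + radius, half_trace - radius


def ellipse_axes(sigma, n, gamma=0.05):
    """Axis lengths of Eq. 3.15 for a bivariate component with covariance sigma."""
    p = 2
    rhs = (n - 1) * p / (n - p) * (n + 1) / n * f_quantile_df1_2(1.0 - gamma, n - p)
    return tuple(math.sqrt(rhs * lam) for lam in eigenvalues_2x2(sigma))


SIGMA_1 = [[1.0, 1.0], [1.0, 2.0]]   # Sigma_1 from Eq. 3.10
SIGMA_2 = [[1.0, 0.0], [0.0, 1.0]]   # Sigma_2 from Eq. 3.10
axes_1 = ellipse_axes(SIGMA_1, n=50)   # n = 50 is a hypothetical sample size
axes_2 = ellipse_axes(SIGMA_2, n=50)
print(axes_1, axes_2)
```

For the identity covariance matrix the two axes coincide and the ellipse is a circle, whereas a component with one eigenvalue close to zero would give one very short axis.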
For mixture models of univariate Gaussian components, spuriosities can be detected
by plotting the pdf of each component. For instance, a mathematical spuriosity will
present a component with small variance (e.g. McLachlan & Peel, 2000, p.100).
3.9 Strategy to select the MLE of mixtures with heteroscedas-
tic Gaussian components
Selecting the MLE for mixtures with unrestricted covariance matrices is a difficult task
due to the presence of singularities, multiple local maxima and spuriosities. Let us
consider a mixture of g components. The most common strategy for selecting the
MLE is to perform the following steps (Biernacki et al., 2006; McLachlan & Peel, 2000;
Melnykov & Maitra, 2010; Ng, 2013):
1. Initiate the EM algorithm with different starting strategies to identify as many
local maximisers as possible (see Section 3.9.1).
2. Stop the EM algorithm when a stopping criterion is fulfilled, e.g. the change in
L(ψ) between two consecutive iterations is less than a chosen threshold.
3. Check for and delete spuriosities and singularities.
Then, the MLE is taken to be the remaining solution which gives the highest value of L(ψ).
The logic of this procedure is based on the theoretical results obtained by
Hathaway (1985), who constrained the parameter space to avoid singularities and spurious
solutions for mixtures of univariate normal distributions. The constrained parameter
space was:
$$\Omega_{\varepsilon c} = \{\psi \in \Omega \mid \pi_i \ge \varepsilon,\ \sigma_i \ge c\,\sigma_k,\ 1 \le i \ne k \le g,\ c > 0\}$$
Hathaway (1985) demonstrated that the constrained global maximiser was consistent,
given that ψ belongs to $\Omega_{\varepsilon c}$. Assuming that the same result applies for mixtures of
multivariate Gaussian distributions, the same strategy could be performed (McLachlan
& Peel, 2000, p.97). The constrained parameter space for the multivariate case was
suggested by Hathaway (1985):
$$\Omega_{\varepsilon c} = \{\psi \in \Omega \mid \pi_i \ge \varepsilon,\ \lambda_m(\Sigma_i \Sigma_k^{-1}) \ge c,\ 1 \le i \ne k \le g,\ c > 0\}$$
where $\lambda_m$ refers to the minimum eigenvalue of a matrix. However, satisfying these
constraints involves a difficult optimisation problem.
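Ingrassia & Rocci (2007) suggested a simpler approach. They observed that:
$$\lambda_m(\Sigma_h \Sigma_j^{-1}) \ge \frac{\lambda_m(\Sigma_h)}{\lambda_s(\Sigma_j)}$$
where $\lambda_s$ refers to the maximum eigenvalue of a matrix.
Then, by imposing the constraints $a \le \lambda_i(\Sigma_j) \le b$, where $\lambda_i(\Sigma_j)$ is the i-th eigenvalue
of the matrix $\Sigma_j$, and taking $a/b \ge c$:
$$\lambda_m(\Sigma_h \Sigma_j^{-1}) \ge \frac{\lambda_m(\Sigma_h)}{\lambda_s(\Sigma_j)} \ge \frac{a}{b} \ge c$$
which defines the constrained parameter space ($\Omega_{ab}$) (Ingrassia & Rocci, 2007):
$$\Omega_{ab} = \{\psi \mid \pi_i \ge \varepsilon,\ a \le \lambda_k(\Sigma_i) \le b,\ k = 1, \dots, p \text{ and } i = 1, \dots, g\}$$
This parameter space implies $\Omega_{\varepsilon c}$ (every ψ in $\Omega_{ab}$ also belongs to $\Omega_{\varepsilon c}$) but results in a simpler optimisation problem.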
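Checking whether a fitted solution lies in the constrained space of Ingrassia & Rocci (2007) only requires the mixing proportions and the eigenvalues of each covariance matrix. A minimal Python sketch for the bivariate case is given below; the function names and the values of ε, a and b are hypothetical and serve only to illustrate the check.

```python
import math


def eigenvalues_2x2(s):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix s."""
    a, b, c = s[0][0], s[0][1], s[1][1]
    half_trace = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return half_trace + radius, half_trace - radius


def in_omega_ab(pis, sigmas, eps, a, b):
    """True if pi_i >= eps and a <= lambda_k(Sigma_i) <= b for every component i."""
    for pi_i, sigma_i in zip(pis, sigmas):
        lam_max, lam_min = eigenvalues_2x2(sigma_i)
        if pi_i < eps or lam_min < a or lam_max > b:
            return False
    return True


PIS = [0.5, 0.5]                                             # Eq. 3.10
SIGMAS = [[[1.0, 1.0], [1.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]]]  # Eq. 3.10

# Hypothetical tuning parameters: the smallest eigenvalue of Sigma_1 is about 0.38.
print(in_omega_ab(PIS, SIGMAS, eps=0.05, a=0.1, b=10.0))
print(in_omega_ab(PIS, SIGMAS, eps=0.05, a=0.5, b=10.0))
```

The second call fails only because the lower bound a exceeds the smallest eigenvalue of $\Sigma_1$, which illustrates how sensitive the constrained space is to the choice of the tuning parameters.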
The main difficulty is how to choose the tuning parameters c and ε (univariate case)
or a, b and ε (multivariate case) to ensure that ψ belongs to the constrained parameter
space (Hathaway, 1985). For this reason, some authors (e.g. McLachlan & Peel, 2000;
Melnykov, 2013) recommend not imposing constraints and deleting the singularities
and spuriosities post-hoc. I have also decided to delete spuriosities and singularities
rather than to restrict the domain of the parameters.
3.9.1 Starting strategies for the EM algorithm
Different strategies have been proposed for initiating the EM algorithm (e.g. Biernacki
et al., 2003; Karlis & Xekalaki, 2003; Maitra, 2009); a comprehensive review is given
in McLachlan & Peel (2000, p.54-55). However, none of them outperforms the others
for all the applications (Meila & Heckerman, 2001; Melnykov, 2013). The starting
strategies used in this current project are listed below.
1. Random starts. This strategy is based on constructing a random partition of
the data. For each $y_j$, a random value, r, is generated from the set R = {1, …, g}.
Then, $y_j$ is assigned to $G_r$ by fixing $\tau_{rj} = 1$ and $\tau_{sj} = 0$ for s = 1, …, g and s ≠ r
(McLachlan & Peel, 2000, p.55).
2. Solution provided by the k-means algorithm. This procedure produces a
partition of the data set based on minimising the distance between the
observations and the means of the clusters. The objective function to minimise is
$\sum_{i=1}^{g} \sum_{y_j \in G_i} d(y_j - \mu_i)$, where d denotes the squared Euclidean distance (Forgy, 1965, as cited
in Omran et al., 2007). Then, $\tau_{ij} = 1$ if $y_j$ is allocated to $G_i$ by the k-means
algorithm and $\tau_{sj} = 0$ for s = 1, …, g and s ≠ i (McLachlan & Peel, 2000, p.54).
3. Simulated starts. The initial values of the joint means, $\mu_i^{(0)}$, i = 1, …, g, are
generated from a bivariate normal distribution with mean $\bar{y} = \sum_{j=1}^{n} y_j / n$ and
covariance matrix $S = \sum_{j=1}^{n} (y_j - \bar{y})^{\top}(y_j - \bar{y}) / n$. The initial values of the mixing
proportions and the covariance matrices of the component densities are given
by $\pi_i^{(0)} = 1/g$ and $\Sigma_i^{(0)} = S$, respectively (McLachlan & Peel, 2000, p.55).
4. Subsample solution. The EM algorithm is run from random starts on a
subsample of the data. Then, the solution provided by the EM algorithm
after a few runs on the subsample is used to initiate the EM algorithm
on the entire sample. The subsample size needs to be large enough to avoid
degenerate solutions (McLachlan & Peel, 2000, p.55).
5. Short run of the EM algorithm. This strategy is based on running several
‘short runs’ of the EM algorithm initiated with random
starts. Then, the solution with the highest value of L(ψ) is used to initiate
a ‘long run’ of the EM algorithm (Biernacki et al., 2003, 2006). A ‘short run’
indicates that the threshold chosen to stop the EM algorithm is larger than that
for a ‘long run’.
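As an illustration of the first strategy in the list above, a random start can be generated as follows. This is a minimal Python sketch and not the EMMIX implementation; the function name random_start and the seed argument are assumptions of this example, and components are labelled 0, …, g − 1 rather than 1, …, g.

```python
import random


def random_start(n, g, seed=None):
    """Random partition of n observations into g groups, returned as a 0/1 matrix tau.

    tau[j][r] = 1 if observation j is assigned to group r, and 0 otherwise.
    """
    rng = random.Random(seed)
    tau = []
    for _ in range(n):
        r = rng.randrange(g)                      # random label from {0, ..., g - 1}
        tau.append([1 if s == r else 0 for s in range(g)])
    return tau


tau0 = random_start(n=5, g=2, seed=2014)
print(tau0)
```

A purely random partition can leave a group with very few observations, or none, especially for small n, so in practice such a start would need to be checked before the first M-step is applied.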
Strategies 1 and 2 are available in the R (R Core Team, 2012) package EMMIX
(McLachlan et al., 1999); strategies 3, 4 and 5 have been programmed by the author
(the code can be found in Appendices F and E).
3.10 Bayesian approach to estimating parameters of mixture
models of multivariate Gaussian distributions
In this section I review the Bayesian approach to estimating the parameters of mixture
models of multivariate normal distributions. Refer to Lee (2012) and Pena (2002, Chapter
11) for an introduction to Bayesian statistics and to Fruhwirth-Schnatter (2006) for a
detailed explanation of the Bayesian approach in the context of mixture models.
Bayesian statistics considers the parameters of a population to be random variables
for which a pdf, called the prior distribution, is specified. The prior distribution is
chosen depending on the previous knowledge of the researcher. Then, the researcher
updates his/her knowledge once the data are recorded. The combination of the
researcher’s previous knowledge and his/her learning from the data is expressed by
the conditional pdf of the parameters on the observed data, the so-called posterior
distribution. The prior and posterior distributions are related by Bayes’ Theorem.
$$p(\psi \mid Y) = \frac{f(Y \mid \psi)\,p(\psi)}{\int f(Y \mid \psi)\,p(\psi)\,d\psi} \propto l(\psi \mid Y)\,p(\psi) \tag{3.16}$$
where:
ψ is the parameter vector.
Y is the random sample.
p(ψ|Y) is the posterior pdf.
f(Y|ψ) is the joint pdf.
p(ψ) is the prior pdf.
l(ψ|Y) is the likelihood function.
Inference in Bayesian statistics is based on the posterior probability (Pena, 2002,
p.329). For instance, ψ is estimated by taking the expected value of p(ψ|Y), denoted
as E(ψ|Y) (Pena, 2002, p.329). In the case of mixture models, E(ψ|Y) always exists
in closed form, given that priors are proper⁷ and conjugate⁸ (Diebolt & Robert, 1994).
However, calculating E(ψ|Y) requires expanding the likelihood function (Eq. 3.3),
which involves $g^n$ operations corresponding to all possible allocations of the data into
g clusters (Lee et al., 2008). This is computationally prohibitive for large sample
sizes (Lee et al., 2008). Thus, ψ is estimated by calculating the mean of a random
sample generated from the posterior distribution (Diebolt & Robert, 1994). The most
common algorithm to generate the random sample is the Gibbs sampler (Lee et al.,
2008). The fundamentals of this algorithm, its application and issues associated with
the estimation of ψ are briefly reviewed below.
⁷ A proper prior is a prior which integrates to one (Pena, 2002, p.330).
⁸ The prior distribution is called a conjugate prior if the posterior distribution belongs to the same
parametric family as the prior (Pena, 2002, p.332).
3.10.1 The Gibbs sampler
The Gibbs sampler (Geman & Geman, 1984) is a Markov chain Monte Carlo (MCMC)
method for generating a random sample from a joint pdf, e.g. fxy(x, y), by sampling
from the conditional pdfs, fx|y(x|y) and fy|x(y|x). The steps of the Gibbs sampler in
its bivariate case are detailed as follows (Robert & Casella, 2004, Chapter 9).
Let x(0) be an initial value of the random variable X. The following steps are
repeated for r = 1 . . . N , where r indicates the iteration number of the algorithm.
1. Generate y(r) from the distribution fy|x(·|x(r−1))
2. Generate x(r) from the distribution fx|y(·|y(r))
After n initial steps, n large and 1 ≤ n ≤ N , (x(n), y(n)) is a random value from the
pdf fxy(x, y) (Diebolt & Robert, 1994).
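The two steps above can be illustrated with a standard textbook case. The sketch below, in Python, assumes that the joint pdf is a bivariate normal with zero means, unit variances and correlation ρ, for which the conditional distributions are x|y ∼ N(ρy, 1 − ρ²) and y|x ∼ N(ρx, 1 − ρ²). This target distribution, the function name and the values of ρ, N and the burn-in are assumptions chosen for illustration only.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter, x0=0.0, seed=1):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x = x0
    draws = []
    for _ in range(n_iter):
        y = rng.gauss(rho * x, sd)   # step 1: y(r) from f(y | x(r-1))
        x = rng.gauss(rho * y, sd)   # step 2: x(r) from f(x | y(r))
        draws.append((x, y))
    return draws


draws = gibbs_bivariate_normal(rho=0.6, n_iter=20000)
kept = draws[1000:]                  # discard the first n = 1000 iterations

m = len(kept)
mean_x = sum(x for x, _ in kept) / m
mean_y = sum(y for _, y in kept) / m
sxx = sum((x - mean_x) ** 2 for x, _ in kept)
syy = sum((y - mean_y) ** 2 for _, y in kept)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in kept)
corr = sxy / math.sqrt(sxx * syy)
print(mean_x, mean_y, corr)          # sample moments should be close to 0, 0 and rho
```

Only draws from the two conditional distributions are needed, yet the retained pairs behave like a (dependent) sample from the joint distribution.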
3.10.2 The Gibbs sampler for a mixture of multivariate Gaus-
sian distributions
The Gibbs sampler is used to generate $(z^{(1)}, \dots, z^{(N)}, \psi_g^{(1)}, \dots, \psi_g^{(N)})$, a sample of size
N from the posterior distribution p(Z, ψg|Y) (Pena, 2002, p.476). The subindex g is
used to indicate that the number of mixture components is equal to g. The sample
$(z^{(1)}, \dots, z^{(N)}, \psi_g^{(1)}, \dots, \psi_g^{(N)})$ is obtained by generating $(z^{(1)}, \dots, z^{(N)})$, a sample from
p(Z|ψg, Y), and $(\psi_g^{(1)}, \dots, \psi_g^{(N)})$, a sample from p(ψg|Z, Y) (Pena, 2002, p.476). After
n initial iterations of the algorithm, n being sufficiently large and 1 ≤ n ≤ N,
$(z^{(n)}, \dots, z^{(N)}, \psi_g^{(n)}, \dots, \psi_g^{(N)})$ is a random sample from p(Z, ψg|Y) (Diebolt & Robert,
1994). In addition, $(\psi_g^{(n)}, \dots, \psi_g^{(N)})$ is a random sample from p(ψg|Y) (Robert &
Casella, 2004, p.339). As pointed out by Diebolt & Robert (1994), $(\psi_g^{(n)}, \dots, \psi_g^{(N)})$
can be used to approximate the posterior expected value, $E_p(\psi_g \mid Y)$, as follows.
$$E_p(\psi_g \mid Y) \approx \frac{1}{N - n} \sum_{k=n+1}^{N} \psi_g^{(k)}$$
The computation of posterior pdfs requires specifying the priors for the mixture
parameters (see Eq. 3.16). Bensmail et al. (1997) suggested the use of conjugate
priors:
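$$\pi \sim D(\alpha_1, \dots, \alpha_g)$$
$$\mu_i \sim MVN(\xi_i, \Sigma_i / k_i)$$
$$\Sigma_i^{-1} \sim W_p(m_i, C_i)$$
where D refers to the Dirichlet distribution (Eq. 3.17), MVN to the multivariate
normal and $W_p$ to the Wishart distribution (Eq. 3.18).
$$f_D(x \mid \alpha_1, \dots, \alpha_g) = c\, x_1^{\alpha_1 - 1} \cdots x_g^{\alpha_g - 1} \tag{3.17}$$
where:
$$\sum_{i=1}^{g} x_i = 1 \quad \text{and} \quad c = \frac{\Gamma\left(\sum_{i=1}^{g} \alpha_i\right)}{\prod_{i=1}^{g} \Gamma(\alpha_i)}$$
$$f_W(X \mid m, C) = \frac{|X|^{(m-p-1)/2}\, |C|^{m} \exp\{-\mathrm{tr}(CX)\}}{\Gamma_p(m)} \tag{3.18}$$
with:
$$\Gamma_p(m) = \pi^{p(p-1)/4} \prod_{k=1}^{p} \Gamma\left(\frac{2m + 1 - k}{2}\right)$$
where Γ is the gamma function, p is the dimension of the matrix $X_{p \times p}$ and tr is the
trace function.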
In particular, Bensmail et al. (1997) used αi = 1/g, ki = 1, mi = 5, ξi and Ci as
the mean and covariance matrix of the entire sample, respectively. Under these priors,
the Gibbs sampler for mixtures of multivariate normal distributions is set up as follows
(Bensmail, 1997; McLachlan & Peel, 2000, p.123).
Let z(0) be an initial partition of the data into g clusters. The following steps are
repeated for r iterations, r = 1 . . . N .
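1. Generate:
$$\pi^{(r)} \sim D(\alpha_1 + n_1^{(r-1)}, \dots, \alpha_g + n_g^{(r-1)})$$
$$\mu_i^{(r)} \sim MVN\left(\xi_i^{*(r-1)}, \frac{1}{n_i^{(r-1)} + k_i}\,\Sigma_i^{(r-1)}\right)$$
$$\Sigma_i^{-1(r)} \sim W_p\left(n_i^{(r-1)} + m_i,\ C_i^{*(r-1)}\right)$$
where:
$$n_i = \sum_{j=1}^{n} z_{ij}$$
$$\xi_i^{*} = \frac{n_i \bar{y}_i + k_i \xi_i}{n_i + k_i}$$
$$C_i^{*} = \left\{ C_i^{-1} + n_i S_i + \frac{n_i k_i}{n_i + k_i} (\bar{y}_i - \xi_i)^{\top} (\bar{y}_i - \xi_i) \right\}^{-1}$$
$$\bar{y}_i = \frac{1}{n_i} \sum_{j=1}^{n} z_{ij}\, y_j$$
$$S_i = \frac{1}{n_i} \sum_{j=1}^{n} z_{ij}\, (y_j - \bar{y}_i)^{\top} (y_j - \bar{y}_i)$$
2. Generate:
$$Z_j^{(r)} \sim \mathrm{Mult}_g(1, \tau_j^{(r)})$$
where $\mathrm{Mult}_g$ is a multinomial distribution with 1 trial, g outcomes and $\tau_j$ the
probability vector of the outcomes (Eq. 3.2).
3. Calculate $n_i^{(r)}$, $\bar{y}_i^{(r)}$ and $S_i^{(r)}$.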
The estimation of the parameters by the Gibbs sampler involves dealing with an
important pitfall, the so-called ‘label switching problem’ (Redner & Walker, 1984).
3.10.3 Label switching problem
A well-known mathematical feature of mixture models is their non-identifiability (Marin
et al., 2005; McLachlan & Peel, 2000; Redner & Walker, 1984). A parametric family F
is said to be identifiable if, for any two values ψ1 and ψ2 of the parameter vector with
f(y|ψ1), f(y|ψ2) ∈ F, the equality f(y|ψ1) = f(y|ψ2) implies ψ1 = ψ2. The parametric
family of mixture models is non-identifiable because the mixture density is invariant to
a permutation of the parameters (Marin et al., 2005; McLachlan & Peel, 2000; Redner
& Walker, 1984). This type of non-identifiability was denoted by Redner & Walker
(1984) as the ‘label switching problem’. For instance, let us observe that:
$$\pi_1 f_1(y \mid \theta_1) + \pi_2 f_2(y \mid \theta_2) = \pi_2 f_2(y \mid \theta_2) + \pi_1 f_1(y \mid \theta_1), \quad \text{but} \quad (\pi_1, \theta_1, \pi_2, \theta_2) \ne (\pi_2, \theta_2, \pi_1, \theta_1)$$
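The invariance can be verified numerically. The following Python sketch uses a mixture of three univariate normal components whose parameter values are hypothetical and chosen only for illustration; every one of the g! = 6 orderings of the components gives the same density although the parameter vectors differ.

```python
import math
from itertools import permutations


def normal_pdf(y, mu, sd):
    return math.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(y, components):
    """Mixture density; components is a sequence of (pi_i, mu_i, sd_i) triples."""
    return sum(p * normal_pdf(y, m, s) for p, m, s in components)


# Hypothetical parameter vector psi, one (pi, mu, sd) triple per component.
psi = ((0.2, 0.0, 1.0), (0.3, 4.0, 0.5), (0.5, 9.0, 2.0))

relabelled = list(permutations(psi))             # the g! = 6 relabelled vectors
densities = [mixture_pdf(1.7, perm) for perm in relabelled]
print(len(relabelled), densities)
```

The relabelled parameter vectors are all distinct, so the data cannot distinguish between them through the mixture density alone.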
The ‘label switching problem’ has serious implications for the estimation of ψ by
the Bayesian approach. The invariance of the mixture to permutations results in a
posterior distribution with g! modes, which is thus difficult to explore by the Gibbs sampler
or other Monte Carlo methods (Lee et al., 2008). Furthermore, if there is no previous
knowledge available for discerning the components of the mixture, it is common to
take a prior invariant to a permutation of the parameters (Stephens, 2000). However,
the latter practice results in posterior expected values of πi and θi equal for all the
components (Marin et al., 2005; Stephens, 2000). A comprehensive explanation of this
fact is given by Fruhwirth-Schnatter (2006, p.64), which I present here.
Firstly, let us observe that the likelihood function is invariant to a permutation of
the parameters. In the simple case of two components:
$$l(y \mid \psi) = \prod_{j=1}^{n} [\pi_1 f_1(y_j \mid \theta_1) + \pi_2 f_2(y_j \mid \theta_2)] = \prod_{j=1}^{n} [\pi_2 f_2(y_j \mid \theta_2) + \pi_1 f_1(y_j \mid \theta_1)]$$
Let us now consider g being a fixed number of components and η a permutation on the
set G = {1, …, g}. If we define $\psi_\eta = (\pi_{\eta(1)}, \dots, \pi_{\eta(g)}, \theta_{\eta(1)}, \dots, \theta_{\eta(g)})$ and take a prior
such that $p(\psi_\eta) = p(\psi)$, then it is easy to see that:
$$p(\psi \mid y) = k\, l(y \mid \psi)\, p(\psi) = k\, l(y \mid \psi_\eta)\, p(\psi_\eta) = p(\psi_\eta \mid y) \tag{3.19}$$
Furthermore, the marginal posterior distributions are also invariant to a permutation
of the parameters, $p(\theta_i \mid y) = p(\theta_{\eta(i)} \mid y)$. This is demonstrated as follows.
$$p(\theta_i \mid y) = \int_{\Theta^{g-1} \times [0,1]^{g}} p(\psi \mid y)\, d(\theta_1, \dots, \theta_{i-1}, \theta_{i+1}, \dots, \theta_g, \pi_1, \dots, \pi_g) \tag{3.20}$$
where Θ is the parameter space of $\theta_i$, ∀ i = 1, …, g. Let us transform the previous
integral by applying the permutation η. Taking into account that the absolute value of
the determinant of the Jacobian matrix of a permutation is equal to one and that the
parameter space does not change under a permutation:
$$p(\theta_i \mid y) = \int_{\Theta^{g-1} \times [0,1]^{g}} p(\psi_\eta \mid y)\, d(\theta_{\eta(1)}, \dots, \theta_{\eta(i-1)}, \theta_{\eta(i+1)}, \dots, \theta_{\eta(g)}, \pi_{\eta(1)}, \dots, \pi_{\eta(g)}) = p(\theta_{\eta(i)} \mid y) \tag{3.21}$$
The fact that $p(\theta_i \mid y) = p(\theta_{\eta(i)} \mid y)$ implies that $E(\theta_i \mid y) = E(\theta_{\eta(i)} \mid y)$. The latter
equality holds for all the permutations of the set G. Then, the estimates of the component
parameters are identical across components for any data set, which is an unsatisfactory
result for the estimation problem (Marin et al., 2005).
Some authors imposed constraints on the parameter space to ensure the uniqueness
of labelling (e.g. Lenk & DeSarbo, 2000). However, defining valid constraints can be
a challenge when p > 1 (Fruhwirth-Schnatter, 2006, p.20). Furthermore, these con-
straints can interfere with the ability of the algorithm to explore the posterior surface
or with the algorithm convergence (Lee et al., 2008). The latest efforts have been
focused on constructing relabelling algorithms which reorder the chain $(\psi^{(n)}, \dots, \psi^{(N)})$
‘a posteriori’ (e.g. Marin et al., 2005; Stephens, 2000).
Even if the approach of reordering the chain of estimates ‘a posteriori’ is able to solve
the ‘label switching problem’, the Gibbs sampler, like other Monte Carlo methods, may
be trapped in some local modes of the posterior distribution and then become unable to
reproduce the posterior surface (Lee et al., 2008). In addition, the Bayesian approach
is computationally more costly than the frequentist procedure because it requires a
post-treatment of the Monte Carlo chain. Due to these difficulties, I have chosen to
use the EM algorithm, which does not need to deal with the issue of exchangeable
components (Jasra et al., 2005; McLachlan & Peel, 2000, p.27).
3.11 Selecting the number of mixture components
Selecting the number of components is a difficult statistical problem, which has been
broadly studied with no unequivocal method outperforming the others for all the ap-
plications (McLachlan & Peel, 2000, p.175). The main approaches are based on 1)
information criteria, and 2) testing procedures (Melnykov & Maitra, 2010). Both al-
ternatives are briefly described here and a full discussion can be found in McLachlan
& Peel (2000, Chapter 6).
3.11.1 Information Criteria
Including more components in the mixture increases the value of L(ψ) but leads to
more complex models (Figueiredo & Jain, 2002). Information criteria choose the model
which provides the highest value of L(ψ) while penalising for the complexity of the model.
Therefore, all the information criteria have a generic expression:
−2L(ψ) + 2C (3.22)
where L(ψ) is the value of the log likelihood function at ψ and C is a complexity
penalty term.
There is a large number of information criteria. For instance, Akaike’s Information
Criterion (AIC) (Akaike, 1973, 1974); Bayesian Information Criterion (BIC) (Schwarz,
1978); Integrated Classification Likelihood Criterion (ICL) (Biernacki et al., 2000);
Bayesian Information criterion type approximation of the Integrated Classification
Likelihood Criterion (ICL-BIC) (Biernacki et al., 2000); Normalized Entropy Criterion
(NEC) (Celeux & Soromenho, 1996); Bootstrap-Based Information Criterion (EIC)
(Ishiguro et al., 1997); Cross-Validation-Based Information Criterion (CVIC) (Smyth,
2000); Informational Complexity Criterion (ICOMP) (Bozdogan, 1990, 1993);
Approximate Weight of Evidence (AWE) (Banfield & Raftery, 1993) and Minimum Message
Length Criterion for mixtures (L) (Figueiredo & Jain, 2002).
Regardless of the information criterion used, the selection of the number of
components is performed as follows (e.g. Fraley & Raftery, 1998; McLachlan & Peel, 2000,
p. 219).
Decide on the maximum number of components to fit. Then, decreasing the number
of components down to one, perform the following steps:
• Choose ψg according to the procedure detailed in Section 3.9. The subindex g
indicates that the mixture has g components.
• Compute the information criterion chosen.
The number of components selected is the one which minimises the information crite-
rion chosen.
In this thesis, the information criteria used to select the number of components
were:
• AIC (Akaike, 1973, 1974):
AIC = −2L(ψ) + 2d
• BIC (Schwarz, 1978):
BIC = −2L(ψ) + d log n
where d is the number of parameters in the mixture and n is the sample size. Fonseca
& Cardoso (2007) showed that BIC had the best performance to select the number of
clusters for mixtures of multivariate Gaussian distributions in an experiment carried
out with 42 data sets with the EM algorithm.
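The two criteria can be computed as sketched below in Python. The count of free parameters assumes a g-component mixture of p-variate Gaussian distributions with unrestricted covariance matrices, that is, g − 1 mixing proportions, gp means and gp(p + 1)/2 covariance terms. The maximised log likelihood values used in the example are hypothetical numbers chosen only to illustrate the selection step, and the function names are assumptions of this sketch.

```python
import math


def n_parameters(g, p):
    """Free parameters of a g-component p-variate Gaussian mixture, unrestricted covariances."""
    return (g - 1) + g * p + g * p * (p + 1) // 2


def aic(loglik, d):
    return -2.0 * loglik + 2.0 * d


def bic(loglik, d, n):
    return -2.0 * loglik + d * math.log(n)


def select_g(logliks, p, n, criterion="bic"):
    """Number of components minimising the chosen criterion.

    logliks maps each candidate g to the maximised log likelihood L(psi_g).
    """
    scores = {}
    for g, loglik in logliks.items():
        d = n_parameters(g, p)
        scores[g] = bic(loglik, d, n) if criterion == "bic" else aic(loglik, d)
    return min(scores, key=scores.get), scores


# Hypothetical maximised log likelihoods for g = 1, 2, 3 (illustration only).
LOGLIKS = {1: -520.0, 2: -480.0, 3: -476.0}
best_bic, bic_scores = select_g(LOGLIKS, p=2, n=100, criterion="bic")
best_aic, aic_scores = select_g(LOGLIKS, p=2, n=100, criterion="aic")
print(best_bic, bic_scores)
print(best_aic, aic_scores)
```

Because log n exceeds 2 for n larger than about 7, BIC penalises additional components more heavily than AIC for most sample sizes.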
The main inconvenience for all the information criteria is that they do not provide
a measure of confidence for choosing a particular number of components (McLachlan
& Peel, 2000, p.184). Such a measure can be obtained by applying bootstrap techniques
on the likelihood ratio test statistic (McLachlan, 1987).
3.11.2 Likelihood ratio test for selecting the number of clus-
ters
The selection of the number of components can be performed using the likelihood ratio
test statistic (λ). The null hypothesis H0 : g = g0 is tested against the alternative one
HA : g = g0 + 1.
$$\lambda = \frac{\sup_{\theta \in H_0} L(\psi)}{\sup_{\theta \in H_A} L(\psi)} = \frac{L(\hat{\psi}_{g_0})}{L(\hat{\psi}_{g_0+1})}$$
H0 is rejected when λ is small or, equivalently, when −2 log λ is large. The statistic
−2 log λ usually has an asymptotic $\chi^2_d$ distribution, where d is the difference
between the number of parameters under HA and H0 (Wilks, 1938). This result assumes
that the parameter space of the null hypothesis (Θ0) is in the interior⁹ of the parameter
space (Θ) and is an identifiable subset of it¹⁰, which is not fulfilled for mixture models
(Ghosh & Sen, 1985; McLachlan & Peel, 2000, p.185-186).
Let us consider a simple example given in Ghosh & Sen (1985) which I detail here:
Imagine we want to test:
H0 : f(y, θ0), with θ0 fixed, against
HA : πf1(y, θ1) + (1− π)f(y, θ2)
In this example, Θ and Θ0 are given by:
$$\Theta = [0, 1] \times S_1 \times S_2$$
where S1 and S2 are the parameter spaces of θ1 and θ2, respectively.
$$\Theta_0 = (\{1\} \times \{\theta_0\} \times S_2) \cup (\{0\} \times S_1 \times \{\theta_0\}) \cup ([0, 1] \times \{\theta_0\} \times \{\theta_0\})$$
⁹ Interior(S) = S − Boundary(S), S being a set.
¹⁰ Non-identifiable: The null hypothesis holds for two different values of the parameters belonging
to the parameter space of the alternative hypothesis (McLachlan & Peel, 2000, p.185).
Note that Θ0 is on the boundary of Θ (Ghosh & Sen, 1985). Furthermore, if H0 is true,
f(y, θ0) is represented by three densities of the form πf1(y, θ1) + (1 − π)f(y, θ2): 1)
π = 1 and θ1 = θ0, 2) π = 0 and θ2 = θ0 and 3) π ∈]0, 1[ and θ1 = θ2 = θ0 (Ghosh
& Sen, 1985). Thus, Θ0 is in a non-identifiable subset of Θ (Ghosh & Sen, 1985).
The breakdown of the regularity conditions implies that the distribution of −2 log λ
is in general unknown. McLachlan (1987) proposed to use bootstrap techniques to
approximate this distribution as follows.
1. Apply the EM algorithm on the recorded data to find the MLE
$\hat{\psi}_{g_0} = (\hat{\pi}_1, \dots, \hat{\pi}_{g_0}, \hat{\theta}_1, \dots, \hat{\theta}_{g_0})$ by the procedure in Section 3.9.
2. Generate B bootstrap samples of size n from the finite mixture model given by:
$$y_1^{b}, \dots, y_n^{b} \sim \sum_{i=1}^{g_0} \hat{\pi}_i f_i(y \mid \hat{\theta}_i), \quad b = 1, \dots, B$$
3. Apply the EM algorithm to each bootstrap sample $y_1^{b}, \dots, y_n^{b}$ with g0 and g1 =
g0 + 1 components and calculate:
$$-2 \log \lambda_b = 2 \log L(\hat{\psi}_{g_1})_b - 2 \log L(\hat{\psi}_{g_0})_b, \quad b = 1, \dots, B.$$
4. Order the values of −2 log λb from the smallest to the largest to obtain the order
statistics.
5. Calculate the value of the likelihood ratio test statistic for the original sample,
$-2 \log \lambda_s = 2 \log L(\hat{\psi}_{g_1}) - 2 \log L(\hat{\psi}_{g_0})$, and compare it with the k-th order
statistic, (−2 log λk), k = 1, …, B, to get a significance level of $\alpha = 1 - k/(B + 1)$.
6. The null hypothesis is rejected if −2 log λs is larger than −2 log λk.
The main disadvantages of the bootstrap approach are: a) its computational cost,
b) its dependence on the strategies used to handle spurious cluster partitions, to
initiate and to stop the algorithm and c) it provides only an approximation of the true
distribution of −2 log λ (McLachlan & Peel, 2000, p.193).
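Steps 4 to 6 of the bootstrap procedure can be sketched as follows in Python. The EM fits of steps 1 to 3 are not reproduced here; the bootstrap values of −2 log λ and the two log likelihoods of the original sample are hypothetical numbers used only to show how the decision is taken, and the function names are assumptions of this example.

```python
def minus_two_log_lambda(loglik_g0, loglik_g1):
    """-2 log lambda from the maximised log likelihoods with g0 and g1 = g0 + 1 components."""
    return 2.0 * loglik_g1 - 2.0 * loglik_g0


def bootstrap_lrt(stat_obs, stat_boot, k):
    """Compare the observed statistic with the k-th bootstrap order statistic.

    Returns (reject_H0, alpha, critical_value) with alpha = 1 - k / (B + 1).
    """
    ordered = sorted(stat_boot)          # step 4: order statistics
    big_b = len(ordered)
    critical = ordered[k - 1]            # k-th order statistic, k = 1, ..., B
    alpha = 1.0 - k / (big_b + 1)
    return stat_obs > critical, alpha, critical


# Hypothetical bootstrap replicates of -2 log lambda (B = 19), for illustration only.
STAT_BOOT = [3.1, 5.4, 1.2, 7.8, 2.6, 4.4, 9.3, 0.8, 6.1, 3.7,
             2.2, 11.5, 5.0, 1.9, 4.9, 8.2, 2.9, 6.7, 3.4]
# Hypothetical maximised log likelihoods of the original sample.
stat_obs = minus_two_log_lambda(loglik_g0=-520.0, loglik_g1=-480.0)

reject, alpha, critical = bootstrap_lrt(stat_obs, STAT_BOOT, k=19)
print(stat_obs, critical, alpha, reject)
```

With B = 19 and k = 19 the comparison with the largest bootstrap value corresponds to a significance level of 1 − 19/20 = 0.05; larger values of B allow finer significance levels at a higher computational cost.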
3.12 Summary
In this chapter I have reviewed the fundamentals of the methodology of finite mixture
models. Finite mixture models are used to model non-standard distributional shapes.
For instance, Xu et al. (2010) used finite mixture models of univariate Gaussian distri-
butions to model the pdf of abortion time from randomly selected dairy cows. Another
important application is the classification of observations into groups. A very early ex-
ample of this application can be found in Pearson (1894) who applied mixture models
to discover subspecies among a population of blue crabs.
The modelling of pdf shapes and the classification of observations into groups
require estimating the mixture parameters. These estimates can be obtained by
frequentist or Bayesian procedures. Under the frequentist approach, the estimation of the
parameters is based on the maximum likelihood method and the application of the
EM algorithm. The log likelihood function is multimodal, so the EM algorithm needs
to be initiated from different starting values to widen the search for local maximisers.
Then, the local maximum with the highest value of the log likelihood, after the
deletion of spuriosities and singularities, is chosen as the maximum likelihood estimate.
The Bayesian approach requires applying the Gibbs sampler algorithm to generate a
random sample from the posterior distribution of the parameters. Then, the estimates
of the mixture are obtained by taking the mean of the random sample. In this thesis,
I use the frequentist approach because it is computationally less costly than the
Bayesian one and does not require dealing with the problem of interchange of mixture
components. The frequentist approach for fitting mixtures can be implemented with
the R packages mixtools (Benaglia et al., 2009) and mclust (Fraley & Raftery, 1999)
(for mixtures of univariate Gaussian distributions) or EMMIX (McLachlan et al., 1999)
(for mixtures of multivariate Gaussian distributions).
After having introduced all the necessary details of the technique, in the next chapter
I apply the proposed methodology for clustering the (NU, GY) field data collected
across environments in a real-life case study. The inspection of the mixture groups
could reveal environmental conditions affecting (NU, GY). By fitting mixture models
one can estimate the expected value (mean) and the variance (degree of dispersion
around the mean) of each of NU and GY as well as their correlation (degree of
association between both traits) for each group (environment). Furthermore, if the researcher
still wants to estimate IEN, bivariate mixture models allow one to do so by taking the
ratio of the estimated means and calculating confidence sets according to the
procedures detailed in Chapter 2. Chapter 4 is written as a manuscript according to
the guidelines of the Australian and New Zealand Journal of Statistics.
Chapter 4
Bivariate models for internal nitrogen use efficiency:
mixture models as an exploratory tool
This chapter is presented according to the requirements of submission to the Australian
and New Zealand Journal of Statistics.
Statement of authorship and author contributions
Title of Paper: Bivariate models for internal nitrogen use efficiency: mixture models as an
exploratory tool
Publication status: In preparation for submission
By signing the Statement of Authorship, each author certifies that their stated
contribution to the publication is accurate and that permission is granted for the
publication to be included in the candidate’s thesis.
Name of the Principal Author: Isabel Munoz-Santa
Contribution to the paper: Originated the idea of the analysis, developed the methodology, undertook
critical review of the relevant literature, analysed and interpreted the case-study
data, developed the computational code, wrote and edited the manuscript and
will act as corresponding author
Signature and date:

Name of Co-Author: Petra Marschner
Contribution to the paper: Supervised the development of the work in its biological aspects, provided
expertise for the interpretation of biological consequences of the analyses,
contributed to the editing of the manuscript as appropriate
Signature and date:

Name of Co-Author: S.M. Haefele
Contribution to the paper: Supplied the data set, provided expertise for the interpretation of biological
consequences of the analysis, contributed to the editing of the manuscript as
appropriate
Signature and date:

Name of Co-Author: O. Kravchuk
Contribution to the paper: Trained the first author in interpreting statistical aspects as appropriate,
contributed to the original discussion of the research ideas, contributed to the
editing of the manuscript as appropriate
Signature and date:
Aust. N. Z. J. Stat. 2014 doi: 10.1111/j.1467-842X.XXX
Bivariate models for internal nitrogen use efficiency: mixture models as an exploratory tool
I. MUNOZ-SANTA1, P. MARSCHNER2, S.M. HAEFELE3, O. KRAVCHUK1
University of Adelaide, Australian Centre for Plant Functional Genomics
1 The University of Adelaide, School of Agriculture Food and Wine. Waite Building (Waite Campus), Waite Rd, Urrbrae SA 5064, Australia
2 The University of Adelaide, School of Agriculture Food and Wine. Prescott Building (Waite Campus), Glen Osmond SA 5064, Australia
3 Australian Centre for Plant Functional Genomics (Waite Campus), Hartley Grove, Urrbrae, SA 5064, Australia
∗ Author for correspondence: Munoz-Santa, I., e-mail: [email protected], telephone: +61406815920, facsimile: +61(0)883137109
Acknowledgment. The authors acknowledge the Faculty of Science of the University of Adelaide for the Masters of Research scholarship for the first author and Paul Eckermann for his comments on the draft of the manuscript and help with ASReml-R.
© 2014 Australian Statistical Publishing Association Inc. Published by Wiley Publishing Asia Pty Ltd.
Prepared using anzsauth.cls [Version: 2014/01/06 v1.01]
Summary
Internal nitrogen use efficiency (IEN) in cereals, defined as the ratio of grain yield (GY) to nitrogen uptake (NU), is an important trait in agronomy and plant and soil science research. In this study we discuss the application of bivariate mixed and mixture models to the analysis of GY and NU field data and compare them to the univariate mixed and mixture models for IEN. The bivariate analyses preserve the information on the GY and NU traits, and avoid the abnormalities of ratios. Bivariate mixture models are proposed as a classification tool for identifying field conditions affecting the utilisation of nitrogen. Due to the design constraints on the collection of data in agricultural field trials, the bivariate mixture technique is suggested as supplementary to bivariate mixed models. The mixture methodology is demonstrated on a case study of rice research in northeast Thailand, for which the technique proved useful for exploring environmental conditions, in particular soil fertility and water availability. The bivariate mixture methodology may be applicable to other efficiency indices in agricultural research.
Key words: bivariate analysis; classification and discrimination; cluster analysis; EM algorithm; internal nitrogen use efficiency; mixed model; mixtures
1. Introduction
Nitrogen (N) is an elementary constituent of the nucleotides and proteins of cereals
(Xu et al. 2012), but cereal plant roots have available only a fraction of the N in soil. The
availability of N depends on complex interactions between soil, plant and environmental
factors (Marschner 2012, p. 315). Once N is absorbed by roots, it is transported to the rest
of the plant, and a portion of this N is stored in the grain (Xu et al. 2012). To quantify the
efficiency of the N utilisation by plants, several N efficiency indices have been defined. In
the context of growing demand for cereals worldwide and limited agricultural land, a better
understanding of N efficiency is a research priority among plant and soil scientists.
Among several N efficiency indices, this study is focused on internal nitrogen use
efficiency (IEN ). Internal nitrogen use efficiency is defined as the ratio of grain yield (GY )
to nitrogen uptake (NU ), i.e. the content of N in the aboveground biomass. In agricultural
research, this index expresses the ability of plants to utilise NU for grain production.
However, how NU is utilised for grain is a complex process, governed by environmental
factors, plant genetics and agronomic practices (Cassman et al. 2002).
In agricultural field trials, GY and NU are measured at plot level at harvest. At this
stage, a typical scatter of the GY and NU data for a particular cultivar grown under a range
of conditions exhibits an increasing monotone linear-plateau shape (e.g. Witt et al. 1999;
Naklang et al. 2006). However, this shape may result from an overlay of growth processes
under the different conditions rather than from a direct functional response of GY to NU .
As environmental conditions vary, the process of NU utilisation for GY changes, and
so does the degree of association (correlation) between these two traits. Since a fraction of
NU is utilised for seed formation (Xu et al. 2012), one would expect the correlation between
NU and GY to be positive or zero. A negative correlation may occur when an excess of N
in plants adversely affects the synthesis of proteins as well as plant health and growth pattern
(Hauck 1984, p. 111). However, an excess of N, arguably caused by an oversupply of N
fertiliser (Hauck 1984, p. 97), is not common in sustainable agriculture nowadays and will
not be considered in this work.
At harvest, conditional on major environmental factors and in the absence of a strong
competition among plants, NU and GY at plot level can be seen as a cumulative effect of
a large number of independent random variables with finite variances. By the Central Limit
Theorem (see Cramer 1946, p. 219), each NU and GY is thus expected to follow a normal
distribution. Expanding this result to the bivariate case (see Cramer 1946, p. 286), the joint
distribution of (NU , GY ) is expected to be bivariate normal. Then, the probability density
function of IEN is a mixture of two heavy-tailed distributions (Marsaglia 1965, 2006) and
the shape of the IEN distribution can vary from normal-like to skewed or bimodal depending
on the means and variances of NU , GY and their correlation (Marsaglia 1965, 2006). The
effects of these parameters on the distributional shape of the ratio cannot be easily untangled.
Despite their intrinsic abnormalities, ratios are commonly used among plant and
soil scientists. In published research in agriculture, IEN is commonly computed for each
experimental plot and analysed by univariate linear models (e.g. Peng et al. 1996; Delogu
et al. 1998; Fang et al. 2006; Naklang et al. 2006). However, univariate linear models on
the ratio do not maintain information on the original traits, including their correlation, which
limits the interpretability of these models.
A more adequate approach is a bivariate analysis of (NU, GY). Bivariate analyses
preserve the information on the original traits, thus giving more insight into the mechanism of
NU utilisation for GY. Furthermore, bivariate analyses avoid the abnormalities
of the ratio. Among bivariate analyses, estimating and testing effects of treatments
can be done algebraically with multivariate analyses of variance or numerically through
residual maximum likelihood analyses. Evidence for the advantages of bivariate analyses
over the univariate analyses on a ratio is provided by a recent paper by Ganesalingam et al.
(2013) who analysed data on canola survival of blackleg disease in randomised complete
block experiments. Their bivariate linear mixed model better utilised the experimental data,
allowed greater flexibility in modelling spatial correlations and increased the accuracy of
variety survival predictions.
At present, IEN data are predominantly collected in designed field trials where
treatments are different fertiliser applications (e.g. Peng et al. 1996; Delogu et al. 1998; Fang
et al. 2006; Naklang et al. 2006). However, as outlined earlier, the utilisation of NU for GY
depends on nutrient availability, which can differ substantially from the amount of nutrients
applied. Availability of nutrients, including N, depends on complex processes governed by
many environmental factors such as local microbial activity, soil characteristics and water
availability (Marschner 2012, p. 315). In the field, these factors are beyond the investigator’s
control and may induce different levels of available nutrients even for plots receiving the
same fertiliser treatment.
The effect of environmental conditions may override, interact with or be confounded
with the treatments, complicating the interpretation of treatment-based analyses. In this
study, we argue that such non-controlled conditions may lead to very different patterns
of NU utilisation for GY even for the same treatment and thus to groups in the data
different to treatment groups. Identifying such groups in data collected across different
environments may complement treatment-based analyses when the objective is to gain insight
into environmental drivers of NU and GY . Thus, it is important to investigate the benefits of
finite mixture models of bivariate Gaussian distributions as a clustering technique for (NU ,
GY ) field data.
Finite mixture models are a flexible statistical tool used in a large number of fields
to model data sampled from different groups or to approximate unusual distributional
shapes (Figueiredo & Jain 2002). In agriculture, finite mixture models have been used only
occasionally. For instance, univariate mixture models have been used to model the distribution
of abortion time in randomly selected dairy cows (Xu et al. 2010). Bivariate mixture models
have been employed to classify soybean genotypes with respect to seed yield and seed protein
in randomised complete block designs (McLachlan & Basford 1988). To the best of the
authors’ knowledge, however, finite mixture models of bivariate normal distributions have
not been used for analysing NU and GY field data.
The analysis of (NU, GY) data by bivariate mixture models has three advantages: 1)
avoidance of the abnormalities of the IEN ratio, 2) identification of groups in the presence
of strong environmental factors and 3) ability to consider changes in the correlation between
NU and GY. Furthermore, if the researcher still wants to estimate the ratio within each group,
bivariate mixture models allow this through taking the ratio of the estimated means of GY and
NU. The confidence set of the ratio of expectations can also be derived by straightforward
calculations (Fieller 1954). However, inference with mixture models is a difficult task (Chen
& Tan 2009), and we thus use the technique here for exploratory purposes only.
This study aims to demonstrate the usefulness of the bivariate mixture methodology,
as a complementary analysis to treatment-based analyses in designed field trials when the
objective is to identify potential environmental factors driving GY and NU . We discuss
the benefits of the bivariate mixture and mixed analyses in comparison with the univariate
counterparts for IEN . The proposed methodology is applied to a particular designed field
experiment on non-irrigated rice reported in Naklang et al. (2006). In that study the design of
the field experiment was treatment-based. However, the objective of the study was to analyse
IEN across a range of environmental conditions without focusing exclusively on the actual
treatments. This was intended to contribute to a better understanding of non-irrigated systems
and to improve current management practices.
The present paper is structured as follows. The fundamentals of finite mixture models
are first reviewed. The case study is then described and analysed by both mixed and mixture
models on IEN and (NU, GY). The final section discusses the advantages of using bivariate
models instead of the univariate models on IEN as well as the benefits of the mixture model
methodology for identifying environmental conditions affecting (NU , GY ).
2. Finite mixture models of bivariate Gaussian distributions
2.1. Definitions and notations
Let $Y_j^{(2\times 1)}$ be a random vector, $j = 1, \ldots, n$, where n is the sample size. Let $y_j$ be the
observed value of the random vector $Y_j$. Mixture models of bivariate Gaussian distributions
consider the observations to be independent and identically distributed (i.i.d.) according to
the finite mixture density:
\[ f(y_j \mid \psi) = \sum_{i=1}^{g} \pi_i \, \phi(y_j \mid \mu_i, \Sigma_i) \tag{1} \]
where g is the number of components, πi is the mixing proportion of the i-th component and
φ is the bivariate normal density function. The parameters of the component densities are
µi, the joint mean, and Σi, the covariance matrix. The parameter vector of the mixture is
ψ = (π1, . . . , πg,µ1, . . . ,µg,Σ1, . . . ,Σg).
Finite mixture models are commonly used for clustering data (Everitt 1996), in which
case components correspond to groups. Clustering is achieved by assigning each observation
to the group (G) with the maximum posterior probability:
\[ \tau_{ij} = P(y_j \in G_i \mid y_j) = \frac{\pi_i \, \phi(y_j \mid \mu_i, \Sigma_i)}{\sum_{h=1}^{g} \pi_h \, \phi(y_j \mid \mu_h, \Sigma_h)} \tag{2} \]
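As an illustration of Eq. 2, the sketch below evaluates the posterior probabilities for a single observation and assigns it to the group with the largest one. It assumes the mixture parameters are already known; the function names `bvn_pdf` and `posterior` and all numerical values are hypothetical and are not estimates from the case study.

```python
import math


def bvn_pdf(y, mu, S):
    """Bivariate normal density phi(y | mu, S) for a symmetric 2x2 covariance matrix S."""
    det = S[0][0] * S[1][1] - S[0][1] ** 2
    d0, d1 = y[0] - mu[0], y[1] - mu[1]
    # Quadratic form (y - mu)' S^{-1} (y - mu), using the closed-form 2x2 inverse.
    q = (S[1][1] * d0 * d0 - 2 * S[0][1] * d0 * d1 + S[0][0] * d1 * d1) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


def posterior(y, pis, mus, Ss):
    """Posterior probabilities tau_i of Eq. 2 for one observation y."""
    w = [p * bvn_pdf(y, m, S) for p, m, S in zip(pis, mus, Ss)]
    total = sum(w)
    return [wi / total for wi in w]


# Hypothetical two-group mixture for (NU, GY); values chosen for illustration only.
pis = [0.6, 0.4]
mus = [(40.0, 2000.0), (80.0, 3500.0)]
S = [[100.0, 1500.0], [1500.0, 90000.0]]
tau = posterior((45.0, 2100.0), pis, mus, [S, S])
group = max(range(len(tau)), key=lambda i: tau[i])   # 0-based index of the assigned group
```

The observation lies close to the first mean, so almost all of the posterior mass falls on the first group.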
The calculation of the posterior probabilities requires estimating ψ. This is achieved
via the Expectation-Maximization (EM) algorithm (Dempster et al. 1977), an iterative
procedure for calculating maximum likelihood estimates in a framework of missing values.
The fundamentals of this approach are briefly reviewed below.
2.2. The EM algorithm
The EM algorithm estimates the mixture parameters by treating the estimation as a
missing data problem. The data are considered to be incomplete by introducing a g-dimensional
random vector $Z_j = (Z_{1j}, \ldots, Z_{gj})^\top$, which matches $Y_j$ with its group of origin in the following way:
\[ Z_{ij} = \begin{cases} 1, & \text{if } Y_j \text{ belongs to } G_i; \\ 0, & \text{otherwise,} \end{cases} \qquad i = 1, \ldots, g; \; j = 1, \ldots, n. \]
The $Z_j$ are i.i.d. according to a multinomial distribution with g possible outcomes. The i-th
outcome occurs with probability $\pi_i$. The realisations of $Z_j$, denoted as $z_j$, are treated as
missing observations, so that the complete data are $[z_1, \ldots, z_n, y_1, \ldots, y_n]$.

The log likelihood function (L) of the incomplete data $[y_1, \ldots, y_n]$ is mathematically
intractable, and the root of the log likelihood equations ($\partial L / \partial \psi = 0$) cannot be calculated
explicitly (Redner & Walker 1984). The EM algorithm approaches finding a local maximum
of L by maximising the expectation of the log likelihood function of the complete data
for given values of $[y_1, \ldots, y_n]$ and ψ. This results in a simpler optimisation problem with
the advantage of producing, from a given initial estimate, $\psi^{(0)}$, a sequence of estimates,
$\{\psi^{(r)}\}_{r=0}^{s}$, such that $L(\psi^{(0)}) \le L(\psi^{(1)}) \le \ldots \le L(\psi^{(s)})$ (Dempster et al. 1977). Under very
weak conditions, the sequence $\{\psi^{(r)}\}_{r=0}^{s}$ converges to a stationary point of L(ψ), which can
be a local or global maximum, if the algorithm is not trapped in a saddle point (Wu 1983).
For bivariate mixture models of Gaussian distributions (Eq. 1), the sequence of
estimates, $\{\psi^{(r)}\}_{r=0}^{s}$, is generated starting from a chosen initial estimate and iteratively
performing the two EM steps, which are outlined below for the (r+1)-th iteration.
1. Update the posterior probabilities (E-step):
\[ \tau_{ij}^{(r+1)} = \frac{\pi_i^{(r)} \, \phi(y_j \mid \mu_i^{(r)}, \Sigma_i^{(r)})}{\sum_{h=1}^{g} \pi_h^{(r)} \, \phi(y_j \mid \mu_h^{(r)}, \Sigma_h^{(r)})} \]
2. Update the estimates of the mixture parameters (Eq. 1) by using the new posterior
probabilities (M-step):
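\[ \pi_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \Big/ n \]
\[ \mu_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} y_j \Big/ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \]
\[ \Sigma_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} (y_j - \mu_i^{(r+1)})(y_j - \mu_i^{(r+1)})^\top \Big/ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \]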
The algorithm stops when the difference between the values of the log likelihood function at
two consecutive iterations is smaller than a chosen threshold.
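The two steps and the stopping rule can be sketched in plain Python as follows. This is a minimal illustration, not the code used for the analyses in this paper: the function names, the simulated data, the starting values and the threshold `tol` are all assumptions, and no safeguard against singularities or spuriosities is included.

```python
import math
import random


def bvn_pdf(y, mu, S):
    """Bivariate normal density phi(y | mu, S) for a symmetric 2x2 covariance matrix S."""
    det = S[0][0] * S[1][1] - S[0][1] ** 2
    d0, d1 = y[0] - mu[0], y[1] - mu[1]
    q = (S[1][1] * d0 * d0 - 2 * S[0][1] * d0 * d1 + S[0][0] * d1 * d1) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


def em_bivariate(data, pis, mus, Ss, tol=1e-8, max_iter=500):
    """Run EM from the given starting values; return estimates and the log likelihood trace."""
    n, g = len(data), len(pis)
    trace = []
    for _ in range(max_iter):
        # E-step: posterior probabilities tau[j][i] at the current estimates.
        tau, loglik = [], 0.0
        for y in data:
            w = [pis[i] * bvn_pdf(y, mus[i], Ss[i]) for i in range(g)]
            total = sum(w)
            loglik += math.log(total)
            tau.append([wi / total for wi in w])
        trace.append(loglik)
        # Stopping rule: change in log likelihood below the chosen threshold.
        if len(trace) > 1 and abs(trace[-1] - trace[-2]) < tol:
            break
        # M-step: weighted mixing proportions, means and covariance matrices.
        pis, mus, Ss = [], [], []
        for i in range(g):
            t = [row[i] for row in tau]
            ni = sum(t)
            m0 = sum(tj * y[0] for tj, y in zip(t, data)) / ni
            m1 = sum(tj * y[1] for tj, y in zip(t, data)) / ni
            s00 = sum(tj * (y[0] - m0) ** 2 for tj, y in zip(t, data)) / ni
            s01 = sum(tj * (y[0] - m0) * (y[1] - m1) for tj, y in zip(t, data)) / ni
            s11 = sum(tj * (y[1] - m1) ** 2 for tj, y in zip(t, data)) / ni
            pis.append(ni / n)
            mus.append((m0, m1))
            Ss.append([[s00, s01], [s01, s11]])
    return pis, mus, Ss, trace


# Simulated data: two well-separated groups of 100 points each (illustration only).
random.seed(1)
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100)]
data += [(random.gauss(6, 1), random.gauss(6, 1)) for _ in range(100)]
identity = [[1.0, 0.0], [0.0, 1.0]]
pis, mus, Ss, trace = em_bivariate(
    data, [0.5, 0.5], [(1.0, 1.0), (5.0, 5.0)], [identity, identity])
```

The returned `trace` holds the log likelihood at each iteration, which makes the monotone increase stated above easy to check for a given run.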
2.3. Fitting mixture models
Fitting mixtures of multivariate Gaussian distributions with unconstrained covariance
matrices is a challenging statistical and computational task widely discussed in the statistical
literature (e.g. Figueiredo & Jain 2002; Melnykov & Maitra 2010). The issues encountered
are related to the nature of L(ψ) rather than to a failure of the EM algorithm (McLachlan &
Peel 2000, p. 99). Firstly, L(ψ) is unbounded at the points (called singularities) where the
estimated mean of one of the mixture components is equal to a yj and the determinant of the
covariance matrix of this component tends to zero (Kiefer & Wolfowitz 1956; McLachlan &
Peel 2000, p. 94). Thus, the global maximum of L(ψ) does not exist and a local maximum
needs to be chosen as the maximum likelihood estimate (Redner & Walker 1984). Secondly,
L(ψ) can present multiple local maxima, bringing the dilemma of which particular local
maximum to choose and causing sensitivity of the algorithm to starting and stopping rules
(Seidel et al. 2000). Finally, L(ψ) can present a large local maximum when fitting a
component to few data points whose covariance matrix has at least one eigenvalue that is
very small. These solutions are known as spuriosities (McLachlan & Peel 2000, p. 99), and
we refer to them as mathematical spuriosities. It is also possible that some solutions fall
outside the biologically meaningful range of parameters. Instead of restricting the parameter
space, one can additionally consider such solutions as biological spuriosities.
A recommended strategy [see McLachlan & Peel (2000, p. 97), Melnykov & Maitra
(2010) and Ng (2013)] for selecting the estimates of the mixture parameters is to decide
on the meaningful maximum number of components to fit, and to perform the following
steps, starting with fitting the maximum number of components and decreasing to just one
component:
1. Initiate the EM algorithm from different starting values of the parameters $\psi^{(0)}$ or
group partitions $z_j^{(0)}$.
2. Run the EM algorithm on the unrestricted parameter space until the difference
between the log likelihood function of two consecutive iterations is less than a chosen
threshold (c).
3. Examine and remove singularities and mathematical or biological spuriosities.
4. From the remaining solutions, select the one which gives the highest value of the log
likelihood function.
Multiple starting strategies have been proposed to initiate the EM algorithm (e.g.
Biernacki et al. 2003; Karlis & Xekalaki 2003; Maitra 2009) and a comprehensive review
can be found in McLachlan & Peel (2000, pp. 54-55). However, none of the starting strategies
has been shown to outperform the others in all cases (Melnykov & Maitra 2010). For the
purpose of our study, we have selected the following five strategies:
1. Random starts: the initial partition is constructed by randomly allocating each
observation to one of the groups (McLachlan & Peel 2000, p. 55).
2. K-means: the initial partition is provided by the k-means algorithm (Forgy 1965 cited
in Omran et al. 2007), which produces a group partition by minimising the distance
from observations to the means of the groups (McLachlan & Peel 2000, p. 54).
3. Simulated means: the EM algorithm is initiated with g simulated means from a
bivariate normal distribution with the mean and covariance matrix calculated from the
sample. The mixing proportion and covariance matrix for the i-th component are given
by $\pi_i^{(0)} = 1/g$ and $\Sigma_i^{(0)} = S$, where S is the covariance matrix of the entire data set (McLachlan
& Peel 2000, p. 55).
4. Subsample solution: the initial value of the parameters is the solution obtained by the
EM algorithm when the latter is applied to a random subsample and initiated from
random starts. The size of the subsample needs to be large enough not to produce
degenerate estimates (McLachlan & Peel 2000, p. 55).
5. Short runs of the EM algorithm: the initial value of the parameters is the solution with
the highest value of the likelihood function obtained after running several short runs
of the EM algorithm when the latter is initiated from random starts (Biernacki et al.
2003).
The first two strategies are available in the package EMMIX (McLachlan et al. 1999).
Strategies 3-5 were programmed by the first author in R (R Core Team 2012) (the code is
available upon request).
Another important decision is the choice of the number of groups. A common approach
is to select the model which minimises an information criterion (McLachlan & Peel 2000,
p. 184) such as the Bayesian Information Criterion, BIC (Schwarz 1978), or Akaike’s
Information Criterion, AIC (Akaike 1973, 1974):
\[ \mathrm{BIC} = -2L(\hat{\psi}) + d \log(n) \]
\[ \mathrm{AIC} = -2L(\hat{\psi}) + 2d \]
where $L(\hat{\psi})$ is the log likelihood value at the selected local maximum $\hat{\psi}$; d is the number
of parameters to estimate, which for mixtures of bivariate Gaussian distributions with
unrestricted covariance matrices is d = 6g − 1 (Eq. 1); and n is the sample size.
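A short Python sketch of the two criteria follows. The function names and the log likelihood values are hypothetical; the values stand in for the selected local maxima of fits with one, two and three components and are not results from the case study.

```python
import math


def n_params_bivariate(g):
    """d = 6g - 1: (g - 1) mixing proportions, 2g means and 3g covariance terms."""
    return 6 * g - 1


def bic(loglik, g, n):
    return -2 * loglik + n_params_bivariate(g) * math.log(n)


def aic(loglik, g):
    return -2 * loglik + 2 * n_params_bivariate(g)


# Hypothetical log likelihood values for g = 1, 2, 3 components and n = 432 observations.
logliks = {1: -5000.0, 2: -4900.0, 3: -4895.0}
n = 432
best_bic = min(logliks, key=lambda g: bic(logliks[g], g, n))
best_aic = min(logliks, key=lambda g: aic(logliks[g], g))
```

In this toy example the small gain in log likelihood from two to three components does not offset the six extra parameters, so both criteria select two components.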
The methodology described above is easily translated to the case of mixtures of
univariate Gaussian distributions. In this case, the parameters to estimate are the mixing
proportions, $\pi_i$, the means, $\mu_i$, and the variances, $\sigma_i^2$, $i = 1, \ldots, g$. For a review of mixture
models of univariate Gaussian distributions, refer to Frühwirth-Schnatter (2006, pp. 169-190).
2.4. Visual guides for mixture models
As a visual guide for mixtures of bivariate Gaussian distributions, the data are usually
plotted in a scatter plot together with prediction/coverage ellipses of the mixture groups (e.g.
Figueiredo & Jain 2002; McLachlan & Peel 2000, p. 103). This visual method is in
accordance with those suggested by Friendly (2006) for multivariate analyses of variance.
The ellipses depict the means, variances and correlations of each group and are thus useful
for identifying both biological and mathematical spuriosities. For instance, ellipses with
elongated or small circular shapes correspond to mathematical spuriosities (McLachlan &
Peel 2000, p. 103).
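The geometry of such an ellipse follows directly from the covariance matrix of a group. The sketch below is an illustration under the assumption that the group parameters are treated as known, so that the squared Mahalanobis distance follows a chi-square distribution with two degrees of freedom, whose quantile at level p is −2 log(1 − p). The function name `coverage_ellipse` is hypothetical.

```python
import math


def coverage_ellipse(S, level=0.95):
    """Semi-axes and orientation of the coverage ellipse for a 2x2 covariance matrix S."""
    a, b, c = S[0][0], S[0][1], S[1][1]
    mid = (a + c) / 2
    half = math.hypot((a - c) / 2, b)
    lam1, lam2 = mid + half, mid - half       # eigenvalues of S, lam1 >= lam2
    scale = -2 * math.log(1 - level)          # chi-square quantile with 2 df
    angle = 0.5 * math.atan2(2 * b, a - c)    # angle of the major axis, in radians
    return math.sqrt(scale * lam1), math.sqrt(scale * lam2), angle


major, minor, angle = coverage_ellipse([[4.0, 0.0], [0.0, 1.0]])
_, _, angle_corr = coverage_ellipse([[1.0, 0.5], [0.5, 1.0]])
```

A very large ratio of the major to the minor semi-axis, or two very small semi-axes, gives a numerical counterpart to the elongated or small circular shapes mentioned above.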
In the univariate case, the histogram of the data together with the probability density
functions of the fitted components is commonly used for visualisation purposes (e.g. Benaglia
et al. 2009). Spuriosities are detected by inspecting the component densities. For instance,
components with small variances correspond to mathematical spuriosities (e.g. McLachlan
& Peel 2000, p. 104).
3. Case study
The data set considered (Haefele, S.M., pers. comm., 2013) comprises 624 plot
observations of NU and GY of non-irrigated rice from the field experiments reported in
Naklang et al. (2006). The experiments were part of the field trials conducted by Wade
et al. (1999). Neither Wade et al. (1999) nor Naklang et al. (2006) formulated their research
questions exclusively in terms of treatment-based analyses; both also aimed to improve
understanding of non-irrigated rice systems across different environmental conditions. As in
our approach, Wade et al. (1999) performed a cluster analysis using Ward’s agglomerative
hierarchical algorithm (Delacy et al. 1996) to identify environmental groups. However, the
cluster analysis was conducted exclusively on GY after removing the site and treatment
effects.
The field experiments were carried out at eight sites in the northeast of Thailand: Udon
Thani, Sakhon Nakhon, Khon Kaen, Chum Phae, Tung Kula Ronghai, Phi Mai, Surin and
Ubon Ratchathani (see Figure 1 in Naklang et al. (2006) for their locations). At each site, a
randomised complete block design was implemented with three blocks and eight fertiliser
treatments (Table 1) applied on plots of 20 m². The experimental layout was maintained over
the wet seasons of 1995, 1996 and 1997. For each site and year, the soil water status was
visually assessed every week and rated according to three categories: dry soil surface, wet
soil surface or ponded water. However, only the dominant water conditions at pre-flowering,
flowering and post-flowering were reported. This procedure resulted in the water conditions
being completely confounded with the site.year effect in the design (Table 2, some sites
excluded as explained later). The experiment was repeated fully irrigated only at the Ubon
Ratchathani site.
Grain and straw yield samples were collected from an area of 8 m² in the centre of
each plot and analysed for N concentration. Grain yield was adjusted to a standard moisture
content of 14%. Nitrogen uptake was estimated by the following equation:
\[ NU = \frac{[N]_S \, S + m \, [N]_G \, GY}{1000} \quad (\text{kg/ha}) \]
where $[N]_S$ is N concentration in straw (g/kg), $[N]_G$ is N concentration in dry grain (g/kg),
S is straw yield (kg/ha), GY is grain yield (kg/ha) and m is the moisture correction factor,
equal to 0.86.
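The calculation for a single plot can be sketched as follows. The function names and the plot values are hypothetical and serve only to show the units involved; IEN is computed as the ratio GY/NU defined in the Introduction.

```python
def nitrogen_uptake(n_straw, straw_yield, n_grain, grain_yield, m=0.86):
    """NU in kg/ha from N concentrations (g/kg) and yields (kg/ha)."""
    return (n_straw * straw_yield + m * n_grain * grain_yield) / 1000.0


def internal_efficiency(grain_yield, nu):
    """IEN as the ratio of grain yield to nitrogen uptake (kg grain per kg N)."""
    return grain_yield / nu


# Hypothetical plot: 5 g/kg N in 3000 kg/ha of straw, 10 g/kg N in 2500 kg/ha of grain.
nu = nitrogen_uptake(n_straw=5.0, straw_yield=3000.0, n_grain=10.0, grain_yield=2500.0)
ien = internal_efficiency(2500.0, nu)
```

For this hypothetical plot NU is 36.5 kg/ha, of which 21.5 kg/ha comes from the grain term; this shows how GY enters the definition of NU directly.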
TABLE 1 ABOUT HERE
TABLE 2 ABOUT HERE
The observations from Chum Phae, Phi Mai and Tung Kula Ronghai were excluded for
the following reasons. Chum Phae and Phi Mai had a large number of missing values of
$[N]_G$ or $[N]_S$. In Tung Kula Ronghai, unlike the remaining sites, rice was direct-seeded in
1996, which resulted in observations substantially different from those coming from seedlings.
All the (NU, GY) observations from the remaining sites over the three years were included
and presented the typical linear-plateau scatter (Figure 1).
FIGURE 1 ABOUT HERE
4. Linear mixed model analyses
4.1. Univariate mixed model
Internal nitrogen use efficiency was analysed with a univariate mixed model in Genstat
(VSN International 2012). At each site, the design considered was a strip plot with three
blocks and treatments and years treated as the strip factors (Table 3). Let s, t, y, b and n denote
the number of sites, fertiliser treatments, years, blocks and the total number of observations,
respectively. For this case study, s = 6, t = 8, y = 3, b = 3 and n = 432.
Let $y^{(n\times 1)}$ be the vector of IEN observations; y was modelled as:
\[ y = X\tau + Z_a u_a + Z_b u_b + Z_c u_c + Z_d u_d + Z_e u_e + \varepsilon \]
where $\tau^{(k\times 1)}$ is the vector of fixed effects containing the overall mean, year, treatment and
treatment.year effects, $k = 1 + y + t + yt$; and $X^{(n\times k)}$ is the corresponding design matrix.
The vectors $u_a^{(s\times 1)}$, $u_b^{(sb\times 1)}$, $u_c^{(st\times 1)}$, $u_d^{(sy\times 1)}$, $u_e^{(sty\times 1)}$ are the site, site.block, site.treatment,
site.year and site.treatment.year random effects with design matrices $Z_a^{(n\times s)}$, $Z_b^{(n\times sb)}$,
$Z_c^{(n\times st)}$, $Z_d^{(n\times sy)}$ and $Z_e^{(n\times sty)}$, respectively. The vector $\varepsilon^{(n\times 1)}$ is the vector containing the
plot errors. Site was considered a random term because the locations were randomly selected
from northeast Thailand. The random effects and plot errors were assumed to be independent
and normally distributed with mean zero and $\mathrm{var}(u_a) = \sigma_a^2 I_s$, $\mathrm{var}(u_b) = \sigma_b^2 I_{sb}$, $\mathrm{var}(u_c) = \sigma_c^2 I_{st}$,
$\mathrm{var}(u_d) = \sigma_d^2 I_{sy}$, $\mathrm{var}(u_e) = \sigma_e^2 I_{sty}$ and $\mathrm{var}(\varepsilon) = \sigma_\varepsilon^2 I_n$, where var() denotes the
covariance matrix and $I_r$ the identity matrix of dimension r.
TABLE 3 ABOUT HERE
In the univariate mixed model analysis of IEN , treatment was significant (p-value
≤ 0.01), whereas year and treatment.year were not (p-values of 0.45 and 0.86, respectively).
Control and PK presented the largest means of IEN , and FYM NPK and ALL presented the
lowest (Table 4).
The random terms site.year and site.treatment.year had very large variance components
(Table 5).
The IEN residuals did not violate the normality and homoscedasticity assumptions in
any obvious way, except for some slight heavy-tailedness (Figure 2).
TABLE 4 ABOUT HERE
TABLE 5 ABOUT HERE
FIGURE 2 ABOUT HERE
4.2. Bivariate mixed model
The (NU , GY ) data were analysed by a bivariate mixed model conducted with
ASReml-R (Butler et al. 2007). As in the univariate case, at each site, the design considered
was a strip plot with three blocks and treatments and years treated as strip factors (Table 3).
Let $y^{(2n\times 1)} = [y_{NU}^\top, y_{GY}^\top]^\top$ be the vector containing the observations of NU and GY;
$y^{(2n\times 1)}$ was modelled as:
\[ y = X\tau + Z_a u_a + Z_b u_b + Z_c u_c + Z_d u_d + Z_e u_e + \varepsilon \]
where $\tau^{(2k\times 1)}$ is the vector of fixed effects containing the overall joint mean, year, treatment
and treatment.year effects for both NU and GY, and $X^{(2n\times 2k)}$ is the corresponding design
matrix. The vectors $u_a^{(2s\times 1)}$, $u_b^{(2sb\times 1)}$, $u_c^{(2st\times 1)}$, $u_d^{(2sy\times 1)}$, $u_e^{(2sty\times 1)}$ are site, site.block,
site.treatment, site.year and site.treatment.year random effects for NU and GY with
design matrices $Z_a^{(2n\times 2s)}$, $Z_b^{(2n\times 2sb)}$, $Z_c^{(2n\times 2st)}$, $Z_d^{(2n\times 2sy)}$ and $Z_e^{(2n\times 2sty)}$, respectively.
The vector $\varepsilon = (\varepsilon_1^\top, \ldots, \varepsilon_n^\top)^\top$ contains the plot errors. As in the univariate analysis, site
was considered a random term. The random effects and plot errors were assumed to be
independent and normally distributed with mean zero and the following covariance matrices:
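$$\mathrm{var}(u_a) = \begin{bmatrix} \sigma_{a1}^2 & \sigma_{a12} \\ \sigma_{a12} & \sigma_{a2}^2 \end{bmatrix} \otimes I_s, \qquad
\mathrm{var}(u_b) = \begin{bmatrix} \sigma_{b1}^2 & \sigma_{b12} \\ \sigma_{b12} & \sigma_{b2}^2 \end{bmatrix} \otimes I_{sb},$$
$$\mathrm{var}(u_c) = \begin{bmatrix} \sigma_{c1}^2 & \sigma_{c12} \\ \sigma_{c12} & \sigma_{c2}^2 \end{bmatrix} \otimes I_{st}, \qquad
\mathrm{var}(u_d) = \begin{bmatrix} \sigma_{d1}^2 & \sigma_{d12} \\ \sigma_{d12} & \sigma_{d2}^2 \end{bmatrix} \otimes I_{sy},$$
$$\mathrm{var}(u_e) = \begin{bmatrix} \sigma_{e1}^2 & \sigma_{e12} \\ \sigma_{e12} & \sigma_{e2}^2 \end{bmatrix} \otimes I_{sty}, \qquad
\mathrm{var}(\varepsilon) = \begin{bmatrix} \sigma_{\varepsilon 1}^2 & 0 \\ 0 & \sigma_{\varepsilon 2}^2 \end{bmatrix} \otimes I_n,$$
where $\otimes$ denotes the Kronecker product, $\sigma_{r1}^2$ and $\sigma_{r2}^2$ are the variances of $u_r$ for NU and GY,
respectively, and $\sigma_{r12}$ is the covariance of $u_r$ between NU and GY.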
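The Kronecker structure above can be made concrete with a small numerical sketch. The Python code below builds a 2×2 trait covariance matrix from two variances and a correlation, then expands it with an identity matrix, which is the form assumed for each random term. The helper names (`kron`, `identity`, `trait_covariance`) and the toy dimension of three levels are illustrative assumptions; the code is not taken from the ASReml-R analysis.

```python
import math


def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]


def identity(r):
    """Identity matrix of dimension r."""
    return [[1.0 if i == j else 0.0 for j in range(r)] for i in range(r)]


def trait_covariance(var_nu, var_gy, corr):
    """2x2 covariance matrix of (NU, GY) effects from variances and a correlation."""
    cov = corr * math.sqrt(var_nu * var_gy)
    return [[var_nu, cov], [cov, var_gy]]


if __name__ == "__main__":
    # Site estimates from Table 6; three levels is a toy choice for display only.
    g_site = trait_covariance(245.44, 318818.03, 0.96)
    v_site = kron(g_site, identity(3))
    for row in v_site:
        print(["%.1f" % value for value in row])
```

Each diagonal block of the result carries a trait variance, and each off-diagonal block carries the NU–GY covariance for effects sharing the same level, with zeros everywhere else.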
In the bivariate mixed model analysis, treatment was significant (p-value ≤ 0.001 for
both GY and NU), whereas year and treatment.year were not (p-values of 0.77 for GY and
0.12 for NU for year, and 0.86 for GY and 0.12 for NU for treatment.year). Control and PK
presented the lowest means for both NU and GY, and FYM NPK and ALL the largest (Table
4).
A large variance component for site was observed, indicating great heterogeneity for
both NU and GY among sites (Table 6). The high correlation between GY and NU (Table
6) in all the random effects is partially due to the fact that NU is a variable derived from GY.
Thus, it is likely that this correlation is spurious.
TABLE 6 ABOUT HERE
4.3. Limitations of the linear mixed models
In agricultural research, field trials are designed to compare IEN across different
fertiliser treatments. However, the utilisation of NU for GY depends on the levels of nutrients
available in soil rather than the amount of nutrients applied. The availability of nutrients
is conditioned by complex environmental interactions between climate, plants and soil and
varies greatly with time and space (Marschner 2012, p. 136). This may result in very different
levels of available nutrients, even in plots under the same fertiliser application and agronomy
practice. Thus, such trials may present non-uniform (ill-defined) treatments. For instance,
Control was clearly ill-defined in our case study. This is evident even though there were no
direct measurements of available N, as NU can be taken as a good surrogate indicator of
available N in Control plots (Dobermann et al. 2003). Since there was considerable variation
of NU among the Control plots at different sites and in different years (see Figure 1),
considerable variation of available N was expected.
Additionally, for this case study there was no design factor which could explain
the effect of water. Furthermore, site and year created a range of different environmental
circumstances but did not shed much light on the relationship between NU and GY .
5. Mixture model analyses
5.1. Univariate mixture model
For this case study, the residuals of IEN from the univariate linear mixed model did
not violate the normality and homoscedasticity assumptions in any obvious way (Figure 2).
Thus, the probability density of the ratio can be approximated by a mixture of univariate
Gaussian components. The univariate mixture analysis was performed using the R (R Core
Team 2012) package mixtools (Benaglia et al. 2009), which implements the EM algorithm
for fitting mixtures of univariate Gaussian distributions, with the settings specified in Table 7.
The R code is available upon request.
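Since the R code is not reproduced here, the following is a minimal sketch of how the EM algorithm fits a two-component univariate Gaussian mixture. It is written in Python with the standard library only and is not the mixtools implementation. The starting values (means at the data extremes), the stopping rule on the change in log-likelihood and the synthetic data are all assumptions made for illustration; the analysis itself used the starting strategies listed in Table 7.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, tol=1e-6, max_iter=1000):
    """Fit a two-component Gaussian mixture by EM.

    Returns (pi, mu, sd, loglik); loglik is the value computed in the
    E-step of the last iteration.
    """
    n = len(data)
    overall_mean = sum(data) / n
    overall_sd = math.sqrt(sum((x - overall_mean) ** 2 for x in data) / n)
    # Crude start (assumption of this sketch): means at the data extremes.
    pi = [0.5, 0.5]
    mu = [min(data), max(data)]
    sd = [overall_sd, overall_sd]
    old_loglik = -math.inf
    loglik = old_loglik
    for _ in range(max_iter):
        # E-step: posterior probability of each component for each observation.
        resp = []
        loglik = 0.0
        for x in data:
            dens = [pi[i] * normal_pdf(x, mu[i], sd[i]) for i in range(2)]
            total = dens[0] + dens[1]
            loglik += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: weighted updates of proportions, means and standard deviations.
        for i in range(2):
            weight = sum(r[i] for r in resp)
            pi[i] = weight / n
            mu[i] = sum(r[i] * x for r, x in zip(resp, data)) / weight
            var = sum(r[i] * (x - mu[i]) ** 2 for r, x in zip(resp, data)) / weight
            sd[i] = math.sqrt(var)
        if loglik - old_loglik < tol:
            break
        old_loglik = loglik
    return pi, mu, sd, loglik


def simulate(seed=0, n_per_group=200):
    """Synthetic, well-separated two-group data (not the case-study data)."""
    rng = random.Random(seed)
    return ([rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            + [rng.gauss(10.0, 1.0) for _ in range(n_per_group)])


if __name__ == "__main__":
    pi, mu, sd, loglik = em_two_gaussians(simulate())
    print("proportions:", pi)
    print("means:", mu)
    print("standard deviations:", sd)
```

A single crude start is enough for well-separated synthetic groups; with overlapping components, as in the IEN data, several starting strategies are typically compared and the solution with the highest log-likelihood is retained.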
TABLE 7 ABOUT HERE
The information criteria AIC and BIC selected two as the optimal number of groups
(Table 8). The first group included the plots with IEN between 24 and 44 and the second all
the others. The second group had a larger mean and standard deviation than the first (Table
9).
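The information criteria used for choosing the number of groups can be sketched as follows. The Python functions below assume unrestricted component variances (or covariance matrices), so that a mixture of g Gaussian components in d dimensions has g(d + d(d + 1)/2) + (g − 1) free parameters. The sample size n = 432 is not stated in this section; it is an assumption that happens to reproduce the gaps between BIC and AIC reported in Tables 8 and 10.

```python
import math


def n_free_parameters(g, d):
    """Free parameters of a g-component Gaussian mixture in d dimensions
    with unrestricted component covariance matrices."""
    per_component = d + d * (d + 1) // 2   # mean vector + covariance matrix
    return g * per_component + (g - 1)     # plus g - 1 mixing proportions


def aic(loglik, k):
    """Akaike information criterion for log-likelihood loglik and k parameters."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian information criterion for n observations."""
    return -2.0 * loglik + k * math.log(n)


if __name__ == "__main__":
    n = 432  # assumed sample size, see the note above
    for label, g, d in [("IEN, g = 2", 2, 1), ("(NU, GY), g = 3", 3, 2)]:
        k = n_free_parameters(g, d)
        gap = bic(0.0, k, n) - aic(0.0, k)  # the gap does not depend on loglik
        print(f"{label}: k = {k}, BIC - AIC = {gap:.2f}")
```

Because the BIC penalty per parameter, log(n), exceeds the AIC penalty of 2 for any n above about 7, BIC tends to select fewer components than AIC.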
TABLE 8 ABOUT HERE
TABLE 9 ABOUT HERE
The first group contained 70% of the plots which received FYM NPK (Figure 3 a). For
the other fertiliser treatments, the proportions of observations falling into the two groups
did not differ as much as for FYM NPK (e.g. Figure 3 b). Specifically, the proportions
of observations classified in the first and second group for each fertiliser treatment are as
follows: Control (42% vs 58%), PK (41% vs 59%), N (64% vs 36%), FYM (42% vs 58%),
NPK (60% vs 40%), CR NPK (50% vs 50%) and ALL (66% vs 34%). The proportions of
observations classified in the groups for each soil water status at post-flowering were: dry
(45% vs 55%), wet (62% vs 38%) and ponded water (52% vs 48%) (Figure 3 c).
FIGURE 3 ABOUT HERE
5.2. Bivariate mixture model
The (NU,GY ) data were analysed using the R (R Core Team 2012) package EMMIX
(McLachlan et al. 1999), which implements the EM algorithm for fitting mixtures of
multivariate Gaussian distributions (the settings are specified in Table 7). The R code is
available upon request.
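Once a bivariate Gaussian mixture has been fitted, each plot is usually assigned to the group with the highest posterior probability. The Python sketch below illustrates that step using the estimates reported in Table 11. It is not the EMMIX code: the function names are hypothetical, the order of each mean vector is taken to be (NU, GY), and the maximum-posterior assignment rule is an assumption about how plots were classified.

```python
import math

# Estimates copied from Table 11; each mean vector is ordered (NU, GY).
GROUPS = [
    {"pi": 0.15, "mean": (20.42, 1070.22),
     "cov": ((37.65, 2607.11), (2607.11, 225208.08))},
    {"pi": 0.41, "mean": (42.43, 1866.97),
     "cov": ((118.25, 3917.71), (3917.71, 253491.45))},
    {"pi": 0.44, "mean": (70.66, 2814.10),
     "cov": ((322.31, 5272.20), (5272.20, 308423.50))},
]


def bivariate_normal_pdf(point, mean, cov):
    """Density of a bivariate Gaussian, using the closed-form 2x2 inverse."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def posterior_probabilities(point, groups=GROUPS):
    """Posterior probability that an (NU, GY) observation belongs to each group."""
    weighted = [g["pi"] * bivariate_normal_pdf(point, g["mean"], g["cov"])
                for g in groups]
    total = sum(weighted)
    return [w / total for w in weighted]


def classify(point, groups=GROUPS):
    """Index (0-based) of the group with the largest posterior probability."""
    post = posterior_probabilities(point, groups)
    return max(range(len(post)), key=post.__getitem__)


if __name__ == "__main__":
    plot = (45.0, 2000.0)  # hypothetical plot: NU = 45 kg/ha, GY = 2000 kg/ha
    print(posterior_probabilities(plot), classify(plot))
```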
The information criterion BIC selected three groups, whereas AIC selected five (Table
10). BIC has been shown to perform better than AIC (Fonseca & Cardoso 2007). Thus, three
was chosen as the optimal number of groups. The first and second groups presented lower
means of NU and GY than the third (Table 11). In terms of the estimated correlations, NU
and GY were tightly correlated in the first group (Table 11). The mean of IEN of each group
was estimated from the bivariate analysis by taking the ratio of the estimated joint mean,
defined in Table 11 as $\widehat{IEN}_i$. The confidence sets of $IEN_i$ were derived by straightforward
calculations according to the confidence rule in Fieller (1954) (Figure 4).
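Fieller's rule yields a confidence set for a ratio of two means by collecting every candidate ratio θ for which the numerator estimate minus θ times the denominator estimate is not significantly different from zero. The Python sketch below solves the resulting quadratic inequality. It is an illustration only: the function name is hypothetical, a normal critical value of 1.96 stands in for the quantile used in the paper, and the covariance between the two mean estimates, which Table 11 does not report, must be supplied by the user.

```python
import math


def fieller_interval(num, den, var_num, var_den, cov, crit=1.96):
    """Fieller confidence set for the ratio num / den.

    num, den          estimated numerator and denominator means
    var_num, var_den  variances of those two estimators
    cov               covariance between the two estimators
    crit              critical value (1.96 is an assumed normal quantile)

    Returns (lower, upper) when the set is a bounded interval, otherwise
    None (the denominator is not clearly different from zero).
    """
    # The set is {theta : a*theta^2 - 2*b*theta + c <= 0}.
    a = den * den - crit * crit * var_den
    b = num * den - crit * crit * cov
    c = num * num - crit * crit * var_num
    disc = b * b - a * c
    if a <= 0.0 or disc < 0.0:
        return None
    root = math.sqrt(disc)
    return ((b - root) / a, (b + root) / a)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the call.
    print(fieller_interval(num=2000.0, den=45.0,
                           var_num=100.0 ** 2, var_den=2.0 ** 2, cov=150.0))
```

When the denominator is estimated without error and the covariance is zero, the set reduces to the familiar interval (num ± crit × SE) / den, which gives a quick sanity check on the implementation.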
TABLE 10 ABOUT HERE
TABLE 11 ABOUT HERE
FIGURE 4 ABOUT HERE
It appears that the soil water status post-flowering and the soil N supply are the main
factors (from the measurements recorded for the analysis) defining the mixture groups (Figure
5). Most of the plots (62%) classified in the first group did not receive any added N fertiliser
(Control or PK). Soil N supply is the limiting factor for grain production (Xu et al. 2012),
which explains the low means of NU and GY and the strong correlation between NU and
GY in this group (Table 11). The first group presented the largest mean of IEN (Table
11). This result was in agreement with those provided by the univariate linear mixed model
analysis, in which Control and PK plots utilised N more efficiently (Table 4). There were
also plots with no N added in the second group but at a lower proportion (21%) than in the
first (62%). The plots with dry soil (86%) or ponded water (73%) post-flowering were mostly
classified in the first or second groups (Figure 5). Plots with dry soil had lower values of NU
and GY due to the fact that many physiological processes related to the uptake of nutrients
are impaired under water stress (Tanguilig et al. 1987). The low GY in plots with ponded
water post-flowering may have been due to the fact that rice remained green for longer,
which affected grain filling period, translocation of N from green biomass into grain, and
grain ripening (Ntanos & Koutroubas 2002). The plots in the third group were characterised
by having N, P and K added as well as by being wet post-flowering – 71% of the plots with
wet soil post-flowering were classified in the third group. This group had the largest mean
GY .
FIGURE 5 ABOUT HERE
5.3. Limitations of the mixture model approach
Finite mixture models assume independence of observations. However, this assumption
may be violated in designed field trials. Recent developments in mixture models allow
handling correlated observations by mixtures of linear mixed models (Ng 2013). Conditional
on the mixture components, the observations on the experimental units are modelled by a
mixed model, which provides a means to estimate correlations (Ng 2013). In our case,
modelling the correlation will be a complex task. For instance, the correlation between
plots depends on N availability, which is conditioned by environmental factors (spatial soil
variability, microbial activity) and agronomic practices (re-application of fertiliser over the
3 years and the presence of straw residues or losses of nutrients from the previous harvest).
Therefore, the extension of mixture models to include this type of correlation is challenging
and may not even be possible. Broadly speaking, as pointed out by Ng et al. (2006), adapting
clustering techniques to a wide variety of experimental designs is an open research question.
6. Discussion
In published agricultural research, field trials are commonly designed to compare IEN
across different fertiliser treatments and the analysis is often done by univariate linear models.
However, environmental factors may cause plots under the same fertiliser treatment and
agricultural practice to present different levels of available nutrients, resulting in a lack of
consistency in treatment replications in field trials. Furthermore, univariate linear models of
the ratio do not maintain the information on the original traits, and are often applied without
checking the normality and homogeneity of error variance assumptions.
Sampling across a range of environments may lead to different patterns (groups) of
NU for GY in field data. In this study, we have investigated the use of bivariate mixture
models for identifying groups. Once the groups are identified, their close investigation could
reveal the underlying defining factors, which may not necessarily coincide with experimental
treatments.
The benefits of using bivariate mixture models in nitrogen efficiency research have been
clearly demonstrated in our analysis of the case study on non-irrigated rice. Soil water status
post-flowering has been revealed as an environmental factor defining the mixture groups. In
terms of fertiliser treatments, both bivariate mixture and mixed models indicate that plots
with no added N produced less grain and took up less N, which supports the view
that soil N supply is the limiting factor for grain production (Xu et al. 2012).
This study has shown several benefits of using bivariate analyses on NU and GY in
comparison with their univariate counterparts for IEN . Firstly, bivariate analyses preserve
information on NU and GY, which is lost when the data are analysed as a ratio. For instance,
different levels of a factor may affect both NU and GY proportionally. Thus, even if NU
and GY values change considerably, only minor changes may be observed in the ratio. This
can be seen in our case study. The means of NU and GY increased considerably when CR
NPK was added to the soil in comparison with plots under Control treatment (Table 4). A
difference of 16.31 kg/ha (SE = 4.17) was observed for NU and 630 kg/ha (SE = 127.9)
for GY. However, the change in the ratio was minor, 3.09 (SE = 2.32). Similarly, the loss of
information is illustrated in the univariate mixture analysis of IEN which fails to identify soil
water status as a factor defining the mixture groups (Figure 3 c). Secondly, bivariate analyses
avoid dealing with the mixture and possible heavy-tailedness of the IEN distribution. The
distribution of the ratio may violate the assumptions of normality and homogeneity of error
variances, leading to unreliable inferences from linear mixed models. The departure from
normality of the ratio distribution complicates the interpretation of components as physical
groups in univariate mixture analyses with univariate Gaussian components. For instance, for
the same physical environment, if the ratio distribution is skewed, the EM algorithm may
detect more than one component in its attempt to approximate a non-Gaussian density.
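The point about proportional changes can be checked with a little arithmetic on the Table 4 means. The Python sketch below computes the relative change of each trait for CR NPK against Control; the names `TABLE4`, `relative_change` and `changes_vs_control` are illustrative only.

```python
# Estimated means copied from Table 4 (standard errors omitted).
TABLE4 = {
    "Control": {"IEN": 48.87, "NU": 36.22, "GY": 1677.0},
    "CR NPK": {"IEN": 45.78, "NU": 52.53, "GY": 2307.0},
}


def relative_change(baseline, value):
    """Change of value relative to baseline, as a fraction of baseline."""
    return (value - baseline) / baseline


def changes_vs_control(treatment, table=TABLE4):
    """Relative change of every trait for a treatment compared with Control."""
    control = table["Control"]
    return {trait: relative_change(control[trait], table[treatment][trait])
            for trait in control}


if __name__ == "__main__":
    for trait, change in changes_vs_control("CR NPK").items():
        print(f"{trait}: {100 * change:+.1f}%")
```

On these means, NU and GY both rise by well over a third, while the ratio IEN moves by only a few percent, which is the information loss described above.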
There are some limitations that need to be considered when applying mixture models
to designed field trials. Mixture models assume that the observations are independent, but
possible correlations may arise as a result of the experimental design. Thus, in our opinion, the
application of bivariate mixture models in designed field trials should be used for exploratory
purposes only, and complementary to bivariate mixed models. Furthermore, for this case
study, utilising our approach most effectively would have required the recording of data on
other potential environmental factors affecting nitrogen utilisation such as temperature, the
presence of diseases or the indigenous levels of nutrients in the soil.
Consequently, designs should be adopted which avoid correlated observations and are
able to provide a more complete picture of the factors defining the mixture groups. Data
collection through surveys is a more appropriate sampling procedure for the application of
mixture models for clustering (e.g. Di Zio et al. 2005; Genge 2013). However, to the best of
our knowledge, there is a lack of research on field survey design for efficient clustering with
bivariate mixture models. Thus, how to best design a field survey for (GY , NU ) to apply
finite mixture models is an open research question.
In conclusion, bivariate mixture models of Gaussian distributions are a useful
exploratory tool for identifying potential environmental factors driving NU and GY and
effectively complement bivariate mixed models in the analysis of designed field trials.
Bivariate mixture models can also be used for analysing other similar bivariate traits in
agriculture or natural resource research, but to fully exploit their potential they should be applied
to designed field surveys.
References
AKAIKE, H. (1973). Information theory and an extension of the maximum likelihood
principle. In B. N. Petrov & F. Csaki, editors, Second International Symposium on
Information Theory. Akademia Kiado, Budapest. pp. 267–281.
AKAIKE, H. (1974). A new look at the statistical model identification. IEEE Trans. Automat.
Control 19, 716–723.
BENAGLIA, T., CHAUVEAU, D., HUNTER, D.R. & YOUNG, D.S. (2009). mixtools: An R
package for analyzing finite mixture models. J. Stat. Softw. 32, 1–29.
BIERNACKI, C., CELEUX, G. & GOVAERT, G. (2003). Choosing starting values for the
EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models.
Comput. Statist. Data Anal. 41, 561–575.
BUTLER, D.G., CULLIS, B.R., GILMOUR, A.R. & GOGEL, B.J. (2007). ASReml-R
reference manual. Queensland Department of Primary Industries and Fisheries, Australia.
CASSMAN, K.G., DOBERMANN, A. & WALTERS, D.T. (2002). Agroecosystems, nitrogen-
use efficiency, and nitrogen management. Ambio 31, 132–140.
CHEN, J. & TAN, X. (2009). Inference for multivariate normal mixtures. J. Multivariate
Anal. 100, 1367–1383.
CHEW, V. (1966). Confidence, prediction, and tolerance regions for the multivariate normal
distribution. J. Amer. Statist. Assoc. 61, 605–617.
CRAMER, H. (1946). Mathematical Methods of Statistics. Princeton: Princeton University
Press.
DELACY, I.H., BASFORD, K.E., COOPER, M., BULL, J.K. & MCLAREN, C.G. (1996).
Analysis of multi-environment trials – an historical perspective. In M. Cooper &
G. L. Hammer, editors, Plant Adaptation and Crop Improvement. CAB International,
Wallingford. pp. 39–124.
DELOGU, G., CATTIVELLI, L., PECCHIONI, N., DE FALCIS, D., MAGGIORE, T. &
STANCA, A.M. (1998). Uptake and agronomic efficiency of nitrogen in winter barley
and winter wheat. Eur. J. Agron. 9, 11–20.
DEMPSTER, A.P., LAIRD, N.M. & RUBIN, D.B. (1977). Maximum likelihood from
incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39, 1–38.
DI ZIO, M., GUARNERA, U. & LUZI, O. (2005). Editing systematic unity measure errors
through mixture modelling. Surv. Methodol. 31, 53–63.
DOBERMANN, A., WITT, C., ABDULRACHMAN, S. et al. (2003). Estimating indigenous
nutrient supplies for site-specific nutrient management in irrigated rice. Agron. J. 95,
924–935.
EVERITT, B.S. (1996). An introduction to finite mixture distributions. Stat. Methods Med.
Res. 5, 107–127.
FANG, Q., YU, Q., WANG, E. et al. (2006). Soil nitrate accumulation, leaching and crop
nitrogen use as influenced by fertilization and irrigation in an intensive wheat-maize
double cropping system in the North China Plain. Plant Soil 284, 335–350.
FIELLER, E.C. (1954). Some problems in interval estimation. J. Roy. Statist. Soc. Ser. B 16,
175–185.
FIGUEIREDO, M.A.T. & JAIN, A.K. (2002). Unsupervised learning of finite mixture
models. IEEE Trans. Pattern Anal. Mach. Intell. 24, 381–396.
FONSECA, J.R.S. & CARDOSO, M.G.M.S. (2007). Mixture-model cluster analysis using
information theoretical criteria. Intell. Data Anal. 11, 155–173.
FRIENDLY, M. (2006). Data ellipses, HE plots and reduced-rank displays for multivariate
linear models: SAS software and examples. J. Stat. Softw. 17, 1–43.
FRUHWIRTH-SCHNATTER, S. (2006). Finite mixture and Markov switching models.
New York: Springer.
GANESALINGAM, A., SMITH, A.B., BEECK, C.P., COWLING, W.A., THOMPSON, R. &
CULLIS, B.R. (2013). A bivariate mixed model approach for the analysis of plant
survival data. Euphytica 190, 371–383.
GENGE, E. (2013). A latent class analysis of the public attitude towards the euro adoption in
Poland. Adv. Data Anal. Classif. (in press).
HAUCK, R.D. (1984). Nitrogen in Crop Production. Madison: American Society of
Agronomy, Crop Science Society of America, Soil Science Society of America.
KARLIS, D. & XEKALAKI, E. (2003). Choosing initial values for the EM algorithm for
finite mixtures. Comput. Statist. Data Anal. 41, 577–590.
KIEFER, J. & WOLFOWITZ, J. (1956). Consistency of the maximum likelihood estimator in
the presence of infinitely many incidental parameters. Ann. Math. Statist. 27, 887–906.
MAITRA, R. (2009). Initializing partition-optimization algorithms. IEEE/ACM Trans.
Comput. Biol. Bioinf. 6, 144–157.
MARSAGLIA, G. (1965). Ratios of normal variables and ratios of sums of uniform variables.
J. Amer. Statist. Assoc. 60, 193–204.
MARSAGLIA, G. (2006). Ratios of normal variables. J. Stat. Softw. 16, 1–10.
MARSCHNER, H. & MARSCHNER, P. (2012). Marschner’s mineral nutrition of higher
plants. London: Elsevier Science.
MCLACHLAN, G.J. & BASFORD, K.E. (1988). Mixture models. Inference and applications
to clustering. New York: Dekker.
MCLACHLAN, G.J. & PEEL, D. (2000). Finite mixture models. New York: Wiley.
MCLACHLAN, G.J., PEEL, D., BASFORD, K.E. & ADAMS, P. (1999). The EMMIX
software for the fitting of mixtures of normal and t-components. J. Stat. Softw. 4. URL
http://www.stat.ucla.edu/journals/jss.
MELNYKOV, V. & MAITRA, R. (2010). Finite mixture models and model-based clustering.
Stat. Surv. 4, 80–116.
NAKLANG, K., HARNPICHITVITAYA, D., AMARANTE, S.T., WADE, L.J. & HAEFELE,
S.M. (2006). Internal efficiency, nutrient uptake, and the relation to field water resources
in rainfed lowland rice of northeast Thailand. Plant Soil 286, 193–208.
NG, S.K. (2013). Recent developments in expectation-maximization methods for analyzing
complex data. WIREs Comput Stat 5, 415–431.
NG, S.K., MCLACHLAN, G.J., WANG, K., JONES, L.B.T. & NG, S.W. (2006). A mixture
model with random-effects components for clustering correlated gene-expression
profiles. Bioinformatics 22, 1745–1752.
NTANOS, D.A. & KOUTROUBAS, S.D. (2002). Dry matter and N accumulation and
translocation for Indica and Japonica rice under Mediterranean conditions. Field Crop.
Res. 74, 93–101.
OMRAN, M.G.H., ENGELBRECHT, A.P. & SALMAN, A. (2007). An overview of clustering
methods. Intell. Data Anal. 11, 583–605.
PENG, S., GARCIA, F.V., LAZA, R.C., SANICO, A.L., VISPERAS, R.M. & CASSMAN,
K.G. (1996). Increased N-use efficiency using a chlorophyll meter on high-yielding
irrigated rice. Field Crop. Res. 47, 243–252.
R CORE TEAM (2012). R: A Language and Environment for Statistical Computing.
R Foundation for Statistical Computing, Vienna, Austria. URL http://www.
R-project.org/. ISBN 3-900051-07-0.
REDNER, R.A. & WALKER, H. F. (1984). Mixture densities, maximum likelihood and the
EM algorithm. SIAM Rev. 26, 195–239.
SCHWARZ, G. (1978). Estimating the dimension of a model. Ann. Statist. 6, 461–464.
SEIDEL, W., MOSLER, K. & ALKER, M. (2000). A cautionary note on likelihood ratio tests
in mixture models. Ann. Inst. Statist. Math. 52, 481–487.
TANGUILIG, V.C., YAMBAO, E.B., O’TOOLE, J.C. & DE DATTA, S.K. (1987). Water
stress effects on leaf elongation, leaf water potential, transpiration, and nutrient uptake
of rice, maize, and soybean. Plant Soil 103, 155–168.
VSN INTERNATIONAL (2012). Genstat for Windows 15th Edition. VSN International,
Hemel Hempstead UK. URL http://www.vsni.co.uk/.
WADE, L.J., AMARANTE, S.T., OLEA, A. et al. (1999). Nutrient requirements in rainfed
lowland rice. Field Crop. Res. 64, 91–107.
WITT, C., DOBERMANN, A., ABDULRACHMAN, S. et al. (1999). Internal nutrient
efficiencies of irrigated lowland rice in tropical and subtropical Asia. Field Crop. Res.
63, 113–138.
WU, C.F.J. (1983). On the convergence properties of the EM algorithm. Ann. Statist. 11,
95–103.
XU, G., FAN, X. & MILLER, A.J. (2012). Plant nitrogen assimilation and use efficiency.
Annu. Rev. Plant Biol. 63, 153–182.
XU, L., HANSON, T., BEDRICK, E.J. & RESTREPO, C. (2010). Hypothesis tests on mixture
model components with applications in ecology and agriculture. J. Agric. Biol. Environ.
Stat. 15, 308–326.
TABLE 1. Fertiliser treatments applied in the experiments in Naklang et al. (2006)

Treatment   Description
Control     No fertiliser applied
PK          0 N, 21.8 kg Phosphorus (P) ha−1, and 41.5 kg Potassium (K) ha−1
N           50 kg N ha−1
FYM         Farmyard manure (cattle manure) at 10 t ha−1 (fresh weight)
NPK         50 kg N ha−1, 21.8 kg P ha−1 and 41.5 kg K ha−1
CR NPK      50 kg N ha−1 (controlled-release), 21.8 kg P ha−1, and 41.5 kg K ha−1
FYM NPK     Combined application of the treatments FYM and NPK
ALL         NPK as in the NPK treatment + lime and trace elements
TABLE 2. Number of sites selected from the experiment in Naklang et al. (2006) over the three years with soils
reported to be dry, wet or having ponded water at three developmental stages of the plant

                Plant developmental stage
Soil status     Pre-flowering   Flowering   Post-flowering
Dry                   0              1             6
Wet                   8              3             8
Ponded water         10             14             4
TABLE 3. Diagram of the strip plot design for two blocks at one site for the experiment in the case study (Naklang
et al. 2006)

Block 1    NPK | FYM | PK | ALL | N | CR NPK | Control | FYM NPK
  1995
  1996
  1997

Block 2    ALL | CR NPK | Control | PK | NPK | FYM | FYM NPK | N
  1995
  1996
  1997
TABLE 4. Estimated means (standard error of the mean) of internal nitrogen use efficiency (IEN), nitrogen uptake (NU)
and grain yield (GY) for the fertiliser treatments described in Table 1 across sites, years and blocks

Treatment   Means of IEN    Means of NU     Means of GY
Control     48.87 (2.39)    36.22 (7.34)    1677 (266.50)
PK          48.69 (2.39)    35.53 (7.34)    1629 (266.50)
N           42.71 (2.39)    52.94 (7.34)    2192 (266.50)
FYM         46.29 (2.39)    49.66 (7.34)    2191 (266.50)
NPK         43.16 (2.39)    54.51 (7.34)    2305 (266.50)
CR NPK      45.78 (2.39)    52.53 (7.34)    2307 (266.50)
FYM NPK     37.92 (2.39)    69.39 (7.34)    2552 (266.50)
ALL         40.76 (2.39)    61.18 (7.34)    2445 (266.50)
TABLE 5. Estimated variance components (standard error) of the linear mixed model of internal nitrogen use
efficiency

Random term            Estimated variance component
site                    5.52 (14.18)
site.block              3.18 (1.96)
site.year              34.92 (18.09)
site.treatment          1.57 (4.58)
site.treatment.year    30.93 (7.48)
TABLE 6. Estimated variance-covariance components (standard error) of the bivariate mixed model of nitrogen
uptake (NU) and grain yield (GY)

Random term            Variance for NU     Variance for GY           Correlation
site                   245.44 (176.43)     318818.03 (243873.43)     0.96 (0.05)
site.block               3.95 (2.86)         1481.32 (1895.60)       0.99 (NA)
site.year               74.95 (39.69)      173236.37 (81415.70)      0.76 (0.14)
site.treatment          33.47 (12.90)       25672.31 (12386.48)      0.93 (0.12)
site.treatment.year     31.96 (9.76)        37150.76 (12200.00)      0.92 (0.15)
TABLE 7. Settings and rationales for the mixture analyses

Setting:   Maximum number of components fixed at 6
Rationale: Larger numbers resulted in frequent cases of spuriosities

Setting:   Starting strategy i-ii. Number of random and k-means partitions set at 100
Rationale: Ensures a good initial partition. However, similar mixture estimates were observed when
           reducing this number

Setting:   Starting strategy iii. Not used for IEN analysis
Rationale: May produce negative initial values of the means of IEN. Biologically impossible

Setting:   Starting strategy iv. Subsamples of 200 observations
Rationale: Subsample size sufficient to not produce degenerate solutions

Setting:   Starting strategy v. A short run indicated that the threshold to stop the EM algorithm was
           c = 10^−2. The number of short runs employed was three
Rationale: When initiating the short run with a large number of random starts, similar mixture estimates
           were obtained with thresholds between [10^−4, 10^−1] or a larger number of short runs

Setting:   Stopping criteria. The threshold to stop the EM algorithm was c = 10^−6. Standard setting
Rationale: NA

Setting:   Biological spuriosities. Components with negative mean for IEN, and components with negative
           joint mean or correlation for (NU, GY)
Rationale: Biologically impossible
TABLE 8. Starting strategy and AIC and BIC values for the solution with the highest value of the log likelihood
function when fitting g = 1, ..., 6 groups for the analysis of IEN. The minimum values of the
information criteria are marked with an asterisk.

Number of groups   Starting strategy     AIC         BIC
6                  K-means               3287.43     3356.60
5                  K-means               3281.43     3338.39
4                  subsample solution    3275.66     3320.42
3                  K-means               3272.12     3304.67
2                  short runs            3270.13*    3290.47*
1                  random starts         3296.07     3304.20
TABLE 9. Parameter estimates (bootstrapped standard error) of the mixture model selected in Table 8 for the
analysis of internal nitrogen use efficiency (πi is the estimate of the mixing proportion, µi the estimate
of the mean of the ratio and σi is the estimate of the standard deviation for the i-th group)

                πi             µi              σi
First group     0.45 (0.14)    38.36 (1.12)     6.14 (1.02)
Second group    0.55 (0.14)    49.01 (3.26)    11.60 (1.25)
TABLE 10. Starting strategy and AIC and BIC values of the solution with the highest value of the log likelihood
function when fitting g = 1, ..., 6 for the joint analysis of grain yield and nitrogen uptake. The minimum
values of the information criteria are marked with an asterisk.

Number of groups   Starting strategy     AIC          BIC
6                  random starts         10335.46     10477.85
5                  subsample solution    10331.07*    10449.06
4                  short runs            10336.34     10429.91
3                  simulated means       10333.03     10402.19*
2                  K-means               10363.44     10408.19
1                  K-means               10444.21     10464.55
TABLE 11. Parameter estimates of the mixture model (bootstrapped standard errors in the lower panel) selected by the BIC
criterion (Table 10) for the joint analysis of grain yield and nitrogen uptake (πi is the estimate of the
mixing proportion, µi the estimate of the joint mean, ordered (NU, GY), $\widehat{IEN}_i$ is the ratio of the estimated joint mean, Σi
is the estimate of the variance-covariance matrix and ρi is the estimate of the correlation for the i-th
group). Each 2×2 matrix is written row by row, with rows separated by a semicolon.

Estimates
                πi      µi                    IENi     Σi                                         ρi
First group     0.15    (20.42, 1070.22)      52.41    [37.65, 2607.11; 2607.11, 225208.08]       0.90
Second group    0.41    (42.43, 1866.97)      44.00    [118.25, 3917.71; 3917.71, 253491.45]      0.72
Third group     0.44    (70.66, 2814.10)      39.82    [322.31, 5272.20; 5272.20, 308423.50]      0.52

Bootstrapped standard errors
                πi      µi                    IENi     Σi                                         ρi
First group     0.03    (2.09, 157.00)        3.53     [15.30, 1030.00; 1030.00, 75700.00]        0.05
Second group    0.06    (2.27, 114.00)        1.67     [24.00, 1021.00; 1021.00, 69200.00]        0.06
Third group     0.05    (2.68, 100.00)        1.27     [42.80, 1540.00; 1540.00, 61300.00]        0.09
Figure 1. Typical trend of grain yield on nitrogen uptake from the experiments considered in Section 3. The closed and open circles represent plots with no fertiliser added and plots with a source of fertiliser, respectively.
Figure 2. The Q-Q plot for the residuals of internal nitrogen use efficiency versus the expected quantiles of a normal distribution (left) and the plot of residuals of internal nitrogen use efficiency versus estimated means (right).
Figure 3. Classification of the internal nitrogen use efficiency observations presented against the experimental sites for plots receiving a) FYM NPK, b) N and Control and c) all plots together with the water status post-flowering. The closed and open symbols refer to observations classified in the first and second group, respectively.
Figure 4. Mixture groups for grain yield and nitrogen uptake data considered in Section 3. The black dots represent the estimated joint means and the ellipses the 90% prediction regions of the groups (Eq. 4.4 in Chew (1966)). The intervals are the confidence sets of the ratios of expectations of the groups calculated according to Fieller (1954).
Figure 5. Mixture groups for the grain yield and nitrogen uptake data considered in Section 3 together with a) the water status of the plots post-flowering and b) N fertiliser status. The ellipses are the 90% prediction regions of the groups (Eq. 4.4 in Chew (1966)).
Chapter 5
Conclusions and future lines of research
In this thesis I have proposed to perform bivariate mixed and mixture analyses on
(NU, GY) instead of their univariate counterparts on IEN. Bivariate analyses maintain
the complete information on NU and GY, including their correlation, and avoid
dealing with the distributional properties of the ratio.
The IEN data may violate the assumptions of normality and homogeneity of error
variances (Marsaglia, 1965, 2006) required in linear mixed models. As for the application
of mixtures of univariate Gaussian components, the potential non-normality of
the ratio distribution (Marsaglia, 1965, 2006) also complicates its analysis, for the
following reasons. Firstly, the ratio may present heavy-tailedness; thus, its distribution
may not be satisfactorily approximated by a mixture of univariate Gaussian components.
Atypical observations of the ratio, produced when the denominator takes values
close to zero, can affect the estimates of the univariate Gaussian components (Peel
& McLachlan, 2000). Secondly, if the ratio distribution within a physical group presents
skewness or bimodality, more than one Gaussian component will be required to approximate
its non-Gaussian shape. This breaks the one-to-one correspondence between
mixture components and physical groups and requires dealing with mixtures of mixtures.
Mixtures of mixtures arise when each component of the mixture is a mixture
itself. That is:
$$f(y_j|\psi) = \sum_{i=1}^{g} \pi_i f_i(y_j|\theta_i) = \sum_{i=1}^{g} \pi_i \left( \sum_{k=1}^{r_i} \alpha_{ik}\, g_{ik}(y_j|\xi_{ik}) \right) \qquad (5.1)$$
where the $g_{ik}$ can be assumed to be univariate normal. Mixtures of mixtures have
problems of non-identifiability (Di Zio et al., 2007). One of them is choosing which
components of the second level of the mixture ($g_{ik}$) are used to model the non-Gaussian
pdfs of the first level of the mixture ($f_i$). Constraints on the parameters are needed to
impose identifiability (see Willse & Boik, 1999).
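The non-identifiability of Eq. 5.1 can be illustrated with a small numerical sketch. The Python code below is only an illustration under assumed, hypothetical parameter values (they are not estimates from the case study): it evaluates a two-level mixture of univariate normal densities and shows that a different allocation of the same second-level components to first-level groups gives exactly the same overall density.

```python
import math

def normal_pdf(y, mean, sd):
    """Univariate normal density."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_of_mixtures_pdf(y, groups):
    """Evaluate Eq. 5.1: sum_i pi_i * sum_k alpha_ik * g_ik(y | xi_ik).

    `groups` is a list of (pi_i, subcomponents); each subcomponent is a
    tuple (alpha_ik, mean_ik, sd_ik), with the alpha_ik summing to 1
    within each group.
    """
    total = 0.0
    for pi_i, subcomponents in groups:
        f_i = sum(alpha * normal_pdf(y, mean, sd)
                  for alpha, mean, sd in subcomponents)
        total += pi_i * f_i
    return total

def flatten(groups):
    """Rewrite the two-level mixture as an ordinary one-level mixture."""
    return [(pi_i * alpha, mean, sd)
            for pi_i, subs in groups
            for alpha, mean, sd in subs]

# Hypothetical values: group 1 is skewed (two sub-components),
# group 2 is symmetric (one sub-component).
groups = [
    (0.6, [(0.7, 40.0, 5.0), (0.3, 55.0, 8.0)]),
    (0.4, [(1.0, 80.0, 6.0)]),
]

# The same three normal components, allocated to the groups differently.
alt_groups = [
    (0.42, [(1.0, 40.0, 5.0)]),
    (0.58, [(0.18 / 0.58, 55.0, 8.0), (0.40 / 0.58, 80.0, 6.0)]),
]

for y in (30.0, 50.0, 80.0):
    print(y, mixture_of_mixtures_pdf(y, groups),
          mixture_of_mixtures_pdf(y, alt_groups))
```

Both allocations print the same density at every point, so the data alone cannot decide which second-level components belong to which physical group; this is why parameter constraints are required.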
Bivariate mixed models on (NU,GY ) are appropriate for the analysis of designed
field trials, in which the researcher aims to test the performance of different experi-
mental treatments. In our literature study of recent publications in nitrogen efficiency
in cereals (Section 1.4), the experimental treatments mostly corresponded to different
levels of nutrients added to the soil. However, the level of available nutrients (from
indigenous as well as added sources) may substantially differ from the level of applied
nutrients. For instance, heavy rainfall after an application of nitrates can result
in considerable losses of these nutrients through leaching or surface runoff (Raun &
Johnson, 1999). Thus, in order to apply mixed models, special care must be taken in the
design of such trials, and tight control of the agronomic practices is needed, to ensure that
treatment applications are as uniform as possible.
There are other types of studies where fertilisers are applied to create different fer-
tility conditions rather than to test for their performance. This is the case of Naklang
et al. (2006) whose research objectives were not formulated in terms of treatment-based
analyses. Naklang et al. (2006) aimed to improve the understanding of how different
environmental conditions affect IEN in rice. With this objective in mind several ex-
periments across different sites and years were carried out to widen the range of soil
characteristics and climatic and fertility conditions. In this thesis I have argued that
different experimental and field conditions may lead to different patterns of the conver-
sion of NU into GY , and thus to the presence of clusters in the field data. Such clusters
can be identified by bivariate mixture models. The inspection of the mixture groups
may reveal the environmental and fertility conditions defining them, assuming that the
appropriate information has been recorded. Thus, mixture models are recommended
for studies with similar objectives to the ones in Naklang et al. (2006).
The current methodology for mixture models assumes the data to be independent.
However, NU and GY field data are commonly collected from designed field trials,
with correlation between observations induced by the design. In order to reduce the
correlation, one can apply mixture models on the observations adjusted for the spatial
trend of the field. However, in our case study the information on the spatial layout of
the designed field trial, required for modelling the spatial trends, was missing. Despite
the potential violation of the independence assumption, mixture models were useful in
the analysis of the case-study for the identification of different groups and allowed us
to highlight the effect of N availability and soil water conditions at post-flowering on
NU and GY .
The results obtained by applying mixture models can add extra information to
that obtained by mixed models in designed field trials. For instance, if there are
non-controlled factors which override, or interact with, designed treatments, the inspection
of the mixture groups may help to reveal them. On the other hand, if the
fertiliser treatments are the main cause of having different patterns of (NU, GY), one
would expect these groups to be revealed by the mixture approach, and significant
differences to be found when treatment contrasts are performed in the mixed models.
In conclusion, finite mixture models of bivariate Gaussian distributions are useful
models for identifying potential experimental and environmental conditions segregat-
ing the NU and GY data into clusters. In designed field trials and due to the potential
violation of the independence assumption, we recommend using mixture models for
exploratory purposes only and as a complement to mixed models. In order to fully
exploit the potential of the technique, it is important to implement sampling proce-
dures which avoid correlated observations, e.g. data collection from simple random
field surveys.
A number of interesting questions have arisen during the development of this research.
One immediate future line of research is to carry out simulation studies to assess
the coverage of Fieller's confidence intervals for E(GY)/E(NU) (Fieller, 1954),
estimated jointly for each mixture group (Figure 4, Chapter 4). Samples from mixtures
with known parameters will be generated with the R package EMMIX (McLachlan
et al., 1999). Fieller's confidence intervals will then be calculated, and the percentage
of times that these intervals contain the true ratio of means for each group will
provide a measure of their coverage. In particular, it will be interesting
to investigate how the coverage is affected by changing the area of intersection between
the clusters.
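A minimal sketch of such a coverage study is given below in Python. It is a simplification and not the planned EMMIX-based procedure: it assumes a single bivariate normal group with known membership (so no mixture is fitted), it replaces the t quantile by a normal quantile as a large-sample approximation, and the function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist

def fieller_interval(x, y, level=0.95):
    """Fieller confidence set for E(Y)/E(X) from paired samples.

    Returns (lower, upper) when the set is a bounded interval and None
    otherwise. A normal quantile is used as a large-sample stand-in for
    the t quantile (an assumption of this sketch).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    vxx, vyy, vxy = sxx / n, syy / n, sxy / n  # (co)variances of the means
    q2 = NormalDist().inv_cdf(0.5 + level / 2.0) ** 2
    # Solve (my - rho*mx)^2 <= q2 * (vyy - 2*rho*vxy + rho^2*vxx) for rho.
    a = mx * mx - q2 * vxx
    b = mx * my - q2 * vxy
    c = my * my - q2 * vyy
    disc = b * b - a * c
    if a <= 0 or disc < 0:
        return None  # unbounded or degenerate confidence set
    root = math.sqrt(disc)
    return (b - root) / a, (b + root) / a

def simulate_coverage(mu_x, mu_y, sd_x, sd_y, rho, n, reps, seed=1):
    """Proportion of simulated samples whose interval contains mu_y/mu_x.

    Unbounded confidence sets are counted as misses in this sketch.
    """
    rng = random.Random(seed)
    true_ratio = mu_y / mu_x
    hits = 0
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            xs.append(mu_x + sd_x * z1)
            ys.append(mu_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
        ci = fieller_interval(xs, ys)
        if ci is not None and ci[0] <= true_ratio <= ci[1]:
            hits += 1
    return hits / reps

# Hypothetical group: NU around 100, GY around 5000, correlation 0.8.
print(simulate_coverage(100.0, 5000.0, 15.0, 900.0, 0.8, n=50, reps=2000))
```

In a fuller study, the samples would be drawn from a mixture, the groups would be estimated rather than known, and the parameter values would be varied to change the overlap between clusters.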
A research gap identified during the development of this research is how to design
field surveys with the aim of applying mixture models. Although mixture models have
been widely applied in surveys (e.g. Di Zio et al., 2005; Genge, 2013), to the best of our
knowledge and in agreement with Cressie (2014, pers. comm., 27 February), there are
insufficient details in the literature on how to carry out surveys with post-stratification
objectives. The simplest method would be to collect a large sample from the field by
simple random sampling. However, it is not clear how large the sample size needs
to be, or how to collect the data so as not to introduce bias in the mixing proportions.
Besides the data collection protocol, the way NU is measured
by biologists has been identified as an issue in this research. Nitrogen uptake is a
derived variable calculated from GY (Eq. 1.1). This induces a spurious correlation
between the two variables, as is clearly shown in Table 6 (Chapter 4), where the estimated
correlation approaches the boundary of the parameter space.
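The mechanism can be sketched with a small simulation. The Python code below is only illustrative: since Eq. 1.1 is not reproduced here, it assumes the simplified form NU = (N concentration) x GY, ignores any straw contribution, and uses hypothetical values. Even when the N concentration is generated independently of GY, the derived NU is strongly correlated with GY.

```python
import math
import random

def pearson(u, v):
    """Sample Pearson correlation coefficient."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

rng = random.Random(2014)
n_plots = 500

# Hypothetical grain yield (kg/ha) and N concentration (kg N per kg grain),
# drawn independently of each other.
gy = [rng.gauss(4000.0, 800.0) for _ in range(n_plots)]
conc = [rng.gauss(0.012, 0.001) for _ in range(n_plots)]

# Derived uptake under the assumed simplified formula.
nu = [c * g for c, g in zip(conc, gy)]

print("corr(conc, GY):", pearson(conc, gy))
print("corr(NU, GY):  ", pearson(nu, gy))
```

The smaller the relative variability of the concentration, the closer the correlation between NU and GY moves to one.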
Finally, it would be interesting to further explore the topic of mixtures of mixtures
for modelling IEN, in particular the identification of valid constraints on the mixture
parameters to impose identifiability. Willse & Boik (1999) proposed to use constraints
of the type $\mu_{is} = \mu_i + \theta_s$, where $\mu_{is}$ is the mean of the s-th component of the second
level of the mixture in the i-th component of the first level of the mixture, $\mu_i$ is the
mean of the first component of the second level of the mixture in the i-th component
of the first level of the mixture, and $\theta_s$ is the deviation between the means of the
components of the second level of the mixture. Even with these constraints, we may
expect that different starting values $\theta_s^{(0)}$ for initiating the EM algorithm will result
in different components of the second level of the mixture modelling the non-Gaussian
pdfs of the first level of the mixture. Alternatively, I suggest the use of mixtures of
skew-normal or skew-t distributions (Fruhwirth-Schnatter & Pyne, 2010) to model
the distribution of the ratio. This is another topic that would be interesting to study
in the future.
Appendix A
List of the studies in the review
C. Adhikari, K.F. Bronson, G.M. Panuallah, A.P. Regmi, P.K. Saha, A. Dobermann,
D. C. Olk, P.R. Hobbs, and E. Pasuquin. On-farm soil N supply and N nutrition in
the rice-wheat system of Nepal and Bangladesh. Field Crops Research, 64:273–286,
1999.
R. Albrizio, M. Todorovic, T. Matic, and A. M. Stellacci. Comparing the interactive
effects of water and nitrogen on durum wheat and barley grown in a Mediterranean
environment. Field Crops Research, 115:179–190, 2010.
W.K. Anderson and F.C. Hoyle. Nitrogen efficiency of wheat cultivars in a Mediterranean
environment. Australian Journal of Experimental Agriculture, 39:957–965,
1999.
M.S. Aulakh, T.S. Khera, J.W. Doran, Kuldip-Singh, and Bijay-Singh. Yields and
nitrogen dynamics in a rice-wheat system using green manure and inorganic fertilizer.
Soil Science Society of America Journal, 64:1867–1876, 2000.
P. Belder, B.A.M. Bouman, R. Cabangon, L. Guoan, E.J.P. Quilang, L. Yuanhua,
J.H.J. Spiertz, and T.P. Tuong. Effect of water-saving irrigation on rice yield and
water use in typical lowland conditions in Asia. Agricultural Water Management,
65:193–210, 2004.
P. Belder, B.A.M. Bouman, J.H.J. Spiertz, S. Peng, A.R. Castaneda, and R.M. Vis-
peras. Crop performance, nitrogen and water use in flooded and aerobic rice. Plant
and Soil, 273:167–182, 2005.
Bijay-Singh, K.F. Bronson, Yadvinder-Singh, T.S. Khera, and E. Pasuquin. Nitrogen-15
balance as affected by rice straw management in a rice-wheat rotation in northwest
India. Nutrient Cycling in Agroecosystems, 59:227–237, 2001.
Bijay-Singh, Varinderpal-Singh, Yadvinder-Singh, H.S. Thind, A. Kumar, R.K. Gupta,
A. Kaul, and M. Vashistha. Fixed-time adjustable dose site-specific fertilizer nitrogen
management in transplanted irrigated rice (Oryza sativa L.) in South Asia. Field
Crops Research, 126:63–69, 2012.
A.K. Borrell, A.L. Garside, S. Fukai, and D.J. Reid. Season, nitrogen rate, and plant
type affect nitrogen uptake and nitrogen use efficiency in rice. Australian Journal of
Agricultural Research, 49:829–843, 1998.
R.J. Cabangon, T. P. Tuong, E. G. Castillo, L. X. Bao, G. Lu, G. Wang, Y. Cui, B.A.M.
Bouman, Y. Li, C. Chen, and J. Wang. Effect of irrigation method and N-fertilizer
management on rice yield, water productivity and nutrient-use efficiencies in typical
lowland rice conditions in China. Paddy and Water Environment, 2:195–206, 2004.
K.G. Cassman, M.J. Kropff, J. Gaunt, and S. Peng. Nitrogen use efficiency of rice
reconsidered: What are the key constraints? Plant and Soil, 155-156:359–362, 1993.
K.G. Cassman, A. Dobermann, P.C. Sta Cruz, G.C. Gines, M.I. Samson, J.P. Descal-
sota, J.M. Alcantara, M.A. Dizon, and D.C. Olk. Soil organic matter and the in-
digenous nitrogen supply of intensive irrigated rice systems in the tropics. Plant and
Soil, 182:267–278, 1996a.
K.G. Cassman, G.C. Gines, M.A. Dizon, M.I. Samson, and J.M. Alcantara. Nitrogen-
use efficiency in tropical lowland rice systems: contributions from indigenous and
applied nitrogen. Field Crops Research, 47:1–12, 1996b.
D. Chakraborty, R.N. Garg, R.K. Tomar, R. Singh, S.K. Sharma, R.K. Singh, S.M.
Trivedi, R.B. Mittal, P.K. Sharma, and K.H. Kamble. Synthetic and organic
mulching and nitrogen effect on winter wheat (Triticum aestivum L.) in a semi-arid
environment. Agricultural Water Management, 97:738–748, 2010.
X. Chen, J. Zhou, X. Wang, A.M. Blackmer, and F. Zhang. Optimal rates of nitrogen
fertilization for a winter wheat-corn cropping system in northern China. Communi-
cations in Soil Science and Plant Analysis, 35:583–597, 2004.
L. Chuan, P. He, J. Jin, S. Li, C. Grant, X. Xu, S. Qiu, S. Zhao, and W. Zhou.
Estimating nutrient uptake requirements for wheat in China. Field Crops Research,
146:96–104, 2013.
J.M. Clarke, C.A. Campbell, H.W. Cutforth, R.M. DePauw, and G.E. Winkleman.
Nitrogen and phosphorus uptake, translocation, and utilization efficiency of wheat
in relation to environment and cultivar yield and protein levels. Canadian Journal
of Plant Science, 70:965–977, 1990.
M.K. Conyers, C. Tang, G.J. Poile, D.L. Liu, D. Chen, and Z. Nuruzzaman. A combina-
tion of biological activity and the nitrate form of nitrogen can be used to ameliorate
subsurface soil acidity under dryland wheat farming. Plant and Soil, 348:155–166,
2011.
C. M. Cossani, C. Thabet, H. J. Mellouli, and G.A. Slafer. Improving wheat yields
through N fertilization in Mediterranean Tunisia. Experimental Agriculture, 47:459–
475, 2011.
C.M. Cossani, G.A. Slafer, and R. Savin. Nitrogen and water use efficiencies of wheat
and barley under a Mediterranean environment in Catalonia. Field Crops Research,
128:109–118, 2012.
Z. Cui, F. Zhang, X. Chen, Y. Miao, J. Li, L. Shi, J. Xu, Y. Ye, C. Liu, Z. Yang,
Q. Zhang, S. Huang, and D. Bao. On-farm evaluation of an in-season nitrogen
management strategy based on soil Nmin test. Field Crops Research, 105:48–55,
2008.
Z. Cui, F. Zhang, X. Chen, F. Li, and Y. Tong. Using in-season nitrogen manage-
ment and wheat cultivars to improve nitrogen use efficiency. Soil Science Society of
America Journal, 75:976–983, 2011.
X.Q. Dai, H.Y. Zhang, J.H.J. Spiertz, J. Yu, G.H. Xie, and B.A.M. Bouman. Crop
response of aerobic rice and winter wheat to nitrogen, phosphorus and potassium in
a double cropping system. Nutrient Cycling in Agroecosystems, 86:301–315, 2010.
D.K. Das, D. Maiti, and H. Pathak. Site-specific nutrient management in rice in
Eastern India using a modeling approach. Nutrient Cycling in Agroecosystems, 83:
85–94, 2009.
S.K. De Datta, R.J. Buresh, M.I. Samson, and Kai-Rong Wang. Nitrogen use efficiency
and nitrogen-15 balances in broadcast-seeded flooded and transplanted rice. Soil
Science Society of America Journal, 52:849–855, 1988.
G. Delogu, L. Cattivelli, N. Pecchioni, D. De Falcis, T. Maggiore, and A.M. Stanca.
Uptake and agronomic efficiency of nitrogen in winter barley and winter wheat.
European Journal of Agronomy, 9:11–20, 1998.
A. Dobermann, C. Witt, D. Dawe, S. Abdulrachman, H.C. Gines, R. Nagarajan, S. Sa-
tawathananont, T.T. Son, P.S. Tan, G.H. Wang, N.V. Chien, V.T.K. Thoa, C.V.
Phung, P. Stalin, P. Muthukrishnan, V. Ravi, M. Babu, S. Chatuporn, J. Sook-
thongsa, Q. Sun, R. Fu, G.C. Simbahan, and M.A.A. Adviento. Site-specific nutrient
management for intensive rice cropping systems in Asia. Field Crops Research, 74:
37–66, 2002.
A. Dobermann, C. Witt, S. Abdulrachman, H.C. Gines, R. Nagarajan, T.T. Son, P.S.
Tan, G.H. Wang, N.V. Chien, V.T.K. Thoa, C.V. Phung, P. Stalin, P. Muthukrish-
nan, V. Ravi, M. Babu, G.C. Simbahan, and M.A.A. Adviento. Soil fertility and
indigenous nutrient supply in irrigated rice domains of Asia. Agronomy Journal, 95:
913–923, 2003.
A.D. Doyle and I.C.R. Holford. The uptake of nitrogen by wheat, its agronomic efficiency
and their relationship to soil and fertilizer nitrogen. Australian Journal
of Agricultural Research, 44:1245–1258, 1993.
A.D. Doyle and C.C. Leckie. Recovery of fertiliser nitrogen in wheat grain and its im-
plications for economic fertiliser use. Australian Journal of Experimental Agriculture,
32:383–387, 1992.
Y. Duan, M. Xu, X. He, S. Li, and X. Sun. Long-term pig manure application reduces
the requirement of chemical phosphorus and potassium in two rice-wheat sites in
subtropical China. Soil Use and Management, 27:427–436, 2011.
Q. Fang, Q. Yu, E. Wang, Y. Chen, G. Zhang, J. Wang, and L. Li. Soil nitrate accumu-
lation, leaching and crop nitrogen use as influenced by fertilization and irrigation in
an intensive wheat–maize double cropping system in the North China Plain. Plant
and Soil, pages 335–350, 2006.
R.A. Fischer. Irrigated spring wheat and timing and amount of nitrogen fertilizer. II.
physiology of grain yield response. Field Crops Research, 33:57–80, 1993.
R.A. Fischer, G.N. Howe, and Z. Ibrahim. Irrigated spring wheat and timing and
amount of nitrogen fertilizer. I. grain yield and protein content. Field Crops Research,
33:37–56, 1993.
L.E. Gauer, C.A. Grant, D.T. Gehl, and L.D. Bailey. Effects of nitrogen fertilization
on grain protein content, nitrogen uptake, and nitrogen use efficiency of six spring
wheat (Triticum aestivum L.) cultivars, in relation to estimated moisture supply.
Canadian Journal of Plant Science, 72:235–241, 1992.
B.B. Ghaley, H. Høgh-Jensen, and J.L. Christiansen. Recovery of nitrogen fertilizer by
traditional and improved rice cultivars in the Bhutan Highlands. Plant and Soil,
332:233–246, 2010.
D. Giambalvo, P. Ruisi, G. Di Miceli, A. S. Frenda, and G. Amato. Nitrogen use
efficiency and nitrogen fertilizer recovery of durum wheat genotypes as affected by
interspecific competition. Agronomy Journal, 102:707–715, 2010.
G. Guarda, S. Padovan, and G. Delogu. Grain yield, nitrogen-use efficiency and baking
quality of old and modern Italian bread-wheat cultivars grown at different nitrogen
levels. European Journal of Agronomy, 21:181–192, 2004.
S.M. Haefele, M.C.S. Wopereis, M.K. Ndiaye, S.E. Barro, and M. Ould Isselmou. In-
ternal nutrient efficiencies, fertilizer recovery rates and indigenous nutrient supply
of irrigated lowland rice in Sahelian West Africa. Field Crops Research, 80:19–32,
2003.
S.M. Haefele, S.M.A. Jabbar, J.D.L.C. Siopongco, A. Tirol-Padre, S.T. Amarante, P.C.
Sta Cruz, and W.C. Cosico. Nitrogen use efficiency in selected rice (Oryza sativa L.)
genotypes under different water regimes and nitrogen levels. Field Crops Research,
107:137–146, 2008.
T. Horie, M. Ohnishi, J.F. Angus, L.G. Lewin, T. Tsukaguchi, and T. Matano. Physi-
ological characteristics of high-yielding rice inferred from cross-location experiments.
Field Crops Research, 52:55–67, 1997.
M.F. Hossain, S.K. White, S.F. Elahi, N. Sultana, M.H.K. Choudhury, Q.K. Alam, J.A.
Rother, and J.L. Gaunt. The efficiency of nitrogen fertiliser for rice in Bangladeshi
farmers' fields. Field Crops Research, 93:94–107, 2005.
J. Huang, F. He, K. Cui, R. J. Buresh, B. Xu, W. Gong, and S. Peng. Determination
of optimal nitrogen rate for rice varieties using a chlorophyll meter. Field Crops
Research, 105:70–80, 2008.
L. Jiang, D. Dong, X. Gan, and S. Wei. Photosynthetic efficiency and nitrogen dis-
tribution under different nitrogen management and relationship with physiological
N-use efficiency in three rice genotypes. Plant and Soil, 271:321–328, 2005.
Q. Jing, B.A.M. Bouman, H. Hengsdijk, H. Van Keulen, and W. Cao. Exploring
options to combine high yields with high nitrogen use efficiencies in irrigated rice in
China. European Journal of Agronomy, 26:166–177, 2007.
Q. Jing, B. Bouman, H. van Keulen, H. Hengsdijk, W. Cao, and T. Dai. Disentangling
the effect of environmental factors on yield and nitrogen uptake of irrigated rice in
Asia. Agricultural Systems, 98:177–188, 2008.
Q. Jing, H. Van Keulen, H. Hengsdijk, W. Cao, P.S. Bindraban, T. Dai, and D. Jiang.
Quantifying N response and N use efficiency in rice–wheat (RW) cropping systems
under different water management. The Journal of Agricultural Science, 147:303–
312, 2009.
N. Kalra, D. Chakraborty, P. Ramesh Kumar, M. Jolly, and P.K. Sharma. An approach
to bridging yield gaps, combining response to water and other resource inputs for
wheat in northern India, using research trials and farmers’ fields data. Agricultural
Water Management, 93:54–64, 2007.
C.S. Khind and F.N. Ponnamperuma. Effects of water regime on growth, yield, and
nitrogen uptake of rice. Plant and Soil, 59:287–298, 1981.
H.S. Khurana, S.B. Phillips, Bijay-Singh, M.M. Alley, A. Dobermann, A.S. Sidhu,
Yadvinder-Singh, and S. Peng. Agronomic and economic evaluation of site-specific
nutrient management for irrigated wheat in northwest India. Nutrient Cycling in
Agroecosystems, 82:15–31, 2008.
D. Kumar, C. Devakumar, R. Kumar, A. Das, P. Panneerselvam, and Y.S. Shivay.
Effect of neem-oil coated prilled urea with varying thickness of neem-oil coating and
nitrogen rates on productivity and nitrogen-use efficiency of lowland irrigated rice
under Indo-Gangetic Plains. Journal of Plant Nutrition, 33:1939–1959, 2010.
K. Kumar and K.M. Goh. Management practices of antecedent leguminous and non-
leguminous crop residues in relation to winter wheat yields, nitrogen uptake, soil
nitrogen mineralization and simple nitrogen balance. European Journal of Agronomy,
16:295–308, 2002.
J.K. Ladha, D. Dawe, T.S. Ventura, U. Singh, W. Ventura, and I. Watanabe. Long-
term effects of urea and green manure on rice yields and nitrogen balance. Soil
Science Society of America Journal, 64:1993–2001, 2000.
J. Le Gouis, D. Beghin, E. Heumez, and P. Pluchard. Genetic differences for nitrogen
uptake and nitrogen utilisation efficiencies in winter wheat. European Journal of
Agronomy, 12:163–173, 2000.
J. Liu, H. Liu, S. Huang, X. Yang, B. Wang, X. Li, and Y. Ma. Nitrogen efficiency
in long-term wheat-maize cropping systems under diverse field sites in China. Field
Crops Research, 118:145–151, 2010.
M. Liu, Z. Yu, Y. Liu, and N.T. Konijn. Fertilizer requirements for wheat and maize
in China: The QUEFTS approach. Nutrient Cycling in Agroecosystems, 74:245–258,
2006.
X. Liu, P. He, J. Jin, W. Zhou, G. Sulewski, and S. Phillips. Yield gaps, indigenous
nutrient supply, and nutrient use efficiency of wheat in China. Agronomy Journal,
103:1452–1463, 2011.
L. Lopez-Bellido, R.J. Lopez-Bellido, and F.J. Lopez-Bellido. Fertilizer nitrogen efficiency
in durum wheat under rainfed Mediterranean conditions: Effect of split
application. Agronomy Journal, 98:55–62, 2006.
A. J. Macdonald and R.J. Gutteridge. Effects of take-all (Gaeumannomyces graminis
var. tritici) on crop N uptake and residual mineral N in soil at harvest of winter
wheat. Plant and Soil, 350:253–260, 2012.
D. Maiti, D.K. Das, and H. Pathak. Fertilizer requirement for irrigated wheat in
eastern India using the QUEFTS simulation model. TheScientificWorldJOURNAL,
6:231–245, 2006.
A.M. McGuire, D.C. Bryant, and R.F. Denison. Wheat yields, nitrogen uptake, and
soil moisture following winter legume cover crop vs. fallow. Agronomy Journal, 90:
404–410, 1998.
K. Naklang, D. Harnpichitvitaya, S.T. Amarante, L.J. Wade, and S.M. Haefele. In-
ternal efficiency, nutrient uptake, and the relation to field water resources in rainfed
lowland rice of northeast Thailand. Plant and Soil, 286:193–208, 2006.
C. Noulas, I. Alexiou, J. M. Herrera, and P. Stamp. Course of dry matter and nitrogen
accumulation of spring wheat genotypes known to vary in parameters of nitrogen
use efficiency. Journal of Plant Nutrition, 36:1201–1218, 2013.
S.E. Ockerby, S.W. Adkins, and A.L. Garside. The uptake and use of nitrogen by
paddy rice in fallow, cereal, and legume cropping systems. Australian Journal of
Agricultural Research, 50:945–952, 1999.
D.C. Olk, K.G. Cassman, G. Simbahan, P.C. Sta. Cruz, S. Abdulrachman, R. Na-
garajan, P.S. Tan, and S. Satawathananont. Interpreting fertilizer-use efficiency in
relation to soil nutrient-supplying capacity, factor productivity, and agronomic effi-
ciency. Nutrient Cycling in Agroecosystems, 53:35–41, 1999.
R. Ortiz-Monasterio, K.D. Sayre, S. Rajaram, and M. McMahon. Genetic progress in
wheat yield and nitrogen use efficiency under four nitrogen rates. Crop Science, 37:
898–904, 1997.
H. Pathak, P.K. Aggarwal, R. Roetter, N. Kalra, S.K. Bandyopadhaya, S. Prasad,
and H. Van Keulen. Modelling the quantitative evaluation of soil nutrient supply,
nutrient use efficiency, and fertilizer requirements of wheat in India. Nutrient Cycling
in Agroecosystems, 65:105–113, 2003.
S.K. Patil, U. Singh, V.P. Singh, V.N. Mishra, R.O. Das, and J. Henao. Nitrogen
dynamics and crop growth on an Alfisol and a Vertisol under a direct-seeded rainfed
lowland rice-based system. Field Crops Research, 70:185–199, 2001.
S. Peng, F.V. Garcia, R.C. Laza, A.L. Sanico, R.M. Visperas, and K.G. Cassman.
Increased N-use efficiency using a chlorophyll meter on high-yielding irrigated rice.
Field Crops Research, 47:243–252, 1996.
S. Peng, R. J. Buresh, J. Huang, J. Yang, Y. Zou, X. Zhong, G. Wang, and F. Zhang.
Strategies for overcoming low agronomic nitrogen use efficiency in irrigated rice sys-
tems in China. Field Crops Research, 96:37–47, 2006.
V. Pooniya and Y. S. Shivay. Enrichment of basmati rice grain and straw with zinc and
nitrogen through ferti-fortification and summer green manuring under Indo-Gangetic
plains of India. Journal of Plant Nutrition, 36:91–117, 2013.
V. Pooniya, Y. S. Shivay, A. Rana, L. Nain, and R. Prasanna. Enhancing soil nutrient
dynamics and productivity of Basmati rice through residue incorporation and zinc
fertilization. European Journal of Agronomy, 41:28–37, 2012.
J. Qiao, L. Yang, T. Yan, F. Xue, and D. Zhao. Nitrogen fertilizer reduction in rice
production for two consecutive years in the Taihu Lake area. Agriculture, Ecosystems
& Environment, 146:103–112, 2012.
J. Qin, S.M. Impa, Q. Tang, S. Yang, J. Yang, Y. Tao, and K.S.V. Jagadish. Integrated
nutrient, water and other agronomic options to enhance rice grain yield and N use
efficiency in double-season rice crop. Field Crops Research, 148:15–23, 2013.
S. Qiu, X. Ju, X. Lu, L. Li, J. Ingwersen, T. Streck, P. Christie, and F. Zhang. Improved
nitrogen management for an intensive winter wheat/summer maize double-cropping
system. Soil Science Society of America Journal, 76:286–297, 2012.
V.O. Sadras and C. Lawson. Nitrogen and water-use efficiency of Australian wheat
varieties released between 1958 and 2007. European Journal of Agronomy, 46:34–41,
2013.
R. Setia, K.N. Sharma, P. Marschner, and H. Singh. Changes in nitrogen, phosphorus,
and potassium in a long-term continuous maize-wheat cropping system in India.
Communications in Soil Science and Plant Analysis, 40:3348–3366, 2009.
A.R. Sharma and U.K. Behera. Nitrogen contribution through Sesbania green manure
and dual-purpose legumes in maize–wheat cropping system: agronomic and economic
considerations. Plant and Soil, 325:289–304, 2009.
Z. Shi, Q. Jing, J. Cai, D. Jiang, W. Cao, and T. Dai. The fates of 15N fertilizer
in relation to root distributions of winter wheat under different N splits. European
Journal of Agronomy, 40:86–93, 2012a.
Z. Shi, D. Li, Q. Jing, J. Cai, D. Jiang, W. Cao, and T. Dai. Effects of nitrogen
applications on soil nitrogen balance and nitrogen utilization of winter wheat in a
rice-wheat rotation. Field Crops Research, 127:241–247, 2012b.
U. Singh, J.K. Ladha, E.G. Castillo, G. Punzalan, A. Tirol-Padre, and M. Duqueza.
Genotypic variation in nitrogen use efficiency in medium-and long-duration rice.
Field Crops Research, 58:35–53, 1998.
V.K. Singh and B.S. Dwivedi. Yield and nitrogen use efficiency in wheat, and soil
fertility status as influenced by substitution of rice with pigeon pea in a rice–wheat
cropping system. Australian Journal of Experimental Agriculture, 46:1185–1194,
2006.
E.M.A. Smaling and B.H. Janssen. Calibration of QUEFTS, a model predicting nu-
trient uptake and yields from chemical soil fertility indices. Geoderma, 59:21–44,
1993.
P. Suriyakup, A. Polthanee, K. Pannangpetch, R. Katawatin, J.C. Mouret, and
C. Clermont-Dauphin. Introducing mungbean as a preceding crop to enhance ni-
trogen uptake and yield of rainfed rice in the north-east of Thailand. Australian
Journal of Agricultural Research, 58:1059–1067, 2007.
S. Takahashi, M. R. Anwar, and S. G de Vera. Effects of compost and nitrogen fertilizer
on wheat nitrogen use in Japanese soils. Agronomy Journal, 99:1151–1157, 2007.
C. Tetard-Jones, P. N. Shotton, L. Rempelos, J. Cooper, M. Eyre, C. H. Orr, C. Leifert,
and A.M.R. Gatehouse. Quantitative proteomics to study the response of wheat to
contrasting fertilisation regimes. Molecular Breeding, 31:379–393, 2013.
J. Timsina, U. Singh, M. Badaruddin, C. Meisner, and M.R. Amin. Cultivar, nitrogen,
and water effects on productivity, and nitrogen-use efficiency and balance for rice–
wheat sequences of Bangladesh. Field Crops Research, 72:143–161, 2001.
G. Wang, A. Dobermann, C. Witt, Q. Sun, and R. Fu. Performance of site-specific
nutrient management for irrigated rice in southeast China. Agronomy Journal, 93:
869–878, 2001.
G. Wang, Q.C. Zhang, C. Witt, and R.J. Buresh. Opportunities for yield increases and
environmental benefits through site-specific nutrient management in rice systems of
Zhejiang province, China. Agricultural Systems, 94:801–806, 2007.
Y. Wang, E. Wang, D. Wang, S. Huang, Y. Ma, C. J. Smith, and L. Wang. Crop
productivity and nutrient use efficiency as affected by long-term fertilisation in North
China Plain. Nutrient Cycling in Agroecosystems, 86:105–119, 2010.
D. Wei, K. Cui, J. Pan, G. Ye, J. Xiang, L. Nie, and J. Huang. Genetic dissection
of grain nitrogen use efficiency and grain yield and their relationship in rice. Field
Crops Research, 124:340–346, 2011.
C. Witt, A. Dobermann, S. Abdulrachman, H.C. Gines, W. Guanghuo, R. Nagarajan,
S. Satawatananont, T. Thuc Son, P. Sy Tan, L. Van Tiem, and D. C. Olk. Internal
nutrient efficiencies of irrigated lowland rice in tropical and subtropical Asia. Field
Crops Research, 63:113–138, 1999.
C. Witt, K.G. Cassman, D.C. Olk, U. Biker, S.P. Liboon, M.I. Samson, and J.C.G.
Ottow. Crop rotation and residue management effects on carbon sequestration,
nitrogen cycling and productivity of irrigated rice systems. Plant and Soil, 225:
263–278, 2000.
Y. Xu, L. Nie, R.J. Buresh, J. Huang, K. Cui, B. Xu, W. Gong, and S. Peng. Agro-
nomic performance of late-season rice under different tillage, straw, and nitrogen
management. Field Crops Research, 115:79–84, 2010.
C.Y. Xue, X.G. Yang, B.A.M. Bouman, W. Deng, Q.P. Zhang, J. Yang, W.X. Yan,
T.Y. Zhang, A.J. Rouzi, H.Q. Wang, and P. Wang. Effects of irrigation and nitrogen
on the performance of aerobic rice in northern China. Journal of Integrative Plant
Biology, 50:1589–1600, 2008.
Y. Yang, M. Zhang, L. Zheng, D. Cheng, M. Liu, Y. Geng, and J. Chen. Controlled-
release urea for rice production and its environmental implications. Journal of Plant
Nutrition, 36:781–794, 2013.
Yadvinder-Singh, J.K. Ladha, C.S. Khind, R.K. Gupta, O.P. Meelu, and E. Pasuquin.
Long-term effects of organic inputs on yield and soil fertility in the rice–wheat rota-
tion. Soil Science Society of America Journal, 68:845–853, 2004.
Y. Ye, X. Liang, Y. Chen, J. Liu, J. Gu, R. Guo, and L. Li. Alternate wetting and
drying irrigation and controlled-release nitrogen fertilizer in late-season rice. Effects
on dry matter accumulation, yield, water and nitrogen use. Field Crops Research,
144:212–224, 2013.
J. Ying, S. Peng, G. Yang, N. Zhou, R.M. Visperas, and K.G. Cassman. Comparison
of high-yield rice in tropical and subtropical environments II. Nitrogen accumulation
and utilization efficiency. Field Crops Research, 57:85–93, 1998.
L. Zhang, J.H.J. Spiertz, S. Zhang, B. Li, and W. Van der Werf. Nitrogen economy in
relay intercropping systems of wheat and cotton. Plant and Soil, 303:55–68, 2008.
Appendix B
Journal information of the studies in the review
Table B.1: Journals and their impact factor for studies in the review

| Name | Category | 5-year impact factor | Number of manuscripts selected |
|---|---|---|---|
| Agricultural Systems | Agriculture Multidisciplinary | 2.837 | 2 |
| Agricultural Water Management | Agronomy | 2.552 | 3 |
| Agriculture Ecosystems and Environment | Agriculture Multidisciplinary | 3.673 | 1 |
| Agronomy Journal | Agronomy | 1.989 | 7 |
| Animal Production Science (before Australian Journal of Experimental Agriculture) | Agriculture Multidisciplinary | 1.228 | 3 |
| Canadian Journal of Plant Science | Agronomy | 0.764 | 2 |
| Communications in Soil Science and Plant Analysis | Agronomy | 0.612 | 2 |
| Crop and Pasture Science (before Australian Journal of Agricultural Research) | Agriculture Multidisciplinary | 1.439 | 4 |
| Crop Science | Agronomy | 2.096 | 1 |
| European Journal of Agronomy | Agronomy | 3.311 | 8 |
| Field Crops Research | Agronomy | 2.984 | 28 |
| Geoderma | Soil Science | 2.904 | 1 |
| Journal of Agricultural Science | Agriculture Multidisciplinary | 2.604 | 1 |
| Journal of Integrative Plant Biology | Biochemistry and Molecular | 2.429 | 1 |
| Journal of Plant Nutrition | Plant Science | 0.851 | 4 |
| Molecular Breeding | Agronomy | 3.304 | 1 |
| Nutrient Cycling in Agroecosystems | Soil Science | 1.966 | 8 |
| Paddy Water and Environment | Agronomy | 0.889 | 1 |
| Plant and Soil | Agronomy | 3.108 | 13 |
| Soil Science Society of America Journal | Soil Science | 2.232 | 6 |
| Soil Use and Management | Soil Science | 2.219 | 1 |
| Scientific World Journal | Multidisciplinary Science | 1.603 | 1 |
Appendix C
Equivalence between the Pham-Gia et al. (2006) and
Marsaglia (1965, 2006) expressions of the pdf of the
ratio
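In this appendix we demonstrate that the expression of the pdf of $T = \frac{a+U}{b+V}$ given in
Pham-Gia et al. (2006), where U and V are two independent standard normal variables
and a and b are defined as in Eq 2.5, is equivalent to the one in (Marsaglia, 1965, 2006).
Firstly, let us calculate the integral:
$$H_{-2}(z) = \int_0^\infty t\,e^{-t^2-2tz}\,dt$$
Notice that:
$$-1 = \int_0^\infty (-2t-2z)\,e^{-t^2-2tz}\,dt = -2H_{-2}(z) - 2z\int_0^\infty e^{-t^2-2tz}\,dt$$
If we define:
$$A = \int_0^\infty e^{-t^2-2tz}\,dt = \int_0^\infty e^{-t^2-2tz+z^2-z^2}\,dt = e^{z^2}\int_0^\infty e^{-(t+z)^2}\,dt$$
By performing the following change of variable, we obtain:
$$t + z = \frac{r}{\sqrt{2}} \;\Rightarrow\; dt = \frac{dr}{\sqrt{2}}$$
$$A = \frac{e^{z^2}}{\sqrt{2}}\int_{\sqrt{2}z}^\infty e^{-r^2/2}\,dr = \frac{e^{z^2}}{\sqrt{2}}\sqrt{2\pi}\int_{\sqrt{2}z}^\infty \frac{1}{\sqrt{2\pi}}e^{-r^2/2}\,dr = e^{z^2}\sqrt{\pi}\left(1-\phi(\sqrt{2}z)\right) \tag{C.1}$$
where $\phi$ denotes the cumulative distribution function of the standard normal distribution.
Therefore, $H_{-2}(z) = \frac{1}{2} - z e^{z^2}\sqrt{\pi}\left(1-\phi(\sqrt{2}z)\right)$. If we substitute the latter expression
in Eq. 2.8 with σx = σy = 1, µx = b and µy = a, we arrive at:
$$f_T(t) = \frac{e^{-(a^2+b^2)/2}}{\pi(1+t^2)}\left[H_{-2}(s(t)) + H_{-2}(-s(t))\right]$$
where:
$$s(t) = \frac{at+b}{\sqrt{2(t^2+1)}} = \frac{q}{\sqrt{2}}, \text{ with } q \text{ defined as in Eq. 2.6}$$
Then,
$$f_T(t) = \frac{e^{-(a^2+b^2)/2}}{\pi(1+t^2)}\left[1 + \frac{q}{\sqrt{2}}e^{q^2/2}\sqrt{\pi}\left(-1+\phi(q)+1-\phi(-q)\right)\right]$$
$$= \frac{e^{-(a^2+b^2)/2}}{\pi(1+t^2)}\left[1 + \frac{q}{\sqrt{2}}e^{q^2/2}\sqrt{\pi}\int_{-q}^{q}\frac{e^{-r^2/2}}{\sqrt{2\pi}}\,dr\right] = \frac{e^{-(a^2+b^2)/2}}{\pi(1+t^2)}\left[1 + \frac{q}{2}e^{q^2/2}\,2\int_0^q e^{-r^2/2}\,dr\right]$$
$$= \frac{e^{-(a^2+b^2)/2}}{\pi(1+t^2)}\left[1 + q e^{q^2/2}\int_0^q e^{-r^2/2}\,dr\right] = \frac{e^{-(a^2+b^2)/2}}{\pi(1+t^2)} + q\,\frac{e^{-(a^2+b^2)/2}}{\pi(1+t^2)}\int_0^q e^{-(r^2-q^2)/2}\,dr$$
$$= \left(e^{-(a^2+b^2)/2}\right)\frac{1}{\pi(1+t^2)} + \left(1-e^{-(a^2+b^2)/2}\right)\frac{q\int_0^q e^{-(r^2-q^2)/2}\,dr}{\pi(1+t^2)\left(e^{(a^2+b^2)/2}-1\right)}$$
Here t is the argument of the pdf and r is the integration variable. The last equality is obtained by noticing that
$$e^{-(a^2+b^2)/2} = \frac{1-e^{-(a^2+b^2)/2}}{e^{(a^2+b^2)/2}-1}$$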
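The equivalence can also be checked numerically. The following R sketch is an illustration only and is not part of the derivation: the function names (H, f_H, f_marsaglia) and the values chosen for a, b and z0 are assumptions made for the example, and the cdf ϕ is evaluated with pnorm.

```r
# Closed form of H_{-2}(z) obtained in this appendix; pnorm is the standard normal cdf.
H <- function(z) 0.5 - z * exp(z^2) * sqrt(pi) * (1 - pnorm(sqrt(2) * z))

# pdf of the ratio written with H_{-2}, i.e. the form reached after the substitution.
f_H <- function(t, a, b) {
  s <- (a * t + b) / sqrt(2 * (t^2 + 1))
  exp(-(a^2 + b^2) / 2) / (pi * (1 + t^2)) * (H(s) + H(-s))
}

# pdf of the ratio written with q and the integral of exp(-r^2/2) over [0, q].
f_marsaglia <- function(t, a, b) {
  q <- (a * t + b) / sqrt(t^2 + 1)
  int0q <- sqrt(2 * pi) * (pnorm(q) - 0.5)  # integral of exp(-r^2/2) on [0, q]
  exp(-(a^2 + b^2) / 2) / (pi * (1 + t^2)) * (1 + q * exp(q^2 / 2) * int0q)
}

a <- 1; b <- 2                       # illustrative values only
t_grid <- seq(-10, 10, by = 0.25)
max_diff <- max(abs(f_H(t_grid, a, b) - f_marsaglia(t_grid, a, b)))

# Direct numerical evaluation of the integral that defines H_{-2}(z).
z0 <- 0.7
H_numeric <- integrate(function(t) t * exp(-t^2 - 2 * t * z0), 0, Inf)$value
```

If the two expressions are equivalent, max_diff should be at the level of floating-point error and H_numeric should agree with H(z0).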
Appendix D
Application of the EM algorithm for estimating the
parameters of a mixture of multivariate Gaussian
distributions
In this appendix we detail the application of the EM algorithm to estimate the parameter
vector of a mixture of multivariate Gaussian distributions. Following the approach of
Dempster et al. (1977), the first step is to compute Lc(z,y,ψ).
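The log likelihood function of the complete random sample is given by (McLachlan &
Peel, 2007 cited in Figueiredo & Jain, 2002):
$$L_c(\mathbf{z},\mathbf{y},\boldsymbol{\psi}) = \sum_{j=1}^{n}\sum_{i=1}^{g} z_{ij}\ln\big(\pi_i\,\varphi(\mathbf{y}_j|\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\big) = \sum_{j=1}^{n}\sum_{i=1}^{g} z_{ij}\big(\ln(\pi_i) + \ln\varphi(\mathbf{y}_j|\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\big)$$
where $\varphi$ is the pdf of a MVN.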
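The two steps of the E-M algorithm are:
1. Compute $E(L_c(\mathbf{z},\mathbf{y},\boldsymbol{\psi})\,|\,\mathbf{y},\boldsymbol{\psi}^{(r)})$, where $\boldsymbol{\psi}^{(r)} = (\pi_1^{(r)},\ldots,\pi_{g-1}^{(r)},\boldsymbol{\mu}_1^{(r)},\ldots,\boldsymbol{\mu}_g^{(r)},\boldsymbol{\Sigma}_1^{(r)},\ldots,\boldsymbol{\Sigma}_g^{(r)})$,
for a fixed g.
$$E(L_c(\mathbf{z},\mathbf{y},\boldsymbol{\psi})\,|\,\mathbf{y},\boldsymbol{\psi}^{(r)}) = \sum_{j=1}^{n}\sum_{i=1}^{g} E(z_{ij}|\mathbf{y}_j,\boldsymbol{\psi}^{(r)})\,[\ln\pi_i + \ln\varphi(\mathbf{y}_j|\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)]$$
$$E(z_{ij}|\mathbf{y}_j,\boldsymbol{\psi}^{(r)}) = 1\cdot P(z_{ij}=1|\mathbf{y}_j,\boldsymbol{\psi}^{(r)}) + 0\cdot P(z_{ij}=0|\mathbf{y}_j,\boldsymbol{\psi}^{(r)}) = P(z_{ij}=1|\mathbf{y}_j,\boldsymbol{\psi}^{(r)}) \tag{D.1}$$
By Eq. 3.2 and D.1, it follows that:
$$E(z_{ij}|\mathbf{y}_j,\boldsymbol{\psi}^{(r)}) = P(z_{ij}=1|\mathbf{y}_j,\boldsymbol{\psi}^{(r)}) = \tau_{ij}^{(r+1)} = \frac{\pi_i^{(r)}\varphi(\mathbf{y}_j|\boldsymbol{\mu}_i^{(r)},\boldsymbol{\Sigma}_i^{(r)})}{\sum_{k=1}^{g}\pi_k^{(r)}\varphi(\mathbf{y}_j|\boldsymbol{\mu}_k^{(r)},\boldsymbol{\Sigma}_k^{(r)})}$$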
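2. Find $\boldsymbol{\psi}^{(r+1)}$ that maximises $E(L_c(\mathbf{z},\mathbf{y},\boldsymbol{\psi})\,|\,\mathbf{y},\boldsymbol{\psi}^{(r)})$
$$E(L_c(\mathbf{z},\mathbf{y},\boldsymbol{\psi})\,|\,\mathbf{y},\boldsymbol{\psi}^{(r)}) = \sum_{j=1}^{n}\sum_{i=1}^{g}\left[\tau_{ij}^{(r+1)}\ln(\pi_i) + \tau_{ij}^{(r+1)}\ln\varphi(\mathbf{y}_j|\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)\right] \tag{D.2}$$
Firstly, let us find $\pi_i^{(r+1)}$, $i = 1,\ldots,g$. The second addend of Eq. D.2 does not
depend on $\boldsymbol{\pi}$, so it is enough to differentiate, with $\pi_g = 1-\pi_1-\ldots-\pi_{g-1}$:
$$h(\boldsymbol{\pi}) = \sum_{j=1}^{n}\sum_{i=1}^{g}\tau_{ij}^{(r+1)}\ln(\pi_i) \quad\text{and}\quad \frac{\partial h(\boldsymbol{\pi})}{\partial\pi_i} = \frac{\sum_{j=1}^{n}\tau_{ij}^{(r+1)}}{\pi_i} - \frac{\sum_{j=1}^{n}\tau_{gj}^{(r+1)}}{(1-\pi_1-\ldots-\pi_{g-1})}$$
Denoting $n_i = \sum_{j=1}^{n}\tau_{ij}^{(r+1)}$, we get
$$\frac{\partial h(\boldsymbol{\pi})}{\partial\pi_i} = \frac{n_i}{\pi_i} - \frac{n_g}{(1-\pi_1-\ldots-\pi_{g-1})} = \frac{n_i(1-\pi_1-\ldots-\pi_{g-1}) - n_g\pi_i}{\pi_i(1-\pi_1-\ldots-\pi_{g-1})} = 0 \tag{D.3}$$
Note that $\forall s$, $s \neq i$, by subtracting $\frac{\partial h(\boldsymbol{\pi})}{\partial\pi_i} - \frac{\partial h(\boldsymbol{\pi})}{\partial\pi_s}$, we get that
$$\pi_s = \frac{n_s\pi_i}{n_i} \quad \forall s = 1,\ldots,g,\; s\neq i$$
If we substitute $\pi_s$, $s = 1,\ldots,g$, into the expression D.3 we obtain:
$$n_i\left(1 - \frac{n_1\pi_i}{n_i} - \frac{n_2\pi_i}{n_i} - \ldots - \frac{n_{g-1}\pi_i}{n_i}\right) - n_g\pi_i = 0$$
$$n_i - \pi_i(n_1+n_2+\ldots+n_g) = 0 \tag{D.4}$$
Notice that:
$$n_1+n_2+\ldots+n_g = \sum_{j=1}^{n}\left(\tau_{1j}^{(r+1)} + \tau_{2j}^{(r+1)} + \ldots + \tau_{gj}^{(r+1)}\right) = n \tag{D.5}$$
Therefore, by Eq. D.4 and D.5, it is concluded that:
$$\pi_i^{(r+1)} = \frac{n_i}{n} = \frac{\sum_{j=1}^{n}\tau_{ij}^{(r+1)}}{n}$$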
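Taking derivatives with respect to $\boldsymbol{\mu}_i$ and $\boldsymbol{\Sigma}_i$, $\forall i = 1,\ldots,g$, in Eq. D.2:
$$\sum_{k=1}^{g}\sum_{j=1}^{n}\tau_{kj}^{(r+1)}\frac{\partial}{\partial\boldsymbol{\mu}_i}\ln\varphi(\mathbf{y}_j|\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k) = \sum_{j=1}^{n}\tau_{ij}^{(r+1)}\frac{\partial}{\partial\boldsymbol{\mu}_i}\ln\varphi(\mathbf{y}_j|\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i) = 0$$
$$\sum_{k=1}^{g}\sum_{j=1}^{n}\tau_{kj}^{(r+1)}\frac{\partial}{\partial\boldsymbol{\Sigma}_i}\ln\varphi(\mathbf{y}_j|\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k) = \sum_{j=1}^{n}\tau_{ij}^{(r+1)}\frac{\partial}{\partial\boldsymbol{\Sigma}_i}\ln\varphi(\mathbf{y}_j|\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i) = 0 \tag{D.6}$$
Note that $\frac{\partial}{\partial\boldsymbol{\mu}_i}\ln\varphi(\mathbf{y}_j|\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k) = 0$ and $\frac{\partial}{\partial\boldsymbol{\Sigma}_i}\ln\varphi(\mathbf{y}_j|\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k) = 0$ if $i \neq k$. Using some
properties detailed in (Rencher, 1998, p.416), it is straightforward to show that $\boldsymbol{\mu}_i^{(r+1)}$
and $\boldsymbol{\Sigma}_i^{(r+1)}$ satisfying D.6 exist in a closed form given by:
$$\boldsymbol{\mu}_i^{(r+1)} = \sum_{j=1}^{n}\tau_{ij}^{(r+1)}\mathbf{y}_j \Big/ \sum_{j=1}^{n}\tau_{ij}^{(r+1)}$$
$$\boldsymbol{\Sigma}_i^{(r+1)} = \sum_{j=1}^{n}\tau_{ij}^{(r+1)}(\mathbf{y}_j-\boldsymbol{\mu}_i^{(r+1)})^\top(\mathbf{y}_j-\boldsymbol{\mu}_i^{(r+1)}) \Big/ \sum_{j=1}^{n}\tau_{ij}^{(r+1)}$$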
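We firstly calculate $\boldsymbol{\mu}_i^{(r+1)}$:
$$\sum_{j=1}^{n}\tau_{ij}^{(r+1)}\frac{\partial}{\partial\boldsymbol{\mu}_i^\top}\ln\varphi(\mathbf{y}_j|\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i) = \sum_{j=1}^{n}\tau_{ij}^{(r+1)}\frac{\partial}{\partial\boldsymbol{\mu}_i^\top}\left[-p\ln(\sqrt{2\pi}) - \frac{1}{2}\ln|\boldsymbol{\Sigma}_i| - \frac{1}{2}(\mathbf{y}_j-\boldsymbol{\mu}_i)\boldsymbol{\Sigma}_i^{-1}(\mathbf{y}_j-\boldsymbol{\mu}_i)^\top\right]$$
$$= \sum_{j=1}^{n}\tau_{ij}^{(r+1)}\frac{\partial}{\partial\boldsymbol{\mu}_i^\top}\left[-\frac{1}{2}(\mathbf{y}_j-\boldsymbol{\mu}_i)\boldsymbol{\Sigma}_i^{-1}(\mathbf{y}_j-\boldsymbol{\mu}_i)^\top\right]$$
$$= \sum_{j=1}^{n}\tau_{ij}^{(r+1)}\frac{\partial}{\partial\boldsymbol{\mu}_i^\top}\left[-\frac{1}{2}\left(\mathbf{y}_j\boldsymbol{\Sigma}_i^{-1}\mathbf{y}_j^\top - \mathbf{y}_j\boldsymbol{\Sigma}_i^{-1}\boldsymbol{\mu}_i^\top - \boldsymbol{\mu}_i\boldsymbol{\Sigma}_i^{-1}\mathbf{y}_j^\top + \boldsymbol{\mu}_i\boldsymbol{\Sigma}_i^{-1}\boldsymbol{\mu}_i^\top\right)\right]$$
$$= \sum_{j=1}^{n}\tau_{ij}^{(r+1)}\frac{\partial}{\partial\boldsymbol{\mu}_i^\top}\left[-\frac{1}{2}\left(-\mathbf{y}_j\boldsymbol{\Sigma}_i^{-1}\boldsymbol{\mu}_i^\top - (\boldsymbol{\mu}_i^\top)^\top\boldsymbol{\Sigma}_i^{-1}\mathbf{y}_j^\top + (\boldsymbol{\mu}_i^\top)^\top\boldsymbol{\Sigma}_i^{-1}\boldsymbol{\mu}_i^\top\right)\right]$$
Using A.13.2 and A.13.3 in Rencher (1998), which state that $\frac{\partial(\mathbf{a}^\top\mathbf{x})}{\partial\mathbf{x}} = \frac{\partial(\mathbf{x}^\top\mathbf{a})}{\partial\mathbf{x}} = \mathbf{a}$
and $\frac{\partial(\mathbf{x}^\top\mathbf{A}\mathbf{x})}{\partial\mathbf{x}} = 2\mathbf{A}\mathbf{x}$ for symmetric $\mathbf{A}$, this equals
$$\sum_{j=1}^{n}\tau_{ij}^{(r+1)}\left(-\frac{1}{2}\right)\left[-(\mathbf{y}_j\boldsymbol{\Sigma}_i^{-1})^\top - \boldsymbol{\Sigma}_i^{-1}\mathbf{y}_j^\top + 2\boldsymbol{\Sigma}_i^{-1}\boldsymbol{\mu}_i^\top\right] = \sum_{j=1}^{n}\tau_{ij}^{(r+1)}\left[\boldsymbol{\Sigma}_i^{-1}(\mathbf{y}_j^\top - \boldsymbol{\mu}_i^\top)\right]$$
Setting this expression equal to 0 gives
$$\boldsymbol{\mu}_i = \sum_{j=1}^{n}\tau_{ij}^{(r+1)}\mathbf{y}_j \Big/ \sum_{j=1}^{n}\tau_{ij}^{(r+1)}$$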
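Now we calculate $\boldsymbol{\Sigma}_i^{(r+1)}$. Instead of taking derivatives with respect to $\boldsymbol{\Sigma}_i$, we do so
with respect to $\boldsymbol{\Sigma}_i^{-1}$, with $\boldsymbol{\mu}_i$ fixed at $\boldsymbol{\mu}_i^{(r+1)}$.
$$\sum_{j=1}^{n}\tau_{ij}^{(r+1)}\frac{\partial}{\partial\boldsymbol{\Sigma}_i^{-1}}\ln\varphi(\mathbf{y}_j|\boldsymbol{\mu}_i^{(r+1)},\boldsymbol{\Sigma}_i) = \sum_{j=1}^{n}\tau_{ij}^{(r+1)}\frac{\partial}{\partial\boldsymbol{\Sigma}_i^{-1}}\left[-p\ln\sqrt{2\pi} - \frac{1}{2}\ln|\boldsymbol{\Sigma}_i| - \frac{1}{2}(\mathbf{y}_j-\boldsymbol{\mu}_i^{(r+1)})\boldsymbol{\Sigma}_i^{-1}(\mathbf{y}_j-\boldsymbol{\mu}_i^{(r+1)})^\top\right]$$
$$= \sum_{j=1}^{n}\tau_{ij}^{(r+1)}\frac{\partial}{\partial\boldsymbol{\Sigma}_i^{-1}}\left[\frac{1}{2}\ln|\boldsymbol{\Sigma}_i^{-1}| - \frac{1}{2}(\mathbf{y}_j-\boldsymbol{\mu}_i^{(r+1)})\boldsymbol{\Sigma}_i^{-1}(\mathbf{y}_j-\boldsymbol{\mu}_i^{(r+1)})^\top\right]$$
But
$$(\mathbf{y}_j-\boldsymbol{\mu}_i^{(r+1)})\boldsymbol{\Sigma}_i^{-1}(\mathbf{y}_j-\boldsymbol{\mu}_i^{(r+1)})^\top = \mathrm{tr}\left((\mathbf{y}_j-\boldsymbol{\mu}_i^{(r+1)})\boldsymbol{\Sigma}_i^{-1}(\mathbf{y}_j-\boldsymbol{\mu}_i^{(r+1)})^\top\right)$$
and $\mathrm{tr}(\mathbf{C}\mathbf{D}) = \mathrm{tr}(\mathbf{D}\mathbf{C})$, so the derivative equals
$$\sum_{j=1}^{n}\tau_{ij}^{(r+1)}\frac{\partial}{\partial\boldsymbol{\Sigma}_i^{-1}}\left[\frac{1}{2}\ln|\boldsymbol{\Sigma}_i^{-1}| - \frac{1}{2}\mathrm{tr}\left(\boldsymbol{\Sigma}_i^{-1}(\mathbf{y}_j-\boldsymbol{\mu}_i^{(r+1)})^\top(\mathbf{y}_j-\boldsymbol{\mu}_i^{(r+1)})\right)\right]$$
Let us take $\mathbf{D}_j = (\mathbf{y}_j-\boldsymbol{\mu}_i^{(r+1)})^\top(\mathbf{y}_j-\boldsymbol{\mu}_i^{(r+1)})$.
Applying $\frac{\partial\,\mathrm{tr}(\mathbf{C}\mathbf{D})}{\partial\mathbf{C}} = \mathbf{D} + \mathbf{D}^\top - \mathrm{diag}(\mathbf{D})$ (A.13.5 Rencher, 1998)
and $\frac{\partial\ln|\mathbf{C}|}{\partial\mathbf{C}} = 2\mathbf{C}^{-1} - \mathrm{diag}(\mathbf{C}^{-1})$ (A.13.6 Rencher, 1998) we get:
$$\sum_{j=1}^{n}\tau_{ij}^{(r+1)}\,\frac{1}{2}\left(2\boldsymbol{\Sigma}_i - \mathrm{diag}(\boldsymbol{\Sigma}_i) - \mathbf{D}_j - \mathbf{D}_j^\top + \mathrm{diag}(\mathbf{D}_j)\right)$$
$\mathbf{D}_j$ is symmetric, so this is
$$\sum_{j=1}^{n}\tau_{ij}^{(r+1)}\left(\boldsymbol{\Sigma}_i - \frac{1}{2}\mathrm{diag}(\boldsymbol{\Sigma}_i) - \mathbf{D}_j + \frac{1}{2}\mathrm{diag}(\mathbf{D}_j)\right)$$
If we set the previous expression equal to 0, we get
$$\sum_{j=1}^{n}\tau_{ij}^{(r+1)}\left(\boldsymbol{\Sigma}_i - \frac{1}{2}\mathrm{diag}(\boldsymbol{\Sigma}_i)\right) = \sum_{j=1}^{n}\tau_{ij}^{(r+1)}\left(\mathbf{D}_j - \frac{1}{2}\mathrm{diag}(\mathbf{D}_j)\right)$$
Comparing the diagonal and off-diagonal entries of both sides, it follows that
$$\boldsymbol{\Sigma}_i^{(r+1)} = \frac{\sum_{j=1}^{n}\tau_{ij}^{(r+1)}\mathbf{D}_j}{\sum_{j=1}^{n}\tau_{ij}^{(r+1)}} = \frac{\sum_{j=1}^{n}\tau_{ij}^{(r+1)}(\mathbf{y}_j-\boldsymbol{\mu}_i^{(r+1)})^\top(\mathbf{y}_j-\boldsymbol{\mu}_i^{(r+1)})}{\sum_{j=1}^{n}\tau_{ij}^{(r+1)}}$$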
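To show the second order condition for $(\pi_1^{(r+1)},\ldots,\pi_g^{(r+1)},\boldsymbol{\mu}_1^{(r+1)},\ldots,\boldsymbol{\mu}_g^{(r+1)},\boldsymbol{\Sigma}_1^{(r+1)},\ldots,\boldsymbol{\Sigma}_g^{(r+1)})$
to be a maximum, we firstly demonstrate the following property:
Property 1:
If $g(\mathbf{y}|\boldsymbol{\theta}_1,\ldots,\boldsymbol{\theta}_n) = \sum_{i=1}^{n} g_i(\mathbf{y}|\boldsymbol{\theta}_i)$ and $\hat{\boldsymbol{\theta}}_i$ is a local maximum of $g_i(\mathbf{y}|\boldsymbol{\theta}_i)$ $\forall i$ $\Rightarrow$ $(\hat{\boldsymbol{\theta}}_1,\ldots,\hat{\boldsymbol{\theta}}_n)$
is a local maximum of $g(\mathbf{y}|\boldsymbol{\theta}_1,\ldots,\boldsymbol{\theta}_n)$
Proof. The sufficient conditions to show that $(\hat{\boldsymbol{\theta}}_1,\ldots,\hat{\boldsymbol{\theta}}_n)$ is a maximum of $g(\mathbf{y}|\boldsymbol{\theta}_1,\ldots,\boldsymbol{\theta}_n)$
are (see Snyman, 2005):
1. $Dg(\hat{\boldsymbol{\theta}}_1,\ldots,\hat{\boldsymbol{\theta}}_n) = 0$, where $Dg$ is the gradient of g
2. $Hg(\hat{\boldsymbol{\theta}}_1,\ldots,\hat{\boldsymbol{\theta}}_n)$ is negative definite, where $Hg$ is the Hessian matrix of g.
The first condition is straightforward to show by taking into account that $\frac{\partial g}{\partial\boldsymbol{\theta}_i} = \frac{\partial g_i}{\partial\boldsymbol{\theta}_i} = 0$
$\forall i$. The second condition follows by considering that $Hg$ is a block diagonal matrix
where all the blocks are negative definite matrices.
$$Hg = \begin{pmatrix} Hg_1 & 0 & \cdots & 0 \\ 0 & Hg_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Hg_n \end{pmatrix}$$
Thus, $|Hg - \lambda I| = |Hg_1 - \lambda I| \cdots |Hg_n - \lambda I|$ and this implies that all the eigenvalues
of $Hg$ are negative.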
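Now we want to prove that $(\hat{\pi}_1,\ldots,\hat{\pi}_g,\hat{\boldsymbol{\mu}}_1,\ldots,\hat{\boldsymbol{\mu}}_g,\hat{\boldsymbol{\Sigma}}_1,\ldots,\hat{\boldsymbol{\Sigma}}_g)$ is the global maximum
of $E(L_c(\mathbf{z},\mathbf{y},\boldsymbol{\psi})\,|\,\mathbf{y},\boldsymbol{\psi}^{(r)})$.
We can express
$$E(L_c(\mathbf{z},\mathbf{y},\boldsymbol{\psi})\,|\,\mathbf{y},\boldsymbol{\psi}^{(r)}) = h(\boldsymbol{\pi}) + \sum_{i=1}^{g} g_i(\mathbf{y}|\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)$$
with:
$$h(\boldsymbol{\pi}) = \sum_{j=1}^{n}\sum_{i=1}^{g}\tau_{ij}^{(r+1)}\ln(\pi_i) \tag{D.7}$$
$$g_i(\mathbf{y}|\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i) = \sum_{j=1}^{n}\tau_{ij}^{(r+1)}\ln\varphi(\mathbf{y}_j|\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i) \tag{D.8}$$
It is straightforward to show that $\boldsymbol{\pi}^{(r+1)}$ is the global maximum of $h(\boldsymbol{\pi})$ by calculating
the second derivative and applying Property 1. To show that $(\boldsymbol{\mu}_i^{(r+1)},\boldsymbol{\Sigma}_i^{(r+1)})$ is the
global maximum of $g_i(\mathbf{y}|\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)$ we follow the argument in Anderson & Olkin (1985)
for the MLE of the multivariate normal distribution. The function $g_i(\mathbf{y}|\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)$ is
continuously differentiable and $g_i(\mathbf{y}|\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i) \to -\infty$ when the parameters approach the
boundary of the parameter space. Then, its only critical point is the global maximum.
By Property 1, $(\pi_1^{(r+1)},\ldots,\pi_g^{(r+1)},\boldsymbol{\mu}_1^{(r+1)},\ldots,\boldsymbol{\mu}_g^{(r+1)},\boldsymbol{\Sigma}_1^{(r+1)},\ldots,\boldsymbol{\Sigma}_g^{(r+1)})$ is the global
maximum of $E(L_c(\mathbf{z},\mathbf{y},\boldsymbol{\psi})\,|\,\mathbf{y},\boldsymbol{\psi}^{(r)})$.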
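The closed-form updates derived in this appendix can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code used for the analyses: the function names dmvn and em_step, the simulated data and the starting values are all hypothetical, and tau is stored here with observations in rows and components in columns.

```r
# Density of a p-variate normal evaluated at each row of the n x p matrix y.
dmvn <- function(y, mu, sigma) {
  p <- ncol(y)
  centred <- sweep(y, 2, mu)
  quad <- rowSums((centred %*% solve(sigma)) * centred)
  exp(-0.5 * quad) / sqrt((2 * pi)^p * det(sigma))
}

# One EM iteration: pro has length g, mu is p x g, sigma is p x p x g.
em_step <- function(y, pro, mu, sigma) {
  n <- nrow(y)
  g <- length(pro)
  # E-step: posterior probabilities tau (n x g)
  dens <- sapply(1:g, function(i) pro[i] * dmvn(y, mu[, i], sigma[, , i]))
  tau <- dens / rowSums(dens)
  # M-step: closed-form updates of proportions, means and covariances
  ni <- colSums(tau)
  pro_new <- ni / n
  mu_new <- sweep(t(y) %*% tau, 2, ni, "/")
  sigma_new <- sigma
  for (i in 1:g) {
    centred <- sweep(y, 2, mu_new[, i])
    sigma_new[, , i] <- t(centred) %*% (centred * tau[, i]) / ni[i]
  }
  # loglik is the observed-data log likelihood at the input parameters
  list(pro = pro_new, mu = mu_new, sigma = sigma_new, tau = tau,
       loglik = sum(log(rowSums(dens))))
}

# Simulated two-group data, for illustration only.
set.seed(1)
y <- rbind(matrix(rnorm(200), ncol = 2), matrix(rnorm(200, mean = 4), ncol = 2))
fit <- list(pro = c(0.5, 0.5), mu = cbind(c(-1, -1), c(5, 5)),
            sigma = array(cov(y), c(2, 2, 2)))
ll <- numeric(20)
for (r in 1:20) {
  fit <- em_step(y, fit$pro, fit$mu, fit$sigma)
  ll[r] <- fit$loglik
}
```

The vector ll records the log likelihood before each update, so it should be non-decreasing across iterations, which is a convenient sanity check on any implementation of these updates.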
Appendix E
R code for fitting mixture models of univariate Gaussian
distributions
Univariate mixture models can be fitted using the R packages mclust (Fraley & Raftery,
1999) or mixtools (Benaglia et al., 2009). For our analysis, we used mixtools because
it is more convenient for implementing the starting strategies described in Section 2.3
of the manuscript. Starting strategy iii (simulated means) was not used because it tends to
produce negative initial estimates of the means of the ratio, which are not
biologically possible. The procedure for fitting mixture models is detailed in
Section 2.3 of the manuscript. In this complementary material, we display the code
employed to perform the univariate mixture analyses. For the purpose of illustrating
the analysis, we consider the number of groups (g) fixed and equal to 3.
First, load the packages mixtools and mclust.
library(mclust)
library(mixtools)
Then, read the csv file containing the data set.
dataset <- read.table(file.choose(), header = TRUE, sep = ",")
IEN <- dataset$IEN
g = 3
# g is the number of groups
size = length(dataset$IEN)
# size is the sample size
A function called initial was created to calculate the initial estimates of the mixture
when a cluster partition is provided. The inputs needed for this function are: the
data (dat); the sample size (size); a vector containing the classification of each of the
observations into groups (clust); and the number of groups (g).
initial <- function(dat, size, clust, g) {
    dat <- as.matrix(dat)
    class <- unmap(clust)
    # unmap (from mclust) converts the vector clust into a
    # matrix with number of rows and columns given by the
    # sample size and the number of groups,
    # respectively. The entry class[i, k] has two
    # possible values: 1, if the i-th observation is
    # classified in the k-th group, or 0 otherwise.
    sums <- t(dat) %*% class
    # Multiplying the data by class we get the sums
    # of the observations values classified in each
    # of the groups
    k <- rep(0, g)
    mu <- rep(0, g)
    lambda <- rep(0, g)
    aux <- matrix(nrow = g, ncol = size)
    for (i in 1:g) {
        k[i] <- sum(class[, i])
        # k denotes the number of observations classified
        # in each of the groups
        mu[i] <- sums[, i]/k[i]
        lambda[i] <- k[i]/size
        aux[i, ] <- rep(mu[i], size)
    }
    sqdiff <- matrix(nrow = g, ncol = size)
    sumvariance <- rep(0, g)
    variance <- rep(0, g)
    for (i in 1:g) {
        sqdiff[i, ] <- (aux[i, ] - dat)^2
        sumvariance[i] <- t(sqdiff[i, ]) %*% class[, i]
        variance[i] <- sumvariance[i]/k[i]
    }
    # lambda, mu and variance are vectors containing
    # the initial estimates of the mixing
    # proportions, means and variances, respectively.
    # These values are stored in a list called init,
    # which is returned by the function initial.
    init <- list()
    init$lambda <- lambda
    init$mean <- mu
    init$variance <- variance
    return(init)
}
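As a cross-check of what initial computes, the same quantities can be obtained with base R alone. The sketch below is an illustrative alternative and not the function used in the analysis; the name initial_sketch and the toy data are assumptions made for the example.

```r
# Initial mixing proportions, means and variances (divisor k, not k - 1)
# from a given partition, using tapply instead of an indicator matrix.
initial_sketch <- function(dat, clust, g) {
  grp <- factor(clust, levels = 1:g)
  k <- as.vector(table(grp))                  # group sizes
  mu <- as.vector(tapply(dat, grp, mean))     # group means
  variance <- as.vector(tapply(dat, grp, function(v) mean((v - mean(v))^2)))
  list(lambda = k / length(dat), mean = mu, variance = variance)
}

# Toy data: three groups of sizes 3, 2 and 1.
x <- c(1, 2, 3, 10, 12, 20)
cl <- c(1, 1, 1, 2, 2, 3)
init <- initial_sketch(x, cl, g = 3)
init$lambda    # 1/2, 1/3, 1/6
init$mean      # 2, 11, 20
init$variance  # 2/3, 1, 0
```

A group with a single observation has variance 0, as in the toy example, which would make a poor starting value for the EM algorithm.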
Now, proceed to fit univariate mixture models initiating the EM algorithm with
the strategies detailed in Section 2.3 of the manuscript.
i) Random starts
clust1 <- sample(1:g, size, replace = TRUE)
init1 <- initial(dat = IEN, size = size, clust = clust1,
    g = g)
# normalmixEM expects standard deviations in sigma, hence the sqrt
model1 <- normalmixEM(IEN, lambda = init1$lambda, mu = init1$mean,
    sigma = sqrt(init1$variance), epsilon = 1e-06, k = g,
    maxit = 10000)
## number of iterations= 789
plot(model1, whichplot = 2)
Figure E.1: Histogram of the internal nitrogen use efficiency data and the mixture
components found by the EM algorithm initiated from random starts. [Plot title:
Density Curves; x-axis: Data; y-axis: Density.]
ii) K-means
clust2 <- kmeans(IEN, g, nstart = 5)$cluster
init2 <- initial(dat = IEN, size = size, clust = clust2,
    g = g)
# normalmixEM expects standard deviations in sigma, hence the sqrt
model2 <- normalmixEM(IEN, lambda = init2$lambda, mu = init2$mean,
    sigma = sqrt(init2$variance), epsilon = 1e-06, k = g,
    maxit = 10000)
## number of iterations= 1058
plot(model2, whichplot = 2)
Figure E.2: Histogram of the internal nitrogen use efficiency data and the mixture
components found by the EM algorithm initiated from the partition provided by the
K-means algorithm. [Plot title: Density Curves; x-axis: Data; y-axis: Density.]
iv) Subsample solution
sequence <- seq(1, size, 1)
index <- sample(sequence, size = 200, replace = FALSE)
subsample <- IEN[index]
clustsub <- sample(1:g, 200, replace = TRUE)
initsub <- initial(dat = subsample, size = 200, clust = clustsub,
    g = g)
# normalmixEM expects standard deviations in sigma, hence the sqrt
modelsub <- normalmixEM(IEN, lambda = initsub$lambda,
    mu = initsub$mean, sigma = sqrt(initsub$variance), k = g,
    maxit = 10, epsilon = 0.01)
## number of iterations= 2
# normalmixEM returns its estimates as $lambda, $mu and $sigma
model3 <- normalmixEM(IEN, lambda = modelsub$lambda,
    mu = modelsub$mu, sigma = modelsub$sigma,
    epsilon = 1e-06, k = g, maxit = 10000)
## number of iterations= 812
plot(model3, whichplot = 2)
Figure E.3: Histogram of the internal nitrogen use efficiency data and the mixture
components found by the EM algorithm initiated on a random subsample. [Plot title:
Density Curves; x-axis: Data; y-axis: Density.]
v) Short runs
# In each short run, sigma is given as standard deviations (sqrt of the variances)
clustshort1 <- sample(1:g, size, replace = TRUE)
initshort1 <- initial(dat = IEN, size = size, clust = clustshort1,
    g = g)
modelshort1 <- normalmixEM(IEN, lambda = initshort1$lambda,
    mu = initshort1$mean, sigma = sqrt(initshort1$variance),
    epsilon = 0.01, k = g, maxit = 10000)
## number of iterations= 2
clustshort2 <- sample(1:g, size, replace = TRUE)
initshort2 <- initial(dat = IEN, size = size, clust = clustshort2,
    g = g)
modelshort2 <- normalmixEM(IEN, lambda = initshort2$lambda,
    mu = initshort2$mean, sigma = sqrt(initshort2$variance),
    epsilon = 0.01, k = g, maxit = 10000)
## number of iterations= 2
clustshort3 <- sample(1:g, size, replace = TRUE)
initshort3 <- initial(dat = IEN, size = size, clust = clustshort3,
    g = g)
modelshort3 <- normalmixEM(IEN, lambda = initshort3$lambda,
    mu = initshort3$mean, sigma = sqrt(initshort3$variance),
    epsilon = 0.01, k = g, maxit = 10000)
## number of iterations= 2
modelshort1$loglik
## [1] -1646
modelshort2$loglik
## [1] -1646
modelshort3$loglik
## [1] -1646
max(modelshort1$loglik, modelshort2$loglik, modelshort3$loglik)
## [1] -1646
Suppose modelshort3 provides the solution with the highest value of the log likelihood
function among the short runs of the algorithm. This solution is then used to initiate a
complete run of the algorithm.
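Choosing the best short run can also be done programmatically rather than by inspection. The sketch below is illustrative only: short_runs is a hypothetical list standing in for the fitted objects, and its log likelihood values are made up for the example; only the element loglik is used.

```r
# Hypothetical stand-ins for the fitted short runs; only $loglik is needed.
short_runs <- list(
  run1 = list(loglik = -1646.3),
  run2 = list(loglik = -1645.9),
  run3 = list(loglik = -1646.1)
)

# Extract the log likelihoods and keep the run with the largest value.
logliks <- sapply(short_runs, function(fit) fit$loglik)
best_name <- names(which.max(logliks))
best_fit <- short_runs[[best_name]]
```

With real fits, the list would hold the objects returned by the short runs, and best_fit would supply the starting values for the complete run.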
# normalmixEM returns its estimates as $lambda, $mu and $sigma
modellong <- normalmixEM(IEN, lambda = modelshort3$lambda,
    mu = modelshort3$mu, sigma = modelshort3$sigma,
    epsilon = 1e-06, k = g, maxit = 10000)
## number of iterations= 487
plot(modellong, whichplot = 2)
Figure E.4: Histogram of the internal nitrogen use efficiency data and the mixture
components found after running several short runs of the EM algorithm. [Plot title:
Density Curves; x-axis: Data; y-axis: Density.]
model1$loglik
## [1] -1628
model2$loglik
## [1] -1628
model3$loglik
## [1] -1629
modellong$loglik
## [1] -1629
Delete spurious solutions. From the remaining solutions, select the one with the
highest value of the log likelihood function. If there are no spurious solutions, proceed as
follows:
max(model1$loglik, model2$loglik, model3$loglik, modellong$loglik)
## [1] -1628
If model1 is the solution with the highest value of the log likelihood function, record
its AIC and BIC values.
aic <- function(loglik, g, size) {
    # 3 * g - 1 free parameters: g means, g variances, g - 1 proportions
    aic = -2 * loglik + 2 * (3 * g - 1)
    return(aic)
}
bic <- function(loglik, g, size) {
    bic = -2 * loglik + (3 * g - 1) * log(size)
    return(bic)
}
aic(loglik = model1$loglik, g, size)
## [1] 3272
bic(loglik = model1$loglik, g, size)
## [1] 3305
This procedure is repeated, starting with the maximum number of components and
ending with a single component. However, mixtools does not allow fitting a single
component, so mclust is needed when g=1. The commands used for fitting one component
with mclust are:
model1 = Mclust(dataset$IEN, G = 1, modelNames = "V")
model1$parameters
## $pro
## [1] 1
##
## $mean
## [1] 44.27
##
## $variance
## $variance$modelName
## [1] "X"
##
## $variance$d
## [1] 1
##
## $variance$G
## [1] 1
##
## $variance$sigmasq
## [1] 119.4
Finally, the model selected is the one which minimises the information criteria.
Appendix F
R code for fitting mixture models of bivariate Gaussian
distributions
Bivariate mixture models were fitted using the R package EMMIX (McLachlan et al.,
1999). This package can be downloaded from the Web site:
http://www.maths.uq.edu.au/~gjm/mix_soft/EMMIX_R/index.html. The procedure
for fitting mixture models is detailed in Section 2.3 of the manuscript. Here, we
display the code employed to fit mixture models of bivariate Gaussian distributions.
For the purpose of illustrating the analysis, we consider the number of components (g)
fixed and equal to 3.
First, set the RStudio working directory to the folder into which the package
EMMIX has been downloaded. Then, load EMMIX and the package mnormt. The
latter is needed for starting strategy iii (see Section 2.3 of the manuscript).
source("EMMIX.R")
library(mnormt)
Read the csv file containing the data set.
dataset <- read.table(file.choose(), header = TRUE, sep = ",")
For the bivariate analysis, the EM algorithm operates with the measurements of
grain yield (dataset$YGR14) and nitrogen uptake (dataset$NUPT). These measurements
are stored in the matrix named dat.
size = 432
# size is the sample size
dat <- matrix(nrow = size, ncol = 2)
dat[, 1] <- dataset$NUPT
dat[, 2] <- dataset$YGR14
distr = "mvn"
# distr refers to the type of component densities
# of the mixture. In our case, multivariate
# Gaussian distributions (mvn)
ncov = 3
# ncov=3 indicates that each group is allowed to
# have a different covariance matrix.
g = 3
# g is the number of groups
A function called classcolor was created to visualise the solutions provided by the
EM algorithm. The inputs needed for this function are: the data set (y); a logical
parameter (ellipse), which is TRUE or FALSE depending on whether the user wants to
draw the prediction ellipses; the mixture model fitted by EMMIX (obj); the significance
level a, so that the prediction ellipses have coverage 1 - a (a = 0.1 gives 90% regions);
the size of the sample (n); and the number of groups (g). The output of this function is
a plot which displays the observations in different colours depending on their
classification. Furthermore, if ellipse=TRUE, the prediction ellipses for a new point of
the population to belong to each of the groups are drawn. The ellipses are drawn by
modifying the function ellipse implemented in the R package ellipse (Murdoch et al.,
2007) according to Eq. 4.4 in Chew (1966).
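The geometry behind the prediction ellipses can be sketched in base R. The example below is an illustration under stated assumptions: the function names ellipse_points and prediction_radius, the mean vector, the covariance matrix and the group size are hypothetical, and the radius uses the same expression as the code of this appendix.

```r
# Points on the ellipse {x : (x - mu)' sigma^{-1} (x - mu) = r1^2}, obtained by
# mapping a circle of radius r1 through the eigen decomposition of sigma.
ellipse_points <- function(mu, sigma, r1, npoints = 200) {
  es <- eigen(sigma)
  e1 <- es$vectors %*% diag(sqrt(es$values))
  theta <- seq(0, 2 * pi, length.out = npoints)
  unit_circle <- rbind(cos(theta), sin(theta))
  t(mu + e1 %*% (r1 * unit_circle))
}

# Radius of the (1 - a) prediction region for a group of n1 observations in
# p dimensions, based on a quantile of the F distribution.
prediction_radius <- function(n1, a, p = 2) {
  q1 <- qf(1 - a, p, n1 - p)
  sqrt((n1 - 1) * (n1 + 1) * p * q1 / ((n1 - p) * n1))
}

mu <- c(40, 1900)                                   # illustrative mean vector
sigma <- matrix(c(120, 3900, 3900, 250000), 2, 2)   # illustrative covariance
r1 <- prediction_radius(n1 = 150, a = 0.1)
pts <- ellipse_points(mu, sigma, r1)

# Every point should lie at squared Mahalanobis distance r1^2 from mu.
d2 <- mahalanobis(pts, center = mu, cov = sigma)
```

Calling lines(pts) on an existing scatter plot would draw the ellipse; the Mahalanobis check confirms that the points lie on the intended contour.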
classcolor <- function(y, ellipse, obj, a, n, g) {
    p = 2
    # p is the dimension of the random vector
    aux = matrix(nrow = n, ncol = 3)
    # aux is a matrix whose first two columns contain
    # the data of nitrogen uptake and grain yield, and the third
    # column contains the number of the group in which the
    # observations have been classified.
    tau <- obj$tau
    for (i in 1:n) {
        aux[i, 1] <- y[i, 1]
        aux[i, 2] <- y[i, 2]
        aux[i, 3] <- which.max(tau[i, ])
    }
    # Next, the measurements of grain yield and nitrogen uptake
    # are coloured according to the classification given by the
    # EM algorithm
    xmin = 0
    xmax = max(y[, 1]) + 20
    ymin = 0
    ymax = max(y[, 2]) + 20
    plot(aux[, 1][aux[, 3] == 1], aux[, 2][aux[, 3] == 1],
        col = "red", xlim = c(xmin, xmax), ylim = c(ymin, ymax),
        ylab = "Grain Yield (kg/ha)",
        xlab = "Nitrogen Uptake (kg/ha)", cex.lab = 1.2,
        cex.axis = 1.2)
    for (i in 2:g) {
        points(aux[, 1][aux[, 3] == i], aux[, 2][aux[, 3] == i], col = i + 1)
    }
    # If ellipse=TRUE, the function ellipse2 draws the prediction
    # ellipses
    ellipse2 <- function(mu, sigma, alpha, npoints, newplot,
        r1, draw, ...) {
        es <- eigen(sigma)
        e1 <- es$vectors %*% diag(sqrt(es$values))
        theta <- seq(0, 2 * pi, len = npoints)
        v1 <- cbind(r1 * cos(theta), r1 * sin(theta))
        pts = t(mu - (e1 %*% t(v1)))
        if (newplot && draw) {
            plot(pts, ...)
        } else if (!newplot && draw) {
            lines(pts, ...)
        }
        invisible(pts)
    }  # end function
    # ellipse may be given as TRUE or as the string "TRUE"
    if (isTRUE(as.logical(ellipse))) {
        for (i in 1:g) {
            # n1 is the number of observations classified in group i
            n1 = sum(aux[, 3] == i)
            q1 = qf(1 - a, 2, n1 - 2)
            def <- ((n1 - 1) * (n1 + 1) * p * q1)/((n1 - p) * n1)
            ellipse2(mu = obj$mu[, i], sigma = obj$sigma[, , i], alpha = a,
                r1 = sqrt(def), npoints = 1000, newplot = FALSE,
                draw = TRUE, type = "l", lwd = 2, col = i + 1)
            points(obj$mu[, i][1], obj$mu[, i][2], col = "black", pch = 16)
        }  # end for
    }  # end if
}  # end function
Now, fit bivariate mixture models initiating the EM algorithm with the starting
strategies described in Section 2.3 of the manuscript.
i) Random starts
initobj1 <- init.mix(dat, g, distr, ncov, nkmeans = 0,
nrandom = 100, nhclust = FALSE)
obj1 <- EMMIX(dat, g, distr, ncov, init = initobj1,
itmax = 1000, epsilon = 1e-06)
##
## -----------------------
##
## 3 - Component Multivariate Normal Mixture Model
##
## -----------------------
##
## $distr
## [1] "mvn"
##
## $error
## [1] 0
##
## $loglik
## [1] -5160
##
## $bic
## [1] 10424
##
## $aic
## [1] 10355
##
## $pro
## [1] 0.3037 0.3775 0.3188
##
## $mu
## [,1] [,2] [,3]
## [1,] 51.42 33.88 72.43
## [2,] 1993.46 1752.30 2808.45
##
## $sigma
## , , 1
##
## [,1] [,2]
## [1,] 301.8 11121
## [2,] 11121.0 473365
##
## , , 2
##
## [,1] [,2]
## [1,] 187.4 9557
## [2,] 9557.0 562741
##
## , , 3
##
## [,1] [,2]
## [1,] 368.2 6279
## [2,] 6279.4 359595
##
##
## $ICL
## [1] -5349
##
##
## -----------------------
obj1$loglik
## [1] -5160
classcolor(dat, ellipse = "TRUE", obj1, a = 0.1, n = size,
    g = g)
Figure F.1: Cluster partition found by the EM algorithm initiated from random starts.
[Observations in different colours have been classified as belonging to different groups. The dots
represent the joint means and the ellipses are the 90% prediction regions for each group. x-axis:
Nitrogen Uptake (kg/ha); y-axis: Grain Yield (kg/ha).]
ii) K-means
initobj2 <- init.mix(dat, g, distr, ncov, nkmeans = 100,
nrandom = 0, nhclust = FALSE)
obj2 <- EMMIX(dat, g, distr, ncov, init = initobj2,
itmax = 1000, epsilon = 1e-06)
##
## -----------------------
##
## 3 - Component Multivariate Normal Mixture Model
##
## -----------------------
##
## $distr
## [1] "mvn"
##
## $error
## [1] 0
##
## $loglik
## [1] -5150
##
## $bic
## [1] 10402
##
## $aic
## [1] 10333
##
## $pro
## [1] 0.4096 0.4390 0.1513
##
## $mu
## [,1] [,2] [,3]
## [1,] 42.43 70.66 20.42
## [2,] 1866.97 2814.10 1070.22
##
## $sigma
## , , 1
##
## [,1] [,2]
## [1,] 118.2 3918
## [2,] 3917.7 253491
##
## , , 2
##
## [,1] [,2]
## [1,] 322.3 5272
## [2,] 5272.2 308424
##
## , , 3
##
## [,1] [,2]
## [1,] 37.65 2607
## [2,] 2607.11 225208
##
##
## $ICL
## [1] -5284
##
##
## -----------------------
obj2$loglik
## [1] -5150
classcolor(dat, ellipse = "TRUE", obj2, a = 0.1, n = size,
    g = g)
Figure F.2: Cluster partition found by the EM algorithm initiated from the partition
obtained by the K-means algorithm. [Observations in different colours have been classified
as belonging to different groups. The dots represent the joint means and the ellipses are the 90%
prediction regions for each group. x-axis: Nitrogen Uptake (kg/ha); y-axis: Grain Yield (kg/ha).]
iii) Simulated means
mu <- c()
mu[1] <- mean(dat[, 1])
mu[2] <- mean(dat[, 2])
S <- cov(dat)
meaninit <- rmnorm(n = g, mean = mu, S)
# the initial values of the means
sigmainit <- array(S, c(2, 2, g))
# the initial values of the covariance matrices
initobj1$pro <- rep(1/g, g)
# the initial values of the mixing proportions
initobj1$mu <- t(meaninit)
initobj1$sigma <- sigmainit
obj3 <- EMMIX(dat, g, distr, ncov, init = initobj1,
itmax = 1000, epsilon = 1e-06)
##
## -----------------------
##
## 3 - Component Multivariate Normal Mixture Model
##
## -----------------------
##
## $distr
## [1] "mvn"
##
## $error
## [1] 0
##
## $loglik
## [1] -5150
##
## $bic
## [1] 10402
##
## $aic
## [1] 10333
##
## $pro
## [1] 0.1524 0.4032 0.4444
##
## $mu
## [,1] [,2] [,3]
## [1,] 20.46 42.36 70.43
## [2,] 1073.41 1861.97 2808.18
##
## $sigma
## , , 1
##
## [,1] [,2]
## [1,] 37.85 2621
## [2,] 2621.27 226479
##
## , , 2
##
## [,1] [,2]
## [1,] 116.4 3866
## [2,] 3866.0 250825
##
## , , 3
##
## [,1] [,2]
## [1,] 324.3 5353
## [2,] 5352.7 310420
##
##
## $ICL
## [1] -5284
##
##
## -----------------------
obj3$loglik
## [1] -5150
classcolor(dat, ellipse = "TRUE", obj3, a = 0.1, n = size,
    g = g)
Figure F.3: Cluster partition found by the EM algorithm initiated from simulated
means. [Observations in different colours have been classified as belonging to different groups. The
dots represent the joint means and the ellipses are the 90% prediction regions for each group. x-axis:
Nitrogen Uptake (kg/ha); y-axis: Grain Yield (kg/ha).]
iv) Subsample solution
sequence <- seq(1, size, 1)
index <- sample(sequence, size = 200, replace = FALSE)
subsample <- dat[index, ]
initobjsub <- init.mix(subsample, g, distr, ncov, nkmeans = 0,
nrandom = 100, nhclust = TRUE)
objsub <- EMMIX(subsample, g, distr, ncov, init = initobjsub,
itmax = 10)
##
## -----------------------
##
## 3 - Component Multivariate Normal Mixture Model
##
## -----------------------
##
## $distr
## [1] "mvn"
##
## $error
## [1] 1
##
## $loglik
## [1] -2397
##
## $bic
## [1] 4885
##
## $aic
## [1] 4829
##
## $pro
## [1] 0.3471 0.3572 0.2957
##
## $mu
## [,1] [,2] [,3]
## [1,] 47.43 72.27 28.24
## [2,] 2020.94 2880.57 1359.73
##
## $sigma
## , , 1
##
## [,1] [,2]
## [1,] 171.2 4978
## [2,] 4977.6 284157
##
## , , 2
##
## [,1] [,2]
## [1,] 353.6 4516
## [2,] 4516.3 309734
##
## , , 3
##
## [,1] [,2]
## [1,] 105 4747
## [2,] 4747 312637
##
##
## $ICL
## [1] -2500
##
##
## -----------------------
initobj4 <- objsub
obj4 <- EMMIX(dat, g, distr, ncov, init = initobj4,
itmax = 1000, epsilon = 1e-06)
##
## -----------------------
##
## 3 - Component Multivariate Normal Mixture Model
##
## -----------------------
##
## $distr
## [1] "mvn"
##
## $error
## [1] 0
##
## $loglik
## [1] -5150
##
## $bic
## [1] 10402
##
## $aic
## [1] 10333
##
## $pro
## [1] 0.4059 0.4417 0.1524
##
## $mu
## [,1] [,2] [,3]
## [1,] 42.41 70.54 20.46
## [2,] 1864.94 2811.11 1073.26
##
## $sigma
## , , 1
##
## [,1] [,2]
## [1,] 117.1 3887
## [2,] 3887.5 252103
##
## , , 2
##
## [,1] [,2]
## [1,] 323.3 5313
## [2,] 5313.0 309438
##
## , , 3
##
## [,1] [,2]
## [1,] 37.87 2621
## [2,] 2621.16 226382
##
##
## $ICL
## [1] -5284
##
##
## -----------------------
obj4$loglik
## [1] -5150
classcolor(dat, ellipse = "TRUE", obj4, a = 0.1, n = size,
g = g)
Figure F.4: Cluster partition found by the EM algorithm initiated from the mixture
estimates obtained by running the EM algorithm on a random subsample of 200
observations. [Observations in different colours have been classified as belonging to different groups.
The dots represent the joint means and the ellipses are the 90% prediction regions for each group.
x-axis: Nitrogen Uptake (kg/ha); y-axis: Grain Yield (kg/ha).]
v) Short runs
initobjshort1 <- init.mix(dat, g, distr, ncov, nkmeans = 0,
nrandom = 100, nhclust = FALSE)
objshort1 <- EMMIX(dat, g, distr, ncov, init = initobjshort1,
epsilon = 0.01)
##
## -----------------------
##
## 3 - Component Multivariate Normal Mixture Model
##
## -----------------------
##
## $distr
## [1] "mvn"
##
## $error
## [1] 0
##
## $loglik
## [1] -5160
##
## $bic
## [1] 10424
##
## $aic
## [1] 10355
##
## $pro
## [1] 0.3109 0.3822 0.3069
##
## $mu
## [,1] [,2] [,3]
## [1,] 72.46 34.16 51.85
## [2,] 2803.79 1765.15 2006.83
##
## $sigma
## , , 1
##
## [,1] [,2]
## [1,] 371.9 6341
## [2,] 6341.4 363120
##
## , , 2
##
## [,1] [,2]
## [1,] 190.3 9673
## [2,] 9673.0 567372
##
## , , 3
##
## [,1] [,2]
## [1,] 314.9 11613
## [2,] 11613.4 490879
##
##
## $ICL
## [1] -5351
##
##
## -----------------------
initobjshort2 <- init.mix(dat, g, distr, ncov, nkmeans = 0,
nrandom = 100, nhclust = FALSE)
objshort2 <- EMMIX(dat, g, distr, ncov, init = initobjshort2,
epsilon = 0.01)
##
## -----------------------
##
## 3 - Component Multivariate Normal Mixture Model
##
## -----------------------
##
## $distr
## [1] "mvn"
##
## $error
## [1] 0
##
## $loglik
## [1] -5160
##
## $bic
## [1] 10424
##
## $aic
## [1] 10355
##
## $pro
## [1] 0.3071 0.3878 0.3051
##
## $mu
## [,1] [,2] [,3]
## [1,] 52.29 34.33 72.52
## [2,] 2020.33 1771.45 2801.91
##
## $sigma
## , , 1
##
## [,1] [,2]
## [1,] 321.3 11862
## [2,] 11862.2 499889
##
## , , 2
##
## [,1] [,2]
## [1,] 192.2 9737
## [2,] 9737.0 569729
##
## , , 3
##
## [,1] [,2]
## [1,] 374 6359
## [2,] 6359 364909
##
##
## $ICL
## [1] -5351
##
##
## -----------------------
initobjshort3 <- init.mix(dat, g, distr, ncov, nkmeans = 0,
nrandom = 100, nhclust = FALSE)
objshort3 <- EMMIX(dat, g, distr, ncov, init = initobjshort3,
epsilon = 0.01)
##
## -----------------------
##
## 3 - Component Multivariate Normal Mixture Model
##
## -----------------------
##
## $distr
## [1] "mvn"
##
## $error
## [1] 0
##
## $loglik
## [1] -5160
##
## $bic
## [1] 10424
##
## $aic
## [1] 10355
##
## $pro
## [1] 0.3025 0.3099 0.3877
##
## $mu
## [,1] [,2] [,3]
## [1,] 72.6 52.32 34.37
## [2,] 2803.0 2022.39 1774.12
##
## $sigma
## , , 1
##
## [,1] [,2]
## [1,] 373.9 6338
## [2,] 6338.4 364670
##
## , , 2
##
## [,1] [,2]
## [1,] 323.3 11922
## [2,] 11921.7 501949
##
## , , 3
##
## [,1] [,2]
## [1,] 192.7 9768
## [2,] 9767.7 571508
##
##
## $ICL
## [1] -5351
##
##
## -----------------------
objshort1$loglik
## [1] -5160
objshort2$loglik
## [1] -5160
objshort3$loglik
## [1] -5160
max(objshort1$loglik, objshort2$loglik, objshort3$loglik)
## [1] -5160
Suppose objshort3 is the model which provides the solution with the highest value of the
log likelihood function among the short runs of the algorithm. This solution is then used
to initiate a complete run of the algorithm.
objlong <- EMMIX(dat, g, distr, ncov, init = objshort3,
itmax = 100, epsilon = 1e-06)
##
## -----------------------
##
## 3 - Component Multivariate Normal Mixture Model
##
## -----------------------
##
## $distr
## [1] "mvn"
##
## $error
## [1] 0
##
## $loglik
## [1] -5160
##
## $bic
## [1] 10424
##
## $aic
## [1] 10355
##
## $pro
## [1] 0.3189 0.3037 0.3774
##
## $mu
## [,1] [,2] [,3]
## [1,] 72.42 51.41 33.87
## [2,] 2808.44 1993.19 1752.20
##
## $sigma
## , , 1
##
## [,1] [,2]
## [1,] 368.2 6280
## [2,] 6279.6 359576
##
## , , 2
##
## [,1] [,2]
## [1,] 301.7 11117
## [2,] 11117.0 473221
##
## , , 3
##
## [,1] [,2]
## [1,] 187.4 9556
## [2,] 9556.0 562705
##
##
## $ICL
## [1] -5349
##
##
## -----------------------
objlong$loglik
## [1] -5160
classcolor(dat, ellipse = "TRUE", objlong, a = 0.1,
    n = size, g = g)
Figure F.5: Cluster partition found by the EM algorithm initiated from the mixture
estimates after running several short runs of the EM algorithm. [Observations in different
colours have been classified as belonging to different groups. The dots represent the joint means and
the ellipses are the 90% prediction regions for each group. x-axis: Nitrogen Uptake (kg/ha); y-axis:
Grain Yield (kg/ha).]
obj1$loglik
## [1] -5160
obj2$loglik
## [1] -5150
obj3$loglik
## [1] -5150
obj4$loglik
## [1] -5150
objlong$loglik
## [1] -5160
Delete spurious solutions. From the remaining solutions, select the one with the
highest value of the log likelihood function. If there are no spurious solutions, proceed as
follows:
max(obj1$loglik, obj2$loglik, obj3$loglik, obj4$loglik,
objlong$loglik)
## [1] -5150
If obj2 is the solution with the highest value of the log likelihood function, record
its AIC, ICL and BIC values.
obj2$aic
## [1] 10333
obj2$ICL
## [1] -5284
obj2$bic
## [1] 10402
This procedure is repeated starting with the maximum number of groups until
fitting just one group. Finally, the model selected is the one which minimises the
information criteria.
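To relate the reported criteria to the log likelihood, the sketch below counts the free parameters of a g-component mixture of p-variate Gaussian distributions with unrestricted covariance matrices and applies the usual definitions AIC = -2 log L + 2d and BIC = -2 log L + d log(n). It is an illustration only: the function names n_par, aic_mix and bic_mix are hypothetical, and whether EMMIX computes its $aic and $bic values in exactly this way is an assumption.

```r
# Free parameters: (g - 1) mixing proportions, g * p means and
# g * p * (p + 1) / 2 distinct covariance entries.
n_par <- function(g, p) (g - 1) + g * p + g * p * (p + 1) / 2

# Usual definitions of the two criteria (assumed, see the text above).
aic_mix <- function(loglik, g, p) -2 * loglik + 2 * n_par(g, p)
bic_mix <- function(loglik, g, p, n) -2 * loglik + n_par(g, p) * log(n)

d_bivariate <- n_par(g = 3, p = 2)    # 17 parameters for the bivariate case
d_univariate <- n_par(g = 3, p = 1)   # 8 parameters, i.e. 3 * g - 1

# Criteria for a rounded log likelihood of -5150 with n = 432 observations.
aic_value <- aic_mix(-5150, g = 3, p = 2)
bic_value <- bic_mix(-5150, g = 3, p = 2, n = 432)
```

Because the log likelihoods printed in this appendix are rounded, values computed this way can differ from the reported ones in the last digit.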