www.gov.uk/defra

Economic Needs for Improving Intelligence

within the Food Authenticity Programme

Final report

November 2014

www.europe-economics.com

© Crown copyright 2014

You may re-use this information (excluding logos) free of charge in any format or medium,

under the terms of the Open Government Licence v.2. To view this licence visit

www.nationalarchives.gov.uk/doc/open-government-licence/version/2/ or email

[email protected]

This publication is available at www.gov.uk/government/publications

Any enquiries regarding this publication should be sent to us at

[email protected]

Contents

1. Executive Summary
   a. Literature review
   b. Methodology
   c. Case study
   d. Limitations
   e. Recommendations
2. Introduction
   a. Food fraud
   b. Project objectives
   c. Structure of the report
3. Literature Review
   a. Food fraud
   b. The economics of fraud
   c. Fraud in other areas
   d. Conclusions
4. Factors that Affect the Risk of Fraud
   a. Economic factors and market characteristics
   b. Production and distribution
   c. Product characteristics and detection technologies
   d. Institutional and enforcement characteristics
5. Methodology
   a. Selecting a methodology
   b. An econometric methodology
   c. Interpretation and use of the results
   d. Single and multiple products
   e. New types of fraud
   f. Comparison of the proposed approach and the literature
6. Case Study: Basmati Rice
   a. Global market for Basmati rice
   b. UK market for Basmati rice
   c. Basmati rice adulteration
   d. Data
   e. Descriptive statistics
   f. Econometric results
7. Conclusions and Recommendations
   a. Case study
   b. Limitations
   c. Recommendations
8. Annex I: Detailed Review of Selected Literature
   a. Food fraud – economics
   b. Food fraud – biological science
   c. Credit card fraud (empirical)
   d. Credit card fraud (theoretical)
   e. Credit card fraud and computer science
   f. Automobile insurance and car accidents
   g. Consumer goods
   h. Fraud in general
   i. Insurance and tax fraud
9. Annex II: Methodologies Used to Study Fraud
   a. Construction of risk indices
   b. Econometric models
   c. Data mining
10. Annex III: Data Sources
   a. Food fraud data
   b. Economic data
   c. Other data considerations
11. Annex IV: Econometric methodology
12. Annex V: Linear Correlations
13. Annex VI: Econometric Estimation

1. Executive Summary

This report explores the scope for applying economic intelligence to the analysis and

prediction of food fraud in the UK. Food fraud is defined as “the deliberate placing on the

market, for financial gain, goods which are falsely described or otherwise intended to

deceive the consumer”.1 Given that for our purposes in this report we regard food fraud as

economically motivated, we explore whether it is possible to predict the likelihood of fraud

based on the economic variables that drive the potential profits to be made by committing

such fraud.

a. Literature review

The first part of the report consists of a review of the literature on food fraud, the

economics of fraud, statistical methodologies used to predict fraud and the potential data

sources that could be used for this purpose. Based on this review, it was possible to

identify some fundamental characteristics of food fraud in the UK and other countries; the

general approach to modelling fraud from an economic theory perspective; the factors that

have been postulated as contributors to the risk of food fraud; methodologies used to

detect and predict fraud, either based on economic analysis or other approaches; and

variables and data sources that can be employed in statistical models of fraud.

b. Methodology

After reviewing the different methodologies proposed in the literature, we consider that an

econometric approach would be the most suitable to predict the risk of food fraud. The

methodology section provides details of which variables could be included (based on our

data assessment), the estimation methods that can be employed and criteria for selecting

among the multiple possible models. In addition, we propose an approach to use the

estimation outcomes based on past fraud for prediction of future fraud. Based on the

evolution of observable economic variables, the model produces a prediction of the risk of

fraud.
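The estimation-then-prediction idea described above can be illustrated with a minimal sketch. It assumes a logit model with a single explanatory variable (a price gap) fitted by plain gradient ascent; the function names, the data and the tuning values are hypothetical and are not taken from the report.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logit(x, y, lr=0.1, steps=5000):
    """Fit P(fraud = 1 | x) = sigmoid(a + b * x) by gradient ascent
    on the average log-likelihood (illustrative, not the report's code)."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        residuals = [yi - sigmoid(a + b * xi) for xi, yi in zip(x, y)]
        a += lr * sum(residuals) / n
        b += lr * sum(r * xi for r, xi in zip(residuals, x)) / n
    return a, b


def predict_risk(a, b, x_new):
    """Predicted probability of fraud for a new value of the variable."""
    return sigmoid(a + b * x_new)


# Hypothetical monthly observations: price gap between the authentic
# product and a substitute, and whether fraud was detected (1) or not (0).
price_gap = [0.2, 0.4, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5]
fraud = [0, 0, 1, 0, 1, 0, 1, 1]

a, b = fit_logit(price_gap, fraud)
risk_low = predict_risk(a, b, 0.3)   # month with a small price gap
risk_high = predict_risk(a, b, 1.4)  # month with a large price gap
```

In practice an established statistical package would be used instead of a hand-written optimiser; the sketch only shows how coefficients estimated on past fraud translate into a predicted risk for new values of the economic variables.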

c. Case study

The final section of the report focusses on testing the methodology via a case study. The
selected type of fraud is the adulteration of Basmati rice using other varieties of rice. The
selection of this type of fraud was based on well documented instances of past fraud and
availability of economic data. It was possible to gather monthly data for the period 2010-2013
on previous incidents of food fraud, prices of Basmati rice in India and Pakistan (the two
countries that produce this variety), the volume of production of rice, the volume of exports
of Basmati rice to the UK, the world price of long-grain rice and the consumption of rice in
the UK. After conducting a large number of regressions, we conclude that the only variable
that is statistically significant in predicting the risk of fraud is the gap between the price
of Basmati rice and other varieties of rice. Based on the estimations, we classify the
observations according to low and high risk of fraud. We find that this classification would
have predicted food fraud correctly with 66.6 per cent accuracy.2 This level of accuracy
suggests that the test proposed in this report may contain useful information that would
indicate a higher risk of fraud. However, we would like to stress the indicative nature of this
accuracy level. Given its limitations, the fact that the test indicates high risk of fraud should
not be interpreted as conclusive evidence that fraud would occur. The 66.6 per cent level of
accuracy is better than the level of accuracy obtained by using a trivial predictor based only
on the price ratio between the original Basmati rice and the world non-Basmati rice. The
maximum level of accuracy this predictor would yield is 62 per cent (this is reached when the
threshold for the price ratio between the two prices of rice is set at 58 per cent, so that any
price ratio above 58 per cent would be considered suspicious and require an investigation by
the authorities).

1 Elliott, Chris (2014) “Elliott Review into the Integrity and Assurance of Food Supply Networks – Final Report”.
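The trivial predictor mentioned above can be sketched as a search over candidate thresholds for the price ratio. This is a minimal illustration under stated assumptions: the price ratios and fraud indicators below are invented, and the helper names are hypothetical rather than part of the report.

```python
def accuracy_at_threshold(ratios, fraud, threshold):
    """Share of observations classified correctly when any price ratio
    above `threshold` is treated as high risk."""
    correct = sum((r > threshold) == bool(f) for r, f in zip(ratios, fraud))
    return correct / len(ratios)


def best_threshold(ratios, fraud):
    """Scan candidate thresholds (each observed ratio, plus one value
    below the minimum so that 'flag everything' is also considered)
    and return the (threshold, accuracy) pair with the highest accuracy."""
    candidates = [min(ratios) - 1.0] + sorted(set(ratios))
    scored = [(t, accuracy_at_threshold(ratios, fraud, t)) for t in candidates]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical price ratios (per cent) and fraud indicators (1 = detected).
ratios = [40, 45, 52, 55, 60, 63, 70, 75]
fraud = [0, 0, 1, 0, 0, 1, 1, 1]

threshold, accuracy = best_threshold(ratios, fraud)
```

Because the threshold is chosen on the same observations used to score it, the resulting accuracy is an in-sample maximum, which is why it serves only as a benchmark for the econometric classification.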

d. Limitations

The case study shows that the proposed methodology is feasible to implement. However,

the case study also served to illustrate the considerable limitations that could be faced

when applying the proposed methodology to a particular product or fraud type. The most

important limitation is the small sample size. The case study is based on 21 valid food

fraud observations over a time span of 3 years. The normal minimum sample size for

obtaining meaningful statistical results of more general (out-of-sample) applicability is 30,

i.e. more than the 21 available here. With 21 data points it was possible to conduct some

statistical analysis and there were potentially interesting indicative results. However

(unsurprisingly, given the data limitations), at most one explanatory variable was

significant in any model, the results being substantially weakened when more than one

variable was considered. Attempts to apply this methodology to other products may

encounter the same or even greater data limitations.

In addition to the number of observations, other data limitations include the use of low

quality or missing data and the difficulty (or impossibility) of measuring variables that the

literature has identified as relevant, such as key features of the supply chain.

e. Recommendations

We consider that the proposed methodology is appropriate and solidly founded in the
literature. It consists of econometric modelling (especially the deployment of OLS and logit
models), including controlling for variables such as the level of and change in price
differences between authentic product and adulterated product, and the number of samples
taken. However, due to limitations in the data currently available, the results obtained when
applying the methodology might not be entirely satisfactory. The accuracy and reliability of
the results of the methodology would improve substantially with additional data in the
following categories. First, more data on testing and detection of past food fraud is
necessary. In the case study, the main constraint on the number of observations is the
number of months in which authenticity testing was conducted in the UK. We have used,
to the best of our knowledge, the most extensive data coverage available from the UK
Food Surveillance System (UKFSS). However, this source is fairly recent. We expect that
the quality of the estimations would increase rapidly as more data becomes available in
the near future.

2 The level of accuracy ranges from 50 per cent (the test has no capacity to predict fraud) to 100 per cent (perfect prediction accuracy).

Second, additional data sources on prices and other economic variables should be

explored (including, if available, panel data). The present report conducts an extensive

review of publicly available data sources. However, there might be relevant data available

from private providers. This data may allow the inclusion of new variables and improve the

quality of the data for the variables already considered in the case study. In addition, it

may provide measures for variables identified in the literature, such as the complexity of

the supply chain, for which an appropriate quantification has not yet been found.

f. Conclusion

Our provisional results are that the main identified risk factors are:

- the level of price differentials between authentic product and close substitutes that
  can be used as adulterants (i.e. ceteris paribus, the greater the difference, the
  greater the risk of adulteration);
- changes in price differentials between authentic product and close substitutes that
  can be used as adulterants (i.e. ceteris paribus, a sudden increase in the differential
  is associated with a greater risk of adulteration).

Pending the gathering of more data and the development of more robust models, it would

be possible, in principle, to use these identified factors as a “rule of thumb” indicator of

where food fraud is more likely.
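As a rough illustration of such a rule of thumb, the sketch below flags months in which the price differential is either high or has risen sharply since the previous month. The two limits, the price series and the function name are assumptions made for the example; the report does not specify numerical cut-offs for this indicator.

```python
def flag_months(authentic, substitute, level_limit, jump_limit):
    """Return the indices of months in which the price differential
    exceeds `level_limit` or has increased by more than `jump_limit`
    relative to the previous month (both limits are hypothetical)."""
    gaps = [a - s for a, s in zip(authentic, substitute)]
    flagged = []
    for i, gap in enumerate(gaps):
        high_level = gap > level_limit
        sudden_jump = i > 0 and (gap - gaps[i - 1]) > jump_limit
        if high_level or sudden_jump:
            flagged.append(i)
    return flagged


# Hypothetical monthly prices per tonne, in the same currency.
authentic = [900, 910, 1000, 1005, 1100, 1090]
substitute = [500, 505, 500, 510, 520, 515]

flags = flag_months(authentic, substitute, level_limit=550, jump_limit=80)
```

Month 2 is flagged only because of the jump in the differential, while months 4 and 5 are flagged because of its level, which mirrors the two risk factors listed above.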

2. Introduction

Europe Economics, with the collaboration of FoodChain Europe, is advising the

Department for Environment, Food and Rural Affairs (Defra) to explore the potential

benefits of using economic intelligence to assist existing efforts to address food fraud. The

central objective of the project would be to construct a conceptual economic model which

will be able to inform authorities about the areas where enforcement against food fraud

should be prioritised. The methodology developed in this report is preliminary and would

inform potential future work to develop the model further.

a. Food fraud

Some recent food fraud cases have received considerable media attention. Most notably,

horse meat DNA was detected in frozen beef hamburgers sold in several European

countries, including the UK, in January 2013. A number of experts, including Prof. Chris

Elliott, have suggested that economic intelligence could exploit existing market data to

better direct enforcement efforts to detect and prevent food fraud. For example, the Elliott

final review recommends:

“The FSA should take the lead in the collection, analysis and distribution of information

and intelligence from a wide range of sources (including Governmental e.g. local

authorities, police, EU counterparts) acting as an ‘intelligence hub’. Through this

intelligence hub, the FSA needs to develop its links with the research sector to produce

and share horizon scanning analyses of the commodities or markets considered at most

risk from crime due to trade route complexity, commodity price fluctuations, crop failures,

fishing restrictions, the development of premium markets through labelling, and criminal

ingenuity.”

Food fraud is defined as “the deliberate placing on the market, for financial gain, goods

which are falsely described or otherwise intended to deceive the consumer”.3 This includes

the “substitution, addition, tampering, or misrepresentation of food, food ingredients, or

food packaging; or false or misleading statements made about a product, for economic

gain”.4 Common types of food fraud are adulteration, misbranding and counterfeiting.5

According to Spink and Moyer (2010), the main types of economically motivated

adulteration (EMA) of food are:

- Dilution.
- Substitution.
- Artificially increasing weight.
- Trans-shipment, disguising true country-of-origin.
- Port shopping.
- Theft.
- Mislabelling, counterfeit, etc.

3 Elliott, Chris (2014) “Elliott Review into the Integrity and Assurance of Food Supply Networks – Final Report”, July 2014.

4 Spink, John and Douglas C. Moyer (2011) “Defining the Public Health Threat of Food Fraud”, Journal of Food Science Vol. 76, Nr. 9.

5 Food fraud is, by definition, economically motivated. However, we should note that its consequences are not only financial: they also include public health and safety concerns.

Food fraud generates costs to various members of society. These include not only the

direct costs to the producers that committed fraud, but also indirect costs to buyers and

sellers of that product and of food in general, and impacts on consumer confidence in the

integrity of the food supply. The following are some notable examples of these costs:

- Final consumer: food quality and food safety, overpayment for non-authentic
  products, ethical and religious considerations associated with consuming products
  that do not respect their beliefs.
- Retailer: reputational damage, costs associated with recall and disposal of
  fraudulent merchandise, costs of performing quality assurance.
- Other producers and the food industry in general: reputational costs due to
  diminished consumer confidence.
- Society as a whole: by distorting prices, food fraud can interfere with the efficient
  allocation of resources.
- Taxpayer: costs of food authenticity enforcement, potential loss of tax
  revenue from VAT and customs duties.

b. Project objectives

The project objectives are the following:

- Conduct a review of the literature on food fraud and the literature on the use of
  economic intelligence to detect and prevent fraud in areas beyond food.
- Develop a framework to estimate the risk of food fraud in various food products.
- Scope potential data sources that would feed into the model.
- Assess the feasibility of the application of the framework to a wide selection of food
  products given the information constraints.
- Validate the framework using a case study.

This project would be the initial phase of a potential future programme of follow-on work,

yet to be confirmed, where the proposed framework would be applied systematically to

several other sectors beyond the case study.

c. Structure of the report

The remainder of the report is organised into the following main sections:

- Summary of the literature review on food fraud and the economics of fraud.
- Compilation of the factors that affect the risk of food fraud identified in the literature.
- Description of the methodology.
- Testing of the methodology, using adulteration of Basmati rice as a case study.
- Conclusions and recommendations.

The literature review identifies the approaches that have been used to construct economic

models of fraud, whether in food or other areas, and the limitations associated with these

approaches. This review informed our judgement in developing a methodology that could

be used to predict the risk of food fraud in the UK based on economic variables.

Based on the literature review, this report presents a comprehensive compilation of factors

that affect the risk of food fraud pointed out by various authors. These are classified into:

- Economic factors and market characteristics.
- Production and distribution factors.
- Product characteristics and detection technology.
- Institutional and enforcement factors.

3. Literature Review

The outcomes obtained in this report have been informed by a review of the literature on

food fraud, the economics of fraud, statistical methodologies used to predict fraud and the

potential data sources that could be used for this purpose.6 The literature covered

includes mainly scholarly articles or policy reports. Based on this review, it was possible to

identify:

- Some fundamental characteristics of food fraud in the UK and other countries.
- The general approach to modelling fraud from an economic theory perspective.
- The factors that have been postulated as contributors to the risk of food fraud.
- Methodologies used to detect and predict fraud (not necessarily on food), either
  based on economic analysis or other approaches.
- Variables and data sources that can be employed in statistical models of fraud.

This section presents a short review of the first two items. A more detailed review of

literature on fraud and how it is applied in economics is included as an Annex (Annex I).

The factors that affect the risk of food fraud that have been identified in the literature are

discussed in the next section. Reviews of the methodologies and of the data sources are

presented in two separate Annexes (Annex II & III).

a. Food fraud

The UK food and beverage market in 2013 was estimated to be worth £196bn.7 Food fraud

is relevant to a wide variety of food products. For example, Shears (2010) reports that in

1999, 8 per cent of on-licensed outlets in the UK were substituting at least one spirit

brand.8 The value of this particular fraud is estimated at £43 million per year.

Approximately 17,000 litres of fake vodka worth £1m were seized in one interception in

2013 alone.9 HM Revenue and Customs estimates that beer smuggling costs the Treasury

around £500m a year. In 2007 the FSA set up a food-fraud database. The amount of

testing has increased sharply in the last couple of years, with 1,538 cases of food fraud (in

all products) identified in 2013 alone. Other examples with well-documented instances of

food fraud include fish (e.g. salmon and cod), basmati rice, honey, olive oil and asparagus.

6 We note that the approach taken was of a standard literature review. For other approaches, see http://www.civilservice.gov.uk/networks/gsr/resources-and-guidance/rapid-evidence-assessment/what-is. Other approaches to reviewing evidence such as Rapid Evidence Assessment were not followed due to the short timescales of this project and the wide scope of the economics literature on various types of fraud.

7 https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/315418/foodpocketbook-2013update-29may14.pdf.

8 Shears, Peter (2010) “Food fraud – a current issue but an old problem”, British Food Journal 112.2, 198-213.

9 The Economist (2014) “Food Crime: A la cartel”, March 15, 2014, available at: http://www.economist.com/news/britain/21599028-organised-gangs-have-growing-appetite-food-crime-la-cartel

In an effort to expand the literature on food fraud, FERA has commissioned a report on the
identification of information concerning food fraud in the UK and elsewhere.10 The key
findings of the report include:

- The report identified 35 different sources of information regarding food fraud, including
  individual companies, trade associations, consumer groups and private sector
  laboratories.
- Due to the wide range of food fraud cases that may occur, it is almost impossible to
  identify a single source of information.
- The UK Food Surveillance System (UKFSS) provides a good starting point for
  future research that would monitor enforcement efforts in the food authenticity area.
- The collection and analysis of food fraud data at a European and international level
  is not satisfactory.

Food adulteration is mainly driven by economic incentives and, although in most cases it
does not pose any health risks, it should not be overlooked by public authorities.
Furthermore, a few cases have shown that food adulteration can pose considerable health
risks. Such cases include the Czech Republic case in which fake alcohol caused 19 deaths,
and the 2012 case of substitution of almonds with peanuts in the UK, which put a small
category of allergic consumers at severe health risk. Consequently, the need for monitoring
food authenticity on an on-going basis becomes even more essential.

Despite the striking figures associated with food fraud, it has been recognised by experts

that the existing data on food fraud does not give a reliable estimate of its overall extent.

For instance, Everstine et al. (2013) find that there are gaps in quality assurance testing

methodologies that could be exploited for economic gain.11 They claim that large-scale

EMA incidents have been described in the scientific literature, but smaller incidents have

been documented only in media sources. For this reason, the authors have spent

substantial efforts in recent years to construct the EMA database (see Annex I for a

description). In a similar vein, Johnson (2014) emphasises that it is typically not possible

for enforcement agencies to prosecute every instance of food fraud given the wide variety

of known types of fraud and constraints in resources.12

The literature has made significant progress in identifying a large number of potential
determinants of food fraud. Fairchild et al. (2003) provide a typology of these factors.13
They note that one motivation behind economic adulteration is typically the opportunity to
reduce costs and increase profits per unit sold by increasing prices to the level of
unadulterated products, or to reduce input costs and lower selling price to increase sales
volume and/or market share. Cost differences can be significant enough that firms selling
adulterated product can cause economic injury to competing firms, sometimes selling
below product cost for pure products and sometimes driving producers and packers out of
business.

10 Dennis, J., & Kelly, S. (2013) “The identification of sources of information concerning food fraud in the UK and elsewhere”.

11 Everstine, K., Spink, J., & Kennedy, S. (2013). Economically motivated adulteration (EMA) of food: common characteristics of EMA incidents. Journal of Food Protection, 76(4), 723-735.

12 Johnson, R. (2014) “Food Fraud and ‘Economically Motivated Adulteration’ of Food and Food Ingredients”, Congressional Research Service.

13 Fairchild, G. F., Nichols, J. P., & Capps, O. (2003). Observations on economic adulteration of high-value food products: The honey case. Journal of Food Distribution Research, 34(2), 38-45.

Spink and Moyer (2011) provide additional insights into the motivations for seeking food

fraud opportunities:

“Brand growth and increased brand recognition of a product actually increases the fraud

opportunity (that is, more victims, spending and brand equity). Finally the guardian or

hurdle gaps lead to a greater fraud opportunity. Guardians include entities that monitor or

protect the product and could include customs, federal or local law enforcement, trade

associations, nongovernmental organizations, or individual companies themselves.

Hurdles include components or systems that exist (or are put in place) to reduce the fraud

opportunity by assisting in detection or providing a deterrence.”

They note that “fraud opportunities could be reduced by increasing the risk of detection, or

increasing the costs of the necessary technology to commit the fraud and/or of developing

quality levels that would attract consumers. Countermeasures are intended to reduce the

fraud opportunity, but a refinement to a process or a narrowing of focus in detection could

inadvertently create new gaps that could be exploited by fraudsters. An example of this

uncertain nature is that fraudsters may shift ports of entry by conducting strategic “port

shopping” and by shipping fraudulent product through less monitored entry points.14

The Elliott Review notes that the global nature of the current food markets gives UK

consumers access to all types of products even when they are out of season. This means

that the supply chain for food has become much more complex as a number of these

products must be imported from abroad. Consumers have become used to variety, taste

and access at low cost. All of these factors have increased opportunities for mislabelling,

substitution and for food crime.

The literature that uses economic methods to study the adulteration of food products is
very limited. A notable exception is provided by Pouliot (2012). This study focuses on the
economics of adulteration in food imports, particularly by applying principles of economic
theory to analyse the case of imported fish and seafood in the USA.15 The report aims to
prove that economic incentives can be the main driver of food adulteration in the USA.
Economic variables such as prices, supply and demand levels and country of origin were
found to be significant in predicting adulteration in food imports. Pouliot also makes
reference to the PREDICT forecasting system in the US, which assesses the risk of
adulterated imports and identifies those products that are more likely to be fraudulent,
therefore helping inspectors to concentrate their efforts on riskier imports. The PREDICT
system employs a data mining technique which analyses information regarding the
provenance of the product, the type of product, weather information, the name of the
exporting firm and labelling information.

14 Pouliot, S. (2012). On the economics of adulteration in food imports: application to US fish and seafood imports. Cahier de recherche/Working paper, 2012, 15.

15 Pouliot, S. (2012). On the economics of adulteration in food imports: application to US fish and seafood imports. Cahier de recherche/Working paper, 2012, 15.

Using a theoretical perspective, Liang and Jensen (2007) construct a model of imperfect

food certification, opportunistic behaviour and detection.16 The analysis finds that farmers

are expected to respond to monitoring and enforcement very swiftly. Not only do the levels

of fraudulent activity decrease but also high-safety output increases. The study notes that

the optimal monitoring effort depends on the characteristics of the farms, such as size and

costs of production. Finally, fraudulent activity should be tackled with a combination of

penalties, sales bans and monitoring activities. Similarly, the Elliott Review recommends

an approach that would increase the difficulty for criminals to operate in food networks by

introducing new measures to check, test and investigate any suspicious activity. In addition, the Review suggests that those caught engaging in food fraud must be

severely punished by the law to deter further fraud.

b. The economics of fraud

Food fraud is motivated by economic gain. Therefore, to estimate the risk of fraud, it is necessary to identify the economic profits that a potential food fraudster could expect to obtain. The main components of the profits for fraudsters include:

Benefits: the difference between the prices of the authentic product and the adulterant, multiplied by volume.

Costs: penalties, reputational damage and the development of new supply chains or technologies.

Probability of detection: according to research by Spink (2011), increasing the risk

of detection or increasing the cost of the technology required to adulterate a product

can reduce fraud opportunities.17

The benefits are straightforward to model: they consist of the gain per unit of final product from replacing the authentic ingredient with a fraudulent one, multiplied by the number of

units sold of final product. The difference in prices could be given by the lower quality of

the adulterant or the availability of a surplus amount of this ingredient.

The costs of committing fraud are somewhat more complex. There are two sources of costs for the fraudster. First, the fraudster may have to incur expenses to make the substitution of the

authentic product feasible. These costs may include modifying the productive process,

logistics and research into how to modify the product most effectively. In addition, fraud

might constrain the type of markets in which a producer may operate without exposing

themselves to a high risk of detection. The second category includes the costs that will

have to be paid only in case of detection. They can be modelled as the probability of

detection multiplied by the penalties. The latter can include costs such as fines, bans and

16 Liang, J., & Jensen, H. H. (2007). Imperfect food certification, opportunistic behaviors and detection. Selected Paper, 175174.

17 Spink, J., & Moyer, D. (2011). Defining the Public Health Threat of Food Fraud. Journal of Food Science.


reputation effects. It should be noted that a potential fraudster could face a trade-off

between these two costs. For example, a higher investment in the substitution process

might result in a lower probability of detection. This approach would be informed by the

FSA database of previous fraud cases to identify the relevant food products to be

modelled.
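As an illustration only, the cost-benefit framing above can be sketched in a few lines of Python. The function name, its parameters and all figures below are hypothetical; they are not taken from this report or from the FSA database, and the sketch assumes a single product with a fixed substitution cost.

```python
def expected_fraud_profit(price_authentic, price_adulterant, units_sold,
                          substitution_cost, detection_probability, penalty):
    """Expected profit from fraud under the cost-benefit framing above.

    All arguments are hypothetical inputs chosen for illustration.
    """
    # Benefit: gain per unit from the substitution, times units sold.
    benefit = (price_authentic - price_adulterant) * units_sold
    # Second cost category: paid only if detected, so it is weighted
    # by the probability of detection.
    expected_penalty = detection_probability * penalty
    # First cost category: expense of making the substitution feasible.
    return benefit - substitution_cost - expected_penalty


# Hypothetical figures, chosen only to illustrate the calculation.
profit = expected_fraud_profit(
    price_authentic=5.0, price_adulterant=3.0, units_sold=10_000,
    substitution_cost=4_000.0, detection_probability=0.25, penalty=40_000.0)
print(profit)  # 6000.0
```

Under these assumed figures fraud remains profitable in expectation; raising either the detection probability or the penalty reduces the expected profit.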

Becker (1974)18 provided a pioneering economic approach to understand criminal activity.

Becker emphasises the role of the costs and benefits of criminal activity, both from the

criminal’s and society’s perspective. From the criminal’s point of view, the relevant factors

that determine whether to commit offences are the potential gains and the probability of

conviction with its associated punishment. From the point of view of society, optimal

enforcement policies would depend on the damages caused by crime together with the

costs of increasing penalties or the level of enforcement. Becker discusses how different

combinations of penalties and probability of conviction might result in different levels of

crime and its associated costs to society. For example, public policy might focus on

increasing the likelihood of detection or increasing the associated fines. Becker shows how

different public policy instruments provide different incentives to criminals.
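The trade-off between the probability of conviction and the size of the penalty can be illustrated with a simple calculation. The sketch below assumes a risk-neutral offender who is deterred once the expected penalty (probability multiplied by fine) is at least as large as the gain. This assumption, the function name and the figures are illustrative simplifications and are not taken from Becker (1974).

```python
def minimum_deterrent_fine(gain, detection_probability):
    """Smallest fine for which the expected penalty equals the gain.

    Assumes a risk-neutral offender (an illustrative simplification).
    """
    if not 0 < detection_probability <= 1:
        raise ValueError("detection_probability must be in (0, 1]")
    return gain / detection_probability


# Hypothetical gain from fraud and three hypothetical enforcement levels.
gain = 6_000.0
for probability in (0.1, 0.25, 0.5):
    fine = minimum_deterrent_fine(gain, probability)
    print(probability, fine)
```

The lower the probability of detection, the higher the fine needed to achieve the same expected penalty, which is one way of seeing why different combinations of instruments can produce similar deterrence at different enforcement costs.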

The economics literature that addresses food fraud is relatively small. It includes the work

of Liang and Jensen (2007)19 and Pouliot (2012a20, 2012b21). Pouliot (2012a) claims that

the decision by an exporting firm to adulterate its output depends on the relative price of

inputs and the ability of the importing country to detect adulteration. Pouliot (2012a)

concludes that the country of origin, the port of entry, product code and product description

are determinants of fraud. Liang and Jensen (2007) emphasise the effect of the monitoring agency’s effort on the risk of fraud. A reduction in this risk can be achieved through a combination of penalties, sales bans and monitoring activities. The authors do not elaborate further on how these can be achieved.

c. Fraud in other areas

There is abundant literature on fraud in a number of different areas beyond food.

Moreover, economic analysis has been applied to a number of them. Examples of activities that are subject to fraud are presented in Table 3.1. The conclusions obtained by the literature in these areas are presented in a separate Annex (Annex I). Table 3.1 also

discusses the main methods used to analyse fraud in these areas. A description of these

methods is included in the Methodology section below and Annex II.

18 Becker, G. S. (1974) “Crime and punishment: An economic approach” In Essays in the Economics of Crime and Punishment, Gary S. Becker and William M. Landes, eds. UMI.

19 Liang, J., & Jensen, H. H. (2007) “Imperfect food certification, opportunistic behaviors and detection.”

20 Pouliot, S. (2012) “Using economic variables to identify adulteration in food imports: application to US seafood imports.”

21 Pouliot, S. (2012). On the economics of adulteration in food imports: application to US fish and seafood imports. Cahier de recherche/Working paper, 2012, 15.


Table 3.1: Activities subject to fraud studied in the literature

Type of fraud | Area | Method | Data requirements
Misrepresentation | Credit Card | Data Mining | Very high
Misrepresentation | Credit Card | Econometrics - Logit Model | High
Misrepresentation | Tax Compliance | Panel Data Techniques | Moderate
Misrepresentation | Insurance | Econometrics | Moderate - High
Misrepresentation | Insurance | Survey | Moderate
Misrepresentation | E-commerce | Data Mining | High
Counterfeit | Pharmaceuticals | Econometrics | Moderate
Counterfeit | Luxury Goods | Econometrics - Indices | Moderate
Counterfeit | Tobacco | Econometrics and Simulations | High - Data mostly confidential
Counterfeit | Art - paintings | Indices | High - Highly specialised data

d. Conclusions

The main lesson learned from the literature review is that there is very little, if any, existing

research that applies economic intelligence to model or predict food fraud. However, the

literature on food fraud is very rich in identifying factors that may determine the likelihood

of this type of fraud. The next section discusses these factors, informed by the reviewed

literature.

The literature review has found that economic intelligence has been applied frequently to

other types of fraud. The above discussion presents the general approach shared in the

vast majority of economics reports to conceptualise fraud. Annex I provides a detailed

review of the literature of economic intelligence used to address fraud in areas other than

food.

The key remaining question is whether the identified methodologies used in other areas

can be applied to food fraud. The methodology section gives a positive answer to this


question, by proposing an approach that adopts many of the elements used in the

literature.


4. Factors that Affect the Risk of Fraud

This section presents a comprehensive list of factors that are potential determinants of the

risk of food fraud. We have collected these from various sources, including our literature

review and consultations with subject experts. The identified factors are grouped into four

categories:

Economic factors and market characteristics.

Production and distribution.

Product characteristics and detection technology.

Institutional and enforcement factors.

a. Economic factors and market characteristics

Economic information would allow us to estimate the potential gains to a producer from engaging in food fraud. Such gains would depend primarily on prices and

volumes which, in turn, determine total revenue and profit margins.

Economic data would include, at least, market prices and volumes produced. Price data

should cover historical series, up-to-date prices and current prices from futures markets,

where relevant. The longer the data series, the better placed we will be to draw accurate conclusions and validate our methodology. From price and volume data it would be possible to construct additional variables, such as indicators of rapid changes in market conditions. We would also evaluate whether events such as crop failures should be included in the analysis. This would be done only when there is reason to believe

that their effects might not be appropriately captured by commodity prices and volumes.

The relevant economic variables that we have identified include the following:

Price gap between authentic product/ingredient and adulterant: the larger this gap, the higher the likelihood of fraud, because a larger gap allows the supplier to gain a higher profit margin on each transaction. The gap widens when the price of the authentic food increases or the price of the adulterant decreases. Food products whose prices tend to rise over time would therefore be more likely to be adulterated.

High profit margin: when the difference between price and cost is large there are strong incentives to commit fraud even at a small scale. For example, fraudsters may substitute or adulterate branded vodka with cheap non-branded vodka. Examples of food products that have high profit margins are alcohol, poultry and chocolate.

Scale: even with small margins, products that are sold on a large scale might generate large total profits and, consequently, are attractive to fraudsters. At the same time, however, frauds at a smaller scale are less likely to be detected, which gives fraudsters an additional incentive to operate at that scale. Thus, ceteris paribus, it is expected that fewer instances of fraud would occur at a large scale.


Brand ownership: owners of recognised brands have an incentive to protect them and increase the assurance controls imposed upon suppliers. At the same time, brands that have high values might incentivise counterfeiting because they are in high demand among consumers.

Known imbalances in quantities between primary production and final distribution:

increases in demand or reductions in supply can create unsatisfied demand (at

least in the short run) and create an incentive for fraud. Similarly, a large supply of a cheap ingredient may increase the incentive to use it as an adulterant. Such imbalances are more likely when the particular product is vulnerable to environmental conditions. For instance, tomatoes are susceptible to bad winter weather, so a bad winter may reduce the supply of tomatoes. The fresh product itself may not be adulterated, but there is a high probability that derivatives such as tomato juice and ketchup will be. Another example is caviar. The supply of caviar has remained constant whereas demand for it has been increasing steadily, which has pushed its price upwards.22 This provides an opportunity for suppliers to adulterate the food and gain higher profit margins over time.

Relation to organised crime: recent opinion suggests that organised crime groups

are increasingly involved in food fraud adulteration.23
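Two of the variables discussed above, the price gap and rapid changes in market conditions, can be derived directly from price series. The following is a minimal sketch under stated assumptions: the function names, the 20 per cent threshold and the price figures are hypothetical and are not drawn from the data sources reviewed for this report.

```python
def price_gap_ratio(price_authentic, price_adulterant):
    """Relative price gap: authentic price divided by adulterant price."""
    return price_authentic / price_adulterant


def rapid_change_flags(prices, threshold=0.2):
    """Flag each period whose price moved by more than `threshold`
    (as a proportion) relative to the previous period."""
    flags = []
    for previous, current in zip(prices, prices[1:]):
        change = (current - previous) / previous
        flags.append(abs(change) > threshold)
    return flags


# Hypothetical price series for four consecutive periods.
authentic = [100.0, 104.0, 130.0, 128.0]
adulterant = [50.0, 52.0, 52.0, 64.0]

gaps = [price_gap_ratio(a, b) for a, b in zip(authentic, adulterant)]
flags = rapid_change_flags(authentic)
print(gaps)   # [2.0, 2.0, 2.5, 2.0]
print(flags)  # [False, True, False]
```

In this hypothetical series the third period shows both a rapid price rise and a wider gap, which is the kind of period the factors above would single out for closer attention.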

b. Production and distribution

Food products have various production and distribution processes. Some of these

processes are influenced by features such as the geographical location of producers and

consumers or product characteristics (e.g. the number of ingredients, whether a product is

fresh or frozen - see below). Production and distribution factors that may affect the risk of

fraud include:

Long/complex supply chain: this occurs when products have a large number of ingredients, when ingredients are themselves made up of several other ingredients and/or when a large number of companies are involved in the supply chain. Complexity in the supply chain is difficult for the producer to manage (the producer runs the risk of purchasing adulterated ingredients) and for buyers or authorities to audit.

Rapid increases in supplies and sales: this may occur due to changes in consumer

preferences. A particular food could turn into a “super food” overnight because a

news article has argued that it is particularly healthy. A sudden surge in demand for

that particular food is more likely to be met by an increase in the supply of the

adulterated version of the product.

22 http://www.cnbc.com/id/100838720

23 Dennis, J., & Kelly, S. (2013) “The identification of sources of information concerning food fraud in the UK and elsewhere”.


Point of entry into the UK: some ports are suspected to have less strict checks and border controls than others. This could encourage fraudsters to import adulterated ingredients through those particular ports, as they are less likely to be detected there.24

Supply chain assurance: a number of retailers and wholesalers use sophisticated

operations systems that ensure that all their suppliers satisfy the necessary

requirements and the food they sell to the final consumers is as authentic as

possible. Nonetheless, some retailers and distributors may not have the necessary

resources to engage in such detailed audits, thus the products that they sell to the

final consumers are more likely to be adulterated.

c. Product characteristics and detection technologies

The physical characteristics of food products may influence the ease of committing fraud

as certain adulterations are more difficult to detect. In addition, technological and economic constraints limit the detection methods available to local authorities. The

present section discusses these two sets of factors.

Product characteristics

These characteristics would include supply chain drivers such as country of origin, shelf-

life, amount of time required for production and trade route complexity, as well as other

variables such as the physical similarity between the authentic and fraudulent products.25

Physical state of the food: the physical state of the food affects the chances of

detection by buyers and authorities. The Outsmart project has identified the

following product characteristics that might make detection difficult (in decreasing

order of difficulty): liquid, ground, prepared, powder, mixed consistency, non-

characteristic colour, homogeneous consistency, dried, colourless and frozen.26

Sold and transported in bulk processed form: if the product has already been

processed it becomes more difficult for the retailer to notice if anything suspicious

has gone into the production of the good.

New product: there might be a short period, after the introduction or rapid increase

in the scale of a product, where detection is less likely. This might be because there

is little experience and proper enforcement procedures might not be in place. That

could be the case, for example, for products that increase suddenly in popularity

having been the subject of health and/or beauty claims.

Cheaper adulterant not easily detectable: an example is provided by fish of the same species. It is challenging to distinguish farmed salmon from wild salmon

24 http://www.sundaypost.com/news-views/uk/after-horsemeat-scandal-food-fraud-is-still-rife-1.457754

25 For example, horsemeat can be distinguished from beef via DNA testing, whilst the difference between wild and farmed salmon or organic and non-organic products might be more difficult to test.

26 NSF Safety and Quality UK Ltd (2014), “Risk Modelling Of Food Fraud Temptation - 'Outsmart' Intelligent Risk Model Scoping Project”.


based on DNA testing. Thus, the fraudster has a stronger incentive to substitute one fish for the other and increase his profit margins.

Cost of detection methodologies: the more expensive it is to detect the presence of

an adulterant in a product (see below) the higher the probability of fraud.

Low concentration of the adulterated ingredient: a smaller amount of the ingredient being adulterated might reduce the probability of detection.

Cost of the process of adulteration: in addition to variable costs (i.e. costs that

depend on the volume of production, such as the amount of adulterant), there might

be fixed costs. In order to achieve the adulteration the fraudster might need to

invest in a new type of technology, which could be expensive enough to deter him

from committing fraud.

Labelling/tamper proofing: the easier it is to remove or recreate the label of the

original product and attach it to the adulterated product the higher the probability of

fraud. This is not confined to the label; it also includes the packaging of the

authentic food product.

Detection methods

There is a wide variety of methods available to local authorities to test for the authenticity

of food products. Some of these methods, such as immunoassays, microscopy and analysis of nitrogen content, have a low cost. However, many other

detection technologies exist that have a higher cost and, consequently, are used less

frequently. Despite the fact that these technologies exist, cost barriers might restrict their

availability. Examples of such technologies are:27

Stable isotope ratio analysis (SIRA): used to determine geographic origin and production method. This method is used mainly for agricultural products, where nutrients are drawn from the earth and production is generally restricted to a certain geographical area. Applications include the identification of Italian tomatoes, as Italian growing areas have a different isotopic profile from those of other countries. In the case of counterfeit wine, isotopic analysis is the most appropriate way to help assess the extent of fraudulent wine present in the UK market.

DNA methods: used for species/variety identification, e.g. meat breeds, fish. An example of the usefulness of DNA identification methods is the case of olive oil. DNA analysis can be performed to determine the variety of the olive and thus to check that the olive oil is authentic (Pafundo et al 2005).28

Proteomics: identifies peptide biomarkers in complex samples; searched against

databases (known protein sequences) to identify protein origin. There is potential

use for this method in meat, fish and products with defined protein ratios.

27 See Rollinson, S. (2014) “The UK Food Authenticity Programme”, Presentation at the Food Fraud Analytical Tools Conference. Available at: https://secure.fera.defra.gov.uk/foodintegrity/downloadDocument.cfm?id=101

28 Pafundo, S., Agrimonti, C., & Marmiroli, N. (2005). Traceability of plant contribution in olive oil by amplified fragment length polymorphisms. Journal of agricultural and food chemistry, 53(18), 6995-7002.


Others: metabolomics (used to identify biomarkers in fruit juices), low molecular weight compounds in cells/tissues (to obtain a “fingerprint” profile for honey), metagenomics (quantification of DNA in products, such as the amount of a species’ DNA in composite animal products) and lectin chips (analysis of glycoproteins, glycolipids and polysaccharides, used for cheese and milk adulteration analysis).

d. Institutional and enforcement characteristics

Measures of the level of enforcement and other institutional characteristics could provide

information about the probability of detection. These variables could be obtained, for

example, from data on past investigations and the legal framework for particular products.

Potential variables of interest include:

Testing frequency: this refers to how often tests are conducted. More frequent

testing would increase the probability of detection. Similar results were reported in

the tobacco industry where, according to a report by Deloitte, the amount of illicit

tobacco trade decreased when authorities increased the level of enforcement.29

Testing intensity: this could be captured by the number of producers that are

surveyed. A higher testing intensity would increase the probability of detection.30

Penalties in case of detection: these might include direct monetary penalties, reputation effects (loss of trust by buyers/consumers), seizure, prohibition from continuing to trade and/or prosecution. If the adulteration poses health risks, penalties might be even more severe. Liang and Jensen (2007)31 find that, in a theoretical framework, sufficiently severe penalties in case of detection could reduce the probability of fraud to zero.

Consumer effects: the level of harm inflicted on consumers might affect the costs of

committing fraud beyond the formal penalties applied in case of detection. These

wider responses might include increased government intervention (e.g. in the form

of additional regulation), consumer retaliation (e.g. the existence or suspicion of

fraud might lead consumers to be reluctant to purchase certain products or

purchase from certain outlets) and reputational damage.

Association with organised crime: some authors claim that food fraud might be

closely related to organised crime.32 It might be possible to identify this link from tax

and financial data, although this would generate extra costs for local authorities and such data may not even be available to institutions other than the police.

The FSA Food Fraud Database, however, can prove a useful tool as it documents

all possible suspicions that have been reported regarding a given supplier. Thus,

29 http://www.bata.com.au/group/sites/bat_7wykg8.nsf/vwPagesWebLive/DO7WZEX6/$FILE/medMD8EHAM5.pdf?openelement

30 We note that the level of monitoring conducted by local authorities and other enforcement agencies is constrained by the detection technologies available to them. The preceding section provides details of these methods.

31 Liang, J., & Jensen, H. H. (2007). Imperfect food certification, opportunistic behaviors and detection. Selected Paper, 175174.

32 See, for example, the Elliott Review.


the more reports that mention the same individual or company, the higher the probability that the individual or company is engaged in fraud and should be investigated. This is corroborated by the findings of Dennis and Kelly (2013).33

33 Dennis, J., & Kelly, S. (2013) “The identification of sources of information concerning food fraud in the UK and elsewhere”.


5. Methodology

a. Selecting a methodology

Based on the literature review we have considered several methodologies that have been

used in the past to quantitatively estimate the risk of fraud. A description of these can be

found in Annex II.

These were grouped into three broad categories:

Risk indices: these are statistical measures represented by numbers placed on a

given scale that identify how high the risk of food fraud is. Indices take into account

a number of factors that have been identified to have a significant effect on the

probability of fraud. Each variable is given an appropriate weight in determining the

risk of fraud and then all the variables are aggregated to construct the index.

Econometric methods: these methods apply mathematical and statistical techniques to economic data in order to give empirical content to economic relations. The objective of econometrics is to identify (causal) relationships in economic data. The main tool used by econometricians is regression analysis, which is described in more detail in the appendix.

Data mining: these are computational methods mostly used when there is access to

large data sets, e.g. banks observe thousands of daily transactions when trying to

identify credit card fraud. The method employs computerised processes to identify

patterns in the data. For example, these methods might flag behaviour that deviates from existing patterns.
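To make the first of these categories concrete, a risk index of the kind described can be sketched as a weighted average of factor scores. The factor names, scores and weights below are hypothetical and are chosen by judgement purely for illustration, which is exactly the feature of ad-hoc indices discussed later in this section.

```python
def risk_index(scores, weights):
    """Weighted average of factor scores, each scored on a 0 to 1 scale.

    `scores` and `weights` are dictionaries keyed by factor name.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same factors")
    total_weight = sum(weights.values())
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical scores (higher means riskier) and judgement-based weights.
scores = {"price_gap": 0.8,
          "supply_chain_complexity": 0.5,
          "low_testing_intensity": 0.2}
weights = {"price_gap": 2.0,
           "supply_chain_complexity": 1.0,
           "low_testing_intensity": 1.0}

index = risk_index(scores, weights)
print(round(index, 3))  # 0.575
```

Changing the weights changes the index even though the underlying data are the same, which illustrates why the choice of weights matters for this approach.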

We have established criteria for evaluating the methodologies, also taking into account the

data sources available. These criteria are that:

The methodology addresses the desired objectives. In other words, it needs to

identify which factors are relevant to the determination of the fraud risk and to

quantify the relative importance of each. The latter objective would be important to

assess the forward-looking risk of fraud based purely on the identified factors.

It is feasible to satisfy data requirements of the methodology using the data sources

available.

It must be feasible to implement within the time constraints of the project.

The methodology would ideally be applicable both to a single product and multiple

products. For example, the methodology must be testable via a case study.

However, if it is applied to further food products / types of fraud in the future the

methodology should be able to estimate the impact of the differences across

product characteristics.

The methodology efficiently exploits the information available in the data. That is, a

methodology is preferable when it assumes magnitudes or relations in the variables

that can be derived from or verified by the data.


After the review of literature and data sources we have arrived at the following conclusions

when comparing the three approaches listed above:

An econometric approach would produce estimated coefficients that may be used

for prediction of the explained variable. In addition, various statistical methods could

be used to evaluate whether a particular model is a good fit for the data and,

consequently, to what extent the model is addressing the project objectives

satisfactorily.

Data mining methods (such as clustering) typically do not provide a quantification

of the relative importance of different factors. These methods are well-suited for

identifying patterns and hypotheses from existing data. However, the lack of

quantification would limit the model’s predictive power for values of the explanatory

variables that have not been observed in the past. Moreover, these methods might

not be well-suited to compare estimations for different products and obtain general

lessons.

With the exception of the most basic techniques for cluster analysis, data mining

methods typically require a very large number of observations to obtain robust

conclusions. Given our review of available data sources in Annex III, we believe that

this project will not gather sufficient data for any possible case study that would

satisfy these requirements.

The creation of ad-hoc indices does not require large amounts of data. However,

this approach relies on a number of arbitrary judgements (e.g. the weight that is

given to each of the explanatory variables). Econometric methods would instead estimate the explanatory power of each variable from the data, extracting more information from it.

Based on the criteria and conclusions above, we believe an econometric methodology

would provide the best available approach to predict the risk of food fraud given the

existing constraints. Compared to data mining techniques, this methodology would allow

for a clear quantification of the effects that can be validated using a variety of statistical

tests. Moreover, the quantification of past effects would feed straightforwardly into the

prediction of future fraud. Compared to ad-hoc indices, econometric methods identify from the data the factors that have proven relevant for predicting fraud and determine their relative importance. While the econometric approach would be more demanding in terms of data than ad-hoc indices, we believe that there will be sufficient data available to conduct the analysis. Although the data on past fraud available at the moment are limited, the number of observations will be less constraining in the near future.34

34 For example, the case study on Basmati rice in this report has data on past authenticity tests conducted in the UK for 21 months, which might be considered a bare minimum number of observations to obtain statistically significant results. However, at the present rate the sample size would double in two years’ time, considerably increasing the reliability of the statistical results.


b. An econometric methodology

The main tool used in econometrics is regression analysis. Regressions estimate the correlations between an explained variable and (potentially multiple) explanatory variables, thereby quantifying the relationship between them. The sign of each estimated coefficient indicates the direction of the effect that the corresponding explanatory variable has on the explained variable.
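As a minimal illustration of how a regression coefficient and its sign are obtained, the sketch below fits an ordinary least squares line with a single explanatory variable using the standard closed-form expressions. The data are invented for illustration, the function name is hypothetical, and a single-variable linear fit is a deliberate simplification rather than the model proposed in this report.

```python
def simple_ols(x, y):
    """Ordinary least squares with one explanatory variable.

    Returns (intercept, slope) for the fitted line y = intercept + slope * x.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented data: price gap (explanatory) and share of tested samples
# found to be fraudulent (explained).
price_gap = [1.0, 2.0, 3.0, 4.0]
fraud_share = [0.05, 0.10, 0.15, 0.20]

intercept, slope = simple_ols(price_gap, fraud_share)
print(slope > 0)  # True: a wider gap is associated with more fraud here
```

A positive slope in this invented example corresponds to the statement that a larger price gap is associated with a higher level of detected fraud; with several explanatory variables the same idea applies to each coefficient.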

The econometric methodology detailed below follows previous economic approaches to

fraud such as the one applied by Manuela and Paba (2010) to address credit card fraud.35

The methodology presented in that paper can be applied to the case of food fraud. However, we

note that the literature that applies econometric methods to fraud generally follows a

similar approach.

Choice of variables

The review of the literature has identified a large number of variables that could be used

as explanatory variables of past fraud and, consequently, predictors of future fraud. We

consider that our model is well suited to include such variables. In addition, our data

scoping exercise (see Annex III) indicates that it could be feasible to include the following

key variables:36

Explained variable: the most important variable of our methodology is the history of past

fraud, since correlations between this and other variables are the main source of the

methodology’s predictive power. The history of past fraud could potentially be measured in

different ways. For example, it could be a binary variable that takes the value of one if

fraud was detected in the same period and zero otherwise. A more sophisticated measure

would be the percentage of products surveyed that were found to be fraudulent.
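The two candidate measures of past fraud described above can be computed from the same set of test records. The sketch below assumes, hypothetically, that results are available as one list of pass/fail outcomes per period; the data layout, function name and figures are illustrative and do not describe the format of any actual database.

```python
def fraud_measures(test_results):
    """Compute both measures of past fraud for each period.

    `test_results` maps a period label to a list of booleans,
    where True means the sample was found to be fraudulent.
    """
    measures = {}
    for period, results in test_results.items():
        if not results:
            continue  # no tests in this period, so no measure is defined
        measures[period] = {
            # Binary measure: one if any fraud was detected, zero otherwise.
            "any_fraud": int(any(results)),
            # Share of surveyed products found to be fraudulent.
            "fraud_share": sum(results) / len(results),
        }
    return measures


# Hypothetical test outcomes for two months.
tests = {"2013-01": [False, False, True, False],
         "2013-02": [False, False]}
measures = fraud_measures(tests)
print(measures["2013-01"])  # {'any_fraud': 1, 'fraud_share': 0.25}
```

The binary measure discards information about how much fraud was found, whereas the share retains it, which is the sense in which the second measure is more sophisticated.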

Explanatory variables: these are observed variables, identified either in the literature or through empirical findings, that can have an effect on the risk of fraud and thus can “explain” the probability of food fraud. The objective of the methodology will be to

establish and quantify the relationship between the explanatory variables and the

explained variable (i.e. the risk of fraud).

Country of origin of products: food safety regulations and their enforcement vary by

country.

Prices of authentic product and adulterant: fraud is, by definition, economically

motivated. When the price of the authentic product is significantly higher than the

price of the adulterant, the benefit of committing fraud is given by the gap

between them (ceteris paribus). Prices can be measured in absolute terms for each

35 Manuela, P. and Paba A. (2010), "A discrete choice approach to model credit card fraud".

36 The availability of these variables would depend crucially on the product and type of fraud chosen.


ingredient as levels or indices or in relative terms between the authentic ingredient

and adulterant as differences or ratios.

Volumes: these may also have an impact on the incentive to commit fraud. A

shortage of a particular product could drive its price up, thereby increasing the

incentive to commit fraud and produce more adulterated products. Increases in

demand or reductions in supply can create unsatisfied demand (at least in the short

term) and create an incentive for fraud. For instance, such imbalances are likely

when the particular food product is especially vulnerable to environmental conditions.

Conversely, unexpected availability of a cheap ingredient might incentivise its use

as an adulterant of a more expensive one. The key variables of volume to be

included are:

o Production: domestic and in country of origin.

o Consumption: domestic and in country of origin.

o Trade: particularly imports from producing countries to UK.

Rapid changes in the above variables: these may occur due to changes in consumer

preferences. For instance, a particular food product could become very popular

suddenly due to alleged health benefits in the media. A sudden surge in demand for

that particular food may be met by an increase in the supply of the adulterated

version of the product.

Level of enforcement, measured by the intensity of testing: more frequent and

intense testing can increase the probability of being caught, thus decreasing the

incentive to commit fraud.

Product specific variables: these are the idiosyncratic characteristics of a product

that make it more susceptible to food fraud:

o Cost of testing adulteration: higher costs of detecting the presence of an

adulterant in a food product might decrease the intensity of testing and

increase the probability of fraud. Therefore, local authorities with low

resources to audit the food ingredients are more likely to be supplied with

adulterated food.

o Points of entry into the UK typically used: it has been reported that fraudulent

products are more likely to use points of entry with certain characteristics,

possibly due to lower likelihood of detection.

o Physical characteristics of the product such as:

State of the product (e.g. minced, frozen): the physical state of the

food affects the chances of detection by buyers and authorities. The

state of the product could be, for example, liquid, ground, prepared,

powder, mixed consistency, non-characteristic colour, homogeneous

consistency, dried, colourless or frozen.

Shelf life: may affect the probability of detection. In addition, the

financial risk borne by producers or retailers differs if the product is

perishable, which may in turn affect their incentives to commit fraud.

o Form in which the product is traded. For example, it could include the

percentage typically sold in bulk. If the product has already been processed it

becomes more difficult to detect potential adulteration.
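As a rough illustration of how the explained variable and the relative price measures listed above might be constructed, the sketch below uses made-up monthly figures. The field names and all numbers are hypothetical and are not drawn from the data scoping exercise.

```python
# Hypothetical monthly records: samples tested, samples failed, and prices
# (per tonne) of the authentic product and of the likely adulterant.
records = [
    {"period": "2012-01", "tested": 10, "failed": 0, "p_auth": 1000.0, "p_adul": 600.0},
    {"period": "2012-02", "tested": 8, "failed": 2, "p_auth": 1100.0, "p_adul": 550.0},
    {"period": "2012-03", "tested": 0, "failed": 0, "p_auth": 1050.0, "p_adul": 700.0},
]


def build_variables(rec):
    """Return the candidate explained and price variables for one period."""
    tested, failed = rec["tested"], rec["failed"]
    return {
        "period": rec["period"],
        # Binary measure: one if any fraud was detected in the period.
        "fraud_binary": 1 if failed > 0 else 0,
        # Share of surveyed products found fraudulent (undefined if none tested).
        "fraud_share": failed / tested if tested > 0 else None,
        # Relative price measures between authentic product and adulterant.
        "price_gap": rec["p_auth"] - rec["p_adul"],
        "price_ratio": rec["p_auth"] / rec["p_adul"],
    }


variables = [build_variables(r) for r in records]
for row in variables:
    print(row)
```

Periods with no testing are given an undefined share rather than zero, so that the absence of testing is not mistaken for the absence of fraud.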

We recognise that the literature has identified additional variables that are presumed to be

very relevant in the determination of the risk of fraud. A prime example is the

characteristics of the supply chain, such as its length, complexity and visibility (i.e. the

ability of a supplier/retailer to conduct quality assurance of links far removed from them).

Unfortunately, some of these variables are difficult to establish due to issues such as the

following:

They do not have an unambiguous or commonly agreed definition. For example,

complexity of the supply chain might be related to factors such as the number of

ingredients and the number of suppliers or their geographical locations. However, a

precise definition is not available.

They are not measurable. While some commentators emphasise the effectiveness

of supplier quality assurance, some crucial aspects of this process are not

quantifiable.

No reliable source of data is available. Even in the case of straightforward variables,

such as the number of suppliers, there is no systematically collected data available

to track them back in time for a particular food product.

Therefore, it may be that it is not feasible to develop a statistically-based methodology that

incorporates some of these variables into the analysis.

Data requirements

In general, for the methodology to be implemented, most data sources should be rich in

terms of time coverage and frequency, product and ingredient disaggregation and

geographical coverage.

The methodology requires suitable time series for all the variables that coincide in the time

period covered and the frequency of the data.37 It is possible that the data available for a

variable does not cover the whole period of the dataset. For example, if variable A

is available for the period 2010-2013 and variable B for 2012-2014, a regression method

that includes these two variables can only be applied to the period for which there is a perfect

overlap (i.e. 2012-2013). Therefore, when overlap is not perfect, some information will be

discarded. Based on the conceptual importance of the variable, it would be necessary to

decide whether the smaller sample size is justifiable or whether it would be advisable to

exclude the variable for which there is less data available.
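The overlap rule in the example above can be sketched as follows. The series values are placeholders; only the years come from the text.

```python
# Hypothetical annual series; the values are placeholders for illustration.
variable_a = {2010: 1.2, 2011: 1.4, 2012: 1.1, 2013: 1.3}
variable_b = {2012: 0.8, 2013: 0.9, 2014: 1.0}


def common_sample(*series):
    """Keep only the periods that are present in every series."""
    shared = set(series[0])
    for s in series[1:]:
        shared &= set(s)
    return sorted(shared)


periods = common_sample(variable_a, variable_b)
# Periods covered by at least one series but unusable in a joint regression.
discarded = len(set(variable_a) | set(variable_b)) - len(periods)
print(periods)
print(discarded)
```

Counting the discarded periods makes the trade-off explicit: adding a variable with shorter coverage shrinks the sample available for every other variable in the regression.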

37

We note that in certain cases it is possible to modify the frequency of time series by aggregation or interpolation. However, these approaches might not be advisable in all cases, depending on the nature of the variable.

In addition to the above criterion, some further conditions would be desired for the data on

previous incidents of food fraud. The two main criteria would be:

It contains detail about the number of investigations carried out and the number of

incidents detected. This would allow the methodology to assess the extent to which

there is fraud in a given product.

It covers a significant period of time in which potential fraud was investigated.

Differences in potential factors (such as prices) across this period would allow the

methodology to draw conclusions about the relative impact of them on the risk of

fraud.

Descriptive statistics

Before proceeding to construct an econometric model that would identify the probability of

food fraud given a certain number of variables, we would run a series of diagnostic checks.

These checks would flag potential issues or biases in the estimates of the econometric

model.

The checks would include:

Whether the variables are significantly correlated with one another.

The minimum, maximum and average values of each of the variables.

Charts to illustrate the evolution of the key variables over time in order to ensure

that no shocks (structural breaks) have occurred during the period of time under

examination.
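A minimal sketch of the first two checks, using only the Python standard library, might look as follows. The two series are invented for illustration and the function names are ours.

```python
from math import sqrt


def summary(values):
    """Minimum, maximum and mean of a series."""
    return min(values), max(values), sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical series for two candidate explanatory variables.
price_gap = [400.0, 550.0, 350.0, 500.0, 450.0]
imports = [20.0, 26.0, 18.0, 25.0, 21.0]

print(summary(price_gap))
print(round(pearson(price_gap, imports), 3))
```

A correlation close to one between two explanatory variables would suggest that their separate effects may be hard to distinguish in a regression.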

Model specification

Given the nature of food fraud data (in particular of the explained variable), the

methodologies we will test fall into the following three classes of methods:

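Ordinary Least Squares (OLS): this method postulates a relationship of the form:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon,$$

where y is the explained variable, the x's are the explanatory variables, ε is an error term and the βs are the estimated coefficients which weigh the significance of each factor in

determining the risk of fraud.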

Binary methods: these methods are particularly appropriate when the explained variable

can only take the values zero or one. In the case of food fraud, the explained

variable would take the value of one if fraud was detected in a given period or zero

otherwise. The method would estimate the probability of fraud using the functional form of

a cumulative probability distribution instead of a linear function, as postulated by OLS.

Multinomial methods: these models are an extension of the binary methods whereby the

explained variable can take more than two values. These values may or may not be

ordered.
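For reference, the binary class described above is usually written in the following textbook form. This is a general statement rather than a specification taken from this report, and the symbols (y for the fraud indicator, x for the explanatory variables, F for the cumulative distribution) are our notation.

```latex
P(y = 1 \mid x_1, \dots, x_k) = F(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k),
\qquad
F(z) = \frac{e^{z}}{1 + e^{z}} \ \text{(logit)}
\quad \text{or} \quad
F(z) = \Phi(z) \ \text{(probit)}
```

Because F is a cumulative distribution function, the fitted values always lie between zero and one, which a linear OLS specification does not guarantee.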

c. Interpretation and use of the results

Model selection

Based on the diagnostics and tests described above, the methodology would select a

reduced set of models based on:

The statistical significance of the coefficients: whether it is possible to establish that the

coefficients associated with key variables are different from zero with a confidence of

at least 90 per cent.

The absence of data issues that might bias the estimates or make them spurious (e.g.

heteroskedasticity): Appendix IV presents a list of the most common issues that may

bias econometric results together with tests used to detect them. The selection of a

most preferred model would take into account the extent to which these issues are

present or can be corrected.

Goodness of fit indicators (e.g. adjusted R-square): several measures exist that

establish the extent to which a particular model “fits” the data.

The interpretability of the results: complex functional forms might impair a clear and

usable interpretation of the estimated coefficients.
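Two of the criteria above have standard textbook expressions, reproduced here as a reminder rather than as formulas taken from this report. In our notation, n is the number of observations and k the number of explanatory variables excluding the constant.

```latex
t_j = \frac{\hat{\beta}_j}{\operatorname{se}(\hat{\beta}_j)},
\qquad
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
```

The t-statistic is compared with the critical value for the chosen confidence level, while the adjusted R-square penalises the inclusion of additional variables, which matters most when n is small.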

We note that the criteria above do not provide an automatic selection of the appropriate

model, particularly when there is a trade-off between them. A degree of judgement will be

inevitable in weighing the criteria and forming a view as to the preferred model (or set of

models) when these are in conflict.

Limitations of the approach

The econometric methodology proposed in this report follows best practices established in

the literature of fraud and other areas of economics. Consequently, the limitations

associated with this approach have been already identified and discussed extensively.

The issues that are particularly relevant for the application of econometrics to food fraud

are the following:

Sample size. The predictive power of the methodology could be considerably

limited by the number of observations. For example, the case study presented

below is based on 21 observations. This is a very small sample size that might lead

to inconclusive results in the form of coefficients that are not statistically significant.

It is possible that the methodology does not establish a statistically proven

relationship between variables because the number of observations is low rather

than because the relationship does not exist. As shown in the case study, this

small number of observations can establish a relationship between fraud and one

explanatory variable (the price gap between authentic and adulterated products).

However, the model cannot identify statistically significant relationships with more

than one explanatory variable. A larger sample size would allow for the possibility

of establishing these relationships.

Data quality. The reliability of the results of the methodology depends on the quality

of the data used to generate them. If the underlying data is inaccurate, the

estimated coefficients are likely to misrepresent the true relation between variables.

High variation in the estimated coefficients. The proposed methodology suggests

estimating several alternative model specifications, using different combinations of

variables to establish their capacity to predict fraud. Since the coefficient of the

same variable is likely to change across specifications, the outcome of this

exercise might sometimes be more accurately represented as a range, rather than

a point estimate. However, it is possible that the variation in the estimates is too

large to provide a good indication of the true magnitude of the effect of a given

variable. This problem would certainly be exacerbated in small samples.

Omitted relevant variables. Due to difficulties in quantifying some explanatory

variables or lack of available data it might not be possible to include all relevant

explanatory variables in the models to be estimated. In this case, the coefficient of

the variables that are included might be biased, because they would pick up the

effects of other omitted (but correlated) variables.

Lagged effects. It is conceivable that certain economic variables would

affect the risk of fraud only after some period of time. Therefore, the

contemporaneous coefficient might not necessarily be the best indicator of the true

effect of an explanatory variable. It is possible to address this problem by including

lagged variables. However, the form in which lagged effects take place might be

complex and difficult to capture accurately.
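The simplest way of including a lagged variable, as mentioned in the last point above, can be sketched as follows. The data are invented and the one-period lag is an arbitrary choice for illustration.

```python
def lag(series, periods=1):
    """Shift a series back by `periods`, padding the start with None."""
    if periods <= 0:
        return list(series)
    padded = [None] * periods + list(series)
    return padded[:len(series)]


# Hypothetical monthly price gap and fraud indicator.
price_gap = [400.0, 550.0, 350.0, 500.0]
fraud = [0, 0, 1, 0]

gap_lag1 = lag(price_gap, 1)
# Drop periods where the lagged regressor is unavailable.
rows = [
    (f, g, g1)
    for f, g, g1 in zip(fraud, price_gap, gap_lag1)
    if g1 is not None
]
print(gap_lag1)
print(rows)
```

Each additional lag removes one observation from the start of the sample, which compounds the sample size limitation discussed above.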

Prediction of future fraud

Binary choice models can be used to estimate the risk of food fraud. The key characteristic

of these models is that the dependent variable is always drawn from a dichotomous set of

options. Examples of such options include a “yes/no” answer which can easily translate

into “occurrence/not-occurrence” of food fraud.

Binary choice models are usually estimated using maximum likelihood methods. The

estimation process aims at finding the coefficients (weights) for each independent variable

that would maximise the likelihood of observing that particular sample of outcomes

(dependent variables).38 The coefficients can then be used to calculate the expected

probability of fraud, based on a set of observed characteristics.
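To make the estimation step concrete, the sketch below fits a logit model with a single explanatory variable by gradient ascent on the log-likelihood. It is a minimal illustration under our own assumptions: the data are invented, the optimiser is a simple substitute for the routines found in statistical packages, and the function names are hypothetical.

```python
from math import exp


def logistic(z):
    """Logistic cumulative distribution function, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def fit_logit(x, y, steps=20000, rate=0.1):
    """Estimate (intercept, slope) by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        # Residuals y - p are the building blocks of the likelihood gradient.
        resid = [yi - logistic(b0 + b1 * xi) for xi, yi in zip(x, y)]
        b0 += rate * sum(resid) / n
        b1 += rate * sum(r * xi for r, xi in zip(resid, x)) / n
    return b0, b1


def expected_probability(b0, b1, x_new):
    """Expected probability of fraud for an observed characteristic."""
    return logistic(b0 + b1 * x_new)


# Hypothetical sample: price gap (hundreds of USD per tonne) and fraud indicator.
price_gap = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
fraud = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logit(price_gap, fraud)
print(b0, b1)
print(expected_probability(b0, b1, 6.5))
```

The invented sample is deliberately not perfectly separable, since a finite maximum likelihood estimate does not exist when one variable splits the outcomes exactly.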

Expected probabilities can be mapped to an “occurrence/not-occurrence” discrete

outcome by comparing them to a pre-determined threshold value (the choice of the

threshold can be made arbitrarily but an obvious option is to set it to 0.5, which means

treating as “occurrence” those observations with an estimated probability greater than 0.5,

and treating them as “non-occurrence” otherwise).

38

Logit and probit models are typically used for discrete choice models.

The predicted outcomes can be compared to the actual outcomes in order to test the

accuracy of the model (this is known as measuring the model’s goodness of fit). Four

potential scenarios are generated through this process: the model correctly predicted

“non-occurrence” in situations where no fraud was observed; the model incorrectly predicted

“non-occurrence” in situations with fraud; the model incorrectly predicted “occurrence” in

situations with no fraud; and the model correctly predicted “occurrence” in situations where fraud

was observed. These scenarios are labelled A, B, C and D, respectively, in the table

below.

Table 5.1: Measuring the accuracy of the model

| Model prediction | Observed outcome: Non-occurrence | Observed outcome: Occurrence |
|---|---|---|
| Non-occurrence | A | B |
| Occurrence | C | D |

The method is accurate when no fraud is observed where the predicted risk is low and fraud

is observed where the predicted risk is high. Therefore, the accuracy measure of the

prediction is given by (A+D)/(A+B+C+D). Expressed as a percentage, this measure ranges between 0

and 100 (in a model that predicts fraud perfectly, B and C would be equal to zero).
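The threshold rule and the accuracy measure can be sketched together as follows. The probabilities and outcomes are invented, and the accuracy is returned as a fraction rather than a percentage.

```python
def classify(probabilities, threshold=0.5):
    """Map expected probabilities to 1 (occurrence) or 0 (non-occurrence)."""
    return [1 if p > threshold else 0 for p in probabilities]


def accuracy_table(predicted, observed):
    """Count the cells A, B, C and D of Table 5.1 and the accuracy measure."""
    pairs = list(zip(predicted, observed))
    a = sum(1 for p, o in pairs if p == 0 and o == 0)  # correct non-occurrence
    b = sum(1 for p, o in pairs if p == 0 and o == 1)  # missed fraud
    c = sum(1 for p, o in pairs if p == 1 and o == 0)  # false alarm
    d = sum(1 for p, o in pairs if p == 1 and o == 1)  # correct occurrence
    return a, b, c, d, (a + d) / (a + b + c + d)


# Hypothetical expected probabilities and observed outcomes.
probs = [0.10, 0.35, 0.62, 0.48, 0.81, 0.55, 0.20, 0.90]
observed = [0, 0, 1, 1, 1, 0, 0, 1]

result = accuracy_table(classify(probs), observed)
print(result)
```

Reporting B and C separately, rather than only the overall measure, shows whether the errors are mostly missed fraud or mostly false alarms.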

Similar predictions can be made using out-of-sample data. In this situation, the coefficients

of the model would be used to calculate the expected probability of fraud based on a set of

observed characteristics for which there is no observed outcome of fraud. As we see in the

example below, this can be used to forecast a “high” or “low” probability of occurrence for

a set of predictors (for example, variables related to observed differences in prices of real

and adulterated products).

d. Single and multiple products

Applying the methodology to a single product or type of fraud, as is done in the case study

below, introduces certain limitations to the number of variables that can be included. In

particular, it does not allow for testing the effects of features that are characteristics of the

product, market or testing technologies that do not vary over time. Any statistical method

would, by necessity, exploit variation in these characteristics in order to assess their

impact on the likelihood of fraud. However, if these features do not change, the

econometric method would not be able to attribute any impact.39

39

Technically, the regression constant would capture the impact of all time-invariant variables in the case of a single product.

Applying the methodology to multiple products would make it possible to test for the impact of certain

effects that cannot be tested with a single product. These include:

Physical characteristics of the product.

Points of entry into the UK.

Cost of testing the particular adulteration type.

Based on the scope of the project, the case study below implements the methodology for a

single type of fraud (and product) – adulteration of Basmati rice. Therefore, this test case

is not able to estimate the effect of the variables listed above. A future extension to

cover multiple products simultaneously is highly recommended. The estimation of

regressions with multiple products simultaneously would require the use of panel

techniques since the variables of interest would vary not only with time but also with the

different products considered. For most of the methods described above, estimators exist

that address this additional level of complexity.
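The role of variation across products can be illustrated with a small stacked dataset. This is only a sketch of the data layout that panel techniques work with, not an estimator; the products, variable names and figures are all hypothetical.

```python
# Hypothetical stacked ("long") panel: one row per product and period.
panel = [
    {"product": "rice", "period": 2012, "price_gap": 400.0, "test_cost": 175.0},
    {"product": "rice", "period": 2013, "price_gap": 450.0, "test_cost": 175.0},
    {"product": "honey", "period": 2012, "price_gap": 900.0, "test_cost": 60.0},
    {"product": "honey", "period": 2013, "price_gap": 850.0, "test_cost": 60.0},
]


def has_variation(rows, variable):
    """True if the variable takes more than one value in the given rows."""
    return len({row[variable] for row in rows}) > 1


single_product = [row for row in panel if row["product"] == "rice"]

# A time-invariant characteristic cannot be assessed with one product...
print(has_variation(single_product, "test_cost"))
# ...but it does vary once several products are stacked together.
print(has_variation(panel, "test_cost"))
```

In the single-product sample the testing cost is constant, so its effect would be absorbed by the regression constant; only the stacked sample contains the variation needed to estimate it.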

e. New types of fraud

The methodology described above depends crucially on having data on past instances of

fraud and establishing the relationship between fraud and other variables. Once these

effects are estimated, prediction of future fraud is performed under the assumption that

these relations persist over time. However, it is not unusual for authorities to discover new

types of fraud that had not been detected in the past. A direct application of the

methodology presented above would not be feasible in these cases.

If the methodology is applied to multiple types of fraud for which past data is available, it

might be possible to use these estimates for an indirect approach to new types of fraud.

We note that this approach would not be as reliable as the direct approach proposed

above. It would be advisable only for types of fraud that have not

been detected before and for which, therefore, no data is available to implement the direct approach.

The indirect approach would rely on:

Using the multiple product estimation of known types of fraud to quantify the effect

on fraud risk of each of the product characteristics for the new product / type of

fraud.

Construction of an index for the new type of fraud accounting for the effect of its

product characteristics and the effect of other economic variables shown to be

significant in the estimations for known types of fraud.

We note, however, that this indirect approach would have much less statistical reliability

than the direct approach, since it would increase the risk of bias due to the omission of

relevant variables that would be significant only for the new type of fraud but not for those

estimated directly.
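One possible reading of the index described above is a linear combination of the new product's characteristics, weighted by coefficients estimated for known types of fraud. The sketch below assumes that form; the coefficient values, variable names and the linear functional form are all our assumptions, not results from this report.

```python
# Hypothetical coefficients from a multi-product estimation of known frauds.
coefficients = {"price_ratio": 0.8, "sold_in_bulk_share": 0.5, "test_cost": 0.002}


def risk_index(characteristics, coefs):
    """Linear index: sum of coefficient times characteristic value."""
    missing = set(coefs) - set(characteristics)
    if missing:
        raise ValueError(f"missing characteristics: {sorted(missing)}")
    return sum(coefs[name] * characteristics[name] for name in coefs)


# Hypothetical characteristics of a product with no fraud history.
new_product = {"price_ratio": 2.0, "sold_in_bulk_share": 0.4, "test_cost": 150.0}
index = risk_index(new_product, coefficients)
print(index)
```

The function refuses to compute an index when a characteristic is missing, since silently treating it as zero would understate the risk for that product.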

f. Comparison of the proposed approach and the literature

The methodology proposed in this section is firmly based on the same principles applied

by the econometrics literature on fraud. Table 5.2 provides a comparison with each of the

individual reports that were identified. As can be seen in the comparison column, the key

features of the approach proposed in this report agree with the methods used elsewhere in

the literature.

There is ample overlap between the literature and the proposed methodology. In

particular, the literature has extensively used OLS and binary choice models (such as logit

or probit) with past fraud as the explained variable. Some reports have suggested the use

of panel estimation techniques. While these are not part of the main approach proposed

above, their use would become necessary if multiple products are included in the analysis.

Finally, our main approach does not include tools that correct for sample selection issues,

as proposed by Artís, Ayuso and Guillén (1999) and Greene (1998). However, if there

were evidence of a sample selection bias, the proposed methodology could be extended

as suggested by these reports to correct for this problem.

Table 5.2: Comparison with the econometric literature on fraud

| Name of Article | Authors | Fraud Area | Econometric approach taken | Comparison |
|---|---|---|---|---|
| A discrete choice approach to model credit card fraud | Manuela, P. and Paba A. | Credit Card | This report uses binary choice models with fraud as a dependent variable and a set of explanatory variables such as gender, location and currency used for transactions. | The approach used in this report is closely related to the one proposed in this section. |
| Sample selection in credit-scoring models | Greene, W. | Credit Card | The paper employs binary choice models to decide whether to extend credit or not. Additionally, it suggests an OLS regression model for predicting expenditures. | The proposed method also consists of a combination of OLS and binary choice regressions. The dependent variable is also a key past outcome: whether a credit was extended. |
| Modelling different types of automobile insurance fraud behaviour in the Spanish market | Artís, M., Ayuso, M., & Guillén, M. | Insurance | Maximum likelihood estimation with the correction for choice-based sampling in order to take into account the effect of the over-representation of fraud claims. | The method suggested above employs logit estimations which are based on maximum likelihood functions. It does not include a correction for sample selection. |
| The Economic Impact of Counterfeiting and Piracy | OECD | Consumer Goods | The authors construct an index known as the General Trade-Related Index of Counterfeiting for products (GTRIC-p) based on econometric techniques. This index estimates the total amount of counterfeiting based on seizure outcomes. | The report uses OLS regressions to establish the link between counterfeiting and key factors, such as institutional variables. |
| Economic institutions and individual ethics: A study of consumer attitudes toward insurance fraud | Tennyson, S. | Insurance | Ordered probit and OLS regressions to link various factors to measured attitudes towards fraud. | The methods (OLS and ordered probit) are aligned with the ones proposed above. A key conceptual difference is that the explained variable is the attitude towards fraud by consumers. This difference is due to the fact that this report addresses a different research question. |
| Detecting counterfeit antimalarial tablets by near-infrared spectroscopy | Floyd E. Dowell, Elizabeth B. Maghirang, Facundo M. Fernandez, Paul N. Newton and Michael D. Green | Pharmaceuticals | Regressions and indices. | Similarly to the proposed methodology, this report also used regressions to detect the probability of food fraud and then developed an index which shows the risk level of fraud. |
| Analysis of the demand for counterfeit goods | Pamela S. Norum | Luxury goods | T-tests and logit regressions. | Despite trying to quantify a different effect, this report makes use of t-tests and logit regression in a similar manner to the proposed methodology. |
| Estimating dynamic demand for cigarettes using panel data: the effects of bootlegging, taxation and advertising reconsidered | Baltagi, B. H., & Levin, D. | Tobacco | Panel data techniques. | The methods used by this report are closely related to the proposed approach in the case with multiple products, where panel techniques are recommended. |
| The Research and Application of Art Price Index | Danting Chang | Art / Paintings | Indices - the "Art Price Index". | Despite not having a fully econometric approach, this report constructed indices to classify the different degrees of risk of fraud, similarly to the proposed methodology. |
| On the Economics of Adulteration in Food Imports: Application to US Fish and Seafood Imports | Sébastien Pouliot | Food | Simulations and calibrations. The paper aims to simulate the effect of the Gulf of Mexico disaster on food fraud. The author therefore builds a theoretical model and simulates it using values estimated in the literature for the relevant coefficients. | This report does not follow an econometric approach. However, it uses OLS regressions, like the ones proposed here, to estimate some of the parameter values used for the simulations. |
| Observations on economic adulteration of high-value food products: The honey case | Fairchild, G. F., Nichols, J. P., & Capps, O. | Food | Analysis of price and revenue impacts of honey adulteration. | Rather than providing economic intelligence, this report aims at quantifying economic impacts of fraud. The authors use econometric results that estimate demand elasticities for honey. |

6. Case Study: Basmati Rice

For the purposes of testing the validity of our methodology we have chosen Basmati rice to

be our case study. The reasons for doing so were:

Availability of economic data on prices, quantities and exports of Basmati rice

for a long enough period to allow the possibility of conducting econometric

analyses.

Documented instances of food fraud in the past (relative to other products) which

provide a minimum number of observations for the sample.

Well-defined dates of when the fraud occurred. Using data provided by the FSA

it was possible to identify the precise dates on which the fraud had occurred or been

detected.

Clarity of the definition of the product. During the process of choosing the most

appropriate case study we had to reject a number of candidates because the

definition of what constitutes the specific product was too broad (e.g. vodka or fish)

or ambiguously defined. While failing to satisfy this condition would not be an

insurmountable obstacle, it would require additional effort to ensure that the data

employed is consistent in its definitions.

a. Global market for Basmati rice

Basmati rice is a popular variety of rice. It has a legally enforced regional denomination,

which means that it can only be produced in India and some parts of Pakistan. In India,

Basmati rice is grown in the states of Punjab, Uttar Pradesh, Haryana and Uttaranchal; in

Pakistan it is only grown in the Punjab area. The name Basmati means “the fragrant one”,

which indicates that it has a distinct and pleasant aroma. The grain is long and slender and

when it is cooked it becomes longer and acquires a dry and fluffy texture.

Basmati is one of the most expensive grains in the market. More specifically, on average

Basmati rice commands double the price of other types of rice. In 2012 the average price of

Basmati rice stood at about $1000 USD per metric tonne whereas the average price of

other rice varieties was about $600 USD per metric tonne.40

According to research conducted by Horizon Research, the global rice industry is

approximately worth $275 billion USD, out of which $5.8 billion or 2.1 per cent is attributed

to Basmati rice.41 In 2012 India accounted for about 72% of the world production of

Basmati rice, or about 4.8 million metric tonnes. Pakistan accounted for the rest,

about 1.9 million metric tonnes. It is estimated that the demand for Basmati rice has grown

at an average of 10.5% between 2001 and 2012. This is a significant growth rate

compared to the one for non-Basmati rice which stands at about 1.2% per annum.
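The shares quoted in this paragraph can be checked with simple arithmetic. The figures below are the ones quoted above; the variable names are ours.

```python
basmati_value, rice_value = 5.8, 275.0    # USD billion, as quoted above
india_output, pakistan_output = 4.8, 1.9  # million metric tonnes, 2012

# Basmati share of the global rice industry by value.
basmati_share = basmati_value / rice_value
# India's share of world Basmati production.
india_share = india_output / (india_output + pakistan_output)

print(f"{basmati_share:.1%}")
print(f"{india_share:.0%}")
```

Both results agree with the rounded percentages in the text, which suggests that the 4.8 million tonnes figure refers to Indian output rather than to world production.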

40 http://www.apeda.gov.in/apedawebsite/index.asp

41 http://horizonresearchpartners.com/wp-content/uploads/2012/08/Indian-Basmati-Rice-Industry-7-26-12.pdf

Horizon Research has found that rice producers appear to be very similar to one another and are

characterised by the following features:

Production of rice requires high working capital availability.

The producer needs to be able to leverage its debt quite highly because Basmati rice

takes time to age and be ready for sale.

Producers have a limited pricing power.

Limited brand recognition.

b. UK market for Basmati rice

In the UK, rice is an important staple for the average household. According to Mintel, sales

of rice in the UK in 2010 were worth £415m.42 CBI has found that the UK is one of the largest

consumers of milled rice in the EU.43 UK households consumed about 268 tonnes of rice

in 2004, which represents a 59% increase in rice consumption from 2003. Furthermore, in

2003 the UK imported about 70% of the total imports of Basmati rice in the EU.44 Basmati

rice is imported into the UK either directly from India and Pakistan or indirectly from millers

in the Netherlands, France or Belgium. In 2004, sales of Basmati rice were increasing by

about 12% annually and were expected to overtake sales of other long grain rice in the

following years. Prices of Basmati rice follow the global trends in relation to other types of

rice. A study conducted by the Food Authenticity Programme found that the price of

Basmati rice (£1.40 per kilogram) was double the price for other varieties of rice (£0.70 per

kilogram on average).

The increasing popularity of the grain and significant price differential between the Basmati

rice and other types of rice have made it a good target for food fraudsters to adulterate.

The most common type of food fraud we observe in Basmati rice is adulteration of the

authentic grain with other types of rice. In the UK there is no specific legislation regarding

the authenticity of Basmati rice; however, under the Food Safety Act it is illegal to sell

“food that is not of the nature, substance or quality demanded by the consumer or to

falsely or misleadingly describe or present food”.45 In the UK, the term Basmati should only

be used to describe the 11 Indian varieties and 5 Pakistani rice varieties that are

characterised by the Basmati properties. Despite this legislation, a number of fraud

instances were detected in the past. The majority of them were detected by the Food

Authenticity Programme. Its data shows that during 2012, 3 out of the 33 samples

collected by the FSA were found to be fraudulent.

42 http://www.marketingmagazine.co.uk/article/1071314/sector-insight-pasta-rice-noodles

43 http://www.cbi.eu/system/files/marketintel/201020-20Rice20and20pulses20-20UK1.pdf

44 http://multimedia.food.gov.uk/multimedia/pdfs/fsis4704basmati.pdf

45 http://www.legislation.gov.uk/ukpga/1990/16/contents

c. Basmati rice adulteration

Basmati rice is the customary name given to specific varieties of rice with unique

organoleptic characteristics and grown exclusively in the northern part of the Western

Punjab in both Pakistan and India; and in Haryana State and Western Uttar Pradesh in

India. Due to these organoleptic qualities, Basmati rice attracts a premium price, hence its

attractiveness to potential Economically Motivated Adulteration (EMA). It has been

reported that, since 2002, Indian traders have been selling a lesser quality rice, CSR 30,

as Basmati rice in major markets such as the US, Canada and the EU.46 Rice exports in India

are exempt from the duty accorded to pure Basmati in the EU, making it even more

profitable for fraudsters to adulterate Basmati rice. The authentic stock of traditional

Basmati grain usually gets depleted on Indian farms (i.e. there is an excess demand for

this product). Ricesearch, a DNA rice authenticity verification service in India, has found

that more than 30 per cent of the Basmati rice sold in the retail markets of the US and

Canada is adulterated with inferior quality grains. It is suspected that this number may be

higher in Europe.

In Europe, Commission Regulation 1549/04 grants a lower import tax on nine basmati

varieties: Basmati 370, Dehradun (Type 3), Basmati 217, Taraori, Ranbir Basmati, Kernel,

Basmati 386, Pusa Basmati and Super Basmati. Other basmati rice varieties approved by

India, Pakistan and the UK include Basmati 198, Basmati 385, Haryana Basmati, Kasturi,

Mahi Suganda and Punjab Basmati; and are outlined within the Basmati Rice Code of

Practice, agreed between the UK, Indian and Pakistani industry and enforcement bodies.

This code of practice also allows for the inclusion of no more than 7% non-Basmati rice

content.

The determination of Basmati rice varieties follows established testing protocols developed

by the UK FSA, using DNA based analysis to obtain a qualitative (positive or negative

presence) and quantitative (percentage of basmati and non-basmati DNA) result. Results

obtained from samples are compared against known references based on each of the

approved varieties of Basmati rice. From this, a determination is made on authenticity and

amount of basmati rice present in the sample. The associated costs per sample are

between £150 and £200 before any courier costs, which may be a significant

expense in the budget of local authorities. Therefore, these costs are unlikely to be borne

unless suspicion of EMA exists or unless specific funding is provided.

d. Data

For the purposes of testing our methodology using a case study we have used various

datasets from a number of publicly available sources. We have mainly used data from the

World Bank for our macroeconomic indicators (GDP and CPI) in order to avoid any issues

46

http://articles.economictimes.indiatimes.com/2007-07-06/news/28467196_1_basmati-rice-india-s-basmati-basmati-export


of harmonisation. Our main source for the variables relating to Basmati rice coming from

India has been the All India Rice Exporters Association. Our datasets are presented in

more detail in the table below. For some missing values and for some data available at a

lower frequency (e.g. yearly instead of monthly), we have employed interpolation

techniques.
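As an illustration of this step, the sketch below converts a yearly series to monthly frequency by linear interpolation. The report does not state which interpolation technique was used, so the linear method, the function name `yearly_to_monthly` and the sample figures are all assumptions made for the example.

```python
def yearly_to_monthly(yearly):
    """Linearly interpolate yearly values to monthly values.

    `yearly` is a list of (year, value) pairs sorted by year. Each yearly
    value is treated as the level at the start of that year (an assumption).
    """
    monthly = []
    for (y0, v0), (y1, v1) in zip(yearly, yearly[1:]):
        steps = 12 * (y1 - y0)  # number of months between the two observations
        for m in range(steps):
            monthly.append(v0 + (v1 - v0) * m / steps)
    monthly.append(yearly[-1][1])  # close the series with the final observation
    return monthly


# Hypothetical yearly figures, for illustration only.
series = yearly_to_monthly([(2011, 100.0), (2012, 112.0), (2013, 106.0)])
print(len(series), series[:3])
```

Linear interpolation cannot recover within-year movements, which is one reason interpolated series should be read with caution.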

Table 6.1: Case study data sources

| Data Source | Variable | Dates | Frequency | Units | Link |
|---|---|---|---|---|---|
| Food Standards Agency (FSA) | Basmati Rice Samples Tested | 2010-2014 | Monthly | Number of tests | http://www.food.gov.uk/enforcement/enforcework/foodfraud/foodfrauddatabase#.U7wkn-kU_Gg |
| Food Standards Agency (FSA) | Number of Tests Failed | 2010-2013 | Monthly | Number of instances of fraud | http://www.food.gov.uk/enforcement/enforcework/foodfraud/foodfrauddatabase#.U7wkn-kU_Gg |
| All India Rice Exporters Association | Price of Basmati Rice India | 2010-2013 | Monthly | USD per MT FOB | http://www.airea.net/page/53/statistical-data/basmati-rice-monthly-average-price-analysis |
| All India Rice Exporters Association | Exports of Basmati Rice from India to UK | 2011-2014 | Yearly | MT, Value in Rs. Lacs | http://www.airea.net/page/58/statistical-data/export-statistics-of-basmati-rice |
| All India Rice Exporters Association | Quantity of Basmati rice produced in India | 2010-2013 | Monthly | MT | http://www.airea.net/page/53/statistical-data/basmati-rice-monthly-average-price-analysis |
| | Exchange rates (US dollar to the Indian rupee, Pakistani rupee and U.K. pound sterling) | 2010 - 2014 | Daily | Monthly data were obtained by simple average of daily data | http://www.imf.org/external/np/fin/ert/GUI/Pages/CountryDataBase.aspx |
| APEDA Agri Exchange | Price of Basmati Rice Pakistan | 2008-2014 | Monthly | USD per MT | http://agriexchange.apeda.gov.in/int_prices/international_price.aspx |
| Mundi Index | Price of non-Basmati Rice | 2010-2014 | Monthly | USD per MT | http://www.indexmundi.com/commodities/?commodity=rice&months=60 |
| World Bank | GDP UK, India, Pakistan | 1980-2014 | Yearly | Per capita value at current prices in USD | http://data.worldbank.org/indicator/NY.GDP.MKTP.CD |
| Department for Environment, Food and Rural Affairs | Consumption of rice in UK (dried rice, cooked rice and take-away rice) | 1974-2014 | Yearly | gr per household per week | https://www.gov.uk/government/statistical-data-sets/family-food-datasets |

e. Descriptive statistics

Before conducting the econometric analysis, we present a set of descriptive statistics in order to gain a preliminary understanding of the characteristics of the data. Table 6.2 reports the minimum, maximum and mean values of our variables, together with the number of observations and the period that they cover.

Table 6.2: Summary statistics

| Variable | Description | Min. Value | Max. Value | Mean | Number of Obs. | Period |
|---|---|---|---|---|---|---|
| Samples | Number of samples tested | 0 | 45 | 3.1 | 53 | Jan-10 - May-14 |
| Non compliance | Number of non-compliant samples | 0 | 4 | 0.3 | 53 | Jan-10 - May-15 |
| Fraud percentage | Percentage of fraud = non-compliance/samples | 0 | 1 | 0.1 | 21 | Jan-10 - May-16 |
| Fraud binary | Fraud was detected=1, binary variable | 0 | 1 | 0.4 | 21 | Jan-10 - May-17 |
| Basmati rice production in India | Basmati rice produced in India in metric tonnes | 164004 | 418782 | 277088 | 36 | Apr-10 - Mar-13 |
| Basmati price India | Price of Indian Basmati in USD per metric tonne | 845 | 1209 | 1049 | 36 | Apr-10 - Mar-14 |
| Non-basmati rice world price | World price of non-basmati rice in USD per metric tonne | 404 | 616 | 529 | 50 | Apr-10 - May-14 |
| Consumption of rice UK | Rice consumption in UK in kgs | 25105 | 25440 | 25275 | 36 | Jan-10 - Dec-12 |
| GDP Pakistan | GDP in Pakistan in millions USD | 85 | 108 | 100 | 48 | Jan-10 - Dec-14 |
| GDP UK | GDP in UK in million USD | 3048 | 3279 | 3198 | 48 | Jan-10 - Dec-15 |
| Exported quantity of Basmati from India | Quantity of Basmati rice exported from India to UK in tonnes | 5245 | 7083 | 6292 | 36 | Apr-11 - Mar-14 |
| Average price Pakistan | Average price of Pakistani rice in USD per metric tonne | 860 | 1407 | 1174 | 50 | Apr-10 - May-14 |
| High risk | If fraud percentage is >0 then the sample is of high risk | 0 | 1 | 0.4 | 21 | Jan-10 - May-15 |

On first inspection we see that the average price of Basmati rice is about twice as high as the price of non-Basmati rice. On average, there is a 10% probability that a given sample of Basmati rice tested by the authorities will be adulterated. Additionally, we observe that the UK has the highest GDP per capita relative to the producing countries (roughly 26 times higher than GDP per capita in India).47 Finally, we observe that the average price of Indian Basmati rice is slightly lower than that of Pakistani Basmati rice, with a mean value of 1049 USD per MT in comparison with 1174 USD for the Pakistani one.

47

In UK GDP per capita stands at 39,350.64 USD while in India this number stands at 1,498.87 USD.

Case Study: Basmati Rice

42

We continue our analysis by looking at the linear correlation between variables. A strong correlation between two potential explanatory variables means that using both variables in the regression specification can lead to the problem of multicollinearity.48

An analysis of linear correlations between the variables indicates that there is a significant

correlation between our explanatory variables and our dependent variable (non-

compliance), which suggests that our candidate variables could potentially be causal

drivers of changes in the dependent variable (see Appendix V for details). However, we

also notice a number of significant correlations between the independent variables

themselves. This may imply that we would have to choose which variable is more

appropriate to go into the regression specification. More particularly, we observe that the

GDP in India is highly correlated with the one in Pakistan, indicating that multicollinearity

would be a problem if both measures were to be included in the same regression.
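A screening of this kind can be sketched as follows: compute the pairwise Pearson correlation between candidate explanatory variables and flag pairs above a cut-off. The series, the variable names and the 0.8 cut-off are hypothetical and are not taken from the case study data or from Appendix V.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def collinear_pairs(columns, cutoff=0.8):
    """Return (name_a, name_b, r) for every pair with |r| at or above the cut-off."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= cutoff:
                flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical series, for illustration only.
data = {
    "gdp_india": [1.0, 1.1, 1.2, 1.3, 1.4, 1.5],
    "gdp_pakistan": [0.50, 0.56, 0.59, 0.66, 0.70, 0.74],
    "price_gap": [520, 610, 480, 700, 560, 590],
}
flagged = collinear_pairs(data)
print(flagged)
```

A flagged pair suggests keeping only one of the two variables in any single specification.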

We look at a simple graphical illustration of the analysis of linear correlations identified in

Appendix V. More precisely, Figure 6.1 tries to establish whether there is a visible

relationship between the samples of Basmati rice that were tested and found non-

compliant and the gap between price of Basmati and non-Basmati rice (both Indian and

Pakistani Basmati rice). The graphs contain a straight line that represents the best linear fit

(according to OLS) of the data. Both the figures for India and Pakistan confirm a positive

correlation between price differences and detected fraud.49 It should be noted that the

small number of observations and the large residuals (i.e. the difference between the

observations and the linear fit) raise questions about the robustness of the identified

correlation. In order to obtain more reliable correlation estimates, it would be necessary to

have a larger number of observations. Finally, we also note that there is a risk of outlier

bias in these results. For example, it is quite possible that the observation shown in Figure

6.1, in which the fraud percentage is 100, is an extremely low probability event. However,

we have repeated the analysis excluding potential outliers and the results were not

affected significantly.

48

Multicollinearity (also collinearity) is a statistical phenomenon in which two or more variables in a multiple regression model are highly correlated, meaning that one can be linearly predicted from the others with a considerable degree of accuracy.

49 We note that it is difficult to assess the strength of this correlation by visually inspecting Figure 6.1. This is

because the appearance of the slope would be affected by the scaling of the vertical axis. A more reliable test is to evaluate the statistical significance of the corresponding regression coefficient, as it is performed below.


Figure 6.1: Correlation between fraud percentage and the gap between Basmati rice prices

and non-Basmati rice (in India at the left and Pakistan at the right)

[Figure: two scatter plots, each with a linear fit. Left panel: percentage of fraudulent samples (0 to 1) against price gap India (200 to 1000). Right panel: percentage of fraudulent samples (0 to 1) against price gap Pakistan (400 to 1000).]

We note that because the data on the price of Basmati rice coming from India are missing for the period between April 2013 and July 2014, we have extrapolated the price series using the Indian CPI. We appreciate that this might not be a satisfactory approach, as it fails to take into consideration any significant variations in prices which might explain the increased occurrence of Basmati rice fraud during 2014. Nevertheless, we do observe that the trend of the extrapolated Indian price follows a similar pattern to the actual Pakistani price. In any case, because the data on Pakistani Basmati rice cover a longer period, we would prefer to base our inference on the findings that include the Pakistani rice.

Table 6.3: T-test of mean price difference of Indian Basmati rice and non-Basmati rice

| Group | Obs. | Mean | Std. Err. | Std. Dev. | [95% Conf. Interval] |
|---|---|---|---|---|---|
| No fraud detected | 12 | 525.74 | 36.13 | 125.14 | 446.23 to 605.25 |
| Fraud detected | 9 | 680.73 | 62.32 | 186.95 | 537.08 to 824.42 |
| Combined | 21 | 592.16 | 37.01 | 169.62 | 514.95 to 669.37 |
| Difference | | -154.99 | 68.00 | | -297.32 to -12.66 |

Pr(|T| > |t|) = 0.034

Table 6.3 presents the results of a t-test (see methodology section) comparing the mean price difference between Basmati and non-Basmati rice depending on whether fraud was detected. The analysis indicates that there is a statistically significant difference (at a 95 per cent confidence level) between the mean gap in prices (Basmati price – non-Basmati price) when fraud is detected and when it is not. In other words, the t-test asks whether the mean price gap observed when fraud was detected can be distinguished from the mean price gap observed when fraud was not detected, or whether both could be realisations of the same probability distribution. The result shows that the two means can be distinguished with at least 95 per cent confidence. This suggests that there is a statistically significant relationship between fraud occurrence and Basmati price gaps.
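The difference row of Table 6.3 can be recomputed from the group summary statistics. The sketch below assumes a pooled-variance (equal variance) two-sample t-test; the report does not name the variant, but this assumption yields the reported standard error of 68.00. The critical value 2.093 is the standard two-sided 5 per cent value of Student's t with 19 degrees of freedom.

```python
from math import sqrt

# Group summary statistics as reported in Table 6.3.
n0, mean0, sd0 = 12, 525.74, 125.14  # periods with no fraud detected
n1, mean1, sd1 = 9, 680.73, 186.95   # periods with fraud detected

# Pooled-variance two-sample t-test (assumed variant).
df = n0 + n1 - 2
pooled_var = ((n0 - 1) * sd0**2 + (n1 - 1) * sd1**2) / df
se_diff = sqrt(pooled_var * (1 / n0 + 1 / n1))
diff = mean0 - mean1
t_stat = diff / se_diff

# 95 per cent confidence interval for the difference in means.
T_CRIT_19 = 2.093
ci_low = diff - T_CRIT_19 * se_diff
ci_high = diff + T_CRIT_19 * se_diff

print(f"difference = {diff:.2f}, std. err. = {se_diff:.2f}, t = {t_stat:.2f}")
print(f"95% CI: {ci_low:.2f} to {ci_high:.2f}")
```

Because the interval excludes zero, the difference is significant at the 5 per cent level, in line with the reported p-value of 0.034.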

f. Econometric results

Estimated models

We have estimated a large number of potential specifications, using several estimation methods, to explain the risk of fraud in Basmati rice in the UK. In this section we present and discuss the key results of the econometric analysis. The complete series of regressions that were estimated can be found in Appendix VI.

We have tested whether variables such as the GDP in India, the export quantity of Basmati rice and the total quantity of Basmati rice produced in India have a significant effect on fraud percentage, both contemporaneously and with a lag. We did not find any significant relationship between these variables and fraud percentage and therefore we do not report them in the main body of our report. We further note that these variables might still be significant, as discussed elsewhere in our analysis; however, due to limited data availability we are not able to reach such a conclusion.

Table 6.4 and Table 6.5 present the results of the regressions for India and Pakistan

respectively. In both cases, regressions 1-3 are the estimates from OLS models and

regressions 4-6 are estimates of logit models. The top rows contain the coefficients

obtained for each variable while the bottom rows report basic information and tests

performed for each model.

The regressions were estimated in logarithms. Consequently, the coefficients are to be

interpreted as “elasticities”.50 Note that the interpretation of the OLS regressions is different

from the logit regressions. First, OLS models are linear (i.e. the effect of the explanatory

variables is constant) while logit models are not. The reported coefficient of the logit model

is the marginal effect when the explanatory variable is equal to its mean value.

Second, the explained variable used in each model differs. In the case of OLS, this

variable is the percentage of samples in which fraud was detected. For logit models, the

explained variable is binary, taking a value equal to one for periods in which fraud was

detected (and tested for) and zero otherwise. Therefore, the estimated effect of logit

models is on the probability that fraud will occur at all. It should be noted that the logit

models use less information than the OLS models.
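The non-constant marginal effect of a logit model can be illustrated with a short sketch. The specification (a constant plus the logarithm of the price gap) follows the description above, but the coefficient values `B0` and `B1` are hypothetical and are not the estimates behind Table 6.4 or Table 6.5.

```python
from math import exp, log


def logistic(z):
    """Logistic function mapping a linear index to a probability."""
    return 1.0 / (1.0 + exp(-z))


def fraud_probability(b0, b1, price_gap):
    """Predicted probability for a logit with log(price gap) as regressor."""
    return logistic(b0 + b1 * log(price_gap))


def marginal_effect(b0, b1, price_gap):
    """Derivative of the probability with respect to log(price gap):
    b1 * p * (1 - p), which varies with the level of the price gap."""
    p = fraud_probability(b0, b1, price_gap)
    return b1 * p * (1.0 - p)


B0, B1 = -12.0, 1.9  # hypothetical coefficients, for illustration only
for gap in (400, 600, 800):
    print(gap, round(fraud_probability(B0, B1, gap), 3),
          round(marginal_effect(B0, B1, gap), 3))
```

In an OLS model the corresponding derivative is simply the coefficient, whatever the level of the explanatory variable.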

50

For example, if the estimated coefficient of variable x is y, the interpretation is the following: an increase in variable x of one per cent is correlated with an increase in the explained variable of y per cent.


In addition to the variables included in Table 6.4 and Table 6.5, we have conducted

regressions using other variables for which data was obtained (i.e. the variables

summarised in Table 6.1 or others derived from them). These regressions included

variables such as the GDP in the countries that export Basmati rice (India and Pakistan),

the export quantity of Basmati rice and the total production of Basmati rice (in India). We

did not find any significant relationship between these variables and fraud percentage and

therefore we do not report them in the main body of the report. The negative results were

obtained using these variables both contemporaneously (i.e. the explained and

explanatory variables correspond to the same period) and with a lag (i.e. the explanatory

variables correspond to a period earlier than the explained variable). It is worth noting that

the analysis indicates that it is not possible to establish a statistically significant effect of

these variables on fraud given the (limited) data availability. However, these might be

found to be significant in the future if more data becomes available.

Table 6.4: India regression results

| Regression number | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Estimation Method | OLS | OLS | OLS | Logit | Logit | Logit |
| Constant | -0.2 | 0.09 | 0.11 | | | |
| Price difference | 0.05 | | 0 | 0.84* | | 0.26 |
| Change in price difference | | 0.97* | | | 2.69 | |
| Number of samples | | | 0.03 | | | 0.29* |
| Number of Observations | 21 | 21 | 21 | 21 | 21 | 21 |
| R2 | 0.00 | 0.16 | 0.02 | 0.14 | 0.10 | 0.30 |
| F statistic | 0.08 | 3.65 | 0.15 | 4.04 | 2.90 | 8.62 |
| Prob > F | 0.77 | 0.07 | 0.86 | 0.04 | 0.09 | 0.01 |
| Heteroskedasticity (Breusch-Pagan test), chi2(1) = | 0.04 | 6.16* | 0.16 | | | |
| Adjusted R2 | -0.05 | 0.12 | -0.09 | | | |
| Akaike Information Criterion | 1.11 | -2.49 | 2.86 | | | |

Note: All explanatory variables were expressed in logarithms. A variable is significant at * =90% level, **=95% level, ***=99% level. Null hypotheses not rejected at ^ =90% confidence level, ^^ =95% confidence level, ^^^ =99% confidence level.


Table 6.5: Pakistan regression results

| Regression number | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Estimation Method | OLS | OLS | OLS | Logit | Logit | Logit |
| Constant | -0.82 | 0.14*** | -0.61 | | | |
| Price difference | 0.14 | | 0.11 | 0.91* | | 0.18 |
| Change in price difference | | -0.89* | | | -0.70 | |
| Number of samples | | | 0.01 | | | 0.31* |
| Number of Observations | 21 | 21 | 21 | 21 | 21 | 21 |
| R2 | 0.02 | 0.12 | 0.03 | 0.11 | 0.01 | 0.30 |
| F statistic | 0.43 | 2.62 | 0.23 | 3.23 | 0.32 | 8.47 |
| Prob > F | 0.52 | 0.12 | 0.80 | 0.07 | 0.57 | 0.01 |
| Heteroskedasticity (Breusch-Pagan test), chi2(1) = | 1.3 | 15.19*** | 0.52 | | | |
| Adjusted R2 | -0.03 | 0.07 | -0.08 | | | |
| Akaike Information Criterion | 0.73 | -1.51 | 2.67 | | | |

Note: The variable is significant at * =90% level, **=95% level, ***=99% level. Null hypotheses not rejected at ^ =90% confidence level, ^^ =95% confidence level, ^^^ =99% confidence level.

Table 6.4 and Table 6.5 each present six regression models. The first three correspond to OLS regressions, while models 4-6 correspond to logit regressions. Within each group, the models differ in the set of explanatory variables included in the estimated equation.

We note that the values for the R-squared and adjusted R-squared are low. This indicates that the

overall variation in fraud cannot be explained with these variables alone. However, the objective of

this exercise is to quantify the specific effect of selected variables and to determine whether this

effect is statistically significant. The models shown in Table 6.4 and Table 6.5 establish a

statistically significant relationship between the price difference and the level of fraud. Furthermore,

these results are intuitive in the sense that the direction of the estimated effect agrees with what is

expected according to the literature.

Analysis of regression output

Table 6.4 and Table 6.5 already include a subset of the full set of regressions that were

conducted, based on the statistical significance of the coefficient estimates.51 However,

there is still a large number of different variables and estimates in these tables. This

section will use the criteria set out in the methodology to select the “preferred” model (or

subset of models).

The relevant criteria for selecting models are:

51

The complete results are presented in Appendix VI.


1. The signs of the coefficients are in the expected direction and consistent across

regressions. This is particularly relevant when comparing the regression outputs

from India and Pakistan.

2. The coefficients are statistically significant. While this was a general pre-requisite,

some variables might be significant in the OLS model but not in the logit estimation

(or vice versa), for India but not Pakistan (or vice versa), etc.

3. Joint significance (F test). This test checks the joint statistical significance of all the

coefficients in the regression, including the constant.

4. Coefficient of determination (R2). This measure increases as the residuals of the

fitted model (the difference between the model prediction and the actual

observations) decrease. In other words, this is a measure of goodness of fit. It

should be mentioned that these measures are constructed differently for OLS and

logit models. Therefore, they are not comparable across these two types of models.

However, the R2 can be used to choose between different regressions that use the

same estimation technique.

5. Information criteria (Akaike and adjusted R2). Similarly to the R2, these are other

“goodness of fit” measures. However, they include other criteria, such as penalising

the inclusion of each explanatory variable. These measures are more likely to select

a parsimonious model (i.e. a model that includes a small number of explanatory

variables) than the unadjusted R2. Information criteria are available for OLS

regressions, although there is no direct counterpart for binary choice models. The

Akaike criterion selects the model with the lowest value of its index while the

adjusted R2 would select the model with the highest value.
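The penalty applied by the adjusted R2 can be made concrete with the standard formula, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of explanatory variables excluding the constant. The sketch below applies it to the R2 values reported for the Indian OLS models in Table 6.4; the regressor counts (one for models 1 and 2, two for model 3) are our reading of that table and should be treated as an assumption.

```python
def adjusted_r2(r2, n_obs, n_regressors):
    """Adjusted R2 for a regression with a constant and `n_regressors` slopes."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


# (R2, assumed number of regressors) for OLS models 1-3 of Table 6.4.
india_ols = [(0.00, 1), (0.16, 1), (0.02, 2)]
india_adj = [round(adjusted_r2(r2, 21, k), 2) for r2, k in india_ols]
print(india_adj)  # compare with the Adjusted R2 row of Table 6.4
```

With only 21 observations the penalty is large enough to turn a small positive R2 negative, as happens for models 1 and 3.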

The variables that were found to be significant and therefore are included in Table 6.4 and

Table 6.5 are:

The price difference between Basmati and non-Basmati rice.

The change (relative to the previous period) in the price difference between Basmati

and non-Basmati rice.

The number of samples taken.

It should be pointed out that the number of samples taken is not an economic variable. Moreover, the likelihood of fraud detection will increase by definition as more samples are taken. Therefore, logit models that use the likelihood of fraud detection as the explained variable will almost tautologically find that the number of samples is highly significant. This is because, even in the case where the fraction of fraud is constant, more testing will inevitably lead to more identified cases of fraud and, therefore, an increase in the probability that at least one sample will be found to be non-compliant. In contrast, OLS regressions use the percentage of fraudulent samples as the explained variable, which is not necessarily correlated with the number of samples taken. Given the risk of spurious correlations in these regressions and the very low number of observations in our sample, we do not consider the models that include the number of samples (models 3 and 6).
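The near-tautology can be shown with a short calculation. If each sample is independently non-compliant with a constant probability p (independence is an assumption made for this sketch), the probability that at least one of n samples fails is 1 - (1 - p)^n, which rises with n even though p does not change. The rate of 0.10 matches the mean fraud percentage in Table 6.2; the sample counts are illustrative.

```python
def prob_at_least_one_detection(fraud_rate, n_samples):
    """Probability that at least one of `n_samples` independent tests fails."""
    return 1.0 - (1.0 - fraud_rate) ** n_samples


RATE = 0.10  # constant share of non-compliant samples
for n in (1, 3, 10, 45):
    print(n, round(prob_at_least_one_detection(RATE, n), 3))
```

The binary fraud indicator therefore responds to testing intensity alone, whereas the share of non-compliant samples has the same expected value for any n.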


Of the remaining OLS regressions, the coefficient of the price differential is not significant for either India or Pakistan, while the coefficient of the change in the price differential is significant in both cases. Therefore, model 2 is preferred over model 1. The opposite is true for the remaining logit estimations: the coefficient of the change in the price differential is not significant for either India or Pakistan, while the coefficient of the price differential is significant in both cases. Therefore, model 4 is preferred over model 5.

We conduct a full analysis of the criteria detailed above for the two remaining models. The

conclusions are summarised in Table 6.6. The first criterion looks at the magnitude (and

sign) of the coefficients. In model 2, the sign of the coefficient goes in the expected

direction in the case of India (positive; i.e. a higher price gap is correlated to more fraud)

while the estimate for Pakistan goes in the opposite direction. Moreover, the difference

between both coefficients is quite considerable. Model 4, on the other hand, obtains similar

coefficients for India and Pakistan, with the expected sign in both cases. The second

criterion looks at the statistical significance of each individual coefficient. Given that our

previous argument discarded models that did not satisfy conditions along these lines, it is

not surprising that both models fulfil the criterion in both countries. The third criterion

requires joint significance of all coefficients according to the F-test. The only model that

does not satisfy the condition with at least 90 per cent confidence is the estimate for

Pakistan of model 2. The fourth criterion evaluates the estimates according to the

goodness of fit. While the R2 of OLS models cannot be compared to the one of logit

models, it would be desirable that the selected model has the highest R2 within its class.

This is indeed the case for model 2 but not for model 4, which is outperformed by model 6.

However, we do not find this a compelling argument against model 4 given our previous

discussion of the spurious nature of the results in model 6. Finally, the fifth criterion

evaluates the models according to the information measures of Akaike and adjusted R2.

While these measures are not available for logit models, model 2 outperforms the other

OLS regressions in this regard.


Table 6.6: Criteria for selecting best performing econometric model

| Criterion | OLS (Model 2): India | OLS (Model 2): Pakistan | Logit (Model 4): India | Logit (Model 4): Pakistan |
|---|---|---|---|---|
| Sign of coefficients as expected and consistent across regressions | | | | |
| Coefficients are statistically significant | | | | |
| Joint significance (F test) | | | | |
| Coefficient of determination (R2) | | | | |
| Information criteria (Akaike and adjusted R2) | | | N/A | N/A |

Based on the argument above, we conclude that the preferred model across our sample is one that runs a logistic regression on the price difference between Basmati and non-Basmati rice. The corresponding model for India shows that a one per cent increase in this price difference (using the mean observed value of the difference as a reference)52 is associated with a 0.84 percentage point increase in the probability that fraud will occur. The model for Pakistan shows that a one per cent increase in the price difference is associated with a 0.91 percentage point increase in the probability of fraud. Since Basmati rice is imported to the UK from both India and Pakistan, the average elasticity is estimated to be somewhere in the range from 0.84 to 0.91.

Risk levels

In this section we explore how to map Basmati and non-Basmati price differences into

different levels of fraud risk. The econometric analysis above provides a link between

explanatory variables (e.g. prices of Basmati and non-Basmati rice) and past fraud. In

particular, it quantifies how movements in these variables would affect the risk of fraud.

Therefore, it would be possible to classify the risk of fraud in different categories

quantitatively. For example, low risk would correspond to a probability of fraud smaller

than 33.3 per cent, medium risk when the probability is between 33.3 and 66.6 per cent

and high risk when the probability of fraud is larger than 66.6 per cent. The econometric

52

Since the logit model is non-linear, the marginal effect is not constant and needs to be evaluated at a particular level of the price difference.


model would allow for the construction of thresholds in the explanatory variables that

would lead to the risk of fraud being in each of the specified categories. Potential

applications of this model would track the (observable) variables and determine the risk

level predicted by the model.

The analysis will be based on the best performing model of our econometric estimation:

the logit model using only the price gap as explanatory variable (model 4 in Table 6.4 and Table 6.5). While there is only one relevant factor in this model, the methodology can be

directly extended to models that include multiple explanatory variables. For the sake of

illustration, we will focus on the results obtained for India.53

Figure 6.2 plots the probability of fraud predicted by our econometric model as a function

of the price gap between Basmati and non-Basmati rice. The fitted values represent the

probability of fraud occurring given the coefficients estimated by the regression and the

values taken by the explanatory variables. These values were obtained by substituting the observed prices into the model to generate the predicted (“fitted”) probability of fraud. The curve has a positive slope, which indicates that the higher the price gap, the higher the probability of fraud and, consequently, the higher the probability of its detection. It can also be noted that, as a consequence of the chosen econometric model, this positive relation is non-linear.

Figure 6.2: Predicted probability of fraud in a logit model (India)

[Figure: fitted probability of fraud (vertical axis, 0 to 0.8) against price difference India (horizontal axis, 200 to 1000).]

53

Alternatively, the prediction could be based on the results for Pakistan or a weighted combination of both.


Figure 6.3 repeats the same exercise using the fitted values of a linear OLS regression model. In this case, however, the variable measured on the vertical axis differs from that in Figure 6.2. In a logit estimation, the predicted variable is the probability that fraud will be detected, irrespective of the number of cases. The OLS model uses the fraction of samples taken that tested positive for fraud as its (continuous) dependent variable. Despite this difference in interpretation, the overall shape of the curves predicting the risk of fraud is similar.

Figure 6.3: Predicted proportion of fraud in a linear model (India)

[Figure: fitted values (vertical axis, 0.08 to 0.14) against logarithm of price difference India (horizontal axis, 5.5 to 7).]

Finally we classify the Basmati rice price gap into two groups based on the probability of

food fraud: low and high risk. The threshold probability used is 0.5 (50 per cent). Based on

the logit model, the corresponding threshold in the price differential between Basmati and

non-Basmati rice is 628 USD. Therefore, the econometric model predicts that a price differential below (above) the threshold of 628 USD corresponds to low (high) risk of fraud.
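The mapping from a threshold probability to a threshold price gap amounts to inverting the fitted logit. The sketch below assumes a specification with a constant and the logarithm of the price gap; the coefficients `B0` and `B1` are hypothetical, so the threshold they imply is illustrative and is not the 628 USD reported here. The classification step uses the reported 628 USD directly.

```python
from math import exp, log


def fraud_probability(b0, b1, price_gap):
    """Logit prediction with log(price gap) as the only regressor."""
    return 1.0 / (1.0 + exp(-(b0 + b1 * log(price_gap))))


def price_gap_threshold(b0, b1, prob=0.5):
    """Price gap at which the predicted probability equals `prob`."""
    log_odds = log(prob / (1.0 - prob))
    return exp((log_odds - b0) / b1)


def risk_level(price_gap, threshold):
    """Two-way classification: above the threshold is high risk."""
    return "high" if price_gap > threshold else "low"


B0, B1 = -12.0, 1.9  # hypothetical coefficients, for illustration only
threshold = price_gap_threshold(B0, B1)
print(round(threshold, 1), round(fraud_probability(B0, B1, threshold), 3))

REPORTED_THRESHOLD = 628.0  # USD, as reported in the text
print(risk_level(500.0, REPORTED_THRESHOLD), risk_level(700.0, REPORTED_THRESHOLD))
```

Passing a different `prob`, such as 0.333 or 0.666, gives the cut-offs for a three-category classification.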

The accuracy of this prediction of the model can be tested by comparing past prices and

the binary variable that takes the value of one if past fraud was detected in the same

period and zero otherwise. Table 6.7 presents a cross-tabulation. When the model

predicted low risk of fraud, this prediction was correct in 9 out of 13 cases. Similarly, when

the model predicted high risk of fraud, the prediction was correct in 5 out of 8 cases.

Assessment of the accuracy of prediction has been done using in-sample data (over the

sample used for obtaining the estimates). Out-of-sample assessment was not possible due

to the small sample size available for the analysis.


Using the accuracy measure defined in the methodology section, we find that the

prediction based on the observed price difference attains an accuracy level of 66.7 per

cent (9+5=14 out of 21 observations). In other words, the model has a non-trivial predictive

power (over 50 per cent), although this power is far from perfect (100 per cent). This level

of accuracy suggests that the test proposed in this report may contain useful information

that would indicate a higher risk of fraud. However, we would like to stress the indicative

nature of this accuracy level. Given its limitations, the fact that the test indicates high risk

of fraud should not be interpreted as conclusive evidence that fraud would occur.

Furthermore, the 66.7% level of accuracy is better than the level of accuracy obtained by

using a trivial predictor based only on the price ratio between the original Basmati rice and

the world non-Basmati rice. The maximum level of accuracy this predictor would yield is

62% (this is reached when the threshold for the price ratio between the two prices of rice is

set at 58% so that any price ratio above 58% would be considered suspicious and require

an investigation by the authorities).

We note that the model is more accurate when predicting low risk (9/13 = 69.2 per cent accuracy) than when predicting high risk (5/8 = 62.5 per cent accuracy). In other words, a high-risk prediction is more likely to be a false positive (3 out of 8) than a low-risk prediction is to be a false negative (4 out of 13).

Table 6.7: Risk Classification

| Predicted risk | Fraud observed: No | Fraud observed: Yes | Total |
|---|---|---|---|
| Low | 9 | 4 | 13 |
| High | 3 | 5 | 8 |
| Total | 12 | 9 | 21 |

Note: The threshold used to define low and high risk was 628 USD.
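The accuracy figures discussed above follow directly from the counts in Table 6.7. The short sketch below recomputes them; the variable names are chosen for the example.

```python
# Counts from Table 6.7, keyed by (predicted risk, fraud observed).
table = {
    ("low", "no"): 9,
    ("low", "yes"): 4,
    ("high", "no"): 3,
    ("high", "yes"): 5,
}

total = sum(table.values())
correct = table[("low", "no")] + table[("high", "yes")]
accuracy = correct / total

# Share of correct predictions within each predicted category.
low_total = table[("low", "no")] + table[("low", "yes")]
high_total = table[("high", "no")] + table[("high", "yes")]
low_accuracy = table[("low", "no")] / low_total
high_accuracy = table[("high", "yes")] / high_total

print(f"overall: {accuracy:.1%}, low risk: {low_accuracy:.1%}, high risk: {high_accuracy:.1%}")
```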

We note that the choice of the threshold probability is arbitrary. This approach may be further improved by evaluating several threshold probabilities and selecting the “optimal” one (i.e. the one that leads to the highest accuracy level).
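One way to carry out such a search is sketched below. It scans candidate cut-offs expressed in terms of the price gap (equivalent to scanning threshold probabilities when the fitted probability increases with the gap) and keeps the one with the highest in-sample accuracy. The observations are hypothetical and the function names are chosen for the example.

```python
def accuracy(gaps, fraud, threshold):
    """Share of periods where 'gap above threshold' matches observed fraud."""
    hits = sum((g > threshold) == bool(f) for g, f in zip(gaps, fraud))
    return hits / len(gaps)


def best_threshold(gaps, fraud):
    """Scan midpoints between consecutive observed gaps; return the best cut-off."""
    levels = sorted(set(gaps))
    cuts = [(a + b) / 2 for a, b in zip(levels, levels[1:])]
    return max(cuts, key=lambda c: accuracy(gaps, fraud, c))


# Hypothetical price gaps (USD) and fraud indicators, for illustration only.
gaps = [410, 455, 500, 540, 585, 630, 670, 720, 790, 860]
fraud = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]

cut = best_threshold(gaps, fraud)
print(cut, accuracy(gaps, fraud, cut))
```

A cut-off selected in this way is tuned to the sample at hand, so its accuracy is likely to overstate performance on new data.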

Limitations of the case study analysis

The methodology section identified a number of potential limitations on the proposed

approach. Below we assess the extent of each of these limitations for the results obtained

in the case study:

Sample size. The preceding analysis was based on 21 observations, corresponding

to the 21 months in which authenticity tests were conducted on Basmati rice

according to the UKFSS database. After assessing alternative data sources of past

fraud, we are confident that this is the largest dataset available for the UK.

Moreover, in time this database would become significantly richer and would allow

for more reliable results. However, for the time being, the available dataset has a

small number of observations. We consider that there is a reasonable expectation

that many of the inconclusive results obtained (in the form of coefficients that are

not statistically significant) are a consequence of the small sample size. In addition,

the small number of observations makes it difficult to disentangle the effects when

several variables are included simultaneously. We expect that a larger sample size

would allow for the inclusion of many explanatory variables and the estimation of

statistically significant effects for each of these variables simultaneously.

Data quality. The case study combines data from different sources, with different

degrees of reliability. Data used from trade associations, such as the All India Rice

Exporters’ Association (AIREA),54 contains information not found in other sources.

However, the statistical rigour used to collect and aggregate the data might not be

comparable to that found in national statistics offices. An additional quality

concern is the data gap found in the price series for Basmati rice in India.

While the gap was interpolated using India’s general price evolution, this solution is

less than satisfactory and might fail to capture key movements that are specific to

the Basmati rice market.

High variation in the estimated coefficients. The different estimations conducted

above lead to a wide range of estimates for the effect of prices on detected fraud.

We consider this limitation to be a side effect of a small sample size. In other

words, a larger sample is expected to reduce this limitation.

Omitted relevant variables. This potential limitation is pervasive in econometric

analysis, particularly when it is not possible to include in the analysis suspected

factors due to lack of data. In light of the literature on food fraud, we consider that

the largest risk of the present analysis is to omit variables that capture key features

of the supply chain. As mentioned in the literature, the length and complexity of the

supply chain are expected to be key explanatory factors in the likelihood of food

fraud. Unfortunately, the data assessment exercise performed during this project

has not identified a viable approach to capturing these characteristics.

Lagged effects. Appendix VI has addressed this issue by including explanatory

variables with lags. However, as noted above, this approach might fail to capture

more complex interactions between fraud and past prices and volumes.

54 http://www.airea.net/

7. Conclusions and Recommendations

We consider that the main contributions made by this report are as follows:

A review of the literature on food fraud that led to the identification of a large set of

variables that are considered relevant to predict the risk of food fraud. These

variables were classified into economic factors and market characteristics; production

and distribution factors; product characteristics and detection technology; and

institutional and enforcement factors.

A review of several approaches that have been used in the economic literature to

explain or predict the risk of fraud. While there are few examples of previous work

that has applied economic intelligence to study food fraud, relevant methodologies

were found in other areas such as insurance and credit card fraud.

We have compiled and reviewed multiple sources of data that can be used to

conduct estimations of the risk of food fraud for various products. Of particular

importance is the review of data sources that document past instances of food fraud

in the UK. We have concluded that the UKFSS provides the best available source

of this data.

In addition to past fraud, we have explored sources of economic data, such as

prices and volumes of authentic ingredients and adulterants. We have focused on

publicly available data produced by recognised institutions. However, additional

data is provided by private institutions on a subscription basis. We recognise that

further exploration of these sources may be required to arrive at a comprehensive

assessment of economic data currently available.

The literature review of the economics of fraud identified alternative methodological

approaches that could be applied to food fraud. We provide criteria to assess the

advantages and disadvantages of these approaches. It was concluded that an

econometric methodology would be the most suitable approach based on economic

intelligence for the case of food fraud.

The proposed methodology identifies alternative econometric models and variables

that could be estimated to predict the risk of fraud. Additionally, it provides criteria to

select between these models based on the interpretability of the results, the

statistical significance of the coefficients and goodness of fit indicators.

The proposed econometric models would provide a quantification of the link

between movements in economic variables and past fraud. The methodology also

suggests how to use these estimations in a forward-looking manner. By tracking the

(observable) variables, it would be possible to determine the risk level predicted by

the model. The proposed approach suggests a method to construct thresholds for

the explanatory variables that would map into a small and discrete set of categories

of fraud risk (e.g. two categories: low and high risk).
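
The mapping from an observed variable to a small set of risk categories could be sketched as follows. The function name, the labels and the monthly price gaps are hypothetical; only the 628 USD threshold is taken from the case study.

```python
from bisect import bisect_left


def risk_category(value, thresholds, labels):
    """Return the risk label for an observed value.

    `thresholds` must be sorted in ascending order and `labels` must contain
    one more entry than `thresholds`. A value moves up one category for
    every threshold it strictly exceeds.
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("labels must have exactly one more entry than thresholds")
    return labels[bisect_left(thresholds, value)]


# Two categories, using the 628 USD threshold from the case study.
# The monthly price gaps below are hypothetical.
monthly_gaps = {"Jan": 540.0, "Feb": 628.0, "Mar": 702.5}
for month, gap in monthly_gaps.items():
    print(month, risk_category(gap, [628.0], ["low", "high"]))
```

Passing more thresholds and labels extends the same rule to more than two categories, although any additional thresholds would need to be estimated rather than assumed.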

a. Case study

The proposed methodology was tested using adulteration of Basmati rice with cheaper

varieties of rice as a case study. The selection of this particular type of fraud for the case

study was based on well documented instances of past fraud and availability of economic

data. It was found that the model that performed best is a logit regression in which the

price gap between Basmati rice and other varieties of rice was the only

statistically significant predictor of fraud. While other variables were included, no evidence

was found that there is a link between them and the risk of fraud. We note, however, that

this is not a definitive finding and results might change considerably if a larger sample

were available. Based on the regression results, the effect of the price gap between the

authentic ingredient and the adulterant on fraud was used to predict future fraud. It was

determined that the risk of Basmati rice fraud would be high when the price gap exceeds

628 USD. This result was tested using past data and it was determined that it predicts

fraud correctly with 66.6 per cent accuracy.55 This level of accuracy suggests that the test

proposed in this report may contain useful information that would indicate a higher risk of

fraud. However, we would like to stress the indicative nature of this accuracy level. Given

its limitations, the fact that the test indicates high risk of fraud should not be interpreted as

conclusive evidence that fraud would occur.
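
As an illustration of the type of model described above, the following minimal sketch fits a logit regression with a single predictor by Newton-Raphson and derives the price gap at which the predicted probability of fraud reaches 50 per cent. The data are hypothetical, the function name is ours, and the 50 per cent cut-off is an assumption: the sketch does not reproduce how the 628 USD threshold was derived.

```python
import math


def fit_logit(x, y, iterations=50, tol=1e-10):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient of the log-likelihood and the information matrix
        # (the negative Hessian) at the current estimate.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1


# Hypothetical price gaps (USD) and fraud indicators, NOT the case study data.
price_gap = [300, 400, 500, 550, 600, 650, 700, 750, 800, 900]
fraud = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logit(price_gap, fraud)
# Price gap at which the fitted probability of fraud equals 0.5.
threshold = -b0 / b1
print(f"b0 = {b0:.3f}, b1 = {b1:.5f}, threshold = {threshold:.0f} USD")
```

The sketch omits standard errors and goodness of fit indicators, which the methodology relies on to select between models.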

b. Limitations

The case study also served to illustrate the considerable limitations that could be faced

when applying the proposed methodology to a particular product or fraud type. The most

important limitation is the small sample size. Despite the fact that Basmati rice was

chosen as a case study due to having abundant data (relative to other products), the

results obtained suggest that the sample size was barely sufficient to establish the most

basic relationships between the variables considered. Attempts to apply this methodology

to other products may encounter the same or even greater data limitations.

The negative consequences of having a small sample size are multiple. First, some

variables that are in fact important may not be found to be statistically significant, simply

because the sample size is too small. Second, a small sample size is likely to cause high

variation in the estimated coefficients across different models, undermining the

robustness of the results.

The main constraint on the sample size is the number of periods in which past instances

of fraud were tested (and possibly detected). These observations are obtained from the

UKFSS. While we found this database to be the most complete out of the ones identified,

it is important to highlight that it is relatively new, with very few observations for dates

earlier than 2010.

55 As explained above, this is measured on a scale from 50 to 100 per cent.

Other limitations of the analysis include the use of low-quality or missing data and the

difficulty (or impossibility) of measuring variables that the literature has identified as

relevant, such as key features of the supply chain.

c. Recommendations

We consider that the proposed methodology is appropriate and solidly founded in the

literature. However, due to limitations in the data currently available, the results obtained

when applying the methodology might not be entirely satisfactory. We recommend that this

approach be applied when data on past fraud is more abundant.

It is not possible to determine a priori which sample size would be sufficient to obtain

statistically reliable estimates. However, given that the UKFSS database is constantly

expanding at an encouraging rate, the quality of the output of the approach proposed in

this report would increase significantly in the medium term.

We note that the choice of econometric methods was not guided by the time and resource

constraints of this project. While more sophisticated methods could be employed, these

were not prioritised at present due to their higher requirements in terms of sample size.

There is scope for future work in this area. It would be desirable to identify additional

sources of economic data, especially reviewing those provided by third parties. In addition,

more progress could be made in measuring some of the variables identified by the

literature but omitted from the analysis due to insufficient data.

Finally, we consider that the application of the methodology to multiple products

simultaneously using panel techniques might be worth exploring in more detail. The

advantage of this approach would be an increase in the number of observations. However,

we note that there would be additional complications associated with this approach, since

the effect of economic variables on fraud, such as prices, might differ significantly across

products. Therefore, there would be an increased risk of non-significant estimates and

biased results. We consider that a multi-product approach would not be excessively

challenging from a conceptual point of view. However, data quantity (and quality) will be

the deciding factors in whether this approach is successful. Therefore, the choice of

products for this analysis should be based primarily on this criterion.

8. Annex I: Detailed Review of Selected Literature

a. Food fraud – economics

Pouliot, S. (2012). On the economics of adulteration in food imports: application to US fish

and seafood imports. Cahier de recherche/Working paper, 2012, 15.

Objective: Show the role of economics in the adulteration of food imports.

Data: Simulations and empirical data on import refusals in the US from the FDA 2012

database.

Variables: The country of origin, the port of entry, product code, product description and

lists of the charges that motivated rejection. The dataset, however, does not include

information regarding the quantity and the value of products refused entry.

Method: The mechanism of impact in the model is the choice of input quality by exporting

firms. One implication of the model is that economic variables can be used to predict

adulteration in food imports. The author performed structural break tests on the weekly

number of import refusals using the structural change package in R which implements the

algorithm to find breakpoints. Here, the procedure tests for structural breaks in the average

weekly import refusals. Regression outcomes show means for the weekly number of

import refusals for fish and seafood in the periods delimited by the structural break tests.

Before the first break in November 2005, the FDA refused entry to about 41 shipments per

week. The weekly number of refused seafood shipments then declined to almost 33 per

week until December 2010. In addition, the author provides graphical evidence that the

Deepwater Horizon incident had no impact on import quantities of seafood.

Advantages: The model offers a framework that can be used to identify adulteration risk

for products other than food such as drugs or medical devices. With a few modifications,

the model can also help guide inspection of domestic facilities.

Disadvantages: Does not account for other covariates that could have impacted the

number of import refusals. In particular, the increase in import refusals could be due to

an increase in oversight by the FDA of all food imports.

Tested Method: No.

Factors that affect the likelihood of fraud: The closing of fisheries in the Gulf of Mexico

because of the Deepwater Horizon platform oil spill.
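
The structural break test described in the Method entry above can be illustrated with a simplified sketch. It searches for a single break in the mean of a series by least squares; it is not the procedure implemented in the R package used by the author, which also handles multiple breaks and significance testing. The weekly counts are hypothetical and the function name is ours.

```python
def single_break(series, min_segment=2):
    """Find the split that minimises the total squared deviation from the two segment means."""

    def rss(segment):
        mean = sum(segment) / len(segment)
        return sum((value - mean) ** 2 for value in segment)

    # Candidate split points leave at least `min_segment` observations on each side.
    candidates = range(min_segment, len(series) - min_segment + 1)
    split = min(candidates, key=lambda k: rss(series[:k]) + rss(series[k:]))
    before, after = series[:split], series[split:]
    return split, sum(before) / len(before), sum(after) / len(after)


# Hypothetical weekly refusal counts, NOT the FDA data used in the paper.
weekly_refusals = [40, 42, 41, 43, 39, 41, 40, 42, 41, 41,
                   33, 34, 32, 33, 35, 31, 33, 34, 32, 33]

split, mean_before, mean_after = single_break(weekly_refusals)
print(split, mean_before, mean_after)
```

The returned segment means play the role of the period averages reported in the paper (about 41 and 33 refusals per week).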

Liang, J., & Jensen, H. H. (2007). Imperfect food certification, opportunistic behaviours and

detection. Selected Paper, 175174.

Objective: Present a theoretical framework to analyse the performance of the “Good

Agricultural Practices” program with respect to output and quality based on the assumption

of predetermined productive capacity (farm size), heterogeneous farms and exogenous

detection.

Data: Theoretical model.

Variables: Available resources to farms, farm size, effort by the monitoring agency, price

differences, reputation and advertising, cost of production.

Summary: The authors build a model which illustrates how, and under what conditions,

monitoring and enforcement activities might mitigate the fraudulent activities of food

growers under a voluntary GAPs program. The analysis brings out the following results:

first, the farms respond to monitoring and enforcement not only by reducing fraudulent

output, but also by increasing truly high-safety output until the perfect compliance level is

achieved. Second, optimal monitoring policy depends on the exogenous parameters of the

farms. If the monitoring budget is not enough to cover the necessary inspection cost of

achieving the perfect high-safety output level, the agency will allocate resources to farms of larger size

and lower costs; if the budget is enough to obtain the perfect level of high-safety output but is

not enough to preclude fraudulent output, the monitoring agency will expend equal effort

on all the farms. Third, fraudulent behaviours can be eliminated using a combination of

penalties, sale bans and monitoring activities, but cannot be excluded completely

under an endogenous detection rate.

Advantages: N/A.

Disadvantages: First, a more complicated analysis could be developed when the parameters

are dependent. Second, the monitoring budget is assumed to be exogenous in their model

and the authors do not address the question of how the budget is decided.

Tested method: No.

Factors that affect the likelihood of fraud: Monitoring and enforcement.

Pouliot, S. (2012). Using economic variables to identify adulteration in food imports:

application to US seafood imports. Working paper.

Objective: Show that inspection policies may integrate economic data to better target risk.

Data: Data on US seafood imports from China. Buzby, Unnevehr and Roberts

(2008) describe US food import refusal data. The authors find that between 1998 and 2004, 65

percent of refusals were due to adulteration, 33 percent to misbranding and 2 percent to

other violations. The most common type of adulteration was filth.

Variables: Losses in revenue if inspection is failed, quality of inputs, inspection rate,

substitutability of inputs.

Method: The model considers exporting firms that can buy inputs of two qualities: low and

high. The low quality input does not meet quality standards in the importing country such

that its use adulterates the output of the exporting firm. The decision by an exporting firm to

adulterate its output depends on the relative price of inputs and the ability of the importing

country to detect adulteration.

Advantages: Model applies to food, but applications of the model to drugs or other

products are possible with slight modifications. The model can apply to many other issues

such as the domestic inspection policies of food plants, the inspection of medicines and

the detection of counterfeits.

Disadvantages: Empirical investigation of food import quality is still limited by the availability

of inspection data. Not clear how to find suitable instruments for import inspection effort over

time.

Tested Method: No. Although the FDA recognises that adulteration happens for economic

reasons, it fails to take them into account when developing a model for assessing

economic fraud.

Factors that affect the likelihood of fraud: By monitoring prices, an inspection agency

can identify threats even before they materialize in imports because of lags between

production and change in prices. In the long run, an inspection agency can learn about

adulteration by observing high rates of rejection or obtaining information from other

inspection agencies.
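
The intuition that the decision to adulterate depends on relative input prices and on the probability of detection can be illustrated with a stylised rule. This is not the model in the paper: it assumes a risk-neutral firm that compares the per-unit saving from the cheaper input with the expected loss from failing inspection. All names and numbers are hypothetical.

```python
def adulterates(price_high, price_low, detection_probability, loss_if_detected):
    """Stylised rule: adulterate when the input cost saving exceeds the expected loss."""
    if not 0.0 <= detection_probability <= 1.0:
        raise ValueError("detection_probability must lie between 0 and 1")
    cost_saving = price_high - price_low
    expected_loss = detection_probability * loss_if_detected
    return cost_saving > expected_loss


# Hypothetical per-unit values: a wider price gap or a lower detection
# probability makes adulteration more attractive under this rule.
print(adulterates(price_high=100, price_low=80, detection_probability=0.1, loss_if_detected=150))
print(adulterates(price_high=100, price_low=95, detection_probability=0.1, loss_if_detected=150))
print(adulterates(price_high=100, price_low=80, detection_probability=0.2, loss_if_detected=150))
```

Under this rule, an inspection agency that observes the price gap widening would expect the incentive to adulterate to increase unless the detection probability or the loss on detection rises as well.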

Hoffmann, S. (2010). Food safety policy and economics. Resources for the Future.

Objective: Overview of developments in food safety policy in major industrial countries

and of economic analysis of this policy.

Data: N/A.

Variables: Willingness to pay for health, cost of illness, lost productivity and direct compliance

cost, used to derive valuation estimates for use in policy analysis.

Summary: It describes the elements of a risk-based, farm-to-fork food safety system as it

is emerging in OECD countries guided by discussions through Codex Alimentarius and

traces its roots in the development of risk management policy in the United States.

Empirical research estimating the benefits of food safety policy has used multiple methods

including hedonic estimates of demand for safety from market data, stated preference

surveys, and experimental auctions. Many areas of applied economics are increasingly

looking to meta-analysis (a method of using statistical analysis to look at systematic

patterns across related studies) to derive valuation estimates for use in policy analysis.

Advantages: N/A.

Disadvantages: This system relies on risk management practices developed in the public

sector to guide environmental and health and safety policy and in the private sector to

reduce risk of failure in process engineering.

Tested Method: No. Conventional approaches to food safety policy that have been in

place since the turn of the last century are not adequate to meet these new food safety

challenges.

Factors that affect the likelihood of fraud: A globalised supply chain.

Buzby, J. C., Unnevehr, L. J., & Roberts, D. (2008). Food safety and imports: An analysis

of FDA food-related import refusal reports (No. 58626). United States Department of

Agriculture, Economic Research Service.

Objective: Examines U.S. Food and Drug Administration (FDA) data on refusals of food

offered for importation into the United States from 1998 to 2004.

Data: U.S. Food and Drug Administration (FDA) refusals of food import shipments for

1998-2004 by food industry group and by type of violation.

Variables: Import Refusal Report (IRR) data, which include those shipments ultimately refused entry into U.S.

commerce. For each refusal, FDA reports the violation or charge codes, which document

the reasons for refusal.

Method: Researchers analysed FDA Import Refusal Reports (IRR) for food shipments

refused entry into U.S. commerce between 1998 and 2004. Tabulations were created of

refusals by industry group and FDA violation code (e.g., type of violation). Adulteration

violations were examined closely, particularly those linked to pathogen contamination.

Advantages: Risk-based.

Disadvantages: The scope of the report does not include the imported meat, poultry, and

processed egg products regulated by FSIS.

Tested Method: Yes, the sampling strategies by the FDA and other agencies are

designed to focus enforcement and inspection efforts on areas that have the highest

probability of risk. Import refusals highlight food safety problems that appear to recur in

trade (i.e., the FDA thought they would be a problem and they are) and where the FDA

has focused its import alerts and monitoring efforts.

Factors that affect the likelihood of fraud: Types of violations, country of origin, and

product characteristics.

Starbird, S. A. (2005). Moral hazard, inspection policy, and food safety. American Journal

of Agricultural Economics, 87(1), 15-27.

Objective: Examine the sampling inspection policies in the 1996 Pathogen

Reduction/Hazard Analysis Critical Control Point Act.

Data: Theoretical discussion.

Variables: Quality of goods, wage or price, inspection rate.

Summary: To gather information about safety, buyers often employ sampling inspection.

Sampling inspection exhibits sampling error so some unsafe product passes inspection

and some safe product does not. This uncertainty influences buyer and supplier behaviour.

In this article, the author uses a principal–agent model to examine how sampling

inspection policies influence food safety. The author finds that the sampling inspection policy,

the internal failure cost, and the external failure cost have a significant impact on the price

that the buyer is willing to offer for safer food and, therefore, on the supplier’s willingness

to exert the effort required to deliver safe food. The internal failure cost has a significant

impact on the behaviour of the supplier and the external failure cost has a significant

impact on the behaviour of the buyer. The author found the minimum external failure cost

that will motivate a risk-neutral buyer to demand high effort and showed that it depends on

the rate of lot acceptance, the contribution margin, and the safety level under high effort

and under low effort.

Advantages: Clarifies the relationship between inspection rate and risk of fraud.

Disadvantages: Analysis has focused on the behaviour of a single buyer and a single

seller in the short run. It is likely that repeated failures could affect the seller’s reputation

and market share in the long run.

Tested Method: No.

Factors that affect the likelihood of fraud: Size of penalties.

b. Food fraud – biological science

Moore, J. C., Spink, J., & Lipp, M. (2012). Development and application of a database of

food ingredient fraud and economically motivated adulteration from 1980 to 2010. Journal

of food science, 77(4), R118-R126.

Objective: To collect information from publicly available articles in scholarly journals and

general media, organize them into a database, and review and analyse the data to identify

trends.

Data: Reports of food fraud (food ingredients specifically).

Variables: N/A.

Summary: Literature search information was analysed and coded into a database by the

authors and other supporting researchers. Considerations were given to the most

appropriate and useful characteristics that could be extracted into a concise format for

tabular and database presentations that allow further data analysis and insights for

understanding and predicting food fraud and identifying analytical detection methods. The

authors analysed the database by sorting all records into 2 datasets by report type, and

then they determined top ingredients and ingredient categories in each dataset. The

scholarly records dataset included a total of 1054 records based on 584 literature

references, and the media and other reports dataset included 251 records based on 93

articles. The authors analysed the scholarly reports dataset to determine the 25 food

ingredients with the greatest number of records or hits.

Advantages: The database provides information that can be useful for risk assessors

evaluating current and emerging risk for food fraud. The authors claim that it provides a

baseline understanding of the susceptibility or vulnerability of individual ingredients to

fraud.

Disadvantages: Many articles collected in the database do not have enough information

to facilitate classification into specific incidents.

Tested Method: Yes, the website http://www.usp.org/food-ingredients/food-fraud-database uses their database.

Factors that affect the likelihood of fraud: government surveillance reports and

information from criminal prosecution cases for some types of food fraud.

Elliott review into the integrity and assurance of food supply networks: final report - a

national food crime prevention framework.

Summary: The Elliott Review aims at shedding light on the problem of food fraud so as to

make it much more difficult for criminals to operate in food supply networks, thus providing

the UK consumer with safer and more authentic food.

The author recommends a systems approach which is intended to provide a framework to

allow the development of a national food crime prevention strategy, making it much more

difficult for criminals to operate in food networks by introducing new measures to check,

test and investigate any suspicious activity. The author suggests that those caught

engaging in food fraud activity must be severely punished by the law to send a clear

message to those thinking of conducting similar criminal activity.

To complete this report a data collection process took place together with well-structured

surveys with people related to the food industry. The report finds that the global nature of

the current food markets gives UK consumers access to all types of products even

when they are out of season. This means that the supply chain for food has become much

more complex as a number of these products have to be imported from abroad. Consumers

have become used to variety, taste, and access at low cost. All of these factors have

increased opportunities for mislabelling, substitution and for food crime. Based on a

number of consultations with the industry the report makes a number of recommendations.

Such recommendations are shown in the table below:

Everstine, K., Spink, J., & Kennedy, S. (2013). Economically motivated adulteration (EMA)

of food: common characteristics of EMA incidents. Journal of Food Protection, 76(4), 723-

735.

Summary: The paper reveals gaps in quality assurance testing methodologies that could

be exploited for intentional harm. EMA incidents present a particular challenge to the food

industry and regulators because they are deliberate acts that are intended to evade

detection. Large-scale EMA incidents have been described in the scientific literature, but

smaller incidents have been documented only in media sources. The authors have

reviewed journal articles and media reports of EMA since 1980. They identified 137 unique

incidents in 11 food categories: fish and seafood (24 incidents), dairy products (15), fruit

juices (12), oils and fats (12), grain products (11), honey and other natural sweeteners

(10), spices and extracts (8), wine and other alcoholic beverages (7), infant formula (5),

plant-based proteins (5), and other food products (28). They also identified common

characteristics among the incidents that may help better evaluate and reduce the risk of

EMA. These characteristics reflect the ways in which existing regulatory systems or testing

methodologies were inadequate for detecting EMA and how novel detection methods and

other deterrence strategies can be deployed.

Johnson R. (2014), Food Fraud and “Economically Motivated Adulteration” of Food and

Food Ingredients, Congressional Research Service.

Objective: This report provides an overview of the issues pertaining to food fraud and

“economically motivated adulteration” or EMA, a category within food fraud.

Data: The data comes from two databases: (1) the United States Pharmacopeial

Convention (USP) Food Fraud Database and (2) the National Centre for Food Protection

and Defence (NCFPD) EMA Incident Database.

Variables: Profit margins, field protection and control during harvesting.

Summary: First, the report provides general background information on food fraud and

EMA, including how it is defined and the types of fraud, as well as how food fraud fits into

the broader policy realm of food safety, food defence, and food quality. Second, the report

provides available information about foods and ingredients. Individual records therefore

have been further grouped by adulterant (e.g. melamine) and time period when the

incident is estimated to have occurred.

Advantages: It is able to highlight emerging concerns about food fraud involving “clouding

agents”.

Disadvantages: It may not be possible for FDA and DOJ to prosecute every instance of

food fraud given each agency’s myriad other responsibilities and limited personnel and

resources. Also, oftentimes inadequate evidence exists to effectively enforce against all

alleged or suspected cases of fraud.

Tested method: Yes.

Factors that affect the likelihood of fraud: N/A.

Fairchild, G. F., Nichols, J. P., & Capps, O. (2003). Observations on economic adulteration

of high-value food products: The honey case. Journal of Food Distribution

Research, 34(2), 38-45.

Objective: To highlight the issue of economic adulteration of high-value food products

and provide a context for discussion and analysis based on experiences with the U.S.

honey industry.

Data: A mail survey of fourteen U.S. honey packers conducted at the request of the

National Honey Board in 1999.

Variables: Percentage of economically adulterated product purchased from various

sources, the answer to the question of whether or not they believe economic adulteration

is affecting their operation or creating unfair competition.

Method: One approach would be to begin with an estimation of the retail demand for a

given product, then develop estimates for own-price elasticity of demand at the retail and

producer levels of the market channel, and finally develop estimates for the upper bounds

of own-price flexibility at the producer and retail levels. Assume that high-value-product

prices are relatively sensitive to quantity changes.

Advantages: N/A.

Disadvantages: The survey was not a statistically representative (random) sample and

thus the information generated only represents the experience and opinions of the

responding firms.

Tested method: Honey case study.

Factors that affect the likelihood of fraud: One motivation behind economic adulteration

is the opportunity to reduce costs and increase profits per unit sold at prices comparable to

pure products, or to reduce input costs and lower selling price to increase sales volume

and/or market share. Cost differences can be significant enough that firms selling

adulterated product can cause economic injury to competing firms, sometimes selling

below product cost for pure products and sometimes driving producers and packers out of

business.

Spink, J., & Moyer, D. C. (2011). Defining the public health threat of food fraud. Journal of

food science, 76(9), R157-R163.

Objective: To provide a base reference document for defining food fraud - it focuses

specifically on the public health threat - and to facilitate a shift in focus from intervention to

prevention. This will subsequently provide a framework for future quantitative or innovative

research. The fraud opportunity is deconstructed using the criminology and behavioural

science applications of the crime triangle and the chemistry of the crime. The research

provides a food risk matrix and identifies food fraud incident types.

Data: CDC annual reports and FoodNet surveys.

Variables: Level of tariffs and anti-dumping duties, profit markings, cost of inputs.

Summary: Through a literature review and peer consultation, this report was created as a

“backgrounder” on the topic. The intent of this research paper is to provide a base

reference document for defining food fraud—it focuses specifically on the public health

threat—and to facilitate a shift in focus from intervention to prevention.

Advantages: The major outcome of this study was to clarify that while the motivation may

be economic, public health remains vulnerable.

Disadvantages: None stated.

Tested method: Not yet but the authors suggest it should be used to define public policy.

Factors that affect the likelihood of fraud: Focusing on the criminal component of the

crime triangle (see footnote 56) provides insights into the motivations for seeking food fraud opportunities.

Brand growth and increased brand recognition of a product actually increase the fraud

opportunity (that is, more victims, spending and brand equity). Finally the guardian or

hurdle gaps lead to a greater fraud opportunity. Guardians include entities that monitor or

protect the product and could include customs, federal or local law enforcement, trade

associations, nongovernmental organizations, or individual companies themselves.

Hurdles include components or systems that exist (or are put in place) to reduce the fraud

opportunity by assisting in detection or providing a deterrence. Fraud opportunities could

be reduced by increasing the risk of detection, or increasing the costs of the necessary

technology to commit the fraud and/or of developing quality levels that would attract

consumers. Countermeasures are intended to reduce the fraud opportunity, but a

refinement to a process or a narrowing of focus in detection could inadvertently create new

gaps that could be exploited by fraudsters. An example of this uncertain nature is that

fraudsters may shift ports of entry by conducting strategic “port shopping” and by shipping

fraudulent product through less monitored entry points.

Woolfe, M., & Primrose, S. (2004). Food forensics: using DNA technology to combat

misdescription and fraud. TRENDS in Biotechnology, 22(5), 222-226.

Footnote 56: There are three elements of crime opportunity (or, more generally, fraud opportunity), as illustrated by the crime triangle: the victim; the fraudster, referred to in criminology research as the “criminal”; and the guardian, including hurdle gaps. It is important to emphasize that there may be very capable guardians and hurdles in place, but the nature of an evolving, emerging threat is that new gaps always occur. The term “fraudster” is used since in many incidents the food fraud is not a criminal or even a civil law violation, and may not be considered unethical in many cultures (this last point is a behavioural sciences and social anthropology phenomenon). To adapt the concept, note that as the legs increase in length, the area of the triangle increases, which represents an increase in the crime opportunity. Manipulating any leg of the triangle affects the area of the triangle and the crime opportunity.

Objective: Proving that fraud has occurred through the detection and quantification of food

constituents by DNA.

Data: Basmati rice, olive oil.

Variables: DNA sample.

Summary: The paper presents the many different chemical and biochemical techniques

that have been developed for determining the authenticity of food and, in recent years,

methods based on DNA analysis. These methods have gained increased prominence in

the past years. This is because some techniques, such as immunoassays, work well with

raw foods but lose their discrimination when applied to cooked or highly processed foods.

Also many techniques do not easily distinguish between closely related materials at the

chemical level. For example, olive and hazelnut oils are similar chemically so the usual

analytical methods cannot be applied to detect the adulteration of olive oils with hazelnut

oil. Conventional chemical methods are also not always able to detect country or region of

origin of olive oil. DNA analysis has discriminating power because ultimately the definition

of a variety or species is dependent on the sequence of the DNA in its genome. DNA is

more resilient to destruction by food processing (particularly cooking and sterilization) than

other marker substances. According to the authors, the main problems with using DNA

technology in food forensics are (i) the recovery of quality DNA from the vast array of

complex food matrices and (ii) the impact of food processing on the size of DNA that can

be recovered.

Robust DNA-based methods now exist for detecting or confirming the identity of various

meat, poultry and fish species, for identifying potato varieties, for distinguishing true-line

and hybrid basmati rice varieties from other long grain rice and for detecting offal and

neuronal tissue in processed meat products. These methods are being extended to the

identification of premium tea varieties and the regional origin of cold-pressed olive oil.

Krissoff, B., Kuchler, F., Calvin, L., Nelson, K., & Price, G. (2004). Traceability in the US

food supply: economic theory and industry studies (pp. 3-10). US Department of

Agriculture, Economic Research Service.

Objective: show how exogenous increases in food traceability create incentives for farms

and marketing firms to supply safer food by increasing liability costs.

Data: Theoretical paper.

Variables: Theoretical paper.

Method: The authors model formally the linkage between traceability and food safety and

establish the implications of an increase in traceability-liability for food safety. In this

context, liability is defined as the responsibility to pay compensation for damages such as

caused by foodborne illnesses. The capacity to trace the origin of food increases the

possibility of legal remedy and compensation in the case of a food safety incident. The

authors show explicitly the mechanism through which traceability systems create

incentives for firms to supply safer food. Traceability also allows parties to more easily

document that they are not responsible for harm. The authors show that food safety

declines with the number of farms and marketers and imperfect traceability from

consumers to marketers dampens liability incentives to supply safer food by farms.

Advantages: model implies several propositions that can be tested empirically.

Disadvantages: The authors do not discuss how such improved traceability would be

accomplished.

Tested Method: No.

Factors that affect the likelihood of fraud: Traceability.

Troop report (2013), Review of Food Standards Agency response to the incident of

contamination of beef products with horse and pork meat and DNA.

Objective: To review the response by the FSA to incidents of the adulteration of

comminuted beef products with horse and pig meat and DNA, and to make

recommendations to the FSA Board on the relevant capacity and capabilities of the FSA

and any actions that should be taken to maintain or build them.

Summary: 35 interviews were conducted with around 50 individuals, including a wide

range of FSA officials, officials in other Government Departments and Bodies, Ministers,

the Food Safety Authority of Ireland, industry representatives, Local Authority bodies and

consumer representatives. The evidence gathering for the review took place over a six

week period from 17 April to 31 May and included the review of documentation and

interviews with a wide range of individuals and organisations involved in the response to

the incident. The findings show that it was generally recognised that meat is a high value

product, which can be open to adulteration. Species substitution was known about and

action had been taken, but this focussed on, for example, cases of pork or chicken in beef

or lamb substituted by beef. The desire of companies to source cheap meat was

recognised but thinking was around expected meat such as chicken or pork, or cheaper

sources of beef.

National Audit Office (2013) Food safety and authenticity in the processed meat supply

chain.

Summary: The authors considered the horsemeat adulteration incident as a way to

examine the effectiveness of government’s monitoring and enforcement of legislation for

food safety and composition in England for processed meat products. The authors report

on the clarity of responsibilities, the effectiveness of food intelligence gathering and

analysis, food sample testing and the targeting of resources across the food supply chain.

The data used include the total number of food samples, budget allocated by local

authorities towards food inspection, number of public analysts. The authors do not

examine the nutritional labelling of food or the robustness of the checks on nutrition.

c. Credit card fraud (empirical)

Manuela, P. and Paba A. (2010): "A discrete choice approach to model credit card fraud",

1.

Data: dataset of 320,000 observations from a portfolio of credit cards (Classic, Gold and

Revolving) issued in Italy. The paper employs data on every individual whose application

for a given card was accepted. Clients with a poor credit history are not accepted.

Variables: Gender, civil status, age, occupation and urbanisation affect the risk of fraud.

Method: A logit model with fraud as the dependent variable and a set of explanatory variables (e.g.

gender, location, credit line, number of transactions in euros and in non-euro currencies).

Advantages: not mentioned.

Disadvantages: None stated.

Tested method? Yes, already used.

Factors that affect the likelihood of fraud: Gender, location, circuit, ownership or “holder”,

outstanding balance, number of transactions in euros, number of transactions in non-euro

currencies, credit line. Nationality (foreign customers are 22.25 times more likely to perpetrate a

fraud than nationals).
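
As a rough illustration of how a logit coefficient relates to a statement such as "22.25 times more likely", the sketch below converts between a coefficient and an odds ratio. It assumes that the reported figure is an odds ratio, which this review does not state, and the function names and the 0.1% baseline fraud rate are hypothetical. The paper's actual coefficients are not reproduced here.

```python
import math


def logistic(z: float) -> float:
    """Logistic link: maps a linear index to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(beta: float) -> float:
    """Odds ratio implied by a logit coefficient on a 0/1 regressor."""
    return math.exp(beta)


def shifted_probability(base_prob: float, ratio: float) -> float:
    """Probability obtained after multiplying the baseline odds by `ratio`."""
    odds = base_prob / (1.0 - base_prob) * ratio
    return odds / (1.0 + odds)


# Hypothetical inputs: treat 22.25 as an odds ratio, assume a 0.1% baseline.
beta_foreign = math.log(22.25)
p_national = 0.001
p_foreign = shifted_probability(p_national, odds_ratio(beta_foreign))
```

When fraud is rare, the odds ratio and the ratio of probabilities are close, which is why the two are often described interchangeably; for common outcomes they can differ substantially.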

d. Credit card fraud (theoretical)

Greene, W. (1998). Sample selection in credit-scoring models. Japan and the World

Economy, 10(3), 299-316.

Objective: how sample selection affects the measurement of some variables of interest to

credit card vendors.

Data: sample of observations generated by a major credit-card vendor in 1991. The

sample used was ‘choice based’. At the time the data were generated, the true acceptance

rate was closer to 60%. The credit-card vendor provided the choice based sample so as to

facilitate analysis of the very low default rate. Of 13 444 applications received, 10 499

were accepted. The purpose of the study is more theoretical than empirical and so the

data is only used to illustrate the theoretical points.

Variables: propensity to default (dependent variable), the number of derogatory reports in

an applicant's credit history, income, credit history, the ratio of credit-card burden to

current income, age, average expenditure, dependents, home owner, type of

employment, months at current address, number of credit bureau enquiries, months

employed.

Method: A binary choice model is used to examine the decision of whether or not to

extend credit. A selectivity aspect is introduced because such models are based on

samples of individuals to whom credit has already been given. A regression model with

sample selection is suggested for predicting expenditures, or the amount of credit. The

same considerations as in the binary choice case apply. Finally, a model for counts of

occurrences is described which could, in some settings also be treated as a model of

sample selection.

Advantages: Acceptance/rejection decisions are based on simple, easy to interpret and

justified criteria.

Disadvantages: If there are factors which enter the acceptance decision but do not appear

explicitly in the rule, and these same factors influence the response in the performance equation,

then the latter equation may produce biased predictions. Thus a predictor of default risk can be

systematically biased because it is constructed from a non-random sample of past applicants, that

is, those whose applications were accepted.

Tested method: Yes, the most common technique used for credit scoring is linear

discriminant analysis. The technique of discriminant analysis rests on the assumption that

there are two populations of individuals, which the author denotes ‘1’ and ‘0’, each

characterized by a multivariate distribution of a set of attributes, x, including such factors

as age, income, family size, credit history, occupation, and so on.
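
The two-population idea behind linear discriminant analysis can be sketched as follows. This is a minimal illustration under the usual assumption of a shared covariance matrix, not the paper's own specification; the attribute values (income, number of derogatory reports) and the function names are hypothetical.

```python
import numpy as np


def fit_lda(x0, x1):
    """Fit a two-class linear discriminant assuming a shared covariance matrix."""
    x0, x1 = np.asarray(x0, float), np.asarray(x1, float)
    mu0, mu1 = x0.mean(axis=0), x1.mean(axis=0)
    # Pooled within-class covariance estimate.
    centred = np.vstack([x0 - mu0, x1 - mu1])
    cov = centred.T @ centred / (len(centred) - 2)
    # Discriminant direction: cov^{-1} (mu1 - mu0).
    w = np.linalg.solve(cov, mu1 - mu0)
    # Cut-off: midpoint of the class means, shifted by the log prior ratio.
    prior1 = len(x1) / (len(x0) + len(x1))
    c = w @ (mu0 + mu1) / 2 - np.log(prior1 / (1 - prior1))
    return w, c


def predict(w, c, x):
    """Assign population 1 when the discriminant score exceeds the cut-off."""
    return (np.asarray(x, float) @ w > c).astype(int)


# Hypothetical attributes: (income in thousands, derogatory reports).
population_0 = [[50, 0], [60, 1], [55, 0], [65, 1]]   # e.g. non-defaulters
population_1 = [[30, 3], [25, 4], [35, 3], [28, 5]]   # e.g. defaulters
w, c = fit_lda(population_0, population_1)
labels = predict(w, c, [[58, 0], [27, 4]])
```

Because the rule is a single linear score compared with a cut-off, it is easy to interpret, which matches the advantage noted above.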

Factors that affect the likelihood of fraud: the number of derogatory reports in an

applicant's credit history.

e. Credit card fraud and computer science

Chan, P. K., Fan, W., Prodromidis, A. L., & Stolfo, S. J. (1999). Distributed data mining in

credit card fraud detection. Intelligent Systems and their Applications, IEEE, 14(6), 67-74

Objective: Improve fraud warning systems using large scale data mining.

Data: Chase and First Union Bank, members of the FSTC, provided real credit card data for

the study. The two datasets contain credit card transactions labelled as fraudulent or

legitimate. Each bank supplied 0.5 million records spanning over a year with 20% fraud

and 80% non-fraud distribution for Chase bank and 15% versus 85% for First Union Bank.

Note that in practice fraudulent transactions are much less frequent than the 15-20% in the

data.

Variables: N/A.

Method: Combine multiple learned fraud detectors under a “cost model”. They divide a

large dataset of labelled transactions (either fraudulent or legitimate) into smaller subsets,

apply mining techniques to generate classifiers in parallel and combine the resultant base

models by meta-learning from the classifiers’ behaviour to generate a meta-classifier.

Their approach treats the classifiers as black-boxes so that a variety of algorithms can be

employed.
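
The meta-learning idea described above can be sketched in a few lines. This is an illustrative toy, not the authors' system: the base learner (a one-feature threshold rule), the table-based meta-learner, the feature layout (amount, hour) and all data values are assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_stump(rows, feature):
    """Base learner: choose the threshold on one feature that best fits the labels."""
    best = None
    for threshold in sorted({x[feature] for x, _ in rows}):
        correct = sum((x[feature] >= threshold) == bool(y) for x, y in rows)
        if best is None or correct > best[0]:
            best = (correct, threshold)
    threshold = best[1]
    return lambda x: int(x[feature] >= threshold)


def train_meta(base_classifiers, validation_rows):
    """Meta-learner: map each pattern of base predictions to its majority label."""
    votes = defaultdict(Counter)
    for x, y in validation_rows:
        pattern = tuple(clf(x) for clf in base_classifiers)
        votes[pattern][y] += 1
    table = {p: c.most_common(1)[0][0] for p, c in votes.items()}

    def meta(x):
        pattern = tuple(clf(x) for clf in base_classifiers)
        # Patterns never seen in validation fall back to a majority vote.
        return table.get(pattern, int(sum(pattern) * 2 > len(pattern)))

    return meta


# Hypothetical labelled transactions: ((amount, hour), is_fraud).
subset_a = [((20, 14), 0), ((35, 10), 0), ((900, 3), 1), ((40, 16), 0), ((750, 2), 1)]
subset_b = [((15, 9), 0), ((820, 23), 1), ((60, 13), 0), ((30, 11), 0), ((990, 1), 1)]
validation = [((25, 12), 0), ((800, 4), 1), ((870, 2), 1), ((45, 15), 0)]

# Each base classifier is trained independently on its own subset.
base = [train_stump(subset_a, 0), train_stump(subset_b, 0)]
meta = train_meta(base, validation)
```

The meta-learner only sees the base classifiers' outputs, so any algorithm could be substituted for `train_stump` without changing `train_meta`, which is the sense in which the classifiers are treated as black boxes.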

Advantages: An efficient approach to generating a large number of classifiers and a

direct approach for sharing knowledge without sharing data.

Disadvantages: Not as efficient as the fine-grained parallelisation approaches.

Tested method: No.

Factors that affect likelihood of fraud: N/A.

Stolfo, S., Fan, W., Lee, W., Prodromidis, A., & Chan, P. (1997). Credit card fraud

detection using meta-learning: Issues and initial results. In AAAI-97 Workshop on Fraud

Detection and Risk Management.

Objective: Describe initial experiments using meta-learning techniques to learn models of

fraudulent credit card transactions.

Data: A large database, 500,000 records, of credit card transactions from one of the

members of the Financial Services Technology Consortium (FSTC, URL: www.fstc.org).

Each record has 30 fields and a total of 137 bytes. Under the terms of their nondisclosure

agreement, they cannot reveal the details of the database schema, nor the contents of the

data. The data were sampled from a 12-month period, but do not reflect the true fraud

rate.

Variables: N/A.

Method: Meta-learning is used to combine different (base) classifiers from different

learning algorithms and generate a (meta-) classifier that has better performance than any

of its constituents.

Advantages: None stated.

Disadvantages: Lack of effective metrics to guide the selection of base classifiers that will

produce the best meta-classifier.

Tested method: No.

Factors that affect likelihood of detecting fraud: good quality training data.

Brause, R., Langsdorf, T., & Hepp, M. (1999). Neural data mining for credit card fraud

detection. In Tools with Artificial Intelligence, 1999. Proceedings. 11th IEEE International

Conference on (pp. 103-106). IEEE.

Objective: Show how advanced data mining techniques and a neural network algorithm can

be combined successfully to obtain a high fraud coverage combined with a low false alarm

rate.

Data: For the analysis the authors used a sample set of 5,850 fraud transactions and

542,858 legal transactions, ordered by their time stamps.

Variables: N/A.

Summary of findings: In this contribution they develop concepts for the statistic-based

credit card fraud diagnosis. They showed that this task has to be based on the very special

diagnostic situation imposed by the very small proportion of fraud data of 1:1000.

Additionally, they showed that, by algorithmically generalizing the transaction data, one

may obtain higher levels of diagnostic rules. Combining this rule-based information with

adaptive classification methods yields very good results.

Advantages: Fraud decisions are about 80% valid.

Disadvantages: None stated.

Tested method: No.

Factors that affect likelihood of fraud: N/A.

Chan, P. K., & Stolfo, S. J. (1998). Toward Scalable Learning with Non-Uniform Class and

Cost Distributions: A Case Study in Credit Card Fraud Detection. In KDD (Vol. 1998, pp.

164-168).

Objective: Find a method that will reduce loss significantly due to illegitimate credit card

transactions.

Data: The Chase Manhattan Bank provided them with a data set that contains 500,000

transactions between 1995 and 1996, about 20% of which are fraudulent.

Method: Their approach is based on creating data subsets with the appropriate data class

distribution, applying learning algorithms to the subset independently and integrating to

optimise cost performance of the classifiers by learning from their classification behaviour.
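
One way to build subsets with a chosen class distribution is sketched below: the majority (legitimate) class is split into chunks and each chunk is paired with all fraud cases. This is an illustration under stated assumptions rather than a reproduction of the paper's procedure; the function name, the 50% target share and the synthetic records are hypothetical.

```python
def balanced_subsets(fraud, legit, fraud_share=0.5):
    """Pair every fraud case with successive chunks of legitimate cases.

    Each returned subset holds all fraud cases plus enough legitimate cases
    for fraud to make up roughly `fraud_share` of the subset.
    """
    legit_per_subset = max(1, round(len(fraud) * (1 - fraud_share) / fraud_share))
    subsets = []
    for start in range(0, len(legit), legit_per_subset):
        subsets.append(fraud + legit[start:start + legit_per_subset])
    return subsets


# Synthetic data with a 20:80 fraud-to-legitimate split.
fraud_cases = [("fraud", i) for i in range(20)]
legit_cases = [("legit", i) for i in range(80)]
subsets = balanced_subsets(fraud_cases, legit_cases, fraud_share=0.5)
```

Every training example appears in at least one subset, so no data are discarded, and a separate classifier can then be trained on each subset before the results are combined.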

Advantages: Their method utilises all available training examples and does not change

the underlying learning algorithm. It also handles non-uniform cost per error and is cost

sensitive during the learning process.

Disadvantages: The user needs to run preliminary experiments to determine the desired

distribution based on a defined cost model.

Tested Method: No.

Variables that affect the likelihood of fraud: N/A.

Srivastava, A., Kundu, A., Sural, S., & Majumdar, A. K. (2008). Credit card fraud detection

using hidden Markov model. Dependable and Secure Computing, IEEE Transactions

on, 5(1), 37-48.

Objective: Model the sequence of operations in credit card transaction processing using a

Hidden Markov Model (HMM) and show how it can be used for the detection of frauds.

Data: A simulator is used to generate a mix of genuine and fraudulent transactions.

Variables: N/A.

Method: An HMM is initially trained with the normal behaviour of a cardholder. If an

incoming credit card transaction is not accepted by the trained HMM with sufficiently high

probability, it is considered to be fraudulent. At the same time, they try to ensure that

genuine transactions are not rejected.
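
The acceptance test can be sketched with the standard forward algorithm for HMMs. The sketch assumes spending is coded into three symbols (0 = low, 1 = medium, 2 = high) and flags a transaction when appending it to the recent history causes a large relative drop in sequence probability. The relative-drop rule, the threshold and all parameter values are illustrative assumptions, not figures from the paper.

```python
def sequence_probability(obs, start, trans, emit):
    """Forward algorithm: probability of an observation sequence under an HMM."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
    return sum(alpha)


def looks_fraudulent(history, new_symbol, model, threshold=0.5):
    """Flag the new transaction if it sharply lowers the sequence probability."""
    old_p = sequence_probability(history, *model)
    # Slide the window: drop the oldest symbol, append the new one.
    new_p = sequence_probability(history[1:] + [new_symbol], *model)
    return (old_p - new_p) / old_p >= threshold


# Hypothetical model of a cardholder who mostly makes low-value purchases.
start = [0.8, 0.2]
trans = [[0.9, 0.1], [0.5, 0.5]]
emit = [[0.8, 0.19, 0.01], [0.3, 0.6, 0.1]]
model = (start, trans, emit)
history = [0, 0, 1, 0, 0]
```

With these values a high-value purchase (symbol 2) is flagged, while another low-value purchase (symbol 0) is accepted. In practice the model parameters would be estimated from the cardholder's own transaction history.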

Advantages: Comparative studies reveal that the accuracy of the system is close to 80

percent over a wide variation in the input data. The system is also scalable for handling

large volumes of transactions.

Tested Method: Yes, banks use detection systems similar to the one in the paper and

when the system confirms the transaction to be malicious, it raises an alarm, and the

issuing bank declines the transaction. The concerned cardholder may then be contacted

and alerted about the possibility that the card is compromised.

Factors that affect likelihood of fraud: Previous amounts spent on transactions.

Figure 8.1: An intuitive way of how the model works

Quah, J. T., & Sriganesh, M. (2008). Real-time credit card fraud detection using

computational intelligence. Expert Systems with Applications, 35(4), 1721-1732.

Objective: To detect fraud in real time and to present a new approach to understanding

spending patterns to decipher potential fraud cases.

Data: The data used is from the test database (an extraction from the actual banking

database) of a well-known bank.

Variables: Number of transactions performed during the past hours and number of

transactions beyond $X, branch code of the transaction, account number, debit currency in

which transaction is done, debit or credit, terminal or PoS preference, transaction amount.

Method: A multi-layered approach consisting of: the initial authentication and screening

layers, the risk scoring and behaviour analysis layer (core layer), a layer of further review

and decision-making.

Advantages: Dynamic and can adapt to changing patterns in the e-marketplace and to

converge more and more information from all possible avenues for decision-making.

Disadvantages: It requires information not only on the customer profiles but also on

merchant profiles, their selling patterns, rules and policies in the market for accurate fraud

detection.

Tested Method: No.

Factors that affect likelihood of fraud: Deviation from previous pattern of behaviour.

Panigrahi, S., Kundu, A., Sural, S., & Majumdar, A. K. (2009). Credit card fraud detection:

A fusion approach using Dempster–Shafer theory and Bayesian learning. Information

Fusion, 10(4), 354-363.

Objective: Identify an effective way for detecting credit card fraud.

Data: Simulation data.

Variables: Spending pattern which is further categorised into: risk loving, risk neutral and

risk averse.

Method: Their method combines evidence from current as well as past behaviour. The

fraud detection system (FDS) consists of four components, namely, rule-based filter,

Dempster–Shafer adder, transaction history database and Bayesian learner. In the rule-

based component, they determine the suspicion level of each incoming transaction based

on the extent of its deviation from the good pattern. Dempster–Shafer’s theory is used to

combine multiple such pieces of evidence and an initial belief is computed. The transaction is

classified as normal, abnormal or suspicious depending on this initial belief. Once a

transaction is found to be suspicious, belief is further strengthened or weakened according

to its similarity with fraudulent or genuine transaction history using Bayesian learning.
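
The two combination steps can be sketched as follows: Dempster's rule merges the masses assigned by two rules, and Bayes' rule then updates the resulting belief using transaction history. The mass values, likelihoods and function names are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the paper.

```python
from itertools import product

FRAUD = frozenset({"fraud"})
GENUINE = frozenset({"genuine"})
EITHER = FRAUD | GENUINE  # mass left unassigned (ignorance)


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions."""
    combined, conflict = {}, 0.0
    for (a, pa), (b, pb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + pa * pb
        else:
            conflict += pa * pb  # mass on contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def bayes_update(prior, p_evidence_if_fraud, p_evidence_if_genuine):
    """Posterior probability of fraud after observing one piece of evidence."""
    numerator = prior * p_evidence_if_fraud
    return numerator / (numerator + (1.0 - prior) * p_evidence_if_genuine)


# Hypothetical masses from two rules in the rule-based filter.
rule_1 = {FRAUD: 0.6, EITHER: 0.4}
rule_2 = {FRAUD: 0.5, GENUINE: 0.2, EITHER: 0.3}
belief = combine(rule_1, rule_2)

# Hypothetical likelihoods drawn from fraudulent and genuine histories.
posterior = bayes_update(belief[FRAUD], 0.8, 0.1)
```

Here the combined belief in fraud is 0.68 / 0.88, roughly 0.77, and evidence that is eight times more likely under fraud than under genuine use strengthens it further.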

Advantages: It generates fewer false alarms relative to other methods. The architecture

is flexible enough so that new rules can also be included at a later stage to further

augment the rule-based component. In addition, Bayesian learning takes place so that the

FDS adapts to the changing behaviour of genuine customers as well as fraudsters over

time.

Tested method: No.

Factors that affect likelihood of fraud: Suspicion score.

Bhattacharyya, S., Jha, S., Tharakunnel, K., & Westland, J. C. (2011). Data mining for

credit card fraud: A comparative study. Decision Support Systems, 50 (3), 602-613.

Objective: To evaluate two advanced data mining approaches, support vector machines

and random forests, together with the logistic regression, as part of an attempt to better

detect (and thus control and prosecute) credit card fraud.

Data: The dataset was obtained from an international credit card operation. The study

uses Artificial Neural Networks (ANN) tuned by Genetic Algorithms (GAs) to detect fraud.

This dataset covers 13 months, from January 2006 to January 2007, and includes about 50 million

credit card transactions (49,858,600) on about one million credit cards (1,167,757)

from a single country.

Variables: Retail purchase, cash advance, transfer.

Advantages: accessibility for practitioners, ease of use, and noted performance

advantages in the literature.

Disadvantages: instability and reliability issues.

Tested method: Yes, one of the three methods that are presented is used in practice -

logistic regression. “It is well-understood, easy to use, and remains one of the most

commonly used for data-mining in practice”.

Factors that affect likelihood of fraud: N/A.

Dheepa, V. and Dhanapal, R. (2009) "Analysis on credit card fraud detection methods."

International Journal of Recent Trends in Engineering, Vol 2.

Objective: The main task is to explore different views on credit card fraud and see what

can be learned from the application of each different technique.

Data: Theoretical approach.

Variables: N/A.

Summary: Three methods to detect fraud are presented. First, the clustering model is

used to classify legal and fraudulent transactions using data clusterisation of regions of

parameter value. Secondly, a Gaussian mixture model is used to model the probability

density of credit card user’s past behaviour so that the probability of current behaviour can

be calculated to detect any abnormalities from the past behaviour. Lastly, Bayesian

networks are used to describe the statistics of a particular user and the statistics of

different fraud scenarios.

Conclusion: To improve the fraud detection system, the combination of the three

presented methods could be beneficial.

f. Automobile insurance and car accidents

Artís, M., Ayuso, M., & Guillén, M. (1999). Modelling different types of automobile

insurance fraud behaviour in the Spanish market. Insurance: Mathematics and

Economics, 24(1), 67-81.

Objective: To model fraud behaviour in the automobile insurance industry. The model

should account for different types of fraud and explain individual behaviour. The authors

propose to distinguish the type of fraud chosen, because they assume that different kinds

of fraud may produce different benefits to the individual.

Data: The database has been obtained from a sample of claims of a Spanish company.

Data were collected from 1993 to 1996. Half of the claims are legitimate, the other half are

claims that had been identified as fraudulent. Each fraud has been classified as being for

personal benefit or for a third party benefit. The sample is not strictly random. The

estimation was obtained using maximum likelihood with the correction for choice-based

sampling in order to take into account the effect of the over-representation of fraud claims.

Therefore, weights were included in the estimation procedure.

Variables: type of claim, number of files associated to the claim, insurer did not accept

fault, police officer reported about the accident, presence of witnesses, accident took place

in a nonurban area, number of previous claims and number of years since vehicle

fabrication.

Method: Based on discrete-choice models for fraud behaviour. The authors estimate the

influence of the insured and claim characteristics on the probability of committing fraud.

They correct for choice-based sampling in the estimation due to the oversampling of fraud claims.

In the framework of discrete-choice models, their fraud model classifies each claim into

one of several different classes: legitimate, fraud for personal profit and fraud for a third

party benefit. They use two different approaches: firstly the authors consider a multinomial

logit model (MNL) and, secondly, a nested multinomial logit model (NMNL) is used.
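
The ingredients of a weighted multinomial logit can be sketched as below. The weighting shown (population share divided by sample share for each class) is one standard correction for choice-based sampling and is assumed here for illustration; the review does not give the authors' exact weights. The class shares, utilities and function names are hypothetical.

```python
import math


def mnl_probabilities(utilities):
    """Multinomial logit: softmax over the class utilities."""
    peak = max(utilities.values())  # subtract the maximum for numerical stability
    exps = {k: math.exp(v - peak) for k, v in utilities.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def choice_based_weights(population_share, sample_share):
    """Weight per class: population share divided by sample share."""
    return {k: population_share[k] / sample_share[k] for k in population_share}


def weighted_log_likelihood(observations, weights):
    """observations: list of (observed_class, utilities) pairs."""
    return sum(
        weights[cls] * math.log(mnl_probabilities(utils)[cls])
        for cls, utils in observations
    )


# Hypothetical shares: fraud is over-represented in the sample.
sample_share = {"legitimate": 0.5, "fraud_personal": 0.3, "fraud_third_party": 0.2}
population_share = {"legitimate": 0.95, "fraud_personal": 0.03, "fraud_third_party": 0.02}
weights = choice_based_weights(population_share, sample_share)

equal_utilities = {"legitimate": 0.0, "fraud_personal": 0.0, "fraud_third_party": 0.0}
probabilities = mnl_probabilities(equal_utilities)
```

Down-weighting the over-sampled fraud claims keeps the estimated probabilities anchored to the population rather than to the sample design. In a full estimation the utilities would be linear functions of the claim characteristics, with coefficients chosen to maximise the weighted log-likelihood.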

Advantages: Their results are comparable to those proposed by other authors, but they

have accounted for a unified framework with several kinds of fraud.

Disadvantages: Some claims classified as legitimate might be fraudulent claims that were not detected,

and thus the model could be improved by allowing for measurement error in the

dependent variable.

Tested Method: No.

Factors that affect the likelihood of fraud: The number of files associated to a claim is

related to a higher probability of a legitimate claim. All variables are significant at a 5%

significance level except the location of the accident.

g. Consumer goods

Grossman, G. M., & Shapiro, C. (1988). Counterfeit-product trade.

Objective: To characterize a counterfeiting equilibrium and explore its properties.

Data: Theoretical model.

Variables: Price and quality of genuine product and aggregate supply of adulterated

product.

Model: In the presence of counterfeiting, trademark owners compete subject to two

constraints. First, each price-quality offer must be credible, i.e., the manufacturer must

find it optimal to supply the promised quality rather than to run down his supply, so that

each firm prices its product above marginal cost and earns a flow of quasi-profits that

provide a competitive rate of return to the firm's reputation. Second, each firm must

account for (actual and potential) competition by counterfeiters. Brand name

manufacturers must avoid price/quality combinations that offer positive profits to

counterfeiters. Counterfeiters produce abroad and enjoy a cost advantage, but face the

possibility of confiscation at the border. Detection is more likely if the genuine product is of

higher quality. Counterfeiting also becomes more costly as the aggregate supply of

counterfeits rises, driving up foreign factor prices. In this model, counterfeiting provides an

additional avenue of export for the foreign country.

Factors that affect the likelihood of fraud: price and quality of genuine product and

aggregate supply of adulterated product.

h. Fraud in general

Becker, G. S. (1974). Crime and punishment: An economic approach. In Essays in the

Economics of Crime and Punishment (pp. 1-54). UMI.

Objective: Provide answers to the following questions: what determines the amount and

type of resources and punishments used to enforce a piece of legislation? In particular,

why does enforcement differ so greatly among different kinds of legislation?

Data: It is a theoretical paper so data is not very widely used. He uses an indicative

number on the costs of crime. Economic costs of crime are calculated by using the

President’s commission report 1967 which looks at the expected cost of being caught

when committing a crime. The Crime Commission estimates the direct costs of various

crimes.

Variables: value of crime and cost of crime.

Model: He breaks down the variables that determine the risk of fraud into five categories:

the relations between (1) the number of crimes, called "offenses" and the cost of offenses,

(2) the number of offenses and the punishments meted out, (3) the number of offenses,

arrests, and convictions and the public expenditures on police and courts, (4) the number

of convictions and the costs of imprisonments or other kinds of punishments, and (5) the

number of offenses and the private expenditures on protection and apprehension.
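
As a brief sketch of the reasoning behind these relations, Becker's analysis is usually summarised by an offender's expected utility and the resulting supply of offences. The notation below follows the standard presentation of his model and does not appear in this review: $p_j$ is the probability of conviction per offence, $f_j$ the monetary equivalent of the punishment, $Y_j$ the income from the offence, $U_j$ the offender's utility function, and $u_j$ a catch-all for other influences.

```latex
% Expected utility of individual j from committing an offence
EU_j = p_j \, U_j(Y_j - f_j) + (1 - p_j) \, U_j(Y_j)

% Supply of offences by individual j
O_j = O_j(p_j, f_j, u_j), \qquad
\frac{\partial O_j}{\partial p_j} < 0, \quad
\frac{\partial O_j}{\partial f_j} < 0
```

An increase in either the probability of conviction or the severity of punishment lowers expected utility from offending, and so reduces the number of offences.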

Contribution: demonstrates that optimal policies to combat illegal behaviour are part of an

optimal allocation of resources. Since economics has been developed to handle resource

allocation, an "economic" framework becomes applicable to, and helps enrich, the analysis

of illegal behaviour. At the same time, certain unique aspects of the latter enrich economic

analysis: some punishments, such as imprisonments, are necessarily nonmonetary and

are a cost to society as well as to offenders; the degree of uncertainty is a decision

variable that enters both the revenue and cost functions.

Disadvantages: In reality people usually differ on the amount of damages or benefits

caused by different activities. To some, any wage rates set by competitive labour markets

are permissible, while to others, rates below a certain minimum are violations of basic

rights; to some, gambling, prostitution, and even abortion should be freely available to

anyone willing to pay the market price, while to others, gambling is sinful and abortion is

murder. These differences are basic to the development and implementation of public

policy but have been excluded from his inquiry. The author assumes consensus on

damages and benefits and simply tries to work out rules for an optimal implementation of

this consensus.

Tested Method: No.

Factors that affect the likelihood of fraud: Level of enforcement and the number of past

fraud instances that were detected.

Kou, Y., Lu, C. T., Sirwongwattana, S., & Huang, Y. P. (2004). Survey of fraud detection

techniques. In Networking, sensing and control, 2004 IEEE international conference

on (Vol. 2, pp. 749-754). IEEE.

Objective: presents a survey of current techniques used in credit card fraud detection,

telecommunication fraud detection, and computer intrusion detection. The goal of this

paper is to provide a review of different techniques to detect frauds.

Data: survey paper.

Method 1: Outlier Detection. An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism. An unsupervised learning approach is employed in this model. Usually, the result of unsupervised learning is a new explanation or representation of the observation data, which will then lead to improved future responses or decisions. Unsupervised methods do not need prior knowledge of fraudulent and non-fraudulent transactions in a historical database, but instead detect changes in behaviour or unusual transactions. These methods model a baseline distribution that represents normal behaviour and then detect the observations that show the greatest departure from this norm. Outliers are a basic form of non-standard observation that can be used for fraud detection. In supervised methods, by contrast, models are trained to discriminate between fraudulent and non-fraudulent behaviour so that new observations can be assigned to classes. Supervised methods require accurate identification of fraudulent transactions.
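As a minimal sketch of the unsupervised approach described above, the following Python example builds a baseline from observed transaction amounts and flags the observations that depart most from it. The robust score based on the median and the median absolute deviation, the threshold of 3.5 and the sample amounts are illustrative assumptions and are not taken from the surveyed paper.

```python
from statistics import median


def robust_z_scores(values):
    """Return modified z-scores based on the median and the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread in the baseline: nothing can be flagged as unusual.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so that scores are comparable to standard z-scores.
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, threshold=3.5):
    """Return the indices of observations whose absolute score exceeds the threshold."""
    scores = robust_z_scores(values)
    return [i for i, s in enumerate(scores) if abs(s) > threshold]


# Hypothetical transaction amounts; the last one is deliberately extreme.
amounts = [21.0, 19.5, 22.3, 20.1, 18.9, 21.7, 20.4, 250.0]
suspicious = flag_outliers(amounts)
```

No labelled history of fraudulent transactions is needed here, which is the property that distinguishes this approach from supervised methods.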

Method 2: Neural Networks. A neural network is a set of interconnected nodes designed

to imitate the functioning of the human brain. Each node has a weighted connection to

several other nodes in adjacent layers. Individual nodes take the input received from

connected nodes and use the weights together with a simple function to compute output

values. Neural networks come in many shapes and forms and can be constructed for

supervised or unsupervised learning. The user specifies the number of hidden layers as

well as the number of nodes within a specific hidden layer. Cardwatch features neural

networks trained with the past data of a particular customer. It makes the network process

the current spending patterns to detect possible anomalies.
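The node computation described above can be sketched as follows. This is a minimal illustration only: the sigmoid function, the layer sizes and the weights are hypothetical choices, the network is untrained, and the code does not reproduce Cardwatch.

```python
import math


def sigmoid(x):
    """Simple squashing function applied to a node's weighted input."""
    return 1.0 / (1.0 + math.exp(-x))


def node_output(inputs, weights, bias):
    """Combine the inputs from connected nodes using the weights, then apply the function."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)


def forward(inputs, layers):
    """Propagate inputs through the layers; each layer is a list of (weights, bias) nodes."""
    activations = inputs
    for layer in layers:
        activations = [node_output(activations, w, b) for w, b in layer]
    return activations


# Hypothetical weights: one hidden layer with two nodes, followed by one output node.
hidden = [([0.5, -0.2], 0.1), ([-0.3, 0.8], 0.0)]
output = [([1.0, -1.0], 0.0)]
score = forward([1.0, 2.0], [hidden, output])[0]
```

In a supervised setting the weights would be adjusted during training so that the output score separates fraudulent from legitimate spending patterns.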

Method 3: Model-based Reasoning. Model-based detection is a misuse detection

technique that detects attacks through observable activities that infer an attack signature.

There is a database of attack scenarios containing a sequence of behaviours making up

the attack.

Method 4: Data Mining. Data mining approaches can be applied to intrusion detection. An important advantage of the data mining approach is that it can develop a new class of models to detect new attacks before they have been seen by human experts. A classification model with an association rules algorithm and frequent episodes is developed for anomaly intrusion detection. This approach can automatically generate concise and accurate detection models from large amounts of audit data. However, it requires a large amount of audit data in order to compute the profile rule sets. Moreover, this learning process is an integral and continuous part of an intrusion detection system because the rule sets used by the detection module may not be static over a long period of time.

Tested Method: Yes, these techniques are widely used to detect credit card fraud and

computer intrusion.

OECD (2008), The Economic Impact of Counterfeiting and Piracy, OECD Publishing.

doi: 10.1787/9789264045521-en.

Objective: The report suggests ways to develop information and analysis, and calls on

governments to consider strengthening legal and regulatory frameworks.

Data: An analysis of international trade data (landed customs value basis57) was carried

out. They also conduct a survey to identify the extent of counterfeiting taking place and

use data on the value of seizures and infringements of IP.

57 Customs value is the value of merchandise assigned by customs officials; in most instances this is the same as the transaction value appearing on accompanying invoices. Landed customs value includes the insurance and freight charges incurred in transporting goods from the economy of origin to the economy of importation.

Variables: Intensity and frequency of infringing activities, GDP per capita. Other dependent variables include the share of products within given categories in total exports from a given economy, dummy variables for preferential agreements, volume of inflowing FDI, population size and openness rank. They construct a relative propensity index for importing counterfeit goods from source economies.

Summary: The overall degree to which products are being counterfeited and pirated is unknown and there do not appear to be any methodologies that could be employed to develop an acceptable overall estimate. However, insights can be gained through an examination of various types of information, including data on enforcement and information developed through surveys. This information has significant limitations, however, and falls far short of what is needed to develop a robust overall estimate. The General Trade-Related Index of Counterfeiting for products (GTRIC-p) is constructed in three steps: (1) first, the general seizure percentages are calculated for each reporting economy; (2) from these, each product category’s counterfeiting factor is established; and (3) based on these factors, the GTRIC-p is derived.

Advantages: N/A.

Disadvantages: GTRIC-p is formed on a 2-digit HS basis and establishes the relative likelihood for products in one chapter to be counterfeit relative to another. Within any one chapter, there could be considerable variation among products and the relative counterfeiting propensities must therefore be seen as averages for the hundreds of goods covered by each HS chapter.

Tested Method: No, OECD suggests that this method should be used by governments internationally.

Factors that affect the likelihood of fraud: profitability and technology.

Peck, H. (2005). Drivers of supply chain vulnerability: an integrated framework. International Journal of Physical Distribution & Logistics Management, 35(4), 210-232.

Objective: This paper aims to report on findings of a cross-sector empirical study of the sources and drivers of supply chain vulnerability.

Data: Data collection involved semi-structured interviews with 47 managers, representing five tiers of the network involved in the production of four distinct aircraft types. Interviewees were selected using snowball sampling. The managers concerned performed a range of supply chain management related roles. They were drawn from across the aircraft programs (product lines/families) of the prime contractor (the assembler), its first- and second-tier suppliers, industry associations – including one representing small and medium sized enterprises – and customers in the UK Ministry of Defence.

Variables: N/A.

Framework: This paper develops a framework rather than analysing the data

econometrically. This paper has taken the findings of exploratory research into sources and drivers of

supply chain vulnerability and, drawing on systems theory, developed a multi-level

framework for analysis, providing the basis of a model (Figure 6.1) to explain the scope

and dynamic nature of supply chain risk.

Advantages: A starting point for developing more complete predictive simulations of the

likely effects of specific actions on dynamic supply chain networks.

Disadvantages: It would have been desirable to conduct in-depth multi-tier case studies in

each of the sectors used to validate the findings of the aerospace case study, immediately

after the initial study was undertaken. Unfortunately this was not possible due to time and

resource limitations.

Tested method: No.

Factors that affect the likelihood of fraud: Unanticipated side-effects or consequential

risks to supply chain processes, arising from specific managerial decisions, requirements

or industry trends. Demands for shorter lead-times, outsourcing and increasing use of

global sourcing and supply, as well as “off-set” (politically determined counter trade

agreements) were among the legitimate and well-intentioned measures identified by

interviewees as sources of risk to supply chain performance.

i. Insurance and tax fraud

Tennyson, S. (1997). Economic institutions and individual ethics: A study of consumer

attitudes toward insurance fraud. Journal of Economic Behavior & Organization, 32(2),

247-265.

Objective: To explore the determinants of consumers’ attitudes toward filing exaggerated

automobile insurance claims.

Data: The data for the study are obtained from a national survey of 1,987 adult individuals

taken in May 1991. The survey was developed by the Insurance Research Council, an

insurance industry research and information organization.

Variables: Age, white, male, years of school, executive, fraction of others who agree with

fraud statement: acceptable to not report income to IRS, good idea to reduce mandatory

insurance, good idea to give up right to sue, serious if any insurer bankrupt, confident of

financial stability of insurer, auto insurance premiums a major problem. The Herfindahl

Index of seller concentration, the percentage of cars insured in the residual market and the

average insurance premium per car (lagged two years).

Method: Binary response variables are constructed which represent “agreement” versus

“disagreement” with each statement about insurance fraud. The variable is assigned a

value of 1 if the respondent “strongly agrees,” “agrees” or “probably agrees” with a

statement, and is assigned a value of 0 if the respondent “probably disagrees,” “disagrees”

or “strongly disagrees.” Respondents who replied “don’t know” to the question are

eliminated from the sample. Given the uneven distribution of responses to the fraud questions documented, the advantages of the ordered probit in this study are uncertain. In preliminary analysis the author explored both a 6-class and a 4-class multinomial probit model. The estimation results were not markedly different from those of the binomial probit, and the predictive accuracy was poor for most response cells. Hence, only the results for the binary response variables were reported.
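The recoding rule described in this method can be sketched in a few lines of Python. The function name and the sample responses are hypothetical; only the mapping of the six answer categories and the removal of "don't know" answers follow the description above.

```python
AGREE = {"strongly agrees", "agrees", "probably agrees"}
DISAGREE = {"probably disagrees", "disagrees", "strongly disagrees"}


def to_binary(responses):
    """Map six-point survey responses to 1 (agreement) or 0 (disagreement)."""
    coded = []
    for response in responses:
        if response in AGREE:
            coded.append(1)
        elif response in DISAGREE:
            coded.append(0)
        # Any other answer (for example "don't know") is dropped from the sample.
    return coded


# Hypothetical responses to one statement about insurance fraud.
sample = ["agrees", "don't know", "strongly disagrees", "probably agrees"]
coded = to_binary(sample)
```

The resulting list can serve as the dependent variable of a binomial probit, at the cost of discarding the information on strength of agreement noted under the disadvantages.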

Advantages: Instead of viewing the prevalence of deviant attitudes as a function of

exogenously determined initial conditions, this view acknowledges that the attitudes of a

given population may also depend upon their institutional environment, and the perceived

legitimacy of the institutions in question.

Disadvantages: Using a binary response variable reduces the efficiency of estimation, by

failing to incorporate information regarding the strength of agreement or disagreement with

the statements; however, predictive accuracy of a more general ordered probit model will

be low for those cells in which there are few observations.

Tested method: No.

Factors that affect the likelihood of fraud: The Herfindahl Index of seller concentration,

the percentage of cars insured in the residual market and the average insurance premium

per car (lagged two years).

Andreoni, J., Erard, B., & Feinstein, J. (1998). Tax compliance. Journal of economic

literature, 818-860.

Objective: Characterising and explaining the observed patterns of tax non-compliance

and ultimately finding ways to reduce it.

Data: Theoretical discussion which suggests a number of sources to find appropriate data.

Such data include the household TCMP, state tax amnesty data and the IRS annual

report.

Variables: Tax rates, income, form of penalties, distribution of income.

Method: Tobit model of evasion using data from the Taxpayer Compliance Measurement Program (TCMP) and including as independent variables after-tax income, the combined state and marginal tax rate and a variety of other socio-economic indicators.

Advantages: Considers the interaction between tax payers and tax authorities.

Disadvantages: More psychological factors need to be included to analyse non-

compliance behaviour.

Tested method: No.

Factors that affect the likelihood of fraud: Social environment.

Derrig, R. A. (2002). Insurance fraud. Journal of Risk and Insurance, 69(3), 271-287.

Objective: To examine alternative cooperative arrangements that could reduce or

eliminate the potential inefficiency arising from the behaviour of insurance companies that

consider the possible cost savings of the total claim which can reduce the effectiveness of

investigations.

Data: Theoretical model.

Variables: Cost of insurance, subrogation expenses, cost of investigating claims.

Summary: The relationships between the cost of investigation and expected savings, as

well as the determination of the optimal levels of investigation under different

circumstances, are illustrated. The optimal level of investigation is reached where the

slope of the investigation cost line equals that of the expected savings function.
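The optimality condition in this summary can be written compactly. As a sketch under assumed notation (the symbols $x$, $S$ and $C$ are illustrative and are not taken from the paper), let $x$ be the level of investigation, $S(x)$ the expected savings and $C(x)$ the cost of investigation. The insurer chooses $x$ to maximise net savings:

```latex
\max_{x \ge 0} \; S(x) - C(x)
\quad \Longrightarrow \quad
S'(x^{*}) = C'(x^{*})
```

If the cost is linear, $C(x) = c\,x$, and savings are increasing and concave, the condition reduces to $S'(x^{*}) = c$: investigation is extended until the saving from one more unit of effort just equals its unit cost.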

Kornhauser, M. E. (2008). Normative and cognitive aspects of tax compliance: Literature

review and recommendations for the IRS regarding individual taxpayers. In 2007 Annual

Report to Congress (Vol. 138, pp. 138-180).

Objective: This report offers the IRS several concrete suggestions for improving individual

taxpayer compliance based on the tax morale literature.

Data: Experimental data.

Variables: Personal sense of integrity, degree of altruism, procedural justice, trust in

government, labels, rules of thumb, framing.

Summary: This Report surveys recent literature concerning the “tax morale” model of tax

compliance as it relates to individuals. It examines some of the cognitive processes

involved such as framing, but it concentrates on the moral, psychological, and social

factors influencing tax compliance.

Factors that affect the likelihood of fraud: rewards for not committing fraud.

Slemrod, J. (2007). Cheating ourselves: the economics of tax evasion. The journal of

economic perspectives, 21(1), 25-48.

Objective: reviews what is known about the magnitude, nature, and determinants of tax

evasion, with an emphasis on the U.S. income tax.

Data: U.S. Department of the Treasury, Internal Revenue Service (2006).

Variables: Sources of income.

Summary: The paper reviews the current state of knowledge on the economics of tax evasion with an emphasis on the U.S. income tax. The main themes in the paper revolve around the issues of tax evasion by sources of income, the type of people who have been identified in the literature as more likely to evade (lower income people), the role of big businesses in the tax system (how much corporation tax is evaded and why) and how the standard deterrence model of tax evasion performs in practice.

Factors that affect the likelihood of fraud: Married filers and taxpayers younger than 65

have significantly higher average levels of noncompliance than others, and econometric

studies by Clotfelter58 (1983) and Feinstein59 (1991) that control for income and marginal

tax rates come to similar conclusions. Baldry60 (1987) found evidence, in an experimental

setting, that men evade more than women.

58 Clotfelter, C. T. (1983). Tax evasion and tax rates: An analysis of individual returns. Review of Economics and Statistics, 65(3), 363-373.

59 Feinstein, J. S. (1991). An econometric analysis of income tax evasion and its detection. The RAND Journal of Economics, 14-35.

60 Baldry, J. C. (1987). Income tax evasion and the tax schedule: Some experimental results. Public Finance = Finances publiques, 42(3), 357-83.

9. Annex II: Methodologies Used to Study Fraud

We have identified three main methodologies to assess the risk of fraud for different food

products:

Indices.

Econometrics.

Data mining.

For each of the methodologies we provide a description, examples of how they have been

applied to fraud detection and discuss some advantages and disadvantages for the

purposes of this project.

a. Construction of risk indices

The main objective of this report is to develop a methodology that can establish the level of

risk of fraud for various food products. An approach that has been used in the literature

would be to directly construct one or several indices that capture the risk of fraud.

The construction of an adequate risk index would rely crucially on the following steps:

The selection, a priori, of the relevant factors to be included in the index.

The determination of the relation and relative importance between these factors.

An example of a simple index would be as follows. Suppose that there are n factors considered and that the relative importance is given by weights (denoted $w_i$) for each of them. The simplest possible risk index could be estimated by the following formula:

$$\text{Risk index} = \sum_{i=1}^{n} w_i x_i$$

where $x_i$ is the value of factor $i$ and the weights sum to one.

We note that this formula is simply an example that consists of a weighted arithmetic

average. Other specifications, such as geometric or other non-linear functions would also

be possible (see examples below).
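To make the two specifications concrete, the sketch below computes a weighted arithmetic index and a weighted geometric alternative. The factor scores, the weights and the function names are hypothetical, and the choice of weights remains as arbitrary here as the text cautions.

```python
import math


def arithmetic_index(factors, weights):
    """Weighted arithmetic average of the factor scores."""
    total_weight = sum(weights)
    return sum(w * f for w, f in zip(weights, factors)) / total_weight


def geometric_index(factors, weights):
    """Weighted geometric average; requires strictly positive factor scores."""
    total_weight = sum(weights)
    log_sum = sum(w * math.log(f) for w, f in zip(weights, factors))
    return math.exp(log_sum / total_weight)


# Hypothetical factor scores on a 0-1 scale and arbitrary weights.
factors = [0.8, 0.4, 0.6]
weights = [0.5, 0.3, 0.2]
risk_arithmetic = arithmetic_index(factors, weights)
risk_geometric = geometric_index(factors, weights)
```

The geometric form penalises a product that scores low on any single factor more heavily than the arithmetic form does, which is one reason a non-linear specification might be preferred.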

The main advantage of this approach is the very low data requirement. Even with a small

sample of observations it would be possible to construct one or more indices. There are

two main disadvantages. First, there are significant arbitrary decisions in this approach.

The arbitrariness would include the selection of factors but, more importantly, their

associated weights. Second, this approach would not have any associated statistical

methods to determine how robust the estimates are.

A relevant example of this method is given by OECD (2008). The study notes that the overall degree to which products are being counterfeited and pirated is unknown and that there do not appear to be any methodologies that could be employed to develop an acceptable overall estimate.61

This report constructs the so-called General Trade-Related Index of Counterfeiting for

products (GTRIC-p) that measures the relative propensity to counterfeit different product

categories in international trade. It is based on two assumptions: i) the counterfeiting factor

of a given product category is positively related to the actual intensity of international trade

in counterfeit goods and ii) differences in counterfeiting factors may be due to the fact that

some products are easier to detect than others.

The GTRIC-p is constructed in three steps. First, the general seizure percentages are

calculated for each reporting economy. This is done by dividing the sum of the seizure

values or seizure incidents of product k over time by the total value of all seizures over

time.

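From these, each product category’s counterfeiting factor is established. These counterfeiting factors capture the sensitivity of product counterfeiting in a given category relative to its share in international trade. Counterfeiting factors are defined as:

$$CP_k = \frac{\gamma_k}{s_k}$$

where $\gamma_k$ is the general seizure percentage of product category $k$ and $s_k$ is the share of category $k$ in international trade.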

Based on these factors, the GTRIC-p is derived. It is estimated by taking a transformation

of the counterfeiting factor (CP) that gives relatively more weight to lower counterfeiting

factors. The GTRIC-p is finally obtained by re-scaling the counterfeiting factor. The

GTRIC-p is formed on a 2-digit HS basis and establishes the relative likelihood for

products in one chapter to be counterfeit relative to another.
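The three steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the product categories and values are hypothetical, the counterfeiting factor is taken to be the seizure share divided by the trade share, and the intermediate transformation used by the OECD is omitted because its form is not given here, so the final step is a plain re-scaling.

```python
def seizure_shares(seizures):
    """Step 1: share of each product category in the total value of seizures."""
    total = sum(seizures.values())
    return {k: v / total for k, v in seizures.items()}


def counterfeiting_factors(seizures, trade):
    """Step 2: seizure share of each category relative to its share in trade."""
    shares = seizure_shares(seizures)
    total_trade = sum(trade.values())
    return {k: shares[k] / (trade[k] / total_trade) for k in seizures}


def rescale(factors):
    """Step 3 (simplified): re-scale so that the highest factor equals 1."""
    top = max(factors.values())
    return {k: v / top for k, v in factors.items()}


# Hypothetical seizure values and trade values by product category.
seizures = {"footwear": 60.0, "cereals": 10.0, "machinery": 30.0}
trade = {"footwear": 100.0, "cereals": 300.0, "machinery": 600.0}

factors = counterfeiting_factors(seizures, trade)
index = rescale(factors)
```

In this made-up example footwear accounts for a far larger share of seizures than of trade, so it receives the highest relative propensity.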

b. Econometric models

Econometrics is a body of statistical methods used to analyse economic data. The main

tool used in econometrics is regression analysis. Regressions estimate the (often linear)

relation between a so-called explained variable and (potentially multiple) explanatory

variables. Various statistical methods are used to obtain the estimation that provides the

best fit for the data. The most popular of these methods is Ordinary Least Squares (OLS),

although there are several others that are used to address particular data structures and

data issues. In particular, when the explained variable is binary (or more generally,

discrete), methods such as logistic (or logit) and probit regressions are usually employed.

Logit and probit models estimate the probability that the explained variable would take a

particular value assuming a logistic or normal distribution, respectively.
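The difference between the two models lies only in the function that converts a linear combination of the explanatory variables into a probability. A minimal sketch, with hypothetical coefficients that are not estimated from any data:

```python
import math


def logit_probability(z):
    """Logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def probit_probability(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def linear_index(coefficients, intercept, x):
    """Linear combination of explanatory variables."""
    return intercept + sum(b * v for b, v in zip(coefficients, x))


# Hypothetical coefficients and factor values.
z = linear_index([0.8, -0.5], -1.0, [1.2, 0.4])
p_logit = logit_probability(z)
p_probit = probit_probability(z)
```

In practice the coefficients would be estimated by maximum likelihood; the two models usually give similar predicted probabilities except in the tails.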

61 OECD (2008), “The Economic Impact of Counterfeiting and Piracy”, OECD Publishing.

For this project, the explained variable would be a measure that captures the extent of

fraud. Ideally, this measure could be constructed from the number of fraud incidents

detected expressed as a percentage of the total investigations conducted by authorities.

Alternatively, this could be a binary variable that indicates whether any instance of fraud

was detected at a given point in time. The explanatory variables would include the factors

that are expected to affect the likelihood of fraud. Many of these factors have been

suggested in the literature. A comprehensive list of these factors was presented above

together with a discussion of each.

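In broad terms, regression analysis will use a particular method (e.g. OLS, logit or probit) to estimate a relation of the following form:

$$y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \dots + \beta_k x_{k,t} + \varepsilon_t$$

where $y_t$ is the measure of fraud at time $t$, $x_{1,t}, \dots, x_{k,t}$ are the explanatory factors, $\beta_0, \dots, \beta_k$ are the coefficients to be estimated and $\varepsilon_t$ is an error term.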

The chosen method will estimate the coefficients that best fit the data. Therefore, this

method “lets the data speak” when quantifying the relative importance of the various

factors affecting the risk of fraud. In contrast, a methodology based on indices would

choose the weights of these factors arbitrarily, potentially giving high importance to factors

that have little real explanatory power for assessing the likelihood of fraud.

In addition to the estimates of the coefficients, regression analysis provides the following:

Confidence intervals around coefficients. A narrower interval is interpreted as a higher degree of confidence in the estimation. Moreover, it is possible to establish whether a coefficient is statistically significantly different from zero (a coefficient of zero would mean that the factor does not affect the risk of fraud).

Measures of goodness of fit, such as R-squared. These capture what fraction of the variation in the risk of fraud can be explained by the proposed factors.

A series of diagnostic tests that identify the appropriateness of the particular

statistical method chosen.
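The outputs listed above can be illustrated with a one-regressor OLS fit written in plain Python. The data are made up for illustration, the variable names are hypothetical, and the interval uses the large-sample critical value of 1.96 as a simplifying assumption; with so few observations a t critical value would be more appropriate.

```python
import math


def ols_simple(x, y):
    """Fit y = intercept + slope * x by OLS and report basic regression statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    # Standard error of the slope, using n - 2 degrees of freedom.
    se_slope = math.sqrt(ss_res / (n - 2) / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1.0 - ss_res / ss_tot,
        # Approximate 95% interval (normal critical value, an assumption).
        "ci_slope": (slope - 1.96 * se_slope, slope + 1.96 * se_slope),
    }


# Hypothetical data: an economic factor and the detected fraud rate (in percent).
factor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fraud_rate = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
result = ols_simple(factor, fraud_rate)
```

If the interval for the slope excludes zero, the factor would be read as having a statistically significant association with the fraud measure in this sample.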

Econometric estimations typically have higher data requirements than the construction of

indices. If the number of observations is too low, the estimated coefficients will have large

confidence intervals, to the point that they might not be statistically distinguishable from

zero. In addition, it is necessary to have data on past incidents of fraud and the

explanatory factors for the same points in time.

The economics literature has used econometric modelling to address different types of

fraud, including food, insurance and tax among others.

Pouliot (2012) attempts to estimate food fraud based on data for import refusal in the

United States.62 His approach uses a particular estimation method aimed at detecting

structural breaks in the data. He concludes that economic variables can be used as

leading indicators of fraud. As an additional point, Pouliot (2012) acknowledges that, while it is not accounted for in his model, the enforcement level should be introduced as a control variable to avoid biases in the estimations.

62 Pouliot, S. (2012). On the economics of adulteration in food imports: application to US fish and seafood imports. Cahier de recherche/Working paper, 2012, 15.

In other sectors, Manuela and Paba (2010) used a logit model to estimate the effect of

various factors on the risk of credit card fraud.63 Artís, Ayuso and Guillén (1999) develop a

discrete choice methodology based on a multinomial logit model to estimate the risk of

automobile insurance fraud.64 Their approach uses a multinomial (instead of binary)

approach since they consider different categories of claims: legitimate, fraudulent for

personal profit and fraudulent for third party benefit. Their dataset includes claims data for

three years, of which half were legitimate and the other half fraudulent. Another example of

econometric modelling is Andreoni, Erard, and Feinstein (1998), who survey a variety of

papers that employ econometric methods (such as the tobit model) to estimate the

determinants of tax compliance.65 These analyses are typically based on the US Taxpayer

Compliance Measurement Program (TCMP) data.

c. Data mining

With the increasing availability of very large data sets in some fields (sometimes referred

to as ‘big data’), several methods have been developed in computer science to identify

patterns in the data. These methods are sometimes labelled collectively as data mining.

The main advantage of these methods is that they are largely automated and require

relatively few assumptions on the part of the designer on the particular form of the

relationship between the variables. In fact, these methods attempt to discover these

relationships themselves from the data.

When compared to econometric methods, these methods have the advantage of allowing

for the identification of highly non-linear and/or clustered patterns. In other words, they are

less restrictive in terms of functional forms. The main disadvantages are two-fold. First, the

data requirements are substantially higher (typically in the order of at least tens of

thousands of data points). Not surprisingly, if more information is to be extracted from the

data, more data is needed. Second, the scope for testing the statistical validity of the

established relationships is more limited.

Data mining includes a variety of methods. These can be broadly classified in the following

categories:

Cluster analysis / nearest-neighbour methods. Clustering consists of grouping

observations according to their similarities. There are a large number of algorithms

used to detect patterns in the data. An example of cluster analysis is the class of

nearest-neighbour methods, where the notion of closeness is determined by a

dissimilarity function.

63 Manuela, P. and Paba A. (2010): "A discrete choice approach to model credit card fraud".

64 Artís, M., Ayuso, M., & Guillén, M. (1999). Modelling different types of automobile insurance fraud behaviour in the Spanish market. Insurance: Mathematics and Economics, 24(1), 67-81.

65 Andreoni, J., Erard, B., & Feinstein, J. (1998). Tax compliance. Journal of economic literature, 818-860.

Artificial neural networks (ANN). ANNs are models based on biological nervous

systems in that nodes (or ‘neurons’) are connected by a network. These models

allow for multiple logic ‘layers’ (i.e. multiple types of patterns).

Machine learning / decision trees / rule-based learning. This type of model

optimises prediction rules according to experience and some performance

measure. Decision trees and rules are particular forms of representing prediction

rules.
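As an illustration of the first category, the sketch below classifies a new observation by the majority label among its nearest neighbours, with closeness determined by a dissimilarity function. Euclidean distance, the value k = 3 and the labelled points are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Dissimilarity function: straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbour_label(point, labelled, dissimilarity=euclidean, k=3):
    """Return the majority label among the k least dissimilar labelled points."""
    ranked = sorted(labelled, key=lambda item: dissimilarity(point, item[0]))
    votes = {}
    for _, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


# Hypothetical two-dimensional observations with known labels.
labelled = [
    ((1.0, 1.0), "legitimate"),
    ((1.2, 0.9), "legitimate"),
    ((0.8, 1.1), "legitimate"),
    ((5.0, 5.2), "fraudulent"),
    ((5.1, 4.9), "fraudulent"),
    ((4.8, 5.0), "fraudulent"),
]
prediction = nearest_neighbour_label((4.9, 5.1), labelled)
```

Any other dissimilarity function can be passed in place of `euclidean`, which is how the notion of closeness is adapted to the data at hand.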

Data mining techniques are used to detect fraud in areas where there is abundant data

available. The main field in which these methods are applied is credit card fraud. For this

type of fraud banks typically possess millions of observed transactions, with a high

proportion of them identified as fraudulent. Data mining techniques are then used to detect

patterns in fraudulent transactions and predict the likelihood of fraud in new transactions.

Stolfo, Fan, Lee, Prodromidis, and Chan (1997)66 and Chan, Fan, Prodromidis, and Stolfo

(1999)67 apply data mining techniques to look for patterns in credit card transactions data.

Their dataset contains several months of data and at least half a million credit card

transactions, with substantial numbers of both legitimate and fraudulent ones. The

specific methods employed are meta-learning via classifiers and parallelisation. Also in this

literature, Srivastava, Kundu, Sural, and Majumdar, (2008) used an alternative method:

Hidden Markov Models (HMM).68

Panigrahi, et al. (2009) propose the use of Bayesian learning to tackle credit card fraud.69

They describe a fraud detection system (FDS) that determines the suspicion level of each

incoming transaction based on the extent of its deviation from the good pattern. The

transaction is classified as normal, abnormal or suspicious depending on this initial belief.

Once a transaction is found to be suspicious, belief is further strengthened or weakened

according to its similarity with fraudulent or genuine transaction history using Bayesian

learning.
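The belief-updating step can be illustrated with Bayes' rule. This is a generic sketch and not the authors' system: the thresholds, the probabilities and the function names are hypothetical, and the evidence-combination stage based on Dempster–Shafer theory mentioned in the title of their paper is not shown.

```python
def update_belief(prior, p_evidence_given_fraud, p_evidence_given_genuine):
    """Posterior probability of fraud after observing new evidence (Bayes' rule)."""
    numerator = p_evidence_given_fraud * prior
    denominator = numerator + p_evidence_given_genuine * (1.0 - prior)
    return numerator / denominator


def classify(belief, lower=0.3, upper=0.7):
    """Label a transaction from its fraud belief using hypothetical thresholds."""
    if belief < lower:
        return "normal"
    if belief > upper:
        return "abnormal"
    return "suspicious"


# A suspicious transaction (belief 0.5) that resembles past fraudulent history.
strengthened = update_belief(0.5, 0.6, 0.2)
# The same starting belief for a transaction that resembles genuine history.
weakened = update_belief(0.5, 0.1, 0.5)
```

Evidence that is more likely under fraud than under genuine behaviour raises the belief, and the reverse lowers it.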

Bhattacharyya et al. (2011) compared the performance of Artificial Neural Networks (ANN) methods with that of logistic regression (with a binary fraud variable).70 They find that logistic regression performs competitively with, and often surpasses, more sophisticated data mining techniques on some performance measures. Their analysis

is based on a dataset of approximately 50 million credit card transactions.

66 Stolfo, S., Fan, W., Lee, W., Prodromidis, A., & Chan, P. (1997). Credit card fraud detection using meta-learning: Issues and initial results. In AAAI-97 Workshop on Fraud Detection and Risk Management.

67 Chan, P. K., Fan, W., Prodromidis, A. L., & Stolfo, S. J. (1999) “Distributed data mining in credit card fraud detection” Intelligent Systems and their Applications, IEEE, 14(6), 67-74.

68 Srivastava, A., Kundu, A., Sural, S., & Majumdar, A. K. (2008) “Credit card fraud detection using hidden Markov model” Dependable and Secure Computing, IEEE Transactions on, 5(1), 37-48.

69 Panigrahi, Suvasini, et al. "Credit card fraud detection: A fusion approach using Dempster–Shafer theory and Bayesian learning." Information Fusion 10.4 (2009): 354-363.

70 Bhattacharyya, Siddhartha, et al. "Data mining for credit card fraud: A comparative study." Decision Support Systems 50.3 (2011): 602-613.

In food, cluster analysis has been applied in the Economically Motivated Adulteration (EMA)

database of food fraud developed by NCPFD (see the Annex on data sources) to establish

susceptibility of fraud for several food ingredients. This method is based on five criteria

identified by expert evaluations: the level of complexity of composition of the ingredient,

variability of the ingredient, selectivity of the ID test(s), specificity of the assay(s), and the

ability to detect EMA based on a loss of function in the final food product. The resulting

scores of the evaluations were used to perform a cluster analysis to yield distinct groups of

ingredient monographs based on EMA susceptibility. These groups are separated by the

following characteristics: higher susceptibility based on ID tests and assays, generally

lower susceptibility to EMA based on all attributes, generally higher susceptibility to EMA

based on all attributes and pending review. In contrast to direct index construction, this

categorisation contains no ranking or score provided for any of the ingredients on the

website.
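The grouping step can be illustrated with a minimal sketch. The source does not state which clustering algorithm was used, so plain k-means is an assumption here, and the ingredient names and scores are hypothetical.

```python
from math import dist

# Hypothetical expert scores (1 = low, 5 = high susceptibility) on the five criteria:
# composition complexity, variability, ID-test selectivity, assay specificity,
# and detectability through loss of function. Names and values are illustrative only.
scores = {
    "ingredient_a": (5, 4, 5, 4, 5),
    "ingredient_c": (1, 2, 1, 1, 2),
    "ingredient_b": (4, 5, 4, 5, 4),
    "ingredient_d": (2, 1, 2, 1, 1),
    "ingredient_e": (5, 5, 4, 4, 4),
}


def k_means(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids so runs are repeatable."""
    centroids = [points[i] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


names = list(scores)
labels = k_means([scores[n] for n in names], k=2)
groups = {}
for name, label in zip(names, labels):
    groups.setdefault(label, []).append(name)
print(groups)
```

As in the categorisation described above, the output is a set of groups and not a ranking or score for individual ingredients.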

10. Annex III: Data Sources

In this section we identify and describe some of the main candidate data sources that

would feed the proposed methodology. We primarily review databases of previous

food fraud incidents and sources of economic data.

A large number of the data sources presented here are publicly available, although some

of them are only available by subscription and a small minority only allows access to policy

makers and institutions. We used electronic search engines and articles (from both the

academic and policy literature) to identify the relevant datasets. We explored

analytically the databases that were accessible to us and catalogued the data

available in each of them. We also report the existence of other data sources that might

be relevant but to which we currently have no access.

In the remainder of this section we describe the content of these data sources. In a

separate Annex screenshots of some of these databases are presented, providing more

detail on the way in which they are structured.

a. Food fraud data

First we explore the available datasets that contain information on previous instances of

food fraud. For each database, we use the information available to us to identify:

Time coverage.

Geographical coverage.

Products covered.

Data on the extent of fraud / adulteration detected.

At the moment we have identified the following datasets:

USP Food Fraud Database:71 the dataset lists observed and reported food adulterants

since 1980 and provides a directory of possible detection methods reported in peer-reviewed

scientific journals. The database includes at present 1305 records, including 1000 records

with analytical methods collected from 677 references. In the future, this database may

expand to include additional publicly available articles published before 1980 and in

other languages, as well as data outside the public domain. Most of the data entries refer

to incidents that occurred in the US.

Time period covered: 1980-2010 for all food ingredients. The database contains all

the reports that have been published since 1980 on each possible food ingredient.

Data on the extent of fraud: this is not uniform and depends on the available

reports. Some reports are able to provide more details on the extent of the fraud

such as how many units have been adulterated and which geographical areas have

been affected while others merely report a particular incident without providing any

71 http://www.foodfraud.org/node?destination=node

background information to allow us to analyse how significant that incident has

been.

EMA research database:72 the NCFPD compiles, and updates every year, an online, searchable

database of documented incidents of Economically Motivated Adulteration in the USA since

1980. This database provides information about the food product,

adulterant, the type of EMA, known health consequences and how the incident was

discovered. The dataset is initially available on a free trial basis and subsequently on a

paid-for-access basis. We do not have information on the price for this service.

Time period covered: 1980 – present for all products.

Data on the extent of fraud: the products are organised in the following categories:

alcoholic beverages, animal food products, coffee and tea, dairy products, eggs,

fish and seafood products, fruit juices and concentrates, functional food ingredients,

grains and grain products, honey and other natural sweeteners, infant formula,

meat and meat products, oils and fats, spices, other food products and other

beverages. For each incident the user can find details on the year that it began and

ended, the number of illnesses or deaths that it caused, the type of adulteration

together with the number of references, the name of the consumer brands that were

affected, the adulterant, the location of production and the location of distribution.

The FSA Food Fraud Database:73 the information in this database is only available to local

authorities and other governmental organisations. The database includes reported

incidents of food fraud in the UK. The information included in the database is received

from a range of sources including local authorities, consumers, industry, government

departments and other enforcement bodies.

Time period covered: 2007 – present.

Data on the extent of fraud: the database connects food fraud incidents as reported

by local authorities. The scope of the dataset is best understood through an

example. In the dataset there are a number of nodes which are interconnected.

These nodes can be of different types, such as local authorities, retailers, suppliers

and the food products involved. When a local authority reports suspicions of

fraudulent behaviour by a given supplier then this is documented in the data. If

that supplier has been reported before, the new information

is immediately linked to its profile. At the same time, the database contains

information about the retailers who bought products from that specific supplier and

sometimes intelligence about the people who have supplied the given supplier with

various inputs. This way, a complex network is created which might be investigated

if enough evidence/concern is present to justify an investigation. It contains about

1400 investigations so far and covers the entire UK. While the unit of analysis of

this database is the report received by the FSA, it would be theoretically possible

to aggregate reports by food product.
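A minimal sketch of this structure follows: reports are linked to supplier profiles and to a network of connected nodes, and then aggregated by food product. The field names and records are hypothetical and do not reflect the actual schema of the FSA database, to which we have no access.

```python
from collections import Counter, defaultdict

# Hypothetical reports; field names are assumptions, not the FSA schema.
reports = [
    {"authority": "LA-1", "supplier": "S1", "retailer": "R1", "product": "olive oil"},
    {"authority": "LA-2", "supplier": "S1", "retailer": "R2", "product": "olive oil"},
    {"authority": "LA-1", "supplier": "S2", "retailer": "R1", "product": "honey"},
]

# Each new report is attached to the profile of the supplier it names.
supplier_profiles = defaultdict(list)
# Undirected links between nodes of different types.
network = defaultdict(set)
for report in reports:
    supplier_profiles[report["supplier"]].append(report)
    for a, b in [("authority", "supplier"), ("supplier", "retailer"), ("supplier", "product")]:
        network[report[a]].add(report[b])
        network[report[b]].add(report[a])

# Aggregate the unit of analysis (the report) by food product.
reports_by_product = Counter(report["product"] for report in reports)
print(reports_by_product)
print(sorted(network["S1"]))
```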

72 https://www.foodshield.org/

73 http://www.food.gov.uk/enforcement/enforcework/foodfraud/foodfrauddatabase#.U4WhFXJdVBk

HorizonScan:74 this database is available by subscription only. However, it can be

accessed for a short period of time on a free trial basis. Data are gathered, wherever

possible, from official government sources and to date 66 countries around the world are

monitored daily. Data for countries other than the UK are gathered from reliable web

sources. After the free trial, continued access to the dataset costs £1500.

Time period covered: 1982-present for all reported products.

Data on the extent of fraud: the database contains reports on hundreds of products;

however, the level of detail available is limited. The user is able to find information

on the type of fraud that has occurred (adulteration or imitation, expiry date

changes, fraudulent health certificate/adulteration, produced without inspection,

unapproved premises, unauthorised/ unsuitable transport), the exporting and

importing country and finally the date when the incident was reported. However, it is

not possible to determine the extent of the fraud and how many people have been

affected.

Rapid Alert System for Food and Feed (RASFF):75 this system

was developed in order to provide food and feed control authorities with a way

to exchange and access information about measures taken in response to serious risks

detected in relation to food or feed. The RASFF portal features an interactive searchable

online database. The information included in the database is arranged by classification,

hazard, product category and country of origin of the product notified. An ‘alert

notification’ is transmitted when a food that presents a considerable risk is detected

on the market and immediate action is required in a country other than the

notifying country. Alerts are triggered by the Member State that has detected the problem

and has proceeded to take relevant measures, such as withdrawal or recall. The data is

only available to the relevant authorities.

Time period covered: 1991-present for all reported products.

Data on the extent of fraud: the dataset contains only information on the products

that were detected to be hazardous or adulterated. However, there could be a

number of products that have crossed borders without being detected even though

they represented a serious hazard. The ability to track follow-up

notifications and volumes traded allows us to get a picture of the extent of fraud. In

2013, a total of 3205 original notifications were transmitted through the RASFF, of

which 596 were classified as alert, 442 as information for follow-up, 705 as

information for attention and 1462 as border rejection notifications.
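As a worked check on the 2013 figures quoted above, the four notification classes sum to the reported total, and their shares can be computed directly. The variable names are ours.

```python
# RASFF original notifications in 2013, by classification (figures from the text).
notifications_2013 = {
    "alert": 596,
    "information for follow-up": 442,
    "information for attention": 705,
    "border rejection": 1462,
}

total = sum(notifications_2013.values())
# Share of each class in the total, in percent, rounded to one decimal place.
shares = {kind: round(100 * count / total, 1) for kind, count in notifications_2013.items()}

print(total)
print(shares)
```

Border rejections are the largest class, at a little under half of all original notifications.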

Zhichuchuangwai database:76 This is a database of past instances of food fraud in China.

The data is publicly available in Chinese. The data was collected from the Chinese food

safety issues News Archive.

Time period covered: 2004-2011 for all products.

74 https://secure.fera.defra.gov.uk/deskcheck/

75 http://ec.europa.eu/food/food/rapidalert/index_en.htm

76 www.zccw.info

Data on the extent of fraud: the data is broken down by Chinese region and

analytical statistics are provided for each region on the level of fraud detected in

each one of them. The products that are analysed include milk, tea powder, pork,

rice, wine, soy sauce, cooking oil, waste oil, beef, beverage, moon cake, candied

honey, jelly bean, sprouts, egg, flour, dumplings, bottled drinking water, seeds

and vegetables, health food, ginger bread, mineral water, instant noodles, cabbage,

cake, ham, beans, ice-cream cakes, steamed apple juice, chocolate biscuits,

lamb, dried bean milk, canned mushroom, etc.

FSA Food Authenticity Programme:77 this programme conducted a number of studies to

investigate whether the food purchased by the consumer matches its description. These

studies included consumer research, the introduction of markers that characterise the

authenticity of foods, the validation of testing methods and the undertaking of surveys. The latter

activity produced a considerable amount of data that could be used for the purposes of this

project. A report with the findings is published for every examination conducted.

These reports are publicly available.

Time period covered: 2000-2008.

Data on the extent of fraud: the reports published are very detailed in terms of the

laboratory findings and the extent of fraud that was identified. The surveys report

the number of producers investigated, together with the number of incidents. While

the time coverage of the whole programme is significant, each survey was conducted

for a specific product (or group of them) covering much more limited periods

(typically a few months).

Database on International Intellectual Property (DIIP) Crime:78 this database is compiled

by INTERPOL and fed by sources such as Operation Opson. It collects information about

trafficking in illicit, adulterated and counterfeit goods and deals with transnational cases.

The data gathered is then analysed in order to establish potential links between

transnational and organised cross-sector criminal activity and also to develop strategic

illicit trade crime reports.

INTERPOL does not disclose specific information contained in the DIIP. However, the

affected industries and stakeholders are notified in the form of referrals indicating that two

or more industries are being targeted by the same transnational organized criminals. While

we currently do not have access to this database, we consider that it might be possible to

obtain it in the near future.

Time period covered: 2008 – present for all reported products.

Data on the extent of fraud: the dataset could provide information about who is

involved in counterfeit and illicit foodstuffs in the UK, how the supply chains are

organised and whether there are any key enablers of these supply chains.

77 http://tna.europarchive.org/20100929190231/http://www.food.gov.uk/science/research/choiceandstandardsresearch/authenticityresearch/

78 http://www.interpol.int/Crime-areas/Trafficking-in-illicit-goods-and-counterfeiting/Databases

POISON:79 this database is part of an electronic tool called FoodFraudster, which draws

on its data. The developers are a small private undertaking based in the

United States named FoodQuest TQ.80 The dataset is built by web scraping: the web is

searched for any new information that becomes available

that relates to food fraud and food safety. The information is obtained using a number of

sophisticated algorithms that retrieve the data, structure it and apply probabilistic

filters that weight information according to the relevance and reliability of its source.

The process is then supervised by subject experts. At present, the database contains 1100

types of fraud for a selected group of products. In addition to food fraud data from

POISON, FoodFraudster uses similar web scraping methods to obtain relevant economic

data on the selected products.

Time period covered: any date found.

Data on the extent of fraud: the database focuses on six products (beef, honey, fish,

olive oil, rice and cocoa) on a worldwide basis. It contains intelligence on when a

fraud took place, how many instances were detected, what steps the authorities

took, what the possible symptoms are, the country from which the fraudulent

product came, etc. Each data entry is given a specific weight depending on its

reliability, and that feeds through to raise an alarm whenever a particular area is

at high risk of fraud because many reports have surfaced. The system on its

own is able to create a risk profile for each category of food and for a number of

countries internationally.
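The weighting logic described above can be sketched as follows. This is only an illustration of the general idea: the actual POISON and FoodFraudster algorithms are not documented in the material available to us, and the entries, weights and alarm threshold below are hypothetical.

```python
from collections import defaultdict

# Hypothetical scraped entries: (product, country, reliability weight between 0 and 1).
entries = [
    ("honey", "Country A", 0.9),
    ("honey", "Country A", 0.8),
    ("honey", "Country A", 0.5),
    ("rice", "Country B", 0.9),
    ("rice", "Country B", 0.2),
]
ALARM_THRESHOLD = 2.0  # assumed cut-off, not taken from the tool

# Sum the reliability weights of all reports for each product and country.
risk = defaultdict(float)
for product, country, weight in entries:
    risk[(product, country)] += weight

# Raise an alarm where the weighted volume of reports reaches the threshold.
alarms = sorted(area for area, score in risk.items() if score >= ALARM_THRESHOLD)
print(alarms)
```

Under this scheme a few reports from reliable sources can trigger an alarm, while many reports from weak sources count for less.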

UK Food Surveillance system:81 this is a national database for central storage of analytical

results from feed and food samples taken by enforcement authorities (local authorities and

port health authorities) as part of their official controls. Information about each sample and

the results of analysis are entered onto the system, and then validated, using the data

entry tool. The database is password protected and can be accessed by enforcement

authorities and laboratories to search for anonymised local, regional, and national

datasets, and identify trends and areas of non-compliance that can help develop sampling

plans. This database might provide some measures of the level of investigative efforts

devoted by local authorities to different food products. The Food Surveillance System is a

database compiled and managed by the FSA. The database maintains a record of food

and feed samples taken by local authorities and examined by public analysts across the UK.

The dataset covers almost all of Northern Ireland, 62% of all the local authorities in

England and 77% of all the local authorities in Wales. The way it works is that for every

food sample taken by local authorities a new record is created which documents all the

tests taken upon that sample. The outcome of these tests is also documented so that

public authorities are able to take action when a number of hazardous incidents are

reported. The dataset contains a range of useful information on each sample. This

information includes the type of premises where it was taken (retail, manufacturing etc.), reason for

taking the sample, follow ups, description of the product, category of the product,

79 http://nfpcportal.com/FQTools/FoodFraudster/tabid/329/Default.aspx

80 http://foodquesttq.com/

81 http://www.food.gov.uk/enforcement/monitoring/fss/#.U4d6mnJdVBk

packaging and labelling information and a picture of the label. Public authorities are able to

extract the public analyst’s report and read it and also track the samples that were taken

but not yet analysed. In 2013 about 31,000 samples were collected and over 200,000 tests

were conducted on these samples.

Time period covered: 2006 – present for all reported products.

Data on extent of fraud: public authorities are able to filter their searches according

to the tests that were taken, the type of food tested or whether the test taken has

deemed the sample satisfactory or not. At the same time, public authorities are able

to find out how many times a given food was

tested and, of those times, how many instances of fraud were detected.
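The kind of query described above can be sketched as a simple tally by product category. The records and field names are hypothetical and are not the schema of the Food Surveillance System. Note also that an unsatisfactory result is used here as a stand-in for a detected problem; it need not indicate fraud.

```python
from collections import defaultdict

# Hypothetical sample records; field names are assumptions.
samples = [
    {"category": "meat products", "test": "species identification", "satisfactory": False},
    {"category": "meat products", "test": "species identification", "satisfactory": True},
    {"category": "meat products", "test": "labelling", "satisfactory": True},
    {"category": "honey", "test": "sugar profile", "satisfactory": True},
]

tested = defaultdict(int)
unsatisfactory = defaultdict(int)
for sample in samples:
    tested[sample["category"]] += 1
    if not sample["satisfactory"]:
        unsatisfactory[sample["category"]] += 1

# Share of tested samples in each category that were deemed unsatisfactory.
failure_rate = {category: unsatisfactory[category] / tested[category] for category in tested}
print(failure_rate)
```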

b. Economic data

This category includes all the identified datasets that provide economic

information which could affect the probability of food fraud. We believe that prices and

volumes traded are the most significant variables in determining the possibility of food

fraud and therefore we break down our analysis into datasets that contain information on

prices and datasets that contain information on volumes.

Economic data – prices

HMRC Database catalogue – imports and exports:82 this database provides overseas trade

statistics on international trade between the UK and the rest of the world. The data can be

broken down by commodity code or international trade classification. The data is collected

by HMRC's statistical and administrative systems and is available for a considerable

number of food products. The primary dataset relates to EU trade in goods arrivals

(imports) and dispatches (exports) data. The database is publicly available.

Period covered: 1996 – present, available by month, by quarter and by year.

Variables: commodity (HS2 to CN8, or SITC 1-5 hierarchy); EU indicator (world,

EU, non-EU and continent groupings); country; year/month (or quarter for RTS); flow

(import, export, arrival, dispatch); port (UK place of clearance, for non-EU trade

only); total value and total volume.

Products: Live animals, meat and edible meat offal, fish and crustaceans, molluscs

and other aquatic invertebrates, dairy produce; birds' eggs; natural honey; edible

products of animal origin, not elsewhere specified or included, products of animal

origin not elsewhere specified or included, live trees and other plants; bulbs, roots

and the like; cut flowers and ornamental foliage, edible vegetables and certain roots

and tubers, edible fruit and nuts; peel of citrus fruits or melons, coffee, tea, mate

and spices, cereals, products of the milling industry; malt; starches; inulin; wheat

gluten, oil seeds and oleaginous fruits; miscellaneous grains, seeds and fruit;

industrial or medicinal plants; straw and fodder.
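Because this source reports total value and total volume but no price as such, one way to use it as price data is to derive a unit value (value divided by volume). That use is our assumption and is not described by HMRC; the figures and field names below are invented for illustration.

```python
# Hypothetical monthly import records for a single commodity code.
rows = [
    {"month": "2013-01", "value_gbp": 120000, "volume_kg": 40000},
    {"month": "2013-02", "value_gbp": 90000, "volume_kg": 20000},
]

# Unit value in GBP per kg: total value divided by total volume.
unit_values = {row["month"]: row["value_gbp"] / row["volume_kg"] for row in rows}

# Month-on-month percentage change in the unit value.
months = sorted(unit_values)
pct_change = {
    later: 100 * (unit_values[later] - unit_values[earlier]) / unit_values[earlier]
    for earlier, later in zip(months, months[1:])
}
print(unit_values)
print(pct_change)
```

Unit values reflect changes in the mix of goods within a commodity code as well as changes in price, so they are only a proxy.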

82 https://www.uktradeinfo.com/Statistics/BuildYourOwnTables/Pages/Home.aspx

Eurostat:83 The Absolute Agricultural Prices database includes the prices on main

agricultural outputs and inputs. Since 2006 only annual prices have been collected. Before

that the dataset also included monthly observations. The Member States provide Eurostat

with the required annual price series. Eurostat measures prices by

considering the amount they directly contribute to farmers’ income. Therefore, selling

prices are recorded at the first marketing stage (the price at which the producer sells to the

trader), while purchase prices are recorded at the last marketing stage (the price from the

trader to the producer).

Period covered: For several countries data availability goes back to 1970, for most

of the countries data is available from 1990. Monthly data are available until 2006.

Variables: time, country, currency, product, producer price, consumer price.

Products: rice, chick peas, dried peas, dried beans, broad beans, lentils, main crop

potatoes, sugar, tobacco, soya, lemons, cherries, apricots, garlic, melons,

asparagus, cauliflower, chicory.

FAO GIEWS Food Price Data and Analysis Tool:84 the GIEWS tool started in 2008-2009 and

contains basic food prices. This activity was part of the FAO Initiative on Soaring Food

Prices (ISFP). The database currently includes 1168 monthly domestic retail and/or

wholesale price series of major foods consumed in 190 countries covering a total of 20

different food commodity categories. Sources include meteorological information, agencies

operating satellites for earth observation, news services such as Reuters, Associated

Press, other news organizations, information from national institutions available through

publications or web sites, various reports and studies. GIEWS also sends questionnaires to

various partners (FAO offices, government agencies and NGOs).

Period covered: 2000-present. Since 2012, data can also be downloaded on a

monthly basis.

Variables: country, market, commodity, price, year, currency, weight.

Products: bread, wheat, rice, soybean oil, sugar, mutton meat, potatoes, beef meat,

maize, millet, milk, prawns, bananas, barley, yam.

FAOSTAT database:85 this database provides a large selection of time series and cross

sectional data that are related to hunger issues, food and agriculture for 245 countries and

territories and 35 regional areas from 1961 to present. It also provides a number of tools

for the visualisation of the data and some basic statistical analysis (univariate and

multivariate regressions). Most of the data originated from country sources received

through the FAO Questionnaire. The periodicity of national data collection varies as countries

follow different national practices and methodologies. Data collection at the national level

is normally monthly, but can be weekly for some countries. FAO collects the

average prices from the countries on an annual basis. All the data is publicly available.

Period covered: 1966-present for some 200 commodities, representing over 97

percent of the world’s value of gross agricultural production in 2006.

83 http://epp.eurostat.ec.europa.eu/portal/page/portal/agriculture/data/database

84 http://www.fao.org/giews/pricetool/

85 http://faostat3.fao.org/faostat-gateway/go/to/home/E

Variables: producer prices, producer price indices, consumer price indices, country,

region, item, year.

Products: apples, barley, cabbages and other brassicas, carrots and turnips,

cauliflowers and broccoli, leeks, other alliaceous vegetables, lettuce and chicory,

milk (whole fresh cow), mushrooms and truffles, onions (dry), pears, potatoes, wheat.

Farmers Weekly:86 this is a UK-focused database that provides statistical information for

farmers. It is constructed by the Farmers Weekly magazine and is publicly available. The

information is provided to the magazine from local businesses and local authorities.

Period covered: past three months, on daily, weekly and monthly basis.

Variables: prices, quotas, region, time period, currency.

Products: HGCA grain prices, potatoes, grains, oilseeds, pulses, vegetables, hay

and straw, straights, milk, meat, cattle, sheep, pig.

Seafish authority market data:87 the database contains market data on the UK seafood

market, which includes information on retail price fluctuations at species level, and

category-based consumption trends. The reports produced by the authority and

the monthly syndicated market data are not publicly available; only industry-related

businesses can gain access to this information. Higher-level information, such as

market summaries, is publicly available online. The source of the data presented on

the website is the BTS Trade Statistics and the MMO reports.

Period covered: 1974-present, more detailed statistics are available for the period

between 2011 and 2014.

Variables: retail sales, top species by value and volume, seafood sales, share of

trade between major retailers, household purchases of fish, UK landings, UK ports,

import countries, export countries.

Products: salmon, tuna, cod, haddock, warm-water prawns, mackerel, pollack,

scampi, whiting, tilapia, sardines, trout, sea bass, mussels, sole, crabstick, kipper,

crab, scallops, basa, anchovy, pilchards, herring, squid, coley, sea bream, monkfish,

halibut, crayfish, lobster, hake.

IMF Primary Commodity Prices:88 this dataset presents data on primary commodity prices

on a monthly, quarterly and yearly basis. The prices are quoted in nominal terms in US

dollars. The data is collected directly by the IMF and is publicly available.

Period covered: 1980 – present, on a yearly, quarterly and monthly basis.

Variables: price (2005=100), country, index, currency.

Products: food and beverage, beverage, industrial input, timber, cotton, wool,

rubber, and hides price indices, copper, aluminium, iron ore, tin, nickel, zinc, lead,

and uranium, crude oil (petroleum), natural gas, and coal, bananas, barley, beef,

cocoa, coffee, rapeseed oil, fishmeal, groundnut, lamb.
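The notation "2005=100" means that prices are expressed as an index relative to the base year 2005. A minimal sketch of that rebasing follows; the nominal prices are invented and are not IMF figures.

```python
# Hypothetical nominal prices in US dollars per tonne, by year.
prices = {2005: 200.0, 2010: 250.0, 2013: 300.0}
BASE_YEAR = 2005

# Index value: price in each year relative to the base-year price, times 100.
index = {year: 100 * price / prices[BASE_YEAR] for year, price in prices.items()}
print(index)
```

An index value of 150 therefore means that the nominal price is 50 per cent above its 2005 level.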

86 http://www.fwi.co.uk/business/prices-trends/

87 http://www.seafish.org/research--economics/market-insight/market-data

88 http://www.imf.org/external/np/res/commod/index.aspx

Defra - Wholesale fruit and vegetable prices:89 The fruit and vegetable wholesale price

database contains data on the average price at wholesale markets in England. Prices are

collected for a selection of the most common home-grown fruit, vegetables and flowers.

Prices are collected from Birmingham, Bristol, Liverpool and New Spitalfields with flower

prices also collected from New Covent Garden. The five most usual prices are collected

from each market along with the percentage sold at each price. Additionally, information on

the supply of produce at each market is recorded.

Period covered: 2004-2014 on a monthly basis.

Variables: fruit or vegetable, quality, units, average monthly price.

Products: blackberries, blackcurrants, cherries, cooking apples, dessert apples,

gooseberries, pears, plums, raspberries, strawberries, asparagus, beetroot, Brussels

sprouts, onions, cabbage, cauliflower, leeks, lettuce, spinach, tomato, turnip,

swede, sweet corn, rhubarb.
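Since the five most usual prices are collected together with the percentage sold at each, one natural way to summarise them is a sales-weighted average. Whether Defra computes its published average in exactly this way is an assumption on our part, and the quotes below are hypothetical.

```python
# Hypothetical quotes for one product at one market:
# (price in GBP per kg, percentage of produce sold at that price).
quotes = [(1.20, 40), (1.10, 25), (1.30, 20), (1.00, 10), (1.50, 5)]

total_share = sum(share for _, share in quotes)
# Weighted average: each price counts in proportion to the share sold at it.
average_price = sum(price * share for price, share in quotes) / total_share
print(round(average_price, 2))
```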

Defra – commodity prices dataset:90 this database contains the prices for selected

agricultural and horticultural produce and is published on a weekly or monthly basis. The

data source depends on the item but includes prices taken from trade journals or other

organisations in addition to prices collected by the Department for Environment, Food and

Rural Affairs.

Period covered: 2009 – present, on a weekly, monthly and yearly basis.

Variables: price per tonne, product, time.

Products: animal feed, bananas, cattle compensation prices, hay and straw,

livestock, cereals, poultry, eggs, butter, cheese, potatoes, sugar, sheep and pigs.

Defra – UK milk and composition of milk:91 the data available in this dataset is gathered

through a monthly survey run by Defra in England and Wales to collect information on the

volume, value and protein content of milk purchased from farms. Similar surveys are run in

Scotland and Northern Ireland. Additional information is collected by the Rural Payments

Agency (RPA) on the protein and butterfat content of the milk. The UK average farm-gate

milk price, protein content and butterfat content are then calculated.

Period covered: 1991 – present, on a weekly, monthly and yearly basis.

Variables: price excluding bonus payment, price including bonus payment, butterfat

%, protein %, time.

Products: milk.

Gov.uk - Overseas trade in food, feed and drink:92 this database contains a variety of

statistics on UK imports and exports of food, feed and drink based on data collected by

HM Revenue and Customs. These statistics include a long-term Defra series showing the

value of UK imports, exports and balance of trade for total food, feed and drink from 1936,

detailed figures on the value and volume of UK imports and exports of food, feed and

89 https://www.uktradeinfo.com/Statistics/BuildYourOwnTables/Pages/Home.aspx

90 https://www.gov.uk/government/statistical-data-sets/commodity-prices

91 https://www.gov.uk/government/publications/uk-milk-prices-and-composition-of-milk

92 https://www.gov.uk/government/statistical-data-sets/overseas-trade-in-food-feed-and-drink

drink, degree of processing and commodity type from 1988, and a series showing the UK’s

food production to supply ratio (commonly referred to as the “self-sufficiency” ratio) from

1956. The dataset is publicly available.

Period covered: 1936 - present on a yearly basis.

Variables: imports, exports, balance of trade, time, value and volume of UK imports

and exports, food production to supply ratio.

Products: animal oils and fats, not chemically modified, apples, fresh, apricots,

cherries, peaches, plums and sloes (fresh), bacon and ham, bananas, barley,

unmilled, beef and veal, beef products (incl. corned beef), beer, bread, crispbreads,

savoury biscuits, butter, cereal, milled, cereal, rolled or flaked, cheese, chocolate,

cider & other fermented beverages, cocoa, coffee, condensed milk, crustaceans,

dog or cat food for retail, edible offal and other meat, eggs & egg products, fish &

crustaceans prepared or preserved, fish fresh or chilled, fish frozen, fish live, flours,

meals and pellets of meat, offal or fish, grapes, fresh or dried, honey, ice cream,

infant food for retail, jams, juice, lamb and mutton, lemons and limes, lettuce and

chicory, fresh or chilled, margarine, milk and cream.

Mundi index:93 this website contains country-level statistics and commodity and trade data. It

also contains charts and maps compiled from various data sources. The datasets are

publicly available.

Period covered: 1992 – present, yearly.

Variables: production, consumption, exports, imports, prices.

Products: crude oil, jet fuel, gasoline, diesel, coffee, tea, barley, maize, rice, wheat,

bananas, oranges, beef, poultry, lamb, swine, salmon, shrimp, sugar, coconut oil,

palm oil, peanut oil, rapeseed oil, cotton, rubber, wood.

Brand View:94 this is an international price and promotions intelligence tool that provides

information to retailers and manufacturers so that they can measure and manage their

price position. It is available on a 14 day free trial and after that on a paid-for-access

subscription basis.

Period covered: 2009 – present, on a daily, weekly, monthly and yearly basis.

Variables: the database contains information on price fluctuations across retailers

for specific products.

Products: we have been given access to data on chilled meals as part of a trial.

However, the website keeps track of all the products that are presented on major

retailers’ websites and the prices at which those products are sold through the

websites. These are consumer prices and not wholesale prices.

Economy Watch – Economics Statistics Database:95 the price index indicators that are

available on this website have been constructed using IMF data from 1980 onwards. Using

that data the website also makes forecasts about future price indicators up to the end of

93 http://www.indexmundi.com/

94 http://www.brandview.com/

95 http://www.economywatch.com/

2014. The price indices measure the average cost of either single items or baskets of

goods on a global basis for the year in question. The data is publicly available but is not

UK specific. The indices available are world indicators.

Period covered: 1980 – present, on a yearly basis.

Variables: prices and year.

Products: cereal, vegetable oils, meat, seafood, sugar, bananas, and oranges.

Economic data – volumes

Comtrade:96 the United Nations Commodity Trade Statistics Database contains detailed

imports and exports statistics reported by statistical authorities of close to 200 countries or

areas. It concerns annual trade data from 1962 to the most recent year and is publicly

available.

Period covered: 1962 – present, on a yearly and monthly basis.

Variables: import and export prices and volumes.

Products: meat and edible meat offal, fish, crustaceans, molluscs, aquatic

invertebrates, dairy products, eggs, honey, edible animal product, products of

animal origin, live trees, plants, bulbs, roots, cut flower, edible vegetables and

certain roots and tubers, edible fruit, nuts, peel of citrus fruit, melons, coffee, tea,

mate and spices, cereals, milling products, malt, starches, inulin, wheat gluten, oil

seed, grain, seed, fruit, gums, resins, vegetable saps and extracts, vegetable

plaiting materials, vegetable products, animal, vegetable fats and oils, cleavage

products, meat, fish and seafood food preparations, sugars and sugar

confectionery, cocoa and cocoa preparations, cereal, flour, starch, milk preparations

and products, vegetable, fruit, nut, food preparations, miscellaneous edible

preparation, beverages, spirits and vinegar, residues, wastes of food industry,

animal fodder, tobacco and manufactured tobacco substitutes.

Eurostat – international trade data: this database covers both extra- and intra-EU trade. Extra-EU trade statistics cover the trading of goods between Member States and non-member countries. Intra-EU trade statistics cover the trading of goods between Member States. "Goods" means all movable property including electricity. The main sources of statistical information are the traders, on the basis of Customs (extra-EU) and Intrastat (intra-EU) declarations. Data are collected by the national authorities of the Member States and compiled according to a harmonised methodology established by EU regulations before transmission to Eurostat.

Period covered: 1999 – present, on a yearly basis.

Variables: reporting country, reference period, trade flow, product, trading partner, mode of transport, trade value (in Euro), trade quantity in 100 kg, trade quantity in supplementary units, gross and seasonally adjusted trade value (in million Euro), unit-value indices, gross and seasonally adjusted volume indices, growth rates of trade values and indices, trade value (in billion Euro), shares of Member States in EU and world trade, shares of main trading partners in EU trade.

Products: Food, drinks and tobacco.

96 http://comtrade.un.org/db/

ONS - Fish production:97 this database covers catch and trade statistics for the UK fishing

industry. The catch and landings data available include information on the quantity, value,

species and area of capture by UK vessels landing into the UK and abroad, and foreign

vessels landing into the UK. The overseas trade statistics bring together the data on the

fish and fish products available for consumption, imports, exports and household

consumption. The data sources include logbooks, landing declarations, sales notes and

personal contact with fishermen and merchants. The method used for collecting data

depends upon the size of vessel and location of landings. All the data are publicly

available.

Period covered: 1866 – present, on a yearly basis.

Variables: landings by UK vessels, production, UK vessels into key ports, size of

UK fishing fleet, number of UK fishermen, imports and exports of fish, GDP for fish,

world catch by sea area.

Products: fish, tuna and mackerel.

FAOSTAT – Agricultural Production Index:98 this database looks at the relative level of the aggregate volume of agricultural production for each year in comparison with the base period 2004-2006. The indices are based on the sum of price-weighted quantities of different agricultural commodities produced, after deducting the quantities used as seed and feed (weighted in a similar manner). Production quantities of each commodity are weighted by 2004-2006 average international commodity prices and summed for each year. The data is publicly available.

Period covered: 1961 – present, on a yearly basis.

Variables: area harvested, yield, production quantity, seed.

Products: Crops, processed crops, live animals, livestock primary, livestock

processed
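The index construction described for FAOSTAT can be illustrated with a minimal sketch. The commodity names, prices and quantities below are hypothetical, and the function is only our reading of the description above (base-period prices applied to net quantities, expressed relative to the base period), not FAOSTAT's actual implementation.

```python
def net_quantity(production, seed, feed):
    """Quantity produced after deducting amounts used as seed and feed."""
    return production - seed - feed

def production_index(quantities, base_prices, base_quantities):
    """Price-weighted quantity index, with the base period set to 100."""
    current = sum(base_prices[c] * q for c, q in quantities.items())
    base = sum(base_prices[c] * q for c, q in base_quantities.items())
    return 100.0 * current / base

# Hypothetical base-period average prices and net quantities (assumptions).
base_prices = {"wheat": 200.0, "rice": 400.0}
base_quantities = {"wheat": 100.0, "rice": 50.0}

# Hypothetical net quantities for the year of interest.
year_quantities = {
    "wheat": net_quantity(production=120.0, seed=6.0, feed=4.0),
    "rice": net_quantity(production=58.0, seed=2.0, feed=1.0),
}

index = production_index(year_quantities, base_prices, base_quantities)
print(round(index, 1))
```

With these illustrative figures both net quantities are 10 per cent above their base-period values, so the index equals 110.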

NOAA – National Marine Fisheries Service:99 the NOAA Fisheries, Fisheries Statistics

Division has automated data summary programs that anyone can use to rapidly and easily

summarize U.S. commercial fisheries landings.

Period covered: 1990 – present, on a yearly basis.

Variables: number of landings.

97 http://www.statistics.gov.uk/hub/agriculture-environment/fish/fish-production/index.html
98 http://faostat3.fao.org/faostat-gateway/go/to/download/Q/*/E
99 http://www.st.nmfs.noaa.gov/commercial-fisheries/

c. Other data considerations

The selected methodology is likely to include other factors beyond the economic ones. We

expect them to be, depending on the product, a subset of the factors listed in the literature

review above. Given the wide variety of variables that could potentially be included we do

not provide a systematic review of all these sources.

We note that many of the non-economic factors, such as product and distribution

characteristics, are likely to be product specific and not change significantly over time.

Therefore, if the analysis is performed on a single product these factors would not

introduce any variation to contribute to the explanatory power of the model. To conclude

this annex, we discuss potential gaps in data availability.

Data sources not identified

The type of data that we are currently missing involves variables that are difficult to

measure or even define. For example, there are multiple sources in the literature that

indicate that the complexity of the supply chain plays an important role in determining the

likelihood of food fraud (e.g. the Elliott review). According to the literature, the longer and

more complex the supply chain the higher the probability of fraud. However, there are

multiple metrics that could be used to capture this variable. Moreover, we have not found a

universal data source that could be used to construct this variable, in whatever form it is

defined. We envisage that expert judgement and advice would be a major input in

elaborating these measures for the case study and other future applications.


11. Annex IV: Econometric methodology

While not strictly an econometric requirement, it is good practice to describe the data before conducting any estimation. The recommended statistics are:

Summary table: it would contain the maximum, minimum and average value of each

variable. Additional information could include the standard error.

Linear correlation table: this square table reports the pairwise linear correlation between all variables. Correlations with the explained variable provide a less sophisticated quantification of the effect of the explanatory variables (e.g. they do not hold other variables constant). More importantly, this table is useful to anticipate multicollinearity issues. These occur when two or more explanatory variables are highly correlated, so that regression methods might struggle to attribute the effect of these variables separately, especially in the case of small datasets.

Bi-variate charts, typically between the explained variable and other variables: this type of chart usually provides insight into the nature of the correlation with the explanatory variables.

T-tests: these are particularly useful when the explained variable is binary (e.g. if

the variable is whether fraud was observed in the period). It would be possible to

calculate the mean of the explanatory variables for observations where the

explained variable is equal to zero and one, respectively. These means would

typically differ. However, the t-test would evaluate whether the difference is

statistically significant.
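The descriptive statistics listed above could be computed along the following lines. This is a minimal sketch using only the Python standard library; the variable names and the data are hypothetical, and the t statistic shown is the unequal-variance (Welch) form, which is one of several possible choices.

```python
import math
from statistics import mean, stdev

def summary(values):
    """Minimum, maximum, mean and standard deviation of one variable."""
    return {"min": min(values), "max": max(values),
            "mean": mean(values), "sd": stdev(values)}

def pearson(x, y):
    """Pairwise linear (Pearson) correlation between two variables."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def welch_t(group0, group1):
    """t statistic for the difference in means (unequal variances)."""
    se = math.sqrt(stdev(group0) ** 2 / len(group0)
                   + stdev(group1) ** 2 / len(group1))
    return (mean(group1) - mean(group0)) / se

# Hypothetical data: a price gap and a binary fraud indicator per period.
price_gap = [0.10, 0.12, 0.30, 0.35, 0.15, 0.40, 0.11, 0.33]
fraud = [0, 0, 1, 1, 0, 1, 0, 1]

stats = summary(price_gap)
corr = pearson(price_gap, fraud)

# Split the explanatory variable by the value of the explained variable.
gap_no_fraud = [g for g, f in zip(price_gap, fraud) if f == 0]
gap_fraud = [g for g, f in zip(price_gap, fraud) if f == 1]
t_stat = welch_t(gap_no_fraud, gap_fraud)

print(stats, round(corr, 2), round(t_stat, 2))
```

The t statistic would then be compared with the critical value of the t distribution at the chosen confidence level to judge whether the difference in means is statistically significant.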

Models: specifications and estimation methods

The proposed econometric approach is not a single model but a family of them. In fact, it is

considered good practice to estimate the desired relationships using different models, to

explore the robustness of the estimations. The models can vary depending on their

specification (the set of variables that are chosen) and the estimation method.

Based on a number of statistical diagnostics and tests (see below), it is possible to determine which model and specification fits the data better and, therefore, provides the more reliable estimates.

Given the nature of food fraud data (in particular of the explained variable), the

methodology consists of the three following classes of methods:

Ordinary Least Squares (OLS): This method postulates a relationship of the form:

$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \varepsilon$

where $y$ is the explained variable, the $x$s are the explanatory variables, $\varepsilon$ is the error term and the $\beta$s are the estimated coefficients. This method chooses the coefficients so that the sum of squared errors is minimised. It is the most popular method used in economics and has many advantageous properties, such as producing unbiased estimates. The explained variable for this method can be defined as the fraction of non-compliant tests out of the total number of samples taken, maximising the amount of information available in the data. However, it has an important disadvantage: the linearity of the model allows in principle for the predicted risk of fraud to be unbounded. This method could therefore lead to contradictory results, since the risk of fraud should always be bounded between zero and one.
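The unboundedness problem can be seen in a minimal sketch of OLS with a single explanatory variable. The data are hypothetical, and the closed-form solution below applies only to the one-regressor case.

```python
from statistics import mean

def ols_fit(x, y):
    """Least-squares intercept and slope for one explanatory variable."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical data: an explanatory variable (e.g. a price gap) and the
# fraction of non-compliant samples observed in each period.
x = [0.0, 0.1, 0.2, 0.3, 0.4]
y = [0.00, 0.05, 0.10, 0.30, 0.45]

b0, b1 = ols_fit(x, y)

# Fitted "risk of fraud" at two values of the explanatory variable.
pred_at_zero = b0 + b1 * 0.0
pred_at_one = b0 + b1 * 1.0
print(round(b0, 2), round(b1, 2), round(pred_at_zero, 2), round(pred_at_one, 2))
```

In this illustration the fitted line gives a negative value at one end and a value above one at the other, neither of which can be interpreted as a probability.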

Binary methods: These methods are particularly appropriate when the explained variable can take only the values zero or one. The method would estimate the probability of fraud using a cumulative probability distribution as the functional form instead of a linear function, as postulated by OLS. This method is very popular since, by definition, cumulative distribution functions are non-decreasing and range from zero to one. The most common methods are logit and probit, which use the logistic and normal cumulative distribution functions respectively. In the case of fraud, the explained variable would take a value of one if fraud was detected in that period and zero otherwise.
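A logit model can be sketched as follows. This is a minimal illustration with hypothetical data, fitted by plain gradient ascent on the log-likelihood; statistical packages typically use faster Newton-type algorithms, and the step size and number of iterations here are arbitrary assumptions.

```python
import math

def logistic_cdf(z):
    """Logistic cumulative distribution function, bounded in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(x, y, rate=0.1, steps=5000):
    """Maximise the logit log-likelihood by gradient ascent."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = yi - logistic_cdf(b0 + b1 * xi)  # observed minus fitted
            g0 += err
            g1 += err * xi
        b0 += rate * g0 / len(x)
        b1 += rate * g1 / len(x)
    return b0, b1

# Hypothetical data: explanatory variable and binary fraud indicator.
x = [-2.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logit(x, y)

# Fitted probabilities stay between zero and one even far from the data.
p_low = logistic_cdf(b0 + b1 * -10.0)
p_high = logistic_cdf(b0 + b1 * 10.0)
print(round(b1, 2), p_low, p_high)
```

Unlike the linear specification, the fitted probability of fraud remains within the unit interval for any value of the explanatory variable.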

Multinomial methods: These models are an extension of the binary methods whereby the explained variable can take more than two values and these values are ordered. The most used methods include the (ordered) multinomial logit (for categorical variables) and Poisson or negative binomial methods (for integers). In the case of categorical values, the explained variable could be constructed using categories such as low, medium and high risk. In the case of integers, commonly referred to as count methods, the explained variable could be defined as the number of identified cases of fraud.
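The two ways of constructing the explained variable can be illustrated with a short sketch. The risk thresholds and the counts are hypothetical, and the Poisson log-likelihood shown is for the simplest case of a constant rate with no explanatory variables.

```python
import math

def risk_category(fraud_rate, low=0.05, high=0.20):
    """Ordered category: 0 = low, 1 = medium, 2 = high (assumed cut-offs)."""
    if fraud_rate < low:
        return 0
    if fraud_rate < high:
        return 1
    return 2

def poisson_loglik(counts, rate):
    """Log-likelihood of observed counts under a constant Poisson rate."""
    return sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in counts)

# Hypothetical fraud rates per period, mapped to ordered categories.
rates = [0.01, 0.10, 0.50]
categories = [risk_category(r) for r in rates]

# Hypothetical number of identified fraud cases per period.
counts = [0, 2, 1, 4, 3]
rate_hat = sum(counts) / len(counts)  # maximum-likelihood rate = sample mean

print(categories, rate_hat, round(poisson_loglik(counts, rate_hat), 3))
```

In a full count model the rate would depend on the explanatory variables, usually through an exponential function so that the expected count is always positive.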

In addition, for each model it is possible to try different specifications either by using

different sets of explanatory variables or by defining these variables in one of the following

forms:

Levels: the variable is expressed in its original form.

Differences: the variable is expressed as the difference between the current and

previous period. A regression using differences would establish the relation

between changes in the variables from one period to the next.

Logarithms: the variable is expressed as the logarithm of the original. This might

serve two purposes. First, it modifies the functional form of the regression

equations, which might provide a better fit for the data. Second, the interpretation

of the coefficients is made in terms of “elasticities”. That is, instead of capturing the

effect of an increase in one unit, the coefficient captures the effect of an increase in

one per cent.

Lagged: the variable is expressed as the level of the previous period (or periods).

The regression is then well-suited to capture changes that manifest themselves with

a delay.

Polynomial: the variable is expressed as the different powers of the original.

Therefore, instead of estimating a linear equation, the regression estimates a

polynomial.
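The alternative forms of a variable listed above can be sketched as simple transformations of a time series. The price series is hypothetical and the function names are our own.

```python
import math

def differences(series):
    """Change between the current and the previous period."""
    return [b - a for a, b in zip(series, series[1:])]

def logs(series):
    """Natural logarithm of each (strictly positive) observation."""
    return [math.log(v) for v in series]

def lagged(series, k=1):
    """Value of the variable k periods earlier (None where unavailable)."""
    return [None] * k + series[:len(series) - k]

def polynomial(series, degree=2):
    """Powers 1..degree of each observation, one row per observation."""
    return [[v ** p for p in range(1, degree + 1)] for v in series]

# Hypothetical price series growing by 10 per cent each period.
prices = [100.0, 110.0, 121.0]

level_diff = differences(prices)        # changes in levels
log_diff = differences(logs(prices))    # approximate growth rates
lag_one = lagged(prices, 1)             # previous-period level
squares = polynomial([2.0, 3.0], 2)     # level and squared term

print(level_diff, log_diff, lag_one, squares)
```

Note that the differences in levels grow over time while the differences in logarithms are constant, which is why the logarithmic form is read in terms of percentage changes.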


Diagnostics and tests

Given the large number of possible models and the conceptual differences in the methods proposed above, it is important to have criteria to select the most appropriate estimations. This selection is assisted by the following statistics and tests:

Statistical significance of individual and joint variables. It is advisable to perform a statistical test that evaluates the hypothesis that the variables have no explanatory power at all (i.e. that the real coefficients are equal to zero). These tests are standard in any econometric methodology. The null hypothesis that the variables have no statistical significance can be evaluated with different levels of confidence, typically ranging between 90 and 99 per cent. It is important to notice that these tests might erroneously fail to reject the null hypothesis if the data sample is small.

Explanatory power (e.g. R-squared) and goodness of fit (adjusted R-squared or

Akaike information criterion). These statistics capture the amount of variation in the

explained variable that can be attributed to variation in the explanatory variables. In

the case of the goodness of fit statistics, specifications that include a large number

of variables are penalised.

Heteroskedasticity (White test or Breusch-Pagan test). OLS and other methods work under the assumption that errors are independent and identically distributed. If this assumption is not satisfied, the estimated standard errors, and therefore the significance tests, might be biased. Heteroskedasticity refers in particular to the case in which the variance of the errors is not uniform. These tests investigate whether this is observed in the data. In case of detected heteroskedasticity, it is possible to conduct a “robust” estimation that would correct for this bias.

Auto-correlation (Breusch-Godfrey test). Another violation of the methods’ assumptions occurs when the errors are correlated with each other over time, creating biased results. The Breusch-Godfrey test investigates whether this is observed in the data.
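The explanatory power and goodness of fit statistics mentioned above can be computed from a regression's R-squared and residual sum of squares. The sketch below uses hypothetical inputs; the Akaike information criterion is shown in one common form for OLS with normally distributed errors (up to an additive constant), and statistical packages may report it on a different scale.

```python
import math

def adjusted_r_squared(r2, n, k):
    """R-squared penalised for the number of explanatory variables k."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def f_statistic(r2, n, k):
    """F statistic for the joint significance of all k variables."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

def aic_gaussian(rss, n, k):
    """Akaike information criterion (Gaussian OLS form, k slopes + constant)."""
    return n * math.log(rss / n) + 2 * (k + 1)

# Hypothetical regression: 21 observations, 2 explanatory variables.
r2, n, k = 0.16, 21, 2
adj_r2 = adjusted_r_squared(r2, n, k)
f_stat = f_statistic(r2, n, k)

# Same residual sum of squares, one extra variable: the AIC gets worse.
aic_small = aic_gaussian(rss=4.0, n=n, k=2)
aic_large = aic_gaussian(rss=4.0, n=n, k=3)

print(round(adj_r2, 3), round(f_stat, 2), aic_small < aic_large)
```

A lower Akaike information criterion indicates a preferred specification, so an additional variable is only worthwhile if it reduces the residual sum of squares by enough to offset the penalty.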


12. Annex V: Linear Correlations

Table 12.1: Linear Correlations

Variable                                    (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)      (10)     (11)
(1) non compliant samples                   1
(2) fraud percentage                        0.62*    1
(3) log of price differences india          0.28*    0.07*    1
(4) log of price differences pakistan       0.29*    0.15*    0.76*    1
(5) log of indian exchange rate             0.3*     -0.19*   0.42*    0.49*    1
(6) log of basmati export from india to uk  0.07*    -0.28*   0.17*    0.08*    0.6*     1
(7) log GDP India                           -0.01*   -0.74*   -0.11*   0.24*    0.27*    -0.79*   1
(8) log GDP Pakistan                        0.16*    -0.73*   0.06*    0.3578*  0.71*    0.69*    0.82*    1
(9) log GDP UK                              0.14*    -0.76*   0.14*    0.43*    0.61*    0.14*    0.9*     0.98*    1
(10) log Basmati production India           -0.11*   0.0      0.19*    0.24*    0.40*    0.25*    0.20*    0.42*    0.39*    1
(11) log of number of samples               0.62*    0.13*    0.55*    0.54*    0.60*    -0.04*   -0.1*    0.29*    0.26*    -0.23*   1

Note: * means the correlation is significant at the 95% confidence level.


13. Annex VI: Econometric Estimation

Table 13.1: Complete OLS results - India

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Constant -0.2 0.44 0.03 0.18 -0.46 -0.84 0.09 0.11*** 0.14** 0.01 0.04 0.11 -0.08 -0.21 2.09 4.85 2.82 2.79 2.79 48.25 -228.21 395.71*** 0.11 -1.33
log_pr_diff_ind_ext
L0 0.05 0.98* 1.60*** 1.51** 1.40** 1.41** 0.05 0.04 0.05 0.07 0.08 0.08 0 0.56 0.15 0 0.21 0.34
L1 -1.03* -3.04*** -2.95*** -2.82*** -2.88**
L2 1.46** 0.89 1.05 1.08
L3 0.55 -0.27 -0.05
L4 0.74 0.2
L5 0.39
d.log_pr_diff_ind_ext
L0 0.97* 1.59** 1.51** 1.63***
L1 -1.45** -1.44**
L2 -0.54
L^2 5.62*
log_pr_ratio_ind_ext
L0 0.1 1.87* 3.60***
L1 -1.93* -7.12***
L2 3.78***
log_basmati_prod_ind
L0 0 0.19 0
L1 -0.37 0
L2 -0.4
log_basmati_exportq_ind_uk
L0 -0.36 0.85 0.85
L1 -1.22 -1.22
L2 *multicollinearity
log_rice_cons_uk 22.16 -31.99*
log_gdp_ind -9.97*** -14.94***
log_samples
L0 0.03 0 -0.02
L1 0 -0.01
L2 0.013
Number of Observations 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 17.00 17.00 17.00 21.00 12.00 12.00 21.00 12.00 7.00
R2 0.00 0.36 0.18 0.40 0.46 0.49 0.16 0.36 0.40 0.32 0.01 0.16 0.45 0.00 0.08 0.13 0.11 0.15 0.15 0.54 0.15 0.88 0.02 0.47 0.40
F statistic 0.08 3.25 1.23 2.63 2.61 2.22 3.65 5.15 3.72 4.22 0.15 1.70 4.62 0.04 0.49 0.57 0.89 0.75 0.75 10.61 0.81 20.24 0.15 2.36 0.34
Prob > F 0.77 0.05 0.33 0.07 0.07 0.10 0.07 0.02 0.03 0.03 0.71 0.21 0.02 0.96 0.69 0.69 0.43 0.54 0.54 0.00 0.47 0.00 0.86 0.15 0.84
Heteroskedasticity (Breusch-Pagan test), chi2(1) = 0.04 12.51*** 9.07*** 13.85*** 17.75*** 14.86*** 6.16* 12.80*** 13.64*** 17.87*** 0.19 6.71** 6.93*** 0.04 5.94* 10.81*** 1.69 2.02 2.03 1.07 5.55 0.05 0.16 0.73 0.16
Autocorrelation (Breusch-Godfrey) 0.67 0.48 0.64 0.53 0.23 0.25 0.00 0.50 0.50 0.41 0.20 0.01 1.38 0.66 0.11 0.80 0.01 0.01 0.01 3.39* 0.03 0.95 0.08 2.47 6.40
Adjusted R2 -0.05 0.25 0.03 0.25 0.29 0.27 0.12 0.29 0.29 0.24 -0.04 0.07 0.35 -0.11 -0.08 -0.09 -0.01 -0.05 -0.05 0.49 -0.04 0.84 -0.09 0.27 -0.80
Akaike Information Criterion 1.11 -4.32 1.09 -3.41 -3.93 -2.85 -2.49 -6.31 -5.41 -4.88 1.04 -0.44 -7.32 3.11 3.45 4.38 -19.99 -18.67 -18.67 -13.16 7.94 -13.87 2.86 -23.87 -6.89


Table 13.2: Key to the labels of the variables

Abbreviation Name of variable

log_pr_diff_ind_ext logarithm of the price difference between India and the world

d.log_pr_diff_ind_ext difference in the logarithm of the price difference between India and the world

log_pr_ratio_ind_ext logarithm of the price ratio of Indian price to the world price

log_basmati_prod_ind logarithm of the basmati production in India

log_basmati_exportq_ind_uk logarithm of the basmati quantity exported from India to UK

log_rice_cons_uk logarithm of UK consumption of Indian Basmati rice

log_gdp_ind logarithm of Indian GDP

log_samples logarithm of the number of samples tested by the FSA

log_pr_pak logarithm of the price of Pakistani basmati rice

d.log_pr_pak difference in the logarithm of the price of Pakistani rice

log_pr_ratio_pak logarithm of the price ratio of Pakistani price to the world price

L0 variable at current level

L1 variable lagged by one period

L2 variable lagged by two periods

L3 variable lagged by three periods

Table 13.3: Complete OLS results - Pakistan

1 2 3 4 5 6 7 8 9 10 11 12 13 14

Constant -0.82 -1.17 -0.33 -0.33 -0.54 0.16 0.14*** 0.12** 0.12 -0.86 -0.05 -0.12 0.06 -0.61

log_pr_pak

L0 0.14 -0.77 -0.93* -0.93* -0.95 -0.86 0.11

L1 0.98* 1.94*** 1.93** 1.83* 1.59*


L2 -0.94* -0.93 -0.74 -0.39

L3 -0.01 -0.26 -0.64

L4 0.21 0.76

L5 -0.47

d.log_pr_pak

L0 -0.89* -0.97* -0.95** 0.15

L1 0.99** 0.97*

L2 0.04

L^2 -1.74

log_pr_ratio_pak

L0 0.21 -1.07 -1.6*

L1 1.40 3.57***

L2 -1.9**

log_samples

L0 0.01


L1

L2

Number of Observations 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00

R2 0.02 0.16 0.33 0.33 0.34 0.41 0.12 0.33 0.33 0.03 0.03 0.13 0.34 0.03

F statistic 0.43 1.76 2.80 1.98 1.54 1.61 2.62 4.35 2.75 0.31 0.52 1.40 2.95 0.23

Prob > F 0.52 0.20 0.07 0.15 0.24 0.22 0.12 0.03 0.07 0.74 0.48 0.27 0.06 0.80

Heteroskedasticity (Breusch-Pagan test), chi2(1) = 1.30 12.81*** 14.79*** 14.67*** 10.46*** 10.57*** 15.19*** 17.83*** 16.64*** 0.51 0.48 10.47*** 12.5*** 0.52

Autocorrelation (Breusch-Godfrey) 0.19 1.69 1.08 1.08 1.01 0.18 1.43 0.88 0.88 0.20 0.22 1.29 0.49 0.14

Adjusted R2 -0.03 0.07 0.21 0.16 0.12 0.15 0.07 0.25 0.21 -0.07 -0.02 0.04 0.23 -0.08

Akaike Information Criterion 0.73 -0.55 -3.23 -1.23 0.52 0.19 -1.51 -5.09 -3.10 2.48 0.64 0.17 -3.60 2.67


Table 13.4: Complete OLS results – Pakistan

1 2 3 4 5 6 7 8 9 10 11 12 13 14

Constant -0.82 -1.17 -0.33 -0.33 -0.54 0.16 0.14*** 0.12** 0.12 -0.86 -0.05 -0.12 0.06 -0.61

log_pr_pak

L0 0.14 -0.77 -0.93* -0.93* -0.95 -0.86 0.11

L1 0.98* 1.94*** 1.93** 1.83* 1.59*

L2 -0.94* -0.93 -0.74 -0.39

L3 -0.01 -0.26 -0.64

L4 0.21 0.76

L5 -0.47

d.log_pr_pak

L0 -0.89* -0.97* -0.95** 0.15

L1 0.99** 0.97*

L2 0.04

L^2 -1.74

log_pr_ratio_pak


L0 0.21 -1.07 -1.6*

L1 1.40 3.57***

L2 -1.9**

log_samples

L0 0.01

L1

L2

Number of Observations 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00

R2 0.02 0.16 0.33 0.33 0.34 0.41 0.12 0.33 0.33 0.03 0.03 0.13 0.34 0.03

F statistic 0.43 1.76 2.80 1.98 1.54 1.61 2.62 4.35 2.75 0.31 0.52 1.40 2.95 0.23

Prob > F 0.52 0.20 0.07 0.15 0.24 0.22 0.12 0.03 0.07 0.74 0.48 0.27 0.06 0.80

Heteroskedasticity (Breusch-Pagan test), chi2(1) = 1.30 12.81*** 14.79*** 14.67*** 10.46*** 10.57*** 15.19*** 17.83*** 16.64*** 0.51 0.48 10.47*** 12.5*** 0.52

Autocorrelation (Breusch-Godfrey) 0.19 1.69 1.08 1.08 1.01 0.18 1.43 0.88 0.88 0.20 0.22 1.29 0.49 0.14