Field Experiments on Credence Goods and Fraud Prof. Dr. Loukas Balafoutas Innsbruck Winter School on Credence Goods, Incentives and Behavior Kühtai, 16.03.2019 – 22.03.2019

Validity of experiments

• Internal validity: Do the data permit correct causal inference? A matter of experimental design (treatment choice, parameterization, successful randomization) and data analysis. Usually not a problem.

• External validity: Is it possible to generalize inferences from the experiment to the real world?

• Often a major source of criticism.

Typical concerns against (lab) experiments in economics

• Lack of realism in the interaction protocol; essential elements missing in sterile lab environment

• The subject pools used are special

• Subjects do not take experiments seriously; money at stake is too small

• Small sample sizes

• Hawthorne effects

For an economist’s defense, see A. Falk & J. Heckman, “Lab experiments are a major source of knowledge in the social sciences”, Science (2009)

Field experiments

• One option: Take the lab to the field (e.g., credence goods experiments with medical students or professionals).

• Partly overcomes the criticism of special subject pools, but does not go a long way towards realism.

• Natural field experiments: Today’s topic. Observe the behavior of subjects in their “natural environment”. They do not know they are part of an experiment.

• Combine randomization and realism, but possibly less control.

• Also, they face challenges in design.

• And they are subject to ethical issues (more on this later).

Field experiments with credence goods: Schneider (2012)

• Schneider (2012) reports on undercover visits to auto repair garages with one test vehicle.

• His main aim is to find out if there is mistreatment in the car repair industry.

• Then he examines whether reputation mitigates these problems by appearing as either a one-time or repeat-business customer.

Schneider (2012)

• In total, 91 undercover garage visits with a prearranged set of defects (loose battery cable, low level of coolant, missing taillight).

• The mechanic was asked to thoroughly inspect the vehicle, diagnose its condition, make a repair recommendation, and provide a price estimate.

• Schneider then compares the observed outcomes with the repairs that were known ahead of time to be actually necessary.

• He then declines all recommended repairs.

Implementing reputational concerns

• Random assignment of mechanics to two procedures.

• In one, Schneider appeared as a one-time customer, stating that he was moving away and having moving boxes visible in the back of the car.

• In the other, he appeared as a possible repeat customer by providing a home address near to the garage, and suggesting that he was seeking a local mechanic for an ongoing relationship.

• Then he estimated the effects of this repeat-business procedure on the number of legitimate defects discovered, the diagnosis fee, the repair recommendation, and the repair price.

Main results

• Overtreatment was pervasive. In about 30% of cases the mechanics provided completely unnecessary repairs.

• Overcharging was very limited. Only 2% of total charges are accounted for by overcharging.

• Undertreatment was frighteningly high. In about 80% of cases at least one of the defects was missed.

• Is this intentional, or is it due to incompetence? Highlights the importance of investing effort in diagnostic precision.

• But mechanics working with the author confirm that these defects should be identifiable to even the least experienced mechanics.

Main results

• Reputational concerns do not seem to matter much. No evidence that motorists representing repeat business receive different repair recommendations, repair prices, or diagnosis quality versus one-time motorists.

• However, there is a difference in the diagnosis fee: $37.70 for repeat customers, but $59.75 for one-time customers.

• Perhaps because this is a dimension that the customer can directly observe and evaluate.

Field experiments with taxi drivers

• Literature on labor supply of taxi drivers, also more recently on prosociality and gender discrimination.

• However, it is interesting to think about the actual product taxi drivers sell – an expert service!

• Possible types of fraud

▫ overtreatment = taking longer route (provides no benefit)

▫ overcharging = charging more than justified by chosen route (e.g. night tariff, fictional fares, no taximeter)

▫ undertreatment usually not a problem

Balafoutas, Beck, Kerschbamer and Sutter (2013)

• Field experiment with taxi drivers in Athens.

• First experiment was intended to measure whether and how informational advantages of taxi drivers lead to exploitation of customers.

Hypotheses

• H1: Information about city

▫ passengers who are not familiar with the city are more likely to face (more extensive) overtreatment

• H2: Information about fares

▫ passengers who are not familiar with the fares are more likely to face overcharging

• H3: Income

▫ high-income passengers receive worse service than low-income passengers

Method – Manipulating familiarity with city and tariff

• “Local” passenger

▫ Enters taxi and states requested destination

▫ Speaking in Greek

• “Non-local native” passenger

▫ Enters taxi and states requested destination, adding “do you know this destination, because I am not familiar with the city”

▫ Speaking in Greek

• “Foreign” passenger

▫ Identical to “non-local” passenger, but

▫ Speaking in English

Method – Manipulating perceived income

• High income

▫ wearing suit and carrying briefcase

▫ top-end hotel

• Low income

▫ casual clothes and backpack

▫ low-end accommodation

Method

• The experiment

▫ five experimenters (all male in their late twenties)

▫ simultaneous observations through triples: three experimenters with same starting point, same destination, different roles

▫ randomization over routes, days, and time

Observations per cell:

               local native   non-local native   foreigner
low income     58             58                 58
high income    58             58                 58

Taxis in Athens

• Facts

– 14,000 taxi drivers in Athens

– one-man companies

– same fare system nationwide

• Incentives

– one minute can be spent on

    • detour: 37 cents

    • traffic jam: 16 cents

    • waiting: 0 cents

Method – GPS Logger

• keeping track of fraud

▫ GPS logger: records exact position every single second; allows reconstructing exact route, duration etc.

Method – Routes

Example for Overtreatment

OT-index: 1.43; additional distance: 4.2 km

Example for Overcharging

additional fare: €9.5

Results – Overtreatment (normalized by shortest route in triple)

            locals   non-locals   foreigners
OT-index    1.029    1.077        1.087

[Bar chart: OT-index for local, non-local native, and foreigner]

Support for H1: local significantly smaller than non-local and foreigner; no difference between non-local and foreigner.
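The normalization behind the OT-index can be sketched as follows. This is a minimal illustration, assuming the index is each ride's distance divided by the shortest distance within its triple, as the slide title suggests. The function name `ot_index` and the distances are hypothetical and are not data from the study.

```python
def ot_index(distances_km):
    """Return each ride's distance divided by the shortest distance in the triple."""
    shortest = min(distances_km.values())
    return {role: d / shortest for role, d in distances_km.items()}


# Hypothetical triple: same origin and destination, three passenger roles.
triple = {"local": 9.8, "non-local native": 10.5, "foreigner": 14.0}

for role, index in ot_index(triple).items():
    print(f"{role}: {index:.2f}")
```

An index of 1 marks the shortest ride in the triple; values above 1 give the relative detour of the other rides.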

24

Results – Duration Index by Origin

of Customer

24

0.2

.4.6

.81

1 1.2 1.4 1.6 1.8 2

locals non-local natives

foreigners

25

Results – Overcharging frequency

                locals   non-locals   foreigners
OC frequency    0.034    0.078        0.224

[Bar chart: overcharging frequency for local, non-local native, and foreigner]

Support for H2: foreigner significantly higher than local and non-local; no difference between local and non-local.

Results – Total fare paid (normalized by cheapest fare in triple)

               locals   non-locals   foreigners
price-index    1.038    1.113        1.223

[Bar chart: price index for local, non-local native, and foreigner]

Results – Overtreatment by Income of Customer

OT-index       locals   non-locals   foreigners   total
low income     1.021    1.066        1.079        1.055
high income    1.037    1.087        1.096        1.073

[Bar chart: OT-index of low-income and high-income passengers for local, non-local native, and foreigner]

No support for H3: No difference between high-income and low-income passengers.

Econometric Analysis

                                        (1)             (2)           (3)
                                        Overtreatment   Overcharged   Price
                                        Index           Amount        Index
non-resident (non-local + foreigner)    0.049***        0.123         0.084***
                                        (0.016)         (0.093)       (0.024)
foreign                                 0.002           1.499**       0.130**
                                        (0.028)         (0.660)       (0.055)
high income                             0.017           -0.209        -0.024
                                        (0.013)         (0.189)       (0.020)
time of the day                         0.024           -0.140        0.026
                                        (0.018)         (0.359)       (0.038)
Additional controls                     Yes             Yes           Yes

N = 348. *, **, *** denote significance at the 10%, 5%, 1% level, respectively. Clustering of standard errors by set of simultaneous observations. Route and experimenter fixed effects. Tobit regressions, dependent variable left-censored at 0/1. Marginal effects reported. Standard errors in parentheses.

Summary Balafoutas et al. (2013)

• Overtreatment: more extensive for passengers who are not familiar with the city (H1)

• Overcharging: more common for foreigners (H2), presumably due to unfamiliarity with tariffs

• Income: High-income customers are on average slightly more prone to fraud, but not significantly so (contrary to H3).

• If they want to, taxi drivers know whom to cheat and how to cheat!

“First-degree”, or direct moral hazard and credence goods

• Agent takes hidden action that hurts interests of uninformed principal

• Example (credence goods): insured patient asks for more extensive or more expensive treatment

• Higher cost for insurance company

• Organizational context: expense account fraud

• Moral hazard operates through demand side in credence goods market

“Second-degree”, or indirect moral hazard and credence goods

• Expert seller anticipates weak incentives of consumer to exert control

• Increases fraud (overtreatment or overcharging)

• Moral hazard operates through supply side

• Examples: service on corporate cars, physicians’ behavior with fully insured patients

Balafoutas, Kerschbamer and Sutter (2017)

• We study the possibility of second-degree moral hazard in a field experiment.

• Notice that the two stories (first-degree and second-degree moral hazard) are observationally equivalent: higher coverage leads to higher expenditure.

• But we control behavior on the demand side.

• Also, adverse selection is not an issue.

Design

• Same market as in first study: Taxi rides in Athens

• Only one role: non-local natives, to control for perceived information and to preserve the credence goods nature of the service

• No income manipulation

• Both genders

Design: Moral hazard

• Exists when the passenger does not pay for ride herself, i.e., has expenses reimbursed.

• Hence, our central treatment variation: Full Refund vs. Control.

• Fixed script upon entering the taxi.

• In CTR: script for non-local native (“Do you know where X is? I am not from Athens”)

• Then: “Can I get a receipt at the end of the ride?”

Design: Moral hazard

• In FR (Full Refund) Treatment: exact same script as in CTR, adding one short phrase.

• “Can I get a receipt at the end of the ride? My employer is covering the expenses.”

• This short phrase was the only difference between the two treatments!

The experiment

• Four assistants, similar in age.

• Took (almost) simultaneous rides in 100 quadruples (=400 observations).

• Each assistant switched between CTR and FR, but all four cells were covered in each quadruple.

                    Full Refund   Control
Male passenger      N=100         N=100
Female passenger    N=100         N=100
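One way to picture this design is the sketch below. It assumes two male and two female assistants (implied by four assistants filling all four gender-by-treatment cells in every quadruple) and a strict alternation of roles from one quadruple to the next. The alternation rule, the assistant labels, and the function name `assign_quadruple` are illustrative assumptions, not the authors' documented procedure.

```python
from collections import Counter

# Hypothetical assistant labels: two men and two women.
ASSISTANTS = [("M1", "male"), ("M2", "male"), ("F1", "female"), ("F2", "female")]


def assign_quadruple(q):
    """Assign treatments for quadruple number q (0-based).

    Within each gender, one assistant plays Full Refund (FR) and the other
    Control (CTR); the two swap roles from one quadruple to the next.
    """
    plan = {}
    for gender in ("male", "female"):
        pair = [name for name, g in ASSISTANTS if g == gender]
        plan[pair[q % 2]] = (gender, "FR")
        plan[pair[(q + 1) % 2]] = (gender, "CTR")
    return plan


cells = Counter()
for q in range(100):
    for gender, treatment in assign_quadruple(q).values():
        cells[(gender, treatment)] += 1

print(dict(cells))
```

Under these assumptions every quadruple contributes one ride to each cell, so 100 quadruples give 100 observations per cell, and each assistant plays each treatment 50 times.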

The experiment

• 11 routes (subset of first experiment, adding one route from southern Athens to airport).

• Rides spread over 15 days in March 2013 and July 2014.

• 8am to midnight.

• Total duration: 156 h.

• Total distance: 6482 km

Hypotheses

• Main hypothesis: We should observe more fraud in the Full Refund treatment.

• Overtreatment or Overcharging?

• No prior regarding gender effects or differential effects of Full Refund by gender.

• Although, intuitively we thought women might be more susceptible to fraud (perceived as less likely to engage in confrontation?)

Results: Overtreatment Index (duration)

[Bar chart: overtreatment index (duration) for men and women, Full Refund vs. Control]

No significant difference between FR and CTR.

Similar results with distance travelled.

Overcharging frequency

[Bar chart: overcharging frequency for men and women, Full Refund vs. Control]

Overcharging

• Overall difference between FR and CTR: 37% vs. 20% (p < 0.01, Fisher’s exact test).

• We have a strong treatment effect on the likelihood of facing overcharging!

• In CTR: women face more fraud than men (26% vs. 13%, p < 0.05)

• Difference disappears under Full Refund: Behavior towards men is more responsive to our treatment manipulation and drives this result
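The Fisher's exact tests reported here can be illustrated with a small standard-library sketch. The cell counts are assumptions back-computed from the rounded percentages and the 100 rides per cell (CTR: 13 men and 26 women overcharged, so 39 of 200; FR: 74 of 200), so the resulting p-values are only indicative and need not match the paper's exact figures. The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    lowest = max(0, col1 - row2)
    highest = min(col1, row1)
    p_observed = prob(a)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)  # tolerance for float ties
    )


# Assumed counts: (overcharged, not overcharged) per group.
p_treatment = fisher_exact_two_sided(74, 126, 39, 161)  # FR vs. CTR
p_gender_ctr = fisher_exact_two_sided(26, 74, 13, 87)   # women vs. men in CTR

print(f"FR vs. CTR: p = {p_treatment:.4f}")
print(f"Women vs. men in CTR: p = {p_gender_ctr:.4f}")
```

With these assumed counts both comparisons fall below the thresholds stated on the slide (p < 0.01 and p < 0.05, respectively).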

Overcharging amount

• Moreover, the extent of overcharging is higher in the moral hazard treatment.

• Unconditional overcharging amounts:

                     CTR    FR     total
male passengers      0.72   1.46   1.09
female passengers    1.10   1.40   1.25
total                0.91   1.43   1.17

Results: Price Index

[Bar chart: price index for men and women, Full Refund vs. Control]

• Overall difference between FR and CTR: 1.17 vs. 1.09 (p < 0.01, Wilcoxon signed-ranks test based on treatment averages within quadruple).

• Central result of the paper: Our moral hazard manipulation leads to higher consumer expenditure.

• Due to more frequent and more extensive overcharging (given insignificant differences in duration and distance index).
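The test on treatment averages within quadruple can be sketched as below. This is an illustration only: the price indices are invented, the p-value uses the large-sample normal approximation without tie or continuity correction (the paper may have used an exact version), and the function name `wilcoxon_signed_rank` is hypothetical.

```python
from math import erfc, sqrt


def wilcoxon_signed_rank(x, y):
    """Wilcoxon signed-rank test for paired samples (normal approximation).

    Zero differences are dropped and tied absolute differences share their
    average rank. Returns (W_plus, two_sided_p).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, erfc(abs(z) / sqrt(2))


# Invented price indices per quadruple: (male FR, female FR, male CTR, female CTR).
quadruples = [
    (1.20, 1.10, 1.00, 1.05),
    (1.00, 1.25, 1.10, 1.00),
    (1.30, 1.15, 1.05, 1.10),
    (1.05, 1.00, 1.00, 1.12),
    (1.40, 1.20, 1.10, 1.15),
    (1.10, 1.18, 1.02, 1.00),
]
fr_avg = [(q[0] + q[1]) / 2 for q in quadruples]
ctr_avg = [(q[2] + q[3]) / 2 for q in quadruples]

w_plus, p_value = wilcoxon_signed_rank(fr_avg, ctr_avg)
print(f"W+ = {w_plus}, p = {p_value:.3f}")
```

Averaging within quadruple first makes each quadruple one paired observation, which respects the dependence between the four near-simultaneous rides.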

Overtreatment vs. Overcharging

• If the driver anticipates no control from the passenger, then it is natural to expect stronger differences in the overcharging dimension (more lucrative).

• Another reason: opportunity costs of time.

• Passenger may not mind the price, but would resent a longer route.

A similar study in the computer repair sector – Kerschbamer, Neururer and Sutter (2016)

• Field experiment in the Austrian market for computer repair services.

• Goal: Measure the impact of informing the expert provider that an insurance company will pay the repair bill on the extent and type of fraud. We like to call this second-degree moral hazard.

Experimental Design

• Hardware of a computer is manipulated in such a way that it is no longer possible to boot the computer (details later).

• Computers are handed in for repair at computer shops all over Austria with a fixed script.

• Two treatments: control treatment (CONTROL) and insurance treatment (INSURANCE).

• Between-subjects design and treatments are randomly assigned to repair shops.

Experimental Design

• After the repair we compare the defect to the actual repair and the bill to detect and quantify overtreatment, undertreatment and overcharging.

• Hypothesis: Indicating to the expert that the repair cost will be paid by an insurance company increases the amount of overtreatment and overcharging.

Experimental Treatments

• Fixed script for computer problem: “When starting my computer an error message appears and I am not able to boot the computer. I have no idea what this means and I would like you to repair it, please.”

• Control treatment: “I need a bill for the repair!”

• Insurance treatment: “I need a bill for the repair because I have an insurance.”

Implementation Issues

• Test computers must be in perfect condition, except for our manipulation:

– We bought five refurbished laptops; the warranty was not visible to the computer experts and perfect condition was assured.

• The value of the test computers should be high enough that repair (instead of replacement) is a plausible strategy:

– The laptop cost 684 € and the correct price of an appropriate repair was about 60 - 80 €.

Implementation Issues

• Important that the computer expert is able to diagnose the problem correctly (otherwise measured misconduct might be due to incompetence and not fraud):

– We destroy one of the two RAM modules of the computer. As a result the computer beeps in a specific way and is no longer able to boot.

– Additionally, the following error message appears on the screen: “ERROR 1830: Invalid memory configuration - Power off and install a memory module to Slot -0 or the lower slot.”

Implementation Issues

• Our IT Department helped us with the manipulation and they assured us that this error should be diagnosed correctly by any expert in the field. In addition, it is not uncommon for RAM modules to fail from time to time, so the problem should be well known to experts.

• Our IT Department estimated 30 minutes for the repair.

Data Set

• Experiment was conducted between March and September 2013 and all computers were handed in during regular shop opening hours.

• The shops were randomly selected from the 251 shops listed in the telephone directory.

• Treatment assignment was also random.
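The two randomization steps can be sketched as follows. This is a hedged illustration: the sample size of 61 is taken from the results slide, while the shop IDs, the seed, the per-shop coin flip, and the function name `draw_sample` are assumptions. The slides do not say how the authors actually drew the sample or whether assignment was balanced.

```python
import random


def draw_sample(n_listed=251, n_sample=61, seed=2013):
    """Select shops at random and assign each one a treatment at random.

    Returns a dict mapping a hypothetical shop ID to its treatment.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    shop_ids = list(range(1, n_listed + 1))
    selected = rng.sample(shop_ids, n_sample)  # sampling without replacement
    return {shop: rng.choice(["CONTROL", "INSURANCE"]) for shop in selected}


assignment = draw_sample()
n_control = sum(1 for t in assignment.values() if t == "CONTROL")
print(f"{len(assignment)} shops: {n_control} CONTROL, "
      f"{len(assignment) - n_control} INSURANCE")
```

A coin flip per shop does not guarantee equal group sizes; shuffling a balanced list of treatment labels would be the alternative if balance matters.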

Results – successful repairs

• 58 out of 61 repair shops managed to repair the computer successfully (remaining three observations are excluded from further analysis).

• Average time until pick-up was possible: 2.29 days

Results – Repair Price

• Control treatment: 70.17€

• Insurance treatment: 128.68€

• Difference in mean repair price is economically impressive and statistically highly significant (p = 0.002, Mann-Whitney test).

Cumulative Frequencies of Repair Prices

[Figure: relative cumulative frequency (in %) of repair prices (in Euro), CONTROL vs. INSURANCE]

Overtreatment (too much repair)

• Five observations involved additional repairs not related to our manipulation, and all of them took place in the insurance treatment.

– Overtreatment is significantly more frequent in INSURANCE (17.24% vs. 0%; p = 0.018, Fisher’s exact test). Average overtreatment costs about 200€.

Overcharging in the Spare Parts Dimension

• Four cases with overcharging in the spare parts dimension. Two of them happened in CONTROL and two in INSURANCE. So, there is no treatment difference with respect to the frequency of overcharging in the spare parts dimension.

Overcharging in the Working Time Dimension

• Out of the 58 repair shops, 30 indicated the working time on the bill.

– Control treatment: 0.55 hours

– Insurance treatment: 1.02 hours

– p = 0.01, Mann-Whitney test (this also holds when overprovision observations are excluded, p = 0.046, Mann-Whitney test).

– The difference accounts for 41 Euro out of the 58 Euro difference between CONTROL and INSURANCE.

Summary of Kerschbamer et al. (2016)

• Consumer expenditures on computer repairs are higher in the presence of an indicated insurance coverage.

• Decomposing the treatment effect into different types of fraud we find that the difference in repair prices is mainly due to overtreatment (replacing more parts than necessary) and to overcharging in the working time dimension (charging for more working time than actually provided).

A recent experiment on ways to reduce fraud

"Can fraud be reduced with priming? Evidence from a real-world market for credence goods"

Christopher Bindra and Graeme Pearce (work in progress)

Motivation

• Reducing the informational advantage of the seller is one way that fraudulent behaviour can be mitigated.

• But this isn't always a viable solution.

• Evidence from social psychology and behavioural economics suggests that priming could be used to positively impact people's behaviour.

• Making honest behaviour salient might encourage sellers to reduce fraudulent behaviour.

The Taxi Market in Vienna

Experiment

• 20 RAs, randomly assigned to groups of four.

• Within a group, all testers took a taxi from a designated taxi stand to a common destination. We refer to this as a quadruple.

– Took taxis in random order with approx. 1 minute between each journey.

– Each tester carried a GPS tracker to record the route.

• Testers recorded a number of variables: the meter reading, the number of “extra charges” the driver added, if the driver kept change, and so on.

• The experiment varies the script that the testers spoke.

• We collected data in two waves in May 2018 (400 rides) and February 2019 (200 rides).

Experiment

• Baseline: “I would like to go to x. Do you know where it is? I’m not from Vienna and I don’t know the way.”

• Positive: “Did you hear about that study where researchers found that around 80% of taxi drivers were shown to behave honestly towards passengers, always taking them on the cheapest route? I read about it on the internet.”

• Negative: “Did you hear about that study where researchers found that around 20% of taxi drivers were shown to behave dishonestly towards passengers, taking them on more expensive routes than necessary? I read about it on the internet.”

• Uber Priming: “I checked the Uber price online and it seemed cheap.”

Overcharging Normalized Fare

Results - OLS

Results – OLS 2

Summary

• We find that “Positive” priming of taxi drivers induces them to be more dishonest.

• But negative priming does not. Your thoughts?

• All priming scripts generally increase consumer expenditure, even if not significantly.

• We find little evidence that overtreatment is influenced by the priming manipulations.

Anagol, Cole and Sarkar (2017)

• Field experiment to evaluate quality of advice of insurance agents in India.

• Undercover consumers asked for life insurance recommendations.

• Term versus whole life insurance.

• Standardized scripts, auditors were men in their late 30s.

• Audits lasted for about 35 minutes.

Anagol et al. (2017)

• Exogenous treatment variation in:

• Customer’s needs:

– “I want to save and invest money for the future, and I also want to make sure my wife and children will be taken care of if I die. I do not have the discipline to save on my own” (whole life),

– “I am worried that if I die early, my wife and kids will not be able to live comfortably or meet our financial obligations. I want to cover that risk at an affordable cost” (term)

• Customer’s beliefs & second opinion:

– “I have heard from [source] that whole [term] insurance is a really good product for me. Maybe we should explore that further”

Anagol et al. (2017)

• Overwhelming majority of recommendations are for whole life insurance.

• This is essentially a form of overtreatment, especially when the customer needs term.

• Insurance agents respond (even if weakly) to consumer’s needs and to their beliefs, even when beliefs are wrong.

Anagol et al. (2017)

• Overall, mentioning that one has received advice from another agent does not affect the quality of advice.

• However, when this second opinion was a poor recommendation, there is an increase in the quality of advice.

• Hence, there is some evidence to suggest benefits of second opinions.

Field experiments in the healthcare sector

• Possibly the largest and economically and socially most important expert service; field evidence of great value.

• However, there are important ethical issues (to be discussed later), as well as difficulties in implementation.

• For instance, it is important to ensure sufficient training for the undercover patients and cover as many eventualities as possible.

Currie, Lin and Meng (2014)

• Field experiment in Chinese hospitals, on financial incentives and antibiotics prescription.

• Four similar undercover patients visited each of 80 doctors in 16 different hospitals, in 2011-2012.

• All complained of mild flu-like symptoms: “For the last two days, I've been feeling fatigued. I have been having a low grade fever, slight dizziness, a sore throat, and a poor appetite. This morning, the symptoms worsened so I took my body temperature. It was 37 °C.”

• For such mild symptoms, antibiotics should not be prescribed unless tests diagnose a bacterial infection.

• Hence, a good setting to study overtreatment.

Currie et al. (2014)

• Patient A: baseline

• Patient B: asked for antibiotics prescription: “Doctor, can you prescribe some antibiotics for me?”

• Patient C: asked for a prescription and indicated that he would buy the drugs elsewhere: “Doctor, I can get a discounted price in a drug store, but I don't know what medicine to take. Can you write a prescription for me?”

• Patient D: asked for antibiotics prescription and indicated that he would buy the drugs elsewhere (B plus C).

• All four types visited the same doctor (with an interval of at least two months between C and D).

Currie et al. (2014)

• Antibiotics prescription frequencies / number of drugs prescribed:

A (Baseline)                                   55% ** / 2.63 **
B (asked for antibiotics)                      85% ** / 3.24 **
C (asked for prescription + buys elsewhere)    10% ** / 1.79 **
D (asked for antibiotics + buys elsewhere)     14% ** / 1.97 **

• Antibiotics prescriptions driven mainly by financial incentives, not by consumer demand or physician ignorance.

Currie et al. (2014)

• Four more treatments in Experiment 2

• B’: small gift

• C’: patient knowledge. “I learned from the Internet that simple flu/cold patients should not take antibiotics. Is this true? Can I not take antibiotics unless they are necessary?”

• D’: buys elsewhere. “Doctor, my sister-in-law works at a drug store. She can offer me a discount if I buy drugs in her store. But I don't know what medicine to take, so could you please write a prescription for me?”

• E’: C’ plus D’

Currie et al. (2014)

• Antibiotics prescription frequencies / number of drugs prescribed:

A (Baseline)                                  0.63 / 2.49
B’ (gift)                                     0.50 / 2.33
C’ (patient knowledge)                        0.43 ** / 2.21
D’ (buys elsewhere)                           0.12 ** / 1.88 **
E’ (patient knowledge + buys elsewhere)       0.08 ** / 1.62 **

• Antibiotics prescriptions drop as a result of patient knowledge and lack of financial incentives.

Lu (2014)

• Two possible mechanisms through which insurance can increase consumer expenditure in healthcare markets:

• Agency hypothesis: doctors increase their income at the expense of patients (potentially suboptimal treatments).

• Considerate doctor hypothesis: doctors want to improve patients’ well-being, taking into account both quality of treatment and ability to pay.

• Observational data cannot distinguish between the two hypotheses, due to issues such as adverse selection and patient responses to insurance (“second-degree moral hazard”).

Lu (2014)

• Field experiment in Chinese hospitals, on the role of patient insurance and financial incentives.

• Two hypothetical patients: Both male around 65. Patient 1 at risk of heart disease and diabetes (hypertension, high triglycerides, high blood sugar); Patient 2 at risk of heart disease (hypertension) and already receiving treatment.

• Two testers who visit doctors on behalf of the hypothetical patients – relatives who live in another region – and ask to buy drugs for them.

• Visits to cardiology and endocrinology departments in 49 hospitals in or near Beijing.

Lu (2014)

Treatment variation:

1. Incentives vs. no incentives:

“{the relative} asked me to buy the medicines here for him” (incentives)

“{the relative} wants to get a prescription and buy drugs at his local store” (no incentives)

2. Insured vs. not insured: tester indicates whether the relative has government insurance

Lu (2014)

• Physicians write prescriptions that are significantly more costly for insured than uninsured patients, but only if physicians receive kickbacks.

• In line with agency hypothesis, but not with considerate doctor hypothesis.

                                      Insured /     Not insured /   Insured /       Not insured /
                                      incentives    incentives      no incentives   no incentives
Monthly drug expenditure (in yuan)    424.78        298.71          324.50          307.03
                                      (23.54)       (15.84)         (18.95)         (15.44)
Number of prescribed drugs            2.47          2.20            2.18            2.18
                                      (0.10)        (0.08)          (0.07)          (0.06)
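The "only if physicians receive kickbacks" reading of the table can be made explicit with a short difference-in-differences calculation on the reported mean expenditures. This is plain arithmetic on the four cell means shown in the Lu (2014) table; it does not reproduce the paper's regressions or significance tests, and the helper name `insurance_gap` is hypothetical.

```python
# Mean monthly drug expenditure in yuan, copied from the Lu (2014) table.
expenditure = {
    ("insured", "incentives"): 424.78,
    ("not insured", "incentives"): 298.71,
    ("insured", "no incentives"): 324.50,
    ("not insured", "no incentives"): 307.03,
}


def insurance_gap(incentive):
    """Insured minus uninsured mean expenditure for one incentive condition."""
    return expenditure[("insured", incentive)] - expenditure[("not insured", incentive)]


gap_with = insurance_gap("incentives")
gap_without = insurance_gap("no incentives")
diff_in_diff = gap_with - gap_without

print(f"Insurance gap with incentives:    {gap_with:.2f} yuan")
print(f"Insurance gap without incentives: {gap_without:.2f} yuan")
print(f"Difference-in-differences:        {diff_in_diff:.2f} yuan")
```

The insurance gap is about 126 yuan when the doctor profits from the sale and about 17 yuan when the drugs are bought elsewhere, which is the pattern the agency hypothesis predicts.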

Das, Holla, Mohpal and Muralidharan (2016)

• Field experiment in India, comparing physician effort and treatment between private and public healthcare providers.

• Fifteen fake patients presented to public and private clinics in rural India with symptoms of one of three conditions: chest pain in a 45-year-old male; asthma in a 25-year-old male or female; dysentery in a child (not present).

• Treatment variation:

– Between subjects: Public vs. Private clinic

– Within subjects: visits to doctors who work in a public clinic and also operate a private practice

Das et al. (2016)

Outcome variables:

• Price charged for visit

• Measures of quality of care:

– Indian government guidelines for questions and examinations to secure differential diagnosis in each case;

– length of consultation;

– correct diagnosis and appropriate treatment and prescriptions, based on inputs from a panel of doctors, pharmacists, and a pharmaceutical company

Main findings

• Public vs. private providers: private providers exert more effort and perform equally well in terms of diagnosis and treatment quality.

• Within subjects, in clinic vs. in private practice: same doctors spend more time and are more likely to offer correct diagnosis when seeing the patient in their private practice.

• No differences with respect to overtreatment.

Gottschalk, Mimra and Waibel (2018)

• Field experiment in market for dental care in Switzerland.

• Aims:

– Measure overtreatment

– Explore the effect of patient information

– Explore the effect of patient socioeconomic status (SES)

– Explore the role of further, non-experimental market and dentist characteristics

Design

• One test patient visits 180 different dentists in the canton of Zurich.

• Patient has one minor superficial caries lesion between two teeth.

• Condition assessed by four independent reference dentists: patient should receive no treatment (i.e., no filling).

• Case considered very easy, requires low diagnostic effort.

Design

• Test patient visits dentists, based on the following scenario: he had x-ray taken at a dental hygiene practice, which recommended a check-up at a dentist.

• Hence, test patient shows the same x-ray to all dentists.

• Advantages:

– Easy diagnosis

– Condition did not change over time, identical information to all dentists

– Guidelines provide clear recommendation (provide no treatment and re-evaluate one year later)

Treatment variation: patient information

• Informed condition: The patient indicates to the dentist

– that he has uploaded the x-ray to an internet platform where dentists offer free advice

– that he has not yet received a reply

• Standard condition: No additional script

Treatment variation: patient SES

• High SES patient:

– high quality suit, expensive watch, car key, expensive mobile phone

– specified occupation on patient form: translator at a bank

• Low SES patient:

– cheap unbranded clothes, backpack, no accessories

– specified occupation: student of translation doing internship

Outcome variables

• Overtreatment: recommendation that includes at least one filling.

• Undertreatment not possible, given that correct recommendation is no treatment.

• Charging dimension: dentists can choose prices within a certain range.

• They can also charge a diagnosis fee.

Recommendations and overtreatment (extent)

Recommendations and overtreatment (binary)

            Standard patient   Informed patient   Average
Low SES     37.8%              26.7%              33.2%
High SES    20.0%              26.7%              23.3%
Average     28.9%              26.7%              27.8% (50/180)

Patient information

• Some evidence of information effect among low SES patients.

• However, all differences insignificant.

• Hence, no effect of patient information on dentists’ recommendations on the whole.

• Contrary to intuition, theory, and previous evidence

• Possible explanations?

Patient SES

• Higher SES leads to less overtreatment.

• Contrary to expectations based on distributional preferences or price sensitivity.

• Possible explanations put forward in the paper:

▫ similarity of dentists’ and patients’ SES

▫ perceived likelihood of repeated interaction

▫ higher SES as a signal of more or better information

Regressions and additional variables

Ethical concerns

Discuss relevant ethical aspects of the field experiments discussed today, in particular (but not only) in healthcare markets

Ethical concerns

1. Ethical aspects and externalities

▫ is it acceptable to encourage fraudulent behavior?

▫ experts’ wasted time and resources; cost-benefit analysis

▫ ethical clearance for experiments (especially in healthcare industry)

2. Deception (aren’t we lying to experts?)

3. Anonymity must be guaranteed, despite fraudulent behavior.

4. Ongoing methodological discussion on false positives (applies to most experimental work)