Evaluation of the Suitability of People Services for Performing Delphi Studies


Seminar Service Science, Management & Engineering: Human-based electronic Services (People Services)

KSRI seminar paper - Johannes Kananen


Agenda
1. Motivation
2. Final implementation
3. Research questions
4. Subject of the study
5. Study design
6. Results
7. Criticism
8. Problems


1. Motivation

Delphi Method
"… object is to obtain the most reliable consensus of opinion of a group of experts." (Dalkey & Helmer 1963, p. 458)
Anonymity, iteration, controlled feedback, statistical aggregation (Rowe & Wright 1999, p. 354)

pServices (People Services)
"Web based services that deliver human intelligence, perception, or action to customers as massively scalable resources"
Performance, scalability, availability, correctness (Kern et al. 2009, p. 4)

Delphi Method + pServices
  More precise results
  Rapid execution of studies
  Decentralized availability


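2. Implementation

[Diagram: study workflow between the mTurk HIT, the Qualtrics survey site, and the conductor across Delphi round 1 and Delphi round 2. Numbered steps: 1. Estimates; 2. Feedback (means); 3. Final estimates. Further labels: reward; password to 2nd and 3rd study; e-mail with link.]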


Amazon mTurk site


Qualtrics site

Authentication: name in mTurk = name in Qualtrics


3. Research questions

1. Can it be done?
   Technical implementation
   Participants
   Appropriate answers
2. Will the Delphi estimates converge towards the mean?
   "One of the aims of using Delphi is to achieve greater consensus amongst panellists" (Rowe & Wright, 1999)
3. Will the Delphi method beat the control group in the accuracy of the estimates?

The results will deliver the answers.


4. Subject of the study: FIFA 2010 World Cup

Groups
  D (AUS, GER, GHA, SER)
  E (DEN, NED, JAP, CAM)
  G (BRA, COT, NKOR, POR)
The last 4 matches of the group stage
  1) 1-x-2 estimate of the matches
  2) Goal estimate of the matches
  3) Estimate of the final standings after the group stage


5. Design

2 groups
  1. Delphi group (15 persons) with three rounds (2 with feedback)
  2. Control group (12 persons) with one round
Feedback in the form of the mean from the previous round
The first Delphi round went online on 14.6.2010
The control group study went online on 20.6.2010
Both studies went offline just before the matches began
Incentives: $0.1 per study; $0.4 for participation in all three rounds (Delphi group); an additional $0.1 bonus for the better group
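The feedback step can be illustrated with a minimal sketch. It assumes each panelist's answers are stored as a dictionary per question; the field names and the sample estimates are hypothetical and not taken from the study.

```python
from statistics import mean

def round_feedback(estimates):
    """Return the mean of the previous round's estimates for each question."""
    questions = estimates[0].keys()
    return {q: round(mean(p[q] for p in estimates), 1) for q in questions}

# Hypothetical goal estimates from three panelists (not study data)
round1 = [
    {"home_goals": 2, "away_goals": 1},
    {"home_goals": 3, "away_goals": 0},
    {"home_goals": 1, "away_goals": 2},
]

feedback = round_feedback(round1)
print(feedback)
```

In a setup like the one on the slides, these means would be shown to the panelists before they submit their next-round estimates.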


6. Results - Observations

Participation decreased during the study
  1st round: 15 persons
  2nd round: 12 persons
  3rd round: 9 persons
Most of the participants were from India
Many participants had no idea about football
Some participants couldn't answer correctly
Shoestring budget: in the end $7.1 was paid (estimate: $10)
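The drop in participation can be expressed as a retention rate relative to the first round. The short sketch below uses the participant counts from the slide; the variable names are illustrative.

```python
# Participants per Delphi round, as reported on the slide
participants = {1: 15, 2: 12, 3: 9}

initial = participants[1]
retention = {rnd: n / initial for rnd, n in participants.items()}

for rnd, share in retention.items():
    print(f"Round {rnd}: {share:.0%} of the initial panel")
```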


Results – Variance (1-X-2)

Variance among Delphi Group vs. Control Group

[Figure: two charts, "1 - X - 2" (vertical axis 0 to 0.8) and "Amount of goals" (vertical axis 0 to 3), each with horizontal axis ticks 1 to 3. Legend: Group D, Group E, Group G, Control D, Control E, Control G.]


Results – Variance (Ranking)

Variance among Delphi Group vs. Control Group

[Figure: chart with horizontal axis ticks 1 to 3 and vertical axis 0 to 1.4.]


Conclusions about variance

After three rounds, every Delphi group estimate had a smaller variance than the corresponding control group estimate
The reason for the rise in variance in the second round remains unclear
  Possible explanation: single individuals skew the variance?
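A small sketch of the variance comparison follows. All numbers are hypothetical, and the use of the population variance is an assumption, since the slides do not state which variance formula was applied. The second round contains one outlier to illustrate how a single individual can raise the variance.

```python
from statistics import pvariance

# Hypothetical goal estimates per Delphi round (not study data)
delphi_rounds = {
    1: [2.0, 1.0, 3.0, 2.0, 1.0],
    2: [2.0, 2.0, 2.0, 2.0, 6.0],  # one outlier among otherwise agreeing panelists
    3: [2.0, 2.0, 2.0, 2.0],       # outlier dropped out or revised the estimate
}
control = [0.0, 1.0, 3.0, 4.0, 2.0]

# Population variance per round (assumed formula)
variances = {rnd: pvariance(vals) for rnd, vals in delphi_rounds.items()}
control_var = pvariance(control)

for rnd, var in variances.items():
    print(f"Delphi round {rnd}: variance {var:.2f}")
print(f"Control group: variance {control_var:.2f}")
```

With these illustrative numbers the variance rises in the second round because of one estimate and falls below the control group's variance in the third.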


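Results – Estimation Accuracy

Final results not before the last match on Friday

D1, D2, D3: Delphi rounds 1 to 3; Control / C: control group; G.Home / G.Away: actual goals of the home and away team

Estimation of the result (1-x-2)

| Match    | Result | G.Home | G.Away | D1 | D2 | D3     | Control | D/C/0 |
|----------|--------|--------|--------|----|----|--------|---------|-------|
| GER-SER  | 2      | 0      | 1      | 1  | 1  | 1(89)  | 1(92)   | C     |
| GHA-AUS  | X      | 1      | 1      | 1  | 2  | 2(56)  | x(50)   | C     |
| GHA-GER  | 2      | 0      | 1      | 2  | 2  | 2(89)  | 2(42)   | D     |
| AUS-SER  | 1      | 2      | 1      | 2  | 2  | x(44)  | 2(67)   | D(33) |
| NED-JAP  | 1      | 1      | 0      | 1  | 1  | 1(78)  | 1(42)   | D     |
| CAM-DEN  | 2      | 1      | 2      | 2  | 2  | 2(56)  | 1(42)   | D     |
| DEN-JAP  |        |        |        | 1  | 1  | 1(44)  | 2(58)   |       |
| CAM-NED  |        |        |        | 2  | 2  | 2(67)  | 2(42)   |       |
| BRA-COT  | 1      | 3      | 1      | 1  | 1  | 1(100) | 1(58)   | D     |
| POR-NKOR | 1      | 7      | 0      | 1  | 1  | 1(89)  | 1(50)   | D     |
| POR-BRA  |        |        |        | 2  | 2  | 2(67)  | 1(42)   |       |
| NKOR-COT |        |        |        | x  | x  | x(67)  | x(50)   |       |

Estimation of the goal amounts (# of goals)

| Match    | D1        | D2        | D3        | Control   | Diff. D3 | Diff. C | D3 vs. C | Diff. D1 | D1 vs. C |
|----------|-----------|-----------|-----------|-----------|----------|---------|----------|----------|----------|
| GER-SER  | 2.6 - 1.0 | 2.7 - 1.3 | 2.1 - 0.6 | 2.1 - 1.2 | 2.50     | 2.30    | C        | 2.60     | C        |
| GHA-AUS  | 1.2 - 0.7 | 1.2 - 1.5 | 0.8 - 0.9 | 1.0 - 1.6 | 0.30     | 0.60    | D        | 0.90     | C        |
| GHA-GER  | 0.7 - 2.0 | 1.3 - 2.2 | 0.8 - 2.2 | 1.3 - 1.5 | 2        | 1.80    | C        | 1.7      | D1       |
| AUS-SER  | 0.9 - 1.1 | 1.6 - 1.6 | 1.1 - 1.1 | 1.5 - 1.8 | 1        | 1.30    | D        | 1.2      | D1       |
| NED-JAP  | 2.2 - 1.1 | 2.1 - 1.3 | 1.6 - 0.7 | 1.6 - 1.0 | 1.3      | 1.6     | D        | 2.3      | C        |
| CAM-DEN  | 1.2 - 1.5 | 0.8 - 1.5 | 0.9 - 1.4 | 1.2 - 1.0 | 0.7      | 0.80    | D        | 0.7      | D1       |
| DEN-JAP  | 1.5 - 1.3 | 1.7 - 1.9 | 1.2 - 1.1 | 1.1 - 1.4 |          |         |          |          |          |
| CAM-NED  | 1 - 1.7   | 1.3 - 2.2 | 0.8 - 1.6 | 1.7 - 1.6 |          |         |          |          |          |
| BRA-COT  | 2.8 - 0.8 | 3.2 - 1.4 | 2.6 - 0.6 | 1.6 - 1.4 | 0.8      | 1.80    | D        | 0.4      | D1       |
| POR-NKOR | 1.8 - 0.7 | 2.2 - 1.1 | 1.7 - 0.9 | 1.7 - 1.2 | 6.2      | 6.50    | D        | 5.9      | D1       |
| POR-BRA  | 1.4 - 2.1 | 1.7 - 2.1 | 0.9 - 1.7 | 1.5 - 1.5 |          |         |          |          |          |
| NKOR-COT | 1.2 - 0.9 | 1.4 - 1.4 | 1.4 - 1.2 | 1.2 - 1.6 |          |         |          |          |          |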
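The slides do not define the "Diff." columns. Most of the reported values are consistent with the sum of the absolute deviations between estimated and actual goals, so the sketch below assumes that definition. The function name and the use of "0" for a tie are illustrative assumptions.

```python
def goal_diff(estimate, actual):
    """Sum of absolute deviations between estimated and actual goals (assumed metric)."""
    (est_home, est_away), (act_home, act_away) = estimate, actual
    return abs(est_home - act_home) + abs(est_away - act_away)

# GER-SER from the results: actual score 0-1,
# D3 estimate 2.1 - 0.6, control estimate 2.1 - 1.2
actual = (0, 1)
diff_d3 = goal_diff((2.1, 0.6), actual)
diff_c = goal_diff((2.1, 1.2), actual)

# The smaller difference marks the better group; "0" is assumed to mean a tie
better = "D" if diff_d3 < diff_c else "C" if diff_c < diff_d3 else "0"
print(round(diff_d3, 2), round(diff_c, 2), better)
```

For this match the sketch reproduces the reported differences of 2.50 and 2.30 and the verdict "C".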


Conclusions about accuracy

Final results will appear in the term paper
"More accurate" means that a group rated the actual result of the game with a higher likelihood than the other group
In the 1-x-2 estimation, the Delphi group achieved more accurate estimates than the control group in 6/8 games
In the goal amount estimation, the Delphi group also achieved better estimates in 6/8 games (by far)
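The 6/8 figures can be re-derived by tallying the per-match verdicts for the eight matches that had already been played. The verdict lists below are transcribed from the "D/C/0" and "D3 vs. C" columns of the results, with the AUS-SER entry "D(33)" counted as "D"; the variable names are illustrative.

```python
from collections import Counter

# Verdicts per played match, in the order GER-SER, GHA-AUS, GHA-GER,
# AUS-SER, NED-JAP, CAM-DEN, BRA-COT, POR-NKOR
verdicts_1x2 = ["C", "C", "D", "D", "D", "D", "D", "D"]
verdicts_goals = ["C", "D", "C", "D", "D", "D", "D", "D"]

tally_1x2 = Counter(verdicts_1x2)
tally_goals = Counter(verdicts_goals)

print(f"1-x-2: Delphi better in {tally_1x2['D']}/{len(verdicts_1x2)} games")
print(f"Goals: Delphi better in {tally_goals['D']}/{len(verdicts_goals)} games")
```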


7. Criticism

Study design
  Bigger sample
  Better timing
  Parallel studies
mTurk
  Quality of the results
  User base is not heterogeneous
  Response rate was low


8. Problems

Delphi has multiple rounds vs. mTurk supports only one-time tasks
Delphi method for expert estimates vs. user base of Amazon mTurk
Conflict of interest between
  Principal (requester): quality
  Agent (worker): money
