Truth-Revealing Social Choice ADT-15 Tutorial Lirong Xia.
Truth-Revealing Social Choice
ADT-15 Tutorial
Lirong Xia
• Member of Parliament election: switch from the plurality rule to the alternative vote?
• 68% No vs. 32% Yes
2011 UK Referendum
Ordinal Preference Aggregation: Social Choice
[Figure: a profile of rankings over alternatives A, B, and C, submitted by Carol, Alice, and Bob, is fed into a social choice mechanism, which outputs A.]
[Figure: Turker 1, Turker 2, …, Turker n each rank pictures A, B, and C.]
Ranking pictures [PGM+ AAAI-12]
Social choice
[Figure: rankings R1, R2, …, Rn and R1*, R2*, …, Rn* form the profile, which the social choice mechanism maps to an outcome.]
Ri, Ri*: full rankings over a set A of alternatives
Applications: real world
• People/agents often have conflicting preferences, yet they have to make a joint decision
• Multi-agent systems [Ephrati and Rosenschein 91]
• Recommendation systems [Ghosh et al. 99]
• Meta-search engines [Dwork et al. 01]
• Belief merging [Everaere et al. 07]
• Human computation (crowdsourcing) [Mao et al. AAAI-13]
• etc.
Applications: academic world
How to design a good social choice mechanism?
What does it mean to be “good”?
Two goals for social choice mechanisms
GOAL1: democracy (axiomatic social choice)
GOAL2: truth (this tutorial)
• Axiomatic social choice
• The Condorcet Jury Theorem (CJT)
• Break
• Four directions of extending CJT
• Beyond CJT: the objective decision-making perspective
Outline
• Research questions + Basic models
– tip of the iceberg
• More references
– Survey by Nitzan and Paroush (online): Collective Decision Making and Jury Theorem
– Survey by Gerling et al. [2005]: Information acquisition and decision making in committees: A survey
– My personal summary (send me an email)
Flavor of this tutorial
• Joerg’s textbook
• Handbook of Computational Social Choice
Computational social choice
• Axiomatic social choice
• The Condorcet Jury Theorem (CJT)
• Break
• Four directions of extending CJT
• Beyond CJT: the objective decision-making perspective
Outline
Common voting rules (what has been done in the past two centuries)
• Mathematically, a social choice mechanism (voting rule) is a mapping from {All profiles} to {outcomes}
– an outcome is usually a winner, a set of winners, or a ranking
– m : number of alternatives (candidates)
– n : number of agents (voters)
– D=(P1,…,Pn) a profile
• Positional scoring rules
– A score vector s1,...,sm
– For each vote V, the alternative ranked in the i-th position gets si points
– The alternative with the most total points is the winner
– Special cases
• Borda, with score vector (m-1, m-2, …,0)
• Plurality, with score vector (1,0,…,0) [Used in the US]
An example
• Three alternatives {c1, c2, c3}
• Score vector (2,1,0) (=Borda)
• 3 votes: c1 ≻ c2 ≻ c3, c2 ≻ c1 ≻ c3, c3 ≻ c1 ≻ c2 (positions score 2, 1, 0 in each vote)
• c1 gets 2+1+1=4, c2 gets 1+2+0=3, c3 gets 0+0+2=2
• The winner is c1
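The Borda example can be reproduced with a short sketch. The function names below are illustrative and not from the tutorial; the three votes are taken to be c1 ≻ c2 ≻ c3, c2 ≻ c1 ≻ c3, and c3 ≻ c1 ≻ c2, which matches the stated totals.

```python
from collections import defaultdict


def positional_scores(profile, score_vector):
    """Total score of each alternative under a positional scoring rule."""
    totals = defaultdict(int)
    for vote in profile:
        # The alternative in position i of a vote earns score_vector[i] points.
        for position, alternative in enumerate(vote):
            totals[alternative] += score_vector[position]
    return dict(totals)


def borda_vector(m):
    """Borda score vector (m-1, m-2, ..., 0)."""
    return list(range(m - 1, -1, -1))


def plurality_vector(m):
    """Plurality score vector (1, 0, ..., 0)."""
    return [1] + [0] * (m - 1)


# Assumed votes for the slide's example (each tuple is ranked best to worst).
profile = [("c1", "c2", "c3"), ("c2", "c1", "c3"), ("c3", "c1", "c2")]
scores = positional_scores(profile, borda_vector(3))
# Ties, if any, are broken by whichever alternative max() sees first.
winner = max(scores, key=scores.get)
```

Swapping in `plurality_vector(3)` gives every alternative one point on this profile, which shows how the choice of score vector changes the outcome.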
• Kendall tau distance
– K(V,W) = #{pairs of alternatives that V and W order differently}
– Example: K(b ≻ c ≻ a, a ≻ b ≻ c) = 2 (the pairs {a,b} and {a,c} are ordered differently)
• Kemeny(D) = argminW K(D,W) = argminW ΣP∈D K(P,W)
• For single winner, choose the top-ranked alternative in Kemeny(D)
• [Has a statistical interpretation]
The Kemeny rule
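A brute-force sketch of the Kendall tau distance and the Kemeny rule follows. It enumerates all m! rankings, so it is only meant for very small m; the function names and the sample profile are illustrative assumptions.

```python
from itertools import combinations, permutations


def kendall_tau(v, w):
    """Number of pairs of alternatives that rankings v and w order differently."""
    pos_v = {a: i for i, a in enumerate(v)}
    pos_w = {a: i for i, a in enumerate(w)}
    return sum(
        1
        for x, y in combinations(v, 2)
        if (pos_v[x] < pos_v[y]) != (pos_w[x] < pos_w[y])
    )


def kemeny(profile):
    """Ranking minimizing the total Kendall tau distance to the profile.

    Exhaustive search over all rankings; ties go to the first minimizer found.
    """
    alternatives = profile[0]
    return min(
        permutations(alternatives),
        key=lambda w: sum(kendall_tau(p, w) for p in profile),
    )


example_distance = kendall_tau(("b", "c", "a"), ("a", "b", "c"))
# Hypothetical profile: two votes a > b > c and one vote b > c > a.
sample_profile = [("a", "b", "c"), ("a", "b", "c"), ("b", "c", "a")]
kemeny_ranking = kemeny(sample_profile)
```

For a single winner, one would take `kemeny_ranking[0]`.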
• Approval, Baldwin, Black, Bucklin, Coombs, Copeland, Dodgson, maximin, Nanson, Range voting, Schulze, Slater, ranked pairs, etc.
…and many others
• Q: How to evaluate rules in terms of achieving democracy?
• A: Axiomatic approach
Axiomatic approach (what has been done in the past 50 years)
• Anonymity: names of the voters do not matter
– Fairness for the voters
• Non-dictatorship: there is no dictator, whose top-ranked alternative is always the winner
– Fairness for the voters
• Neutrality: names of the alternatives do not matter
– Fairness for the alternatives
• Consistency: if r(D1)∩r(D2)≠∅, then r(D1∪D2)=r(D1)∩r(D2)
• Condorcet consistency: if there exists a Condorcet winner, then it must win
– A Condorcet winner beats all other alternatives in pairwise elections
• Easy to compute: winner determination is in P
– Computational efficiency of preference aggregation
• Hard to manipulate: computing a beneficial false vote is hard
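As an illustration of Condorcet consistency, the sketch below finds a Condorcet winner by pairwise majority comparison and contrasts it with the plurality winner. The seven-voter profile is a made-up example, not one from the tutorial.

```python
from collections import Counter


def condorcet_winner(profile):
    """Return the alternative that beats every other one in pairwise
    majority elections, or None if no such alternative exists."""
    alternatives = profile[0]

    def beats(x, y):
        # x beats y if a strict majority of votes rank x above y.
        wins = sum(1 for vote in profile if vote.index(x) < vote.index(y))
        return wins > len(profile) / 2

    for x in alternatives:
        if all(beats(x, y) for y in alternatives if y != x):
            return x
    return None


# Hypothetical profile: 3 x (a > b > c), 2 x (b > c > a), 2 x (c > b > a).
profile = [("a", "b", "c")] * 3 + [("b", "c", "a")] * 2 + [("c", "b", "a")] * 2
plurality_winner = Counter(vote[0] for vote in profile).most_common(1)[0][0]
pairwise_winner = condorcet_winner(profile)
```

On this profile plurality elects a while the Condorcet winner is b, so plurality is not Condorcet consistent.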
Which axiom is more important?
• Some of these axiomatic properties are not compatible with others

Rule                     | Condorcet consistency | Consistency | Easy to compute
Positional scoring rules | N                     | Y           | Y
Kemeny                   | Y                     | N           | N
Ranked pairs             | Y                     | N           | Y
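An easy fact
• Theorem. For voting rules that select a single winner, anonymity is not compatible with neutrality
– Proof: take two voters with opposite rankings, Alice: a ≻ b and Bob: b ≻ a, and suppose W.L.O.G. that a wins. By neutrality, swapping the names of a and b (Alice: b ≻ a, Bob: a ≻ b) makes b win. By anonymity, swapping the names of Alice and Bob in that profile gives back the original profile without changing the winner, so b wins where a was supposed to win. Since a ≠ b, this is a contradiction.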
Not-So-Easy facts
• Arrow’s impossibility theorem
– Google it!
• Gibbard-Satterthwaite theorem
– Google it!
• Axiomatic characterization
– Template: A voting rule satisfies axioms A1, A2, A3 if and only if it is rule X
– If you believe that A1, A2, A3 are the most desirable properties, then X is optimal
– (anonymity+neutrality+consistency+continuity) ⇔ positional scoring rules [Young SIAMAM-75]
– (neutrality+consistency+Condorcet consistency) ⇔ Kemeny [Young&Levenglick SIAMAM-78]
• Axiomatic social choice
• The Condorcet Jury Theorem (CJT)
• Break
• Four directions of extending CJT
• Beyond CJT: the objective decision-making perspective
Outline
• Given
– two alternatives {a,b}.
– competence 0.5<p<1,
• Suppose
– agents’ signals are i.i.d. conditioned on the ground truth
• w/p p, the same as the ground truth
• w/p 1-p, different from the ground truth
– agents truthfully report their signals
• The majority rule reveals ground truth as n→∞
The Condorcet Jury theorem (CJT) [Condorcet 1785, Laplace 1812]
• It justifies democracy and the wisdom of the crowd
• It “lays, among other things, the foundations of the ideology of the democratic regime” [Paroush SCW-98]
Why is CJT important?
• Group competence
– Pr(maj(Pn)=a|a)
– Pn: n i.i.d. votes given ground truth a
• Random variable Xj : takes 1 w/p p, 0 otherwise
– encoding whether signal=ground truth
• (Σj=1..n Xj)/n converges to p in probability (Law of Large Numbers); since p>0.5, the majority matches the ground truth with probability approaching 1
Proof
The group competence
1. is higher than that of any single agent
2. increases in the group size n
3. goes to 1 as n→∞
Three parts of CJT
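The three parts can be checked numerically for odd group sizes with the exact binomial formula. This is a minimal sketch under the CJT assumptions (i.i.d. votes, each correct with probability p); the function name is an illustrative choice.

```python
from math import comb


def group_competence(n, p):
    """Pr(a strict majority of n i.i.d. votes is correct), for odd n."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


p = 0.6
small_groups = [group_competence(n, p) for n in (1, 3, 5, 7, 9)]
large_group = group_competence(501, p)
```

With p = 0.6, the values in `small_groups` start at 0.6 for a single agent and increase with n, and `large_group` is already very close to 1.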
• From 2k to 2k+1
– The extra vote breaks ties with higher probability in favor of the ground truth
– k@a+k@b → (k+1)@a+k@b w/p p
– k@a+k@b → k@a+(k+1)@b w/p 1-p
• From 2k+1 to 2k+2
– (k+1)@a+k@b → (k+1)@a+(k+1)@b
– k@a+(k+1)@b → (k+1)@a+(k+1)@b
Proof of competence monotonicity
• Given
– two alternatives {a,b} (more than two?)
– competence 0.5<p<1 (heterogeneous agents?)
• Suppose
– agents’ signals are i.i.d. conditioned on the ground truth (dependent agents?)
• w/p p, the same as the ground truth
• w/p 1-p, different from the ground truth
– agents truthfully report their signals (strategic agents?)
• The majority rule reveals ground truth as n→∞ (other rules?)
Limitations of CJT
• Axiomatic social choice
• The Condorcet Jury Theorem (CJT)
• Break
• Four directions of extending CJT
• Beyond CJT: the objective decision-making perspective
Outline
• Dependent agents
• Heterogeneous agents
• Strategic agents
• More than two alternatives
Extensions
An active area
• Social Choice and Welfare
• American Political Science Review
• Games and Economic Behavior
• Mathematical Social Sciences
• Theory and Decision
• Public Choice
• Econometrica + JET
• Myerson
• Shapley&Grofman
• MSS special issue on ADT-15
• Dependent agents
• Heterogeneous agents
• Strategic agents
• More than two alternatives
Extensions
The group competence
1. is higher than that of any single agent
– Not always (mimicking one leader)
2. increases in the group size n
– Not always (mimicking one leader)
3. goes to 1 as n→∞
– Yes for some dependency models [Berg 92; Ladha 92, 93; Peleg&Zamir 12]
Does CJT hold for dependent agents?
• Positive correlations
– agents are likely to receive similar signals even
conditioned on the ground truth
• Negative correlations
– agents are likely to receive different signals
• Conjecture: Positive correlations reduce group competence
– positive correlation effectively reduces the number of agents
Dependent agents
• One leader (Y), 2k followers (X1,…, X2k), same
competence p
– Pr(Y=1) = Pr(Xj =1)=p
– Xj’s are independent conditioned on Y
• Correlation r²
– Pr(Xj =1|Y=1) = p+r(1-p)
– Pr(Xj =0|Y=0) = (1-p) + rp
• Theorem. In the opinion leader model
– when p>0.5 the group competence decreases in r
– when p<0.5 the group competence increases in r
– when p=0.5 the group competence does not change in r
Opinion leader model [Boland et al. 89]
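The effect of r can be explored numerically. The sketch below assumes that the group consists of the leader plus the 2k followers, that they decide by simple majority, and that 1 encodes the ground truth; these modeling details and the function names are assumptions made for illustration.

```python
from math import comb


def binom_tail(n, q, at_least):
    """Pr(Binomial(n, q) >= at_least)."""
    return sum(
        comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(at_least, n + 1)
    )


def leader_group_competence(k, p, r):
    """Pr(majority of leader + 2k followers is correct) in the opinion leader model."""
    q_if_leader_right = p + r * (1 - p)        # Pr(Xj=1 | Y=1)
    q_if_leader_wrong = 1 - ((1 - p) + r * p)  # Pr(Xj=1 | Y=0)
    need = k + 1                               # correct votes needed out of 2k+1
    return (
        # Leader correct: at least k of the 2k followers must be correct.
        p * binom_tail(2 * k, q_if_leader_right, need - 1)
        # Leader wrong: at least k+1 followers must be correct.
        + (1 - p) * binom_tail(2 * k, q_if_leader_wrong, need)
    )


independent = leader_group_competence(1, 0.7, 0.0)
correlated = leader_group_competence(1, 0.7, 0.5)
mimicking = leader_group_competence(1, 0.7, 1.0)
```

For p = 0.7 and k = 1, the group competence falls from 0.784 at r = 0 to 0.7 at r = 1, where every follower simply copies the leader.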
• One common evidence (E), 2k+1 agents (X1,…, X2k+1),
same competence p
– Pr(E=1) = Pr(Xj=1)=p
– Xj’s are independent conditioned on E
• Correlation r²
– Pr(Xj=1|E=1) = p+r(1-p)
– Pr(Xj=0|E=0) = (1-p) + rp
• Theorem. In the common evidence model
– when p>0.5 the group competence decreases in r
– when p<0.5 the group competence increases in r
– when p=0.5 the group competence does not change in r
Common evidence model [Boland et al. 89]
• Ground truth G
• Common evidence E
• Given any ideal vote function f: E→G
– Competence pe=Pr(Xj =f(e)|e)
• Theorem. The majority rule converges to f(e) as n→∞
[Figure: G → E → X1, …, Xn]
Common evidence model [Dietrich and List 2004]
• Dependent agents
• Heterogeneous agents
• Strategic agents
• More than two alternatives
Extensions
The group competence
1. is higher than that of any single agent
– Not always (1, 0.9, 0.8, …)
2. increases in the group size n
– Not always (1, 0.9, 0.8, …)
3. goes to 1 as n→∞
– Not always: pj=0.5+1/n
– Yes under some condition [Berend&Paroush, 1998]
Does CJT hold for heterogeneous agents?
• Independent signals
• Agent j’s competence is pj
• Theorem [Berend&Paroush, 1998]. CJT holds if and only if
1. [condition not recoverable from the transcript], or
2. for every sufficiently large n, [condition not recoverable from the transcript]
Group competence for heterogeneous agents
• Given the competence {p1,…,pn} of n agents where pj ≥0.5
– Ml: average competence of a group of l randomly chosen agents
• Theorem [Berend&Sapir 05]. For two alternatives and all l≤n-1
– Ml ≤ Ml+1 if l is even
– Ml = Ml+1 if l is odd
Competence monotonicity[Berend&Sapir 05]
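• Theorem [Shapley and Grofman 1984]. Given the competence {p1,…,pn} of n agents, the maximum likelihood estimator is the weighted majority voting with weights wj = log(pj/(1-pj))
• Proof. If the ground truth is a, the log likelihood of the profile is Σj votes a log pj + Σj votes b log(1-pj); comparing it with the log likelihood under b, a is more likely if and only if Σj votes a wj > Σj votes b wj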
Optimal voting rule for two alternatives
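A minimal sketch of weighted majority voting, assuming log-odds weights wj = log(pj/(1-pj)) (the usual form of this result) and competences strictly between 0 and 1. The likelihood function is included only to check that the weighted vote agrees with the maximum likelihood choice; all names and the sample numbers are illustrative.

```python
from math import log


def weighted_majority(votes, competences):
    """Winner among 'a' and 'b' under log-odds weighted majority voting."""
    margin = sum(
        (1 if vote == "a" else -1) * log(p / (1 - p))
        for vote, p in zip(votes, competences)
    )
    if margin > 0:
        return "a"
    if margin < 0:
        return "b"
    return None  # exact tie


def log_likelihood(truth, votes, competences):
    """Log probability of the votes if `truth` is the ground truth."""
    return sum(
        log(p if vote == truth else 1 - p) for vote, p in zip(votes, competences)
    )


# Hypothetical jury: one strong expert votes a, two weaker agents vote b.
votes = ["a", "b", "b"]
competences = [0.9, 0.6, 0.6]
decision = weighted_majority(votes, competences)
```

Here simple majority would pick b, but the expert's weight log(9) exceeds the combined weight 2·log(1.5) of the other two, so the weighted rule picks a.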
• Dependent agents
• Heterogeneous agents
• Strategic agents
• More than two alternatives
Extensions
The group competence
1. is higher than that of any single agent
– Not always (same-vote equilibrium)
2. increases in the group size n
– Not always (same-vote equilibrium)
3. goes to 1 as n→∞
– Yes for some models and informative equilibrium
Does CJT hold for strategic agents?
• Common interest Bayesian voting game [Austen-Smith&Banks APSR-96]
– two alternatives {a, b}, two signals {A,B}, a prior, Pr(signal|truth)
• pa=Pr(signal=A|truth=a)
• pb=Pr(signal=B|truth=b)
– agents have the same utility function: U(outcome, ground truth)=1 iff outcome = ground truth
– sincere voting: vote for the alternative with the highest posterior probability
– informative voting: vote for the signal
– strategic voting: vote for the alternative with the highest expected utility
Strategic voting
1. Nature chooses a ground truth g
2. Every agent j receives a signal sj~Pr(sj|g)
3. Every agent computes the posterior distribution (belief) over the ground truth using Bayes’ rule
4. Every agent chooses a vote to maximize her expected utility according to her belief
5. The outcome is computed by the voting rule
Timeline of the game
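• Two signals, two voters
• Model: each signal matches the ground truth with probability p>0.5
[Figure: for each value of the other agent's signal (posterior p vs. 1-p), the winner determined by the truthful agent's vote plus my vote, with ties split half/half, and my utility for each vote (1, 0.5, 0, 0.5).]
High level example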
• Setting
– Two alternatives {a, b}, two signals {A,B}
– Three agents
– pa=0.8, pb=0.6
– Non-uniform prior: Pr(a)=0.1, Pr(b)=0.9
• An agent receives A
– Informative voting: a
– posterior probability (unnormalized):
• 0.1*0.8=0.08@a vs. 0.9*0.4=0.36@b
• sincere voting: b
Sincere voting = informative voting?
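The comparison can be reproduced with a direct application of Bayes' rule. This sketch uses the slide's parameters (pa=0.8, pb=0.6, Pr(a)=0.1); the function names are illustrative.

```python
def posterior_a(signal, prior_a, pa, pb):
    """Pr(truth = a | one signal), by Bayes' rule."""
    likelihood_a = pa if signal == "A" else 1 - pa   # Pr(signal | truth = a)
    likelihood_b = 1 - pb if signal == "A" else pb   # Pr(signal | truth = b)
    joint_a = prior_a * likelihood_a
    joint_b = (1 - prior_a) * likelihood_b
    return joint_a / (joint_a + joint_b)


def sincere_vote(signal, prior_a, pa, pb):
    """Vote for the alternative with the highest posterior probability."""
    return "a" if posterior_a(signal, prior_a, pa, pb) > 0.5 else "b"


def informative_vote(signal):
    """Vote for the alternative matching the signal."""
    return signal.lower()


skewed_posterior = posterior_a("A", 0.1, 0.8, 0.6)    # 0.08 / (0.08 + 0.36)
uniform_posterior = posterior_a("A", 0.5, 0.8, 0.6)   # 0.4 / (0.4 + 0.2)
```

With the skewed prior, signal A leads to an informative vote for a but a sincere vote for b; with a uniform prior the two coincide.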
• Setting
– Two alternatives {a, b}, two signals {A,B}
– Three agents
– pa=0.8, pb=0.6
– Uniform prior: Pr(a)=Pr(b)=0.5
• An agent receives A, other two agents are sincere/informative
– Informative voting: a
– posterior probability: 2/3@a+1/3@b
• sincere voting: a
– probability of a tie (other two agents’ votes are {a, b})
• 0.32|a, 0.48|b
– Expected utility for voting a: 0.32*2/3
– Expected utility for voting b: 0.48*1/3
– Strategic voting: a
Sincere voting = strategic voting?
• Setting
– Two alternatives {a, b}, two signals {A,B}
– Three agents
– pa=0.8, pb=0.6
– Uniform prior: Pr(a)=Pr(b)=0.5
• An agent receives A, other two agents are sincere/informative
– Condition on the other two votes being {a, b}
– Signal profile is (A,A,B)
– Posterior probabilities
• Pr(a|A,A,B) ∝ Pr(a)×Pr(A|a)×Pr(A|a)×Pr(B|a)=0.5pa²(1-pa)=0.064
• Pr(b|A,A,B) ∝ Pr(b)×Pr(A|b)×Pr(A|b)×Pr(B|b)=0.5(1-pb)²pb=0.048
– Strategic voting: a
The “pivotal” approach
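A small sketch of the pivotal computation for this three-agent example, using pa=0.8, pb=0.6 and a uniform prior as in the slides. The function names are illustrative. The agent's vote only matters when the other two votes split, so she conditions on the signal profile (A,A,B).

```python
def tie_probability(p_correct):
    """Pr(the other two informative votes split), given the ground truth,
    where p_correct is the probability of a correct signal under that truth."""
    return 2 * p_correct * (1 - p_correct)


def pivotal_posteriors(prior_a, pa, pb):
    """Posterior over the ground truth given the signal profile (A, A, B)."""
    joint_a = prior_a * pa * pa * (1 - pa)              # Pr(a) Pr(A|a)^2 Pr(B|a)
    joint_b = (1 - prior_a) * (1 - pb) * (1 - pb) * pb  # Pr(b) Pr(A|b)^2 Pr(B|b)
    total = joint_a + joint_b
    return joint_a / total, joint_b / total


post_a, post_b = pivotal_posteriors(0.5, 0.8, 0.6)
strategic_vote = "a" if post_a > post_b else "b"
```

The ratio post_a : post_b equals 0.064 : 0.048, the same 4 : 3 ratio as the expected utilities 0.32·2/3 and 0.48·1/3 from the previous slide.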
• Given a Bayesian game, a Bayesian Nash Equilibrium is a strategy profile (s1,…, sn) such that
– sj: signal → vote
– every agent j prefers sj to any other strategy, conditioned on other agents playing s-j
• Example of strategy
– Informative voting: s(A)=a, s(B)=b
– You can also: s(A)=b, s(B)=a
Bayesian Nash Equilibrium
• Theorem [McLennan 98]. Let r* denote the voting rule with maximum expected utility given informative voting. Informative voting is a Bayesian Nash Equilibrium under r*.
Equilibrium under the optimal voting rule
• Key question:
– What are the equilibria of the game (hopefully informative voting)?
– Does CJT hold in equilibria?
• Similar model for juries
– [Feddersen&Pesendorfer Econometrica-97, APSR-98, PNAS-99]
• Number of voters is uncertain, following a Poisson distribution
– [Myerson GEB-98, JET-02]
• Three alternatives
– [Nunez JTP-10; Goertz&Maniquet JET-11; Bouton&Castanheira Econometrica-12; Goertz SCW-14; Goertz&Maniquet EL-14]
Subsequent work
• Dependent agents
• Heterogeneous agents
• Strategic agents
• More than two alternatives
Four extensions
Condorcet’s MLE approach
• Parametric ranking model Mr: given a “ground truth” parameter Θ
– each vote V is drawn i.i.d. conditioned on Θ, according to Pr(V|Θ)
– Each V is a ranking
• For any profile P=(V1,…,Vn),
– The likelihood of Θ is L(Θ|P)=Pr(P|Θ)=∏V∈P Pr(V|Θ)
– The MLE mechanism: MLE(P)=argmaxΘ L(Θ|P)
– Break ties randomly
• What if Decision space ≠ Parameter space?
[Figure: “ground truth” Θ generates the votes V1, V2, …, Vn]
• Fix the dispersion ϕ<1
• Parameter space
– all full rankings over alternatives
• Sample space
– i.i.d. generated full rankings
• Probabilities: given a ground truth ranking
W, generate a ranking V w.p.
PrW(V) ∝ ϕ^Kendall(V,W)
• MLE is the Kemeny rule
Mallows’ model [Mallows-1957]
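The claim that the MLE under Mallows' model is the Kemeny rule can be checked by brute force on a tiny instance. This is a sketch for very small m only; the function names, the dispersion 0.5, and the sample profile are assumptions made for illustration.

```python
from itertools import combinations, permutations


def kendall_tau(v, w):
    """Number of pairs of alternatives that rankings v and w order differently."""
    pos_v = {a: i for i, a in enumerate(v)}
    pos_w = {a: i for i, a in enumerate(w)}
    return sum(
        1
        for x, y in combinations(v, 2)
        if (pos_v[x] < pos_v[y]) != (pos_w[x] < pos_w[y])
    )


def mallows_pmf(truth, phi):
    """Pr_W(V) proportional to phi ** Kendall(V, W), normalized over all rankings."""
    rankings = list(permutations(truth))
    weights = [phi ** kendall_tau(v, truth) for v in rankings]
    z = sum(weights)
    return {v: weight / z for v, weight in zip(rankings, weights)}


def mallows_mle(profile, phi):
    """Ground-truth ranking maximizing the likelihood of the profile."""
    def likelihood(w):
        pmf = mallows_pmf(w, phi)
        result = 1.0
        for vote in profile:
            result *= pmf[vote]
        return result

    return max(permutations(profile[0]), key=likelihood)


def kemeny(profile):
    """Ranking minimizing the total Kendall tau distance to the profile."""
    return min(
        permutations(profile[0]),
        key=lambda w: sum(kendall_tau(p, w) for p in profile),
    )


profile = [("a", "b", "c"), ("a", "b", "c"), ("b", "c", "a")]
mle_ranking = mallows_mle(profile, 0.5)
pmf = mallows_pmf(("a", "b", "c"), 0.5)
```

Because the normalizing constant does not depend on W, maximizing the likelihood amounts to minimizing the total Kendall tau distance, which is why the two functions agree.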
• Fix the dispersion ϕ<1
• Parameter space
– all binary relations over alternatives
• Sample space
– i.i.d. generated binary relations
• Probabilities: given a ground truth relation
W, generate a relation V w.p.
PrW(V) ∝ ϕ^Kendall(V,W)
Condorcet’s model [Condorcet-1785, Young-1988]
• Understanding truth-revealing property of existing rules
– MLE: [Conitzer&Sandholm UAI-05; Conitzer, Rognlie&Xia IJCAI-09; Xia, Conitzer&Lang AAMAS-10; Xia&Conitzer AAAI-11]
– Consistent estimator: [Caragiannis, Procaccia&Shah EC-13]
– Most probable winner: [Procaccia, Reddi&Shah UAI-13; Elkind&Shah UAI-14; Azari Soufiani, Parkes&Xia NIPS-14]
• Learning ranking models
– Mallows’ model: [Lu&Boutilier ICML-11; Hughes, Hwang&Xia UAI-15; Awasthi et al. NIPS-14; Chierichetti et al. ITCS-15]
– Random Utility Models [too many to show]
Recent Work in Computer Science
• Axiomatic social choice
• The Condorcet Jury Theorem (CJT)
• Break
• Four directions of extending CJT
• Beyond CJT: the objective decision-making perspective
Outline
• Thinking about Arrow’s impossibility theorem
– axiomatic properties are used to evaluate and
compare voting rules
• New perspective
– an objective measurement for voting rules
– can be seen as another numerical “axiomatic” property
Beyond CJT
• How to make an objectively optimal decision using voting?
• Goal: new computationally tractable voting rule with desirable axiomatic+statistical properties
– 2 alternatives: majority rule
– Kemeny’s rule (for ranking), NP-hard to compute
• Especially when Decision space ≠ Parameter space
– e.g. use Mallows’ model to choose a single winner
CJT: the optimal objective decision-making perspective
• Social choice community
– statistical models are compelling
• Statistics/Machine Learning community
– some axioms are desirable
• strategy-proofness
• monotonicity
• agents have less incentive to lie
Why care?
[Figure: StatML and Social Choice]
Statistical decision-theoretic framework for social choice [Azari Soufiani, Parkes&Xia NIPS-14]
• Inputs
– statistical model: Θ, S, Prθ(s)
– decision space: D
– loss function: L(θ, d)∈ℝ, where θ is the unknown ground truth and d is the decision to make
• The rule
– f : Profiles⟶D with minimum Bayesian expected loss: f(P) ∈ argmind Eθ|P L(θ,d)
• fB1 (Mallows)
– Statistical model: Mallows’ model
– Decision space: single winners
– Loss function: the top loss function
• L(W, a) =0 if a is top-ranked in W, otherwise it is 1
– Bayesian estimator with uniform prior
• fB2 (Condorcet)
– Statistical model: Condorcet’s model
– Decision space: single winners
– Loss function: the top loss function
• L(W, a) =0 if a is top-ranked in W, otherwise it is 1
– Bayesian estimator with uniform prior
Two examples
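A brute-force sketch of the first example (Mallows' model, single-winner decisions, top loss, uniform prior) follows. It enumerates all rankings, so it only illustrates the definition on tiny inputs; the function names, the dispersion 0.5, and the sample profile are assumptions.

```python
from itertools import combinations, permutations


def kendall_tau(v, w):
    """Number of pairs of alternatives that rankings v and w order differently."""
    pos_v = {a: i for i, a in enumerate(v)}
    pos_w = {a: i for i, a in enumerate(w)}
    return sum(
        1
        for x, y in combinations(v, 2)
        if (pos_v[x] < pos_v[y]) != (pos_w[x] < pos_w[y])
    )


def expected_top_losses(profile, phi):
    """Posterior expected top loss of each alternative under Mallows' model."""
    alternatives = profile[0]
    # With a uniform prior, the posterior of W is proportional to the likelihood
    # prod_V phi ** K(V, W); the Mallows normalizer is the same for every W.
    posterior = {
        w: phi ** sum(kendall_tau(v, w) for v in profile)
        for w in permutations(alternatives)
    }
    total = sum(posterior.values())
    # Top loss is 1 unless the chosen alternative is ranked first in W.
    return {
        a: sum(pr for w, pr in posterior.items() if w[0] != a) / total
        for a in alternatives
    }


def bayes_winner_mallows(profile, phi):
    """Alternative with minimum Bayesian expected top loss."""
    losses = expected_top_losses(profile, phi)
    return min(losses, key=losses.get)


profile = [("a", "b", "c"), ("a", "b", "c"), ("b", "c", "a")]
losses = expected_top_losses(profile, 0.5)
winner = bayes_winner_mallows(profile, 0.5)
```

Minimizing the expected top loss is the same as choosing the alternative most likely to be ranked first under the posterior.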
Comparisons
Rule              | Anonymity, neutrality, monotonicity | Consistency | Majority, Condorcet | Complexity | Min Bayesian risk
Kemeny (Fishburn) | ✔                                   | ✗           | ✔                   | ✗          | ✗
fB1 (Mallows)     | ✔                                   | ✗           | ✗                   | ✗          | ✔ for Mallows
fB2 (Condorcet)   | ✔                                   | ✗           | ✗                   | ✔          | ✔ for Condorcet
Highlight: fB2 does well in many aspects.
• How much do strategic agents hurt the truth-revealing power?
• Price of Anarchy (PoA) [Koutsoupias&Papadimitriou STACS-99]
– PoA = (optimal truth-revealing power) / (WORST truth-revealing power in equilibrium)
• Price of Stability (PoS) [Anshelevich et al. FOCS-04]
– PoS = (optimal truth-revealing power) / (BEST truth-revealing power in equilibrium)
• Theorem [Xia-15]. Informative voting is a BNE under plurality for a wide range of statistical models with m>2
• Theorem [Xia-15]. The PoA of plurality is at least m, the PoS of plurality is 1 as n→∞
CJT: numerically evaluate the effect of strategic agents
• The Condorcet Jury Theorem
• Four extensions
– dependent agents
– heterogeneous agents
– strategic agents
– more than two alternatives
• The new perspective
– design new mechanisms
– PoA and PoS
Wrap up
• Numerical extensions of the CJT to
– dependent, heterogeneous, and strategic
agents
– with m>2
– for commonly studied voting rules
• The new perspectives
– New frameworks and rules that trade off axiomatic, computational, and truth-revealing properties
Open questions
Thank you!
• Given
– a similarity function d
• symmetric, coincidence axiom
• not necessarily triangle inequality
– a dispersion 0<ϕ<1
• Prb(a) ∝ ϕ^d(a, b)
Mallows-like models
• d: [Figure: distances among a1, a2, a3, and a4, with edge labels t, t, 3, 3, 2, 2]
• Suppose an agent receives a1
– When t is sufficiently small, reporting a2 has a higher expected utility given that other agents are sincere under the plurality rule
– When triangle inequality is satisfied, sincere voting is a BNE
Sincere voting is not always a BNE