Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Introduction of “Fairness in Learning: Classic and Contextual Bandits”, authored by Matthew Joseph, Michael Kearns, Jamie Morgenstern, and Aaron Roth. NIPS2016-Yomi, January 19, 2017. Presenter: Kazuto Fukuchi

Transcript of Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Page 1: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Introduction of “Fairness in Learning: Classic and Contextual Bandits”

authored by Matthew Joseph, Michael Kearns, Jamie Morgenstern, and Aaron Roth

NIPS2016-Yomi, January 19, 2017

Presenter: Kazuto Fukuchi

Page 2: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Fairness in Machine Learning
• Consequential decisions made using machine learning may lead to unfair treatment
• E.g., Google’s ad suggestion system [Sweeney 13]

This work: fairness in the contextual bandit problem

[Figure: ad suggestions for African descent names vs. European descent names; “Arrested?” (negative ad) vs. “Located” (neutral ad)]

Page 3: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Individual fairness
• Choose one person to conduct an action
• E.g., granting a loan, hiring, admission, etc.

When can we preferentially choose one person? Only if that person has the largest ability; there is no other acceptable reason for a preferential choice.

[Figure: two loan applicants, payback probability 90% > payback probability 60%]

Page 4: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Contextual Bandit Problem

k arms, each with an unknown function f_j (f_1, ..., f_k are unknown to the learner)

Each round t:
1. Obtain a context x_j^t for each arm j
2. Choose one arm i_t
3. Observe a reward r_{i_t}^t such that E[r_j^t] = f_j(x_j^t) and r_j^t ∈ [0, 1] a.s.

Goal: Maximize the expected cumulative reward (a simulation sketch follows below)
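As a concrete illustration of this protocol, here is a minimal Python simulation sketch; the function names, the Bernoulli reward model, and the horizon T are my own illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(arm_fns, sample_context, choose_arm, T=1000):
    """Run the contextual bandit protocol for T rounds.

    arm_fns[j] plays the role of the unknown f_j; the learner's policy
    choose_arm only ever sees the contexts, never the f_j themselves.
    """
    total_reward = 0.0
    for t in range(T):
        # 1. Obtain a context for each arm
        contexts = [sample_context(j) for j in range(len(arm_fns))]
        # 2. Choose one arm
        i = choose_arm(contexts)
        # 3. Observe a reward in [0, 1] whose mean is f_i(x_i^t)
        mean = arm_fns[i](contexts[i])
        reward = float(rng.random() < mean)  # Bernoulli reward with that mean
        total_reward += reward
    return total_reward
```

For instance, `simulate(arm_fns, sample_context, lambda cs: 0, T=100)` always pulls arm 0; the next slides give concrete choices of `arm_fns` and `sample_context`.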

Page 5: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Example: Linear Contextual Bandit
• Define f_j(x) = ⟨β_j, x⟩ for an unknown parameter vector β_j
• Suppose the contexts x_j^t and the parameters β_j lie in R^d with bounded norms

E.g., online recommendation (a code sketch follows below):
• β_j: feature of a product
• x_j^t: feature of a user regarding the product
• The score of a user for a product is the inner product ⟨β_j, x_j^t⟩
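A possible instantiation of this linear example, plugging into the simulation sketch above; the dimension, the Dirichlet-distributed features, and the scaling that keeps rewards in [0, 1] are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 5, 3  # illustrative feature dimension and number of arms

# Hidden product features beta_j; Dirichlet samples keep <beta_j, x> inside [0, 1]
# whenever the user features x are also nonnegative and sum to one.
betas = [rng.dirichlet(np.ones(d)) for _ in range(k)]

def make_linear_arm(beta):
    # f_j(x) = <beta_j, x>: the user's score for product j
    return lambda x: float(np.dot(beta, x))

arm_fns = [make_linear_arm(b) for b in betas]

def sample_context(j):
    # Feature of a user regarding product j
    return rng.dirichlet(np.ones(d))
```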

Page 6: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Example: Classic Bandit

• Expected reward of arm j is a constant μ_j
• Set f_j(x) = μ_j for any context x
• Then the contextual bandit reduces to the classic bandit (see the snippet below)

[Figure: five arms with constant expected rewards μ_1, μ_2, μ_3, μ_4, μ_5]
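In the simulation sketch, this reduction amounts to arm functions that ignore their context; the particular means below are illustrative.

```python
# Classic bandit as a special case: f_j(x) = mu_j for every context x.
mus = [0.9, 0.6, 0.5, 0.3, 0.2]  # illustrative means mu_1, ..., mu_5
arm_fns = [(lambda mu: (lambda x: mu))(mu) for mu in mus]
```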

Page 7: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Regret
• History h_t: a record of experiences up to round t (the contexts, the arms chosen, and the rewards observed)
• A policy π: a mapping from a history and the current contexts to a distribution on arms
• π_{j|t}: probability of choosing arm j at round t
• Regret: the reward dropped compared to the optimal policy,
  R(T) = Σ_{t=1}^{T} [ max_j f_j(x_j^t) − Σ_j π_{j|t} f_j(x_j^t) ]
• A regret bound is non-trivial if R(T) = o(T) (tracked in the sketch below)
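Continuing the simulation sketch, the (pseudo-)regret can be tracked directly from the true arm functions; this bookkeeping helper is purely illustrative.

```python
def simulate_with_regret(arm_fns, sample_context, choose_arm, T=1000):
    """Same protocol as before, but accumulate
    R(T) = sum_t [ max_j f_j(x_j^t) - f_{i_t}(x_{i_t}^t) ]."""
    regret = 0.0
    for t in range(T):
        contexts = [sample_context(j) for j in range(len(arm_fns))]
        means = [f(x) for f, x in zip(arm_fns, contexts)]
        i = choose_arm(contexts)
        regret += max(means) - means[i]
    return regret
```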

Page 8: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Fairness Constraint
It is unfair to preferentially choose one individual without an acceptable reason.

A policy is δ-fair if, with probability at least 1 − δ, for every round t and every pair of arms j, j':
  π_{j|t} > π_{j'|t}  only if  f_j(x_j^t) > f_{j'}(x_{j'}^t)
• π_{j|t}: probability of choosing arm j at round t
• f_j(x_j^t) > f_{j'}(x_{j'}^t): the quality of the chosen individual is larger than that of the other (a single-round check is sketched below)
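As a sanity check of what the constraint demands within a single round, a small helper (the name and tolerance are my own) can verify that a vector of choice probabilities never favours a lower-quality arm.

```python
def violates_fairness(probs, qualities, tol=1e-12):
    """Return True if some arm j is chosen with higher probability than an
    arm j' whose true quality f_{j'}(x_{j'}^t) is at least as large."""
    k = len(probs)
    for j in range(k):
        for jp in range(k):
            if probs[j] > probs[jp] + tol and qualities[j] <= qualities[jp]:
                return True
    return False

# Favouring a 60% payback applicant over a 90% one violates the constraint.
assert violates_fairness(probs=[0.2, 0.8], qualities=[0.9, 0.6])
```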

Page 9: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Intuition of the Fairness Constraint
• The optimal policy is fair
• But we cannot use the optimal policy because the f_j are unknown

[Figure: two groups of arms; within the left group we cannot distinguish which arm has the higher expected reward, while the right group’s expected rewards are lower than the left group’s with high probability]

The fairness constraint enforces choosing an arm from the left group with the uniform distribution

Page 10: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Fairness in the Classic Bandit
• Consider confidence bounds on the expected rewards
• Chain together the arms whose confidence intervals overlap, starting from the arm with the highest upper bound
• Choose uniformly from the chained group

[Figure: confidence intervals of the expected rewards for Arms 1–5; Arms 1–3 form the chained group, and the remaining arms’ expected rewards are lower than those of every arm in the chained group]

Page 11: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Fair Algorithm for Classic Bandit
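A rough Python sketch of the chained-confidence-interval strategy from the previous slide; the Hoeffding-style interval widths, the stopping condition, and all names are my own illustrative choices rather than the paper's exact FairBandits pseudocode.

```python
import math
import numpy as np

def fair_bandit_sketch(pull, k, T, delta=0.05, seed=0):
    """Chained-confidence-interval strategy for the classic k-armed bandit.

    pull(j) returns a reward in [0, 1] with unknown mean mu_j.  Each round:
    form [lower, upper] intervals, chain every arm whose interval overlaps
    the group containing the largest upper bound, and play uniformly
    inside that chained group.
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros(k)
    sums = np.zeros(k)
    for t in range(1, T + 1):
        width = np.array([math.sqrt(math.log(4 * k * T / delta) / (2 * max(c, 1)))
                          for c in counts])
        means = sums / np.maximum(counts, 1)
        lower, upper = means - width, means + width
        lower[counts == 0], upper[counts == 0] = 0.0, 1.0  # no data yet: [0, 1]

        # Chain: start from the arm with the largest upper bound and keep adding
        # any arm whose interval overlaps an interval already in the chain.
        order = np.argsort(-upper)
        chained, threshold = [order[0]], lower[order[0]]
        for j in order[1:]:
            if upper[j] >= threshold:
                chained.append(j)
                threshold = min(threshold, lower[j])
            else:
                break
        i = int(rng.choice(chained))  # uniform over the chained group
        r = pull(i)
        counts[i] += 1
        sums[i] += r
    return sums / np.maximum(counts, 1)  # empirical means after T rounds
```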

Page 12: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Regret Upper Bound
If T is large enough, then FairBandits has regret R(T) = Õ(√(k³T))
• Ω(k³) rounds are required to obtain non-trivial regret, i.e., R(T) = o(T)
• Non-fair case: regret Õ(√(kT)), so about k rounds suffice
• √(kT) becomes √(k³T) under the fairness constraint
• The dependence on k is optimal

Page 13: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Regret Lower Bound
Any fair algorithm experiences constant per-round regret for at least Ω(k³) rounds
• Constant per-round regret means the regret is still trivial
• To achieve non-trivial regret, we need at least Ω(k³) rounds
• Thus, Θ(k³) rounds is necessary and sufficient

Page 14: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Fairness in the Contextual Bandit
KWIK learnable = fair bandit learnable

KWIK (Knows What It Knows) learning
• Online regression
• The learner outputs either a prediction ŷ_t or ⊥
• ⊥ denotes “I Don’t Know”
• Only when the learner outputs ⊥ does it observe feedback y_t with E[y_t] = f(x_t)

[Figure: the learner receives a feature x_t and answers either “I Don’t Know” or an accurate prediction] (an interface sketch follows below)
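A minimal interface capturing this protocol; the class and method names are my own, not the paper's.

```python
from typing import Optional

class KWIKLearner:
    """Online regression under the KWIK protocol: on each feature x_t, return
    an accurate prediction or None (standing in for the symbol ⊥); feedback
    y_t with E[y_t] = f(x_t) arrives only after answering None."""

    def predict(self, x) -> Optional[float]:
        raise NotImplementedError

    def observe(self, x, y: float) -> None:
        """Called only on rounds where predict(x) returned None."""
        raise NotImplementedError
```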

Page 15: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

KWIK Learnable
A class F is (ε, δ)-KWIK learnable with bound m(ε, δ) if, with probability at least 1 − δ:
1. |ŷ_t − f(x_t)| ≤ ε for all rounds t on which the learner outputs a prediction ŷ_t
2. The learner answers ⊥ at most m(ε, δ) times

Intuition
• Every prediction is ε-accurate
• Only a small number of rounds are answered with ⊥
• Number of ⊥ answers = m(ε, δ)

Page 16: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

KWIK Learnability Implies Fair Bandit Learnability
Suppose F is (ε, δ)-KWIK learnable with bound m(ε, δ).
Then there is a δ-fair algorithm for F whose regret, for sufficiently large T, is bounded in terms of m(ε, δ).

Page 17: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Linear Contextual Bandit Case
• Let F be the class of linear functions f_j(x) = ⟨β_j, x⟩ over R^d
• Then F is KWIK learnable, so the reduction yields a δ-fair algorithm with a regret bound polynomial in d and T

Page 18: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

KWIK to Fair

Page 19: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Intuition of KWIKToFair
• Predict the expected reward of each arm using a per-arm KWIK algorithm
• If none of the KWIK algorithms outputs ⊥, the same strategy as in the classic bandit is applicable: chain intervals of width 2ε* around the predicted expected rewards and choose uniformly from the chained group (a rough sketch follows below)

[Figure: predicted expected rewards for Arms 1–5 with intervals of width 2ε*]
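A rough sketch of one round of this reduction, reusing the KWIKLearner interface and the chaining logic from the earlier sketches. The handling of rounds on which some learner answers ⊥ (simply playing uniformly at random over all arms) is my reading of the idea, not necessarily the paper's exact KWIKToFair algorithm.

```python
import numpy as np

def kwik_to_fair_round(learners, contexts, eps_star, rng):
    """One round of a KWIK-based fair policy sketch.

    learners[j] is a KWIKLearner for arm j and contexts[j] is x_j^t.
    Returns the index of the arm to pull.
    """
    preds = [lr.predict(x) for lr, x in zip(learners, contexts)]

    if any(p is None for p in preds):
        # Some arm is not yet accurately predictable: treat all arms
        # symmetrically by playing uniformly at random.
        return int(rng.integers(len(learners)))

    # Every prediction is eps*-accurate, so [p - eps*, p + eps*] contains f_j(x_j^t).
    lower = np.array(preds) - eps_star
    upper = np.array(preds) + eps_star
    order = np.argsort(-upper)
    chained, threshold = [order[0]], lower[order[0]]
    for j in order[1:]:
        if upper[j] >= threshold:   # interval overlaps the chained group
            chained.append(j)
            threshold = min(threshold, lower[j])
        else:
            break
    return int(rng.choice(chained))  # uniform over the chained group
```

After the pull, the observed reward would be passed (via observe) to the pulled arm's learner only if that learner had answered ⊥, as the KWIK protocol requires.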

Page 20: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Fair Bandit Learnability Implies KWIK Learnability
Suppose
• there is a δ-fair algorithm for F with regret bound R(T), and
• there exists a horizon T at which the average regret R(T)/T is sufficiently small.
Then there is an (ε, δ)-KWIK learning algorithm for F whose bound m(ε, δ) is the solution of an equation determined by R(T).

Page 21: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

An Exponential Separation Between Fair and Unfair Learning
• Boolean conjunctions: let F be the class of conjunctions over d Boolean variables
• For such F, the KWIK bound, and hence the number of rounds any fair algorithm needs, is exponential in d
• Without the fairness constraint, the worst-case regret bound for F is polynomial in d

Page 22: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Fair to KWIK

Page 23: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Intuition of FairToKWIK
• Divide the range of f(x_t) values into intervals of width ε*, with grid points x^(0), x^(1), x^(2), ... at 0, ε*, 2ε*, ...
• Using the fair algorithm, compare f(x_t) against each grid point x^(ℓ); let p_{ℓ,1} and p_{ℓ,2} denote the probabilities of choosing the left arm and the right arm in the comparison at x^(ℓ)
• If, for all ℓ, the comparisons place f(x_t) inside the red width-ε* area, output the corresponding prediction
• Otherwise, output ⊥

[Figure: the grid 0, ε*, 2ε*, ... with f(x_t) compared against each x^(ℓ)]

Page 24: Introduction of “Fairness in Learning: Classic and Contextual Bandits”

Conclusions
• Fairness in the contextual bandit problem and the classic bandit problem
• δ-fair: with probability at least 1 − δ, an arm is preferred over another only if its expected reward is truly higher

Results
• Classic bandits: the number of rounds necessary and sufficient to achieve non-trivial regret is Θ(k³)
• Contextual bandits: a tight relationship with Knows What It Knows (KWIK) learning