Fairness in Machine Learning
Delip Rao
Metrics
Every ML Practitioner's Dream Scenario
Well-defined eval objective
Lots of clean data
Rich data with lots of attributes
Incorporating Ethnicity improves Engagement Metrics
But should you do it?
Report Link
Goldstein, the “computer expert”
Dramatic Changes in Machine Learning Landscape
Rise of fast/cheap data collection, processing
Rise of popular, easy-to-use tools
Rise of Data Scientist Factories
Two Questions
Should everything that can be predicted, be predicted?
If you really have to predict, what should you be aware of?
[Diagram: within the overall population, S = protected class, Sᶜ = its complement]
Blatant Explicit Discrimination
Feature4231: Race='Black'
Discrimination Based on Redundant Encoding
Feature4231: Race='Black'
Features = {'loc', 'income', ..}
Polynomial kernel with degree 2
Feature6578: Loc='EastOakland' ^ Income='<10k'
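To make the redundant-encoding point concrete, here is a minimal Python sketch (hypothetical toy features, assuming scikit-learn >= 1.0): even with race removed from the feature set, a degree-2 interaction expansion synthesizes cross features like Loc='EastOakland' ^ Income='<10k' that can act as a proxy for the protected attribute.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# hypothetical one-hot features: [loc is East Oakland, income below 10k]
X = np.array([[1, 1],
              [1, 0],
              [0, 1],
              [0, 0]])

poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X2 = poly.fit_transform(X)
print(poly.get_feature_names_out(["loc_EastOakland", "income_below_10k"]))
# -> ['loc_EastOakland' 'income_below_10k' 'loc_EastOakland income_below_10k']
# The last column fires exactly when both hold, recreating Feature6578
# without the model ever seeing race.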
Big Data
There is no data like more data.
Big Data
[Plot: classifier error rate vs. number of training examples in your data]
Most ML objective functions create models accurate for the majority class at the expense of the protected class
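A minimal sketch of this effect on synthetic data (all names hypothetical, assuming NumPy and scikit-learn): when the protected group is a small fraction of the data and its feature-label relationship differs, minimizing average loss yields good aggregate accuracy but a much worse error rate on the protected group.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_major, n_minor = 9000, 1000

# the majority's label depends on feature 0, the minority's on feature 1
X_major = rng.normal(size=(n_major, 2))
y_major = (X_major[:, 0] > 0).astype(int)
X_minor = rng.normal(size=(n_minor, 2))
y_minor = (X_minor[:, 1] > 0).astype(int)

X = np.vstack([X_major, X_minor])
y = np.concatenate([y_major, y_minor])
group = np.concatenate([np.zeros(n_major), np.ones(n_minor)])

pred = LogisticRegression().fit(X, y).predict(X)
print("overall error:", np.mean(pred != y))
for g in (0, 1):
    m = group == g
    print(f"group {g} error:", np.mean(pred[m] != y[m]))
# typically: low error for the majority group, near-chance error
# for the minority group the objective largely ignored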
Cultural differences can throw a wrench in your models
Look at Error Cases vs. Error Rates
Macro metrics (Accuracy, RMSE, F1, etc.) vs. individuals
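A small illustration of the difference (hypothetical data, assuming NumPy): the macro accuracy looks acceptable while every single error lands on the protected class, which you only notice by inspecting cases per group.

import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 0, 0, 0])
group  = np.array([0, 0, 0, 1, 1, 1, 0, 1])  # 1 = protected class

print("macro accuracy:", np.mean(y_true == y_pred))  # 0.75, looks fine
for g in (0, 1):
    m = group == g
    print(f"group {g} accuracy:", np.mean(y_true[m] == y_pred[m]))
# group 0: 1.00, group 1: 0.50

# look at the individual error cases, not just the rates
for i in np.flatnonzero(y_true != y_pred):
    print(f"case {i}: group={group[i]}, true={y_true[i]}, pred={y_pred[i]}")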
Becoming Responsible Gatekeepers
We are pretty good at learning function approximations today
Image Credit: Jason Eisner, the three cultures of ML, 2016
NNs & Decision Trees
Need:
- Ways to characterize fairness
- Learning methods that introduce fairness
How can we characterize fairness?
What does fairness even mean?
Group Fairness vs. Individual Fairness
How can we characterize fairness?
One way to characterize group fairness is to ensure both the majority and the protected population have similar outcomes, or:
P(FavorableOutcome | S) : P(FavorableOutcome | Sᶜ) = 1 : 1
Often this is hard to achieve.
For example, for jobs, the EEOC specifies this ratio should be no less than 0.8 : 1 (aka the "80% rule").
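A minimal sketch of this check (hypothetical outcomes, assuming NumPy): compute the two conditional rates and compare their ratio against the 0.8 threshold.

import numpy as np

favorable = np.array([1, 0, 0, 0, 0, 1, 1, 1, 0, 1])  # classifier decisions
in_S      = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])  # membership in protected class S

p_s  = favorable[in_S == 1].mean()   # P(FavorableOutcome | S)  = 0.25
p_sc = favorable[in_S == 0].mean()   # P(FavorableOutcome | Sᶜ) ~ 0.67
ratio = p_s / p_sc                   # ~ 0.375
print("passes 80% rule" if ratio >= 0.8 else "fails 80% rule")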
Characterizing Fairness of a Black-Box Classifier
One way: is the classifier outcome correlated with membership in S?
Fairness as a Constraint
Is the classifier outcome correlated with membership in S?
Sensitive attribute: z
Decision function: d(x) = θᵀx
Want: Cov(z, θᵀx) ≈ 0
Constraint to be added: |Cov(z, θᵀx)| ≤ c
Supervised Learning with a Fairness Constraint
minimize L(θ) (the classifier's usual loss)
such that |Cov(z, θᵀx)| ≤ c
Zafar et al, ICML 2015
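A minimal sketch of the idea (not the authors' exact implementation; data and names hypothetical, assuming NumPy/SciPy): fold the covariance constraint in as a squared penalty on a logistic loss, so the optimizer trades accuracy against correlation between the sensitive attribute z and the decision-boundary distance θᵀx.

import numpy as np
from scipy.optimize import minimize

def logistic_loss(theta, X, y):
    # y in {-1, +1}; numerically stable log(1 + exp(-y * theta^T x))
    return np.mean(np.logaddexp(0.0, -y * (X @ theta)))

def cov_penalty(theta, X, z):
    # empirical Cov(z, theta^T x), squared so the objective stays smooth
    scores = X @ theta
    return np.mean((z - z.mean()) * (scores - scores.mean())) ** 2

def fit_fair_logreg(X, y, z, lam=10.0):
    # lam trades off accuracy against the fairness penalty
    obj = lambda th: logistic_loss(th, X, y) + lam * cov_penalty(th, X, z)
    theta0 = np.zeros(X.shape[1])
    return minimize(obj, theta0, method="L-BFGS-B").x

# usage with the synthetic data from the earlier sketch (labels mapped to +/-1):
# theta = fit_fair_logreg(X, 2 * y - 1, group, lam=10.0)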
“If we allowed a model to be used for college admissions in 1870, we’d still have 0.7% of women going to college.”
Recommended Reading
Reading List
There's much material on fairness in data-driven decision/policy making from the literature in:
- law
- sociology
- political science
- computer science/machine learning
- economics
(the machine learning literature is nascent, only around 2009 onwards)
Reading List (Fairness in ML)
Pedreschi, Dino, Salvatore Ruggieri, and Franco Turini. "Measuring Discrimination in Socially-Sensitive Decision Records." SDM 2009.
Kamiran, Faisal, and Toon Calders. "Classifying without discriminating." 2nd International Conference on Computer, Control and Communication (IC4 2009). IEEE, 2009.
Dwork, Cynthia, et al. "Fairness through awareness." Proceedings of the 3rd Innovations in Theoretical Computer Science Conference. ACM, 2012.
Romei, Andrea, and Salvatore Ruggieri. "A multidisciplinary survey on discrimination analysis." The Knowledge Engineering Review 29.05 (2014).
Reading List (Fairness in ML)
Friedler, Sorelle, Carlos Scheidegger, and Suresh Venkatasubramanian. "Certifying and removing disparate impact." CoRR (2014).
Barocas, Solon, and Andrew D. Selbst. "Big Data's Disparate Impact" (August 14, 2015). California Law Review, Vol. 104.
Zafar, Muhammad Bilal, et al. "Fairness Constraints: A Mechanism for Fair Classification." arXiv preprint arXiv:1507.05259 (2015).
Zliobaite, Indre. "On the relation between accuracy and fairness in binary classification." arXiv preprint arXiv:1505.05723 (2015).
Other Resources
NSF's "Big Data Innovation Hubs" were created in part to address these challenges: http://www.nsf.gov/news/news_summ.jsp?cntn_id=136784
Stanford Law Review touches upon this topic regularly: http://www.stanfordlawreview.org/online/privacy-and-big-data
Fairness blog: http://fairness.haverford.edu
Academic: FATML workshops (NIPS 2014, ICML 2015): www.fatml.org
Lessons
Discrimination is an emergent property of any learning algorithm
Watch out for discrimination (implicitly) encoded in features
Big Data can cause Big Problems
Watch out for the proportion of the protected classes
Always do error analysis with protected classes in mind
Notions of fairness are nascent at best. Involve as many people as possible to improve understanding.
There is no one best notion of fairness
Questions?
@deliprao / [email protected]