A Methodology for Direct and Indirect Discrimination Prevention in Data Mining
![Page 1: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/1.jpg)
A Methodology for Direct and Indirect Discrimination Prevention in Data Mining
Presented By: Rucha Bhutada
Guided By: Prof. M. R. Wanjari
![Page 2: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/2.jpg)
Outline:
• Introduction
• Challenges
• Discrimination analysis
• Why discrimination
• Papers read
• Findings of the base paper
• Future plans
![Page 3: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/3.jpg)
Introduction:
Data mining is an increasingly important technology for extracting useful knowledge hidden in large collections of data.
Data mining also carries some negative social perceptions, such as:
• Potential privacy invasion
• Potential discrimination
If the training data sets are biased with regard to discriminatory (sensitive) attributes such as gender, race, or religion, discriminatory decisions may follow.
![Page 4: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/4.jpg)
Challenges:
Handling both direct and indirect discrimination, instead of direct discrimination only.
Finding a good tradeoff between discrimination removal and the quality of the resulting training data sets and data mining models.
![Page 5: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/5.jpg)
Why this topic:
It is an extension of association rule mining, and a novel application of association rule mining in a social setting.
Most people do not want to be discriminated against on the basis of any sensitive attribute.
It can be useful in deriving a discrimination-free rule base for decision-making systems such as insurance, loan approval, and job selection.
![Page 6: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/6.jpg)
Example:
U.S. federal laws prohibit discrimination on the basis of: race, color, religion, nationality, marital status, and age.
In a number of settings:
• Credit/insurance scoring
• Sale, rental, and financing of housing
• Personnel selection and wages
• Access to public accommodations, education, nursing homes, adoptions, and health care
![Page 7: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/7.jpg)
Papers read:
| Sr. No. | Paper Name | Author | Year | Conclusion |
|---|---|---|---|---|
| 1 | A Methodology for Direct and Indirect Discrimination Prevention in Data Mining | Sara Hajian and Josep Domingo-Ferrer | 2013 | To develop a new preprocessing discrimination prevention methodology |
| 2 | Rule Protection for Indirect Discrimination Prevention in Data Mining | S. Hajian, J. Domingo-Ferrer, and A. Martínez-Ballesté | 2011 | To protect the decision rules made for discrimination |
| 3 | Classification with no Discrimination by Preferential Sampling | F. Kamiran and T. Calders | 2010 | To refine the model of discrimination |
![Page 8: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/8.jpg)
Discussion On Findings Of Base Paper
Discrimination is unfair or unequal treatment of people based on their membership in a category or a minority, without regard to individual merit.
Discrimination can be either direct or indirect:
Direct discrimination occurs when decisions are made based on sensitive attributes.
Indirect discrimination occurs when decisions are made based on non-sensitive attributes which are strongly correlated with biased sensitive ones.
![Page 9: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/9.jpg)
Approach:
Anti-discrimination techniques have been introduced in data mining:
- Discrimination discovery: consists of supporting the discovery of discriminatory decisions hidden, either directly or indirectly, in a data set of historical decision records.
- Discrimination prevention: consists of inducing patterns that do not lead to discriminatory decisions even if the original data sets are biased.
![Page 10: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/10.jpg)
Approach: (cont’d)
Preprocessing approach: basic definitions
• Data set: a collection of data objects (records).
• Item: an attribute together with its value, e.g., Race = black.
• Item set: a collection of one or more items.
• The support of an item set, supp(X), is the fraction of records that contain the item set X.
• A rule X → C is completely supported by a record if both X and C appear in the record.
• The confidence of a rule, conf(X → C), measures how often the class item C appears in records that contain X. Hence, if supp(X) > 0, then

$$\mathrm{conf}(X \rightarrow C) = \frac{\mathrm{supp}(X, C)}{\mathrm{supp}(X)}$$

• Support and confidence range over [0,1].
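The support and confidence definitions above can be sketched in a few lines of Python. This is a minimal illustration, not code from the base paper: records are assumed to be sets of "attribute=value" strings, and the toy records and the function names `supp` and `conf` are made up for the example.

```python
from typing import FrozenSet, List

Record = FrozenSet[str]  # assumed encoding: each item is an "attribute=value" string


def supp(itemset: FrozenSet[str], records: List[Record]) -> float:
    """Fraction of records that contain every item of `itemset`."""
    if not records:
        return 0.0
    return sum(1 for r in records if itemset <= r) / len(records)


def conf(premise: FrozenSet[str], class_item: str, records: List[Record]) -> float:
    """conf(X -> C) = supp(X, C) / supp(X); only defined when supp(X) > 0."""
    s = supp(premise, records)
    if s == 0:
        raise ValueError("confidence is undefined when supp(X) = 0")
    return supp(premise | {class_item}, records) / s


# Hypothetical toy data, for illustration only.
records = [
    frozenset({"city=NYC", "foreign_worker=yes", "hire=no"}),
    frozenset({"city=NYC", "foreign_worker=yes", "hire=no"}),
    frozenset({"city=NYC", "foreign_worker=no", "hire=yes"}),
    frozenset({"city=LA", "foreign_worker=no", "hire=no"}),
]

nyc_support = supp(frozenset({"city=NYC"}), records)                  # 3 of 4 records
nyc_no_hire_conf = conf(frozenset({"city=NYC"}), "hire=no", records)  # 2 of the 3 NYC records
```

Both values stay within [0,1], as stated above, because each is a ratio of a count to a count that is at least as large.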
![Page 11: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/11.jpg)
Approach: (cont’d):
• A frequent classification rule is a classification rule whose support and confidence are greater than the respective specified lower bounds.
• The negated item set ¬X is an item set with the same attributes as X, but in which the attributes take any value except those taken by the attributes in X.
![Page 12: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/12.jpg)
Approach: (cont’d):
o Potentially Discriminatory and Nondiscriminatory Classification Rules
Let DI_s be the set of predetermined discriminatory items in DB (e.g., DI_s = {Foreign worker = yes, Race = black, Gender = female}). Frequent classification rules in FR fall into one of the following two classes (FR stands for the set of frequent classification rules):
• A classification rule X → C is potentially discriminatory (PD) when X = A, B with A ⊆ DI_s a nonempty discriminatory item set and B a nondiscriminatory item set. For example, {Foreign worker = yes, City = NYC} → Hire = no.
• A classification rule X → C is potentially nondiscriminatory (PND) when X = D, B is a nondiscriminatory item set. For example, {Zip = 10451, City = NYC} → Hire = no or {Experience = low, City = NYC} → Hire = no.
The word “potentially” means that a PD rule could probably lead to discriminatory decisions. Also, a PND rule could lead to discriminatory decisions in combination with some background knowledge.
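The PD/PND split above amounts to checking whether the premise of a rule contains any predetermined discriminatory item. A minimal sketch follows, assuming items are encoded as "attribute=value" strings; the function names `split_premise` and `classify_rule` are illustrative and do not come from the base paper.

```python
from typing import FrozenSet, Tuple

# Predetermined discriminatory items, mirroring the example on the slide.
DI_S = frozenset({"foreign_worker=yes", "race=black", "gender=female"})


def split_premise(premise: FrozenSet[str],
                  di: FrozenSet[str] = DI_S) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Split a premise X into (A, B): its discriminatory and nondiscriminatory items."""
    return premise & di, premise - di


def classify_rule(premise: FrozenSet[str], di: FrozenSet[str] = DI_S) -> str:
    """Return 'PD' if the premise has a nonempty discriminatory part A, else 'PND'."""
    a, _ = split_premise(premise, di)
    return "PD" if a else "PND"


pd_rule = frozenset({"foreign_worker=yes", "city=NYC"})  # premise of ... -> Hire = no
pnd_rule = frozenset({"zip=10451", "city=NYC"})          # premise of ... -> Hire = no

print(classify_rule(pd_rule), classify_rule(pnd_rule))
```

Note that a "PND" label only says the premise has no discriminatory item; as the slide points out, such a rule may still lead to discriminatory decisions once background knowledge is taken into account.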
![Page 13: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/13.jpg)
Approach: (cont’d)
o Direct Discrimination Measure
Definition 1. Let A, B → C be a classification rule such that conf(B → C) > 0. The extended lift of the rule is

$$\mathrm{elift}(A, B \rightarrow C) = \frac{\mathrm{conf}(A, B \rightarrow C)}{\mathrm{conf}(B \rightarrow C)}$$

The idea here is to evaluate the discrimination of a rule as the gain of confidence due to the presence of the discriminatory items.
Definition 2. Let α ∈ R be a fixed threshold and let A be a discriminatory item set. A PD classification rule c = A, B → C is α-protective w.r.t. elift if elift(c) < α. Otherwise, c is α-discriminatory.
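Definitions 1 and 2 can be checked on a small example. The sketch below is illustrative only: the eight toy records, the threshold values, and the function names are assumptions made for the example, and confidence is computed from raw counts.

```python
from typing import FrozenSet, List

Record = FrozenSet[str]


def conf(premise: FrozenSet[str], class_item: str, records: List[Record]) -> float:
    """Share of the records containing `premise` that also contain `class_item`."""
    covered = [r for r in records if premise <= r]
    if not covered:
        raise ValueError("premise has zero support")
    return sum(1 for r in covered if class_item in r) / len(covered)


def elift(a: FrozenSet[str], b: FrozenSet[str], class_item: str,
          records: List[Record]) -> float:
    """Extended lift: conf(A,B -> C) / conf(B -> C), requires conf(B -> C) > 0."""
    base = conf(b, class_item, records)
    if base == 0:
        raise ValueError("elift is undefined when conf(B -> C) = 0")
    return conf(a | b, class_item, records) / base


def is_alpha_protective(a, b, class_item, records, alpha: float) -> bool:
    """A PD rule A,B -> C is alpha-protective when elift < alpha."""
    return elift(a, b, class_item, records) < alpha


# Hypothetical data: 8 NYC applicants, 2 of them foreign workers (both rejected).
records = (
    [frozenset({"foreign_worker=yes", "city=NYC", "hire=no"})] * 2
    + [frozenset({"foreign_worker=no", "city=NYC", "hire=no"})] * 2
    + [frozenset({"foreign_worker=no", "city=NYC", "hire=yes"})] * 4
)
A = frozenset({"foreign_worker=yes"})
B = frozenset({"city=NYC"})

rule_elift = elift(A, B, "hire=no", records)  # conf(A,B->C) = 1.0, conf(B->C) = 0.5
```

Here adding the discriminatory item doubles the confidence of the rejection rule, so the rule is α-discriminatory for any threshold α ≤ 2 and α-protective for any α > 2.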
![Page 14: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/14.jpg)
Approach: (cont’d)
o Indirect Discrimination Measure
Definition 3. A PND classification rule r: D, B → C is a redlining rule if it could yield an α-discriminatory rule r’: A, B → C in combination with currently available background knowledge rules of the form rb1: A, B → D and rb2: D, B → A, where A is a discriminatory item set.
For example: {Zip = 10451, City = NYC} → Hire = no.
![Page 15: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/15.jpg)
Approach: (cont’d)
o Data Transformation for Direct Discrimination
Direct Rule Protection:
- Converts an α-discriminatory rule into an α-protective rule
o Data Transformation for Indirect Discrimination
Indirect Rule Protection:
- Turns a redlining rule into a non-redlining rule
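The slide does not say how the data are transformed. As a purely illustrative sketch (an assumption, not necessarily the procedure used in the base paper), one simple way to make a rule A, B → C α-protective is to relabel records that contain B but no item of A from ¬C to C. This raises conf(B → C) while leaving conf(A, B → C) unchanged, so elift drops. The function name `relabel_until_protective`, the toy records, and the treatment of ¬A as "contains no item of A" are all hypothetical simplifications.

```python
from typing import FrozenSet, List

Record = FrozenSet[str]


def conf(premise: FrozenSet[str], class_item: str, records: List[Record]) -> float:
    covered = [r for r in records if premise <= r]
    if not covered:
        raise ValueError("premise has zero support")
    return sum(1 for r in covered if class_item in r) / len(covered)


def elift(a, b, class_item, records) -> float:
    base = conf(b, class_item, records)
    if base == 0:
        raise ValueError("elift is undefined when conf(B -> C) = 0")
    return conf(a | b, class_item, records) / base


def relabel_until_protective(records: List[Record], a, b, c: str, not_c: str,
                             alpha: float) -> List[Record]:
    """Return a copy of `records` in which class labels of records with B but
    without A are flipped from not_c to c until elift(A,B -> C) < alpha.
    If too few such records exist, the result may still be alpha-discriminatory."""
    out = list(records)
    for i, r in enumerate(out):
        if elift(a, b, c, out) < alpha:
            break
        if b <= r and not (a & r) and not_c in r:
            out[i] = (r - {not_c}) | {c}
    return out


# Hypothetical data: elift of {foreign_worker=yes, city=NYC} -> hire=no is 2.0.
records = (
    [frozenset({"foreign_worker=yes", "city=NYC", "hire=no"})] * 2
    + [frozenset({"foreign_worker=no", "city=NYC", "hire=no"})] * 2
    + [frozenset({"foreign_worker=no", "city=NYC", "hire=yes"})] * 4
)
A = frozenset({"foreign_worker=yes"})
B = frozenset({"city=NYC"})

transformed = relabel_until_protective(records, A, B, "hire=no", "hire=yes", alpha=1.5)
changed = sum(1 for old, new in zip(records, transformed) if old != new)
```

Every relabelled record is a distortion of the original data, which is why the quality of the transformed data set has to be measured alongside the amount of discrimination removed.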
![Page 16: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/16.jpg)
Data sets:
• Adult data set: This data set consists of 48,842 records, split into a “train” part with 32,561 records and a “test” part with 16,281 records. The data set has 14 attributes (without class attribute).
• German Credit data set: We also used the German Credit data set. This data set consists of 1,000 records and 20 attributes (without class attribute) of bank account holders. This is a well-known real-life data set, containing both numerical and categorical attributes.
![Page 17: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/17.jpg)
Result: (table 1)
• Misses cost (MC). This measure quantifies the percentage of rules among those extractable from the original data set that cannot be extracted from the transformed data set (side effect of the transformation process).
• Ghost cost (GC). This measure quantifies the percentage of rules among those extractable from the transformed data set that were not extractable from the original data set (side effect of the transformation process).
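Both measures are simple set comparisons between the rules mined before and after the transformation. A minimal sketch, assuming each rule is represented as a hashable (premise, class item) pair; the rule sets below are made up and are not results from the tables.

```python
from typing import FrozenSet, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # assumed encoding: (premise, class item)


def misses_cost(original: Set[Rule], transformed: Set[Rule]) -> float:
    """Percentage of original rules that can no longer be extracted."""
    if not original:
        return 0.0
    return 100.0 * len(original - transformed) / len(original)


def ghost_cost(original: Set[Rule], transformed: Set[Rule]) -> float:
    """Percentage of rules in the transformed data that were not in the original."""
    if not transformed:
        return 0.0
    return 100.0 * len(transformed - original) / len(transformed)


# Hypothetical rule sets: one rule is lost and one new rule appears.
r1 = (frozenset({"city=NYC"}), "hire=no")
r2 = (frozenset({"experience=low"}), "hire=no")
r3 = (frozenset({"experience=high"}), "hire=yes")
r4 = (frozenset({"zip=10451", "city=NYC"}), "hire=no")
r5 = (frozenset({"city=LA"}), "hire=yes")

original_rules = {r1, r2, r3, r4}
transformed_rules = {r1, r2, r3, r5}

mc = misses_cost(original_rules, transformed_rules)
gc = ghost_cost(original_rules, transformed_rules)
```

Lower values of both measures mean the transformation preserved more of the knowledge in the original data set.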
![Page 18: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/18.jpg)
Result: (table 2)
![Page 19: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/19.jpg)
Result: (table 3 and 4) .
Tables 3 and 4 show lower information loss in terms of the GC measure in the Adult data set than in the German Credit data set.
![Page 20: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/20.jpg)
Future plans:
This methodology can be applied in the Indian scenario:
• To check corruption
• To check gender discrimination
![Page 21: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/21.jpg)
References:
1. S. Hajian, J. Domingo-Ferrer, and A. Martínez-Ballesté, “Rule Protection for Indirect Discrimination Prevention in Data Mining,” Proc. Eighth Int’l Conf. Modeling Decisions for Artificial Intelligence (MDAI ’11), pp. 211-222, 2011.
2. D. Pedreschi, S. Ruggieri, and F. Turini, “Discrimination-Aware Data Mining,” Proc. 14th ACM Int’l Conf. Knowledge Discovery and Data Mining (KDD ’08), pp. 560-568, 2008.
3. S. Ruggieri, D. Pedreschi, and F. Turini, “Data Mining for Discrimination Discovery,” ACM Trans. Knowledge Discovery from Data, vol. 4, no. 2, article 9, 2010.
4. S. Ruggieri, D. Pedreschi, and F. Turini, “DCUBE: Discrimination Discovery in Databases,” Proc. ACM Int’l Conf. Management of Data (SIGMOD ’10), pp. 1127-1130, 2010.
![Page 22: A Methodology for Direct and Indirect Discrimination Prevention in Data Mining](https://reader036.fdocuments.us/reader036/viewer/2022062520/56815f33550346895dce03ec/html5/thumbnails/22.jpg)
THANK YOU…!!!