How to unify attribution explanations by interactions?

27
1 How to unify attribution explanations by interactions? Huiqi Deng

Transcript of How to unify attribution explanations by interactions?

1

How to unify attribution

explanations by interactions?

Huiqi Deng

Attribution definition

[1] Sundararajan et al. "Axiomatic attribution for deep networks." ICML, 2017. 2

Attribution explanation

โ€ข A branch of semantic explanations

โ€ข Inferring contribution score of each individual feature

Label: cat

Input image

Predict Interpret

Attribution Heatmap

Definition 1. For a pre-trained model ๐‘“, an attribution of prediction at

input ๐’™ = [๐‘ฅ1, . . . , ๐‘ฅ๐‘›] is a vector ๐’‚ = [๐‘Ž1, . . . , ๐‘Ž๐‘›], where ๐‘Ž๐‘– is the

contribution of ๐‘ฅ๐‘– to the prediction ๐‘“(๐’™).

3

Existing attribution methods

Many attribution methods are proposed recently.

Input sample

Gradient

*Input

Occlusion

DeepLIFT๐œ–-LRP

Integrated

Grads

Expected

Gradients......

โ€ข Sensitivity

โ€ข Perturbation

โ€ข Layerwise decomposition

โ€ข Averaging gradients

Different formulations

Various heuristics

4

Problems of attribution explanation

The attrbution problem is not well-defined

โ€ข The definition is uninformative for how to assign the contribution

Many attribution methods are based on different heurisitcs

โ€ข Few theoretical foundations

โ€ข No mutuality among existing methods

โ€ข Difficult to compare theoretically

5

Contributions of this paper

โ€ข We propose a Taylor attribution framework, which offers a theoretical

formulation to the attribution problem.

โ€ข Fourteen mainstream attribution methods with different formulations are

unified into the proposed framework by theoretical reformulations.

โ€ข We propose principles for a reasonable attribution, and assess the fairness

of existing attribution methods.

Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841

6

Contributions of this paper

โ€ข We propose a Taylor attribution framework, which offers a theoretical

formulation for how to assign contribution.

โ€ข Fourteen mainstream attribution methods are unified into the proposed

Taylor framework by theoretical reformulations.

โ€ข We propose principles for a reasonable attribution, and assess the

fairness of existing attribution methods.

Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841

7

Attribution problem statement

Input: pre-trained model ๐‘“, input sample ๐’™, and baseline ๐’™ (no signal state)

Output: attribution vector ๐’‚

Corresponds to ๐‘ฃ(๐‘) โˆ’ ๐‘ฃ(โˆ…)

However, there are infinite possible cases for such decomposition.

Which decomposition is reasonable?

Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841

๐‘“(๐’™) โˆ’ ๐‘“(๐’™) = ๐‘Ž1 +. . . +๐‘Ž๐‘›

Many attribution methods aim to distribute the outcome of ๐’™ (w.r.t the

baseline ๐’™) to each feature,

8

Taylor attribution framework

Challengesโ€ข DNN Model ๐‘“ is too complex to analyze

Basic ideaโ€ข Taylor Theroem: If ๐‘“(๐’™) is infinitely differentiable, then ๐‘“(๐’™) โˆ’ ๐‘“(๐’™)

can be approximated by a Taylor expansion function

โ€ข The Taylor expansion function can be explictly divided into independent

and interactive parts

โ€ข Then the attribution can be expressed as a function of Taylor independent

and interaction terms

Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841

Second-order Taylor attribution

Second-order Taylor expansion

9

๐‘“(๐’™) โˆ’ ๐‘“(๐’™) =

๐‘–

๐‘“๐‘ฅ๐‘– โˆ†๐‘– +1

2

๐‘–

๐‘—

๐‘“๐‘ฅ๐‘–๐‘ฅ๐‘— โˆ†๐‘–โˆ†๐‘— + ๐œบ (๐Ÿ)

๐‘“(๐’™) โˆ’ ๐‘“(๐’™) =

๐‘–

๐‘“๐‘ฅ๐‘– โˆ†๐‘– +1

2

๐‘–

๐‘“๐‘ฅ๐‘–2 โˆ†๐‘–2 +

1

2

๐‘–โ‰ ๐‘—

๐‘“๐‘ฅ๐‘–๐‘ฅ๐‘— โˆ†๐‘–โˆ†๐‘— + ๐œบ (๐Ÿ)

All high-order

independent terms, ๐‘ป๐’Š๐œธ

All high-order

interaction terms ๐‘ฐ(๐‘บ)

All first-order

terms, ๐‘ป๐’Š๐œถ

Divide the expansion into first-order, high-order independent and interaction terms

Attribution vector can be expressed as a function of the three type terms

๐‘Ž๐‘– = ๐œ‘(๐‘‡๐‘–๐›ผ , ๐‘‡๐‘–

๐›พ,

๐ผ(๐‘†))๐‘Ž๐‘– = ๐‘‘๐‘’๐‘๐‘œ๐‘š๐‘๐‘œ๐‘ ๐‘’( ๐‘“(๐’™) โˆ’ ๐‘“(๐’™))

Connections with related work

in Game theory

10

โ€ข Connection to Shapley Taylor interaction index [1]

Shapely Taylor interaction index ๐‘ฑ๐’Œ(๐‘บ) measures Taylor interactions of subsets

with at most ๐‘˜ players.

When ๐’Œ = ๐’, i.e., consider interactions of all subsets,

[1 ] Sundararajan, et al. The shapley taylor interaction index. ICML, 2020.

๐‘ฑ๐’(๐‘บ) = ๐‘ฐ(๐‘บ), โˆ€๐‘บ

โ€ข ๐ผ(๐‘†) is a special case of Shapley Taylor interaction index.

11

Contributions of this paper

โ€ข We propose a Taylor attribution framework, which offers a theoretical

formulation for the attribution problem.

โ€ข We prove that, Fourteen attribution methods with different formulas can

be unified into the proposed Taylor attribution framework.

โ€ข We propose principles for a reasonable attribution, and assess the

fairness of existing attribution methods.

Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841

Unifying attribution maps of fourteen

methods by interactions

12

Attribution maps of Fourteen methods are unified into the Taylor attribution framework.

Specifically, they can expressed as a weighted sum of the three type terms.

๐’‚๐’Š = ๐œถ๐’Š ๐‘ป๐’Š๐œถ + ๐œธ๐’Š ๐‘ป๐’Š

๐œธ+

๐’”

๐’„๐’Š๐‘บ๐‘ฐ(๐‘บ) ๐œถ๐’Š, ๐œธ๐’Š, ๐’„๐’Š

๐‘บ are the coefficients

Unifying attribution maps of fourteen

methods by interactions: Gradientร—Input

13Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841

Intuition. Gradient*Input produces attribution maps with improved sharpness๏ผŒby multipling the gradients with the input.

Unification. Gradientร—Input can be unified into

Taylor attribution framework.

Reformulation. In Gradientร—Input, the corresponding coefficients are,

๐›ผ๐‘– = 1,

๐›พ๐‘– = 0,

๐‘๐‘–๐‘† = 0, โˆ€ ๐‘†

๐’‚๐’Š = ๐œถ๐’Š ๐‘ป๐’Š๐œถ + ๐œธ๐’Š ๐‘ป๐’Š

๐œธ+

๐’”

๐’„๐’Š๐‘บ๐‘ฐ(๐‘บ)

First-order terms, ๐‘ป๐’Š๐œถ

high-order independent terms, ๐‘ป๐’Š๐œธ

high-order interaction terms, ๐‘ฐ(๐‘บ)

โ€ข Only assigns the first-order terms

Unifying attribution maps of fourteen

methods by interactions: ฮต-LRP

14

Intuition. It produces attribution maps by distributing the output in proportion

according to the input. It conducts in a layer-wise manner.

Unification. ฮต-LRP can be unified into the

Taylor attribution framework.

Reformulation. In ฮต-LRP, if relu is used as activation function, the corresponding

coefficients are [1],

๐›ผ๐‘– = 1,

๐›พ๐‘– = 0,

๐‘๐‘–๐‘† = 0, โˆ€ ๐‘†

[1] Ancona, Marco, et al. Towards better understanding of gradient-based attribution methods for deep neural networks. ICLR, 2018.

First-order terms, ๐‘ป๐’Š๐œถ

high-order independent terms, ๐‘ป๐’Š๐œธ

high-order interaction terms, ๐‘ฐ(๐‘บ)

๐’‚๐’Š = ๐œถ๐’Š ๐‘ป๐’Š๐œถ + ๐œธ๐’Š ๐‘ป๐’Š

๐œธ+

๐’”

๐’„๐’Š๐‘บ๐‘ฐ(๐‘บ)

โ€ข Only assigns the first-order terms when relu is applied.

Unifying attribution maps of fourteen

methods by interactions: GradCAM

15

Intuition. GradCAM conducts global average pooling to the gradients, then

perform a linear combination

Unification. GradCAM can be unified into

Taylor attribution framework.

Reformulation. Define the global average pooled features as ๐น. Consider ๐‘“(๐’™) =โ„Ž(๐น). Then in GradCAM, the corresponding coefficients of function ๐’‰ are,

๐›ผ๐‘– = 1,

๐›พ๐‘– = 0,

๐‘๐‘–๐‘† = 0, โˆ€ ๐‘†

First-order terms, ๐‘ป๐’Š๐œถ

high-order independent terms, ๐‘ป๐’Š๐œธ

high-order interaction terms, ๐‘ฐ(๐‘บ)

๐’‚๐’Š = ๐œถ๐’Š ๐‘ป๐’Š๐œถ + ๐œธ๐’Š ๐‘ป๐’Š

๐œธ+

๐’”

๐’„๐’Š๐‘บ๐‘ฐ(๐‘บ)

โ€ข Assigns the first-order terms of function โ„Ž.

Unifying attribution maps of fourteen

methods by interactions: Occlusion-1&patch

16

Intuition. Occlude one pixel/patch, and observe how the prediction changes.

Unification. Occlusion-1 & Occlusion-patch can

be unified into Taylor framework.

Reformulation. In Occlusion-1, the corresponding coefficients are,

๐›ผ๐‘– = 1,

๐›พ๐‘– = 1,

๐‘๐‘–๐‘† = 1, ๐‘–๐‘“ ๐‘– โˆˆ ๐‘†

๐‘๐‘–๐‘† = 0, ๐‘–๐‘“ ๐‘– โˆ‰ ๐‘†

First-order terms, ๐‘ป๐’Š๐œถ

high-order independent terms, ๐‘ป๐’Š๐œธ

high-order interaction terms, ๐‘ฐ(๐‘บ)

๐’‚๐’Š = ๐œถ๐’Š ๐‘ป๐’Š๐œถ + ๐œธ๐’Š ๐‘ป๐’Š

๐œธ+

๐’”

๐’„๐’Š๐‘บ๐‘ฐ(๐‘บ)

โ€ข assigns first-order, high-order independent terms of ๐‘ฅ๐‘–, and all interactions involving ๐‘ฅ๐‘–.

Unifying attribution maps of fourteen

methods by interactions: Shapley value

17

Intuition. Shapley value obtains the attribution map by averaging the marginal

contribution of ๐‘ฅ๐‘– to coalition ๐‘† over all possible coalitions involving ๐‘ฅ๐‘–.

Unification. Shapley value can be unified into

Taylor attribution framework.

Reformulation. In Shapley value, the corresponding coefficients are,

๐›ผ๐‘– = 1,

๐›พ๐‘– = 1,

๐‘๐‘–๐‘† = 1/|๐‘†|, ๐‘–๐‘“ ๐‘– โˆˆ ๐‘†

๐‘๐‘–๐‘† = 0, ๐‘–๐‘“ ๐‘– โˆ‰ ๐‘†

First-order terms, ๐‘ป๐’Š๐œถ

high-order independent terms, ๐‘ป๐’Š๐œธ

high-order interaction terms, ๐‘ฐ(๐‘บ)

๐’‚๐’Š = ๐œถ๐’Š ๐‘ป๐’Š๐œถ + ๐œธ๐’Š ๐‘ป๐’Š

๐œธ+

๐’”

๐’„๐’Š๐‘บ๐‘ฐ(๐‘บ)

โ€ข assigns first-order, independent terms of ๐‘ฅ๐‘–, and 1/|S| proportion of interactions involving ๐‘ฅ๐‘–.

Unifying attribution maps of fourteen

methods by interactions: Integrated Grads

18

Intuition. It produces the attribution map by integrating the gradients along a

straight line from baseline ๐’™ to input ๐’™.

Unification. Integrated Gradients can be unified

into Taylor attribution framework.

Reformulation. In Integrated Gradients, the corresponding coefficients are,

๐›ผ๐‘– = 1,

๐›พ๐‘– = 1,

๐‘๐‘–๐‘†(๐œ‹) = ๐‘˜๐‘–/๐พ, ๐‘–๐‘“ ๐‘– โˆˆ ๐‘†, ๐œ‹ = [๐‘˜1, . . . , ๐‘˜๐‘›],

๐‘๐‘–๐‘† = 0, ๐‘–๐‘“ ๐‘– โˆ‰ ๐‘†

โ€ข assigns first-order,independent terms of ๐‘ฅ๐‘–, and ๐’Œ๐’Š/๐‘ฒ proportion of interaction terms ๐’™๐Ÿ๐’Œ๐Ÿ๐’™๐Ÿ

๐’Œ๐Ÿ . . . ๐’™๐’๐’Œ๐’ to ๐‘ฅ๐‘–.

โ€ข For example, ๐‘“(๐’™) = ๐‘ฅ1๐‘ฅ23 + ๐‘ฅ1

2๐‘ฅ2๐‘ฅ32 , then ๐‘Ž2 =

3

4๐‘ฅ1๐‘ฅ2

3 +1

5๐‘ฅ12๐‘ฅ2๐‘ฅ3

2 .

First-order terms, ๐‘ป๐’Š๐œถ

high-order independent terms, ๐‘ป๐’Š๐œธ

high-order interaction terms, ๐‘ฐ(๐‘บ)

๐’‚๐’Š = ๐œถ๐’Š ๐‘ป๐’Š๐œถ + ๐œธ๐’Š ๐‘ป๐’Š

๐œธ+

๐’”

๐’„๐’Š๐‘บ๐‘ฐ(๐‘บ)

๐พ = ๐‘˜1+. . . +๐‘˜๐‘›

Unifying attribution maps of fourteen

methods by interactions: DeepLIFT Rescale

19

Intuition. DeepLIFT propogates the output difference in proportion according to

the input difference. Such propogation proceeds in a layer-wise manner.

Unification. DeepLIFT Rescale can be unified

into Taylor attribution framework.

Reformulation. Consider the attribution at ๐‘™ layer. If ๐’‡๐’(๐’›) = ๐ˆ(๐’˜๐‘ป๐’› + ๐’ƒ), then

in DeepLIFT Rescale, the corresponding coefficients are,

๐›ผ๐‘– = 1,

๐›พ๐‘– = 1,

๐‘๐‘–๐‘†(๐œ‹) = ๐‘˜๐‘–/๐พ, ๐‘–๐‘“ ๐‘– โˆˆ ๐‘†, ๐œ‹ = [๐‘˜1, . . . , ๐‘˜๐‘›]

๐‘๐‘–๐‘† = 0, ๐‘–๐‘“ ๐‘– โˆ‰ ๐‘†

First-order terms, ๐‘ป๐’Š๐œถ

high-order independent terms, ๐‘ป๐’Š๐œธ

high-order interaction terms, ๐‘ฐ(๐‘บ)

๐’‚๐’Š = ๐œถ๐’Š ๐‘ป๐’Š๐œถ + ๐œธ๐’Š ๐‘ป๐’Š

๐œธ+

๐’”

๐’„๐’Š๐‘บ๐‘ฐ(๐‘บ)

๐พ = ๐‘˜1+. . . +๐‘˜๐‘›

โ€ข Shares the same coefficients as Integrated gradients at each layer.

Unifying attribution maps of fourteen

methods by interactions: Deep Taylor

20

Intuition. proceeds in a layer-wise manner. It propgates all relevances to the

features with positive weight.

Unification. Deep Taylor can be unified into the framework.

Reformulation. Define ๐‘+ = {๐‘–|๐‘ค๐‘—๐‘– โ‰ฅ 0} and ๐‘โˆ’ = {๐‘–|๐‘ค๐‘—๐‘– < 0}, where ๐‘ค๐‘—๐‘– is the

parameters at ๐‘™ layer. In Deep Taylor, for features in ๐‘+, the coefs are,

First-order terms, ๐‘ป๐’Š๐œถ

high-order independent terms, ๐‘ป๐’Š๐œธ

high-order interaction terms, ๐‘ฐ(๐‘บ)

๐’‚๐’Š = ๐œถ๐’Š ๐‘ป๐’Š๐œถ + ๐œธ๐’Š ๐‘ป๐’Š

๐œธ+๐’„๐’Š

๐‘บ๐‘ฐ(๐‘บ)

๐พ = ๐‘˜1+. . . +๐‘˜๐‘›

โ€ข Noted that the interactions among features in ๐‘โˆ’ are assigned to features in ๐‘+.

๐›ผ๐‘– = 1,

๐›พ๐‘– = 1,

๐‘๐‘–๐‘†(๐œ‹) = ๐‘˜๐‘–/๐พ, ๐‘–๐‘“ ๐‘– โˆˆ ๐‘†, ๐œ‹ = [๐‘˜1, . . . , ๐‘˜๐‘›]

๐‘๐‘–๐‘† = 0, ๐‘–๐‘“ ๐‘– โˆ‰ ๐‘†, ๐‘† โŠ‚ ๐‘โˆ’

๐‘๐‘–๐‘† = ๐‘ง๐‘—๐‘–

+/๐‘ง๐‘— , ๐‘–๐‘“ ๐‘– โˆ‰ ๐‘†, ๐‘† โŠ† ๐‘โˆ’

Unifying attribution maps of fourteen

methods by interactions: LRP-๐œถ๐œท

21

Intuition. propgates ๐œถ times relevances to the features with positive weight, and

๐œท times to the features with negative weight.

Unification. LRP-๐›ผ๐›ฝ can be unified into the

Taylor attribution framework.

Reformulation. Define ๐‘+ = {๐‘–|๐‘ค๐‘—๐‘– โ‰ฅ 0} and ๐‘โˆ’ = {๐‘–|๐‘ค๐‘—๐‘– < 0}, where ๐‘ค๐‘—๐‘– is the

parameters at ๐‘™ layer. In LRP-๐›ผ๐›ฝ, for features in ๐‘+, the coefficients are,

First-order terms, ๐‘ป๐’Š๐œถ

high-order independent terms, ๐‘ป๐’Š๐œธ

high-order interaction terms, ๐‘ฐ(๐‘บ)

๐’‚๐’Š = ๐œถ๐’Š ๐‘ป๐’Š๐œถ + ๐œธ๐’Š ๐‘ป๐’Š

๐œธ+

๐’”

๐’„๐’Š๐‘บ๐‘ฐ(๐‘บ)

โ€ข The coefficients are ฮฑ times the coefficients in Deep Taylor attribution.

๐›ผ๐‘– = ๐›ผ,

๐›พ๐‘– = ๐›ผ,

๐‘๐‘–๐‘†(๐œ‹) = ๐›ผ ๐‘˜๐‘–/๐พ ,

๐‘–๐‘“ ๐‘– โˆˆ ๐‘†, ๐œ‹ = [๐‘˜1, . . . , ๐‘˜๐‘›]

๐‘๐‘–๐‘† = 0, ๐‘–๐‘“ ๐‘– โˆ‰ ๐‘†, ๐‘† โŠ‚ ๐‘โˆ’

๐‘๐‘–๐‘† = ๐›ผ ๐‘ง๐‘—๐‘–

+/๐‘ง๐‘— , ๐‘–๐‘“ ๐‘– โˆ‰ ๐‘†, ๐‘† โŠ† ๐‘โˆ’

๐พ = ๐‘˜1+. . . +๐‘˜๐‘›

Unifying attribution maps of fourteen

methods by interactions: Expected Attribution

22

Intuition. proposed to reduce the probability that attribution is dominated by a

specific baseline, which averages the attributions over multiple baselines.

Unification. Combining Eq.(1) with the previous reformulations, Expected

Attributions can be unified into the Taylor attribution framework.

โ€ข For example, Expected Gradients, Expected DeepLIFT, and Deep Shapley.

๐‘Ž๐‘–๐‘’๐‘ฅ๐‘

= เถฑ๐‘(๐’™)๐‘Ž๐‘–๐‘๐‘Ž๐‘ ๐‘–๐‘๐‘‘๐’™ (1)

where ๐‘Ž๐‘–๐‘๐‘Ž๐‘ ๐‘–๐‘ is the attribution obtained by basic methods.

๐’‚๐’Š = ๐œถ๐’Š ๐‘ป๐’Š๐œถ + ๐œธ๐’Š ๐‘ป๐’Š

๐œธ+

๐’”

๐’„๐’Š๐‘บ๐‘ฐ(๐‘บ)

23

Contributions of this paper

โ€ข We propose a Taylor attribution framework, which offers a theoretical

formulation for the attribution problem.

โ€ข We prove that, Fourteen attribution methods with different formula can be

unified into the proposed Taylor attribution framework.

โ€ข We propose principles for a reasonable attribution, and assess the

fairness of existing attribution methods.

Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841

Principles for a reasonable attribution

24Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841

โ€ข We proved that, attribution maps of fourteen methods can be unified as the

following form:

๐’‚๐’Š = ๐œถ๐’Š ๐‘ป๐’Š๐œถ + ๐œธ๐’Š ๐‘ป๐’Š

๐œธ+

๐’”

๐’„๐’Š๐‘บ ๐‘ฐ(๐‘บ)

How to define a reasonable attribution map?

Principles for a reasonable attribution

25

โ€ข The first-order terms of ๐‘ฅ๐‘– should be all assigned to ๐‘ฅ๐‘–. โ€ข The high-order independent terms of ๐‘ฅ๐‘– should be all assigned to ๐‘ฅ๐‘–. โ€ข Only Interactions of ๐‘† involving ๐‘ฅ๐‘– , should be assigned to ๐‘ฅ๐‘–.

๐œถ๐’Š = ๐Ÿ, ๐œธ๐’Š = ๐Ÿ,

๐’„๐’Š๐‘บ > ๐ŸŽ, ๐’Š๐’‡ ๐’Š โˆˆ ๐‘บ

๐’„๐’Š๐‘บ = ๐ŸŽ, ๐’Š๐’‡ ๐’Š โˆ‰ ๐‘บ

Principle 1:

๐’‚๐’Š = ๐œถ๐’Š ๐‘ป๐’Š๐œถ + ๐œธ๐’Š ๐‘ป๐’Š

๐œธ+

๐’”

๐’„๐’Š๐‘บ ๐‘ฐ(๐‘บ)

๐’˜๐Ÿ

๐’˜๐Ÿ๐’˜๐Ÿ +๐’˜๐Ÿ = ๐Ÿ

Principles for a reasonable attribution

26

โ€ข Interactions of any coalition ๐‘† should be all distributed to the players in ๐‘†.

๐’Š โˆˆ๐‘บ

๐’„๐’Š๐‘บ = ๐Ÿ, โˆ€๐‘บ

Principle 2:

๐’‚๐’Š = ๐œถ๐’Š ๐‘ป๐’Š๐œถ + ๐œธ๐’Š ๐‘ป๐’Š

๐œธ+

๐’”

๐’„๐’Š๐‘บ ๐‘ฐ(๐‘บ)

๐’˜๐Ÿ

๐’˜๐Ÿ๐’˜๐Ÿ +๐’˜๐Ÿ = ๐Ÿ

Assessing the fairness of existing

attribution methods

27

For example, Shapley value well satisfies the two principles.

๐›ผ๐‘– = 1, ๐›พ๐‘– = 1,

๐‘๐‘–๐‘† = 1/|๐‘†|, ๐‘–๐‘“ ๐‘– โˆˆ ๐‘†

๐‘๐‘–๐‘† = 0, ๐‘–๐‘“ ๐‘– โˆ‰ ๐‘†

โ€ข Interactions of ๐‘† are evenly assigned to the players in ๐‘†.

โ€ข In this sense, Shapley value is a fair attribution.

For example, Occlusion-1 satisfies principle 1, doesnโ€™t satisfy principle 2.

๐›ผ๐‘– = 1, ๐›พ๐‘– = 1,

๐‘๐‘–๐‘† = 1, ๐‘–๐‘“ ๐‘– โˆˆ ๐‘†

๐‘๐‘–๐‘† = 0, ๐‘–๐‘“ ๐‘– โˆ‰ ๐‘†

๐’Š โˆˆ๐‘บ

๐’„๐’Š๐‘บ = |๐‘บ|, โˆ€๐‘บ

โ€ข Interactions of ๐‘† are repeatedly assigned to each player.

These principles can be applied to assess the fairness of exsiting methods.

+1

2 +1

3=