How to unify attribution explanations by interactions?
Transcript of How to unify attribution explanations by interactions?
Attribution definition
[1] Sundararajan et al. "Axiomatic attribution for deep networks." ICML, 2017. 2
Attribution explanation
โข A branch of semantic explanations
โข Inferring contribution score of each individual feature
Label: cat
Input image
Predict Interpret
Attribution Heatmap
Definition 1. For a pre-trained model ๐, an attribution of prediction at
input ๐ = [๐ฅ1, . . . , ๐ฅ๐] is a vector ๐ = [๐1, . . . , ๐๐], where ๐๐ is the
contribution of ๐ฅ๐ to the prediction ๐(๐).
3
Existing attribution methods
Many attribution methods are proposed recently.
Input sample
Gradient
*Input
Occlusion
DeepLIFT๐-LRP
Integrated
Grads
Expected
Gradients......
โข Sensitivity
โข Perturbation
โข Layerwise decomposition
โข Averaging gradients
Different formulations
Various heuristics
4
Problems of attribution explanation
The attrbution problem is not well-defined
โข The definition is uninformative for how to assign the contribution
Many attribution methods are based on different heurisitcs
โข Few theoretical foundations
โข No mutuality among existing methods
โข Difficult to compare theoretically
5
Contributions of this paper
โข We propose a Taylor attribution framework, which offers a theoretical
formulation to the attribution problem.
โข Fourteen mainstream attribution methods with different formulations are
unified into the proposed framework by theoretical reformulations.
โข We propose principles for a reasonable attribution, and assess the fairness
of existing attribution methods.
Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841
6
Contributions of this paper
โข We propose a Taylor attribution framework, which offers a theoretical
formulation for how to assign contribution.
โข Fourteen mainstream attribution methods are unified into the proposed
Taylor framework by theoretical reformulations.
โข We propose principles for a reasonable attribution, and assess the
fairness of existing attribution methods.
Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841
7
Attribution problem statement
Input: pre-trained model ๐, input sample ๐, and baseline ๐ (no signal state)
Output: attribution vector ๐
Corresponds to ๐ฃ(๐) โ ๐ฃ(โ )
However, there are infinite possible cases for such decomposition.
Which decomposition is reasonable?
Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841
๐(๐) โ ๐(๐) = ๐1 +. . . +๐๐
Many attribution methods aim to distribute the outcome of ๐ (w.r.t the
baseline ๐) to each feature,
8
Taylor attribution framework
Challengesโข DNN Model ๐ is too complex to analyze
Basic ideaโข Taylor Theroem: If ๐(๐) is infinitely differentiable, then ๐(๐) โ ๐(๐)
can be approximated by a Taylor expansion function
โข The Taylor expansion function can be explictly divided into independent
and interactive parts
โข Then the attribution can be expressed as a function of Taylor independent
and interaction terms
Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841
Second-order Taylor attribution
Second-order Taylor expansion
9
๐(๐) โ ๐(๐) =
๐
๐๐ฅ๐ โ๐ +1
2
๐
๐
๐๐ฅ๐๐ฅ๐ โ๐โ๐ + ๐บ (๐)
๐(๐) โ ๐(๐) =
๐
๐๐ฅ๐ โ๐ +1
2
๐
๐๐ฅ๐2 โ๐2 +
1
2
๐โ ๐
๐๐ฅ๐๐ฅ๐ โ๐โ๐ + ๐บ (๐)
All high-order
independent terms, ๐ป๐๐ธ
All high-order
interaction terms ๐ฐ(๐บ)
All first-order
terms, ๐ป๐๐ถ
Divide the expansion into first-order, high-order independent and interaction terms
Attribution vector can be expressed as a function of the three type terms
๐๐ = ๐(๐๐๐ผ , ๐๐
๐พ,
๐ผ(๐))๐๐ = ๐๐๐๐๐๐๐๐ ๐( ๐(๐) โ ๐(๐))
Connections with related work
in Game theory
10
โข Connection to Shapley Taylor interaction index [1]
Shapely Taylor interaction index ๐ฑ๐(๐บ) measures Taylor interactions of subsets
with at most ๐ players.
When ๐ = ๐, i.e., consider interactions of all subsets,
[1 ] Sundararajan, et al. The shapley taylor interaction index. ICML, 2020.
๐ฑ๐(๐บ) = ๐ฐ(๐บ), โ๐บ
โข ๐ผ(๐) is a special case of Shapley Taylor interaction index.
11
Contributions of this paper
โข We propose a Taylor attribution framework, which offers a theoretical
formulation for the attribution problem.
โข We prove that, Fourteen attribution methods with different formulas can
be unified into the proposed Taylor attribution framework.
โข We propose principles for a reasonable attribution, and assess the
fairness of existing attribution methods.
Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841
Unifying attribution maps of fourteen
methods by interactions
12
Attribution maps of Fourteen methods are unified into the Taylor attribution framework.
Specifically, they can expressed as a weighted sum of the three type terms.
๐๐ = ๐ถ๐ ๐ป๐๐ถ + ๐ธ๐ ๐ป๐
๐ธ+
๐
๐๐๐บ๐ฐ(๐บ) ๐ถ๐, ๐ธ๐, ๐๐
๐บ are the coefficients
Unifying attribution maps of fourteen
methods by interactions: GradientรInput
13Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841
Intuition. Gradient*Input produces attribution maps with improved sharpness๏ผby multipling the gradients with the input.
Unification. GradientรInput can be unified into
Taylor attribution framework.
Reformulation. In GradientรInput, the corresponding coefficients are,
๐ผ๐ = 1,
๐พ๐ = 0,
๐๐๐ = 0, โ ๐
๐๐ = ๐ถ๐ ๐ป๐๐ถ + ๐ธ๐ ๐ป๐
๐ธ+
๐
๐๐๐บ๐ฐ(๐บ)
First-order terms, ๐ป๐๐ถ
high-order independent terms, ๐ป๐๐ธ
high-order interaction terms, ๐ฐ(๐บ)
โข Only assigns the first-order terms
Unifying attribution maps of fourteen
methods by interactions: ฮต-LRP
14
Intuition. It produces attribution maps by distributing the output in proportion
according to the input. It conducts in a layer-wise manner.
Unification. ฮต-LRP can be unified into the
Taylor attribution framework.
Reformulation. In ฮต-LRP, if relu is used as activation function, the corresponding
coefficients are [1],
๐ผ๐ = 1,
๐พ๐ = 0,
๐๐๐ = 0, โ ๐
[1] Ancona, Marco, et al. Towards better understanding of gradient-based attribution methods for deep neural networks. ICLR, 2018.
First-order terms, ๐ป๐๐ถ
high-order independent terms, ๐ป๐๐ธ
high-order interaction terms, ๐ฐ(๐บ)
๐๐ = ๐ถ๐ ๐ป๐๐ถ + ๐ธ๐ ๐ป๐
๐ธ+
๐
๐๐๐บ๐ฐ(๐บ)
โข Only assigns the first-order terms when relu is applied.
Unifying attribution maps of fourteen
methods by interactions: GradCAM
15
Intuition. GradCAM conducts global average pooling to the gradients, then
perform a linear combination
Unification. GradCAM can be unified into
Taylor attribution framework.
Reformulation. Define the global average pooled features as ๐น. Consider ๐(๐) =โ(๐น). Then in GradCAM, the corresponding coefficients of function ๐ are,
๐ผ๐ = 1,
๐พ๐ = 0,
๐๐๐ = 0, โ ๐
First-order terms, ๐ป๐๐ถ
high-order independent terms, ๐ป๐๐ธ
high-order interaction terms, ๐ฐ(๐บ)
๐๐ = ๐ถ๐ ๐ป๐๐ถ + ๐ธ๐ ๐ป๐
๐ธ+
๐
๐๐๐บ๐ฐ(๐บ)
โข Assigns the first-order terms of function โ.
Unifying attribution maps of fourteen
methods by interactions: Occlusion-1&patch
16
Intuition. Occlude one pixel/patch, and observe how the prediction changes.
Unification. Occlusion-1 & Occlusion-patch can
be unified into Taylor framework.
Reformulation. In Occlusion-1, the corresponding coefficients are,
๐ผ๐ = 1,
๐พ๐ = 1,
๐๐๐ = 1, ๐๐ ๐ โ ๐
๐๐๐ = 0, ๐๐ ๐ โ ๐
First-order terms, ๐ป๐๐ถ
high-order independent terms, ๐ป๐๐ธ
high-order interaction terms, ๐ฐ(๐บ)
๐๐ = ๐ถ๐ ๐ป๐๐ถ + ๐ธ๐ ๐ป๐
๐ธ+
๐
๐๐๐บ๐ฐ(๐บ)
โข assigns first-order, high-order independent terms of ๐ฅ๐, and all interactions involving ๐ฅ๐.
Unifying attribution maps of fourteen
methods by interactions: Shapley value
17
Intuition. Shapley value obtains the attribution map by averaging the marginal
contribution of ๐ฅ๐ to coalition ๐ over all possible coalitions involving ๐ฅ๐.
Unification. Shapley value can be unified into
Taylor attribution framework.
Reformulation. In Shapley value, the corresponding coefficients are,
๐ผ๐ = 1,
๐พ๐ = 1,
๐๐๐ = 1/|๐|, ๐๐ ๐ โ ๐
๐๐๐ = 0, ๐๐ ๐ โ ๐
First-order terms, ๐ป๐๐ถ
high-order independent terms, ๐ป๐๐ธ
high-order interaction terms, ๐ฐ(๐บ)
๐๐ = ๐ถ๐ ๐ป๐๐ถ + ๐ธ๐ ๐ป๐
๐ธ+
๐
๐๐๐บ๐ฐ(๐บ)
โข assigns first-order, independent terms of ๐ฅ๐, and 1/|S| proportion of interactions involving ๐ฅ๐.
Unifying attribution maps of fourteen
methods by interactions: Integrated Grads
18
Intuition. It produces the attribution map by integrating the gradients along a
straight line from baseline ๐ to input ๐.
Unification. Integrated Gradients can be unified
into Taylor attribution framework.
Reformulation. In Integrated Gradients, the corresponding coefficients are,
๐ผ๐ = 1,
๐พ๐ = 1,
๐๐๐(๐) = ๐๐/๐พ, ๐๐ ๐ โ ๐, ๐ = [๐1, . . . , ๐๐],
๐๐๐ = 0, ๐๐ ๐ โ ๐
โข assigns first-order,independent terms of ๐ฅ๐, and ๐๐/๐ฒ proportion of interaction terms ๐๐๐๐๐๐
๐๐ . . . ๐๐๐๐ to ๐ฅ๐.
โข For example, ๐(๐) = ๐ฅ1๐ฅ23 + ๐ฅ1
2๐ฅ2๐ฅ32 , then ๐2 =
3
4๐ฅ1๐ฅ2
3 +1
5๐ฅ12๐ฅ2๐ฅ3
2 .
First-order terms, ๐ป๐๐ถ
high-order independent terms, ๐ป๐๐ธ
high-order interaction terms, ๐ฐ(๐บ)
๐๐ = ๐ถ๐ ๐ป๐๐ถ + ๐ธ๐ ๐ป๐
๐ธ+
๐
๐๐๐บ๐ฐ(๐บ)
๐พ = ๐1+. . . +๐๐
Unifying attribution maps of fourteen
methods by interactions: DeepLIFT Rescale
19
Intuition. DeepLIFT propogates the output difference in proportion according to
the input difference. Such propogation proceeds in a layer-wise manner.
Unification. DeepLIFT Rescale can be unified
into Taylor attribution framework.
Reformulation. Consider the attribution at ๐ layer. If ๐๐(๐) = ๐(๐๐ป๐ + ๐), then
in DeepLIFT Rescale, the corresponding coefficients are,
๐ผ๐ = 1,
๐พ๐ = 1,
๐๐๐(๐) = ๐๐/๐พ, ๐๐ ๐ โ ๐, ๐ = [๐1, . . . , ๐๐]
๐๐๐ = 0, ๐๐ ๐ โ ๐
First-order terms, ๐ป๐๐ถ
high-order independent terms, ๐ป๐๐ธ
high-order interaction terms, ๐ฐ(๐บ)
๐๐ = ๐ถ๐ ๐ป๐๐ถ + ๐ธ๐ ๐ป๐
๐ธ+
๐
๐๐๐บ๐ฐ(๐บ)
๐พ = ๐1+. . . +๐๐
โข Shares the same coefficients as Integrated gradients at each layer.
Unifying attribution maps of fourteen
methods by interactions: Deep Taylor
20
Intuition. proceeds in a layer-wise manner. It propgates all relevances to the
features with positive weight.
Unification. Deep Taylor can be unified into the framework.
Reformulation. Define ๐+ = {๐|๐ค๐๐ โฅ 0} and ๐โ = {๐|๐ค๐๐ < 0}, where ๐ค๐๐ is the
parameters at ๐ layer. In Deep Taylor, for features in ๐+, the coefs are,
First-order terms, ๐ป๐๐ถ
high-order independent terms, ๐ป๐๐ธ
high-order interaction terms, ๐ฐ(๐บ)
๐๐ = ๐ถ๐ ๐ป๐๐ถ + ๐ธ๐ ๐ป๐
๐ธ+๐๐
๐บ๐ฐ(๐บ)
๐พ = ๐1+. . . +๐๐
โข Noted that the interactions among features in ๐โ are assigned to features in ๐+.
๐ผ๐ = 1,
๐พ๐ = 1,
๐๐๐(๐) = ๐๐/๐พ, ๐๐ ๐ โ ๐, ๐ = [๐1, . . . , ๐๐]
๐๐๐ = 0, ๐๐ ๐ โ ๐, ๐ โ ๐โ
๐๐๐ = ๐ง๐๐
+/๐ง๐ , ๐๐ ๐ โ ๐, ๐ โ ๐โ
Unifying attribution maps of fourteen
methods by interactions: LRP-๐ถ๐ท
21
Intuition. propgates ๐ถ times relevances to the features with positive weight, and
๐ท times to the features with negative weight.
Unification. LRP-๐ผ๐ฝ can be unified into the
Taylor attribution framework.
Reformulation. Define ๐+ = {๐|๐ค๐๐ โฅ 0} and ๐โ = {๐|๐ค๐๐ < 0}, where ๐ค๐๐ is the
parameters at ๐ layer. In LRP-๐ผ๐ฝ, for features in ๐+, the coefficients are,
First-order terms, ๐ป๐๐ถ
high-order independent terms, ๐ป๐๐ธ
high-order interaction terms, ๐ฐ(๐บ)
๐๐ = ๐ถ๐ ๐ป๐๐ถ + ๐ธ๐ ๐ป๐
๐ธ+
๐
๐๐๐บ๐ฐ(๐บ)
โข The coefficients are ฮฑ times the coefficients in Deep Taylor attribution.
๐ผ๐ = ๐ผ,
๐พ๐ = ๐ผ,
๐๐๐(๐) = ๐ผ ๐๐/๐พ ,
๐๐ ๐ โ ๐, ๐ = [๐1, . . . , ๐๐]
๐๐๐ = 0, ๐๐ ๐ โ ๐, ๐ โ ๐โ
๐๐๐ = ๐ผ ๐ง๐๐
+/๐ง๐ , ๐๐ ๐ โ ๐, ๐ โ ๐โ
๐พ = ๐1+. . . +๐๐
Unifying attribution maps of fourteen
methods by interactions: Expected Attribution
22
Intuition. proposed to reduce the probability that attribution is dominated by a
specific baseline, which averages the attributions over multiple baselines.
Unification. Combining Eq.(1) with the previous reformulations, Expected
Attributions can be unified into the Taylor attribution framework.
โข For example, Expected Gradients, Expected DeepLIFT, and Deep Shapley.
๐๐๐๐ฅ๐
= เถฑ๐(๐)๐๐๐๐๐ ๐๐๐๐ (1)
where ๐๐๐๐๐ ๐๐ is the attribution obtained by basic methods.
๐๐ = ๐ถ๐ ๐ป๐๐ถ + ๐ธ๐ ๐ป๐
๐ธ+
๐
๐๐๐บ๐ฐ(๐บ)
23
Contributions of this paper
โข We propose a Taylor attribution framework, which offers a theoretical
formulation for the attribution problem.
โข We prove that, Fourteen attribution methods with different formula can be
unified into the proposed Taylor attribution framework.
โข We propose principles for a reasonable attribution, and assess the
fairness of existing attribution methods.
Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841
Principles for a reasonable attribution
24Deng et al. A Unified Taylor Framework for Revisiting Attribution Methods, AAAI, 2021. Deng et al. A General Taylor Framework for Unifying and Revisiting Attribution Methods. in arXiv:2105.13841
โข We proved that, attribution maps of fourteen methods can be unified as the
following form:
๐๐ = ๐ถ๐ ๐ป๐๐ถ + ๐ธ๐ ๐ป๐
๐ธ+
๐
๐๐๐บ ๐ฐ(๐บ)
How to define a reasonable attribution map?
Principles for a reasonable attribution
25
โข The first-order terms of ๐ฅ๐ should be all assigned to ๐ฅ๐. โข The high-order independent terms of ๐ฅ๐ should be all assigned to ๐ฅ๐. โข Only Interactions of ๐ involving ๐ฅ๐ , should be assigned to ๐ฅ๐.
๐ถ๐ = ๐, ๐ธ๐ = ๐,
๐๐๐บ > ๐, ๐๐ ๐ โ ๐บ
๐๐๐บ = ๐, ๐๐ ๐ โ ๐บ
Principle 1:
๐๐ = ๐ถ๐ ๐ป๐๐ถ + ๐ธ๐ ๐ป๐
๐ธ+
๐
๐๐๐บ ๐ฐ(๐บ)
๐๐
๐๐๐๐ +๐๐ = ๐
Principles for a reasonable attribution
26
โข Interactions of any coalition ๐ should be all distributed to the players in ๐.
๐ โ๐บ
๐๐๐บ = ๐, โ๐บ
Principle 2:
๐๐ = ๐ถ๐ ๐ป๐๐ถ + ๐ธ๐ ๐ป๐
๐ธ+
๐
๐๐๐บ ๐ฐ(๐บ)
๐๐
๐๐๐๐ +๐๐ = ๐
Assessing the fairness of existing
attribution methods
27
For example, Shapley value well satisfies the two principles.
๐ผ๐ = 1, ๐พ๐ = 1,
๐๐๐ = 1/|๐|, ๐๐ ๐ โ ๐
๐๐๐ = 0, ๐๐ ๐ โ ๐
โข Interactions of ๐ are evenly assigned to the players in ๐.
โข In this sense, Shapley value is a fair attribution.
For example, Occlusion-1 satisfies principle 1, doesnโt satisfy principle 2.
๐ผ๐ = 1, ๐พ๐ = 1,
๐๐๐ = 1, ๐๐ ๐ โ ๐
๐๐๐ = 0, ๐๐ ๐ โ ๐
๐ โ๐บ
๐๐๐บ = |๐บ|, โ๐บ
โข Interactions of ๐ are repeatedly assigned to each player.
These principles can be applied to assess the fairness of exsiting methods.
+1
2 +1
3=