What is Relevant in a Text Document? - An Interpretable … · CNN1(H=1,F=600) 79.79...

What is Relevant in a Text Document?An Interpretable Machine Learning Approach

Leila Arras, Franziska Horn, Grégoire Montavon,Klaus-Robert Müller, Wojciech Samek

Christoph Schaller

Word2Vec

• Is an approach to learn vector representations of words• Using the context words to create the initial vectors

Skipgram

• Is better to represent infrequent words• Nearby context words have higher weight• Trained by each context against the word

• Predicts a word given a window of context words• Order of context words has no weight• Trained by each word against its context

Layer-Wise RelevancePropagation

Identifying Relevant Words

Figure 1: Diagram of a CNN-based interpretable machine learning system

Needs a vector-based word representation and a neural network

Step OneCompute input representation of text document

Step TwoForward-propagate input representation

Step ThreeBackward-propagate using the layer-wise relevance propagation

Step FourPool the relevance score onto the input neurons

Step One

Computing the input representation of a text document

Words Vectors

the [0.035, -0.631, ...cat [0.751, -0.047, ...sat [0.491, 0.002, ...on [-0.181, -0.086, ...the [0.035, -0.631, ...

Table 1: CBOW vector example

Step Two

Forward-propagate the input representationuntil the output is reached

• We begin with our D×L matrix-representation of the document• D is the embedding dimension• L is the document size

• The convolutional layer produces a new representationof F features maps of length L - H + 1

• ReLU is applied element wise• Features maps are pooled by computing the maximumover the text sequence of the document

• The maxpooled features are fed into a logistic classifier

Step Two

ML Model Test Accuracy (%)

CNN1 (H=1, F=600) 79.79CNN2 (H=2, F=800) 80.19CNN3 (H=3, F=600 ) 79.75

Table 2: Performance of different CNN models

Step Three

Backward-propagate using the layer-wise relevance propagation

• Delivers one scalar relevance value per input variable,input data point and possible target class

• Redistributes the score of a predicted class backto the input space

• The Neuron that had the maximum value in the pool is grantedall the relevance

Step Four

Pool the relevance score onto the convolutional layer

• R(i,t−τ)←(j,t) =zi,j,τ∑i,τ zi,j,τ

• Similar to the Equation used for LRP• More complex due to the convolutional structure of the layer

Pool the relevance score onto the input neurons

• Ri,t =∑

j,τ R(i,t)←(j,t+τ)

• The Word that had the maximum value in the pool is granted allthe relevance

Figure 2: Diagram of a CNN-based interpretable machine learning system

Condensing Information

Obtaining relevance over all dimensions of word2vec

• Rt =∑

i Ri,tpool relevances over all dimensions

• ∀i : di =∑

t Rt · xi,tcondense semantic information

• ∀i : di =∑

t Ri,t · xi,tbuild document summary vector without pooling

BoW/SVM as Baseline

Bag of Words

• Documents are represented as vectors• Each entry is TFIDF of a word in the training vocabulary

Support Vector Machine

• Hyperplanes are learned to separate classes• Linear prediction scores for each class are obtained• sc = w⊤c x+ bc• wc are class specific weights• bc is class specific bias

Performance Comparison

ML Model Test Accuracy (%)

BoW/SVM (V=70631 words) 80.10CNN1 (H=1, F=600) 79.79CNN2 (H=2, F=800) 80.19CNN3 (H=3, F=600 ) 79.75

Table 3: Performance of different ML Models

BoW/SVM as Baseline

LRP Decomposition

• Ri = (wc)i · xi + bc/D• D is the number of non-zero entries of x

Vector Document Representation

• d is built component-wise• ∀i : di = Ri · x̃i• Replacing Ri with a TFIDF score allows comparability

Why is this Approach the Baseline?

• Relies on word frequencies• All words in the embeddings are equidistant

Quality of Word References

Comparing Relevance Scores

How to compare relevance scores assigned by algorithms?Intrinsic Validation

Counting WordsCreating a list of the mostrelevant words for a categoryacross all documents

Deleting WordsRemoving words and measuringthe decrease of theclassification score

The second approach grants an objective estimation to comparerelevance decomposition methods

Measuring Model Explanatory Power

How to compare the explanatory power of ML models?Extrinsic Validation

Problems

• Need for common evaluation basis• Classifiers differ in their reaction to removed words

ApproachComparing models by how ’semantic extractive’ their wordrelevances are

Measuring Model Explanatory Power

How to compare the explanatory power of ML models?Step OneCompute document summary vectors for all test set documents

Step Two

• Normalize document summary vectors to euclidean norm• perform K-nearest neighbor classification

Step Three

• Repeat Step Two over ten random data splits• Average KNN classification accuracies

The maximum KNN accuracy is used as explanatory power index

Results

Identification of Relevant Words

Figure 3: LRP heatmaps, positive is red, negative is blue

Identification of Relevant Words

Figure 4: The 30 most relevant words for CNN2

Figure 5: The 30 most relevant words for Bow/SVM 18

Document Summary Vectors

Figure 6: The 30 most relevant words for Bow/SVM 19

Evaluating LRP

How good is LRP in identifying relevant words?

• Delete Sequence of words from document• Classify document again• Report as function of accuracy and number of missing words

Evaluating LRP

Three different approaches

1. • Start with correctly classified documents• Delete words in decreasing order of their relevance

2. • Start with falsely classified documents• Delete words in increasing order of their relevance

3. • Start with falsely classified documents• Delete words in decreasing order of their score

Evaluating LRP

Figure 7: Word deletion experiments22

Quantifying Explanatory Power

Semantic Extraction Explanatory Power Index (EPI) KNN parameter

word2vec/CNN1 LRP (ew) 0.8045 (± 0.0044) K = 10SA (ew) 0.7924 (± 0.0052) K = 9LRP 0.7792 (± 0.0047) K = 8SA 0.7773 (± 0.0041) K = 6

word2vec TFIDF 0.6816 (± 0.0044) K = 1uniform 0.6208 (± 0.0052) K = 1

BoW/SVM LRP 0.7978 (± 0.0048) K = 14SA 0.7837 (± 0.0047) K = 17

BoW TFIDF 0.7592 (± 0.0039) K = 1uniform 0.6669 (± 0.0061) K = 1

Table 4: Results over 10 random data splits23

Quantifying Explanatory Power

Figure 8: Word deletion experiments

Questions?

What is Relevant in a Text Document? - An Interpretable … · CNN1(H=1,F=600) 79.79...

Documents

Transcript of What is Relevant in a Text Document? - An Interpretable … · CNN1(H=1,F=600) 79.79...

P c- C HRC 63 H (tnfJ3J) NP-C-IOOOM- 750M- H 550M- H 450M- H 340M- H 230M- H NPR-IOOO F 750 F- H 550 F- H 450 F- H F- H 230 F- SS400 110 100 85 63 53 60 55 45 30 30 30 HRC 63 1015

Æ M @ p D D D h h h h h å Ä · å h f å h f å h f å h f å h f å h f å h f å h f ] ½ ¡ c ° v c ° v

Midterm Review Collinear: G, H, I Coplanar: G, H, F or H, I, F.

Pencil Power...f f f f f f f f f f f f f f f f f fall father favorite find fly for friend four from full funny 7 Pencil Power! Trace the letters and words.h h h h h h h h h h h h h

n h P f g ( f | g ² P f g o h P f g 2 4 | g V 3 ; û ² P f g o h P h f g ? u M … › upload › Rapport31 › P2-01.pdf · 2018-12-25 · 265 « ² P f « o h P h f « ? u M «

H H H I j h ] j Z f f b j m f u d h g l j h e e j u S7-300tekhar.com/.../PDF_all/NEW/05_S7_300_2015_ru-part2.pdf · 2015-05-11 · I j h ] j Z f f b j m _ f u _ d h g l j h e e _

J : ; H I J H = J : F F : i h l j Z l m j g h f m l g b ...лицей18.рф/doc/obsch/progranns/garmonija/liter_chtenie_3_g.pdf · J : ; H I J H = J : F F : i h l _ j Z l m j g h

h g b y i j h ] j Z f f u e lstar-nv.ru/DPOP_horeogr.pdf · m q _ [ g u f i e Z g h f i j h ] j Z f f u © O h j _ h ] j Z n b q _ k d h _ l \ h j q _ k l \ hª k j h d h [ m q _

= 3 Q H J L : G H < : Q D : R M F : O H I H < H < ? E B D ... … · H K G H < : = : A > H < : 5 : R M F : F : = 3 Ä Q h j l Z g h \ Z q d Z r m f Z O h i h \ h

@ q h4 { u ã Û q B C · 2020. 8. 17. · u Ù ± u Ù Î ¿ ã f f å ± f ± Ù ° Î. h² h³h²h² h´h²h² h¶h²h² h¸h²h² hºh²h² h³h®h²h²h²hï h³h¼h³h·h®h²h²h²

I j h ] j Z f f Z i h m q [ g h f m i j ^ f l m I H.01. M ...

k h k k F q 9 H H ¶ ! k F ù88 H B V2 - 10086

P R G R F H D F H D H - bioone.org

j h ] j Z f f «кола безопасности» I j h ] j Z f f b l Z g Z Z 4 ] h ...«Школа безопасности» I j h ] j Z f f k h k l Z \ e _ g Z g h \ j h ] j Z

F l h ^ b q k h f g e Z [ h j Z l h j g h f m j Z d l b d ...

L H G H F G ? D H F F ? J Q ? K D : H ; J : A H < : L ? E ...Ÿеречень документов... · : < L H G H F G ? D H F F ? J Q ? K D : H ; J : A H < : L ? E V H J = : G

#' 1 Â i...H º3û Z#. (FûFôFÔFöFÿF¸0¿0£ W iFþFøFÚG FøFéG F¹ H H H d ó ² 0¿8®H 1 8®F¸751¤ '¼H FûFôFÔFö (4Ä ÖFç0¿*(FéG F¹ H H H #' Fþ"g #F¸6õ µ6õ'¼FøFþ

J : K K F H L J ? G H€¦ · e b p Z f, h [ m q Z x s b f k y i h [ j Z a h \ Z l _ e v g u f i j h ] j Z f f Z f k j _ ^ g _ ] h i j h n _ k k b h g Z e v g h ] b

¢ < / < ë ¢ F h / F h [ R

Wh i i f ti f h H h ld t i i f i tWho acquires infection ... · Wh i i f ti f h H h ld t i i f i tWho acquires infection from whom: Household transmission of respiratory syncytial