Structured Output Learning with...

29
Structured Output Learning with Abstention Application to Accurate Opinion Prediction Alexandre Garcia Chloé Clavel Slim Essid Florence d’Alché-Buc LTCI, Télécom ParisTech, Paris, FRANCE [email protected] DATAIA-JST International Symposium on Data Science and AI July 10, 2018

Transcript of Structured Output Learning with...

  • Structured Output Learning with AbstentionApplication to Accurate Opinion Prediction

    Alexandre GarciaChloé Clavel

    Slim EssidFlorence d’Alché-Buc

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on Data Science and AI

    July 10, 2018

    [email protected]

  • Short Overview of research activities

    Outline

    1 Short Overview of research activities

    2 Focus on Structured Output Prediction with Abstention

    3 Conclusion

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 2 / 25

    [email protected]

  • Short Overview of research activities

    Research activities

    Laboratory: LTCI (permanent staff: 120)Signal, Statistics and Machine Learning group (20 permanent staff members, 40PhD students)

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 3 / 25

    [email protected]

  • Short Overview of research activities

    Focus on complex output learning

    How to learn a function from X to Y when Y: set of trees,( labeled) graphs, sequences,functions .... ?

    Multi-task learning: Multiple quantile regression

    Functional-valued Regression: Infinite-task learning

    Structured Output regression: Graph prediction in chemoinformatics

    Zero-shot learning: Predict a class/complex object never seen in the training data

    New challenges: make it fast and efficient, make it robust and reliable !

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 4 / 25

    [email protected]

  • Short Overview of research activities

    How do we solve these problems? solve an easier surrogate problem

    (1) Transform your outputs and solve an easier problem in a well chosen output featurespace

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 5 / 25

    [email protected]

  • Short Overview of research activities

    How do we solve these problems?

    (2) Come back to the original output space by solving a pre-image problem

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 6 / 25

    [email protected]

  • Focus on Structured Output Prediction with Abstention

    Outline

    1 Short Overview of research activities

    2 Focus on Structured Output Prediction with AbstentionLearning frameworkEmpirical results

    3 Conclusion

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 7 / 25

    [email protected]

  • Focus on Structured Output Prediction with Abstention

    Learning to label a structure with abstention

    Setup : we want to predict the labels of a known target graph structure (encoded bya directed graph).

    TripAdvisor review⇒ sentence level opinion annotations

    The room was ok, nothingspecial, still a perfectchoice to quickly join themain places.

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 8 / 25

    [email protected]

  • Focus on Structured Output Prediction with Abstention

    Learning to label a structure with abstention

    Setup : we want to predict the labels of a known target graph structure (encoded bya directed graph).

    TripAdvisor review⇒ sentence level opinion annotations

    The room was OK, nothingspecial, still a perfectchoice to quickly join themain places.

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 8 / 25

    [email protected]

  • Focus on Structured Output Prediction with Abstention

    Problem

    Problem: Error at a node penalizes the prediction of descendants.

    The room was ok, nothingspecial, still a perfectchoice to quickly join themain places.

    Can we build a mechanism allowing to abstain on difficult nodes?

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 9 / 25

    [email protected]

  • Focus on Structured Output Prediction with Abstention

    Problem

    Problem: Error at a node penalizes the prediction of descendants.

    The room was ok, nothingspecial, still a perfectchoice to quickly︸ ︷︷ ︸

    Service, Checkin

    join

    the main places︸ ︷︷ ︸Location

    .

    Can we build a mechanism allowing to abstain on difficult nodes?

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 9 / 25

    [email protected]

  • Focus on Structured Output Prediction with Abstention

    Problem

    Problem: Error at a node penalizes the prediction of descendants.

    The room was ok, nothingspecial, still a perfectchoice to quickly︸ ︷︷ ︸

    Service, Checkin

    join

    the main places︸ ︷︷ ︸Location

    .

    Can we build a mechanism allowing to abstain on difficult nodes?

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 9 / 25

    [email protected]

  • Focus on Structured Output Prediction with Abstention

    Problem

    Problem: Error at a node penalizes the prediction of descendants.

    The room was ok, nothingspecial, still a perfectchoice to quickly︸ ︷︷ ︸

    Service, Checkin

    join

    the main places︸ ︷︷ ︸Location

    .

    Can we build a mechanism allowing to abstain on difficult nodes?

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 9 / 25

    [email protected]

  • Focus on Structured Output Prediction with Abstention

    Mathematical setup

    X an input sample space.Y the subset of {0, 1}d that contains all possible legal labelings of an outputstructure G.

    Goal of Structured Output Learning with Abstention: learn a pair of functions (h, r) fromX to YH,R ⊂ {0, 1}d × {0, 1}d where a predictor h predicts the labels of G and anabstention function r chooses on which components of G to abstain from predicting alabel.

    Y? ⊂ {0, 1, a}d is the set of legal labelings with abstention where a denotes theabstention label,

    Abstention-aware predictive model f h,r : X → Y? defined by :{f h,r (x)T = [f h,r1 (x), . . . , f

    h,rd (x)],

    f h,ri (x) = 1h(x)i=11r(x)i=1 + a1r(x)i=0.

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 10 / 25

    [email protected]

  • Focus on Structured Output Prediction with Abstention

    Learning setup

    (xi , yi )i=1,...,n ∼ D are n i.i.d. samples from a distribution P over X × Y.Suppose that we have access to an abstention aware loss ∆a : YH,R × Y → R+then the risk of an abstention aware predictor is:

    R(h, r) = Ex,y∼D ∆a(h(x), r(x), y).

    Where ∆a can be rewritten under the general form :

    ∆a(h(x), r(x), y) = 〈ψwa(y),Cψa(h(x), r(x))〉,

    With C : Rp → Rq a bounded linear operator and ψa : YH,R → Rp, ψwa : Y → Rq outputembeddings.

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 11 / 25

    [email protected]

  • Focus on Structured Output Prediction with Abstention

    Abstention-aware H-loss (Ha-loss)

    ∆a(h(x), r(x), y) =d∑

    i=1

    cAi1{f h,ri =a,fh,rp(i)=yp(i)}︸ ︷︷ ︸

    abstention cost

    + cAc i1{f h,ri 6=yi ,fh,rp(i)=a}︸ ︷︷ ︸

    abstention regret

    + ci1{f h,ri 6=yi ,fh,rp(i)=yp(i),a 6=f

    h,ri }︸ ︷︷ ︸

    misclassification cost

    This loss writes as a inner product 〈ψwa(y),Cψa(h(x), r(x))〉.

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 12 / 25

    [email protected]

  • Focus on Structured Output Prediction with Abstention Learning framework

    Square surrogate framework

    True risk:

    R(h, r) = Ex 〈Ey|xψwa(y),Cψa(h(x), r(x))〉.

    Procedure :

    Solve a surrogate risk minimization problem :

    ming∈H

    Ex,y‖ψwa(y)− g(x)‖2︸ ︷︷ ︸surrogate risk

    .

    Solve a pre-image problem

    (ĥ(x), r̂(x)) = arg min(yh,yr )∈YH,R

    〈ĝ(x),Cψa(yh, yr )〉,

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 13 / 25

    [email protected]

  • Focus on Structured Output Prediction with Abstention Learning framework

    Square surrogate framework: intuition

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 14 / 25

    [email protected]

  • Focus on Structured Output Prediction with Abstention Learning framework

    Square surrogate framework: intuition

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 15 / 25

    [email protected]

  • Focus on Structured Output Prediction with Abstention Learning framework

    Square surrogate framework: intuition

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 16 / 25

    [email protected]

  • Focus on Structured Output Prediction with Abstention Learning framework

    Square surrogate framework: intuition

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 17 / 25

    [email protected]

  • Focus on Structured Output Prediction with Abstention Learning framework

    Surrogate risk minimization

    Goal :g∗ = min

    g∈HEx,y‖ψwa(y)− g(x)‖2︸ ︷︷ ︸

    surrogate risk

    .

    Based on an empirical sample (xi , yi )i∈1,...,n:

    ĝ = ming

    1n

    n∑i=1

    ‖ψwa(yi )− g(xi )‖2 + λΩ(g),

    Multivariate regression problemH: operator valued kernel, vector random forest, kNN, . . .

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 18 / 25

    [email protected]

  • Focus on Structured Output Prediction with Abstention Learning framework

    Learning guarantees

    Theorem

    Based on the previous notations, the optimal predictor (h∗, r∗) is defined as:

    (h∗(x), r∗(x)) = arg min(yh,yr )∈YH,R

    〈Cψa(yh, yr ),Ey|xψwa(y)〉.

    The excess risk of an abstention aware predictor (ĥ, r̂) defined from ĝ:R(ĥ, r̂)−R(h?, r?) is linked to the estimation error of the regression step.

    R(ĥ, r̂)−R(h?, r?) ≤ 2cl√L(ĝ)− L(Ey|xψwa(y)),

    where L(g) = Ex,y‖ψwa(y)− g(x)‖2, and cl = ‖C‖maxyh,yr∈YH,R ‖ψa(yh, yr )‖Rp .

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 19 / 25

    [email protected]

  • Focus on Structured Output Prediction with Abstention Learning framework

    Pre-image for hierarchical structures with abstention

    Step 2 : Solve a pre-image problem

    (ĥ(x), r̂(x)) = arg min(yh,yr )∈YH,R

    〈ĝ(x),Cψa(yh, yr )〉,

    Problem: search over the set YH,R which is a subset of {0, 1}d × {0, 1}d under the

    constraint A

    yhyrc

    ≤ b.Canonical form :

    (ĥ(x), r̂(x)) = arg min(yh,yr )

    [yTh yTr c

    T ]MTψx

    s.t. Acanonical

    yhyrc

    ≤ bcanonical,(yh, yr ) ∈ {0, 1}d × {0, 1}d ,

    Where Acanonical, bcanonical encode the constraints of A, b and the one of YH,RIn general→ NP-HardThere exists polynomial time good initialization techniques.

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 20 / 25

    [email protected]

  • Focus on Structured Output Prediction with Abstention Empirical results

    Experimental Setting

    Dataset :Input: TripAdvisor reviewsannotated at the sentence levelfrom [Marcheggiani et al. 2014]We use the dense InferSent[Conneau et al. 2017] ( featurerepresentation for handling inputdata).

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 21 / 25

    [email protected]

  • Focus on Structured Output Prediction with Abstention Empirical results

    Experiments (subset): Joint Aspect and polarity prediction withabstention

    Parameterization: ci =cp(i)

    |siblings(i)| , cAi = KAci , cAc i = KAc ci .

    0 1 2 3 4 5 6

    mean number of abstentions on aspect nodes

    0.03

    0.04

    0.05

    0.06

    0.07

    0.08

    0.09

    Nor

    mal

    ized

    Ham

    min

    glo

    ss

    Hamming loss on the polarity nodes located after an abstention label

    KAc = 0.25

    KAc : 0.5

    KAc : 0.75

    HStrict KAc = 0.25

    HStrict KAc = 0.5

    HStrict KAc = 0.75

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 22 / 25

    [email protected]

  • Conclusion

    Outline

    1 Short Overview of research activities

    2 Focus on Structured Output Prediction with Abstention

    3 Conclusion

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 23 / 25

    [email protected]

  • Conclusion

    Conclusion and future works

    SOLA extends two families of approaches: learning with abstention andleast-squares surrogate structured prediction.

    Beyond ridge regression, any vector-valued regression is eligible (including deeplearning).

    Allows to build a robust representation for star rating in a pipeline framework.

    Beyond the target problem: develop general approaches to efficiently providerobust and reliable structured output prediction, whatever the underlying predictorarchitecture.

    Other ways to define r(x): Bayesian approaches

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 24 / 25

    [email protected]

  • Conclusion

    References

    Hierarchical Multi-label Conditional Random Fields for Aspect-Oriented OpinionMining, Marcheggiani, Diego and Täckström, Oscar and Esuli, Andrea andSebastiani, Fabrizio, ECIR 2014.

    Supervised Learning of Universal Sentence Representations from NaturalLanguage Inference Data, A. Conneau and D. Kiela and H. Schwenk and L. Barraultand A.Bordes, arxiv, 2017.

    Structured Output Learning with Abstention, A. Garcia, C. Clavel, S. Essid, F.d’Alché-Buc, ICML 2018.

    Output Fisher Embedding Regression, M. Djerrab, A. Garcia, M. Sangnier, F.d’Alché-Buc, ECML/PKDD 2018 and MLJ, May 2018

    (

    LTCI, Télécom ParisTech, Paris, [email protected]

    DATAIA-JST International Symposium on DataScience and AI

    )Structured Output Learning with Abstention July 10, 2018 25 / 25

    [email protected]

    Short Overview of research activitiesFocus on Structured Output Prediction with AbstentionLearning frameworkEmpirical results

    Conclusion