Learning Within-Sentence Semantic Coherence

Post on 14-Jan-2016


Learning Within-Sentence Semantic Coherence

Elena Eneva

Rose Hoberman

Lucian Lita

Carnegie Mellon University

Semantic (in)Coherence

Trigram: content words unrelated

Effect on speech recognition:

– Actual Utterance: “THE BIRD FLU HAS AFFECTED CHICKENS FOR YEARS BUT ONLY RECENTLY BEGAN MAKING HUMANS SICK”

– Top Hypothesis: “THE BIRD FLU HAS AFFECTED SECONDS FOR YEARS BUT ONLY RECENTLY BEGAN MAKING HUMAN SAID”

Our goal: model semantic coherence

A Whole Sentence Exponential Model [Rosenfeld 1997]

P0(s) is an arbitrary initial model (typically N-gram)

The fi(s) are arbitrary computable properties of s (aka features), each weighted by a parameter λi

Z is a universal normalizing constant

\[
\Pr(s) \;\stackrel{\text{def}}{=}\; \frac{1}{Z}\, P_0(s)\, \exp\Big(\sum_i \lambda_i f_i(s)\Big)
\]
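As a sketch of how such a model is used in practice: the global constant Z is intractable to compute, but it cancels whenever two sentences are compared or reranked under the same model, so an unnormalised score suffices. The prior, feature, and weight below are toy stand-ins for illustration, not values from the paper.

```python
import math

def exp_model_score(sentence, p0, features, lambdas):
    """Unnormalised whole-sentence exponential model score:
    P(s) ∝ P0(s) * exp(sum_i lambda_i * f_i(s)).
    Z is omitted: it is a single global constant that cancels when
    sentences are compared under the same model."""
    return p0(sentence) * math.exp(
        sum(lam * f(sentence) for lam, f in zip(lambdas, features)))

# Toy stand-ins (not a trained model): a dummy "n-gram" prior and one
# hypothetical binary feature with an invented weight.
p0 = lambda s: 0.5 ** len(s.split())
features = [lambda s: float(len(s.split()) > 3)]
lambdas = [0.7]

score = exp_model_score("the bird flu has affected chickens", p0, features, lambdas)
```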

A Methodology for Feature Induction

Given a corpus T of training sentences:

1. Train best-possible baseline model, P0(s)

2. Use P0(s) to generate corpus T0 of “pseudo sentences”

3. Pose a challenge: find (computable) differences that allow discrimination between T and T0

4. Encode the differences as features fi(s)

5. Train a new model:

\[
P_1(s) \;=\; \frac{1}{Z}\, P_0(s)\, \exp\Big(\sum_i \lambda_i f_i(s)\Big)
\]
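Step 2 of this recipe can be sketched as follows. The authors sample pseudo sentences from a trigram baseline; here a tiny hand-built bigram table stands in for P0(s), and all words and probabilities are invented for illustration.

```python
import random

def sample_pseudo_sentences(bigram, n, max_len=12, seed=0):
    """Sample a corpus T0 of 'pseudo sentences' from a baseline model P0
    (here a toy bigram table).  Discriminating real sentences T from T0
    then exposes properties that P0 fails to capture."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        words, w = [], "<s>"
        for _ in range(max_len):
            nxt, probs = zip(*bigram[w].items())
            w = rng.choices(nxt, probs)[0]   # weighted draw of next word
            if w == "</s>":
                break
            words.append(w)
        out.append(" ".join(words))
    return out

# Invented toy bigram table, P(next | current):
bigram = {"<s>": {"the": 0.6, "a": 0.4},
          "the": {"bird": 0.5, "flu": 0.5},
          "a":   {"bird": 1.0},
          "bird": {"</s>": 1.0},
          "flu":  {"</s>": 1.0}}
corpus = sample_pseudo_sentences(bigram, 5)
```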

Discrimination Task:

1. - - - feel - - sacrifice - - sense - - - - - - - - -meant - - - - - - - - trust - - - - truth

2. - - kind - free trade agreements - - - living - - ziplock bag - - - - - - university japan's daiwa bank stocks step –

Are these content words generated from a trigram or a natural sentence?

Building on Prior Work

Define “content words” (all but the top 50)

Goal: model the distribution of content words in a sentence

Simplify: model pairwise co-occurrences (“content word pairs”)

Collect contingency tables; calculate a measure of association for them
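The contingency-table collection step can be sketched as follows; sentences are represented as sets of words, and the example corpus is invented for illustration.

```python
def contingency(w1, w2, sentences):
    """2x2 sentence-level co-occurrence counts for a content-word pair:
    c11 = both present, c12 = only w1, c21 = only w2, c22 = neither."""
    c11 = c12 = c21 = c22 = 0
    for words in sentences:
        a, b = w1 in words, w2 in words
        if a and b:
            c11 += 1
        elif a:
            c12 += 1
        elif b:
            c21 += 1
        else:
            c22 += 1
    return c11, c12, c21, c22

# Toy corpus of sentences as word sets:
corpus = [{"bird", "flu", "chickens"},
          {"bird", "flu", "humans"},
          {"stocks", "bank"},
          {"bird", "cage"}]
counts = contingency("bird", "flu", corpus)
```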

Q Correlation Measure

Q values range from –1 to +1

Q is derived from the co-occurrence contingency table of a word pair (W1, W2), counted over sentences:

             W1 yes   W1 no
    W2 yes    c11      c21
    W2 no     c12      c22

\[
Q \;=\; \frac{c_{11}c_{22} - c_{12}c_{21}}{c_{11}c_{22} + c_{12}c_{21}}
\]
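The Q measure is a one-line computation over the four cell counts; the behaviour when both cross-products are zero is not specified in the slides, so returning 0.0 in that case is an assumption here.

```python
def yule_q(c11, c12, c21, c22):
    """Q = (c11*c22 - c12*c21) / (c11*c22 + c12*c21), ranging over [-1, +1].
    Q -> +1 when the two words co-occur far more often than chance,
    Q -> -1 when they avoid each other."""
    num = c11 * c22 - c12 * c21
    den = c11 * c22 + c12 * c21
    # Undefined when both cross-products are zero; 0.0 is an arbitrary
    # convention chosen here, not taken from the paper.
    return num / den if den else 0.0
```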

Density Estimates

We hypothesized:
– Trigram sentences: wordpair correlation completely determined by distance
– Natural sentences: wordpair correlation independent of distance

Kernel density estimation:
– distribution of Q values in each corpus
– at varying distances
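The slides do not name a kernel or bandwidth, so both are assumptions in this minimal one-dimensional sketch: one such estimate would be fit per (corpus, distance) bucket.

```python
import math

def kde(samples, bandwidth):
    """Minimal 1-D Gaussian kernel density estimate: the average of a
    Gaussian bump of width `bandwidth` centred on each observed Q value.
    (Kernel choice and bandwidth are assumptions, not from the paper.)"""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    def density(q):
        return norm * sum(math.exp(-0.5 * ((q - x) / bandwidth) ** 2)
                          for x in samples)
    return density

# One estimate per (corpus, distance) bucket, e.g. Q values observed for
# word pairs at distance 3 in one corpus (toy numbers):
dens = kde([0.10, 0.15, 0.20, 0.70], bandwidth=0.1)
```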

Q Distributions

[Figure: density of Q values for trigram-generated (dashed) vs. broadcast news sentences, at distance 1 and distance 3]

Likelihood Ratio Feature

\[
L(s) \;=\; \prod_{\text{wordpairs } (i,j)} \frac{\Pr(Q_{ij} \mid d_{ij}, \mathrm{BNews})}{\Pr(Q_{ij} \mid d_{ij}, \mathrm{Trigram})}
\]

she is a country singer searching for fame and fortune in nashville

Q(country, nashville) = 0.76, Distance = 8
Pr(Q = 0.76 | d = 8, BNews) = 0.32
Pr(Q = 0.76 | d = 8, Trigram) = 0.11
Likelihood ratio = 0.32 / 0.11 ≈ 2.9
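The likelihood-ratio feature is a product over the sentence's content-word pairs; the sketch below reproduces the slide's worked example, with constant lambdas standing in for the two estimated densities.

```python
def likelihood_ratio(pairs, p_bnews, p_trigram):
    """L(s): product over content-word pairs of
    Pr(Q | d, BNews) / Pr(Q | d, Trigram).
    `p_bnews` and `p_trigram` map (q, d) to an estimated density value."""
    ratio = 1.0
    for q, d in pairs:
        ratio *= p_bnews(q, d) / p_trigram(q, d)
    return ratio

# The slide's worked example: Q(country, nashville) = 0.76 at distance 8,
# with the two density values read off the estimates (stand-ins here).
r = likelihood_ratio([(0.76, 8)],
                     lambda q, d: 0.32,
                     lambda q, d: 0.11)
```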

Simpler Features

Q value based:
– Mean, median, min, max of Q values for content word pairs in the sentence (Cai et al. 2000)
– Percentage of Q values above a threshold
– High/low correlations across large/small distances

Other:
– Word and phrase repetition
– Percentage of stop words
– Longest sequence of consecutive stop/content words
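These simpler features are direct summaries of a sentence's Q values and stop-word pattern; the threshold and stop-word list below are illustrative assumptions.

```python
from statistics import mean, median

def q_summary_features(q_values, threshold=0.5):
    """Sentence-level summaries of the Q values of its content-word pairs.
    The 0.5 threshold is an illustrative choice, not from the paper."""
    return {"mean": mean(q_values), "median": median(q_values),
            "min": min(q_values), "max": max(q_values),
            "pct_above": sum(q > threshold for q in q_values) / len(q_values)}

def stopword_features(words, stopwords):
    """Percentage of stop words and the longest run of consecutive stop words."""
    flags = [w in stopwords for w in words]
    longest = run = 0
    for f in flags:
        run = run + 1 if f else 0
        longest = max(longest, run)
    return {"pct_stop": sum(flags) / len(flags), "longest_stop_run": longest}

f = q_summary_features([0.2, 0.4, 0.9, -0.1])
sw = stopword_features("the bird flu has affected chickens".split(),
                       {"the", "has"})   # toy stop-word list
```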

Datasets

LM and contingency tables (Q values) derived from 103 million words of BN

From the remainder of the BN corpus and sentences sampled from the trigram LM:
– Q value distributions estimated from ~100,000 sentences
– Decision tree trained and tested on ~60,000 sentences

Disregarded sentences with < 7 words, e.g.:
– “Mike Stevens says it’s not real”
– “We’ve been hearing about it”

Experiments

Learners:
– C5.0 decision tree
– Boosting decision stumps with AdaBoost.MH

Methodology:
– 5-fold cross-validation on ~60,000 sentences
– Boosting for 300 rounds
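To make the boosting setup concrete, here is a minimal binary AdaBoost over threshold stumps. This is an illustrative stand-in only: the authors used C5.0 and AdaBoost.MH, not this code, and the feature vectors and labels below are invented.

```python
import math

def stump_predict(x, j, t, sign):
    """Threshold stump: predict `sign` when feature j exceeds t, else -sign."""
    return sign if x[j] > t else -sign

def adaboost_stumps(X, y, rounds):
    """Toy binary AdaBoost with labels in {-1, +1} over threshold stumps."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Exhaustively pick the stump with the lowest weighted error.
        best = min(((j, t, s) for j in range(len(X[0]))
                    for t in {x[j] for x in X}
                    for s in (1, -1)),
                   key=lambda jts: sum(wi for xi, yi, wi in zip(X, y, w)
                                       if stump_predict(xi, *jts) != yi))
        err = sum(wi for xi, yi, wi in zip(X, y, w)
                  if stump_predict(xi, *best) != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, best))
        # Reweight: misclassified examples gain weight, then normalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, *best))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump_predict(x, *jts) for a, jts in ensemble)
    return 1 if score >= 0 else -1

# Invented "feature vectors" (e.g. mean Q, stop-word fraction) and labels
# (+1 = broadcast news, -1 = trigram-generated):
X = [[0.1, 0.6], [0.2, 0.5], [0.7, 0.3], [0.9, 0.2]]
y = [-1, -1, 1, 1]
model = adaboost_stumps(X, y, rounds=5)
```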

Results

Feature Set                                   Classification Accuracy

Q mean, median, min, max (previous work)      73.39 ± 0.36
Likelihood Ratio                              77.76 ± 0.49
All but Likelihood Ratio                      80.37 ± 0.42
All Features                                  80.37 ± 0.46
Likelihood Ratio + non-Q

Shannon-Style Experiment

50 sentences:
– ½ “real” and ½ trigram-generated
– Stopwords replaced by dashes

30 participants:
– Average accuracy of 73.77% ± 6
– Best individual accuracy 84%

Our classifier:
– Accuracy of 78.9% ± 0.42

Summary

Introduced a set of statistical features which capture aspects of semantic coherence

Trained a decision tree to classify with accuracy of 80%

Next step: incorporate features into exponential LM

Future Work

Combat data sparsity:
– Confidence intervals
– Different correlation statistic
– Stemming or clustering vocabulary

Evaluate derived features:
– Incorporate into an exponential language model
– Evaluate the model on a practical application

Agreement among Participants

Expected Perplexity Reduction

Semantic coherence feature:
– 78% of broadcast news sentences
– 18% of trigram-generated sentences

Kullback-Leibler divergence: 0.814

Average perplexity reduction per word = 0.0419 (2^0.814 / 21 per sentence?)

Features modify the probability of the entire sentence

The effect of the feature on per-word probability is small

Distribution of Likelihood Ratio

[Figure: density of the likelihood-ratio value for trigram-generated (dashed) vs. broadcast news sentences]

Discrimination Task

Natural Sentence:
– but it doesn't feel like a sacrifice in a sense that you're really saying this is you know i'm meant to do things the right way and you trust it and tell the truth

Trigram-Generated:
– they just kind of free trade agreements which have been living in a ziplock bag that you say that i see university japan's daiwa bank stocks step though

Q Values at Distance 1

[Figure: density of Q values at distance 1 for trigram-generated (dashed) vs. broadcast news sentences]

Q Values at Distance 3

[Figure: density of Q values at distance 3 for trigram-generated (dashed) vs. broadcast news sentences]

Outline

The problem of semantic (in)coherence

Incorporating this into the whole-sentence exponential LM

Finding better features for this model using machine learning

Semantic coherence features

Experiments and results