Semantic Textual Similarity (STS) Workshop

Mona Diab and Eneko Agirre

Logistics

• Location
  – March 12: The Interchurch Center (TIC), Room C&D
  – March 13: The Interschool Laboratory (IL), CEPSR 750
• Lunch & Breaks
  – Same room on both days
• Dinner Monday March 12 (Today)
  – If you have not signed up, please do so by 10:30am Monday March 12
• Restrooms
  – Monday TIC: Lower level, take the escalators down one floor and then to the left of the cafeteria
  – Tuesday IL: Same floor (signs are posted)

Logistics

• Wifi: General Wifi access
  – SSID: guest@interchurch
  – User Name: guest
  – Password: guest12345
• Presentations
  – Please send them to [email protected] or give them to him on a flash drive during a break ahead of the session

Today’s Agenda Highlights

• 9:00 - 9:30am Introductions and overarching goals of the workshop
• 9:30 - 10:30am Discussion of What is STS? [Item A]
• 10:30 - 11:00am Coffee break
• 11:00 - 11:30am SemEval 2012 STS Task
• 11:30am - 12:00pm Sample manual annotation by participants
• 12:00 - 1:00pm Discussion of participant annotations
• 1:00 - 2:00pm Lunch
• 2:00 - 2:30pm Evaluation of STS [Item B]
• 2:30 - 4:00pm NLP applications that would benefit from STS [Item C]
• 4:00 - 4:30pm Coffee break
• 4:30 - 5:30pm How to create an STS blackbox? [Item D]

Game plan for both days

• This is a working workshop: all participants, whether physically present or remote, are encouraged (urged) to participate and contribute

• Each session is led by either Mona or Eneko, but discussion is expected throughout

• At the end of each session we will go over a summary and action points from the session, where relevant

Acknowledgments: Credit where due

• Ido Dagan
• Martha Palmer
• Dan Cer
• Alessandro Moschitti
• SIGLEX Board members: Diana McCarthy, Katrin Erk, Sebastian Pado, Rada Mihalcea
• Nancy Ide, James Pustejovsky, Sanda Harabagiu
• NSF Program Directors (Tanya Korelsky, Terry Langendoen)
• DARPA for funding this! $$ is important
• CCLS for their logistical support
• Special thanks to Weiwei Guo (just got his STS paper accepted to ACL, YAY)!
• Thanks All for accepting our invitation!

Discussions Resulted in …

• *SEM
  – http://ixa2.si.ehu.es/starsem/
  – Be sure to submit papers there (please)
• SEMEVAL 2012 STS Task 6
  – http://www.cs.york.ac.uk/semeval-2012/task6/
• This STS Workshop
  – http://www.cs.columbia.edu/~weiwei/workshop/index.html

Introductions

• Please introduce yourself
  – Name and Affiliation
  – Briefly: Relevance of STS to you/your work, name:
    • Semantic component (enabling technology)
    • Resource for STS
    • End NLP application
    • Infrastructure/large systems
    • Theoretical considerations
    • All of the above

Goals of STS Workshop

• Pool the community with respect to the relevance of STS to NLP (thanks for the overwhelmingly positive response to our invitation)

• Foster collaboration with a concrete buy-in from different participants towards building a real STS framework

• Pursue/seek funding to realize STS

STS Workshop Considerations

• What is STS?
  – How to characterize STS quantitatively and qualitatively?
  – What semantic components contribute to STS?
  – How to create a principled empirical STS framework with utility and interpretability?
  – Could this lead to a better understanding of the semantics of NL?
• How to create an STS blackbox?
  – How can different semantic components/features interact?
  – What kind of resources and tools are necessary for such an effort?
  – Infrastructure desiderata

STS Workshop Considerations

• Evaluation of STS
  – Intrinsic
    • Graded vs. Binary Similarity
    • Metric considerations
  – Extrinsic
    • How to illustrate the utility of STS to end NLP applications such as MT, Distillation, etc.
• Future directions
  – Monolingual vs. Multilingual
  – Shared *SEM task?
  – Potential proposal submissions/funding avenues
  – Collaboration across the pond!

STS Framework Research Goals

• To create an interoperable STS pipeline that integrates different semantic components, ranging from simple word similarity to more nuanced semantic components that can handle more complex semantic and pragmatic phenomena such as modality and lambda logic
• To perform intrinsic evaluation of STS
• To show the utility of STS to large NLP applications using extrinsic evaluations
• To advance our understanding of the underpinning semantics of natural languages and how we can empirically exploit this knowledge
• To foster stronger collaborations within the Semantic community and across to other sub-communities within CL

STS Vision

[Diagram: Text A and Text B enter the STS Box (UIMA or some other platform?), which connects to NLP Applications. The diagram also shows two supporting layers:]

• Linguistic Resources: Corpora (raw and annotated), Treebanks, Ontologies, Propbanks, Dictionaries, etc.
• Fundamental NLP Tools: Tokenizers, POS Taggers, Lemmatizers, Chunkers, etc.

STS Box

• A single system which takes features from different semantic layers of representation, integrated (focus of current SemEval 2012 STS Task 6)
• Multiple semantic components
  – Performance of components (confidence in results)
  – Type of component
  – Relevance to task
  – How to order the components in a sequential pipeline
  – If multiple components perform the same task, how to control for redundancy and complementarity
  – Layering annotations of different semantic knowledge on the same data
    • Interaction/dependency between different semantic annotations
    • Representation assumptions
    • Formalism assumptions
  – How to operationalize the interaction among components

What is STS? (Item A)

What is Semantic Textual Similarity?

Semantic Similarity

[Figure: text snippets in several languages, each surrounded by filler characters, are fed into a Semantic Similarity box. The legible snippets are:]

• Arabic: تعالى بس ومالكش دعوه، هتبنبسط اخر انبساط (roughly: "Just come over, never mind the rest, you will have the best time")
• English: Welcome to my world, trust me you will never be disappointed
• English: Yo! Come over here, you will be pleasantly surprised
• Russian: Добро пожаловать в мой мир, поверьте мне вы никогда не будете разочарованы ("Welcome to my world, believe me, you will never be disappointed")
• Korean: 안녕하세요 제가 당신에게 전화했지만 아무 소용이있을려고 ... 당신이 시간을 즐기고 있었다 희망 (roughly: "Hello, I called you but it was no use ... hope you were enjoying your time")

Outputs of the box:

• Quantitative Graded Similarity Score
• Confidence Score
• Principled Interpretability: which semantic components/features led to the results (hopefully this will lead to us gaining a better understanding of semantics)

Monolingual Semantic Similarity

[Figure: text snippets feed a Semantic Similarity box.]

• Arabic: تعالى بس ومالكش دعوه، هتبنبسط اخر انبساط
• English: Welcome to my world, trust me you will never be disappointed
• English: Yo! Come over here, you will be pleasantly surprised

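Monolingual Semantic Similarity

[Figure: text snippets (only the fragments "اخر انبساط" and "surprised" are legible) pass through the Semantic Similarity box, which outputs:]

• Semantic Similarity score: 4.5, Grade: 4
• Interpretation: Lexical X Y, Syntactic AB, CD, Scoping xyz, etc.
• Confidence: 0.8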

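Multilingual Semantic Similarity

[Figure: text snippets (only the fragments "اخر انبساط" and "surprised" are legible) pass through the Semantic Similarity box, which outputs:]

• Semantic Similarity score: 3, Grade: 5
• Interpretation: lexical B C D, syntactic, pragmatic
• Confidence: 0.9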

Why STS?

• Most NLP applications need some notion of semantic similarity to overcome brittleness and sparseness
  – IR, IE, QA, MT, Dialogue, Pedagogical Systems, …
  – Also enabling tasks like parsing, SRL, Textual Entailment, ...
• Provides evaluation beyond surface text processing
  – “Understanding” or interpretability of results
  – Nuanced semantics with utility
• A hub for semantic processing as a black box in applications beyond NLP (open source release)
• Lends itself to an extrinsic evaluation of scattered semantic components

Why STS?

• Monolingual Space
  – MT evaluation
  – Summarization
  – Paraphrase Generation
• Multilingual Space
  – Direct MT evaluation
  – X-lingual Summarization
  – X-lingual Generation
• But overall better understanding of semantic spaces
  – How do different languages carve up the space?
  – What impact does it have on our thinking?
• Relates to code switching and speaker state as well?

What is STS?

• The graded process by which two snippets of text (t1 and t2) are deemed equivalent semantically, i.e. bear the same meaning

• An STS system will quantifiably inform us on how similar t1 and t2 are, resulting in a similarity score

• An STS system will tell us why t1 and t2 are similar, giving a nuanced interpretation of similarity based on the contributions of the semantic components

What is STS?

• Word similarity has been relatively well studied
  – For example according to WN (pairs ordered from less similar to more similar)

    Word 1       Word 2       Similarity
    cord         smile        0.02
    rooster      voyage       0.04
    noon         string       0.04
    fruit        furnace      0.05
    ...
    hill         woodland     1.48
    car          journey      1.55
    cemetery     mound        1.69
    ...
    cemetery     graveyard    3.88
    automobile   car          3.92

What is STS?

• Fewer datasets for similarity between sentences
  – A forest is a large area where trees grow close together.
    VS. The coast is an area of land that is next to the sea. [0.25]
  – A forest is a large area where trees grow close together.
    VS. Woodland is land with a lot of trees. [2.51]
  – Once there was a Czar who had three lovely daughters.
    VS. There were three beautiful girls, whose father was a Czar. [4.3]
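As a point of reference for the sentence pairs above, here is a minimal sketch of a purely lexical baseline. It is an illustration only, not the workshop's or the SemEval task's method: the function names and the choice of Jaccard token overlap rescaled to the 0-5 range are assumptions made for this example.

```python
import re

def tokens(text):
    """Lowercase a snippet and return its set of word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def overlap_score(t1, t2, scale=5.0):
    """Jaccard overlap of the two token sets, rescaled to [0, scale].

    Assumption: a surface-overlap baseline, chosen only for illustration.
    """
    a, b = tokens(t1), tokens(t2)
    if not a and not b:
        return scale  # two empty snippets are trivially identical
    return scale * len(a & b) / len(a | b)

# The three sentence pairs and gold scores shown on the slide.
pairs = [
    ("A forest is a large area where trees grow close together.",
     "The coast is an area of land that is next to the sea.", 0.25),
    ("A forest is a large area where trees grow close together.",
     "Woodland is land with a lot of trees.", 2.51),
    ("Once there was a Czar who had three lovely daughters.",
     "There were three beautiful girls, whose father was a Czar.", 4.3),
]

for t1, t2, gold in pairs:
    print(f"gold={gold:.2f}  overlap={overlap_score(t1, t2):.2f}")
```

A baseline like this only sees shared surface tokens. It has no way of knowing that "lovely daughters" and "beautiful girls" are close in meaning, which is the kind of gap the semantic components discussed in this workshop are meant to fill.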

Multilingual STS

• To our knowledge, no one has directly quantified the cross-linguistic similarity between two texts

How is STS different from …

• Recognizing Textual Entailment (RTE) to date
  – RTE binary vs. STS graded
  – Directionality (text to hypothesis)
  – Typically text is (much) longer than hypothesis
• Paraphrase (Pph) to date
  – Pph binary vs. STS graded
  – Notion of (principled) interpretability

Pipelined STS

• An interoperable pipeline of semantic components
  – Input
    • Two text snippets
  – Output
    • Numerical score of similarity with graded similarity on a scale of 0-5
    • What semantic components/features led to score (principled interpretability)
    • Confidence level in response
• Evaluation
  – Intrinsic evaluation in the context of sentence similarity
  – Extrinsic evaluation in the context of MT evaluation
  – Intrinsic component evaluations
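The input/output contract above can be sketched in code. This is a hypothetical illustration, not a design agreed at the workshop: the names `STSResult`, `run_pipeline`, `word_overlap` and `length_ratio` are invented here, the two toy components stand in for real semantic components, and both the confidence-weighted average and the mean-confidence summary are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# A component maps two snippets to (score in [0, 5], confidence in [0, 1]).
Component = Callable[[str, str], Tuple[float, float]]

@dataclass
class STSResult:
    score: float                     # graded similarity on the 0-5 scale
    confidence: float                # overall confidence in [0, 1]
    contributions: Dict[str, float]  # each component's share of the score

def word_overlap(t1: str, t2: str) -> Tuple[float, float]:
    """Toy lexical component: Jaccard overlap of lowercased tokens."""
    a, b = set(t1.lower().split()), set(t2.lower().split())
    union = a | b
    score = 5.0 * len(a & b) / len(union) if union else 5.0
    return score, 0.9  # fixed confidence, hypothetical

def length_ratio(t1: str, t2: str) -> Tuple[float, float]:
    """Toy structural component: ratio of snippet lengths in tokens."""
    n1, n2 = len(t1.split()), len(t2.split())
    score = 5.0 * min(n1, n2) / max(n1, n2) if max(n1, n2) else 5.0
    return score, 0.3  # fixed confidence, hypothetical

def run_pipeline(t1: str, t2: str,
                 components: Dict[str, Component]) -> STSResult:
    """Combine component scores as a confidence-weighted average."""
    outputs = {name: comp(t1, t2) for name, comp in components.items()}
    total_conf = sum(conf for _, conf in outputs.values())
    if total_conf == 0:
        return STSResult(0.0, 0.0, {name: 0.0 for name in outputs})
    contributions = {
        name: score * conf / total_conf
        for name, (score, conf) in outputs.items()
    }
    return STSResult(
        score=sum(contributions.values()),
        confidence=total_conf / len(outputs),  # mean component confidence
        contributions=contributions,
    )

components = {"word_overlap": word_overlap, "length_ratio": length_ratio}
result = run_pipeline(
    "Welcome to my world, trust me you will never be disappointed",
    "Yo! Come over here, you will be pleasantly surprised",
    components,
)
print(result)
```

Keeping the per-component contributions alongside the final score is one simple way to expose which components led to the result. How components should actually be weighted and ordered is one of the open questions listed for this workshop.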

Main Objectives

• Plug & play environment for semantic components
  – WSD/WSI, lexical substitution, SRL, MWE, paraphrase, anaphora and coreference resolution, time and date resolution, named-entity handling, underspecification, hedging, semantic scoping, discourse analysis, etc.
• Pipeline Creation
  – Components produce scores, then combine
  – Combine Features directly in MuSeS environment
• Interpretability of contributing factors
  – Explicitly characterize why they are considered similar, i.e. which semantic component(s) contributed to the similarity score
• Quantifying STS, formalizing it as a probabilistic story
• Associating confidence levels with scores

Call on people for contribution

• Katrin Erk
• Christian Chiarcos
• Enrique Alfonseca

Intrinsic Evaluation Issues (Item B)

• Binary similarity
  – What is the cut-off threshold?
• Graded similarity
  – How to bin the results (2-4)
• How to assess and integrate confidence values from components? Should we weight different components differently?
    • Depend on their stand-alone performance
    • Weight their contribution by their salience and relevance to STS? Theoretical considerations?
• Degree/Level of transparency/interpretability?
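To make the binary and graded options concrete, here is a small sketch under stated assumptions. Pearson correlation is used as one common way to compare graded system scores with gold scores. The cut-off of 2.5, the reading of "(2-4)" as the number of bins, and the system scores are all assumptions made for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(vx * vy)

def to_binary(score, threshold=2.5):
    """Binary similarity: similar iff the graded score reaches the cut-off.

    The default threshold is a hypothetical choice, not a recommendation.
    """
    return score >= threshold

def to_bin(score, n_bins=4, max_score=5.0):
    """Graded similarity: map a 0-5 score to one of n_bins equal-width bins."""
    if not 0.0 <= score <= max_score:
        raise ValueError("score outside the scale")
    return min(int(score / max_score * n_bins), n_bins - 1)

gold = [0.25, 2.51, 4.3]   # gold scores of the three sentence pairs shown earlier
system = [0.5, 1.0, 1.7]   # hypothetical system outputs, invented for this example

print("pearson:", round(pearson(system, gold), 3))
print("binary gold:", [to_binary(g) for g in gold])
print("binned gold:", [to_bin(g) for g in gold])
```

Correlation rewards getting the ranking and relative spacing right even when the absolute scale is off, whereas a binary cut-off or coarse bins discard that information. This is one reason the choice of metric is listed as an open issue.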

Extrinsic Evaluation Issues (Item B)

• How to integrate the STS blackbox in an NLP application
  – Is it simply ablation or is there something more interesting?
• Where to integrate STS in different applications
• Do different applications require different types of STS (biased/weighted STS)? What implications would that have on the design of STS?
• Can we come up with different STS formalisms (i.e. maybe with a known set of components?) similar to different syntactic formalisms/perspectives
• Role of intrinsic STS confidence level in integration and evaluation
• Again, Degree/Level of transparency/interpretability of underlying semantic components?

STS in NLP Applications (Item C)

• Distillation and MT (Marjorie Freedman)
• MT and MT evaluation (Alon Lavie, Dekai Wu, Lucia Specia, Kevin Knight, Scott Miller)
• Machine Reading (Ralph Weischedel)
• Watson Jeopardy (Alfio Gliozzo)
• Generation (Christian Chiarcos)
• Summarization (Enrique Alfonseca)
• Opinion Mining and Social Media Mining (Sanda Harabagiu)
• Inference (Johan Bos, Ido Dagan)
• (Tentative) Semantic Web and Ontologies (Michael Uschold)
