STS: Temporal and Spatial Constraints on Text Similarity
CS114 Introduction to Computational Linguistics
STS: Temporal and Spatial Constraints on Text Similarity
James Pustejovsky
Brandeis University
March 13, 2012

Measuring Similarity
- Objects
- Events

Object similarity is a function of:
- Sortal correlation
- Temporal proximity
- Spatial proximity

- the Latin Quarter of the 1920s
- the 5th Arrondissement in 1929
- Paris in 1925
- The Left Bank in the early 20th Century

Event similarity is a function of:
- Predicative similarity
- Participant correlation
- Temporal proximity
- Spatial proximity
cf. Kim (1993), Davidson (1980), Lewis (1986)

Event Similarity
a. Mary visited John in Boston on Tuesday.
b. The woman/she saw her husband in Copley Square yesterday.
- Sim(P1,P2): visit vs. see
- Sim(Subj1,Subj2): Mary vs. the woman
- Sim(Obj1,Obj2): John vs. her husband
- Sim(Loc1,Loc2): Boston vs. Copley Square
- Sim(Time1,Time2): Tuesday vs. yesterday
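
The decomposition above can be sketched as a weighted combination of the five component scores. This is only an illustration: the slides say that event similarity is a function of these components but do not give the function, so the weighted average, the `event_similarity` name, and the example scores below are all assumptions.

```python
# Hypothetical sketch: combine per-component similarity scores (each in [0, 1])
# into one event-level score. The slides do not specify the combination;
# a weighted average is assumed here purely for illustration.
COMPONENTS = ("predicate", "subject", "object", "location", "time")


def event_similarity(scores, weights=None):
    """Return the weighted average of the component similarity scores."""
    if weights is None:
        # Assumption: all components count equally unless told otherwise.
        weights = {name: 1.0 for name in COMPONENTS}
    total_weight = sum(weights[name] for name in COMPONENTS)
    weighted_sum = sum(weights[name] * scores[name] for name in COMPONENTS)
    return weighted_sum / total_weight


# Made-up scores for sentences (a) and (b); real values would come from the
# lexical, sortal, temporal, and spatial components discussed in later slides.
example_scores = {
    "predicate": 0.6,  # visit vs. see
    "subject": 0.8,    # Mary vs. the woman
    "object": 0.8,     # John vs. her husband
    "location": 1.0,   # Copley Square is in Boston
    "time": 1.0,       # Tuesday and yesterday denote the same day
}
print(round(event_similarity(example_scores), 2))
```

Raising the weight on `time` or `location` would make the score more sensitive to the temporal and spatial constraints that the rest of the lecture focuses on.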

Brandeis CS114-2012 Pustejovsky

Predicative Similarity
- Lexical resources
- LSA
- Vector-based models

Argument Alignment
- Semantic Role Labeling + Sortal Similarity

Temporal Similarity
- Normalization: Map to standardized ISO-TimeML format
- Referencing: Reference relative to local temporal values
Val(Tuesday) = Val(yesterday)

Spatial Similarity
- Normalization: Map to standardized ISO-Space format
- Referencing: Reference relative to accessible spatial values
Val(Copley_Sq) Spatial-IN Val(Boston)

Temporal Issues
Subsumption in anchoring:
- The bombing occurred Monday morning.
- The bombing occurred Monday.
- The bombing occurred last week.
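
A minimal sketch of temporal normalization and subsumption, assuming each expression is resolved to a half-open interval relative to a document creation time (DCT). The DCT of Wednesday, March 14, 2012, the 06:00 to 12:00 reading of "morning", and all function names are assumptions made for illustration; they are not part of ISO-TimeML.

```python
from datetime import date, datetime, time, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def day_interval(d):
    """Half-open interval [start, end) covering the calendar day d."""
    start = datetime.combine(d, time.min)
    return (start, start + timedelta(days=1))


def most_recent_weekday(name, dct):
    """Latest date strictly before dct that falls on the named weekday.

    Assumption: a past-tense sentence refers to the most recent such day.
    """
    target = WEEKDAYS.index(name.lower())
    delta = (dct.weekday() - target) % 7 or 7
    return dct - timedelta(days=delta)


def last_week(dct):
    """Monday-to-Monday interval for the week before the one containing dct."""
    this_monday = dct - timedelta(days=dct.weekday())
    start = datetime.combine(this_monday - timedelta(days=7), time.min)
    return (start, start + timedelta(days=7))


def morning_of(d):
    """Assumed reading of 'morning': 06:00 up to 12:00 on day d."""
    return (datetime.combine(d, time(6)), datetime.combine(d, time(12)))


def contains(outer, inner):
    """True if the interval inner lies entirely within the interval outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


DCT = date(2012, 3, 14)  # hypothetical document creation time (a Wednesday)

# Val(Tuesday) = Val(yesterday): both resolve to the same day.
tuesday = day_interval(most_recent_weekday("Tuesday", DCT))
yesterday = day_interval(DCT - timedelta(days=1))
print(tuesday == yesterday)

# Subsumption: Monday morning is within Monday, which is within last week.
week = last_week(DCT)
monday = day_interval(week[0].date())
monday_morning = morning_of(week[0].date())
print(contains(week, monday), contains(monday, monday_morning))
```

Under this representation, two anchorings are compatible when one interval contains the other, which is one way to read the three "bombing" sentences as descriptions of the same event at different granularities.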

Motivation for time and event markup
- Natural language is filled with references to past and future events, as well as planned activities and goals.
- Without a robust ability to identify and temporally situate events of interest from language, the real importance of the information can be missed.
- A robust annotation standard can help leverage this information from natural language text.

Temporal Awareness in Real Text
- The bridge collapsed during the storm but after traffic was rerouted to the Bay Bridge.
- President Roosevelt died in April 1945 before
  - the war ended. (event happened)
  - he dropped the bomb. (event didn't happen)
- The CEO plans to retire next month.
- Last week Bill was running the marathon when he twisted his ankle. Someone had tripped him. He fell and didn't finish the race.

Current Time Analysis Technology
- Document Time Linking: find the document creation time and link that to all events in the text;
- Local Time Stamping: find an event and a local temporal expression, and link it to that time.

Document Time Stamping
April 25, 2010
President Obama paid tribute Sunday to 29 workers killed in an explosion at a West Virginia coal mine earlier this month, saying they died "in pursuit of the American dream." The blast at the Upper Big Branch Mine was the worst U.S. mine disaster in nearly 40 years. Obama ordered a review earlier this month and blamed mine officials for lax regulation.

Identify which Events Should be Ordered
- The annotation specification should specify a kernel of events and time expressions to be annotated.
- Anchoring relations between events and times depend on genre, style, and register.
- Ordering relations between events depend largely on discourse relations in the text.

Creation vs. Narrative Time
- Document Creation Time: when the utterance is made (speech time)
- Narrative Time: when the event occurs

Genre, Style, and Register
- Participants
- Relations among participants
- Channel
- Production Circumstances
- Setting
- Communicative Purpose
- Topic

Genre, Register, and Style
- Help distinguish text types in order to better characterize the information structure of the text
- Example: news wire vs. news article
- Narrative time (NT) is a function of publication/creation frequency.

Narrative Time
- Identifies the temporal interval of the events being described in the text.
- Document Narrative Time: set by text-genre
- Current Narrative Time: shifts through the text

Document Time Stamping: for real
April 25, 2010
President Obama paid tribute Sunday to 29 workers killed in an explosion at a West Virginia coal mine earlier this month, saying they died "in pursuit of the American dream." The blast at the Upper Big Branch Mine was the worst U.S. mine disaster in nearly 40 years. Obama ordered a review earlier this month and blamed mine officials for lax regulation.

Narrative Container
April 25, 2010
(same passage as above)

Time Stamping: the good, bad,
- Set up a meeting on Tuesday with EMC.
- Franklin arrives tomorrow from London.
- Franklin arrives on the afternoon flight from London tomorrow.
- Most people drive today while talking on the phone.
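
The time expressions in the news passage above can be anchored to the document creation time along these lines. This is a rough sketch under stated assumptions: a bare weekday name in newswire is taken to mean that day on or before the DCT, and "earlier this month" is taken to span from the first of the month up to the DCT. The function names are hypothetical, and the outputs are plain ISO 8601 dates rather than full ISO-TimeML annotations.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

DCT = date(2010, 4, 25)  # document creation time given with the passage


def weekday_on_or_before(name, dct):
    """Resolve a bare weekday name to the matching date on or before dct.

    Assumption: newswire reports events from the day of publication or
    the few days before it.
    """
    delta = (dct.weekday() - WEEKDAYS.index(name.lower())) % 7
    return dct - timedelta(days=delta)


def earlier_this_month(dct):
    """Resolve 'earlier this month' to (first day of the month, dct)."""
    return (dct.replace(day=1), dct)


def tomorrow(dct):
    """Resolve 'tomorrow' to the day after dct."""
    return dct + timedelta(days=1)


print("Sunday ->", weekday_on_or_before("Sunday", DCT).isoformat())
start, end = earlier_this_month(DCT)
print("earlier this month ->", start.isoformat(), "to", end.isoformat())
print("tomorrow ->", tomorrow(DCT).isoformat())
```

An expression such as "today" in "Most people drive today while talking on the phone" would be mishandled by this kind of rule, since it refers to the present era rather than the calendar day of the DCT.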

ISO-TimeML Enables Temporal Parsing
- A new generation of language analysis tools that are able to temporally organize events in terms of their ordering and time of occurrence
- These tools can be integrated with visualization, summarization, question answering, and link analysis systems to help analyze large event-rich information spaces.

ISO-TimeML provides elements to:
- Find all events and times in newswire text
- Link events to the document time and to local times
- Order events relative to other events
- Ensure consistency of the temporal relations

ISO-Space
- Capture the complex constructions of spatial language in text
- Provide an inventory of how spatial information is presented in natural language
- ISO-Space is not designed to provide a formalism that fully represents the complexity of spatial language

Applications of ISO-Space
- Building a spatial map of objects relative to one another.
- Reconstructing spatial information associated with a sequence of events.
- Determining object location given a verbal description.
- Translating viewer-centric verbal descriptions into other relative descriptions or absolute coordinate descriptions.
- Constructing a route given a route description.
- Constructing a spatial model of an interior or exterior space given a verbal description.
- Integrating spatial descriptions with information from other media.

Semantic Requirements for Annotation
- Fundamental distinction between the concepts of annotation and representation
- Based on ISO CD 24612 Language resource management - Linguistic Annotation Framework (Ide and Romary, 2004)
- Distinguish between abstract syntax and concrete syntax
  - Concrete Syntax: XML encoding
  - Abstract Syntax: conceptual inventory and a set of syntactic rules defining the combination of these elements

Spatial Expressions
Constructions that make explicit reference to the spatial attributes of an object or spatial relations between objects.
Four grammatically defined classes:
- Spatial Prepositions and Particles: on, in, under, over, up, down, left of
- Verbs of Position and Movement: lean over, sit, run, swim, arrive
- Spatial Attributes: tall, long, wide, deep
- Spatial Nominals: area, room, center, corner, front, hallway

Spatial Relations
- Topological: in, inside, touching, outside
- Orientational (with frame of reference): behind, left of, in front of
- Topo-metric: near, close by
- Topological-orientational: on, over, below
- Metric: 20 miles away

Frames of Reference (Levinson, 2003)
- Absolute: The lake is north of the city.
- Relative: The book is to your left. The tree is between the Pru and the Monitor.
- Intrinsic: There's a ball in front of the car. The tree is behind the bench.

Frames of reference
- The tree to the left of the entrance
- The steps in front of me/the entrance

ISO-Space 1.4
Spatial relations are split into 4 types:
- Topological (QSLink)
- Relational (OrientLink)
- Movement (MoveLink)
- Measurement (MLINK, from TimeML)
Spatial relations are identified with role labels, including Figure and Ground.
SPATIAL_NAMED-ENTITY
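
As a rough illustration of how Figure and Ground roles on a topological link could support the earlier check Val(Copley_Sq) Spatial-IN Val(Boston), the sketch below stores links as simple records and follows containment transitively. The `QSLink` class, its field names, the `"IN"` relation label, and the small link table are assumptions made for this example; they are not the ISO-Space concrete syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QSLink:
    """Hypothetical stand-in for a topological link between two locations."""
    figure: str    # the located entity
    ground: str    # the reference entity
    rel_type: str  # assumed label; "IN" means figure is inside ground


def spatially_in(figure, ground, links):
    """True if figure is inside ground, following IN links transitively."""
    seen = set()
    frontier = [figure]
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        for link in links:
            if link.rel_type == "IN" and link.figure == current:
                if link.ground == ground:
                    return True
                frontier.append(link.ground)
    return False


# Tiny hand-written link table standing in for a gazetteer (illustrative only).
LINKS = [
    QSLink(figure="Copley_Sq", ground="Boston", rel_type="IN"),
    QSLink(figure="Boston", ground="Massachusetts", rel_type="IN"),
]

print(spatially_in("Copley_Sq", "Boston", LINKS))
print(spatially_in("Boston", "Copley_Sq", LINKS))
```

Containment is directional, so the location component of a similarity measure can treat "Copley Square" as compatible with "Boston" without treating the two as identical.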

Conclusion: Measuring Semantic Similarity
- Normalizing temporal and spatial expressions
- Developing standardized specifications contributes towards corpora for training and evaluation for such normalization
- Cases in point:
  - ISO-TimeML (ISO adopted)
  - ISO-Space (in development)