Commonsense for Machine Intelligence: Text to Knowledge...
Part 2: Detecting and Correcting Odd Collocations in Text
Commonsense for Machine Intelligence: Text to Knowledge and Knowledge to Text
Introduction to Collocations
• Correct native speaker expression in a given language
• Strong tea (not powerful tea)
• Clear sky (not pure sky)
• Go home (not go to home)
• Go to school (not go school)
• House arrest (not arrest house)
• Friend circle (not circle friend)
Collocation Errors or Odd Collocations
• Expressions that may be grammatically correct, but are not typical among native speakers
• Red meat & white meat are correct collocations in English
• Their literal translations are odd collocations in German
• Not usually used by German speakers
• Machine translation can often cause such collocation errors
• Can be due to lack of commonsense & world knowledge
Collocations and Idioms
• Some collocations are idiomatic expressions: “couch potato”
• Literal idiom translation may be totally absurd: “sofa potato”
• Note: Correct idiom usage & translation is harder
• Not all collocations are idioms, e.g., “fast cars” (vs “quick cars”)
• Yet, correct collocation usage is important in many situations
Motivation to Address Collocations – Daily Communication
• Tourist wants “black coffee” (regular coffee without milk) in a coffee shop
• Asks for “dark coffee” using online translation help
• Server brings coffee with milk, made with darkest coffee beans available
• This is not what the tourist intended…
• What if he is lactose intolerant?
• Note: “Coffee Shop” in Amsterdam might mean something completely different: a place for drugs!
• Important to address collocations with commonsense & world knowledge
Motivation to Address Collocations – Written Texts
• Classic Bible quote also in Shakespeare’s Hamlet
• Literal machine translation can yield different meaning!
• Collocations, e.g., “willing spirit” & “weak flesh”, must be translated with commonsense & reference to context
Motivation to Address Collocations – Search Engines
• Odd collocation “quick cars” returns fewer hits & less appropriate results
• Correct collocation “fast cars” shows better sites & images of cars as good search results
• Machine translation help for search engines should fix collocation errors
Techniques to Address Odd Collocations
• Treatment of Collocations
  • Different types of oddly collocated terms
  • Examples of each type with problems caused
• Linguistic Classification
  • Classifying terms as correct vs incorrect collocations
  • Considering associations / using source language
• Detection and Correction
  • Finding various incorrectly collocated terms using frequency etc.
  • Providing correct responses, similarity measures, ranking the suggestions
Treatment of Collocations
• Collocation errors are typically grouped into four categories
  • Insertion Errors: adding a wrong term
  • Deletion Errors: omitting a required term
  • Transposition Errors: changing order of terms
  • Substitution Errors: using one term instead of another
• We briefly describe each type with examples and the problems they could cause
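The four categories can be told apart mechanically when an erroneous phrase and its correction are both available. The following is a minimal Python sketch, assuming whitespace tokenization and a single error per phrase. The function names `classify_error` and `_drop_one` are hypothetical and are not taken from any of the cited tools.

```python
from collections import Counter


def _drop_one(longer, shorter):
    """True if deleting exactly one token from `longer` yields `shorter`."""
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def classify_error(observed: str, corrected: str) -> str:
    """Label the difference between an observed phrase and its correction."""
    obs = observed.lower().split()
    cor = corrected.lower().split()
    if obs == cor:
        return "none"
    # Observed phrase has one extra token: a wrong term was added.
    if len(obs) == len(cor) + 1 and _drop_one(obs, cor):
        return "insertion"
    # Observed phrase lacks one token: a required term was omitted.
    if len(cor) == len(obs) + 1 and _drop_one(cor, obs):
        return "deletion"
    if len(obs) == len(cor):
        # Same tokens, different order.
        if Counter(obs) == Counter(cor):
            return "transposition"
        # Exactly one position holds a different token.
        if sum(a != b for a, b in zip(obs, cor)) == 1:
            return "substitution"
    return "other"


if __name__ == "__main__":
    print(classify_error("I went to home", "I went home"))
    print(classify_error("Einstein was scientist", "Einstein was a scientist"))
```

Real learner text often mixes several error types in one phrase, which this toy labeller reports as "other".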
Insertion Errors
• These include adding a term not appropriate in a correct native speaker expression
“I went to home” vs “I went home”
“When will you return back from Singapore?” vs “When will you return from Singapore?”
“Take a break for the lunch” vs “Take a break for lunch”
• Article errors quite common in this category (adding unnecessary articles)
• Many of these errors involve grammatical mistakes
• These types of errors create problems in
  • Fluency of speech especially at formal events
  • Clarity of written documents
Deletion Errors
• These are the opposite of insertion errors & involve missing a term needed in an expression
“Einstein was scientist” vs “Einstein was a scientist”
“Hire someone to do job” vs “Hire someone to do the job”
“Let us wait her” vs “Let us wait for her”
• They also create similar problems with respect to fluency and clarity
• Many deletion errors also pertain to odd use of articles (omitting a necessary one)
• Approaches in the literature for article error treatment are applicable here
• These also often pertain to grammatical mistakes
Transposition Errors
• These errors occur when terms are not placed in the appropriate order
• They could be more problematic than insertion & deletion errors
“Don’t talk with your full mouth” vs “Don’t talk with your mouth full”
“How to make friendships close” vs “How to make close friendships”
• They might convey the wrong meaning, e.g., talking with your full mouth is different from talking with your mouth full
• Sometimes it’s almost the opposite meaning, e.g., close friendships vs friendships close
• Often, knowing native language of speaker / origin of the source text might help here
Substitution Errors
• These involve using an inappropriate term in an expression instead of a term in correct usage
“This actor does money” vs “This actor makes money”
“Where is the nearest quick food place?” vs “Where is the nearest fast food place?”
• Most common types of collocation errors
• Often cause miscommunication problems while talking, writing, searching etc.
• Many approaches in the literature address mainly substitution errors
• They can be potentially applied to address the other types as well
• Incorporation of commonsense knowledge is particularly useful here
Addressing Odd Collocations by Linguistic Classification
• Some works focus on classifying collocation errors from a linguistic perspective
  • Using collocation measures on syntactic patterns for lexical classification as correctly collocated term vs error [Futagi et al., 2008]
  • Considering source language (of ESL learner or machine generated text) to classify collocations [Dahlmeier, 2011]
Collocation Measures on Syntactic Patterns
• This work addresses 7 aspects of lexical collocations
• Collocation errors lexically classified using candidate word strings
• POS tagging of texts is conducted followed by pattern matching
[Futagi et al.]
Collocation Measures on Syntactic Patterns (Contd.)
• After spell checking, variants of word strings built with articles, synonyms etc.
• Word strings looked up in a reference DB (RRDB) to find a match
• If no match found, it is classified as a collocation error
[Futagi et al.]
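The lookup step can be illustrated with a small Python sketch. It builds only article variants of a candidate word string and checks them against a toy reference set. The set `REFERENCE_DB`, the helper names, and the restriction to article variants are assumptions made for illustration. The actual system also builds synonym-based and other variants and uses a large reference database, neither of which is reproduced here.

```python
ARTICLES = ("a", "an", "the")

# Toy stand-in for the reference database of well-formed collocations (hypothetical).
REFERENCE_DB = {"strong tea", "take a break", "make a decision"}


def article_variants(phrase):
    """Return the phrase plus versions with articles removed or inserted."""
    tokens = phrase.lower().split()
    bare = [t for t in tokens if t not in ARTICLES]
    variants = {" ".join(tokens), " ".join(bare)}
    # Insert each article before each remaining token.
    for i in range(len(bare)):
        for art in ARTICLES:
            variants.add(" ".join(bare[:i] + [art] + bare[i:]))
    return variants


def is_collocation_error(phrase, reference_db=REFERENCE_DB):
    """Flag the phrase when none of its variants is found in the reference set."""
    return not any(v in reference_db for v in article_variants(phrase))


if __name__ == "__main__":
    for candidate in ("powerful tea", "strong tea", "take break"):
        print(candidate, "->", is_collocation_error(candidate))
```

Here "take break" is not flagged because its variant "take a break" is in the reference set, so the problem is an article error rather than an odd collocation.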
Collocation Measures on Syntactic Patterns (Contd.)
• Measure of collocation strength
  • Rank ratio statistic
  • Computed from 1 billion words of native speaker texts
  • Incorporating commonsense knowledge
• When evaluated by a gold standard with native speakers, this work gives around 85% precision in classification
• This work does not provide correct suggestions as responses to collocation errors
[Futagi et al.]
Source Language to Classify Collocations
• Errors often caused by semantic similarity of words in source language
• The source language is referred to as L1
• Literal translation to destination language can cause collocation errors
• Thus, L1-induced paraphrases are proposed for classifying collocations
[Figure: Over a dozen English translations: look, see, watch, read etc. Possible translation from source: “I like to look movies” vs “I like to watch movies”]
[Dahlmeier et al.]
Source Language to Classify Collocations (Contd.)
• NUCLE: Annotated 1 million word corpus of 1400 essays by ESL university students
• Annotated with start & end offset, error type, gold standard correction
• Incorporates commonsense knowledge from professional English instructors
• They filter out preposition & article errors, focus on collocations involving semantics
[Figure: Statistics of NUCLE Analysis]
[Dahlmeier et al.]
Source Language to Classify Collocations (Contd.)
• Detected errors classified as: Spelling, Homophone, Synonyms, L1-transfer
  • Spelling: Edit distance (erroneous phrase, correction) < threshold
  • Homophone: (erroneous word, correction) have same pronunciation
  • Synonym: (erroneous word, correction) have similar meaning
  • L1-transfer: (erroneous phrase, correction) share a common translation
[Dahlmeier et al.]
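The four tests can be sketched in Python as follows. This is an illustration under stated assumptions and not the authors' implementation. The lookup tables `HOMOPHONES`, `SYNONYMS` and `TRANSLATIONS` are tiny hypothetical stand-ins for a pronunciation dictionary, a thesaurus and a bilingual phrase table. The pivot entry "kan" and the threshold value 3 are likewise assumed. An error may receive more than one label.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


# Hypothetical toy resources, for illustration only.
HOMOPHONES = {frozenset({"their", "there"})}
SYNONYMS = {frozenset({"look", "watch"}), frozenset({"big", "large"})}
TRANSLATIONS = {"look": {"kan"}, "watch": {"kan"}, "see": {"kan"}}


def classify(error, correction, max_edit=3):
    """Return the set of error-type labels that apply to (error, correction)."""
    labels = set()
    pair = frozenset({error, correction})
    if edit_distance(error, correction) < max_edit:
        labels.add("spelling")
    if pair in HOMOPHONES:
        labels.add("homophone")
    if pair in SYNONYMS:
        labels.add("synonym")
    # L1-transfer: both words share at least one translation in the L1.
    if TRANSLATIONS.get(error, set()) & TRANSLATIONS.get(correction, set()):
        labels.add("l1-transfer")
    return labels


if __name__ == "__main__":
    print(classify("look", "watch"))
    print(classify("recieve", "receive"))
```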
Source Language to Classify Collocations (Contd.)
• Number of errors in L1-transfer > other types
• Extract English-L1, L1-English phrases of at most 3 words
• Phrase extraction heuristic:
• Here, f: foreign language phrase
• Translation probabilities p(e1|f), p(f|e2) predicted by max likelihood estimation
• Only keep phrases with probability > threshold (0.001 in this work)
• This serves as the basis for suggesting corrections
[Dahlmeier et al.]
Analysis of Collocation Errors
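The formula for the heuristic did not survive in this transcript. A common way to combine p(e1|f) and p(f|e2) is the pivot-based paraphrase probability p(e1|e2) = Σ_f p(e1|f) · p(f|e2), and the Python sketch below assumes that form. The nested-dictionary layout, the function name `paraphrase_probs`, and the toy probabilities are all hypothetical. Only the threshold of 0.001 comes from the slide.

```python
from collections import defaultdict


def paraphrase_probs(p_e_given_f, p_f_given_e, threshold=0.001):
    """Pivot through foreign phrases f to score English paraphrases.

    p_e_given_f[f][e1] holds p(e1|f); p_f_given_e[e2][f] holds p(f|e2).
    Returns table[e2][e1] = sum over f of p(e1|f) * p(f|e2), above threshold.
    """
    table = {}
    for e2, pivots in p_f_given_e.items():
        scores = defaultdict(float)
        for f, p_f in pivots.items():
            for e1, p_e1 in p_e_given_f.get(f, {}).items():
                if e1 != e2:  # a phrase is not its own paraphrase
                    scores[e1] += p_e1 * p_f
        table[e2] = {e1: s for e1, s in scores.items() if s > threshold}
    return table


if __name__ == "__main__":
    # Toy probabilities; "kan" and "zhao" are placeholder pivot phrases.
    p_f_given_e = {"look": {"kan": 0.8, "zhao": 0.2}}
    p_e_given_f = {
        "kan": {"look": 0.3, "watch": 0.5, "see": 0.199, "glance": 0.001},
        "zhao": {"look": 0.4, "find": 0.6},
    }
    print(paraphrase_probs(p_e_given_f, p_f_given_e)["look"])
```

In this toy table, "watch" becomes the strongest paraphrase of "look", which is the kind of candidate a correction step could offer for “I like to look movies”.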
Discussion
• These research works clearly focus more on lexical classification of collocation errors
• Linguistic perspectives are significant here
• Commonsense knowledge is included in collocation error classification using corpora from native speakers / English instructors
• These works provide an insight into the reasons for collocation errors and their grammatical placements
• Such research heads towards proposing corrective measures
Collocation Error Detection and Correction
• These approaches develop tools for the actual detection and correction of collocation errors
  • AwkChecker: While a user writes a text document, flag collocation errors and suggest replacements that correspond closely to consensus using word-level statistical n-grams [Park et al., 2008]
  • CollOrder: When a user enters a term in the tool, detect collocation errors and provide correctly ordered collocated responses as outputs using an ensemble of similarity measures [Varghese et al., 2015]
AwkChecker
• End-user tool to correct collocation errors in written documents
• Users write text, Awkward phrases are Checked by highlighting them
• Users can click awkward phrases to see suggested replacements
• 1st ever tool for collocation error correction
[Figure: AwkChecker’s user interface: A) Flagged phrases in the composition window B) Suggested replacement for “powerful tea”]
[Park et al.]
AwkChecker (Contd.)
• Builds statistical n-grams (sequences of n words) from training corpus & records frequencies
• Analyzes user input against corpus to find if a phrase is a collocation error
• Flags error if there exist similar phrases with frequency > input frequency
• Generates replacements using n-gram frequency based approach
• Candidates with much higher frequency are potential replacements
[Park et al.]
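A frequency-based check of this kind can be sketched in a few lines of Python. This is not AwkChecker's code. The tiny corpus, the `similar_words` table used to generate "similar phrases", and the `ratio` parameter that stands in for "much higher frequency" are all assumptions made for illustration.

```python
from collections import Counter


def ngram_counts(sentences, n=2):
    """Count word-level n-grams over a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        toks = sentence.lower().split()
        counts.update(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return counts


def suggest(phrase, counts, similar_words, ratio=2.0):
    """Return more frequent similar phrases, most frequent first.

    The phrase must have the same length n as the counted n-grams.
    A non-empty result means the phrase would be flagged as awkward.
    """
    toks = tuple(phrase.lower().split())
    base = counts[toks]  # Counter returns 0 for unseen n-grams
    candidates = []
    for i, tok in enumerate(toks):
        for alt in similar_words.get(tok, ()):
            cand = toks[:i] + (alt,) + toks[i + 1:]
            freq = counts[cand]
            # Keep candidates that are clearly more frequent than the input.
            if freq > base and freq >= ratio * max(base, 1):
                candidates.append((freq, " ".join(cand)))
    return [text for _, text in sorted(candidates, key=lambda c: (-c[0], c[1]))]


if __name__ == "__main__":
    corpus = ["I drink strong tea", "she likes strong tea",
              "strong tea is good", "a powerful engine"]
    counts = ngram_counts(corpus)
    print(suggest("powerful tea", counts, {"powerful": ["strong", "mighty"]}))
```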
AwkChecker (Contd.)
• Statistical n-grams are used over relevant corpora including Wikipedia
• Helpful in capturing commonsense with domain-specific knowledge using frequency-based approach
• Example: Referring to a medical corpus to flag phrases awkward in medical research writing
• Assumption: Relevant corpora are correct more frequently than they are incorrect
• Evaluation reveals usefulness in collocation correction, but details of accuracy not discussed
[Park et al.]
CollOrder
• Detects & corrects collocation errors in terms input to the tool
• Outputs ranked responses of correctly collocated terms
• Correct collocations source: ANC/BNC (American/British National Corpus)
• Includes commonsense knowledge from native speakers’ writings
• Useful in Web queries, text documents, ESL translation etc.
[Figure: Approach in the CollOrder tool] [Varghese et al.]
CollOrder (Contd.)
• Ensemble of measures is used for similarity search and ranking
  • Conditional Probability: Measures relative occurrence of terms A & B
  • Jaccard’s Coefficient: Measures extent of semantic similarity between A & B
  • Web Jaccard: To reduce adverse effects of random co-occurrence (due to scale & noise in Web data) [Bollegala et al., 2009]
[Varghese et al.]
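The formulas for these measures are not preserved in this transcript. The Python sketch below uses their standard count-based definitions: P(B|A) as the co-occurrence count divided by the count of A, the Jaccard coefficient as intersection over union, and a Web Jaccard variant that returns 0 when the co-occurrence count is at or below a cut-off c. The cut-off value 5, the function names, and the toy counts are assumptions, and CollOrder's exact formulation may differ.

```python
def conditional_probability(n_ab, n_a):
    """P(B|A): share of occurrences of A that also contain B."""
    return n_ab / n_a if n_a else 0.0


def jaccard(n_a, n_b, n_ab):
    """Jaccard coefficient from occurrence and co-occurrence counts."""
    union = n_a + n_b - n_ab
    return n_ab / union if union > 0 else 0.0


def web_jaccard(n_a, n_b, n_ab, c=5):
    """Jaccard that treats rare co-occurrences (count <= c) as noise."""
    return 0.0 if n_ab <= c else jaccard(n_a, n_b, n_ab)


def rank_candidates(counts, c=5):
    """Rank candidate collocations by Web Jaccard, best first.

    counts maps a candidate string to a tuple (n_a, n_b, n_ab).
    """
    scored = [(web_jaccard(*triple, c=c), cand) for cand, triple in counts.items()]
    return [cand for _, cand in sorted(scored, key=lambda s: (-s[0], s[1]))]


if __name__ == "__main__":
    # Hypothetical counts for two candidate collocations.
    toy = {"strong tea": (100, 50, 25), "powerful tea": (1000, 800, 3)}
    print(rank_candidates(toy))
```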
CollOrder (Contd.)
• These & other measures (Frequency Normalized, Frequency Ratio) are used [Varghese et al., 2015]
• Different measures empirically yield good results in different scenarios
• Ensemble of measures with classifiers thus proposed to optimize performance
• Classifier used: JRIP, implementation of RIPPER (Repeated Incremental Pruning to Produce Error Reduction) [Cohen, 1995]
• CollOrder evaluation with MTurk on native speakers: Average accuracy 92.44%
[Figure: Example of ensemble learning by the classifier: “blue sky” is a valid suggestion, classified as “y”; “night sky” is not a valid suggestion, classified as “n”]
[Varghese et al.]
Other Related Works
• [Ramos et al., 2010] build annotation schema with 3D topology to classify collocations mainly in Spanish & English translation:
  • 1st dimension finds if error is for whole or part of collocation
  • 2nd dimension does language-oriented error analysis
  • 3rd dimension does interpretive error analysis
• [Li et al., 2009] use a probabilistic approach for collocation correction:
  • Use BNC and WordNet as language learning sources
  • Suggest corrections based on commonly used expressions
  • Do not develop a tool for collocation detection & correction
Discussion
• Collocation error correction tools in the literature are found useful by users
• Commonsense knowledge from native speakers is typically entailed in the source corpora used for learning
• Approaches in linguistic classification as well as in collocation correction rely heavily on frequency
• Thus, potential issues related to sparse data with correct collocations call for further research
Text to Knowledge and Knowledge to Text
• Collocation approaches start with text and extract knowledge from corpora
• Different methods used for knowledge extraction - probabilistic, ensemble
• Extracted knowledge used for linguistic classification, error correction
• Statistical text categorization occurs due to analysis in linguistic classification
• Correctly collocated text responses offered as suggestions in error correction
• Thus, extracted knowledge serves to provide text based outputs
• Commonsense knowledge plays a role mainly in source corpora from native speakers & expert writings
• This contributes to machine intelligence by providing better machine translation incorporating commonsense
References
• Bollegala, D., Matsuo, Y. and Ishizuka, M., Measuring the similarity between implicit semantic relations using web search engines, WSDM 2009, pp. 104–113.
• Cohen, W., Fast effective rule induction. In Proceedings of the International Conference on Machine Learning, ICML 1995, pp. 115–123.
• Dahlmeier, D. and Ng, H. T., Correcting semantic collocation errors with L1-induced paraphrases. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, pp. 107–117.
• Futagi, Y., Deane, P., Chodorow, M. and Tetreault, J., A computational approach to detecting collocation errors in the writing of non-native speakers of English, Computer Assisted Language Learning 2008, 21(4): 353–367.
• Li-E, L. A., Wible, D. and Tsao, N-L., Automated suggestions for miscollocations, Proceedings of the 4th Workshop on Innovative Use of NLP for Building Educational Applications, 2009, pp. 47–50.
• Park, T., Lank, E., Poupart, P. and Terry, M., Is the sky pure today - AwkChecker: An assistive tool for detecting and correcting collocation errors, ACM Symposium on User Interface Software and Technology 2008, pp. 121–130.
• Ramos, M. A., Wanner, L., Vincze, O., del Bosque, G. C., Veiga, N. V., Suárez, E. M. and González, S. P., Towards a Motivated Annotation Schema of Collocation Errors in Learner Corpora, LREC 2010, pp. 3209–3214.
• Varghese, A., Varde, A., Peng, J. and Fitzpatrick, E., A framework for collocation error correction in Web pages and text documents, ACM SIGKDD Explorations 2015, 17(1): 14–23.