Words, more words … and statistics

To segment words, the brain could be using statistical methods

May 19, 2016

Picking out single words in a flow of speech is no easy task and, according to linguists, the brain might use statistical methods to do it. A group of SISSA scientists applied a statistics-based method for word segmentation to natural language in 9 different languages, measured its efficacy, and found that linguistic rhythm plays an important role. The study has just been published in the journal Developmental Science.

Have you ever racked your brains trying to make out even a single word of an uninterrupted flow of speech in a language you hardly know at all? It is naïve to think that in speech there is even the smallest of pauses between one word and the next (like the space we conventionally insert between words in writing): in actual fact, speech is almost always a continuous stream of sound. However, when we listen to our native language, word “segmentation” is an effortless process. What are, linguists wonder, the automatic cognitive mechanisms underlying this skill? Clearly, knowledge of the vocabulary helps: memory of the sound of the single words helps us to pick them out. However, many linguists argue, there are also automatic, subconscious “low-level” mechanisms that help us even when we do not recognise the words or when, as in the case of very young children, our knowledge of the language is still only rudimentary. These mechanisms, they think, rely on the statistical analysis of the frequency (estimated from past experience) of the syllables in each language.

One indicator that could contribute to segmentation processes is “transitional probability” (TP), which provides an estimate of the likelihood of two syllables co-occurring in the same word, based on the frequency with which they are found associated in a given language. In practice, if every time I hear the syllable “TA” it is invariably followed by the syllable “DA”, then the transitional probability for “DA”, given “TA”, is 1 (the highest). If, on the other hand, whenever I hear the syllable “BU” it is followed half of the time by the syllable “DI” and half of the time by “FI”, then the transitional probability of “DI” (and “FI”), given “BU”, is 0.5, and so forth. The cognitive system could be implicitly computing this value by relying on linguistic memory, from which it would derive the frequencies.

The study conducted by Amanda Saksida, research scientist at the International School for Advanced Studies (SISSA) in Trieste, with the collaboration of Alan Langus, SISSA research fellow, under the supervision of SISSA professor Marina Nespor, used TP to segment natural language, using two different approaches.

Based on rhythm

Saksida’s study is based on work with corpora, that is, bodies of texts specifically collected for linguistic analysis. In the case at hand, the corpora consisted of transcriptions of the “linguistic sound environment” that infants are exposed to. “We wanted to have an example of the type of linguistic environment in which a child’s language develops”, explained Saksida. “We wondered whether a low-level mechanism such as transitional probability worked with real-life language cues, which are very different from the artificial cues normally used in the laboratory, which are more schematic and free of sources of ‘noise’. Furthermore, the question was whether the same low-level cue is equally efficient in different languages”.

Saksida and colleagues used corpora of no less than 9 different languages, and to each they applied two different TP-based models. First they calculated the TP values for each point of the language flow for all of the corpora, and then they “segmented” the flow using two different methods. The first was based on absolute thresholding: a fixed reference TP value was established, and a boundary was identified wherever TP fell below it. The second method was based on relative thresholding: the boundaries corresponded to the locally lowest values of the TP function.

In all cases, Saksida and colleagues found that transitional probability was an effective tool for segmentation (49% to 86% of words identified correctly) irrespective of the segmentation algorithm used, which confirms TP efficacy. Of note, while both models proved to be quite efficient, when one model was particularly successful with one language, the alternative model always performed significantly worse. “This cross-linguistic difference suggests that each model is better suited than the other for certain languages and vice versa. We therefore conducted further analyses to understand what linguistic features correlated with the better performance of one model over the other”, explains Saksida.

The crucial dimension proved to be linguistic rhythm. “We can divide European languages into two large groups based on rhythm: stress-timed and syllable-timed”. Stress-timed languages have fewer vowels and shorter words, and include English, Slovenian and German. Syllable-timed languages contain more vowels and longer words on average, and include Italian, Spanish and Finnish. The third rhythmic group of languages does not exist in Europe and is based on “morae” (a part of the syllable); Japanese is an example. This group is known as “mora-timed” and contains even more vowels than syllable-timed languages.

The absolute threshold model proved to work best on stress-timed languages, whereas relative thresholding was better for the mora-timed ones. “It’s therefore possible that the cognitive system learns to use the segmentation algorithm that is best suited to one’s native language, and that this leads to difficulties segmenting languages belonging to another rhythmic category. Experimental studies will clearly be necessary to test this hypothesis. We know from the scientific literature that immediately after birth infants already use rhythmic information, and we think that the strategies used to choose the most appropriate segmentation could be one of the areas in which information about rhythm is most useful”.

The study is in fact unable to say whether the cognitive system (of both adults and children) really uses this type of strategy. “Our study clearly confirms that this strategy works across a wide range of languages”, concludes Saksida. “It will now serve as a guide for laboratory experiments.”

USEFUL LINKS:

  • Original paper: http://goo.gl/cOk5VD

IMAGES:

  • Credits: Jev55 (Flickr: https://goo.gl/yVVdJ3)

Contact:

Press office: [email protected]
Tel: (+39) 040 3787644 | (+39) 366-3677586
via Bonomea, 265, 34136 Trieste

More information about SISSA: www.sissa.it
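
For readers who want to see the arithmetic, the transitional-probability example from the article (TA always followed by DA; BU followed by DI or FI half of the time each) and the two thresholding approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the toy syllable streams, the made-up words "golabu", "tupiro" and "bidaku", and the threshold value 0.9 are all hypothetical, and treating a "locally lowest" TP as a strict local minimum is one possible reading of the relative method.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Forward TP of each adjacent pair: count(a, b) / count(a followed by anything)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def tp_sequence(syllables, tps):
    """TP value at each transition of the stream, in order."""
    return [tps[(a, b)] for a, b in zip(syllables, syllables[1:])]


def absolute_boundaries(seq, threshold):
    """Absolute thresholding: boundary wherever TP falls below a fixed value."""
    return [i for i, tp in enumerate(seq) if tp < threshold]


def relative_boundaries(seq):
    """Relative thresholding (assumed reading): boundary at strict local minima of TP."""
    return [i for i in range(1, len(seq) - 1) if seq[i - 1] > seq[i] < seq[i + 1]]


def segment(syllables, boundaries):
    """Cut the stream after every syllable whose index is listed in `boundaries`."""
    words, start = [], 0
    for i in sorted(boundaries):
        words.append("".join(syllables[start:i + 1]))
        start = i + 1
    words.append("".join(syllables[start:]))
    return words


# The article's example: TA is always followed by DA, BU by DI or FI equally often.
example = "TA DA BU DI TA DA BU FI".split()
example_tps = transitional_probabilities(example)
print(example_tps[("TA", "DA")], example_tps[("BU", "DI")], example_tps[("BU", "FI")])

# A toy stream built from three invented words in varying order (hypothetical data).
lexicon = {"A": ["go", "la", "bu"], "B": ["tu", "pi", "ro"], "C": ["bi", "da", "ku"]}
stream = [syl for word in "ABCBACAB" for syl in lexicon[word]]
seq = tp_sequence(stream, transitional_probabilities(stream))

absolute_words = segment(stream, absolute_boundaries(seq, threshold=0.9))
relative_words = segment(stream, relative_boundaries(seq))
print(absolute_words)
print(relative_words)
```

In this toy stream every within-word transition has a TP of 1 and every between-word transition has a lower one, so both methods recover the same eight words. Lowering the fixed threshold to 0.6 makes the absolute method miss the boundaries where TP is 2/3, which hints at why a fixed threshold and a local comparison can behave differently on real corpora.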