CLADAG is a member of the International Federation of ... · CLADAG is a member of the...

CLADAG is a member of the International Federation of Classification Societies (IFCS). Among its activities, CLADAG organizes a biennial scientific meeting and schools related to classification and data analysis, publishes a newsletter, and cooperates with other member societies of the IFCS in the organization of their conferences. The scientific program comprises three Keynote Lectures, an Invited Session, 10 Specialized Sessions, 15 Solicited Sessions and 15 Contributed Sessions. All the Specialized and Solicited Sessions have been promoted by the members of the Scientific Program Committee. The organizers wish to thank them for their cooperation in contributing to the success of CLADAG 2015. The Book of Abstracts contains short papers of all the presentations scheduled in the conference program. It is organized according to type of session/lecture: Keynote Lectures, Specialized Sessions, Solicited Sessions and Contributed Sessions.

CLADAG 2015

10th Scientific Meeting of the Classification and Data Analysis Group

of the Italian Statistical Society

Flamingo Resort, Santa Margherita di Pula, October 8-10, 2015

BOOK OF ABSTRACTS

Editors:

Francesco Mola, Claudio Conversano

CUEC Editrice by Sardegna Novamedia Soc. Coop. Via Basilicata n. 57/59, 09127 Cagliari, Italy. Tel. & Fax (+39) [email protected]

First Electronic Edition CUEC © 2016. ISBN: 978-88-8467-949-9. Euro 6,90

eBook Design: Giovanni Caprioli, www.servizi-per-editoria.it • [email protected]

Participating Organizations

Table of Contents

Preface

Conference Themes

Committees

KEYNOTE LECTURES

Mining Text Networks [David Banks]

Variable selection for model-based clustering of categorical data [Brendan Murphy]

Eigenvalues in mixture modeling: geometric, robustness and computational issues [Salvatore Ingrassia]

SPECIALIZED SESSION

• Robust methods for the analysis of Economic (Big) data [Organizer and Chair: Silvia Salini]

Fast and Robust Seemingly Unrelated Regression [Mia Hubert, Tim Verdonck, Ozlem Yorulmaz]

Application to the Detection of Customs Fraud of the Goodness-of-fit Testing for the Newcomb-Benford Law [Lucio Barabesi, Andrea Cerasa, Andrea Cerioli, Domenico Perrotta]

Monitoring the Robust Analysis of a Single Multivariate Sample [Marco Riani, Anthony C. Atkinson, Andrea Cerioli]

• Bayesian nonparametric clustering [Organizer: Fabrizio Ruggeri; Chair: Renata Rotondi]

A Bayesian nonparametric Approach to Model Association between Clusters of SNPs and Disease Responses [Raffaele Argiento, Alessandra Guglielmi, Chuhsing Kate Hsiao, Fabrizio Ruggeri, Charlotte Wang]

A Bayesian nonparametric Model for Clustering and Borrowing Information [Antonio Lijoi, Bernardo Nipoti, Igor Prünster]

Sequential Clustering based on Dirichlet Process Priors [Roberto Casarin, Andrea Pastore, Stefano F. Tonellato]

• Causal Inference with Complex Data Structures [Organizer and Chair: Alessandra Mattei]

Short term impact of PM10 exposure on mortality: A propensity score approach [Michela Baccini, Alessandra Mattei, Fabrizia Mealli]

Identification and Estimation of Causal Mechanisms in Clustered Encouragement Designs: Disentangling Bed Nets using Bayesian Principal Stratification [Laura Forastiere, Fabrizia Mealli, Tyler VanderWeele]

The effects of a dropout prevention program on secondary students’ outcomes [Enrico Conti, Silvia Duranti, Alessandra Mattei, Fabrizia Mealli, Nicola Sciclone]

• Clustering in Time Series [Organizer and Chair: Michele La Rocca]

Probabilistic Boosted-Oriented Clustering of Time Series [Antonio D'Ambrosio, Gianluca Frasso, Carmela Iorio, Roberta Siciliano]

Copula-based fuzzy clustering of time series [Pierpaolo D'Urso, Marta Disegna, Fabrizio Durante]

Comparing multi-step ahead forecasting functions for time series clustering [Marcella Corduas, Giancarlo Ragozini]

• Multiway Analysis [Organizer and Chair: Giuseppe Bove]

(Interactive) visualisation of three way data [Casper J. Albers, John C. Gower]

Robust fuzzy clustering of multivariate time trajectories [Pierpaolo D'Urso, Riccardo Massari]

Estimation procedures for avoiding degenerate solutions in Candecomp/Parafac [Paolo Giordani]

• Big Data Analysis [Organizer and Chair: Donato Malerba]

Towards a statistical framework for attribute comparison in very large relational databases [Cesare Alippi, Elisa Quintarelli, Manuel Roveri, Letizia Tanca]

Mining Big Data with high performance computing solutions [Fabrizio Angiulli, Stefano Basta, Stefano Lodi, Gianluca Moro, Claudio Sartori]

Enhancing Big Data Exploration with Faceted Browsing [Sonia Bergamaschi, Giovanni Simonini and Song Zhu]

• New Methodologies for Composite Indicators [Organizer and Chair: Agostino Di Ciaccio]

Advances in Composite-based Path Modeling for Synthetic Indicators [Vincenzo Esposito Vinzi, Laura Trinchera, Giorgio Russolillo]

Composite Indicators Modeling [Maurizio Vichi]

Measuring the importance of variables in composite indicators [William Becker, Michaela Saisana, Paolo Paruolo, Andrea Saltelli]

• Cluster analysis software and validation [Organizer and Chair: Christian Hennig]

Adaptive Choice Of Input Parameters In Robust Clustering [Luis A. Garcìa-Escudero, Agustin Mayo-Iscar]

Robust Model-based Clustering with Covariance Matrix Constraints [Pietro Coretto, Christian Hennig]

Flexible Implementation of Resampling Schemes for Cluster Validation [Friedrich Leisch]

• Selecting a mixture model with a clustering focus [Organizer and Chair: Gilles Celeux]

Clustering in finite mixtures using an Integrated Completed Likelihood criterion [Marco Bertoletti, Nial Friel and Riccardo Rastelli]

Estimation and Model Selection for Model-Based Clustering with the Conditional Classification Likelihood [Jean-Patrick Baudry]

On the different ways to compute the Integrated Completed Likelihood criterion [Gilles Celeux]

• Exploring relationships between blocks of variables [Organizer and Chair: Giorgio Russolillo]

Weighted Multiblock Clustering [Ndéye Niang, Mory Ouattara]

Thematic Model Exploration through Multiple Co-Structure maximisation: Method and Software [Xavier Bry, Thomas Verron]

A New Component-based Approach of Regularisation for Multivariate Generalised Linear Regression [Catherine Trottier, Xavier Bry, Frederic Mortier, Guillaume Cornu]

SOLICITED SESSION

• Advances in Density-based clustering [Organizer and Chair: Francesca Greselin]

A Nonparametric Clustering method for Image Segmentation [Giovanna Menardi]

Robust Clustering for Heterogenous Skew Data [Luis A. Garcìa-Escudero, Francesca Greselin, Agustin Mayo-Iscar]

Regularizing finite mixtures of Gaussian Distributions [Bettina Grün, Gertraud Malsiner-Walli]

• Latent variable models for longitudinal data Part I [Organizer and Chair: Silvia Bacci]

A Joint Model For Longitudinal and Survival Data Based on an AR(1) Latent Process [Silvia Bacci, Francesco Bartolucci, Silvia Pandolfi]

Finite Mixture Models for Mixed Data: EM Algorithms and Parafac Representations [Marco Alfò, Paolo Giordani]

On the use of the contaminated Gaussian distribution in Hidden Markov models for longitudinal data [Antonio Punzo, Antonello Maruotti]

• Latent variable models for longitudinal data Part II [Organizer and Chair: Francesco Bartolucci]

A hidden Markov approach to the analysis of incomplete multivariate longitudinal data [Francesco Lagona]

Latent Markov and growth mixture models: a comparison [Fulvia Pennoni, Isabella Romeo]

Latent worths and longitudinal paired comparison. A Markov model of dependence [Brian Francis, Alexandra Grand, Regina Dittrich]

• Multivariate data analysis in environmental sciences [Organizer: Fabrizio Ruggeri; Chair: Raffaele Argiento]

Multivariate downscaling for non-Gaussian data [Daniela Cocchi, Lucia Paci, Carlo Trivisano]

Preliminary results on tapering multivariate spatiotemporal models for exposure to airborne multipollutants in Europe [Alessandro Fassò, Francesco Finazzi and Ferdinand Ndongo]

Clustering macroseismic fields by statistical data depth functions [Claudio Agostinelli, Renata Rotondi and Elisa Varini]

• Advanced models for tourism analysis [Organizer and Chair: Stefania Mignani]

Analysing territorial heterogeneity in tourist’s satisfaction towards Italian destinations [Cristina Bernini, Augusto Cerqua and Guido Pellegrini]

Micro-economic determinants of tourist expenditure: A quantile regression approach [Emanuela Marrocu, Raffaele Paci and Andrea Zara]

Inequalities and tourism consumption behaviour: a mixture model analysis [Cristina Bernini, Maria Francesca Cracolici, Cinzia Viroli]

• Bayesian Networks and Graphical Models in Socio-Economic Sciences [Organizer and Chair: Paola Vicard]

Bayesian Networks for Firm Performance Evaluation [Maria E. De Giuli, Pietro Gottardo, Anna M. Moisello and Claudia Tarantola]

Graphical model using copulas for measurement error modeling [Daniela Marella, Paola Vicard]

• Time Series in Clustering [Organizer and Chair: Michele La Rocca]

Parsimonious Clustering of Time Series [Carmela Iorio, Antonio D’Ambrosio, Gianluca Frasso, Roberta Siciliano]

Dynamic Time Warping-based fuzzy clustering for spatial time series [Pierpaolo D'Urso, Marta Disegna, Riccardo Massari]

Periodical Feature Based Time Series Clustering [Francesco Giordano, Michele La Rocca and Maria Lucia Parrella]

• Big Data Analysis [Organizer and Chair: Donato Malerba]

Interactive Machine Learning with R [Giorgio Maria Di Nunzio]

Workload estimation for a call center [Pierluigi Riva and Ruggiero Scommegna]

Prediction in Olive Oil Trade using Regression Models on Temporal Data Network [Corrado Loglisci, Umberto Medicamento, Arturo Casieri]

• Advances in Ordinal and Preference Data [Organizer and Chair: Antonio D’Ambrosio]

Measuring consensus in the setting of non-uniform qualitative scales [José L. Garcìa-Lapresta, David Pérez-Romàn]

Accurate Algorithms for Consensus Ranking Detection [Giulio Mazzeo, Antonio D’Ambrosio, Roberta Siciliano]

Logistic Regression Trees for Ordinal and Preference Data [Thomas Rusch, Achim Zeileis, Kurt Hornik]

• Case studies in data science from Ligurian companies [Organizer and Chair: Delio Panaro]

Statistical methods for the analysis of «Ostreopsis ovata» bloom events from meteo-marine data [Ennio Ottaviani, Valentina Asnaghi, Mariachiara Chiantore, Andrea Pedroncini, Rosella Bertolotto]

Data mining for optimal gambling [Gabriele Torre, Fabrizio Malfanti]

A Fraud Detection Algorithm for Online Banking [Delio Panaro, Eva Riccomagno, Fabrizio Malfanti]

Does directors’ background matter? Firm performance, board features and financial reporting reliability [Delio Panaro, Silvia Ferramosca and Sara Trucco]

• Modeling ordinal data [Organizer and Chair: Maurizio Carpita]

Posterior predictive model checks for assessing the goodness of fit of Bayesian multidimensional IRT models [Mariagiulia Matteucci, Stefania Mignani]

International tourism in Italy: a Bayesian Network approach [Federica Cugnata, Giovanni Perucca]

Clustering upper level units in multilevel models for ordinal data [Leonardo Grilli, Agnese Panzera, Carla Rampichini]

• Functional data analysis for environmental data [Organizer and Chair: Tonio Di Battista]

Clustering Spatially dependent Functional Data: a method based on the concept of spatial dispersion function of a curve [Elvira Romano, Antonio Balzanella, Rosanna Verde]

Two case studies on object oriented spatial statistics [Piercesare Secchi, Simone Vantini, Valeria Vitelli]

Inference on functional biodiversity tools [Tonio Di Battista, Francesca Fortuna, Fabrizio Maturo]

• Advances in quantile regression [Organizer and Chair: Cristina Davino]

M-quantile regression: diagnostics and parametric representation of the model [Annamaria Bianchi, Enrico Fabrizi, Nicola Salvati, Nikos Tzavidis]

Quantile Regression: a Bayesian Robust Approach [Marco Bottone, Mauro Bernardi, Lea Petrella]

A comparison among estimators for linear regression methods [Marilena Furno, Domenico Vistocco]

Handling heterogeneity among units in Quantile Regression [Cristina Davino, Domenico Vistocco]

• Directional Data [Organizer and Chair: Giovanni C. Porzio]

Small biased circular density estimation [Marco Di Marzio, Stefania Fensore, Agnese Panzera, Charles C. Taylor]

A depth-based classifier for circular data [Giuseppe Pandolfo]

Nonparametric estimates of the mode for directional data [Thomas Kirschstein, Steffen Liebscher, Giovanni C. Porzio, Giancarlo Ragozini]

• Recent developments in statistical analysis of network data [Organizer and Chair: Domenico De Stefano]

Game Theory and Network Models for the Reconstruction of Archaeological Networks [Viviana Amati, Ulrik Brandes]

A model for clustering a spatial network with application to Local Labour System identification [Francesco Pauli, Nicola Torelli, Susanna Zaccarin]

On the sampling distributions of the ML estimators in Network Effect Models [Michele La Rocca, Giovanni C. Porzio, Maria Prosperina Vitale, Patrick Doreian]

Correspondence Analysis with Doubling for Two-Mode Valued Networks [Giancarlo Ragozini, Domenico De Stefano, Daniela D’Ambrosio]

• Current challenges in clustering and classification of biomedical data [Organizer and Chair: Adalbert F.X. Wilhelm]

Semantic multiclassifier systems for the detection of aging related processes [Hans A. Kestler, Ludwig Lausser, Lyn-Rouven Schirra, Florian Schmid]

Emotion recognition in human computer interaction using multiple classifier systems [Friedhelm Schwenker]

Ensemble of selected classifiers [Berthold Lausen, Asma Gul, Zardad Khan and Osama Mahmoud]

CONTRIBUTED PAPERS

A generalized distance for inference on functional data [Andrea Ghiglietti, Anna M. Paganoni]

Long gaps in multivariate spatio-temporal data: an approach based on Functional Data Analysis [Mariantonietta Ruggieri, Antonella Plaia and Francesca Di Salvo]

Effects on curve clustering of different transformations of chronological textual data [Matilde Trevisani and Arjuna Tuzzi]

A note on the reliability of a classifier [Luca Frigau]

Robustified classification of multivariate functional data [Francesca Ieva, Anna M. Paganoni]

Size Control of Robust Regression Estimators [Silvia Salini, Andrea Cerioli, Fabrizio Laurini, Marco Riani]

The Movements of Emotions: an Exploratory Classification on Affective Movement Data [Pasquale Dente, Arvid Kappas, Adalbert F.X. Wilhelm]

Electre Tri-Machine Learning Approach to the Record Linkage Problem [Valentina Minnetti, Renato De Leone]

Quality of Classification approaches for the quantitative analysis of international conflict [Adalbert F.X. Wilhelm]

The rtclust Procedure for Robust Clustering [Francesco Dotto, Alessio Farcomeni, Luis Angel Garcìa-Escudero, Agustin Mayo-Iscar]

What are the true clusters? [Christian Hennig]

A novel model-based clustering approach for massive datasets of spatially registered time series. With application to sea surface temperature remote sensing data [Francesco Finazzi, Marian Scott]

Big Data Classification: Simulations in the many features case [Claus Weihs]

From Big Data to information: statistical issues through examples [Silvia Biffignandi, Serena Signorelli]

Big data meet pharmaceutical industry: an application on social media data [Caterina Liberati, Paolo Mariani]

Defining the subjects distance in hierarchical cluster analysis by copula approach [Andrea Bonanomi, Marta Nai Ruscone, Silvia Angela Osmetti]

Supervised classification of defective crankshafts by image analysis [Beatriz Remeseiro, Javier Tarrìo-Saavedra, Mario Francisco-Fernàndez, Manuel G. Penedo, Salvador Naya, Ricardo Cao]

Archetypal Analysis for Data-Driven Prototype Identification [Giancarlo Ragozini, Francesco Palumbo, Maria R. D’Esposito]

Principal Component Analysis of Complex Data and Application to Climatology [Sergio Camiz and Silvia Creta]

Sparse exploratory multidimensional IRT models [Lara Fontanella, Sara Fontanella, Pasquale Valentini, Nickolay Trendafilov]

Iterative Factor Clustering for Categorical data Reconsidered [Alfonso Iodice D’Enza, Angelos Markos, Francesco Palumbo]

Testing Antipodal Symmetry of Circular Data [Giovanni Casale, Giuseppe Pandolfo, Giovanni C. Porzio]

How to define deviance residuals in multinomial regression [Giovanni Romeo, Mariangela Sciandra, Marcello Chiodi]

Diagnostic tools for GAMLSS fitted objects [Andrea Marletta, Mariangela Sciandra]

Bayesian Regression Analysis with Linked and Duplicated Data [Andrea Tancredi, Rebecca Steorts, Brunero Liseo]

A semi-parametric Fay Herriot-type model with unknown sampling variances [Silvia Polettini]

Posterior Distributions from Optimally B-Robust Estimating functions and Approximate Bayesian Computation [Ivan Luciano Danesi, Fabio Piacenza, Erlis Ruli, Laura Ventura]

MCA Based Community Detection [Carlo Drago]

Classifying social roles by network structures [Simona Gozzo, Venera Tomaselli]

A multilevel Heckman model to investigate financial assets among old people in Europe [Omar Paccagnella, Chiara Dal Bianco]

Optimal Pricing Using Bayesian Semiparametric Price Response Models [Winfried J. Steiner, Anett Weber, Stefan Lang and Peter Wechselberger]

Monetary transmission models for banking interest rates [Laura Parisi, Paolo Giudici, Igor Gianfrancesco and Camillo Giliberto]

Estimating the effect of prenatal care on birth outcomes [Emiliano Sironi, Massimo Cannas]

Recursive partitioning: an approach based on the weighted Kemeny distance [Mariangela Sciandra, Antonella Plaia, Veronica Picone]

Why to study abroad? An example of clustering [Valeria Caviezel, Anna M. Falzoni]

A graphical copula-based tool for detecting tail dependence [Roberta Pappadà, Fabrizio Durante, Nicola Torelli]

Classification models as tool of bankruptcy prediction – Polish experience [Józef Pociecha, Barbara Pawełek, Mateusz Baryła, Sabina Augustyn]

The relationship between individual price response of beer consumers and their demographic/psychographic characteristics [Friederike Paetz]

The Ensemble Conceptual Clustering of Symbolic Data for Customer Loyalty Analysis [Marcin Pełka]

Consumers’ perceptions of Corporate Social Responsibilities and willingness to pay: A Partial Least Squares [Karsten Lübke, Christian Hose, Thomas Obermeier]

Inspecting the quality of Italian wine through causal reasoning [Eugenio Brentari, Maurizio Carpita, Silvia Golia]

Exploring socio-economic factors associated with adherence to the Mediterranean diet: a multilevel approach [Tiziana Laureti, Luca Secondi]

Big data and ‘social’ reputation: a financial example [Paola Cerchiello]

Bayesian Networks for Stock Picking [Alessandro Greppi, Maria Elena De Giuli, Claudia Tarantola]

Portfolio selection with Lasso algorithm [Riccardo Bramante, Silvia Facchinetti, Diego Zappa]

Sunspot in Economic Models with Externalities [Beatrice Venturi and Alessandro Pirisinu]

Sequential Clustering based on Dirichlet Process Priors

Roberto Casarin, Andrea Pastore and Stefano F. Tonellato¹

¹ Department of Economics, Ca’ Foscari University of Venice (e-mail: [email protected])

Abstract: This paper proposes a new sequential clustering method based on the sequential estimation of the random partition induced by the Dirichlet process. Our approach relies on Sequential Importance Resampling (SIR) and on the estimation of the posterior probabilities that each pair of individuals are generated by the same mixture component. Such estimates do not require the identification of mixture components, and therefore are not affected by label switching. Then, a dissimilarity matrix can be easily built, allowing for the implementation of agglomerative clustering methods.

Keywords: Dirichlet process, sampling importance resampling, agglomerative clustering.

1 Dirichlet process mixture

A very important class of models in Bayesian nonparametrics is based on the Dirichlet process and is known as Dirichlet process mixture (Antoniak, 1974). In this model, the observable random variables, X_i, i = 1, …, n, are assumed to be exchangeable and generated by the following hierarchical model:

where DP(α, G0) denotes a Dirichlet process (DP) with base measure G0 and precision parameter α > 0. Since the DP generates almost surely discrete random measures on the parameter space Θ, ties among the parameter values have positive probability, leading to a batch of clusters of the parameter vector θ = [θ_1, …, θ_n]^T. Exploiting the Polya urn representation of the DP, the model can be rewritten as

where {k} = {1, …, k}, s_{<i} = {s_j, j ∈ {i−1}} (in the rest of the paper, the subscript <i will refer to those quantities that involve all the observations X_{i'} such that i' < i), s_j ∈ {k} for j ∈ {k−1}, and n_j is the number of θ_i's equal to θ*_j. In this model representation, the parameter θ can be expressed as (s, θ*), with s = {s_i : s_i ∈ {k}, i ∈ {n}}, θ* = [θ*_1, …, θ*_k]^T with θ*_j ~ iid G0 and θ_i = θ*_{s_i}. Consequently, the marginal distribution of X_i is a mixture with k components, where k is an unknown random integer.
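For reference, the hierarchical model and its Polya urn form can be sketched in LaTeX as follows. This is the standard textbook form of the Dirichlet process mixture, assumed here because it agrees with the definitions of α, G0, s_i, n_j and θ*_j given in this section; the kernel notation p(· | θ) is an assumption, and no equation numbers are attached.

```latex
% Assumed standard DP mixture hierarchy; p(. | theta) is a generic kernel.
\begin{align*}
X_i \mid \theta_i &\overset{\text{ind}}{\sim} p(\cdot \mid \theta_i), \quad i = 1, \dots, n, \\
\theta_i \mid G &\overset{\text{iid}}{\sim} G, \\
G &\sim DP(\alpha, G_0).
\end{align*}
% Polya urn form, with k the number of distinct values among theta_1, ..., theta_{i-1}.
\begin{align*}
X_i \mid s_i, \theta^* &\sim p(\cdot \mid \theta^*_{s_i}), \\
P(s_i = j \mid s_{<i}) &=
\begin{cases}
\dfrac{n_j}{\alpha + i - 1}, & j \in \{k\}, \\[1ex]
\dfrac{\alpha}{\alpha + i - 1}, & j = k + 1,
\end{cases} \\
\theta^*_j &\overset{\text{iid}}{\sim} G_0 .
\end{align*}
```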

In a parametric non Bayesian approach, it would be quite straightforward to cluster the data by maximising the probability of the allocation of each datum to one of the k clusters (with k fixed and known), conditionally on the observed sample (McLachlan & Peel, 2000). Unfortunately, under the assumptions we made, such computations are not feasible even numerically, due to the well known label switching problem (Frühwirth-Schnatter, 2006). Nevertheless, equations (1)-(4) will be very helpful in building a hierarchical clustering algorithm based on a Bayesian nonparametric model specification.
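The Polya urn representation used in this section lends itself to direct simulation. The following Python sketch draws only the allocations s_1, …, s_n, assuming the usual urn probabilities: an existing cluster j is chosen with probability proportional to n_j, a new cluster with probability proportional to α. The function name and its arguments are illustrative and do not come from the paper.

```python
import random


def polya_urn_allocations(n, alpha, rng):
    """Draw allocations s_1..s_n from the Polya urn scheme of a DP with precision alpha.

    Labels are 1-based and are created in order of first appearance.
    """
    allocations = []  # s_1, ..., s_n
    counts = []       # counts[j - 1] = n_j, the current size of cluster j
    for _ in range(n):
        # Existing cluster j has weight n_j; a new cluster has weight alpha.
        weights = counts + [alpha]
        j = rng.choices(range(1, len(counts) + 2), weights=weights)[0]
        if j == len(counts) + 1:
            counts.append(1)       # open a new cluster
        else:
            counts[j - 1] += 1     # join an existing cluster
        allocations.append(j)
    return allocations


if __name__ == "__main__":
    print(polya_urn_allocations(20, alpha=1.0, rng=random.Random(0)))
```

Larger values of α make new clusters more likely, so the number of distinct labels k is itself random, as stated above.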

2 Sampling importance resampling

Under the assumptions we introduced above, following the arguments of MacEachern et al. (1999), we can write the conditional posterior distribution of s_i given x_1, …, x_i, as

We can marginalise the conditional posterior of s_i with respect to θ*, obtaining

Notice that when G0 is a conjugate prior for (1), the computation of (5) and (6) is often straightforward.

MacEachern et al. (1999) introduced the following importance sampler.

SIS algorithm. For i = 1, …, n, repeat steps (A) and (B).

(A) Compute

with n_{k+1} = α.

(B) Generate s_i from the multinomial distribution with

Taking R independent replicas of this algorithm we obtain s_i^(r), i = 1, …, n, r = 1, …, R, and θ*_j ~ p(θ | x^(j)), with x^(j) = {x_i : i ∈ {n}, s_i = j}, and compute the importance weights

such that ∑_{r=1}^R w_r = 1. Should the variance of the importance weights be too large, the efficiency of the sampler could be improved by resampling as follows (Cappé et al., 2005). Compute N_eff = (∑_{r=1}^R w_r^2)^{-1}. If N_eff < R/2, draw R particles from the current particle set with probabilities equal to their weights, replace the old particles with the new ones and assign them constant weights w_r = 1/R.
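The sampler and the resampling rule can be illustrated with a short Python sketch. It rests on assumptions that are not in the text: a Gaussian kernel with known variance sigma2 and a conjugate base measure G0 = N(mu0, tau2), so that the predictive densities are available in closed form, and importance weights taken proportional to the product of the step (A) predictive densities, as this sampler is usually described. All function and parameter names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def predictive(x, n_j, sum_j, mu0, tau2, sigma2):
    """Predictive density of x for a cluster holding n_j points whose sum is sum_j.

    Assumed model: x | theta ~ N(theta, sigma2), theta ~ G0 = N(mu0, tau2).
    With n_j = 0 this reduces to the prior predictive N(mu0, tau2 + sigma2).
    """
    post_var = 1.0 / (1.0 / tau2 + n_j / sigma2)
    post_mean = post_var * (mu0 / tau2 + sum_j / sigma2)
    return normal_pdf(x, post_mean, post_var + sigma2)


def sis_particle(x, alpha, mu0, tau2, sigma2, rng):
    """One SIS replica: returns the allocations s_1..s_n and the log importance weight."""
    s, counts, sums, log_w = [], [], [], 0.0
    for i, xi in enumerate(x, start=1):
        # Step (A): unnormalised allocation probabilities, with n_{k+1} = alpha.
        q = [n * predictive(xi, n, t, mu0, tau2, sigma2) for n, t in zip(counts, sums)]
        q.append(alpha * predictive(xi, 0, 0.0, mu0, tau2, sigma2))
        log_w += math.log(sum(q) / (alpha + i - 1))
        # Step (B): draw s_i with probabilities proportional to q.
        j = rng.choices(range(len(q)), weights=q)[0]
        if j == len(counts):
            counts.append(0)
            sums.append(0.0)
        counts[j] += 1
        sums[j] += xi
        s.append(j + 1)
    return s, log_w


def run_sis(x, R, alpha, mu0, tau2, sigma2, seed=0):
    """Run R independent replicas and return (particles, normalised weights)."""
    rng = random.Random(seed)
    results = [sis_particle(x, alpha, mu0, tau2, sigma2, rng) for _ in range(R)]
    m = max(lw for _, lw in results)
    w = [math.exp(lw - m) for _, lw in results]  # subtract the max for stability
    total = sum(w)
    return [s for s, _ in results], [v / total for v in w]


def resample(particles, weights, rng):
    """Resample when N_eff = 1 / sum(w_r^2) falls below R / 2."""
    R = len(particles)
    n_eff = 1.0 / sum(w * w for w in weights)
    if n_eff < R / 2:
        idx = rng.choices(range(R), weights=weights, k=R)
        return [particles[r] for r in idx], [1.0 / R] * R
    return particles, weights
```

With a different conjugate pair only `predictive` would need to change; the loop over steps (A) and (B) and the resampling rule stay the same.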

3 Pairwise dissimilarities and hierarchical clustering

Intuitively, we can state that two individuals, i and j, are similar if x_i and x_j are generated by the same mixture component, i.e. if s_i = s_j. Label switching prevents us from identifying mixture components, but not from assessing similarities among individuals. In fact, the algorithm introduced in the previous section may help us in estimating dissimilarities between individuals. The posterior probability that x_i and x_j are generated by the same component, i.e. the posterior probability of the event {s_i = s_j}, can be estimated as

where I(x, y) = 1 if x = y and I(x, y) = 0 otherwise. We can then define a dissimilarity matrix D with ij-th element d_ij = 1 − p̂_ij, allowing us to use standard agglomerative hierarchical clustering methods based on posterior evidence.
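A minimal Python sketch of this step follows. It assumes that the estimator is the weighted average p̂_ij = ∑_r w_r I(s_i^(r), s_j^(r)) over the R particles, which is consistent with the definitions of w_r and I above but is not spelled out in the text. The average-linkage routine is one possible choice of agglomerative method, and both function names are illustrative.

```python
def coclustering_dissimilarity(allocations, weights):
    """Build D with d_ij = 1 - p_ij, where p_ij = sum_r w_r * I(s_i^(r), s_j^(r)).

    allocations[r][i] is s_i in particle r; weights[r] is w_r, summing to 1.
    """
    n = len(allocations[0])
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p_ij = sum(w for s, w in zip(allocations, weights) if s[i] == s[j])
            D[i][j] = D[j][i] = 1.0 - p_ij
    return D


def average_linkage(D, n_clusters):
    """Naive agglomerative clustering with average linkage on a dissimilarity matrix."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                d = sum(D[i][j] for i, j in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters


if __name__ == "__main__":
    # Three particles for three observations, with weights summing to 1.
    allocations = [[1, 1, 2], [1, 2, 2], [1, 1, 2]]
    weights = [0.5, 0.25, 0.25]
    D = coclustering_dissimilarity(allocations, weights)
    print(D)
    print(average_linkage(D, n_clusters=2))
```

Because D depends only on whether two labels coincide within a particle, relabelling the components of any particle leaves it unchanged, which is why label switching does not affect this step. For larger samples, D could instead be passed to a library routine such as SciPy's hierarchical clustering functions.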

4 Discussion

The flexibility of Bayesian nonparametric models improves the robustness of classification with respect to finite mixture models. Sampling importance resampling algorithms allow for efficient computations, particularly when the base measure is conjugate to the model likelihood. No restrictions on the parameters or post-processing of the posterior simulations are required.

References

Antoniak, C.E. 1974. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Annals of Statistics, 2, 1152-1174.

Cappé, O., Moulines, E., & Rydén, T. 2005. Inference in Hidden Markov Models. New York: Springer.

Frühwirth-Schnatter, S. 2006. Finite Mixture and Markov Switching Models. Berlin: Springer.

MacEachern, S.N., Clyde, M., & Liu, J.S. 1999. Sequential importance sampling for nonparametric Bayes models: The next generation. The Canadian Journal of Statistics, 27, 251-267.

McLachlan, G., & Peel, D. 2000. Finite Mixture Models. New York: Wiley.