CLADAG is a member of the International Federation of ... · CLADAG is a member of the...

Post on 17-Sep-2018


CLADAG is a member of the International Federation of Classification Societies (IFCS). Among its activities, CLADAG organizes a biennial scientific meeting and schools related to classification and data analysis, publishes a newsletter, and cooperates with other member societies of the IFCS in the organization of their conferences. The scientific program comprises three Keynote Lectures, an Invited Session, 10 Specialized Sessions, 15 Solicited Sessions and 15 Contributed Sessions. All the Specialized and Solicited Sessions have been promoted by the members of the Scientific Program Committee. The organizers wish to thank them for their cooperation in contributing to the success of CLADAG 2015. The Book of Abstracts contains short papers of all the presentations scheduled in the conference program. It is organized according to type of session/lecture: Keynote Lectures, Specialized Sessions, Solicited Sessions and Contributed Sessions.

CLADAG 2015

10th Scientific Meeting of the Classification and Data Analysis Group of the Italian Statistical Society

Flamingo Resort, Santa Margherita di Pula, October 8-10, 2015

BOOK OF ABSTRACTS

Editors: Francesco Mola, Claudio Conversano

CUEC Editrice by Sardegna Novamedia Soc. Coop.
Via Basilicata n. 57/59
09127 Cagliari, Italy
Tel. & Fax (+39) 070271573
www.cuec.eu
info@cuec.eu

First Electronic Edition CUEC © 2016
ISBN: 978-88-8467-949-9
Euro 6,90

eBook Design: Giovanni Caprioli
www.servizi-per-editoria.it • info@servizi-per-editoria.it

Participating Organizations

Table of Contents

Preface

Conference Themes

Committees

KEYNOTE LECTURES

Mining Text Networks [David Banks]

Variable selection for model-based clustering of categorical data [Brendan Murphy]

Eigenvalues in mixture modeling: geometric, robustness and computational issues [Salvatore Ingrassia]

SPECIALIZED SESSION

• Robust methods for the analysis of Economic (Big) data [Organizer and Chair: Silvia Salini]

Fast and Robust Seemingly Unrelated Regression [Mia Hubert, Tim Verdonck, Ozlem Yorulmaz]

Application to the Detection of Customs Fraud of the Goodness-of-fit Testing for the Newcomb-Benford Law [Lucio Barabesi, Andrea Cerasa, Andrea Cerioli, Domenico Perrotta]

Monitoring the Robust Analysis of a Single Multivariate Sample [Marco Riani, Anthony C. Atkinson, Andrea Cerioli]

• Bayesian nonparametric clustering [Organizer: Fabrizio Ruggeri; Chair: Renata Rotondi]

A Bayesian nonparametric Approach to Model Association between Clusters of SNPs and Disease Responses [Raffaele Argiento, Alessandra Guglielmi, Chuhsing Kate Hsiao, Fabrizio Ruggeri, Charlotte Wang]

A Bayesian nonparametric Model for Clustering and Borrowing Information [Antonio Lijoi, Bernardo Nipoti, Igor Prünster]

Sequential Clustering based on Dirichlet Process Priors [Roberto Casarin, Andrea Pastore, Stefano F. Tonellato]

• Causal Inference with Complex Data Structures [Organizer and Chair: Alessandra Mattei]

Short term impact of PM10 exposure on mortality: A propensity score approach [Michela Baccini, Alessandra Mattei, Fabrizia Mealli]

Identification and Estimation of Causal Mechanisms in Clustered Encouragement Designs: Disentangling Bed Nets using Bayesian Principal Stratification [Laura Forastiere, Fabrizia Mealli, Tyler VanderWeele]

The effects of a dropout prevention program on secondary students' outcomes [Enrico Conti, Silvia Duranti, Alessandra Mattei, Fabrizia Mealli, Nicola Sciclone]

• Clustering in Time Series [Organizer and Chair: Michele La Rocca]

Probabilistic Boosted-Oriented Clustering of Time Series [Antonio D'Ambrosio, Gianluca Frasso, Carmela Iorio, Roberta Siciliano]

Copula-based fuzzy clustering of time series [Pierpaolo D'Urso, Marta Disegna, Fabrizio Durante]

Comparing multi-step ahead forecasting functions for time series clustering [Marcella Corduas, Giancarlo Ragozini]

• Multiway Analysis [Organizer and Chair: Giuseppe Bove]

(Interactive) visualisation of three-way data [Casper J. Albers, John C. Gower]

Robust fuzzy clustering of multivariate time trajectories [Pierpaolo D'Urso, Riccardo Massari]

Estimation procedures for avoiding degenerate solutions in Candecomp/Parafac [Paolo Giordani]

• Big Data Analysis [Organizer and Chair: Donato Malerba]

Towards a statistical framework for attribute comparison in very large relational databases [Cesare Alippi, Elisa Quintarelli, Manuel Roveri, Letizia Tanca]

Mining Big Data with high performance computing solutions [Fabrizio Angiulli, Stefano Basta, Stefano Lodi, Gianluca Moro, Claudio Sartori]

Enhancing Big Data Exploration with Faceted Browsing [Sonia Bergamaschi, Giovanni Simonini and Song Zhu]

• New Methodologies for Composite Indicators [Organizer and Chair: Agostino Di Ciaccio]

Advances in Composite-based Path Modeling for Synthetic Indicators [Vincenzo Esposito Vinzi, Laura Trinchera, Giorgio Russolillo]

Composite Indicators Modeling [Maurizio Vichi]

Measuring the importance of variables in composite indicators [William Becker, Michaela Saisana, Paolo Paruolo, Andrea Saltelli]

• Cluster analysis software and validation [Organizer and Chair: Christian Hennig]

Adaptive Choice Of Input Parameters In Robust Clustering [Luis A. García-Escudero, Agustin Mayo-Iscar]

Robust Model-based Clustering with Covariance Matrix Constraints [Pietro Coretto, Christian Hennig]

Flexible Implementation of Resampling Schemes for Cluster Validation [Friedrich Leisch]

• Selecting a mixture model with a clustering focus [Organizer and Chair: Gilles Celeux]

Clustering in finite mixtures using an Integrated Completed Likelihood criterion [Marco Bertoletti, Nial Friel and Riccardo Rastelli]

Estimation and Model Selection for Model-Based Clustering with the Conditional Classification Likelihood [Jean-Patrick Baudry]

On the different ways to compute the Integrated Completed Likelihood criterion [Gilles Celeux]

• Exploring relationships between blocks of variables [Organizer and Chair: Giorgio Russolillo]

Weighted Multiblock Clustering [Ndéye Niang, Mory Ouattara]

Thematic Model Exploration through Multiple Co-Structure maximisation: Method and Software [Xavier Bry, Thomas Verron]

A New Component-based Approach of Regularisation for Multivariate Generalised Linear Regression [Catherine Trottier, Xavier Bry, Frederic Mortier, Guillaume Cornu]

SOLICITED SESSION

• Advances in Density-based clustering [Organizer and Chair: Francesca Greselin]

A Nonparametric Clustering method for Image Segmentation [Giovanna Menardi]

Robust Clustering for Heterogeneous Skew Data [Luis A. García-Escudero, Francesca Greselin, Agustin Mayo-Iscar]

Regularizing finite mixtures of Gaussian Distributions [Bettina Grün, Gertraud Malsiner-Walli]

• Latent variable models for longitudinal data Part I [Organizer and Chair: Silvia Bacci]

A Joint Model For Longitudinal and Survival Data Based on an AR(1) Latent Process [Silvia Bacci, Francesco Bartolucci, Silvia Pandolfi]

Finite Mixture Models for Mixed Data: EM Algorithms and Parafac Representations [Marco Alfò, Paolo Giordani]

On the use of the contaminated Gaussian distribution in Hidden Markov models for longitudinal data [Antonio Punzo, Antonello Maruotti]

• Latent variable models for longitudinal data Part II [Organizer and Chair: Francesco Bartolucci]

A hidden Markov approach to the analysis of incomplete multivariate longitudinal data [Francesco Lagona]

Latent Markov and growth mixture models: a comparison [Fulvia Pennoni, Isabella Romeo]

Latent worths and longitudinal paired comparison. A Markov model of dependence [Brian Francis, Alexandra Grand, Regina Dittrich]

• Multivariate data analysis in environmental sciences [Organizer: Fabrizio Ruggeri; Chair: Raffaele Argiento]

Multivariate downscaling for non-Gaussian data [Daniela Cocchi, Lucia Paci, Carlo Trivisano]

Preliminary results on tapering multivariate spatiotemporal models for exposure to airborne multipollutants in Europe [Alessandro Fassò, Francesco Finazzi and Ferdinand Ndongo]

Clustering macroseismic fields by statistical data depth functions [Claudio Agostinelli, Renata Rotondi and Elisa Varini]

• Advanced models for tourism analysis [Organizer and Chair: Stefania Mignani]

Analysing territorial heterogeneity in tourist's satisfaction towards Italian destinations [Cristina Bernini, Augusto Cerqua and Guido Pellegrini]

Micro-economic determinants of tourist expenditure: A quantile regression approach [Emanuela Marrocu, Raffaele Paci and Andrea Zara]

Inequalities and tourism consumption behaviour: a mixture model analysis [Cristina Bernini, Maria Francesca Cracolici, Cinzia Viroli]

• Bayesian Networks and Graphical Models in Socio-Economic Sciences [Organizer and Chair: Paola Vicard]

Bayesian Networks for Firm Performance Evaluation [Maria E. De Giuli, Pietro Gottardo, Anna M. Moisello and Claudia Tarantola]

Graphical model using copulas for measurement error modeling [Daniela Marella, Paola Vicard]

• Time Series in Clustering [Organizer and Chair: Michele La Rocca]

Parsimonious Clustering of Time Series [Carmela Iorio, Antonio D'Ambrosio, Gianluca Frasso, Roberta Siciliano]

Dynamic Time Warping-based fuzzy clustering for spatial time series [Pierpaolo D'Urso, Marta Disegna, Riccardo Massari]

Periodical Feature Based Time Series Clustering [Francesco Giordano, Michele La Rocca and Maria Lucia Parrella]

• Big Data Analysis [Organizer and Chair: Donato Malerba]

Interactive Machine Learning with R [Giorgio Maria Di Nunzio]

Workload estimation for a call center [Pierluigi Riva and Ruggiero Scommegna]

Prediction in Olive Oil Trade using Regression Models on Temporal Data Network [Corrado Loglisci, Umberto Medicamento, Arturo Casieri]

• Advances in Ordinal and Preference Data [Organizer and Chair: Antonio D'Ambrosio]

Measuring consensus in the setting of non-uniform qualitative scales [José L. García-Lapresta, David Pérez-Román]

Accurate Algorithms for Consensus Ranking Detection [Giulio Mazzeo, Antonio D'Ambrosio, Roberta Siciliano]

Logistic Regression Trees for Ordinal and Preference Data [Thomas Rusch, Achim Zeileis, Kurt Hornik]

• Case studies in data science from Ligurian companies [Organizer and Chair: Delio Panaro]

Statistical methods for the analysis of «Ostreopsis ovata» bloom events from meteo-marine data [Ennio Ottaviani, Valentina Asnaghi, Mariachiara Chiantore, Andrea Pedroncini, Rosella Bertolotto]

Data mining for optimal gambling [Gabriele Torre, Fabrizio Malfanti]

A Fraud Detection Algorithm for Online Banking [Delio Panaro, Eva Riccomagno, Fabrizio Malfanti]

Does directors' background matter? Firm performance, board features and financial reporting reliability [Delio Panaro, Silvia Ferramosca and Sara Trucco]

• Modeling ordinal data [Organizer and Chair: Maurizio Carpita]

Posterior predictive model checks for assessing the goodness of fit of Bayesian multidimensional IRT models [Mariagiulia Matteucci, Stefania Mignani]

International tourism in Italy: a Bayesian Network approach [Federica Cugnata, Giovanni Perucca]

Clustering upper level units in multilevel models for ordinal data [Leonardo Grilli, Agnese Panzera, Carla Rampichini]

• Functional data analysis for environmental data [Organizer and Chair: Tonio Di Battista]

Clustering Spatially dependent Functional Data: a method based on the concept of spatial dispersion function of a curve [Elvira Romano, Antonio Balzanella, Rosanna Verde]

Two case studies on object oriented spatial statistics [Piercesare Secchi, Simone Vantini, Valeria Vitelli]

Inference on functional biodiversity tools [Tonio Di Battista, Francesca Fortuna, Fabrizio Maturo]

• Advances in quantile regression [Organizer and Chair: Cristina Davino]

M-quantile regression: diagnostics and parametric representation of the model [Annamaria Bianchi, Enrico Fabrizi, Nicola Salvati, Nikos Tzavidis]

Quantile Regression: a Bayesian Robust Approach [Marco Bottone, Mauro Bernardi, Lea Petrella]

A comparison among estimators for linear regression methods [Marilena Furno, Domenico Vistocco]

Handling heterogeneity among units in Quantile Regression [Cristina Davino, Domenico Vistocco]

• Directional Data [Organizer and Chair: Giovanni C. Porzio]

Small biased circular density estimation [Marco Di Marzio, Stefania Fensore, Agnese Panzera, Charles C. Taylor]

A depth-based classifier for circular data [Giuseppe Pandolfo]

Nonparametric estimates of the mode for directional data [Thomas Kirschstein, Steffen Liebscher, Giovanni C. Porzio, Giancarlo Ragozini]

• Recent developments in statistical analysis of network data [Organizer and Chair: Domenico De Stefano]

Game Theory and Network Models for the Reconstruction of Archaeological Networks [Viviana Amati, Ulrik Brandes]

A model for clustering a spatial network with application to Local Labour System identification [Francesco Pauli, Nicola Torelli, Susanna Zaccarin]

On the sampling distributions of the ML estimators in Network Effect Models [Michele La Rocca, Giovanni C. Porzio, Maria Prosperina Vitale, Patrick Doreian]

Correspondence Analysis with Doubling for Two-Mode Valued Networks [Giancarlo Ragozini, Domenico De Stefano, Daniela D'Ambrosio]

• Current challenges in clustering and classification of biomedical data [Organizer and Chair: Adalbert F.X. Wilhelm]

Semantic multi-classifier systems for the detection of aging related processes [Hans A. Kestler, Ludwig Lausser, Lyn-Rouven Schirra, Florian Schmid]

Emotion recognition in human computer interaction using multiple classifier systems [Friedhelm Schwenker]

Ensemble of selected classifiers [Berthold Lausen, Asma Gul, Zardad Khan and Osama Mahmoud]

CONTRIBUTED PAPERS

A generalized distance for inference on functional data [Andrea Ghiglietti, Anna M. Paganoni]

Long gaps in multivariate spatio-temporal data: an approach based on Functional Data Analysis [Mariantonietta Ruggieri, Antonella Plaia and Francesca Di Salvo]

Effects on curve clustering of different transformations of chronological textual data [Matilde Trevisani and Arjuna Tuzzi]

A note on the reliability of a classifier [Luca Frigau]

Robustified classification of multivariate functional data [Francesca Ieva, Anna M. Paganoni]

Size Control of Robust Regression Estimators [Silvia Salini, Andrea Cerioli, Fabrizio Laurini, Marco Riani]

The Movements of Emotions: an Exploratory Classification on Affective Movement Data [Pasquale Dente, Arvid Kappas, Adalbert F.X. Wilhelm]

Electre Tri-Machine Learning Approach to the Record Linkage Problem [Valentina Minnetti, Renato De Leone]

Quality of Classification approaches for the quantitative analysis of international conflict [Adalbert F.X. Wilhelm]

The rtclust Procedure for Robust Clustering [Francesco Dotto, Alessio Farcomeni, Luis Angel García-Escudero, Agustin Mayo-Iscar]

What are the true clusters? [Christian Hennig]

A novel model-based clustering approach for massive datasets of spatially registered time series. With application to sea surface temperature remote sensing data [Francesco Finazzi, Marian Scott]

Big Data Classification: Simulations in the many features case [Claus Weihs]

From Big Data to information: statistical issues through examples [Silvia Biffignandi, Serena Signorelli]

Big data meet pharmaceutical industry: an application on social media data [Caterina Liberati, Paolo Mariani]

Defining the subjects distance in hierarchical cluster analysis by copula approach [Andrea Bonanomi, Marta Nai Ruscone, Silvia Angela Osmetti]

Supervised classification of defective crankshafts by image analysis [Beatriz Remeseiro, Javier Tarrío-Saavedra, Mario Francisco-Fernández, Manuel G. Penedo, Salvador Naya, Ricardo Cao]

Archetypal Analysis for Data-Driven Prototype Identification [Giancarlo Ragozini, Francesco Palumbo, Maria R. D'Esposito]

Principal Component Analysis of Complex Data and Application to Climatology [Sergio Camiz and Silvia Creta]

Sparse exploratory multidimensional IRT models [Lara Fontanella, Sara Fontanella, Pasquale Valentini, Nickolay Trendafilov]

Iterative Factor Clustering for Categorical data Reconsidered [Alfonso Iodice D'Enza, Angelos Markos, Francesco Palumbo]

Testing Antipodal Symmetry of Circular Data [Giovanni Casale, Giuseppe Pandolfo, Giovanni C. Porzio]

How to define deviance residuals in multinomial regression [Giovanni Romeo, Mariangela Sciandra, Marcello Chiodi]

Diagnostic tools for GAMLSS fitted objects [Andrea Marletta, Mariangela Sciandra]

Bayesian Regression Analysis with Linked and Duplicated Data [Andrea Tancredi, Rebecca Steorts, Brunero Liseo]

A semi-parametric Fay-Herriot-type model with unknown sampling variances [Silvia Polettini]

Posterior Distributions from Optimally B-Robust Estimating functions and Approximate Bayesian Computation [Ivan Luciano Danesi, Fabio Piacenza, Erlis Ruli, Laura Ventura]

MCA Based Community Detection [Carlo Drago]

Classifying social roles by network structures [Simona Gozzo, Venera Tomaselli]

A multilevel Heckman model to investigate financial assets among old people in Europe [Omar Paccagnella, Chiara Dal Bianco]

Optimal Pricing Using Bayesian Semiparametric Price Response Models [Winfried J. Steiner, Anett Weber, Stefan Lang and Peter Wechselberger]

Monetary transmission models for banking interest rates [Laura Parisi, Paolo Giudici, Igor Gianfrancesco and Camillo Giliberto]

Estimating the effect of prenatal care on birth outcomes [Emiliano Sironi, Massimo Cannas]

Recursive partitioning: an approach based on the weighted Kemeny distance [Mariangela Sciandra, Antonella Plaia, Veronica Picone]

Why to study abroad? An example of clustering [Valeria Caviezel, Anna M. Falzoni]

A graphical copula-based tool for detecting tail dependence [Roberta Pappadà, Fabrizio Durante, Nicola Torelli]

Classification models as tool of bankruptcy prediction – Polish experience [Józef Pociecha, Barbara Pawełek, Mateusz Baryła, Sabina Augustyn]

The relationship between individual price response of beer consumers and their demographic/psychographic characteristics [Friederike Paetz]

The Ensemble Conceptual Clustering of Symbolic Data for Customer Loyalty Analysis [Marcin Pełka]

Consumers' perceptions of Corporate Social Responsibilities and willingness to pay: A Partial Least Squares [Karsten Lübke, Christian Hose, Thomas Obermeier]

Inspecting the quality of Italian wine through causal reasoning [Eugenio Brentari, Maurizio Carpita, Silvia Golia]

Exploring socio-economic factors associated with adherence to the Mediterranean diet: a multilevel approach [Tiziana Laureti, Luca Secondi]

Big data and 'social' reputation: a financial example [Paola Cerchiello]

Bayesian Networks for Stock Picking [Alessandro Greppi, Maria Elena De Giuli, Claudia Tarantola]

Portfolio selection with Lasso algorithm [Riccardo Bramante, Silvia Facchinetti, Diego Zappa]

Sunspot in Economic Models with Externalities [Beatrice Venturi and Alessandro Pirisinu]

Sequential Clustering based on Dirichlet Process Priors

Roberto Casarin, Andrea Pastore and Stefano F. Tonellato¹

¹ Department of Economics, Ca' Foscari University of Venice (e-mail: stone@unive.it)

Abstract: This paper proposes a new sequential clustering method based on the sequential estimation of the random partition induced by the Dirichlet process. Our approach relies on Sequential Importance Resampling (SIR) and on the estimation of the posterior probabilities that each pair of individuals are generated by the same mixture component. Such estimates do not require the identification of mixture components, and therefore are not affected by label switching. Then, a dissimilarity matrix can be easily built, allowing for the implementation of agglomerative clustering methods.

Keywords: Dirichlet process, sampling importance resampling, agglomerative clustering.

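1 Dirichlet process mixture

A very important class of models in Bayesian nonparametrics is based on the Dirichlet process and is known as the Dirichlet process mixture (Antoniak, 1974). In this model, the observable random variables X_i, i = 1, …, n, are assumed to be exchangeable and generated by the following hierarchical model:

\[
\begin{aligned}
X_i \mid \theta_i &\overset{\text{ind}}{\sim} p(\cdot \mid \theta_i), && (1)\\
\theta_i \mid G &\overset{\text{iid}}{\sim} G, && (2)\\
G &\sim DP(\alpha, G_0), && (3)
\end{aligned}
\]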

where DP(α, G0) denotes a Dirichlet process (DP) with base measure G0 and precision parameter α > 0. Since the DP generates almost surely discrete random measures on the parameter space Θ, ties among the parameter values have positive probability, so that the components of the parameter vector θ = [θ_1, …, θ_n]^T group into clusters. Exploiting the Pólya urn representation of the DP, the model can be rewritten as

where {k} = {1, …, k}, s_{<i} = {s_j, j ∈ {i−1}} (in the rest of the paper, the subscript <i will refer to those quantities that involve all the observations X_{i'} such that i' < i), s_j ∈ {k} for j ∈ {i−1}, and n_j is the number of θ_i's equal to θ*_j. In this model representation, the parameter θ can be expressed as (s, θ*), with s = {s_i : s_i ∈ {k}, i ∈ {n}}, θ* = [θ*_1, …, θ*_k]^T with θ*_j ~ iid G0 and θ_i = θ*_{s_i}. Consequently, the marginal distribution of X_i is a mixture with k components, where k is an unknown random integer.
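As an illustration of the Pólya urn representation, the allocations s_1, …, s_n can be simulated directly. The sketch below assumes the usual form of the urn, in which observation i joins an existing cluster j with probability n_j/(α + i − 1) and opens a new cluster with probability α/(α + i − 1). The function name and its arguments are hypothetical and are not taken from the paper.

```python
import random


def sample_polya_urn_partition(n, alpha, rng=None):
    """Draw allocations s_1..s_n from a Polya urn scheme (assumed standard form).

    Returns the 1-based labels s_i and the cluster sizes n_j.
    """
    rng = rng or random.Random()
    labels = []   # s_i, labelled in order of appearance
    counts = []   # n_j for the clusters opened so far
    for _ in range(n):
        # Existing cluster j carries weight n_j; a new cluster carries weight alpha.
        weights = counts + [alpha]
        j = rng.choices(range(len(weights)), weights=weights)[0]
        if j == len(counts):
            counts.append(1)      # open cluster k + 1
        else:
            counts[j] += 1        # join existing cluster j
        labels.append(j + 1)
    return labels, counts


labels, counts = sample_polya_urn_partition(50, alpha=1.0, rng=random.Random(0))
print(len(counts), "clusters with sizes", counts)
```

The number of clusters k is random and tends to grow with α, which is the sense in which k is an unknown random integer in the mixture representation.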

In a parametric non-Bayesian approach, it would be quite straightforward to cluster the data by maximising the probability of the allocation of each datum to one of the k clusters (with k fixed and known), conditionally on the observed sample (McLachlan & Peel, 2000). Unfortunately, under the assumptions we made, such computations are not feasible even numerically, due to the well-known label switching problem (Frühwirth-Schnatter, 2006). Nevertheless, equations (1)-(4) will be very helpful in building a hierarchical clustering algorithm based on a Bayesian nonparametric model specification.

2 Sampling importance resampling

Under the assumptions we introduced above, following the arguments of MacEachern et al. (1999), we can write the conditional posterior distribution of s_i given x_1, …, x_i, as

We can marginalise the conditional posterior of s_i with respect to θ*, obtaining

Notice that when G0 is a conjugate prior for (1), the computation of (5) and (6) is often straightforward.

MacEachern et al. (1999) introduced the following importance sampler.

SIS algorithm. For i = 1, …, n, repeat steps (A) and (B).

(A) Compute

with n_{k+1} = α.

(B) Generate s_i from the multinomial distribution with

Taking R independent replicas of this algorithm we obtain s_i^(r), i = 1, …, n, r = 1, …, R, and θ*_j ~ p(θ | x^(j)), with x^(j) = {x_i : i ∈ {n}, s_i = j}, and compute the importance weights

such that ∑_{r=1}^R w_r = 1. Should the variance of the importance weights be too large, the efficiency of the sampler could be improved by resampling as follows (Cappé et al., 2005). Compute N_eff = (∑_{r=1}^R w_r²)^(−1). If N_eff < R/2, draw R particles from the current particle set with probabilities equal to their weights, replace the old particles with the new ones and assign them constant weights w_r = 1/R.
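The sampler and the resampling rule can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the kernel is taken to be Gaussian with known variance `sigma2`, the base measure G0 is N(`mu0`, `tau2`), each allocation is drawn with probability proportional to n_j times the posterior predictive density of x_i in cluster j (with n_{k+1} = α for a new cluster), and the particle weight is the product of the one-step predictive densities. All function and parameter names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def sis_particle(x, alpha, mu0, tau2, sigma2, rng):
    """One SIS replica: returns the allocations s and the log importance weight."""
    counts, sums, s, logw = [], [], [], 0.0
    for i, xi in enumerate(x):
        q = []
        for n_j, sum_j in zip(counts, sums):
            # Conjugate update: posterior of the cluster mean, then predictive of x_i.
            post_var = 1.0 / (1.0 / tau2 + n_j / sigma2)
            post_mean = post_var * (mu0 / tau2 + sum_j / sigma2)
            q.append(n_j * normal_pdf(xi, post_mean, post_var + sigma2))
        q.append(alpha * normal_pdf(xi, mu0, tau2 + sigma2))  # n_{k+1} = alpha
        logw += math.log(sum(q) / (alpha + i))  # one-step predictive density of x_i
        j = rng.choices(range(len(q)), weights=q)[0]
        if j == len(counts):
            counts.append(0)
            sums.append(0.0)
        counts[j] += 1
        sums[j] += xi
        s.append(j + 1)
    return s, logw


def normalise(logws):
    m = max(logws)  # subtract the maximum for numerical stability
    w = [math.exp(lw - m) for lw in logws]
    z = sum(w)
    return [v / z for v in w]


def effective_sample_size(w):
    return 1.0 / sum(v * v for v in w)


def sir(x, R, alpha, mu0, tau2, sigma2, seed=0):
    """R independent SIS replicas, resampled when N_eff < R/2."""
    rng = random.Random(seed)
    runs = [sis_particle(x, alpha, mu0, tau2, sigma2, rng) for _ in range(R)]
    particles = [s for s, _ in runs]
    w = normalise([lw for _, lw in runs])
    if effective_sample_size(w) < R / 2:
        particles = rng.choices(particles, weights=w, k=R)
        w = [1.0 / R] * R
    return particles, w


x = [-5.1, -4.9, -5.0, 5.0, 5.2, 4.8]
particles, w = sir(x, R=200, alpha=1.0, mu0=0.0, tau2=25.0, sigma2=1.0, seed=1)
print(effective_sample_size(w))
```

With a conjugate pair such as this one, every quantity in the loop is available in closed form, which is what makes the computation straightforward in that case.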

3 Pairwise dissimilarities and hierarchical clustering

Intuitively, we can state that two individuals, i and j, are similar if x_i and x_j are generated by the same mixture component, i.e. if s_i = s_j. Label switching prevents us from identifying mixture components, but not from assessing similarities among individuals. In fact, the algorithm introduced in the previous section may help us in estimating dissimilarities between individuals. The posterior probability that x_i and x_j are generated by the same component, i.e. the posterior probability of the event {s_i = s_j}, can be estimated as

\[
\hat{p}_{ij} = \sum_{r=1}^{R} w_r \, I\big(s_i^{(r)}, s_j^{(r)}\big),
\]

where I(x, y) = 1 if x = y and I(x, y) = 0 otherwise. We can then define a dissimilarity matrix D with ij-th element d_ij = 1 − p̂_ij, allowing us to use standard agglomerative hierarchical clustering methods based on posterior evidence.
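A minimal sketch of the dissimilarity matrix follows. It assumes that each particle is a list of allocations s_1, …, s_n and that the weights are normalised to sum to one, so that the co-clustering probability of a pair is the weighted share of particles in which the two individuals carry the same label. The function name is hypothetical.

```python
def dissimilarity_matrix(particles, weights):
    """Return D with d_ij = 1 - p_ij, where p_ij is the weighted frequency of s_i == s_j."""
    n = len(particles[0])
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Only equality of labels matters, so relabelling a particle changes nothing.
            p_ij = sum(w for s, w in zip(particles, weights) if s[i] == s[j])
            D[i][j] = D[j][i] = 1.0 - p_ij
    return D


particles = [[1, 1, 2], [1, 2, 2], [1, 1, 1]]
weights = [0.5, 0.25, 0.25]
D = dissimilarity_matrix(particles, weights)
print(D)
```

Because only the events {s_i = s_j} enter the computation, permuting the labels within any particle leaves D unchanged, which is why label switching does not affect it. The resulting matrix can be handed to any agglomerative routine; for instance, SciPy's `scipy.cluster.hierarchy.linkage` accepts the condensed form produced by `scipy.spatial.distance.squareform`.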

4 Discussion

The flexibility of Bayesian nonparametric models improves robustness of classification with respect to finite mixture models. Sampling importance resampling algorithms allow for efficient computations, particularly when the base measure is conjugate to the model likelihood. No restrictions on the parameters or post-processing of the posterior simulations are required.

References

Antoniak, C.E. 1974. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Annals of Statistics, 2, 1152-1174.

Cappé, O., Moulines, E., & Rydén, T. 2005. Inference in Hidden Markov Models. New York: Springer.

Frühwirth-Schnatter, S. 2006. Finite Mixture and Markov Switching Models. Berlin: Springer.

MacEachern, S.N., Clyde, M., & Liu, J.S. 1999. Sequential importance sampling for nonparametric Bayes models: The next generation. The Canadian Journal of Statistics, 27, 251-267.

McLachlan, G., & Peel, D. 2000. Finite Mixture Models. New York: Wiley.