Biometrics By The Border...11. An Approach To Poisson Mixed Models For -Omics Expression Data 12....

BiometricsByTheBorder

InternationalBiometricsSocietyAustralasianRegionConferenceKingscliff,NSW

26-30thNovember2017

BiometricsByTheBorder1. Welcome2. ProgrammeAtAGlance

1. Monday2. Tuesday3. Wednesday4. Thursday

3. ProgrammeAndAbstractsForMonday27thOfNovember1. AgriculturalAndAgri-EnvironmentalStatisticsWithSupportOf

GeospatialInformation,MethodologicalIssues2. DevelopingARegulatoryDefinitionForTheAuthenticationOf

ManukaHoney3. OnTestingRandomEffectsInLinearMixedModels4. AnalysingDigestionData5. APermutationTestForComparingPredictiveValuesInClinical

Trials6. ChallengesAndOpportunitiesWorkingAsAConsultingStatistician

WithAFoodScienceResearchGroup7. RobustSemiparametricInferenceInRandomEffectsModels8. RobustPenalizedLogisticRegressionThroughMaximumTrimmed

LikelihoodEstimator9. AMulti-StepClassifierAddressingCohortHeterogeneityImproves

PerformanceOfPrognosticBiomarkersInComplexDisease10. HowToAnalyseFiveDataPointsWithFun11. AnApproachToPoissonMixedModelsFor-OmicsExpressionData12. BayesianSpatialEstimationWhenAreasAreFew13. Knowledge-GuidedGeneralizedBiclusteringAnalysisForIntegrative

–OmicsAnalysis14. UnderstandingTheVariationInHarvesterYieldMapDataFor

EstimatingCropTraits15. CitizenScienceToSurveillance:EstimatingReportingProbabilities

OfExoticInsectPests16. IntroductionTo“Deltagen”-AComprehensiveDecisionSupportTool

ForPlantBreedersUsingRAndShiny17. EstimatingNitrousOxideEmissionFactors

4. ProgrammeAndAbstractsForTuesday28thOfNovember1. ClusterCapture-Recapture:ANewFrameworkForEstimating

PopulationSize2. PropensityScoreApproachesInThePresenceOfMissingData:

ComparisonOfBalanceAndTreatmentEffectEstimates3. VisualisingModelSelectionStabilityInHigh-DimensionalRegression

Models4. TheMissingLink:AnEquivalenceResultForLikelihoodBased

MethodsInMissingDataProblems5. DimensionalityReductionOfLIBSDataForBayesianAnalysis6. AnalysisOfMelanomaDataWithAMixtureOfSurvivalModels

UtilisingMulti-ClassDLDAToInformMixtureClass7. ForecastingHotspotsOfPotentiallyPreventableHospitalisationsWith

SpatiallyAggregatedLongitudinalHealthData:AllSubsetModelSelectionWithANovelImplementationOfRepeatedK-FoldCross-Validation

8. IdentifyingClustersOfPatientsWithDiabetesUsingAMarkovBirth-DeathProcess

9. ChallengesAnalysingCombinedAgriculturalFieldTrialsWithPartiallyOverlappingTreatments

10. AHiddenMarkovModelForSleepStageDetectionUsingRawTri-AxialWristActigraphy

11. SparsePhenotypingDesignsForEarlyStageSelectionExperimentsInPlantBreedingPrograms

12. ComparisonsOfTwoLargeLong-TermStudiesInAlzheimer'sDisease

13. AOne-StageMixedModelAnalysisOfCanolaChemistryTrials14. ASemi-ParametricLinearMixedModelsForLongitudinally

MeasuredFastingBloodSugarLevelOfAdultDiabeticPatients15. IndividualAndJointAnalysesOfSugarcaneExperimentsToSelect

TestLines5. ProgrammeAndAbstractsForWednesday29thOfNovember

1. StatisticsOnStreetCorners2. EstimatingOverdispersionInSparseMultinomialData3. AssessingMudCrabMeatFullnessUsingNon-InvasiveTechnologies4. AnalysisOfMultivariateBinaryLongitudinalData:Metabolic

SyndromeDuringMenopausalTransition5. StatisticalAnalysisOfCoastalAndOceanographicInfluencesOnThe

QueenslandScallopFishery

6. SavedByTheExperimentalDesign:TestingBycatchReductionAndTurtleExclusionDevicesInThePngPrawnTrawlFishery

7. RethinkingBiosecurityInspections:ACaseStudyOfTheAsianGypsyMoth(AGM)InAustralia

8. SubtractiveStabilityMeasuresForImprovedVariableSelection9. AComparisonOfMultipleImputationMethodsForMissingDataIn

LongitudinalStudies10. SpeciesDistributionModellingForCombinedDataSources11. AFactorAnalyticMixedModelApproachForTheAnalysisOf

GenotypeByTreatmentByEnvironmentData12. TheImpactOfCohortSubstanceUseUponLikelihoodOf

TransitioningThroughStagesOfAlcoholAndCannabisUseAndUseDisorder:FindingsFromTheAustralianNationalSurveyOnMentalHealthAndWell-Being

13. TheLASSOOnLatentIndicesForOrdinalPredictorsInRegression14. Whole-GenomeQTLAnalysisForNestedAssociationMapping

Populations15. AnAsymmetricMeasureOfPopulationDifferentiationBasedOnThe

SaddlepointApproximationMethod16. FastAndApproximateExhaustiveVariableSelectionForGLMsWith

APES17. OrderSelectionOfFactorAnalyticModelsForGenotypeX

EnvironmentInteraction18. MultipleSampleHypothesisTestingOfTheHumanMicrobiome

ThroughEvolutionaryTrees19. StatisticalStrategiesForTheAnalysisOfLargeAndComplexData20. OptimalExperimentalDesignForFunctionalResponseExperiments21. ToPCAOrNotToPCA22. AnEvaluationOfErrorVarianceBiasInSpatialDesigns23. NewModel-BasedOrdinationDataExplorationToolsFor

MicrobiomeStudies24. AlwaysRandomize?25. ComparingClassicalCriteriaForSelectingIntra-ClassCorrelated

FeaturesForThree-ModeThree-WayData26. ExploringTheSocialRelationshipsOfDairyGoats27. BayesianSemi-ParametricSpectralDensityEstimationWith

ApplicationsToTheSouthernOscillationIndex28. EfficientMultivariateSensitivityAnalysisOfAgriculturalSimulators29. BayesianHypothesisTestsWithDiffusePriors:CanWeHaveOur

CakeAndEatItToo?6. ProgrammeAndAbstractsForThursday30thOfNovember

1. AGeneralFrameworkForFunctionalRegressionModelling2. HockeySticksAndBrokenSticks–ADesignForASingle-Arm,

Placebo-Controlled,Double-Blind,RandomizedClinicalTrialSuitableForChronicDiseases

3. BoundingIVEstimatesUsingMediationAnalysisThinking4. CorrelatedBivariateNormalCompetingRisks---Structuring

EstimationInAnIll-PosedProblem5. BayesianRegressionWithFunctionalInequalityConstraints6. GeneticAnalysisOfRenalFunctionInAnIsolatedAustralian

IndigenousCommunity7. ThePerformanceOfModelAveragedTailAreaConfidenceIntervals8. BiasCorrectionInEstimatingProportionsByPooledTesting9. TheParametricCureFractionModelOfOvarianCancer10. TheSkillings-MackStatisticForRanksDataInBlocks

7. PosterAbstracts1. Multi-EnvironmentTrialAnalysisOfAgronomyTrialsUsing

EstablishedPlantPopulationDensity2. AdventuresInDigitalAgricultureInNewZealand3. ModellingCanopyGreennessOverTimeUsingSplinesAndNon-

LinearRegression4. OnTestingMarginalHomogeneityForSquareContingencyTables

WithOrdinalCategories5. AFactorAnalyticApproachToModellingDiseaseProgression

AcrossLeafLayersAndTime6. AssociatingStrawStrengthWithLikelihoodOfHeadLossInBarley

ForWesternAustralia7. ADeepLearningNeuralNetworkModelToIdentifyTheImportant

GenesInMetastaticBreastCancerFromCensoredMicroarrayData8. UsingRAndShinyToDevelopWebBasedSamplingApplications

ForTheAgriculturalAndEducationSectors9. GraphicalNetworkAnalysesInformsBiomarkerDiscoveryVia

QuantificationOfKeyDiseaseRelatedConnections10. Spatio-TemporalCorticalBrainAtrophyPatternsOfAlzheimer's

Disease11. EmpiricalModellingOfFruitFirmnessChangeDuringColour

Conditioning

WelcomeWelcometothe2017BiometricsbytheBorderconferenceinKingscliff,NewSouthWales.AswemeethereonthelandoftheBundjalungpeople,weacknowledgethetraditionalcustodiansofthelandandpayourrespecttotheElderspastandpresent.

IbidaspecialwelcometoallwhotravelledfarfromcountriessuchasBelgium,Brazil,China,Denmark,Germany,Italy,Japan,Nigeria,SouthAfrica,USAandVietnamtocomplementthemajorityfromAustraliaandNewZealand.

TheconferencewillbeopenedbythecurrentPresidentofIBS,ElizabethThompsonandclosedbytheincomingPresident,LouiseRyan.In-between,wehavesixoutstandingkeynotespeakerswhosettheframeworkforanexcitinganddiversescientificprogram.Iinviteyoualltoattendpresentationsontopicsfamiliartoyourresearchandtoexplorewhatisdoneinotherareas.

Wehaveaveryhealthynumberofstudentsandearlycareerresearchersamongthemorethan120delegatesattending,indicatinghowpopular,importantandattractivedatarichresearchcontinuestobeintheyearstocome.

Thisconferencewillbeasuccessbecauseofexcellentworkbymany,inparticularbyRossDarnell,MelissaDobbie,ClaytonForknall,AlisonKellyandBethanyMacdonald,whoformthelocalorganizingcommittee;byJamesCurran(ChairoftheScientificProgram),GarthTarrandEstherMeenkenwhoassistedJames;byWarrenMuller(Treasurer),DavidBaird(Secretary)andbyHansHockeywhoisourofficialphotographer,soalwayssmilewhenheisaround.

LastbutnotleastIthankourPlatinumsponsorsACEMS,GRDCandtheUniversityofSydneyfortheirgenerousfinancialsupport,manyaspectsoftheconferenceareonlypossiblebecauseoftheirhelp.

SamuelMueller,PresidentIBS-AR

ProgrammeAtAGlance

Monday

Time Monday27/11850 Housekeeping900 Welcome

SamuelMueller,PresidentTraditonalwelcometoAustraliabyarepresentativeofthe

BundjalungElizabethThompson,PresidentoftheIBS

940 AgriculturalAndAgri-EnvironmentalStatisticsWithSupportOfGeospatialInformation,MethodologicalIssues

ElisabettaCarfagnaUniversityofBologna,Italy

1030 MorningTea(30minutes)

Modelling Sampling,ComplexSurveys,AndMixedModels

1100 DevelopingARegulatoryDefinitionForTheAuthenticationOfManukaHoneyClaireMcDonald,MinistryforPrimaryIndustries,NZ

OnTestingRandomEffectsInLinearMixedModelsAlanWelsh,ANU

1120 AnalysingDigestionDataMaryannStaincliffe,AgResearchLtd.

APermutationTestForComparingPredictiveValuesInClinicalTrialsKoujiYamamoto,OsakaCityUniversity

1140 ChallengesAndOpportunitiesWorkingAsAConsultingStatisticianWithAFoodScienceResearchGroupM.GabrielaBorgognone,QueenslandDepartmentof

RobustSemiparametricInferenceInRandomEffectsModelsMichaelStewart,UniversityofSydney

AgricultureandFisheries1200 RobustPenalizedLogistic

RegressionThroughMaximumTrimmedLikelihoodEstimatorTongWang,ShanxiMedicalUniversty,China

1220 Lunch(1hour10minutes)1330 AMulti-StepClassifierAddressingCohortHeterogeneity

ImprovesPerformanceOfPrognosticBiomarkersInComplexDisease

JeanYangUniversityofSydney

General/Education Genetics1420 HowToAnalyseFiveDataPoints

WithFunPaulineO'Shaughnessy,UniversityofWollongong

AnApproachToPoissonMixedModelsFor-omicsExpressionDataIreneZeng,UniversityofAuckland

1440 BayesianSpatialEstimationWhenAreasAreFewAswiAswi,QueenslandUniversityofTechnology

Knowledge-guidedGeneralizedBiclusteringAnalysisforintegrative–omicsAnalysisQiLong,UniversityofPennsylvania

1500 UnderstandingTheVariationInHarvesterYieldMapDataForEstimatingCropTraitsDeanDiepeveen,CurtinUniversity

CitizenScienceToSurveillance:EstimatingReportingProbabilitiesOfExoticInsectPestsPeterCaley,CSIRO

1520 Afternoontea(30minutes)Computing/BigData EnvironmentalStatistics

1550 IntroductionTo“Deltagen”-AComprehensiveDecisionSupportToolForPlantBreedersUsingRAndshinyDongwenLuo,AgResearchLtd.

EstimatingNitrousOxideEmissionFactorsAlasdairNoble,AgResearchLtd.

1630 LightningTalksfollowedbyPosterSessionKindlySponsoredbytheGrainsResearchandDevelopment

Corporation

Tuesday

Time Tuesday28/11850 Housekeeping900 ClusterCapture-Recapture:ANewFrameworkForEstimating

PopulationSizeRachelFewster

UniversityofAuckland950 MorningTea(30minutes)

MissingData HighDimensionalData1030 PropensityScore

ApproachesInThePresenceOfMissingData:ComparisonOfBalanceAndTreatmentEffectEstimatesJannahBaker,TheGeorgeInsituteforGlobalHealth

VisualisingModelSelectionStabilityinHigh-DimensionalRegressionModelsGarthTarr,UniversityofSydney

1050 TheMissingLink:AnEquivalenceResultForLikelihoodBasedMethodsInMissingDataProblemsFirouzehNoghrehchi,UniversityofNewSouthWales

DimensionalityReductionofLIBSDataforBayesianAnalysisAnjaliGupta,UniversityofAuckland

Medical Modelling1110 AnalysisofMelanoma

DatawithaMixtureofSurvivalModelsUtilisingMulti-class

ForecastingHotspotsOfPotentiallyPreventableHospitalisationsWithSpatiallyAggregatedLongitudinalHealthData:AllSubsetModelSelectionWithANovel

DLDAtoInformMixtureClassSarahRomanes,UniversityofSydney

ImplementationOfRepeatedk-FoldCross-ValidationMatthewTuson,UniversityofWesternAustralia

1130 IdentifyingClustersOfPatientsWithDiabetesUsingaMarkovBirth-DeathProcessMugdhaManda,UniversityofAuckland

ChallengesAnalysingCombinedAgriculturalFieldTrialsWithPartiallyOverlappingTreatmentsKerryBell,QueenslandDepartmentofAgricultureandFisheries

1150 AHiddenMarkovModelForSleepStageDetectionUsingRawTri-AxialWristActigraphyMichelleTrevenen,UniversityofWesternAustralia

SparsePhenotypingDesignsForEarlyStageSelectionExperimentsInPlantBreedingProgramsNicoleCocks,UniversityofWollongong

1210 ComparisonsOfTwoLargeLong-TermStudiesInAlzheimer'sDiseaseCharleyBudgeon,UniversityofWesternAustralia

AOne-StageMixedModelAnalysisOfCanolaChemistryTrialsDanielTolhurst,UniversityofWollongong

1230 ASemi-ParametricLinearMixedModelsForLongitudinallyMeasuredFastingBloodSugarLevelOfAdultDiabeticPatientsLegesseKassaDebusho,UniversityofSouthAfrica

IndividualAndJointAnalysesOfSugarcaneExperimentsToSelectTestLinesRenataAlcardeSermarini,UniversityofSaoPaulo,Brazil

1250 Lunch(1hour10minutes)

Wednesday

Time Wednesday29/11850 Housekeeping900 StatisticsOnStreetCorners

DianneCookMonashUniversity

950 MorningTea(30minutes)Modelling Fisheries BiometricsI

1030 EstimatingOverdispersioninSparseMultinomialDataFarzanaAfroz,UniverityofOtago

AssessingMudCrabMeatFullnessUsingNon-InvasiveTechnologiesCaroleWright,QueenslandDepartmentofAgricultureandFisheries

AnalysisOfMultivariateBinaryLongitudinalData:MetabolicSyndromeDuringMenopausalTransitionGeoffJones,MasseyUniversity

1050 StatisticalAnalysisOfCoastalAndOceanographicInfluencesOnTheQueenslandScallopFisheryWen-HsiYang,UniversityofQueensland

SavedByTheExperimentalDesign:TestingBycatchReductionAndTurtleExclusionDevicesInThePngPrawnTrawlFisheryEmmaLawrence,CSIRO

RethinkingBiosecurityInspections:ACaseStudyOfTheAsianGypsyMoth(AGM)InAustraliaPetraKuhnert,CSIRO

ModelSelection Agriculture/Horticulture BiometricsII1110 SubtractiveStability

MeasuresForImprovedVariableSelectionConnorSmith,UniversityofSydney

GxEforGenomicPredictionModelsColleenHunt,QueenslandDepartmentofFisheriesandAgriculture

AComparisonOfMultipleImputationMethodsForMissingDataInLongitudinalStudiesMdHamidul

Huque,MurdochChildrensResearchInstitute

1130 SpeciesDistributionModellingForCombinedDataSourcesIanRenner,UniversityofNewcastle

AFactorAnalyticMixedModelApproachForTheAnalysisOfGenotypeByTreatmentByEnvironmentDataLaurenBorg,UniversityofWollongong

TheImpactOfCohortSubstanceUseUponLikelihoodOfTransitioningThroughStagesOfAlcoholAndCannabisUseAndUseDisorder:FindingsFromTheAustralianNationalSurveyOnMentalHealthAndWell-BeingChriannaBharat,NationalDrugandAlcoholResearchCentre,Australia

1150 TheLASSOOnLatentIndicesForOrdinalPredictorsinRegressionFrancisHui,ANU

Whole-GenomeQTLAnalysisForNestedAssociationMappingPopulationsMariaValeriaPaccapelo,QueenslandDepartmentofAgricultureandFisheries

AnAsymmetricMeasureofPopulationDifferentiationBasedontheSaddlepointApproximationMethodLouise

McMillan,UniversityofAuckland

1210 FastAndApproximateExhaustiveVariableSelectionForGLMsWithAPESKevinWang,UniversityofSydney

OrderSelectionOfFactorAnalyticModelsForGenotypeXEnvironmentInteractionEmiTanaka,UniversityofSydney

MultipleSampleHypothesisTestingOfTheHumanMicrobiomeThroughEvolutionaryTreesMartinaMincheva,TempleUniversity,USA


AGM1340 StatisticalStrategiesfortheAnalysisofLargeAndComplexData

LouiseRyanUniversityofTechnologySydneyand

HarvardT.H.ChanSchoolofPublicHealthDesign Multivariate

1430 OptimalExperimentalDesignforFunctionalResponseExperimentsChristopherDrovandi,QueenslandUniversityofTechnology

ToPCAOrNotToPCACatherineMcKenzie,AgResearchLtd.

1450 AnEvaluationofErrorVarianceBiasinSpatialDesignsEmlynWilliams,ANU

NewModel-BasedOrdinationDataExplorationToolsForMicrobiomeStudiesOlivierThas,GhentUniversity,Belgium

1510 AlwaysRandomize?ChrisBrien,UniversityofSouthAustraliaand

ComparingClassicalCriteriaForSelectingIntra-ClassCorrelated

UniversityofAdelaide FeaturesForThree-ModeThree-WayDataLynetteHunt,UniversityofWaikato

1530 Afternoontea(20minutes)Agriculture/Horticulture BayesianStatistics

1550 ExploringthesocialrelationshipsofdairygoatsVanessaCave,AgResearchLtd.

BayesianSemi-ParametricSpectralDensityEstimationWithApplicationsToTheSouthernOscillationIndexClaudiaKirch,UniversityofMagdeburg,Germany

1610 EfficientMultivariateSensitivityAnalysisOfAgriculturalSimulatorsDanielGladish,CSIRO

BayesianHypothesisTestsWithDiffusePriors:CanWeHaveOurCakeAndEatItToo?JohnOrmerod,UniversityofSydney

Thursday

Time Thursday30/11850 Housekeeping900 AGeneralFrameworkForFunctionalRegressionModelling

SonjaGrevenLMUniversity,Munich

950 MorningTea(30minutes)

Biostatistics TheoryAndMethods

1030 HockeySticksAndBrokenSticks–ADesignForASingle-Arm,Placebo-Controlled,Double-Blind,RandomizedClinicalTrialSuitableForChronicDiseases

BoundingIVEstimatesUsingMediationAnalysisThinking

HansHockey,BiometricsMatters TheisLange,UniversityofCopenhagen

1050 CorrelatedBivariateNormalCompetingRisks---StructuringEstimationInAnIll-PosedProblemMalcolmHudson,MacquarieUniversity

BayesianRegressionwithFunctionalInequalityConstraintsJoshuaBon,UniversityofWesternAustralia

Genetics/Medical TheoryAndMethods

1110 GeneticAnalysisOfRenalFunctionInAnIsolatedAustralianIndigenousCommunityRussellThomson,WesternSydneyUniversity

ThePerformanceOfModelAveragedTailAreaConfidenceIntervalsPaulKabaila,LaTrobeUniversity

1130 DeconstructingTheInnateImmuneComponentOfAMolecularNetworkOfTheAgingFrontalCortexEllisPatrick,UniversityofSydney

BiasCorrectionInEstimatingProportionsbyPooledTestingGrahamHepworth,UniversityofMelbourne

1150 TheParametricCureFractionModelofOvarianCancerSerifatFolorunso,UniversityofIbadan,Nigeria

TheSkillings-MackStatisticForRanksDataInBlocksJohnBest,UniversityofNewcastle

1210 ConferenceCloseLouiseRyan


ProgrammeAndAbstractsForMonday27thOfNovember

Keynote:Monday27th9:40Mantra

AgriculturalAndAgri-EnvironmentalStatisticsWithSupportOfGeospatialInformation,MethodologicalIssues

ElisabettaCarfagnaUniversityofBologna

Agri-environmentaltrade-offsareissuescriticalforpolicymakerschargedwithmanagingbothfoodsupplyandthesustainableuseoftheland.Reliabledataarecrucialfordevelopingeffectivepoliciesandforevaluatingtheirimpact.However,oftenthereliabilityofagriculturalandagro-environmentalstatisticsislow.

Duetothetechnologicaldevelopment,inthelastdecades,differentkindsofgeospatialdatahavebecomeeasilyaccessibleatdecreasingpricesandhavestartedtobeanimportantsupporttostatisticsproductionprocess.

Inthispaper,wefocusonmethodologicalissuesrelatedtotheuseofgeospatialinformationforsamplingframeconstruction,sampledesign,stratification,grounddatacollectionandestimationofagriculturalandagri-environmentalparameters.Particularattentionisdevotedtotheimpactofspatialresolutionofdata,changeofsupportaggregationanddisaggregationofspatialdata,whenremotesensingdata,GlobalPositioningSystemsandGeographicInformationSystems(GIS)areusedforproducingagriculturalandagro-environmentalstatistics.

Monday27th11:00Narrabeen

DevelopingARegulatoryDefinitionForTheAuthenticationOfManukaHoney

ClaireMcDonald1,SuzanneKeeling1,MarkBrewer2,andSteveHathaway11MinistryforPrimaryIndustries

2BioSS

ManukahoneyisapremiumexportproductfromNewZealandthathasbeenunderscrutinyduetoclaimsoffraud,adulterationandmislabelling.Althoughthereareseveralindustryapproachesfordefiningmanukahoney,thereiscurrentlynoscientificallyrobustdefinitionsuitableforuseinaregulatorysetting.Assuch,ensuringtheauthenticityofmanukahoneyischallenging.

HerewepresenttheresultsofathreeyearscienceprogrammewhichdevelopedscientificallyrobustdefinitionsformonofloralandmultifloralmanukahoneyproducedinNewZealand.Theprogrammeinvolved:selectingappropriatemarkerstoidentifyhoneysourcedfromLeptospermumscoparium(manuka),establishingplantandhoneyreferencecollections,developingtestmethodstodeterminethelevelsofthemarkersandanalysingthedatageneratedtodevelopthedefinitions.

Thesuitabilityof16markers(chemicalandDNA-based)wereevaluatedforuseinaregulatorydefinitionformanukahoney.Plantsampleswerecollectedfromtwofloweringseasonsrepresentingbothmanukaandnon-manukaspeciesfrombothNewZealandandAustralia.Honeysamples,alsorepresentingmanukaandnon-manukafloraltypes,weresourcedfromsevenNewZealandproductionseasons.Additionally,honeysamplesweresourcedfromanother12countriestoenablecomparison.Allsamplesweretestedforthemarkersbeingevaluatedusingthedevelopedtestmethods.

ThemethodofCART(ClassificationandRegressionTrees)wasusedtodevelopthemonofloralandmultifloralmanukahoneydefinitions.TheCARToutputswerefurtherprocessedusingasimulationapproachtodeterminethesensitivityandtherobustnessofthedefinitions.Thedefinitionsuseacombinationof5markers(4chemicaland1DNA)atsetthresholdstoclassifyasampleasmanukahoneyorotherwise.Wediscussthepracticalitiesofusingthescience-baseddefinitionswithinaregulatorycontext.

Monday27th11:00Gunnamatta

OnTestingRandomEffectsInLinearMixedModels

AlanWelsh1,FrancisHui1,andSamuelMueller21ANU

2UniversityofSydney

Wecanapproachproblemsinvolvingrandomeffectsinlinearmixedmodelsdirectlythroughtherandomeffectsorthroughparameterssuchasthatthevariancecomponentsthatdescribethedistributionsoftherandomeffects.Bothapproachesareusefulbutleadtodifferentissues.Forexample,workingwiththerandomeffectsraisesquestionsofhowtoestimatethemandcanmeandealingwithalargenumberofrandomeffects,whileworkingwithvariancecomponentsleadstotestinghypothesesontheboundaryoftheparameterspace.Inbothcases,findinggoodapproximationstothenulldistributionoftheteststatisticcanbechallengingsomodernapproachesoftenrelyonsimulation.Inthistalk,were-examinetheF-testbasedonlinearcombinationsoftheresponses,fortestingrandomeffectsinlinearmixedmodels.Wepresentageneralderivationofthetest,highlightitscomputationspeed,itsgenerality,anditsexactnessasatest,andreportempiricalstudiesintothefinitesampleperformanceofthetest.Weconcludethepresentationbyreportingourlatestresultsfromongoingresearchthatinvestigatesconnectionsbetweentestingandmodelselectionbydiscussingsometestsofsignificanceofrandomeffectsandexploringtheirrelationshiptomodelselectionprocedures.


AnalysingDigestionData

MaryannStaincliffe,DebbieFrost,MustafaFarouk,andGuojieWuAgResearch

Foodtechnologistareinterestedinunderstandingifaddinggrainand/orvegetablestobeefincreasesthedigestibilityovermeatalone.Digestibilityof

meatwasmeasuredatupto4hoursusingapepsinandpancreatininvitromodel.Thismethodinvolvesusinggelsinlanes,wherethedensityofthecolourofthegelisanindicatorofthepresenceofaproteinorpeptideatthatlevelofkilodalton(kDa).TypicallykDaabove12kDaareconsideredtobetheproteinsandlowerthanthatarethepeptides.Inthepastwehaveanalysedthesedatausingtwoapproaches.Thefirstapproachistoselect4or5bandsthatexpectedtobeimportantandthenuseamixedeffectsmodeltocomparethemeanTraceQuantityofeachofthebands,whereMeattype,Additivetype(vegetableand/orgrain)andTimepointsarefixedfactorsandthegelisarandomeffect.ThesecondapproachistofitacurvetothechangeintheproportionofTraceQuantityabove12kDafromTimezero.Theproblemwiththesetwoapproachesisthatwestruggletoprovideaconsistentinterpretationoftheresults.Therefore,wewillexplorealternativemethodsforanalysingthistypeofdata.


APermutationTestForComparingPredictiveValuesInClinicalTrials

KoujiYamamotoandKanaeTakahashiOsakaCityUniversity

Screeningtestsordiagnostictestsareimportantforearlydetectionandtreatmentofdisease.Therearefourwell-knownmeasurements,sensitivity(SE),specificity(SP),positivepredictivevalue(PPV)andnegativepredictivevalue(NPV)indiagnosticstudies.ForcomparingSEs/SPs,McNemartestiswidelyused,butthereareonlyfewmethodsforthecomparisonofPPVs/NPVs.Moreover,allofthesemethodsarebasedonlarge-sampletheory.

So,inthistalk,firstly,weinvestigatetheperformanceofthosemethodswhenthesamplesizeissmall.Inaddition,weproposeapermutationtestforcomparingtwoPPVs/NPVswecanapplyevenifthesamplesizeissmall.Finally,weshowtheperformanceoftheproposedmethodwithsomeexistingmethodsviasimulationstudies.


ChallengesAndOpportunitiesWorkingAsAConsultingStatisticianWithAFoodScienceResearchGroup

M.GabrielaBorgognoneQueenslandDepartmentofAgricultureandFisheries

Whenanestablishedresearchgrouphasbeenfunctioningformanyyearswithoutastatisticianasanintegralpartoftheteam,welcomingoneintothegroupcanpresentchallengesaswellasopportunitiesforallinvolved.

Challengesfortheresearchgroupinclude,forexample,involvingthestatisticianatthebeginningofthestudyinsteadofoncetheexperimentshavebeencompletedandthedatacollected;acquiringorincreasingknowledgeofexperimentaldesignprinciples;understandingthelimitationsofsomestatisticalanalyses,expandingtherangeofmethodstheyfeelfamiliarwith,andlearningwhen/howtoapplyeachone;andimprovingthepresentationofresultsinthiserawherepoorpresentationisperpetuatedbythegenerallackofsoundstatisticalmethodsintheliteratureoftheresearcharea.Challengesforthestatisticianinclude,forexample,overcominghis/herlackofgeneralknowledgeoftheunderlyingscientificareaanditsspecificvocabulary;determiningwhatexperimentaldesignswouldworkfromapracticalpointofview;developingunderstandingoftheirscientificquestions,datamanagementpractices,andtypesofdatacollected;navigatingthevarioussoftwaretheyuseandcheckingtheiradequaciesandlimitations;and,aboveall,communicatingwithpatienceandperseverance.

Correspondingly,allchallengespresentopportunitiesforimprovementandcollaborationbetweenscientistsandstatisticians.Workingasateamsupportsadecisionmakingprocessthatisrelevanttoindustryandthatisbasedongoodstatisticalpractices.Additionally,ithelpsscientistsbecomemorestatisticallyawareandempowered.AbitmorethanayearagoIstartedworkingasaconsultingstatisticianwithafoodscienceresearchgroup.InthispresentationIwillsharesomeofthechallengesandtheopportunitiestoincorporategoodstatisticalpracticeIhaveidentified,aswellassomeoftheimprovementswehavemadesofarworkingtogetherinthispartnership.


RobustSemiparametricInferenceInRandomEffectsModels

MichaelStewart1andAlanWelsh21UniversityofSydney

2ANU

Wereportonrecentworkusingsemiparametrictheorytoderiveprocedureswithdesirablerobustnessandefficiencypropertiesinthecontextofinferenceconcerningscaleparametersforrandomeffectmodels.


RobustPenalizedLogisticRegressionThroughMaximumTrimmedLikelihoodEstimator

HongweiSun,YuehuaCui,andTongWangShanxiMedicalUniversity

Penalizedlogisticregressionisusedtoidentifygeneticmarkersformanyhigh-dimensionaldatasetssuchasingeneexpression,GWAS,DNAmethylationstudiesandsoon.Butoutlierssometimesoccurduetomisseddiagnosisormisdiagnosisofsubjects,heterogeneityofsamples,technicalproblemsinexperimentsorotherproblems.Theycangreatlyinfluencetheestimationofpenalizedlogisticregression.Fewstudiesfocusontherobustnessofpenalizedmethodswhentheresponsevariableiscategorical,whichisstandardinmedicalresearch.ThisstudyproposedarobustLASSO-typepenalizedlogisticregressionbasedonmaximumtrimmedlikelihood(MTL-LASSO).Thedefinitionofbreakdownpoint(BDP)forpenalizedlogisticregressionwasgivenanditspropertyfortheproposedmethodwasproved.AmodificationofFAST-LTSalgorithmswasusedtoimplementtheestimation.Thereweightedstepwas

addedtoimproveperformancewhileguaranteeingrobustness.Thesimulationstudyshowstheproposedmethodcanresistagainstoutliers.Arealdatasetaboutgeneexpressionprofilesofmultiplesclerosispatientsandhealthycontrolswasanalysed.OutliersinthecontrolgroupidentifiedbyreweightedMTL-LASSObehavedifferentlyfromothers.Itunveilstheremaybeheterogeneityproblemincontrolgroup.Amuchbetterfitisobtainedafterremovingoutliers.

Keynote:Monday27th13:30Mantra

AMulti-StepClassifierAddressingCohortHeterogeneityImprovesPerformanceOfPrognosticBiomarkersInComplexDisease

JeanYangUniversityofSydney

Recentstudiesincancerandothercomplexdiseasescontinuetohighlighttheextensivegeneticdiversitybetweenandwithincohorts.Thisintrinsicheterogeneityposesoneofthecentralchallengestopredictingpatientclinicaloutcomeandthepersonalizationoftreatments.Here,wewilldiscusstheconceptofclassifiabilityobservedinmulti-omicsstudieswhereindividualpatients’samplesmaybeconsideredaseitherhardoreasytoclassifybydifferentplatforms,reflectedinmoderateerrorrateswithlargeranges.Wedemonstrateinacohortof45AJCCstageIIImelanomapatientsthatclinico-pathologicbiomarkerscanidentifythosepatientsthataremostlikelytobemisclassifiedbyamolecularbiomarker.Theprocessofmodellingtheclassifiabilityofpatientswasthenreplicatedinindependentdatafromotherdiseases.

Amulti-stepprocedureincorporatingthisinformationnotonlyimprovedclassificationaccuracyoverallbutalsoindicatedthespecificclinicalattributesthathadmadeclassificationproblematicineachcohort.Instatisticalterms,ourstrategymodelscohortheterogeneityviatheidentificationofinteractioneffectsinahighdimensionalsetting.Atthetranslationallevel,thesefindingsshowthatevenwhencohortsareofmoderatesize,includingfeaturesthatexplainthepatient-specificperformanceofaprognosticbiomarkerinaclassificationframeworkcansignificantlyimprovethemodellingandestimationofsurvival,

aswellasincreaseunderstanding.


HowToAnalyseFiveDataPointsWithFun

PaulineO'Shaughnessy1,StephenRobson2,andLouiseRawlings21UniversityofWollongong

2ANU

While“bigdata”isoneofthebiggestbuzzwords,weoccasionallycomecrossdatawithveryfewdatapoints.ThisdatasetisfromastudyoftheincidenceratesofcommonlyperformedmedicalproceduresInAustralia,whichisavailableonlyatthestatelevel.WehavefivestatesinAustraliathusfivedatapoints.Sowhatcanwedowhenweonlyhavefivedatapoints?Oneofthenoveltyapproachesistofitthedatawitharegressionmodel.Howevergiventhechallengingnatureofthedatawithsmallsize,noguaranteecanbeplacedonthesatisfactoryofthelinearityandhomoscedasticityassumptionofthelinearregression,inturns,theinferencefromthestandardlinearmodeltheoryisnolongervalid.Doublebootstrapisusedtoprovidesolutiontothevalidstatisticalinferenceforthebestlinearapproximationfortherelationshipbetweenvariablesinanassumption-leanregressionsetting.


AnApproachToPoissonMixedModelsFor-OmicsExpressionData

IreneSuilanZengandThomasLumleyUniversityofAuckland

Weareinterestedinregressionmodelsformultivariatedatafromhigh-throughputbiologicalassays('omic'data).Thesedatahavecorrelationsbetweenvariables,andmayalsocomefromstructuredexperiments,soageneralisedlinearmixedmodelisappropriatetofittheexperimentalvariablesanddifferenttypesofomicsdata.However,thenumberofvariablesisoftenlargerthanthe

numberofobservations:astructuredcovariancemodelisnecessaryandsparsityinductionisbiologicallyappropriate.InthispresentationwedescribeanapproachtoPoissonmixedmodels,suitableforRNAseqgeneexpressiondata,basedontranscript-specificrandomeffectswithasparseprecisionmatrix.Weshowbysimulationsthattheoptimalsparsenesspenaltyforregressionmodellingisnotthesameasintheusualgraphestimationproblemandcomparesomeestimationstrategiesinsimulations.


BayesianSpatialEstimationWhenAreasAreFew

AswiAswi,SusannaCramb,EarlDuncan,andKerrieMengersenQueenslandUniversityofTechnlogy

Spatialmodellingwhentherearefew(<20)smallareascanbechallenging.Bayesianmethodscanbebeneficialinthissituationduetotheeaseofspecifyingstructureandadditionalinformationthroughpriors.However,careisneededasthereareoftenfewerneighboursandmoreedges,whichmayinfluenceresults.HereweinvestigateBayesianspatialmodelspecificationwhentherearefewareas,firstthroughasimulationstudy(numberofareasrangingfrom4to2500)andthenapplytoacasestudyondenguefeverin2015inMakassar,Indonesia(14areas).FourdifferentBayesianspatialmodelsnamely,anindependentmodeland3modelsbasedonaCAR(ConditionalAutoregressive)prior:theBesag,York&Mollié,Leroux,andalocalisedmodel(augmentstheCARpriorwithaclustermodelusingpiecewiseconstantintercepts)wereapplied.Dataweregeneratedforthesimulationstudyconsideringlowandhighspatialautocorrelationandlowandhighdiseaseincidence.ModelgoodnessoffitwascomparedusingDevianceInformationCriteria.AnalysisofvarianceandBonferroni'smethodwerealsousedtodeterminewhichmodelsweresignificantlydifferent.Thesimulationstudyshowedmodelsdifferedintheirperformancemainlyintwosituations:1.Whentherewereatleast25areasandboththediseaserateandspatialautocorrelationwaslow,and2.Forallareasizeswhentherewaslowspatialautocorrelationbutahighoveralldiseaserate.Likewise,resultsfromthecasestudyshowedthatallfourmodelsperformedsimilarly.Thisisprobablyduetothelownumberofareasandalowdisease

incidence.


Knowledge-GuidedGeneralizedBiclusteringAnalysisForIntegrative–OmicsAnalysis

ChanggeeChang1,YizeZhao2,MingyaoLi1,andQiLong11UniversityofPennsylvania

2CornellUniversity

Advancesintechnologyhaveenabledgenerationofmultipletypesof-omicsdatainmanybiomedicalandclinicalstudies,anditisdesirabletopoolsuchdatainordertoimprovethepowerofidentifyingimportantmolecularsignaturesandpatterns.However,suchintegrativeanalysespresentnewanalyticalandcomputationalchallenges.Toaddresssomeofthesechallenges,weproposeaBayesiansparsegeneralizedbiclusteringanalysis(GBC)whichenablesintegratingmultipleomicsmodalitieswithincorporationofbiologicalknowledgethroughtheuseofadaptivestructuredshrinkagepriors.Theproposedmethodscanaccommodatebothcontinuousanddiscretedata.MCMCandEMalgorithmsaredevelopedforestimation.Numericalstudiesareconductedtodemonstratethatourmethodsachieveimprovedfeatureselectionandpredictioninidentifyingdiseasesubtypesandlatentdrivers,comparedtoexistingmethods.


UnderstandingTheVariationInHarvesterYieldMapDataForEstimatingCropTraits

DeanDiepeveen1,KarynReeves1,AdrianBaddeley1,andFionaEvans21CurtinUniversity

2MurdochUniversity

Ourrecentexploratoryresearchinvolvesextractingtangiblecroptraitsfromimagessuchasyield-maps.ResearchbyDiepeveenetal(2012)demonstrated

thatgeneticinformationcanbeextractedfromnear-infrared(NIR)imagesusingboththeimplicitknowledgeofdata,environmentaldataandusingamultivariateapproach.Yieldmapdataisgeo-referenceddataofgrain-yieldthatisgeneratedbyaharvestercuttingthecrop,threshingthestrawandextractingthegrain.Ourresultsshowthattheyield-mapdatahassignificantissuesassociatedwithit.Oneissueisthedelayfromtimeofcuttingthecroptoenteringintothestorage-binformeasurement.Thisisdependentonspeedoftheharvesterandiscompoundedwithmaintainingacriticalvolumegoingthroughtheharvestertooperateefficiency.Therearealsoissueswithvariationfromthedensityandplantsizeofthecropwithinthepaddockbeingharvested.Ourpreliminaryresultsjusthighlightthesignificantchallengesinextractingprecisecroptraitsfromyieldmapdata.


CitizenScienceToSurveillance:EstimatingReportingProbabilitiesOfExoticInsectPests

PeterCaley1,MarijkeWelvaert2,andSimonBarry11CSIRO

2UniversityofCanberra

Upuntilmid-2016,citizenscienceuploadstotheAtlasofLivingAustraliaincludedc.400bugspecies,andc.1,000beetlespecies.Giventheshorttimeperiod(c.3years)overwhichmostoftheserecordshaveaccumulated,thisrepresentsaconsiderablereportingeffort.Thekeyappliedquestionfromabiosecuritycontextishowthislevelofreportingtranslatestothedetectionandreportingofexoticinsectpestsintheeventofanincursion.

Weuseacase-controldesigntomodeltheprobabilityofexistinginsectspeciesbeingreportedviacitizensciencechannelsfeedingintotheAtlasofLivingAustralia.Theeffectofinsectfeatures(size,colour,pattern,morphology)andgeographicdistributiononreportingratesareexploredasexplanatoryvariables.Wethenapplythemodeltoexotichighprioritypestspeciestopredicttheirreportingratesintheeventoftheirintroduction.


IntroductionTo“Deltagen”-AComprehensiveDecisionSupportToolForPlantBreedersUsingRAndShiny

DongwenLuoandZulfiJahuferAgResearch

Theobjectiveofthispresentationistointroduceauniquenewplantbreedingdecisionsupportsoftwaretool"DeltaGen",implementedinRanditspackageShiny.DeltaGenprovidesplantbreederswithasingleintegratedsolutionforexperimentaldesigngeneration,dataqualitycontrol,statisticalandquantitativegeneticanalyses,breedingstrategyevaluation/simulationandcostanalysis,patternanalysis,indexselectionandunderlyingbasictheoryonquantitativegenetics.Thissoftwaretoolcouldalsobeusedasateachingresourceinplantbreedingcourses.DeltaGenisavailableasFreewareonthelink:http://agrubuntu.cloudapp.net/shiny-apps/PlantBreedingTool/


EstimatingNitrousOxideEmissionFactors

AlasdairNobleandTonyVanDerWeerdenAgResearch

NitrousOxide(N2O)isanimportantgreenhousegaswithaglobalwarmingpotentialnearly300timesthatofcarbondioxide.UndertheKyotoProtocolNewZealandisrequiredtoreportagreenhousegasinventoryannuallywhichincludesN2O.InNewZealand,95%ofN2Oemissionsarederivedfromnitrogen(N)inputstoagriculturalsoils(e.g.animalexcretaandfertiliser).FieldexperimentsareconductedtoestimatetheseN2Oemissions,wheredataiscollatedandanalysedfollowingastandardmethodologytodetermineemissionfactors,whichestimatetheamountofN2OlostperunitofNappliedtosoil.Howeverforindividualdatasetstherearesomeaspectsofthedataareincompatiblewiththeproposedmodelsosomeadhocadjustmentsaremade.AmorerigorousBayesianapproachisproposedandsomeresultswillbediscussed.

http://agrubuntu.cloudapp.net/shiny-apps/PlantBreedingTool/

ProgrammeAndAbstractsForTuesday28thOfNovember

Keynote:Tuesday28th9:00Mantra

ClusterCapture-Recapture:ANewFrameworkForEstimatingPopulationSize

RachelFewsterUniversityofAuckland

Askanywildlifemanager:theirfirstburningquestionis"Howmanyarethere?",andtheirsecondis"Aretheytrendingupwardsordownwards?"Capture-recaptureisoneofthemostpopularmethodsforestimatingpopulationsizeandtrends.Asthenamesuggests,itreliesonbeingabletoidentifythesameanimaluponmultiplecaptureoccasions.Thepatternofcapturesandrecapturesamongidentifiedanimalsisusedtoestimatethenumberofanimalsnevercaptured.

Physicallycapturingandtagginganimalscanbeadangerousandstressfulexperienceforboththeanimalsandtheirhumaninvestigators-orifittranspiresthattheanimalsactuallyenjoyit,biasedinferencemayresult.Consequently,researchersincreasinglyfavournon-invasivesamplingusingnaturaltagsthatallowanimalstobeidentifiedbyfeaturessuchascoatmarkings,droppedDNAsamples,acousticprofiles,orspatiallocations.Theseinnovationsgreatlybroadenthescopeofcapture-recaptureestimationandthenumberofcapturesamplesachievable.However,theyareimperfectmeasuresofidentity,effectivelysacrificingsamplequalityforquantityandaccessibility.Asaresult,capture-recapturesamplesnolongergeneratecapturehistoriesinwhichthematchingofrepeatedsamplestoasingleidentityiscertain.Instead,theygeneratedatathatareinformative---butnotdefinitive---aboutanimalidentity.

Iwilldescribeanewframeworkfordrawinginferencefromcapture-recapturestudieswhenthereisuncertaintyinanimalidentity.Intheclustercapture-

recaptureframework,weassumethatrepeatedsamplesfromthesameanimalwillbesimilar,butnotnecessarilyidentical,toeachother.Overlapisalsopossiblebetweenclustersofsamplesgeneratedbydifferentanimals.Wetreatthesampledataasaclusteredpointprocess,andderivethenecessaryprobabilisticpropertiesoftheprocesstoestimateabundanceandotherparametersusingaPalmlikelihoodapproach.

Becauseitavoidsanyattemptsatexplicitsample-matching,theclustercapture-recapturemethodcanbeveryfast,takingmuchthesametimetoanalysemillionsofsample-comparisonsasitdoestoanalysehundreds.Iwilldescribeapreliminaryframeworkforabundanceestimationfromacousticmonitoring.Clustercapture-recapturecanalsobeusedforbehaviouralstudies,andIwillshowanexampleusingcamera-trapdatafromapartially-markedpopulationofforestshiprats.

Tuesday28th10:30Narrabeen

PropensityScoreApproachesInThePresenceOfMissingData:ComparisonOfBalanceAndTreatmentEffectEstimates

JannahBaker1,TimWatkins1,2,andLaurentBillot1,31TheGeorgeInsituteforGlobalHealth

2UniversityofSydney3UniversityofNewSouthWales

Theuseofpropensityscoremethodscanpotentiallyimprovebalancebetweengroupsinobservationalstudydata,thusminimisingconfounding.However,afrequentproblemwithsuchstudiesisthepresenceofmissingdata.Wecomparethreeapproachestogeneratingthepropensityscoreaccountingformissingdatawithinthecontextofaclinicalcasestudyexaminingtheeffectoftelemonitoringonhypertension.Overall,4,642patientsdiagnosedwithhypertensionreceivingonlinehealthsupportinMyHealthGuardian–atelephonechronicdiseasesupportprogramofferedbyprivatehealthinsurerHCF–wereofferedatelemonitoringintervention.Ofthese,2,729acceptedandstartedtreatmentbetweenJuly2014toApril2015(designated“cases”),and1,913declined

(designated“controls”).Datawereavailablefromcasesandcontrolsonseveralbaselinevariablesincludingdemographic,lifestyleandclinicalcharacteristics.Outcomeswerethenumberofhospitalisations,totallengthofstayandtotalcostofhospitalisationbetween1Januaryto31December2016.Propensityscoremethodswereusedtobalancebaselinevariablesbetweengroups.Threeapproacheswereusedtogeneratepropensityscoresfromalogisticregressionmodelaccountingformissingdata:1)categorisationofallvariableswith“Missing”asacategory,2)multipleimputationoftreatmenteffect(MIte)wheretreatmenteffectestimatesarecombinedover20imputeddatasets,and3)multipleimputationofpropensityscore(MIps)wherepropensityscoresarefirstaveragedoverimputeddatasetspriortoestimationoftreatmenteffects.Thepropensityscorefromeachapproachwasthenusedintwoways:a)matchingcasestocontrols,andb)inverseprobabilityoftreatmentweighting(IPTW).Thebalanceachievedbyeachapproachwascomparedusingstandardiseddifferencesofmeansandproportionsinbaselinecharacteristicsbetweengroups.Thetreatmenteffectestimatesfromeachapproachwerealsocompared.Thediscussionwillcanvasourfindingsandrecommendationsforhandlingmissingdatawhenusingpropensityscoreapproaches.

Tuesday28th10:30Gunnamatta

VisualisingModelSelectionStabilityInHigh-DimensionalRegressionModels

GarthTarrUniversityofSydney

ThemplotRpackageprovidesaprovidesanimplementationofmodelstabilityandvariableinclusionplotsforresearcherstousetobetterinformthevariableselectionprocess.Theinitialfocuswasonexhaustivesearchesthroughthemodelspace,however,thisquicklybecomesinfeasibleforhighdimensionalmodels.Analternativeapproachforhighdimensionalmodelsistocombinebootstrapmodelselectionwithregularisationprocedures.Thereexistanumberoffastandefficientmethodregularisationmethodsforvariableselectioninhighdimensionalregressionsettings.Wehaveimplementedvariableinclusionplotsandmodelstabilityplotsusingtheglmnetpackage.Wedemonstratetheutilityofthemplotpackageinidentifyingstableregularisedmodelselectionchoices

withrespecttotwomainsourcesofuncertainty.Firstly,byresamplingthedataweareabletodeterminehowoftenvariousmodelsarechosenwhenthedatachanges.Secondly,weareabletoevaluatehowoftencompetingmodelsarechosenacrossarangeofvaluesforthetuningparameter.Exploringthesetwosourcesofuncertaintyinmodelselectiongeneratesalargeamountofrawdatathatneedstobeprocessed.Themplotpackageprovidesavarietyofmethodstovisualisethisrawdatatohelpinformaresearcher'smodelselectionchoice.


TheMissingLink:AnEquivalenceResultForLikelihoodBasedMethodsInMissingDataProblemsFirouzehNoghrehchi,JakubStoklosa,SpiridonPenev,andDavidI.Warton

UniversityofNewSouthWales

Multipleimputationandmaximumlikelihoodestimation(viatheexpectation-maximizationalgorithm)aretwowell-knownmethodsreadilyusedforanalysingdatawithmissingvalues.Thesetwomethodsareoftenconsideredasbeingdistinctfromoneanother,duetotheirconstructionforestimationandtheirtheoreticalproperties.Weshowthatthereisacloserelationshipbetweenthetwomethods.Specifically,weshowthatatypeofmultipleimputationcanbeunderstoodasastochasticexpectation-maximisationapproximationtomaximumlikelihood.Asaresult,wecanexploretheapplicationofarangeoflikelihood-basedtoolsinthemultipleimputationcontextinordertoimproveitsperformance.Inparticular,wedevelopinformationcriteriaforselectinganimputationmodelgivenasetofcompetingmodels,andaflexiblelikelihoodratiotestwhenmodelsarefittedbymultipleimputation.Wedemonstrateourmethodsonrealandsimulateddatasets.


DimensionalityReductionOfLIBSDataForBayesianAnalysis

AnjaliGupta1,JamesCurran1,SallyCoulson2,andChristopherTriggs11UniversityofAuckland

2ESR

In2004,AitkenandLucypublishedanarticledetailingatwo-levellikelihoodratioformultivariatetraceevidence.Thismodelhasbeenadoptedinanumberofforensicdisciplinessuchastheinterpretationofglass,drugs(MDMA),andink.Moderninstrumentationiscapableofmeasuringmanyelementsinverylowquantitiesand,notsurprisingly,forensicscientistswishtoexploitthepotentialofthisextrainformationtoincreasetheweightofthisevidence.Theissue,fromastatisticalpointofview,isthattheincreaseinthenumberofvariables(dimension)intheproblemleadstoincreaseddatademandtounderstandboththevariabilitywithinasource,andinbetweensources.Suchinformationwillcomeintime,butusuallywedon’thaveenough.Onesolutiontothisproblemistoattempttoreducethedimensionalitythroughmethodssuchasprincipalcomponentanalysis.Thispracticeisquitecommoninhighdimensionalmachinelearningproblems.Inthistalk,IwilldescribeastudywhereweattempttoquantifytheeffectsofthisthisapproachontheresultinglikelihoodratiosusingdataobtainedfromaLaserInducedBreakdownSpectroscopy(LIBS)instrument.


AnalysisOfMelanomaDataWithAMixtureOfSurvivalModelsUtilisingMulti-ClassDLDAToInformMixtureClass

SarahRomanes,JohnOrmerod,andJeanYangUniversityofSydney

MelanomaisaprevalentskincancerinAustralia,withcloseto14000newcasesestimatedtobediagnosedin2017.Survivaltimesaremarkedlydifferentfromoneindividualtothenext.Inparticular,thereappearstobethreeclassesofsurvivaloutcome.Thistalkconsidersintegratingsurvivaltimedatawithmicroarraygeneexpressiondata.Weconstructahybridmodelthatseamlesslyintegratesathree-classlineardiscriminantanalysismodel,mixtureofparametric

survivalmodels,andmodelselectioncomponents.Wefitthismodelusingavariationalexpectationmaximization(VEM)approach.Ourmodelselectioncomponentnaturallysimplifiesasafunctionoflikelihoodratiostatisticsallowingnaturalcomparisonswithtraditionalhypothesistestingmethods.Wecompareourmethodwithseveralnaïveapproacheswhichonlyaddressestheclassificationaspectorsurvivalmodelaspectinisolation.


ForecastingHotspotsOfPotentiallyPreventableHospitalisationsWithSpatiallyAggregatedLongitudinalHealthData:AllSubsetModelSelectionWithANovelImplementationOfRepeatedK-FoldCross-Validation

MatthewTuson,BerwinTurlach,KevinMurray,MeiRuuKok,AlistairVickery,andDavidWhyatt

UniversityofWesternAustralia

Itissometimesdifficulttotargetindividualsforhealthinterventionduetolimitedinformationontheirbehaviourandriskfactors.Insuchcasesplace-basedinterventionstargetinggeographical‘hotspots’withhigherthanaverageratesofhealthserviceutilisationmaybeeffective.Manystudiesexistexaminingpredictorsofhotspots,butoftendonotconsiderthatplace-basedinterventionsaretypicallycostlyandtaketimetodevelopandimplement,andhotspotsoftenregresstothemeanintheshort-term.Long-termgeographicalforecastingofhotspotsusingvalidatedstatisticalmodelsisessentialineffectivelyprioritisingplace-basedhealthinterventions.

Existingmethodsforecastinghotspotstendtoprioritisepositivepredictedvalue(i.e.correctpredictions)attheexpenseofsensitivity.Thisworkintroducesmethodstodevelopmodelsoptimisingbothpositivepredictedvalueandsensitivityconcurrently.Thesemethodsutilisespatiallyaggregated

administrativehealthdata,WAcensuspopulationdata,andABSgeographicboundaries,combiningallsubsetmodelselectionwithanovelimplementationofrepeatedcross-validationforlongitudinaldata.Resultsfrommodelsforecasting3-yearhotspotsforfourpotentiallypreventablehospitalisationsarepresented,namely:typeIIdiabetesmellitus,heartfailure,highriskfoot,andchronicobstructivepulmonarydisease(COPD).


IdentifyingClustersOfPatientsWithDiabetesUsingAMarkovBirth-DeathProcess

MugdhaManda,ThomasLumley,andSusanWellsUniversityofAuckland

Estimatingdiseasetrajectorieshasincreasinglybecomemoreessentialtoclinicalpractitionerstoadministereffectivetreatmenttotheirpatients.Apartofdescribingdiseasetrajectoriesinvolvestakingpatients'medicalhistoriesandsociodemographicfactorsintoaccountandgroupingthemintosimilargroups,orclusters.Advancesincomputerisedpatientdatabaseshavepavedawayforidentifyingsuchtrajectoriesinpatientsbyrecordingapatient'smedicalhistoryoveralongperiodoftime(longitudinaldata):westudieddatafromthePREDICT-CVDdataset,anationalprimary-carecohortfromwhichpeoplewithdiabetesfrom2002-2015wereidentifiedthroughroutineclinicalpractice.WefittedaBayesianhierarchicallinearmodelwithlatentclusterstotherepeatedmeasurementsofHbA1candeGFR,usingtheMarkovbirth-deathprocessproposedbyStephens(2000)tohandlethechangesindimensionalityasclusterswereaddedorremoved.


ChallengesAnalysingCombinedAgriculturalFieldTrialsWithPartiallyOverlapping

Treatments

KerryBellandMichaelMumfordQueenslandDepartmentofAgricultureandFisheries

Tomakerecommendationsonwhichmanagementpracticeshavethepotentialtoincreasecropyieldthereneedstobeaconsistentpatterndemonstratedacrosstrialsfrommanyenvironments.Thispresentationconsidersacasestudylookingat31mungbeantrialsinnorthernAustralianfrom2014to2016.Thetrialsdidnotalwayshaveconsistentfactors(e.g.variety,rowspacingortargetplantdensity)orevenconsistentfactorlevels.Toovercometheissueofinconsistentfactors,environmentsweredefinedasthecombinationofsite,yearandanymanagementfactorsnotcommonacrosstrials(e.g.timeofsowing,irrigation,fertiliser).

Therewerenumerousfullfactorialcombinationswithinsubsetsofthedatathatcouldbeconsideredforinvestigationsothefirstchallengewastodeterminewhichfactorialcombinationstofocusontobestaddresstheresearchquestionsandreportingrequirements.Oncethiswasdetermined,allthedatafromthetrialsthatcontributedtothefactorialwereincludedinacombinedanalysisusinglinearmixedmodels.Inthismodel,thefactorialofinterestwaspartitionedinthetestoffixedeffectswhileeachtrials’designparametersandresidualvarianceswereestimatedusingallthedatafromeachtrial.Anexampleoftheabovementionedfactorialcombinationsisenvironmentbyrowspacingforoneparticularvariety.Thenextchallengewasthatwithsomanyenvironmentstherewasusuallyanenvironmentbyrowspacinginteractionwhichwasnotusefulformakingrecommendationsaboutrowspacing.

Clusteringofenvironmentsallowedforminggroupsthatdidnothaveasignificantinteractionbetweenrowspacingandenvironment.Thesegroupswerethengeneralisedtotypesofenvironmentswithcertainresponsestorowspacing.


AHiddenMarkovModelForSleepStageDetectionUsingRawTri-AxialWrist

Actigraphy

MichelleTrevenen1,KevinMurray1,BerwinTurlach1,LeonStraker2,andPeterEastwood1

1UniversityofWesternAustralia2CurtinUniversity

Sleepisacomplexyetorganisedprocessconsistingofregularcyclesofsleepstages.Thesestagesarerapid-eyemovementandnon-rapideyemovement(lightsleepandslow-wavesleep).Sleepstagingisofgreatimportanceinthephysiologicalworldassleepdisordersoccurinaround20%ofthepopulationandareassociatedwithamultitudeofserioushealthimplicationsandconsiderableeconomicburden.HiddenMarkovmodelshavebeensuccessfullyusedintheclassificationofindividualsleepstagesmeasuredbypolysomnography,whichisconsideredthe'goldstandard'inassessingsleep,however,itisintrusiveandcostly.

Actigraphyisincreasinglybeingconsideredasanon-intrusiveandcosteffectivealternativemethodtoobjectivelymeasuresleeppatterns.However,thereislimitedresearchontheabilityofactigraphytodetectindividualsleepstages,furthermore,thisresearchindicatesthatthesecurrentmethodologiesdonothavetheabilitytodoso.Currentactigraphicapproachestosleepdetectionusefiltereduni-axialdatameasuredwithalowsamplingrate,whereas,rawtri-axialdatameasuredathighsamplingratesisfrequentlyusedintheassessmentofday-timeactivities.

Usingsimultaneouslymeasuredactigraphyandpolysomnographydatafrom100healthyyoungadultsintheWesternAustralianPregnancyCohortStudywecreatedandvalidatedanalgorithmtodeterminesleepstagesutilisingraw,tri-axialaccelerationdatafromwristactigraphy.Tenfeaturevariableswerecreatedfromeach30-secondblockofdataand50subjectswereusedtotrainthehiddenMarkovmodelwiththefeaturevariablesusedasinputparameters.Theremaining50subjectswereusedtovalidatethetrainedhiddenMarkovmodelagainstpolysomnography.

Validationsuggestedthatourmodelisabletoclassifysleepstagesusingrawtri-axialactigraphydata.Theseresultsdemonstratethatactigraphy-basedhiddenMarkovmodelscanfeasiblybeusedforautomaticsleepstaging.


SparsePhenotypingDesignsForEarlyStageSelectionExperimentsInPlantBreedingPrograms

NicoleCocks,AlisonSmith,DavidButler,andBrianCullisUniversityofWollongong

Theearlystagesofcerealandpulsebreedingprogramstypicallyinvolveinexcessof500testlines.Thetestlinesarepromotedthroughaseriesoftrialsbasedontheirperformance(yield)andotherdesirabletraitssuchasheat/droughttolerance,diseaseresistance,etc.Itisthereforeimportanttoensurethedesign(andanalysis)ofthesetrialsareefficientinordertoappropriatelyandaccuratelyguidethebreedersthroughtheirselectiondecisions,untilonlyasmallnumberofelitelinesremain.

ThedesignofearlystagevarietytrialsinAustraliaprovidedthemotivationfordevelopinganewdesignstrategy.Thepreliminarystagesoftheseprogramshavelimitedseedsupply,whichlimitsthenumberoftrialsandreplicatesoftestlinesthatcanbesown.Traditionally,completelybalancedblockdesignsorgridplotdesignsweresownatasmallnumberofenvironmentsinordertoselectthehighestperforminglinesforpromotiontothelaterstagesoftheprogram.Givenourunderstandingofvariety(i.e.line)byenvironmentinteraction,thisapproachisnotasensibleoroptimaluseofthelimitedresourcesavailable.

Anewmethodtoallowforalargernumberofenvironmentstobesampledforsituationswhereseedsupplyislimitedandnumberoftestlinesislargewillbediscussed.Thisstrategywillbereferredtoassparsephenotyping,whichisdevelopedwithinthelinearmixedmodelframeworkasamodel-baseddesignapproachtogeneratingoptimaltrialdesignsforearlystageselectionexperiments.


ComparisonsOfTwoLargeLong-Term

StudiesInAlzheimer'sDisease

CharleyBudgeonUniversityofWesternAustralia

TheincidenceofAlzheimer’sdisease(AD),theleadingcauseofdementia,ispredictedtoincreaseatleastthreefoldby2050.Curingthisdiseaseisaglobalpriority.Currently,twomajorstudiesareattemptingtogainfurtherunderstandingofthisdisease;theAlzheimer'sDiseaseNeuroimagingInitiative(ADNI)andtheAustralianImaging,BiomarkerandLifestyleStudy(AIBL).Wedescribethesetwocohortstoassesstheimpactofcombiningthemtoprovidealargercohortforanalyses.

Aninitialcomparisonoftheprotocolswascarriedoutandrecruitmentstrategieswereshowntobemarginallydifferentbetweenthestudies.Inclusioncriteriaspecifiedagesbetween55and90yearsinADNIand>65yearsinAIBL.Marginallydifferentspecificationsfordiseasestageclassificationsofhealthycontrols(HC),mildcognitivelyimpaired(MCI)andADindividualswereobserved,forexample,differentMini-MentalStateExam(MMSE)cut-offs.However,bothstudieshadADdiagnosissupportedbytheNINDS/ARDAcriteria.BaselinecharacteristicswerecomparedbetweenADNIandAIBLcohorts.Overall,AIBLhadmoreHCscomparedtoADNI(69%vs30%),butfewerMCIindividuals(12%vs50%).TheADNIcohorthadahigherlevelofeducationandgenerally,withinadiseaseclassification,therewereminimaldifferencesinbaselineage,sex,MMSE,andPreclinicalAlzheimerCognitiveComposite(PACC)scores.

LongitudinalanalysescomparedthechangeovertimeforthetwocohortsanddiseaseclassificationsforPACCandMMSE.TherewerenosignificantdifferencesincohortswithintheHCandMCIgroups,butwithintheADgroup,subjectsintheADNIcohorthadgenerallyhigherpredictedPACCandMMSEscoresovertimethanthoseinAIBL.

OurresultssuggestthereisthepotentialtocombinetheADNIandAIBLcohortsforanalysispurposestoprovideonemorepowerfuldataset;however,considerationshouldbetakenforsomemeasures.


AOne-StageMixedModelAnalysisOfCanolaChemistryTrials

DanielTolhurst,KyMathews,AlisonSmith,andBrianCullisUniversityofWollongong

TheNationalVarietyTrials(NVT)programisusedbyplantbreedingcompaniestoevaluatetheyieldpotentialofnewcropvarietiesindependentlyacrossalargerangeofAustraliangrowingconditions.BycomparisonwiththeremainingNVTcrops,growerdecisionsarefurthercomplicatedincanolabecauseofitsvulnerabilitytotheinfestationofweeds.Ameasurehistoricallyusedbyfarmersforthemanagementofweedsistheapplicationofaherbicide(chemistry)treatment.Thechoiceofchemistryisimportantasitrestrictsvarietyselectiontothosebredwiththespecifictolerance.ThesetofvarietiescurrentlyevaluatedinNVTaretoleranttooneofthreechemistries,namelyimidazolinone(I;butmarketedasClearfield),glyphosate(RoundupReady;R)ortriazine(T),orhavenospecifictolerance(i.e.conventionalcanola;C).Consequently,canolahasamorecomplextestingregimethantheremainingNVTcropsaseachtrialhasanestedtreatmentstructureinvolvingbothchemistriesandvarieties.

CanolatrialsareconductedinlocationsacrosstheAustraliangrainbeltandreflectbestfarmerpracticeforeachdistrict.Everysiteispartitionedintoseveralfieldblocksandplotsareallocatedtothetreatmentsaccordingtoorthogonalblockdesigns.Asprayboomisusedtoadministereachchemistrybutispragmaticinthesensethatlargeareasaretreatedsimultaneously.Thisprecludestheapplicationofdifferentspraystoplotsinthesameblock.Randomisationisthereforerestrictedsovarietiesinasingleblockaretoleranttothesamechemistry.However,asthenumberofchemistriesandblocksareexactlyequal,thereisnoinformationtoestimatetheexperimentalerrorvariationandbotharestatisticallyconfounded.Consequently,growersarelimitedtoevaluatingvarietieswiththesametoleranceascomparisonsacrosschemistriesareinvalid.Thisalsohasimportantimplicationsonthestatisticalanalysis,whichisdiscussedinthistalk.


ASemi-ParametricLinearMixedModelsForLongitudinallyMeasuredFastingBloodSugarLevelOfAdultDiabeticPatients

TafereTilahun1,BelayBirlie1,andLegesseKassaDebusho21JimmaUniversity

2UniversityofSouthAfrica

Thispaperfocusedonlongitudinaldataanalysisoffastingbloodsugar(FBS)levelofadultdiabeticpatientsatJimmaUniversitySpecializedHospitaldiabeticclinicusinganapplicationofsemi-parametricmixedmodel.ThestudyrevealedthattherateofchangeinFBSlevelindiabeticpatients,duetotheclinicinterventions,doesnotcontinueasasteadypacebutchangeswithtimeandweightofpatients.Furthermore,itclarifiedassociationsbetweenFBSlevelandsomecharacteristicsofadultdiabeticpatientsthatweightofadiabetespatienthasasignificantnegativeeffectwhereaspatientgender,age,typeofdiabetesandfamilyhistoryofdiabetesdidnothaveasignificanteffectonthechangeofFBSlevel.Undervariousvariancestructuresofsubject-specificrandomeffects,thesemi-parametricmixedmodelshadbetterfitthanlinearmixedmodel.Thiswaslikelyduetothelocalizedsplines,whichcapturedmorevariabilityinFBSlevelthanthelinearmixedmodel.


IndividualAndJointAnalysesOfSugarcaneExperimentsToSelectTestLines

AlessandraDosSantos1,ChrisBrien2,4,ClariceG.B.Demétrio1,RenataAlcardeSermarini1,5,GuilhermeA.P.Silva3,andSandroR.Fuzatto3

1UniversityofSãoPaulo2UniversityofSouthAustralia

3CTC-Piracicaba4UniverstiyofAdelaide5UniversityofAdelaide

Intheearlystagesofabreedingprogrammanyfieldtrialsareconducted,consideringseveralsoilandweatherconditions.Inthecaseofsugarcane,theseexperimentsareinstalledusingalargenumberoftestlines,butlimitationsinthefieldandoftheamountofgeneticmaterialdonotallowthereplicationofmany.Thisworkevaluated21trialsfromdifferentregions,fromaBraziliansugarcanebreedingprogram.Eachoftheseexperimentsoccupiedarectangulararrayofaround20rowsby25columnsinmostinstance,theplotswere12mlonger,double-furrowswith0.9mbetweenfurrowswithintheplotand1.5mspacingbetweendifferentplotsand1mbetweencolumns.Allthetrialshadatleast79%oftheareaplantedwithunreplicatedtestlines,theatmost21%oftheplotswereoccupiedbyfourcommercialvarieties,check.Aspecialcheck,interspersedalongdiagonals,wasplantedsystematicallyonadiagonalgrid,theotherthreewereequallyreplicatedandeachreplicatewasspreadoutinthreeneighbouringrowplots.Sevenofthe21experimentshadnosignificantdirectgeneticeffects,11presentedsignificantcompetitionattheresiduallevelandonlyonehadsignificantcompetitionatthegeneticlevel.Thecorrelationbetweentheselectedtestlinesforthedifferentexperimentsinasameregionwaslessthan0.54.However,thegeneticcorrelationwassignificantinthejointanalysesandstrongerthanthatfromtheindividualanalyses.Twosimulationstudieswereperformed:thefirstinvestigatedtheanalysisforasingleexperimentandtheresultsshowthatitisdifficulttofitamodelwhenthereisgeneticcompetitionwithorwithoutresidualcompetition.Thesamedifficultywasobservedinthesecondstudy,whichcomparedtheresultsfromindividualandjointanalyses.Itshowedthat,evenforthejointanalyses,onlyaround45to55%ofthetruebesttestlineswereselected.

ProgrammeAndAbstractsForWednesday29thOfNovember

Keynote:Wednesday29th9:00Mantra

StatisticsOnStreetCorners

DianneCookMonashUniversity

Perceptualresearchisoftenconductedonthestreet,withconveniencesamplingofpedestrianswhohappentobepassingby.Itisthroughexperimentsconductedusingpasser-bysthatwehavelearnedabouttheeffectofchange-blindness(https://www.youtube.com/watch?v=FWSxSQsspiQ)isinplayoutsidethelaboratory.

Indatascience,plotsofdatabecomeimportanttoolsforobservingpatterns,makingdecisions,andcommunicatingfindings.Butplotsofdatacanbevieweddifferentlybydifferentobservers,andoftenprovokeskepticismaboutwhetherwhatyousee"isreallythere?"Withtheavailabilityoftechnologythatharnessesstatisticalrandomisationtechniquesandinputfromcrowdswecanprovideobjectiveevaluationofstructurereadfromplotsofdata.

Thistalkdescribesaninferentialframeworkfordatavisualisation,andtheprotocolsthatcanbeusedtoprovideestimatesofp-values,andpower.Iwilldiscusstheexperimentsthatwehaveconductedthat(1)showthatthecrowd-sourcingdoesprovideresultssimilartostatisticalhypothesistesting,(2)howthiscanbeusedtoimproveplotdesign,(3)p-valuesinsituationswherenoclassicaltestsexist.Examplesfromecologyandagriculturewillbeshown.

JointworkwithHeikeHofmann,AndreasBuja,DeborahSwayne,HadleyWickham,Eun-kyungLee,MahbubulMajumder,NiladriRoyChowdhury,LendieFollett,SusanVanderplas,AdamLoy,YifanZhao,NathanielTomasetti

https://www.youtube.com/watch?v=FWSxSQsspiQ

Wednesday29th10:30Narrabeen

EstimatingOverdispersionInSparseMultinomialData

FarzanaAfrozUniversityofOtago

Thephenomenonofoverdispersionariseswhenthedataaremorevariablethanweexpectfromthefittedmodel.ThisissueoftenariseswhenfittingaPoissonorabinomialmodel.Whenoverdispersionispresent,ignoringitmayleadtomisleadingconclusions,withstandarderrorsbeingunderestimatedandoverly-complexmodelsbeingselected.Inourresearchweconsideredoverdispersedmultinomialdata,whichcanariseinmanyresearchareas.Twoapproachescanbeusedtoanalyseoverdispersedmultinomialdata;theuseofthequasilikelihoodmethodorexplicitmodellingoftheoverdispersionusing,forexample,aDirichlet-multinomialorfinite-mixturedistribution.Useofquasilikelihoodhastheadvantageofonlyrequiringspecificationofthefirsttwomomentsoftheresponsevariable.Forsparsedata,suchasinacontingencytablewithmanylowexpectedcounts,useofquasilikelihoodtoestimatetheamountofoverdispersionwillbeparticularlyuseful,asitmaybedifficulttoobtainreliableestimatesoftheparametersinaDirichlet-multinomialorfinite-mixturemodel.Iconsiderfourestimatorsoftheamountofoverdispersioninsparsemultinomialdata,discusstheirtheoreticalpropertiesandprovidesimulationresultsshowingtheirperformanceintermsofbias,varianceandmeansquarederror.

Wednesday29th10:30Gunnamatta

AssessingMudCrabMeatFullnessUsingNon-InvasiveTechnologies

CaroleWright,SteveGrauf,BrettWedding,PaulExley,JohnMayze,andSuePoole

QueenslandDepartmentofAgricultureandFisheries

Thedecisionofwhetheramudcrabshouldberetainedatharvesthastraditionallybeenbasedonshellhardness.Thisismostcommonlyassessedbyusingthumbpressureappliedtothecarapaceofthemudcrab.Thecarapaceofarecentlymoultedmudcrabwillflexconsiderablyandisthereforereturnedtothewater.Thisassessmenthasalsobeenusedtodividemudcrabsintothreemeatfullnessgrades(A,BandC).ThehighermeatfullnessgradeAmudcrabsfetchagreaterpriceatmarketcomparedtothelowerBandCgrades.Thesubjectivenatureofthisassessmentwillalwaysresultindisputesattheboundariesofthegrades.Bydevelopingamoreobjectivescience-basedmethoddowngradesatthemarketwillbereducedwhileconsumersatisfactionandtheoverallindustryprofitabilitywillincrease.

Ascopingstudywasconductedthatevaluatedinnovativenon-invasivetechnologiestoassessmudcrabmeatfullnessbasedonpercentageyieldrecoveryofcookedmeatfromthedominantindividualmudcrabclaws.Thenon-invasivetechnologiesassessedincludednearinfraredspectroscopy(NIRS),candlingusingvisiblelight,andacousticvelocity.NIRSshowedthemostpotentialandwasreassessedinasecondstudywithslightimprovementstospectracapturemethodsandNIRlightsources.

94livemudcrabsfromtheMoretonBayareawereusedinthesecondstudy.Partialleastsquaresregression(PLS-R)wasperformedtobuildacalibrationmodeltopredictthepercentageyieldrecoveryofcookedmeatbasedonthespectraldata.ThePLS-Rhad and .

Aprincipalcomponentslineardiscriminantanalysis(PC-LDA)wasalsoconductedtodiscriminatebetweenthestandardthreegradesofmudcrabmeatfullness.Thiswascomparedtotheindustrystandardshellhardnessmethod.TheNIRSPC-LDAachievedaminimumof76%correctclassificationforeachofthethreegrades,comparedto24%fortheshellhardnessmethod.

Thenon-invasivetechnologiestrialledalongwiththeresultswillbediscussedinthistalk.

Wednesday29th10:30Bundeena

AnalysisOfMultivariateBinaryLongitudinalData:MetabolicSyndromeDuring

MenopausalTransition

GeoffJonesMasseyUniversity

Metabolicsyndrome(MetS)isamajormultifactorialconditionthatpredisposesadultstotype2diabetesandCVD.Itisdefinedashavingatleastthreeoffivecardiometabolicriskcomponents:1)highfastingtriglyceridelevel,2)lowhigh-densitylipoproteincholesterol,3)elevatedfastingplasmaglucose,4)largewaistcircumference(abdominalobesity),and5)hypertension.IntheUSStudyofWomen’sHealthAcrosstheNation(SWAN),a15-yearmulti-centreprospectivecohortstudyofwomenfromfiveracial/ethnicgroups,theincidenceofMetSincreasedasmidlifewomenunderwentthemenopausaltransition(MT).AmodelissoughttoexaminetheinterdependentprogressionofthefiveMetScomponentsandtheinfluenceofdemographiccovariates.


StatisticalAnalysisOfCoastalAndOceanographicInfluencesOnTheQueenslandScallopFishery

Wen-HsiYang1,AnthonyJ.Courtney2,MichaelF.O’neill2,MatthewJ.Campbell2,GeorgeM.Leigh2,andJerzyA.Filar1

1UniverstiyofQueensland2QueenslandDepartmentofAgricultureandFisheries

Thesaucerscallop(Ylistrumballoti)otter-trawlfisheryusedtobethemostvaluablecommercially-fishedspeciesinQueenslandoceanwaters.Overthelastfewyears,therehasbeengrowingconcernamongfishers,fisherymanagersandscientistsoverthedeclineincatchratesandannualharvest.Aquantitativeassessmentconductedin2016showedthatscallopabundancewasatanhistoriclowlevel.Theassessmentuseddatasourcedfromthefisheryandindependentsurveys.Furtherinformationoncoastalandoceanographicinfluencesareavailableandmayrevealnewfactorsthatinfluencepopulationabundanceof

scallopsandimprovemanagementofthefishery.Inthisstudy,scallopcatchrateabundancedataandcoastalandphysicaloceanographicvariables(e.g.seasurfacetemperatureanomalies,coastalfreshwaterflowandChlorophyll-a)weremodelledtoidentifyspatialandtemporalenvironmentalprocessesimportantforconsiderationinfisherymanagementprocedures.


SavedByTheExperimentalDesign:TestingBycatchReductionAndTurtleExclusionDevicesInThePngPrawnTrawlFishery

EmmaLawrenceandBillVenablesCSIRO

Intrawlingforprawns,theprawncatchisoftenonlyasmallpartoftheresultsofanyonetrawl,withtheremaindercalled"bycatch".Reducingthebycatchcomponent,whilemaintainingtheprawncatch,isanimportantindustrygoal,primarilyforenvironmentalaccreditationpurposes,butalsoforeconomicreasons.

Wedesignedanat-seatrialfortheGulfofPapuaPrawnFishery,involvingfourvesselseachtowing"quadgear"(thatis,4separate,butlinkedtrawlnets)ineachtrawlshot,over18days.Theexperimentwasdesignedtoassesstheeffectivenessof27combinationsofTurtleExcluderDevices(TEDs)andBycatchReductionDevices,(BRDs),withacontrolnet,withoutanyattacheddeviceasoneofthenetsineachquad.AtBiometrics2015wediscussedhowweusedsimulatedannealingtogenerateahighlyefficientdesign,inseveralstages,tomeetthelargenumberofhighlyspecificlogisticalconstraints.

Thefocusofthistalkwillbetheanalysis,whichalsoprovedsomewhatchallenging.Wewillpresenttheresultsofouranalysisanddemonstratewhyputtingthetimeintothinkingaboutandgeneratinganon-standardexperimentaldesignallowedustoaccommodatethevariousglitchesandmisfortunesthatalwaysseemtohappenatsea.


RethinkingBiosecurityInspections:ACaseStudyOfTheAsianGypsyMoth(AGM)InAustralia

PetraKuhnert1,DeanPaini1,PaulMwebaze1,andJohnNielsen21CSIRO

2DepartmentofAgricultureandWaterResources

TheAsiangypsymoth(AGM)(Lymantriadisparasiatica)isaseriousbiosecurityrisktoAustralia’sforestryandhorticulturalindustries.WhilesimilarinappearancetotheEuropeangypsymoth(Lymantriadispardispar),theAsiangypsymothiscapableofflyingupto40kilometresandthereforehasthepotentialtoestablishandspreadinotherareaslikeAustralia.Inaddition,femalesareattractedtolightandwilloviposit(layeggs)indiscriminately.Asaresult,femalesareattractedtoshippingportsatnightandwillovipositonships.Theseshipsthereforehavethepotentialtospreadthismotharoundtheworld.

Thelife-cycleofthemothhasbeenwelldocumentedandisheavilydependentontemperature,witheggsundergoingthreephasesofdiapausebeforehatching.CurrentinspectionsofvesselsarrivingintoAustralianportsfromwhatisdeemedan“atrisk”portisalengthyandcostlyprocess.

ToassisttheDepartmentofAgriculturewiththeirprioritisationofships,wedevelopedanAGMToolintheformofanRShinyAppthat(1)showstheshortestmaritimepathfromanatriskporttoanAustralianportforavesselofinterestand(2)predictstheprobabilityofapotentialhatchandit’sreliabilityusingaclassificationtreemodelthatwasdevelopedtoemulatethelifecycleofthemothfromsimulateddata.Inthistalkwewilldiscussthemethodologythat(1)simulatestheAGMbiologyandpotentialhatchesofeggs,alongwithhowweextractedrelevanttemperaturedatathatwastheprimarydriverofthelifecycleforAGM,and(2)emulatesthissimulateddatausingastatisticalmodel,namelyaclassificationtreetopredicttheprobabilityofapotentialhatch.Wewillalsodiscussabootstrapapproachtoexplorethereliabilityofthepotentialhatchpredicted.


SubtractiveStabilityMeasuresForImprovedVariableSelection

ConnorSmith1,SamuelMüller1,andBorisGuennewig1,21UniversityofSydney

2UniversityofNewSouthWales

ThistalkbuildsupontheInvisibleFence(Jiangetal.,2011)apromisingmodelselectionmethod.Utilizingacombinationofcoeffcient,scaleanddevianceestimatesweareabletoimprovethisresamplingbasedmodelselectionmethodforregressionmodels,bothlinearandgenerallinearmodels.Theintroductionofavariableinclusionplotallowsforavisualrepresentationforthestabilityofthemodelselectionmethodaswellasthevariablesbootstrappedrank.Thesuggestedmethodswillbeappliedtobothsimulatedandrealexampleswithcomparisonsaboutbothcomputationaltimeandeffectivenessmadetoselectionsthroughalternativeselectionprocedures.Wewillreportonourlatestresultsfromongoingworkinscalingupsubtractivestabilitymeasureswhenthenumbersoffeaturesislarge.

References:Jiang,J.,Nguyen,T.,&Rao,J.S.(2011).Invisiblefencemethodsandtheidentificationofdifferentiallyexpressedgenesets.StatisticsandItsInterface,4(3),403-415.


AComparisonOfMultipleImputationMethodsForMissingDataInLongitudinalStudies

MdHamidulHuque1,KatherineLee1,JulieSimpson2,andJohnCarlin11MurdochChildrensResearchInstitute

2UniversityofMelbourne

Multipleimputation(MI)forimputingmissingdataareincreasinglyusedin

longitudinalstudieswheredataaremissingduetonon-responseandlosttofollow-up.Standardmultivariatenormalimputation(MVNI)andfullyconditionalspecifications(FCS)aretheprincipleimputationframeworkavailableforimputingcross-sectionalmissingdata.Anumberofmethodshasbeensuggestedintheliteraturetoimputelongitudinaldataincluding(i)useofstandardFCSandMVNIwithrepeatedmeasurementsasseparatedistinctvariables(ii)useofimputationmethodsbasedongeneralizedlinearmixedmodels.NoclearevaluationoftherelativeperformanceofavailableMImethodsinthecontextoflongitudinaldata.Wepresentacomprehensivecomparisonofthealltheavailablemethodsforimputationlongitudinaldatainthecontextofestimatingcoefficientforbothlinearregressionmodelandlinearmixedeffectmodel.Wealsocomparedtheperformanceofthemethodstoimputebothbinaryandcontinuousdata.Atotalof10differentmethods(MVNI,JM-pan,JM-jomo,standardFCS,FCS-twofold,FCS-MTW,FCS-2lnorm,FCS-2lglm,FCS-2ljomoandFCS-Blimp)arecomparedintermsofbias,standarderrorandcoverageprobabilityoftheestimatedregressioncoefficients.Thesemethodsarecomparedusingasimulationstudybasedonapreviouslyconductedanalysisexploringtheassociationbetweentheburdenofoverweightandqualityoflife(QoL)usingdatafromtheLongitudinalStudyofAustralianChildren(LSAC).WefoundthatbothstandardFCSandMVNIprovidereliableestimatesandcoverageoftheregressionparameters.Amongothermethodslinearmixedmodelsbasedmethods,JM-jomoandFCS-Blimpapproachesholdgreatpromise.


SpeciesDistributionModellingForCombinedDataSources

IanRenner1andOlivierGimenez21UniversityofNewcastle

2Centred’EcologieFonctionnelleetEvolutive

Increasingly,multiplesourcesofspeciesoccurrencedataareavailableforaparticularspecies,collectedthroughdifferentprotocols.Forsingle-sourcemodels,avarietyofmethodshavebeendeveloped:pointprocessmodelsforpresence-onlydata,logisticregressionforpresence-absencedataobtainedthroughsingle-visitsystematicsurveys,andoccupancymodellingfor

detection/non-detectiondataobtainedthroughrepeat-visitsurveys.Insituationsforwhichmultiplesourcesofdataareavailabletomodelaspecies,thesesourcesmaybecombinedviaajointlikelihoodexpression.Nonetheless,therearequestionsabouthowtointerprettheoutputfromsuchacombinedmodelandhowtodiagnosepotentialviolationsofmodelassumptionssuchastheassumptionofspatialindependenceamongpoints.

Inthispresentation,Iwillexplorequestionsofinterpretationoftheoutputfromthesecombinedapproaches,aswellasproposeextensionstocurrentpracticethroughtheintroductionofaLASSOpenalty,sourceweightstoaccountfordifferingqualityofdata,andmodelswhichaccountforspatialdependenceamongpoints.ThisapproachwillbedemonstratedbymodellingthedistributionoftheEurasianlynxineasternFrance.


AFactorAnalyticMixedModelApproachForTheAnalysisOfGenotypeByTreatmentByEnvironmentData

LaurenBorg,BrianCullis,andAlisonSmithUniversityofWollongong

Theaccurateevaluationofgenotypeperformanceforarangeoftraits,includingdiseaseresistance,isofgreatimportancetotheproductivityandsustainabilityofmajorAustraliancommercialcrops.Typically,thedatageneratedfromcropevaluationprogrammesarisefromaseriesoffieldtrialsknownasmulti-environmenttrials(METs),whichinvestigategenotypeperformanceoverarangeofenvironments.

Inevaluationtrialsfordiseaseresistance,itisnotuncommonforsomegenotypestobechemicallytreatedagainsttheafflictingdisease.AnimportantexampleinAustraliaistheassessmentofgenotypesforresistancetoblacklegdiseaseincanolacropswhereitiscommonpracticetotreatcanolaseedswithafungicide.Genotypesareeithergrownintrialsastreated,untreatedorasboth.

ThereareanumberofmethodsfortheanalysisofMETdata.Thesemethods,

however,donotspecificallyaddresstheanalysisofdatawithanunderlyingthree-waystructureofgenotypebytreatmentbyenvironment(GxTxE).Here,weproposeanextensionofthefactoranalyticmixedmodelapproachforMETdata,usingthecanolablacklegdataasthemotivatingexample.

Historicallyintheanalysisofblacklegdata,thefactorialgenotypebytreatmentstructureofthedatawasnotaccountedfor.Entries,whicharethecombinationsofgenotypesandfungicidetreatmentspresentintrials,wereregardedas`genotypes’andatwo-wayanalysisof‘genotypes’byenvironmentswasconducted.

Theanalysisofourexampleshowedthattheaccuracyofgenotypepredictions,andthenceinformationforgrowers,wassubstantiallyimprovedwiththeuseofthethree-wayGxTxEapproachcomparedwiththehistoricalapproach.


TheImpactOfCohortSubstanceUseUponLikelihoodOfTransitioningThroughStagesOfAlcoholAndCannabisUseAndUseDisorder:FindingsFromTheAustralianNationalSurveyOnMentalHealthAndWell-Being

LouisaDegenhardt1,MeyerGlantz2,ChriannaBharat1,AmyPeacock1,LuiseLago1,NancySampson3,andRonaldKessler31NationalDrugandAlcoholResearchCentre

2NationalInstituteonDrugAbuse3HarvardUniversity

Theaimsofthepresentstudyweretousepopulation-levelAustraliandatatoestimateprevalenceandspeedoftransitionsacrossstagesofalcoholandcannabisuse,abuseanddependence,andremissionfromdisorder,andconsiderthepotentialimpactsthatanindividual’sageandsexcohort’slevelofsubstance

usepredictedtransitionsintoandoutofsubstanceuse.Dataonlifetimehistoryofuse,DSM-IVusedisorders,andremissionfromthesedisorderswerecollectedfromparticipants(n=8,463)inthe2007AustralianNationalSurveyofMentalHealthandWellbeingusingtheCompositeInternationalDiagnosticInterview.

Lifetimeprevalenceofalcoholuse,regularuse,abuse,dependence,andremissionfromabuseanddependencewere94.1%,64.5%,22.1%,4.0%,16.1%and2.1%,respectively.Unconditionallifetimeprevalenceofcannabisuse,abuse,dependence,andremissionfromabuseanddependencewere19.8%,6.1%,1.9%,4.0%and1.5%.Increasesintheestimatedproportionofpeopleintherespondent’ssexandagecohortwhousedalcohol/cannabisasofagivenageweresignificantlyassociatedwithmosttransitionsfromusethroughtoremissionbeginningatthesameage.Clearassociationsweredocumentedbetweencohort-levelprevalenceofsubstanceuseandpersonalriskofsubsequenttransitionsofindividualsinthecohortfromusetogreatersubstanceinvolvement.Thisrelationshipremainedsignificantoverandaboveassociationsinvolvingtheindividual’sageofinitiation.Thesefindingshaveimportantimplicationsforourunderstandingofthecausalpathwaysintoandoutofproblematicsubstanceuse.


TheLASSOOnLatentIndicesForOrdinalPredictorsInRegression

FrancisHui1,SamuelMueller2,andAlanWelsh11ANU

2UniversityofSydney

Manyapplicationsofregressionmodelsinvolveordinalcategoricalpredictors.AmotivatingexampleweconsiderisordinalratingsfromindividualsrespondingtoquestionnairesregardingtheirworkplaceintheHouseholdIncomeandLabourDynamicsinAustralia(HILDA)survey,withtheaimbeingtostudyhowworkplaceconditions(mainandpossibleinteractioneffects)affecttheiroverallmentalwellbeing.Acommonapproachtohandlingordinalpredictorsistotreateachpredictorasafactorvariable.Thiscanleadtoaveryhigh-dimensionalproblem,andhasspurredmuchresearchintopenalizedlikelihoodmethodsfor

handlingcategoricalpredictorswhilerespectingthemarginalityprinciple.Ontheotherhand,giventheordinalratingsareoftenregardedasmanifestationsofsomelatentindicesconcerningdifferentaspectsofjobquality,thenamoresensibleapproachwouldbetofirstperformsomesortofdimensionreductionbeforeenteringthepredictedindicesintoaregressionmodel.Inappliedresearchthisisoftenperformedasatwo-stageprocedure,andindoingsofailstoutilizetheresponseinordertobetterpredictthelatentindicesthemselves.

Inthistalk,weproposetheLASSOonLatentIndices(LoLI)forhandlingordinalcategoricalpredictorsinregression.TheLoLImodelsimultaneouslyconstructsacontinuouslatentindexforeachorgroupsofordinalpredictorsandmodelstheresponseasafunctionofthese(andotherpredictorsifappropriate)includingpotentialinteractions,withacompositeLASSOtypepenaltyaddedtoperformselectiononmainandinteractioneffectsbetweenthelatentindices.Asasingle-stageapproach,theLoLImodelisabletoborrowstrengthfromtheresponsetoimproveconstructionofthecontinuouslatentindices,whichinturnproducesbetterestimationofthecorrespondingregressioncoefficients.Furthermore,becauseoftheconstructionoflatentindices,thedimensionalityoftheproblemissubstantiallyreducedbeforeanyvariableselectionisperformed.Forestimation,weproposefirstestimatingthecutoffsrelatingtheobservedordinalpredictorstothelatentindices.Thenconditionalonthesecutoffs,weapplyapenalizedExpectationMaximizationalgorithmviaimportancesamplingtoestimatetheregressioncoefficients.AsimulationstudydemonstratestheimprovedpoweroftheLoLImodelatdetectingtrulyimportantordinalpredictorscomparedtobothtwo-stageapproachesandusingfactorvariables,andbetterpredictiveandestimationperformancecomparedtothecommonlyusedtwo-stageapproach.


Whole-GenomeQTLAnalysisForNestedAssociationMappingPopulations

MariaValeriaPaccapelo1,AlisonKelly1,JackChristopher2,andArunasVerbyla3,4

1QueenslandDepartmentofAgricultureandFisheries2QueenslandAllianceforAgricultureandFood

3Data614CSIRO

Geneticdissectionofquantitativetraitsinplantshasbecomeanimportanttoolinbreedingofimprovedvarieties.ThemostcommonlyusedmethodstomapQTLarelinkageanalysisinbi-parentalpopulationsandassociationmappingindiversitypanels.However,bi-parentalpopulationsarerestrictedintermsofallelicdiversityandrecombinationevents.Despitethefactthatassociationmappingovercomestheselimitations,ithaslowpowertodetectrareallelesassociatedwithatraitofinterest.Multi-parentpopulationssuchasmulti-parentadvancedgenerationinter-cross(MAGIC)andnestedassociationmapping(NAM)populationshavebeendevelopedtocombinestrengthsofbothmappingapproaches,capturingmorerecombinationeventsandallelicdiversitythanbi-parentalpopulationsandinagreaterfrequencythanadiversitypanel.NestedassociationmappingusesmultipleRILfamiliesconnectedbyasinglecommonparent.Suchapopulationstructurepresentssomeadditionalchallengescomparedtotraditionalmapping,inparticularthepopulationdesignandthelargenumberofmolecularmarkersthatneedtobeintegratedsimultaneouslyintotheanalysis.WepresentamethodforQTLmappingforNAMpopulationsadaptedfrommulti-parentwholegenomeaverageintervalmapping(MPWGAIM)wheretheNAMdesignisincorporatedthroughtheprobabilityofinheritingfounderallelesforeverymarkeracrossthegenome.Thismethodisbasedonamixedlinearmodelinaone-stageanalysisofrawphenotypestogetherwithmarkers.Itsimultaneouslyscansthewhole-genomethroughaniterativeprocessleadingtoamulti-locusmodel.TheapproachwasappliedtoawheatNAMpopulationinordertoperformQTLmappingforplantheight.ThemethodwasdevelopedinR,withmaindependenciesbeingtheRpackagesMPWGAIMandasreml.ThisapproachestablishesthebasisforfurtherstudiesandextensionssuchasthecombinationofmultipleNAMpopulations.


AnAsymmetricMeasureOfPopulationDifferentiationBasedOnTheSaddlepointApproximationMethod

LouiseMcMillanandRachelFewsterUniversityofAuckland

Inthefieldofpopulationgeneticstherearemanymeasuresofgeneticdiversityandpopulationdifferentiation.ThebestknownisWright'sFst,laterexpandedbyCockerhamandWeir,whichisverywidelyusedasameasureofseparationbetweenpopulations.Morerecentlyamultitudeofothermeasureshavebeendeveloped,fromGsttoD,allwithdifferentfeaturesanddisadvantages.Onethingthesemeasuresallhaveincommonisthattheyaresymmetric,whichistosaythattheFstbetweenpopulationAandpopulationBisthesameasthatbetweenpopulationBandpopulationA.FollowingmyworkonGenePlot,avisualizationtoolforgeneticassignment,Iamnowworkingonthedevelopmentofanasymmetricmeasure,wherethefitofAintoBmaynotbethesameasthefitofBintoA.Thismeasurewillenablethedetectionofscenariossuchas"subsetting",therelationshipbetweenalarge,diversepopulationAandasmallerpopulationBthathasexperiencedgeneticdriftsincebeingseparatedfromA.Themeasurehasseveralfeaturesthatdistinguishitfromexistingmeasures,andisconstructedusingthesamesaddlepointapproximationmethodunderlyingGenePlot,andwhichisusedtoapproximatethemulti-locusgeneticdistributionsofpopulations.


FastAndApproximateExhaustiveVariableSelectionForGLMsWithAPES

KevinWang,SamuelMueller,GarthTarr,andJeanYangUniversityofSydney

Obtainingmaximumlikelihoodestimatesforgeneralisedlinearmodels(GLMs)iscomputationallyintensiveandremainsasthemajorobstacleforperformingallsubsetsvariableselection.Exhaustiveexplorationofthemodelspace,evenforamoderatelylargenumberofcovariates,remainsaformidablechallengeformoderncomputingcapabilities.Ontheotherhand,efficientalgorithmsforexhaustivesearchesdoexistforlinearmodels,mostnotablytheleapsandboundalgorithmand,morerecently,themixedintegeroptimisationalgorithm.Inthistalk,wepresentAPES(APproximatedExhaustiveSearch)anewmethodthat

approximatesallsubsetselectionforagivenGLMbyreformulatingtheproblemasalinearmodel.Themethodworksbylearningfromobservationalweightsinacorrect/saturatedgeneralisedlinearregressionmodel.APEScanbeusedinpartnershipwithanyotherstate-of-the-artlinearmodelselectionalgorithm,thusenabling(approximate)exhaustivemodelexplorationindimensionsmuchhigherthanpreviouslyfeasible.WewilldemonstratethatAPESmodelselectioniscompetitiveagainstgenuineexhaustivesearchviasimulationstudiesandapplicationstohealthdata.Extensionstoarobustsettingisalsopossible.


OrderSelectionOfFactorAnalyticModelsForGenotypeXEnvironmentInteraction

EmiTanaka1,FrancisHui2,andDavidWarton31UniversityofSydney

2ANU3UniversityofNewSouthWales

Factoranalytic(FA)modelsarewidelyusedacrossarangeofdisciplinesowingtocomputationaladvantagesfromdimensionreductionandpossibleabilitytointerpretthefactors.Inplantbreeding,FAmodelprovidesanaturalframeworktomodelthegenotypexenvironmentinteraction.AnFAmodelisdictatedbythenumberoffactors(orderofthemodel).Ahigherorderlendstomoreparametersinthemodelandthisnecessitatestheorderselectiontoachieveparsimony.Weintroduceanorderselectionmethodviatheorderedfactorlasso(OFAL).Weillustrateitsperformancebasedonasimulationonarealwheatyieldmulti-environmentaltrial.


MultipleSampleHypothesisTestingOfTheHumanMicrobiomeThroughEvolutionaryTrees

MartinaMincheva1,HongzheLi2,andJunChen31TempleUniversity

2UniversityofPennsylvania3MayoClinic

Nextgenerationsequencingtechnologiesmakeitpossibletosurveymicrobialcommunitiesbysequencingnucleicacidmaterialextractedfrommultiplesamples.Themetagenomicreadcountsaresummarizedasempiricaldistributionsonareferencephylogenetictree.ThedistancebetweenthemisevaluatedbytheKantorovich-Rubinstein(kr)metric,equivalenttothecommonlyusedweightedUniFracdistanceonatree.Thispaperproposesamethodtotestthehypothesisthattwosetsofsampleshavethesamemicrobialcomposition.TheasymptoticdistributionsofthekrdistancebetweenthetwoFrechetmeansandtheFrechetvariancesarederivedandareshowntobeindependent.TheteststatisticisdefinedastheratioofthosedistancesanditisshowntofollowanasymptoticF-distribution.Itsgeneralitystemsfromthefactthatthetestisnonparametricandrequiresnoassumptionsontheprobabilitydistributionsofthecountdata.Itisalsoapplicableforvaryingsetsizesandsamplesizes.ItisanextensionofEvansandMatsen(2012)whosuggestatesttoonlycomparetwosinglesamples.Thecomputationalefficiencyoftheproposedtestcomesfromtheexactasymptoticdistributionoftheproposedteststatistic.Extensivedataanalysisshowsthatthetestissignificantlyfasterthanthepermutation-basedmultivariateanalysisofvarianceusingdistancematrices(permanova)(McArdleandAnderson,2001).Atthesametime,ithascorrecttype1errorsandcomparablepower,whichmakesitpreferableintheanalysisoflargescalemicrobiomedata.

Keynote:Wednesday29th13:40Mantra

StatisticalStrategiesForTheAnalysisOfLargeAndComplexData

LouiseRyan1,2,StephenWright1,3,andHonHwang11UniversityofTechnologySydney

2HarvardT.H.ChanSchoolofPublicHealth3AustralianRedCross

Thistalkwillfocusonchallengesthatarisewhenfacedwiththeanalysisofdatasetsthataretoolargeforstandardstatisticalmethodstoworkproperly.Whileonecanalwaysgofortheexpensivesolutionofgettingaccesstoamorepowerfulcomputerorcluster,itturnsoutthattherearesomesimplestatisticalstrategiesthatcanbeused.Inparticular,we'lldiscusstheuseofsocalled"DivideandRecombine"strategiesthatrelegatesomeoftheworktobedoneinadistributedfashion,forexampleviaHadoop.Combiningthesestrategieswithcleversubsamplinganddatacoarseningideascanresultindatasetsthataresmallenoughtomanageonastandarddesktopmachine,withonlyminimalefficiencyloss.TheideasareillustratedwithdatafromtheAustralianRedCross.


OptimalExperimentalDesignForFunctionalResponseExperiments

JeffZhangandChristopherDrovandiQueenslandUniversityofTechnology

Functionalresponsemodelsareimportantinunderstandingpredator-preyinteractions.Thedevelopmentoffunctionalresponsemethodologyhasprogressedfrommechanisticmodelstomorestatisticallymotivatedmodelsthatcanaccountforvarianceandtheover-dispersioncommonlyseeninthedatasetscollectedfromfunctionalresponseexperiments.However,littleinformationseemstobeavailabletothosewishingtoprepareoptimalparameterestimationdesignsforfunctionalresponseexperiments.Wedevelopaso-calledexchangedesignoptimisationalgorithmsuitableforinteger-valueddesignspaces,whichforthemotivatingfunctionalresponseexperimentinvolvesselectingthenumberofpreyusedforeachobservation.Further,wedevelopandcomparenewutilityfunctionsforperformingrobustoptimaldesigninthepresenceofparameteruncertainty,whicharegenerallyapplicable.Themethodsareillustratedusingapublishedbeta-binomialfunctionalresponsemodelforanexperimentinvolvingthefreshwaterpredatorNotonectaglauca(anaquaticinsect)preyingonAsellusaquaticus(asmallcrustacean)asacasestudy.


ToPCAOrNotToPCA

CatherineM.McKenzie1,WeiZhang1,StuartD.Card1,CoryMatthew2,WadeJ.Mace1,andSivaGanesh1

1AgResearch2MasseyUniversity

Whentherearegroupingsofobservationspresentinthedata,manyresearchersresorttoutilisingaPrincipalComponentsAnalysis(PCA)aprioriforidentifyingpatternsinthedata,andthenlooktomapthepatternsobtainedfromPCAtodifferencesamongthegroupings,orattributebiologicalsignaltothem.Isthisappropriate,giventhatPCA’sarenotdesignedtodiscriminatebetweenthegroupings?Isagroup-orientedmultivariatemethodologysuchasmultivariateanalysesofvariance(MANOVA)orCanonicalDiscriminantAnalysis(CDA)preferable?Whichmethodhasmorerelevancewheninvestigatingfactoreffectsofbiochemicalpathways?Weexplorethisquestionviaabiologicalexample.

Twobiologicalmaterials(B1&B2)wereanalysedforthesame19primarymetabolites,withthreefactorsofMethods(M1&M2),Treatment(T1,T2,T3andT4),andAge(A1&A2),withthreereplicatevaluesgivingatotalof48observationsforeachbiologicalmaterial.Univariateandmultivariateanalysesofvariance(ANOVAandMANOVA,respectively)werecarriedout,forwhichthereweremanystatisticallysignificantinteractioneffects.Inaddition,othermultivariatetechniquessuchasPCAandCDAwereusedtoexplorerelationshipsbetweenthevariables.ThequestionremainsastotheappropriatenessofcarryingoutPCAtoexplorebiochemicalpathways,thecomparisonbeingbetweentailoringthepatternextractionaprioritomatchtheknowngroupingswithinthedataversusstartingwithanunrestrainedpatternanalysisandseekingtoexplainthepatternsdetectedpost-analysis?


AnEvaluationOfErrorVarianceBiasInSpatialDesigns

EmlynWilliams1andHans-PeterPiepho21ANU

2UniversityofHohenheim

Spatialdesignandanalysisarewidelyused,particularlyinfieldexperimentation.However,itisoftenthecasethatspatialanalysisdoesnotenhancemoretraditionalapproachessuchasrow-columnanalysis.Itisthenofinteresttogaugethedegreeoferrorvariancebiasthataccrueswhenaspatially-designedexperimentisanalysedasarow-columndesign.ThistalkbuildsontheworkofTedin(1931)who,withR.A.Fisherasadvisor,studiederrorvariancebiasinknight'smoveLatinsquares.


NewModel-BasedOrdinationDataExplorationToolsForMicrobiomeStudies

OlivierThas1,StijnHawinkel1,andLucBijnens21GhentUniversity

2JanssenPharmaceutics

High-throughputsequencingtechnologiesalloweasycharacterizationofthehumanmicrobiome,butthestatisticalmethodsforanalysingmicrobiomedataarestillintheirinfancy.DataexplorationoftenreliesonclassicaldimensionreductionmethodssuchasPrincipalCoordinateAnalysis(PCoA),whichisbasicallyaMultidimensionalScaling(MDS)methodstartingfromecologicallyrelevantdistancemeasuresbetweenthevectorsofrelativeabundancesofthemicroorganisms(e.gBray-Curtisdistance).

Wewilldemonstratethattheseclassicalvisualisationmethodsfailtodealwithmicrobiome-specificissuessuchasvariabilityduetolibrary-sizedifferencesandoverdispersion.Nextweproposeanewtechniquethatisbasedonanegativebinomialregressionmodelwithlog-link,andwhichreliesontheconnectionbetweencorrespondenceanalysisandthelog-linearRC(M)modelsofGoodman(AnnalsofStatistics,vol.13,1985);seealsoZhuetal.(EcologicalModelling,vol.187,2005).InsteadofassumingaPoissondistributionforthecounts,anegativebinomialdistributionisassumed.Tobetteraccountforlibrarysize

effects,weadoptadifferentweightingscheme,whichnaturallyarisesfromtheparameterisationofthemodel.AniterativeparameterestimationmethodisproposedandimplementedintoR.Thenewmethodisillustratedonseveralexampledatasets,anditisempiricallyevaluatedinasimulationstudy.Itisconcludedthatourmethodsucceedsbetterindiscoveringstructureinmicrobiomedatasetsthanwithotherconventionalmethods.

Inthesecondpartofthepresentationweextentthemodel-basedmethodtoaconstrainedordinationmethodbyusingsample-specificcovariatedata.Themethodlooksforatwo-dimensionalvisualisationthatoptimallydiscriminatesbetweenspecieswithrespecttotheirsensitivitytoenvironmentalconditions.AgainwebuilduponresultsofZhuetal.(2005)andZhangandThas(StatisticalModelling,vol.12,2012).Themethodisillustratedonrealdata.

AllmethodsareavailableasanRpackage.


AlwaysRandomize?

ChrisBrien1,21UniversityofSouthAustralia

2UniverstiyofAdelaide

Fishergaveusthreefundamentalprinciplesfordesignedexperiments:replication,randomizationandlocalcontrol.Consonantwiththis,Brienetal.(2011)[Brien,C.J.,Harch,B.D.,Correll,R.L.,&Bailey,R.A.(2011)Multiphaseexperimentswithatleastonelaterlaboratoryphase.I.Orthogonaldesigns.JournalofAgricultural,Biological,andEnvironmentalStatistics,16,422-450.]exhorttheuseofrandomizationinmultiphaseexperimentsviatheirPrinciple7(Allocateandrandomizeinthelaboratory).Thisprincipleisqualifiedwith‘whereverpossible’,whichleadstothequestion‘whenisrandomizationnotpossible?’.

Situationswhererandomizationisnotapplicablewillbedescribedforbothsingle-phaseandmultiphaseexperiments.Thereasonsfornotrandomizingincludepracticallimitationsand,formultiphaseexperiments,difficultyinestimatingvarianceparameterswhenrandomizationisemployed.Forthelatter

case,simulationstudiescanvassinganumberofpotentialdifficultieswillbedescribed.ANonrandomizationPrinciple,andanaccompanyinganalysisstrategy,formultiphaseexperimentswillbeproposed.


ComparingClassicalCriteriaForSelectingIntra-ClassCorrelatedFeaturesForThree-ModeThree-WayData

LynetteHunt1andKayeBasford21UniversityofWaikato

2UniversityofQueensland

Manyunsupervisedlearningtasksinvolvedatasetswithbothcontinuousandcategoricalattributes.Onepossibleapproachtoclusteringsuchdataistoassumethatthedatatobeclusteredcomefromafinitemixtureofpopulations.Therehasbeenextensiveuseofmixtureswherethecomponentdistributionsaremultivariatenormalandwherethedatawouldbedescribedastwomodetwowaydata.Thefinitemixturemodelcanalsobeusedtoclusterthreewaydata.Themixturemodelapproachrequiresthespecificationofthenumberofcomponentstobefittedtothemodelandtheformofthedensityfunctionsoftheunderlyingcomponents.

Thistalkillustratestheperformanceofseveralcommonlyusedmodelselectioncriteriainselectingboththenumberofcomponentsandtheformofthecorrelationstructureamongsttheattributeswhenfittingamixturemodeltothefinitemixturemodeltoclusterthreewaydatacontainingmixedcategoricalandcontinuousattributes


ExploringTheSocialRelationshipsOfDairyGoats

VanessaCave1,BenjaminFernoit2,JimWebster1,andGosiaZobel11AgResearch

2AgrosupDijon

Goatsaresentientbeingscapableofanemotionalresponsetotheirlives.Yetdespitethis,thesocialrelationshipsbetweenanimalsarelargelyoverlookedincommercialsystems.Goatshavebeenshowntorecognise,andmakedecisionsbasedon,thepresenceofotherspecificgoats.Thedegreetowhichtheychoosetoassociatewithindividualshasnotbeenestablished,butsocialbondsinotheranimalshavebeenshowntobufferagainststressfulsituations,andconverselycanbeasignificantsourceofstresswhensuchbondsaredisrupted.

Toinvestigatewhethersocialrelationshipsexistamongdairygoats,4non-consecutivedaysofvideofocusingonagroupof12goatswasanalysed.Atoneminutescanintervals,theproximityofeverygoatrelativetoeveryothergoatwasrecordedasanordinalvariablewithfourlevels(incontact,withinaheadlength,withinabodylength,oralone).Eachgoat’slocationinthepenwasalsonoted(feeding,bedding,climbingplatform).

Avarietyofstatisticaltechniques,includingheatmapsandnetworkanalyses,wereusedtostudythesocialrelationshipsamonggoatsbasedonproximity.Socialrelationshipswerecharacterisedbyspecificpairsofgoatsreliablyspendingalotoftimeincloseproximity.

Resultsindicatethatwhilstsomegoatswere“sociable”(e.g.,spendingmorethan60%oftheirtimewithothergoats),otherstendedtobe“loners”(e.g.,spendingmorethan60%oftheirtimealone).Interestingly,therewasevidenceofbothpreferredandavoidedcompanionships.

Thissmall-scalestudyprovidesthefirstevidencetosuggestthatcommonmanagementpracticesresultingintheregroupingofdairygoatscouldhaveanimpactontheirwelfare.


BayesianSemi-ParametricSpectralDensityEstimationWithApplicationsToThe

SouthernOscillationIndex

ClaudiaKirch1,MattEdwards2,AlexanderMeier1,andRenateMeyer21UniversityofMagdeburg2UniversityofAuckland

StandardtimeseriesmodellingisdominatedbyparametricmodelslikeARMAandGARCHmodels.EventhoughnonparametricBayesianinferencehasbeenarapidlygrowingareaoverthelastdecade,onlyveryfewnonparametricBayesianapproachestotimeseriesanalysishavebeendeveloped.Mostnotably,CarterandKohn(1997),Gangopadhyay(1998),Choudhurietal.(2004),andRosenetal(2012)usedWhittle'slikelihoodforBayesianmodelingofthespectraldensityasthemainnonparametriccharacteristicofstationarytimeseries.Ontheotherhand,frequentisttimeseriesanalysesareoftenbasedonnonparametrictechniquesencompassingamultitudeofbootstrapmethods(KreissandLahiri,2011,KirchandPolitis,2011).

AsshowninContreras-Cristanetal.(2006),thelossofefficiencyofthenonparametricapproachusingWhittle'slikelihoodapproximationcanbesubstantial.Ontheotherhand,parametricmethodsaremorepowerfulthannonparametricmethodsiftheobservedtimeseriesisclosetotheconsideredmodelclassbutfailifthemodelismisspecified.Therefore,wesuggestanonparametriccorrectionofaparametriclikelihoodthattakesadvantageoftheefficiencyofparametricmodelswhilemitigatingsensitivitiesthroughanonparametricamendment.WeuseanonparametricBernsteinpolynomialprioronthespectraldensitywithweightsinducedbyaDirichletprocess.ContiguityandposteriorconsistencyforGaussianstationarytimeserieshavebeenshowninapreprintbyKirchetal(2017).BayesianposteriorcomputationsareimplementedviaaMH-within-Gibbssamplerandtheperformanceofthenonparametricallycorrectedlikelihoodisillustratedinasimulation.WeusethisapproachtoanalysethemonthlytimeseriesoftheSouthernOscillationIndex,oneofthekeyatmosphericindicesforgaugingthestrengthofElNinoeventsandtheirpotentialimpactsontheAustralianregion.


EfficientMultivariateSensitivityAnalysisOf

AgriculturalSimulators

DanielGladishCSIRO

Complexmechanisticcomputermodelsoftenproducemultivariateoutput.Sensitivityanalysiscanbeusedtohelpunderstandsourcesofuncertaintyinthesystem.Muchoftheliteraturearoundsensitivityanalysishasfocusedonunivariateoutput,withsomerecentadvancesusingmultivariatecorrelatedoutput.Onepromisingmethodformultivariatesensitivityanalysisinvolvesdecompositionthroughbasisfunctionexpansion.However,thesemethodsoftenrequireseveralmodelrunsandmaystillbecomputationallyintensiveforpracticalpurposes.Emulatorshavebeenaprovenmethodforreducingcomputationaltimeforunivariatesensitivityanalysis,withsomerecentdevelopmentformultivariatecomputermodels.Weproposetheuseofgeneralizedadditivemodelsandrandomforestscombinedwithaprincipalcomponentanalysisforemulationforamultivariatesensitivityanalysis.Wedemonstrateourmethodusingacomplexagriculturalsimulator.


BayesianHypothesisTestsWithDiffusePriors:CanWeHaveOurCakeAndEatItToo?

JohnOrmerod,MichaelStewart,WeichangYu,andSarahRomanesUniversityofSydney

WeintroduceanewclassofpriorsforBayesianhypothesistesting,whichwename"cakepriors".ThesepriorscircumventBartlett'sparadox(alsocalledtheJeffreys-Lindleyparadox);theproblemassociatedwiththeuseofdiffusepriorsleadingtononsensicalstatisticalinferences.Cakepriorsallowtheuseofdiffusepriors(havingonescake)whileachievingtheoreticallyjustifiedinferences(eatingittoo).WedemonstratethismethodologyforBayesianhypothesestestsforscenariosunderwhichtheoneandtwosample -tests,andlinearmodelsaretypicallyderived.Theresultingteststatisticstaketheformofapenalized

likelihoodratioteststatistic.ByconsideringthesamplingdistributionunderthenullandalternativehypothesesweshowforindependentidenticallydistributedregularparametricmodelsthatBayesianhypothesistestsusingcakepriorsarestronglyChernoff-consistent,i.e.,achievezerotypeIandIIerrorsasymptotically.Lindley'sparadoxisalsodiscussed.

ProgrammeAndAbstractsForThursday30thOfNovember

Keynote:Thursday30th9:00Mantra

AGeneralFrameworkForFunctionalRegressionModelling

SonjaGrevenandFabianScheiplLMUMunich

Recenttechnologicaladvancesgenerateanincreasingamountoffunctionaldata,datawhereeachobservationrepresentsacurveoranimage(RamsayandSilverman,2005).Examplesoftechnologiesthatgeneratefunctionaldataincludeimagingtechniques,accelerometers,spectroscopyandspectrometry.Anykindofmeasurementcollectedovertime-datausuallyreferredtoaslongitudinal-canalsobeviewedaspotentiallysparselyobservedfunctionaldata.

Researchersareincreasinglyinterestedinregressionmodelsforfunctionaldatatorelatefunctionalobservationstoothervariablesofinterest.Wewilldiscussacomprehensiveframeworkforadditive(mixed)modelsforfunctionalresponsesand/orfunctionalcovariates.Theguidingprincipleistoreframefunctionalregressionintermsofcorrespondingmodelsforscalardata,allowingtheadaptationofalargebodyofexistingmethodsforthesenoveltasks.Theframeworkencompassesmanyexistingaswellasnewmodels.Itincludesregressionfor'generalized'functionaldata,meanregression,quantileregressionaswellasgeneralizedadditivemodelsforlocation,shapeandscale(GAMLSS)forfunctionaldata.Itadmitsmanyflexiblelinear,smoothorinteractiontermsofscalarandfunctionalcovariatesaswellas(functional)randomeffectsandallowsflexiblechoicesofbases-inparticularsplinesandfunctionalprincipalcomponents-andcorrespondingpenaltiesforeachterm.Itcoversfunctionaldataobservedoncommon(dense)orcurve-specific(sparse)grids.Penalized

likelihoodbasedandgradient-boostingbasedinferenceforthesemodelsareimplementedinRpackagesrefundandFDboost,respectively.Wealsodiscussidentifiabilityandcomputationalcomplexityforthefunctionalregressionmodelscovered.Arunningexampleonalongitudinalmultiplesclerosisimagingstudyservestoillustratetheflexibilityandutilityoftheproposedmodelclass.ReproduciblecodeforthiscasestudyisalsoavailableonlinewiththerecentdiscussionpaperGrevenandScheipl(2017)thistalkisbasedon.

Thursday30th10:30Narrabeen

HockeySticksAndBrokenSticks–ADesignForASingle-Arm,Placebo-Controlled,Double-Blind,RandomizedClinicalTrialSuitableForChronicDiseases

HansHockey1andKristianBrock21BiometricMattersLtd.

2CancerResearchUKClinicalTrialsUnit

Thisworkismotivatedandexemplifiedbyageneticdisordercausingearlyonsetdiabetes,blindnessanddeafness,whichisextremelyrare,inevitablyfatalandhasnocurrentdirecttreatment.Whilethestandardplacebo-controlledRCTisthegoldstandardrequiredbytheregulatoryagencyforanewproposeddrugstudy,itisconjecturedthatpotentialstudyparticipantswillpreferadesignwhichguaranteesthattheyarealwaysassignedtothedrugunderstudy.Asingle-armdesignisproposedwhichmeetsthispatientneedandhenceprobablyincreasesrecruitmentandcompliance.Atthesametime,itmeetstherequirementforfullrandomization.Analyseswhichfollownaturallyfromthisdesignarealsodescribedandwereusedintrialsimulationsforsamplesizingandforexaminationoftheeffectofunderlyingassumptions.

Thursday30th10:30Gunnamatta

BoundingIVEstimatesUsingMediation

AnalysisThinking

TheisLange1,21UniversityofCopenhagen

2PekingUniversity

Inthispaperweinitiallydemonstratethatthewell-knownassumptionsforconductingIV-analysescanequivalentlybeexpressedusingnaturaleffectsfrommediationanalysis.Viewingtheassumptions,anindeedthewholeIVanalysis,fromamediationanalysisperspectiveopensupanovelpossibilityforboundingthetruecausaleffectoftheexposurewhenthedistributionalassumptionsoftheIVanalysisfail;e.g.ifthereisaninteractionsbetweentheunmeasuredconfoundersandexposure.Theprocedureworksacrossalleffectscales(riskdifference,hazardratioetc.)andtypesofviolationsofthedistributionalassumptions.TheproposedmethodcanalsobeviewedasasensitivityanalysisfortheIV-analysiswherethereisonlyasingletuningparameter,whichcanbestraightforwardlyinterpreted.ForthepurelybinarycasetheproposedboundsconvergestotheBalkeandPearlboundswhenthetuningparametertendstoinfinity(i.e.wheneventhemostextrememisspecificationisconsidered).Astheproposedmethodiscomputationallydemandingsometimeisspentonimplementationconsiderations.


CorrelatedBivariateNormalCompetingRisks---StructuringEstimationInAnIll-PosedProblem

MalcolmHudsonMacquarieUniversity

Estimationofcorrelationbetweencompetingriskshaslongbeenknowntobeanill-posedproblem,duetolackofidentifiablity,termedtheidentifiabilitycrisis[Crowder,ScandJStat1991].Recently,manysemi-parametricmodelsandafullyparametricmodelhavebeenusedtopermitestimationandassess

sensitivityofresultstodegreeofcorrelation[JeongandFineBiostatistics2007;seealsoTaietal,SIM2008].Parametricmodelsarenon-standardandcalculationscomplex;JeongandFine'sparametricmodelemploysaGompertzdistribution.Forlog-Normaldata,complexityofcalculationsinabivariateNormal(BVN)modelisforbiddinganddirectoptimizationunstable.

WedescribeaparametricsolutionavailableintheBVNcase.OurapproachusesanEMalgorithmforcompetingrisks(generalisingtheAitkin'sEMforunivariatesurvival).ComplexcalculationsareevaluatedinclosedformusingalemmaofStein[Liu,StatistProbLetters1994].Thisprobabilisticandstatisticalplatformforexploringtheill-posednessisimplementedwithinthebncR-package(inpreparation).

Weintroducethesevariouscomponentsinthetalk.


BayesianRegressionWithFunctionalInequalityConstraints

JoshuaBon1,BerwinTurlach1,KevinMurray1,andChristopherDrovandi21UniversityofWesternAustralia

2QueenslandUniversityofTechnology

WeinvestigatehowtoconductBayesianinferencewithfunctionalinequalityconstraintsovertheparameterspace.Thisproblemisanalogoustosemi-infiniteprogramminginoptimisation.AnovelmethodusingSequentialMonteCarlo(SMC)isgiven.TheSMCalgorithmapproximatesthedistributionofminimainordertosuccessivelyiteratetowardstheconstrainedparameterspace.Wedemonstrateonforensicmorphometricdataofhumanskullswheremonotonicityisrequiredinmorethanonedimension.Methodsarecomparedtothosecurrentlyavailablewithonedimensionalconstraintsandtheresultsdiscussed.


GeneticAnalysisOfRenalFunctionInAn

IsolatedAustralianIndigenousCommunity

RussellThomson1,BrendanMcMorran2,WendyHoy3,MatthewJose4,TimThornton5,GaétanBurgio2,andSimonFoote2

1WesternSydneyUniversity2ANU

3UniversityofQueensland4UniversityofTasmania

5UniversityofWashington

Incloseconsultationwiththelocallandcouncil,andwithethicalapprovalfrommanyethicscommittees,wehaveperformedagenome-wideassociationstudy(GWAS)onasamplecohortfromthe1990s.Wealsohaveafollowupdatasetof120studyparticipantsfrom2014,withDNAsequencedata.

Iwilldiscussthestatisticalanalysesofthesedatasets,withrespecttoissuesofdegradedDNA,higherrorratesandcorrelatedsamples.


ThePerformanceOfModelAveragedTailAreaConfidenceIntervals

PaulKabailaLaTrobeUniversity

Commonlyinappliedstatistics,thereissomeuncertaintyastowhichexplanatoryvariablesshouldbeincludedinthemodel.Frequentistmodelaveraginghasbeenproposedasamethodforproperlyincorporatingthis"modeluncertainty"intoconfidenceintervalconstruction.Suchproposalshavebeenofparticularinterestinenvironmentalandecologicalstatistics.

Theearliestapproachtotheconstructionoffrequentistmodelaveragedconfidenceintervalswastofirstconstructamodelaveragedestimatoroftheparameterofinterestconsistingofadata-basedweightedaverageoftheestimatorsofthisparameterunderthevariousmodelsconsidered.Themodel

averagedconfidenceintervaliscenteredonthisestimatorandhaswidthproportionaltoanestimateofthestandarddeviationofthisestimator.However,thedistributionalassumptiononwhichthisconfidenceintervalisbasedhasbeenshowntobecompletelyincorrectinlargesamples.

AnimportantconceptualadvancewasmadebyFletcher&Turek(2011)andTurek&Fletcher(2012)whoputforwardtheideaofusingdata-basedweightedaveragesacrossthemodelsconsideredofproceduresforconstructingconfidenceintervals.Inthiswaythemodelaveragedconfidenceintervalisconstructedinasinglestep,ratherthanfirstconstructingamodelaveragedestimator.

WereviewtheworkofKabailaetal(2016,2017)whichevaluatestheperformanceofthemodelaveragedtailareaconfidenceintervalofTurek&Fletcher(2012)inthe“testscenario”oftwonestednormallinearregressionmodels.Ourassessmentofthisconfidenceintervalisthatitperformsquitewellinthisscenario,providedthatthedata-basedweightfunctioniscarefullychosen.

References:

1. Kabaila,P.,Welsh,A.H.,&Abeysekera,W.(2016)Model-averagedconfidenceintervals.ScandinavianJournalofStatistics.

2. Kabaila,P.,Welsh,A.H.andMainzer,R.(2017)Theperformanceofmodelaveragedtailareaconfidenceintervals.CommunicationsinStatistics-TheoryandMethods.


DeconstructingTheInnateImmuneComponentOfAMolecularNetworkOfTheAgingFrontalCortex

EllisPatrick1,MarikoTaga2,MartaOlah2,Hans-UlrichKlein2,CharlesWhite3,JulieSchneider4,LoriChibnik5,DavidBennett4,SaraMostafavi6,

ElizabethBradshaw2,andPhilipDeJager21UniversityofSydney2ColumbiaUniversity

3TheBroadInstitute4RushUniversity

5HarvardUniversity6UniverstiyofBritishColumbia

Alzheimer'sdiseaseispathologicallycharacterizedbytheaccumulationofneuritic-amyloidplaquesandneurofibrillarytanglesinthebrainandclinicallyassociatedwithalossofcognitivefunction.ThedysfunctionofmicrogliacellshasbeenproposedasoneofthemanycellularmechanismsthatcanleadtoanincreaseinAlzheimer'sdiseasepathology.InvestigatingthemolecularunderpinningsofmicrogliafunctioncouldhelpisolatethecausesofdysfunctionwhilealsoprovidingcontextforbroadergeneexpressionchangesalreadyobservedinmRNAprofilesofthehumancortex.

WehaveusedmRNAsequencingtoconstructgeneexpressionprofilesofmicrogliapurifiedfromthecortexof11subjectsfromalongitudinalcohortofaging,RushMemoryAgingProject(MAP).Bystudyingthesemicrogliageneexpressionprofilesinthecontextoftissue-levelprofilesofthecortexof542subjectsfromtheMAPandReligiousOrdersStudy(ROS)weaddressdualproblems.ByusinginformationfromthelargeROSMAPcohort,weareabletoisolatethegeneswhicharestronglyassociatedwithimmuneresponse.Conversely,weillustratethatthemicrogliasignaturecanbeusedtohighlightpredefinedsetsofcoexpressedgenesinROSMAPthatarehighlyenrichedformicrogliagenes.AddressingthesetwoquestionsallowsustoidentifysetsofmicrogliaspecificgeneswhichareassociatedwithvariousAlzheimer'sdiseasetraitsfurtheremphasizingthemolecularconsequencesofmicrogliadysfunctioninthisdisease.Specifically,weareabletoidentifyasetofmicrogliarelatedgenesassociatedwithtauandamyloidpathologyaswellasanactivatedmicroglialmorphology.


BiasCorrectionInEstimatingProportionsByPooledTesting

GrahamHepworth1andBradBiggerstaff2

1UniversityofMelbourne2CentersforDiseaseControlandPrevention

Pooledtesting(orgrouptesting)ariseswhenunitsarepooledtogetherandtestedasagroupforthepresenceofanattribute,suchasadisease.Wehaveencounteredpooledtestingproblemsinplantdiseaseassessmentandprevalenceestimationofmosquito-borneviruses.

Intheestimationofproportionsbypooledtesting,theMLEisbiased,andseveralmethodsofcorrectingthebiashavebeenpresentedinpreviousstudies.WeproposeanewestimatorbasedonthebiascorrectionmethodintroducedbyFirth(1993),whichusesamodificationofthescorefunction.Ourproposedestimatorisalmostunbiasedacrossarangeofproblems,andsuperiortoexistingmethods.WeshowthatforequalpoolsizesthenewestimatorisequivalenttotheestimatorproposedbyBurrows(1987),whichhasbeenusedbymanypractitioners.


TheParametricCureFractionModelOfOvarianCancer

SerifatFolorunso,AngelaChukwu,andAkintundeOdukogbeUniversityofIbadan

WeproposeincorporatingtheGammalinkfunctiontothegeneralizedGammausingamixturecurefractionmodel.Themathematicalpropertiesoftheproposedmodelwereexploredandtheinferencesforthemodelswereobtained.TheproposedmodelcalledGamma-generalizedgammamixturecuremodel(GGGMCM)willbevalidatedusingareallifedatasetonovariancancer.


TheSkillings-MackStatisticForRanksDataInBlocks

JohnBestUniversityofNewcastle

SkillingsandMackgaveastatisticfortestingtreatmentdifferencesforranksdatainblocks-bothcompleteandincompleteblocksorblockswithmissingvalues.Thisgeneralstatisticisthusquiteuseful.ThereisanRpackageforitscalculation.Howeverweillustrateaproblemwithtiedranks.Sensoryevaluationdataisused.

PosterAbstracts

Multi-EnvironmentTrialAnalysisOfAgronomyTrialsUsingEstablishedPlantPopulationDensity

MichaelMumfordandKerryBellQueenslandDepartmentofAgricultureandFisheries

Thereisagrowinginterestwithinthegrainsindustrytodeterminetheyieldresponsetoplantpopulationdensity.Thiscanvarydependingonthetriallocation,thehybridthatisplantedandthemanagementpracticesthatareemployed.

Inthepast,theanalysisofplantpopulationdensityhasbeenperformedusingthetargetedplantpopulationdensityasdeterminedbythenumberofseedsinitiallyplanted.However,ithasbeenfoundthatthetargetedplantpopulationdensitymaydiffersignificantlyfromtheestablishedplantpopulationdensity,i.e.theplantpopulationdensityobservedinthefield.Insuchcircumstances,itbecomesimportanttousetheestablishedratherthantargetedplantpopulationdensitiesinordertoprovidethemostinformativemeasureofyieldresponsetoplantdensity.Thisisespeciallytruewhenperformingamulti-environmenttrial(MET)analysisduetovariationbetweenthetargetedandestablishedplantpopulationdensityateachtrial.

WeproposeaprocedureforperformingaMETanalysisusingestablishedplantpopulationdensityasanexplanatoryvariable.Usingalinearmixedmodelframework,separateregressionlines/curvesarefittoeachcombinationoftrialandmanagementpractice,whichwedefineas‘environment’.Environmentsarethengroupedaccordingtotheresultsofaclusteranalysisperformedusingtheestimatesoftheregressioncoefficients,ensuringthatenvironmentswithinaclustershareasimilarresponsetoplantdensity.Thelines/curvesfittedtoenvironmentswithinclustersarethenparallel,allowingustothenperformFisher’sleastsignificantdifferencetestingforyielddifferencesbetween

environmentswithineachcluster.

ThispresentationwillfocusontheapplicationoftheproposedmethodtoalargesetofmaizeandsorghumagronomytrialsconductedacrossNewSouthWalesandQueenslandoverthelastthreeyears.Forsimplicity,aseparateanalysiswasperformedforeachhybrid.

AdventuresInDigitalAgricultureInNewZealand

EstherMeenkenandVanessaCaveAgResearch

NewZealandisanationforwhichprimaryindustriesaretheeconomicbackbone.NewZealand'shighlyvariablelandscapemeansthereareawiderangeinclimaticandsoilconditions,withmultiplelandusesexistinginrelativelysmallspatialareas.ThismeanstherearemanypossibilitiestobeexploredinDigitalAgricultureasadvancesinsensorytechnologiesaremade.Thetechnicalchallengesknowntoberelatedtotheuseofsensorsincludeissuesaroundbigdata,missingdata,biasedandvariablesensors,andvarioustypesofstructuredandnon-normaldata.WewilldescribeasuiteofstudiesthathavebeenrecentlyundertakenthatcanbelooselygroupedundertheDigitalAgricultureumbrella.Casestudiesexplorearangeofscientificandstatisticalchallengesintheprimaryindustriesarena,andincludetheuseofclassicalandBayesianPrincipledExperimentalDesign,HierarchicalBayesianModels,neuralnetclassifiers,visualisation,animaltrackingviaGPS,anddatamanagement.

ModellingCanopyGreennessOverTimeUsingSplinesAndNon-LinearRegression

BethanyMacdonald1,JackChristopher2,andAlisonKelly11QueenslandDepartmentofAgricultureandFisheries

2QueenslandAllianceforAgricultureandFood

Theabilityofaplanttoretainleafgreennessforanextendedtimeafteranthesis,

knownasstay-green,hasbeenlinkedtogreateryieldinwheatandcanvaryamongstgenotypes.Manyaspectsofaplant'ssenescencecontributetostay-green;therefore,accuratelymodellingsenescenceandunderstandingthegeneticvariationinsenescencedynamicsisintegraltoselectingforthestay-greenphenotype.Normaliseddifferencevegetativeindex(NDVI)providesameasureofcanopygreennessandwhencollectedovertime,cangiveanindicationofwhensenescencebeginsandtherateatwhichitcontinues.Christopheretal.(2014)capturedgeneticvariationinsenescencedynamicsusingmultiplestay-greentraitsderivedfromalogisticregressionoflongitudinalNDVImeasurements.Thesetraitswereestimatedindependentlyforeachplotwithintheexperiment,ignoringanysourcesofcovariance,andthensubsequentlyanalysedusingalinearmixedmodel.However,noestimationerrorswerecarriedfromthefirsttothesecondstageoftheanalysis.

Wediscusstwoalternativeone-stagemethodsformodellingsenescencedynamicsbasedonlongitudinalNDVImeasurements.Bothofthesemethodsenablethevariabilityinsenescencepatternstobepartitionedintogeneticandnon-geneticcomponents,whilealsoincorporatingtheexperimentalstructure.Thefirstmethodinvolvestheuseofsplinesinalinearmixedmodelframework.Thismethodallowsanappropriatecovariancestructureforrepeatedmeasurestobeutilisedand,unlikeotherapproaches,enablesflexibilityinthesenescencepatterns.Thesecondmethodinvolvesfittinglogisticequationsinanon-linearmixedmodelframework.Theestimatedparametershavebiologicalinterpretations,aidinginthegeneticcomparisonofsenescencedynamics.Thesetwomethodsofferamorestatisticallyrobustapproachtomodellingcropsenescenceandtheunderlyinggeneticvariability.

OnTestingMarginalHomogeneityForSquareContingencyTablesWithOrdinalCategories

KoujiTahataTokyoUniversityofScience

Fortheanalysisofsquarecontingencytableswithorderedcategories,themarginalhomogeneitymodel,whichindicatestherowmarginaldistributionis

equaltothecolumnmarginaldistribution,wereconsidered.Stuart(1955)proposedtheteststatistics(denotedbyQ)forthemarginalhomogeneitymodel.Howeveritdoesnotusetheinformationofthecategoryordering.Agresti(1983)comparedbetweenQandtheMann-Whitneytestaboutthepowerbysimulations.Inthispaper,weconsidersomemeasurestorepresentthedegreeofdeparturefromthemarginalhomogeneity.Forexample,Tomizawa,MiyamotoandAshihara(2003)andTahata,IwashitaandTomizawa(2008).InthesituationgivenbyAgresti(1983),wecomparebetweenQandtestbasedonthesemeasuresaboutthepower.

AFactorAnalyticApproachToModellingDiseaseProgressionAcrossLeafLayersAndTime

ClaytonForknall,GregPlatz,LisleSnyman,andAlisonKellyQueenslandDepartmentofAgricultureandFisheries

ControllingandlimitingtheimpactoffoliardiseasesisachallengefacedbytheAustraliangrainsindustry.Foliardiseasescompromiseanddestroyphotosyntheticarea,thuslimitingplantresourcesandadverselyaffectingcropproductivity.Suchdiseasesofteninfectthelowerleaflayersofplantsearlyintheseasonand,dependentontheircompatibilitywiththehostandenvironment,progresstowardsthetopmostleaflayersovertime.

Synthesizingthecomplicateddynamicsofdiseaseprogression,overdifferentleaflayersandacrossagrowingseason,intoameasuretoestimatetheimpactoflossofleafareaonproductivityrequirestheassessmentoftheproportionofleafareacompromisedbydisease(LAD)atmultipletimesthroughouttheseason.AsimplemeasurethatcapturesbothLADanddiseasedurationonagivenleaflayeristheAreaUndertheDiseaseProgressCurve(AUDPC),formedbyapplyingthetrapezoidruletoLADassessmentsovertime.TheAUDPCisthenoftencorrelatedtoameasureofproductivityforestimatingtheimpactofdiseaseateitheravarietyorexperimentalunitlevel.

WeproposeanalternativeapproachtotheAUDPCtomodeldiseaseprogressionovertimeandleaflayersmoreefficiently.Usingafactoranalyticapproach,thecovariancebetweenleaflayers,bothwithinandacrossassessmenttimes,is

capturedonavarietybasisinalinearmixedmodelframework.Thisapproachwillnotonlyprovidemorereliableestimatesofthediseasepressureexertedonvarieties,butalsoinformofrelationshipsbetweenleaflayersandhowconsistenttheserelationshipsareacrosstime.Betterqualityinformationregardingtherelationshipbetweenlossofleafareaandproductivitycanbemadeavailabletoindustrythroughtheimprovedestimatesandunderstandingofdiseaseprogressionobtainedfromthismodellingapproach.

AssociatingStrawStrengthWithLikelihoodOfHeadLossInBarleyForWesternAustralia

DeanDiepeveen1,KefeiChen1,ChengdaoLi2,andDavidFarleigh31CurtinUniversity

2MurdochUniversity3DepartmentofPrimaryIndustriesandDevelopment

Barleyheadlossischaracterisedbystrawbreakagejustbelowtheheadatornearplantmaturity.Ourrecentresearchintobarleyheadlosshasenabledustoevaluatethevariouscomponentsofplantmaturityandhavefoundacleargeneticcomponenttothisindustryissue.Themorechallengingcomponentsofthisresearchistounderstandwhyourmoretolerantcultivarsalsoshowsubstantialheadlossincertainunseasonalconditionsduringharvest.Ourpreliminaryanalysesusesenvironmentaldatawithplantmeasurementdataandlooksattheincidencesofextremeweathereventsonbarleyheadloss.Whatconfoundstheanalysisisthenutrient/growthconditionsoftheplantandthepredispositionofthesebarleycropstoheadlossduringtheseweatherevents.Thispaperwillprovidesomelessonslearntandfuturedirections.

ADeepLearningNeuralNetworkModelToIdentifyTheImportantGenesInMetastaticBreastCancerFromCensoredMicroarrayData

Quoc-AnhTrinhVietnamNationalUniversity

Microarraydatahavebeenshowntocorrelatedwithsurvivalinbreastcancer.Themostdifficultinanalysisofthisdataconsistinasimultaneousmeasureofhugegeneexpressionlevelsoverafewpatientsavailable.TheconventionalsurvivalmodelssuchastheproportionalhazardmodelofCoxarenomoreappropriatedincludingthehazardproportionalhypothesisandthelinearcovariateseffecthypothesis.Prognosticstudiesinvolvetheidentificationofgenesthatcorrelatewithsurvivalinordertoprovidenewinformationonpathogenesis.Thisresultmayaidintheresearchofdrugdesignfornewtargets.Weintroduceinthispaperadeeplearningrecurrentneuralnetworkapproachtopredictsurvivaltimesfortheindividualpatientbasedonmicroarraymeasurement.Thisneuralnetworkdefinesthehazardfunctionofsurvivalnottomentiontheproportionalityandlinearityhypothesis.Wepresentadeeplearningapproachtotheidentificationofgenescriticalforthemetastasesofbreastcancer.Weidentifiedasetofhighlyinteractivegenesbyanalysingtheconnectivitymatrices.

UsingRAndShinyToDevelopWebBasedSamplingApplicationsForTheAgriculturalAndEducationSectors

PeterKasprzak,OlenaKravchuk,andAndyTimminsUniveristyofAdelaide

Samplingisanimportantaspectofagriculturalstatisticsandanimportantfeatureofindustrialreality.Enablinganefficientdialoguewithresearchersandagronomistsaboutsoundmethodsforfielddatacollectionwillincreasetheefficiencyandreliabilityofagriculturalprojects.TheShinypackageinRallowsinteractivewebapplicationstobecreatedbaseduponexistingRpackages.ThispresentationwilloutlinethestepsinvolvedincreatinganappthatimplementssamplingtechniquessuchasSRS,StratifiedSampling,andRatioSampling,withinavisuallyappealinggraphicalenvironmentforuseintheeducationandagriculturalsectors.

GraphicalNetworkAnalysesInformsBiomarkerDiscoveryViaQuantificationOfKeyDiseaseRelatedConnections

JamesDoeckeCSIRO

Determinationoftheoptimalsetofbiomarkersoftenreliesonlargescalefeatureselection,thatselectsaminimalsetofindependentbiomarkers,withoutconsiderationofanyinformationregardingthebiologicalnetwork.Thisresearchisaimedatdefiningbiologicalnetworkstructuresrelatedtodiseasephenotype,throughathree-levelstatisticalprocess.Briefly,wefirstimplementanunsupervisedcorrelationstructureacrossthesetofbiomarkers,usedtodefineclustersofbiomarkersviastrictthresholding.Next,usingthegraphicalnetworkanalysismethodDifferentialNetworkAnalysisinGenomics(DINGO),wedefinethedifferentialnetworkstructureforhealthyanddiseasestates.Lastlyfromthederiveddifferentialconnectivityscoreweclustertheindividualgroupsofbiomarkers.Fromthisprocess,weareabletoproduceanetworkofsimilarlygroupedsetsofbiomarkersthatcanseparatehealthyfromdiseasesubjects.Lastlyweperformarandomeffectsmodelforeachclustertoestimatetheassociationwithdiseasestatus.Usingthisdesign,wehavebeenabletoincorporateinformationfrommultiplecorrelatedbiomarkerstoincreasetheproportionofvarianceexplainedindiseasestatus.Wetestthisdesignontwodatasets,1)asetof~2000peptidebiomarkers,withtheexpressaimofdefiningsetsofbiomarkersassociatedwithAlzheimer’sDiseasepathology,and2)asetof~2000mRNAprobesetscombinedwithasetof~2000DNAmethylationprobeswiththeexpressaimofdefiningsetsofbiomarkersassociatedwithbreastcancer.Wepresentnetworkplotsandconnectivitystructuresforeachdataset,andhighlighttheincreasedperformanceofclusteredbiomarkersascomparedtoindividualbiomarkerstoseparatediseasestatus.Lastly,weshowthatbyutilizinganetworkapproach,wecanincorporatemorebiologicalinformationfromthedatatoexplainoutcome.

Spatio-TemporalCorticalBrainAtrophyPatternsOfAlzheimer'sDisease

MarcelaCespedes1,JamesMcGree1,ChristopherDrovandi1,KerrieMengersen2,JamesDoecke3,andJurgenFripp3

1QueenslandUniversityofTechnology2QueenslandUniversityofTechnlogy

3CSIRO

Thedegenerationofthecerebralcortexisacomplexprocesswhichoftenspansdecades.Thisdegenerationcanbeevaluatedonregionsofinterest(ROI)inthebrainthroughprobabilisticnetworkanalysis.However,currentapproachesforfindingsuchnetworkshavethefollowinglimitations:1)analysisatdiscreteagegroupscannotappropriatelyaccountforconnectivitydynamicsovertime;and2)morphologicaltissuechangesareseldomunifiedwithnetworks,despiteknowndependencies.Toovercometheselimitations,aprobabilisticdynamicwombledmodelisproposedtosimultaneouslyestimateROIcorticalthicknessandnetworkcontinuouslyoverage,andwascomparedtoanageaggregatedmodel.Theinclusionofageinthenetworkmodelwasmotivatedbytheinterestininvestigatingthepointintimewhenconnectionsalteraswellasthelengthoftimerequiredforchangestooccur.Ourmethodwasvalidatedviaasimulationstudy,andappliedtohealthycontrols(HC)andclinicallydiagnosedAlzheimer'sdisease(AD)groups.Theprobabilityofalinkbetweenthemiddletemporal(akeyADregion)andtheposteriorcingulategyrusdecreasedfromage55(posteriorprobability>0.9),andwasabsentbyage70intheADnetwork(posteriorprobability<0.12).ThesameconnectionintheHCnetworkremainedpresentthroughoutages55to95(posteriorprobability≥0.75).Theanalysespresentedinthisworkwillhelppractitionerschoosesuitablestatisticalmethodstoidentifykeypointsintimewhenbraincovarianceconnectionschange,inadditiontomorphologicaltissueestimates,whichcouldpotentiallyallowformoretargetedtherapeuticinterventions.

EmpiricalModellingOfFruitFirmnessChangeDuringColourConditioning

LindyGuoandRingoFengPlantandFood

Gold3(Actinidia.chinensisZesy002,marketedasZespri®SunGoldKiwifruit)isanewyellow-fleshedkiwifruitcultivarbeingusedtoreplaceHort16Athathas

beenalmostalldestroyedbytheoutbreakofPseudomonassyringaepv.actinidiae(Psa)in2010.TheSmartMonitoringprogrammestartedin2011,recordsfruitdevelopment,maturation,harvestingandstorageperformanceofGold3kiwifruitfrom12orchardsinafairlyconsistentmanneryearafteryear.

Theaimofthisprojectistosummarizefruitdevelopmentandstoragepotentialsoverthe5years,andhenceprovideabetterunderstandingoftherolesofseasonalweatherconditions,orchardmanagement,andfruitattributesatharvestaffectingstoragepotential.

Forthisproject,storagelifewasdefinedasstoragetimeforabatchoffruit(fruitharvestedfromanorchardonaparticulardate)tosoftento1kfg.Severalwaystocalculatedstoragelifebasedonfirmnessmonitoringdataarecomparedandthestorageliveswerecorrelatedwithweatherconditions,orchardmanagement,andfruitattributesatharvestusingdifferentalgorithms.Theresultswillbepresentedtohighlightexperienceslearntinthisdataanalysis.Recommendationswillbemadeondealingwithmultiyeardatawithcollinearvariables.

Biometrics By The Border...11. An Approach To Poisson Mixed Models For -Omics Expression Data 12....

Documents

Transcript of Biometrics By The Border...11. An Approach To Poisson Mixed Models For -Omics Expression Data 12....