Analysing foodborne outbreaks in the USA
Project for Design of Experiments, by Renske Bouma
Introduction
Food is vital to life, but it can also cause illness or even death. Food can carry dangerous micro-organisms, which can then cause foodborne disease. The World Health Organisation (WHO) defines a foodborne disease as: ‘Any disease of an infectious or toxic nature caused by, or thought to be caused by, the consumption of food or water’. This definition also includes diseases caused by non-microbial substances, like harmful pesticides or processing chemicals. Most common, however, are the illnesses caused by micro-organisms and their toxins (Adams & Moss 2008), and these are the focus of this report. A foodborne outbreak is defined as: ‘An incident in which two or more persons experience a similar illness resulting from the ingestion of a common food’ (CDC 2000). Outbreaks are often larger than two illnesses. In this report, foodborne outbreaks in the USA are investigated for the dependence of average outbreak size on the location of preparation of the food vehicle, the micro-organism that caused the disease and the state it occurred in.
The database
The database I use is compiled by the Centers for Disease Control and Prevention (CDC), the organisation in the USA that works towards better public health. To do this, the CDC needs to know where the problems lie, and therefore it monitors the prevalence of diseases, including foodborne diseases. It created the FOOD tool, the Foodborne Outbreak Online Database (CDC 2015), which includes all foodborne outbreaks reported to the CDC since 1998. I downloaded an extensive Excel file from their website to use for the statistical analysis. The CDC warns that the database is not final: reports can still be changed when new information is gathered. The database I used was last updated on 16 October 2015. The newer reports could therefore reflect the true outbreak less accurately than the older reports, which could lead to systematic errors. However, the database does not contain reports newer than 2014, so even the newest outbreaks had almost a year to be fully reported.
The database includes the following information for every outbreak (see Figure 1): the year and month it occurred, the state, the species that (probably) caused the disease, the serotype (if known) of the micro-organism, the etiology status (confirmed or only suspected origin), the location(s) of preparation of the infected food, the resulting illnesses, hospitalizations and deaths, the food vehicle and the contaminated ingredient in this food item. To simplify, I only use year, state, genus, location of preparation and resulting illnesses in my analysis. These are the factors that can most easily be divided into groups that are still big enough for analysis, and they seem the most interesting to me.
Figure 1: the original database
Hypothesis
I would like to know where a mistake causes the most illnesses. Does a mistake by a caterer cause more illnesses on average than a mistake at a banquet? Does an outbreak at a restaurant cause more illnesses than one in health care? In addition, I am curious whether the other factors, genus and state, play a role as well. Which genus causes the most illnesses per outbreak? Are there differences between states in how big the outbreaks are?
Experimental design
I want to know whether the differences in the number of illnesses per outbreak between different locations of preparation, genera and states are significant. Do they look different simply because of chance, or is it likely that there is a real difference? To find out, I will analyse the variance of the data with a one-way ANOVA in R. The groups differ in size (see Table 1 and Table 2), so the design is unbalanced rather than balanced. Before I can do this, I have to structure the data.
Structuring of the data
To get a clear result I removed all data-points with multiple possible species or multiple locations of preparation, as well as the multistate outbreaks. I grouped the different species of the most common genera as displayed in Table 1 and omitted all data-points from less common genera. I also grouped the different locations of preparation as displayed in Table 2 and left out all data-points from less common locations (like camps and festivals). None of the groups has fewer than 100 data-points, which I believe gives good reliability.
Table 1: grouping of different species in their respective genus
group | contains | data-points
Bacillus (B) | B. cereus, B. other, B. unknown | 246
Campylobacter (Ca) | C. jejuni, C. coli, C. fetus, C. other, C. unknown | 185
Clostridium (Cl) | C. perfringens, C. botulinum | 547
Escherichia (Es) | E. coli enteroaggregative, E. coli enteropathogenic, E. coli other, E. coli shiga toxin-producing | 222
Norovirus (N) | Norovirus, Norovirus Genogroup 1, Norovirus Genogroup 2, Norovirus unknown | 3729
Salmonella (Sa) | Salmonella, S. enterica, S. other, S. unknown | 1335
Shigella (Sh) | Shigella, S. boydii, S. dysenteriae, S. flexneri, S. sonnei, S. unknown | 112
Staphylococcus (St) | S. aureus, S. other, S. unknown | 415
Table 2: grouping of different locations of preparation
group | contains | data-points
Banquet | Banquet facility (food prepared and served on-site) | 210
Caterer | Caterer (food prepared off-site from where served), Caterer; unknown, Caterer; other | 687
Health care | Hospital, Long-term care/nursing home/assisted living facility, long-term care..; Hospital, long-term care..; Other | 162
Private home | Private home/residence | 861
Restaurant | Restaurant - “Fast food” (drive up service or pay at counter), Restaurant - other or unknown type, Restaurant - other or unknown type; Other, Restaurant - Sit-down dining, Restaurant - sit-down dining; Other | 4871
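The filtering and grouping described above can be sketched in a few lines of R. This is only an illustration on a made-up miniature data frame: the column names, the ";" separator for multiple species and the "Multistate" label are assumptions, not the actual layout of the CDC export.

```r
# Hypothetical miniature of the raw export (all values are made up)
raw <- data.frame(
  Species   = c("Bacillus cereus", "Norovirus Genogroup 2",
                "Salmonella enterica; Shigella sonnei", "Vibrio cholerae"),
  Location  = c("Private home/residence", "Banquet facility",
                "Caterer", "Restaurant - Sit-down dining"),
  State     = c("Ohio", "Texas", "Multistate", "Utah"),
  Illnesses = c(4, 35, 12, 3),
  stringsAsFactors = FALSE
)

# Drop outbreaks with several candidate species and multistate outbreaks
single <- raw[!grepl(";", raw$Species) & raw$State != "Multistate", ]

# Assumed rule: the genus is the first word of the species name
single$Genus <- sub(" .*", "", single$Species)

# Keep only the eight genera of Table 1
kept_genera <- c("Bacillus", "Campylobacter", "Clostridium", "Escherichia",
                 "Norovirus", "Salmonella", "Shigella", "Staphylococcus")
clean <- single[single$Genus %in% kept_genera, ]
print(clean[, c("Genus", "Location", "State", "Illnesses")])
```

Here the mixed Salmonella/Shigella row and the Vibrio row are dropped, leaving one Bacillus and one Norovirus outbreak. Grouping the locations of Table 2 would follow the same pattern.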
After this structuring and ‘cleaning’, the data-set looked as shown in Figure 2. The data-set still contained more than 6500 data-points.
First look at the data
Before going into the statistics, I have a look at the raw data. The mean number of illnesses over all outbreaks is 20.33. Plotting the data gives an idea of the range of the number of illnesses per outbreak. Figure 3 shows the outbreak size (the number of illnesses per outbreak) for the different locations. Most of the outbreaks result in fewer than 100 illnesses, but there are many exceptions. Five outbreaks were relatively extreme and resulted in more than 600 reported ill people. Most outliers are seen at the restaurant, but no conclusions can be drawn from this, as the number of data-points is not equal for all locations. The data-set contains by far the most points for restaurants, which can explain the bigger range of points within this group. Looking only at the boxplots, there seems to be a difference between the first three locations and the last two.
Figure 3: boxplot of outbreak size versus location of preparation
Figure 2: the structured data-set in Excel
The boxplots per genus are given in Figure 4. Of course most of the outbreaks are again below 100 illnesses, as the same data is plotted as before, just in a different grouping. It is interesting to see where the outliers are. At first sight Escherichia seems to be quite dangerous because of its high outliers, but the box itself is quite small. Norovirus has an exceptional number of outliers but, just as for the restaurants, no conclusion can be drawn from this. The Norovirus group contains the most data-points of all groups, which could explain the higher number of outliers. Looking just at the boxplots, it seems that Clostridium, Norovirus and Shigella result in more illnesses than the other genera. We will see whether this difference also shows up in the ANOVA.
Figure 4: boxplot of outbreak size versus genus
The boxplots per state are given in Figure 5. It is clear that there are differences between the states. We will find out later whether these differences are significant.
Figure 5: boxplot of outbreak size versus state
Hypothesis testing
My main interest is whether the location of preparation of the food that causes the outbreak has a significant influence on the outbreak size. So I perform an ANOVA in R on the location. This is the output:
              Df   Sum Sq Mean Sq F value Pr(>F)
Location       4   333252   83313   54.51 <2e-16 ***
Residuals   6786 10371990    1528
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The result shows that for at least two locations there is a significant difference in average outbreak size. The difference is so extreme that the probability of it occurring under the null hypothesis (there is no difference) is reported by R as below 2e-16, far less than 0.1%.
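As a check on the ANOVA table above, the F value and its tail probability can be recomputed from the sums of squares and degrees of freedom. The sketch uses base R only; the variable names are chosen here for illustration.

```r
# Values copied from the ANOVA table for Location
ss_location <- 333252
df_location <- 4
ss_resid    <- 10371990
df_resid    <- 6786

ms_location <- ss_location / df_location  # mean square between locations
ms_resid    <- ss_resid / df_resid        # mean square of the residuals
f_value     <- ms_location / ms_resid     # ratio reported as "F value"

# Upper-tail probability of the F distribution under the null hypothesis
p_value <- pf(f_value, df_location, df_resid, lower.tail = FALSE)
print(c(F = f_value, p = p_value))
```

The recomputed ratio agrees with the tabulated 54.51 up to rounding, and the tail probability is far below the 2e-16 threshold that R prints.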
Validation of assumptions
The interpretation of the ANOVA is only valid when its assumptions are acceptable. The result of an ANOVA is meaningful when it can be assumed that the residuals are normally, independently and identically distributed (NIID). Normality can be checked by making a Q-Q plot. When the residuals are more or less normally distributed, a straight line is observed. Figure 6 shows that this is not the case. A non-parametric test should be used, or the data should be transformed so that the normality assumption becomes valid.
Figure 6: Q-Q plot of residuals from the ANOVA of location
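The non-parametric route mentioned above is not the one taken in this report, which uses a transformation instead. As a rough illustration of what it could look like, the sketch below applies the Kruskal-Wallis rank test (base R `kruskal.test`) to simulated, right-skewed data. The data frame `sim` and its three locations "A", "B" and "C" are made up.

```r
set.seed(1)

# Simulated right-skewed outbreak sizes; location "C" is shifted upwards
sim <- data.frame(
  Location  = factor(rep(c("A", "B", "C"), each = 200)),
  Illnesses = round(exp(rnorm(600, mean = rep(c(2, 2, 3), each = 200)))) + 2
)

# Rank-based test of whether the locations share the same distribution;
# it does not assume normally distributed residuals
kw <- kruskal.test(Illnesses ~ Location, data = sim)
print(kw)
```

Because the test works on ranks, a few extreme outbreaks have much less influence than they have in an ANOVA on the raw counts.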
Hypothesis testing
The book (Box et al. 2005) suggests that taking the logarithm of the output (here outbreak size) can stabilize the variance. I perform the ANOVA with the log-transformed data. This is the output from R:
              Df Sum Sq Mean Sq F value Pr(>F)
Location       4    846  211.52   209.2 <2e-16 ***
Residuals   6786   6861    1.01
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The result is the same as before: there is a very significant difference. Let us see if this time we can trust the result.
Validation of assumptions
Again, I make a Q-Q plot to check whether the residuals are normally distributed. As can be seen in Figure 7, the dots lie more or less on one line. Normality can therefore be assumed.
Figure 7: Q-Q plot of residuals of the ANOVA of location after transformation
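To give a feeling for why the log transformation straightens the Q-Q plot, the sketch below uses simulated log-normal "outbreak sizes" rather than the real data. As a simple numeric stand-in for judging the plot by eye, it computes the correlation between the sample quantiles and the theoretical normal quantiles; the helper `qq_cor` is a name invented here.

```r
set.seed(42)

# Simulated right-skewed sizes (log-normal by construction, not the CDC data)
y <- exp(rnorm(1000, mean = 2.5, sd = 1))

# Correlation between theoretical and sample quantiles of a Q-Q plot;
# values close to 1 mean the points lie close to a straight line
qq_cor <- function(v) {
  q <- qqnorm(v, plot.it = FALSE)
  cor(q$x, q$y)
}

qq <- c(raw = qq_cor(y), log = qq_cor(log(y)))
print(qq)
```

On the raw scale the long right tail bends the Q-Q plot, while on the log scale the points fall almost on a line. For the real residuals this has to be judged from Figures 6 and 7.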
The residuals should also be approximately independently and identically distributed. Figure 8 shows that the residuals are spread more or less equally over the years. There is no trend, so there is probably no autocorrelation between the data-points. This means the assumption of independence can be made.
Figure 8: plot of residuals in time order
In Figure 9 the residuals are plotted against the expected value. Curiously, only four lines appear, while I am investigating five locations. Apparently two locations have expected values very close to each other. To assume identical distribution, the spread of the residuals should be approximately the same for each expected value. The spread is not identical here, but it is close enough.
Figure 9: plot of residuals versus expected value
Hypothesis testing
Now that I have found a useful transformation of my data, I do some more hypothesis testing with the transformed data. I already saw a significant difference between different locations of preparation. Now I am also curious whether there are differences between genera. The ANOVA table below shows that the different genera do not result in the same average outbreak size.
              Df Sum Sq Mean Sq F value Pr(>F)
Genus          7    648   92.50   88.88 <2e-16 ***
Residuals   6783   7060    1.04
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I would also like to know whether there are significant differences between states. From the ANOVA table below it is clear that there are very significant differences between states.
              Df Sum Sq Mean Sq F value Pr(>F)
State         53    828  15.620    15.3 <2e-16 ***
Residuals   6737   6879   1.021
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The plots to check the assumptions for these residuals are left out for space reasons, but can be found in the appendix. They give no reason to doubt any of the assumptions.
Blocking?
We have seen that there are differences in average outbreak size between outbreaks in different states, from different locations of preparation and caused by different genera. Combining the effects in one ANOVA, so that the variation is explained by all factors together, would reduce the unexplained residual variation. This would increase the sensitivity of the ANOVA test and with that the significance of the results. However, the results are already as significant as they can get, so blocking is not necessary.
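The combined analysis is not carried out in this report. For illustration only, the sketch below shows how such an additive model could be fitted with `aov` on simulated data. The effect sizes, the three levels per factor and the column `logIll` are all made up.

```r
set.seed(7)
n <- 600

# Simulated outbreaks with a made-up location effect and genus effect
sim <- data.frame(
  Location = factor(sample(c("Banquet", "Private home", "Restaurant"),
                           n, replace = TRUE)),
  Genus    = factor(sample(c("Bacillus", "Norovirus", "Shigella"),
                           n, replace = TRUE))
)
loc_eff <- c("Banquet" = 0.8, "Private home" = 0, "Restaurant" = 0.1)
gen_eff <- c("Bacillus" = -0.6, "Norovirus" = 0.3, "Shigella" = 0.6)
sim$logIll <- 2 + unname(loc_eff[as.character(sim$Location)]) +
  unname(gen_eff[as.character(sim$Genus)]) + rnorm(n)

fit_one <- aov(logIll ~ Location, data = sim)          # location only
fit_all <- aov(logIll ~ Location + Genus, data = sim)  # location and genus

# Residual mean squares: the second model leaves less unexplained variation
ms_one <- deviance(fit_one) / df.residual(fit_one)
ms_all <- deviance(fit_all) / df.residual(fit_all)
print(c(location_only = ms_one, location_plus_genus = ms_all))
```

With unequal group sizes, as in this data-set, the sums of squares that `summary()` prints for such a model are sequential, so they depend on the order in which the factors are entered.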
Graphical ANOVAs
To see where exactly the differences lie, I perform a graphical ANOVA for the locations, genera and states. To show which point in the graph corresponds to which group, I also give the averages in a table. In addition, by looking at the real averages (and not the transformed ones) in these tables, we can see whether a significant difference is also an interesting one. With such a large data-set a difference quickly becomes significant, but if the difference is small, it might not be very interesting.
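In the appendix code, each group mean's deviation from the grand average is multiplied by the square root of the ratio of residual to treatment degrees of freedom (for example `sqrt(6786/4)` for locations) before it is plotted above the residuals. The sketch below repeats that scaling on simulated data with three made-up groups, so the numbers are not those of the report.

```r
set.seed(3)

# Three groups of unequal size with made-up means on the log scale
g <- factor(rep(c("A", "B", "C"), times = c(50, 100, 150)))
group_means <- c(A = 3.0, B = 2.5, C = 2.4)
y <- rnorm(300, mean = group_means[as.character(g)])

fit <- aov(y ~ g)
df_treat <- nlevels(g) - 1
df_resid <- df.residual(fit)

# Deviations of the group means from the grand average, scaled so that
# they can be compared with the spread of the residuals
scaled_dev <- sqrt(df_resid / df_treat) * (tapply(y, g, mean) - mean(y))
print(scaled_dev)
print(range(resid(fit)))
```

If the scaled deviations reach far beyond the range of the residuals, the groups differ by more than the noise can explain.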
Locations
The graphical ANOVA for the different locations of preparation can be seen in Figure 10 and the averages are given in Table 3.
Figure 10: graphical ANOVA for locations
Table 3: average outbreak size per location of preparation
Location of preparation | Average outbreak size
Banquet | 36.30
Caterer | 37.37
Health care | 31.68
Private home | 14.31
Restaurant | 17.93
The residuals are spread from -3 to 5, while the difference between locations is as big as 40. The graphical ANOVA therefore shows clearly that there is a significant difference between some locations. The low point in the graph is actually both the point for private home and the point for restaurant. They are so close to each other that they cannot be seen as individual points, so they are virtually the same. They are, however, very different from the other locations: banquet, caterer and health care. This makes sense, as at a banquet and with a caterer food is provided to big groups of people. In health care the number of people eating the food is not as big as for a caterer or banquet, but as the people eating the food are fragile, they are more prone to getting sick from a food contamination.
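The graphical ANOVA works on log-transformed sizes, whereas Table 3 lists raw averages, so the two need not agree exactly. That may be why private home and restaurant coincide in the graph although their raw averages (14.31 and 17.93) differ. The sketch below, with made-up numbers, shows two groups with the same mean on the log scale but clearly different arithmetic means.

```r
# Two made-up sets of outbreak sizes with the same mean on the log scale
a <- c(5, 10, 20)
b <- c(2, 10, 50)

# Arithmetic mean (as in Table 3) versus back-transformed mean of the logs
arith <- c(a = mean(a), b = mean(b))
geo   <- c(a = exp(mean(log(a))), b = exp(mean(log(b))))
print(rbind(arith, geo))
```

The back-transformed log mean is the geometric mean, which is pulled up less by a few very large outbreaks than the arithmetic mean is.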
Genera
The graphical ANOVA for the different genera causing the illnesses can be seen in Figure 11 and the averages are given in Table 4. The order of average outbreak size from smallest to largest is given to make it easier to see what effect the different genera have on the average outbreak size.
Figure 11: graphical ANOVA for genera
Table 4: average outbreak size per genus causing it
Genus | Average outbreak size | Order
Bacillus (B) | 7.34 | 1
Campylobacter (Ca) | 10.79 | 2
Clostridium (Cl) | 24.54 | 6
Escherichia (Es) | 21.44 | 5
Norovirus (N) | 22.34 | 7
Salmonella (Sa) | 18.23 | 4
Shigella (Sh) | 33.39 | 8
Staphylococcus (St) | 11.38 | 3
Bacillus outbreaks result on average in the fewest illnesses, while Shigella outbreaks cause the most. The large average outbreak size for Shigella could be explained by its low infectious dose: a few cells are already enough to cause disease. Shigella is commonly spread person-to-person, but when food is prepared by personnel who carry the bacterium, the food can become contaminated (Adams & Moss 2008). In such situations many people often get ill, as one sick staff member of, for example, a catering company handles the food for many. Also, people who get ill from Shigella are likely to report it, as the symptoms of shigellosis are often quite severe and need medical attention.
Bacillus is a very common food pathogen. It forms spores with which it can survive harsh conditions. Different species of Bacillus form different enterotoxins, which can result in two different illnesses: the diarrhoeal and the emetic syndrome. Both illnesses are often over in less than 24 hours and the symptoms are in most cases quite ordinary, like vomiting and diarrhoea (Adams & Moss 2008). As most people do not report these kinds of symptoms if they are over in a day, it is very likely that the number of illnesses in an outbreak caused by a Bacillus species is underreported. It is, however, also possible that the outbreaks really are smaller, as individual products can be the source of an outbreak (instead of personnel handling food for many). For example, B. cereus can survive in pasteurized milk, but will only grow and produce toxins when the milk is stored at too high a temperature (Adams & Moss 2008). In this way it is possible that only one package becomes unsafe. Only the people consuming from that one package (like a family) then get sick. This results in smaller outbreaks.
States
The graphical ANOVA for the different states where the outbreaks occurred can be seen in Figure 12 and the averages are given in Table 5. As there are many states, the most interesting ones are highlighted.
Figure 12: graphical ANOVA for states
Table 5: average outbreak size per state it occurred in
State | Average outbreak size | State | Average outbreak size
Alabama | 18.25 | Nebraska | 32.93
Alaska | 12.78 | Nevada | 51.45
Arizona | 24.33 | New Hampshire | 21.74
Arkansas | 28.27 | New Jersey | 21.02
California | 17.58 | New Mexico | 24.89
Colorado | 18.46 | New York | 25.37
Connecticut | 15.89 | North Carolina | 31.21
Delaware | 17.25 | North Dakota | 32.06
Florida | 10.87 | Ohio | 18.71
Georgia | 24.60 | Oklahoma | 23.62
Guam | 3.50 | Oregon | 16.63
Hawaii | 22.69 | Pennsylvania | 20.22
Idaho | 19.71 | Puerto Rico | 22.18
Illinois | 24.98 | Republic of Palau | 6.00
Indiana | 21.63 | Rhode Island | 17.53
Iowa | 31.23 | South Carolina | 29.57
Kansas | 20.51 | South Dakota | 36.83
Kentucky | 30.45 | Tennessee | 29.62
Louisiana | 31.25 | Texas | 47.88
Maine | 10.19 | Utah | 33.97
Maryland | 18.49 | Vermont | 7.40
Massachusetts | 27.59 | Virginia | 28.89
Michigan | 35.71 | Washington | 13.68
Minnesota | 15.90 | Washington DC | 48.86
Mississippi | 50.62 | West Virginia | 15.63
Missouri | 28.08 | Wisconsin | 24.08
Montana | 57.40 | Wyoming | 44.09
Two striking averages are those for Guam and the Republic of Palau, with only 3.5 and 6 illnesses respectively on average per outbreak. A quick search explains why. Guam is a small island far east of the Philippines and is inhabited by fewer than 200,000 people (Wikipedia 2016a). Palau is another island relatively close to Guam and has even fewer people: about 25,000 (Wikipedia 2016c). The small number of people and the distance between the mainland and the islands probably explain the small outbreaks. As neither is a US state and both have quite a different culture, the eagerness to report is likely to be smaller. Also, there are simply fewer people to get ill.
Among the mainland states, Vermont then stands out with an average outbreak size of only 7.4. Vermont has a good reputation when it comes to public health. It was ranked first for health outcomes in the USA in 2010, and from 2000 to 2008 it was ranked as the healthiest place to live seven out of eight times (Wikipedia 2016d). The low average outbreak size fits this picture.
Nevada and Mississippi are at the opposite end of the range, with 51.5 and 50.6 illnesses respectively on average per outbreak. Mississippi is infamous for its health care. It was given the lowest rank for health care among all the American states by the Commonwealth Fund (Wikipedia 2016b). The large average outbreak size in Nevada might be caused by the popularity of Las Vegas. Massive buffets are very common in Las Vegas, which means that if there is an outbreak, many people (including tourists) will get sick at once.
Conclusion and discussion
It can be concluded that foodborne outbreaks differ in size depending on the location of preparation of the food, the micro-organism causing the illness and the state it occurs in. Size in this case refers to the reported number of illnesses. The question is, however, how accurately the reports reflect the real outbreak sizes. Probably all outbreaks are underreported, but some might be more underreported than others, which is problematic as this might create significant differences where there actually are none.
In addition, I want to mention that a bigger outbreak size does not directly say something about the seriousness of the outbreak. A big outbreak could mean that 50 people had to vomit once, and a small one could mean that 10 people died. This report is purely about the number of people affected per outbreak.
Another remark I would like to make is that I omitted data because of ambiguity. It could be that a specific genus is hard to distinguish, but actually creates big outbreaks. This would not be seen in this analysis, because all data-points with multiple possible causative micro-organisms were left out. The omission of data does have an advantage for the conclusions about location of preparation. The fact that unsafe food was prepared somewhere does not mean that something went wrong at that particular location. It might be that something went wrong at the factory, at the farm or during transport, by which unsafe food was created. However, if food becomes unsafe at an early step in production, it is likely to end up in different places. By omitting the data-points with multiple locations, it is more probable that in the remaining data-points a mistake at that location actually caused the disease. This makes it possible to draw more reliable conclusions about where a mistake results in the biggest foodborne outbreak.
References
Adams, M.R. & Moss, M.O., 2008. Food Microbiology 3rd ed., The Royal Society of Chemistry.
Box, G.E.P., Hunter, J.S. & Hunter, W.G., 2005. Statistics for Experimenters 2nd ed., Wiley-Interscience.
CDC, 2000. Appendix B Guidelines for Confirmation of Foodborne-Disease Outbreaks. Available at: http://www.cdc.gov/mmwr/preview/mmwrhtml/ss4901a3.htm [Accessed May 23, 2016].
CDC, 2015. Foodborne Outbreak Online Database (FOOD Tool). Available at: http://wwwn.cdc.gov/foodborneoutbreaks/ [Accessed May 20, 2016].
Wikipedia, 2016a. Guam. Available at: https://en.wikipedia.org/wiki/Guam [Accessed May 20, 2016].
Wikipedia, 2016b. Mississippi - health. Available at: https://en.wikipedia.org/wiki/Mississippi#Health [Accessed May 20, 2016].
Wikipedia, 2016c. Palau. Available at: https://en.wikipedia.org/wiki/Palau [Accessed May 20, 2016].
Wikipedia, 2016d. Vermont - Public health. Available at: https://en.wikipedia.org/wiki/Vermont#Public_health [Accessed May 20, 2016].
Appendix
The code
setwd("~/R/workingdirectory")
# stringsAsFactors = TRUE keeps Location, Genus and State as factors,
# which the plots below rely on (not the default from R 4.0 onwards)
Data = read.table("projectdata.txt", header = TRUE, sep = "\t", stringsAsFactors = TRUE)
attach(Data)

mean(Illnesses)
plot(Location, Illnesses)
plot(Genus, Illnesses)
plot(State, Illnesses)

## ANOVA ##
# influence location on amount of Illnesses
r.l = aov(Illnesses ~ Location)
summary(r.l)
# location matters, noise did not prevent result, blocking necessary?

# assumptions #
res.l = resid(r.l)
qqnorm(res.l)              # normality data --> not normal!
qqline(res.l)
plot(x = Year, y = res.l)  # independence --> no trend
plot(fitted(r.l), res.l)   # equal variance --> not really!
# interpretation ANOVA is questionable
## ! need of data transformation or other test !

## data transformation: log
r.tl = aov(log(Illnesses) ~ Location)
summary(r.tl)
res.tl = resid(r.tl)
qqnorm(res.tl)
qqline(res.tl)
plot(Year, res.tl)
plot(fitted(r.tl), res.tl)
# Better! Still very significant influence of location of preparation

# influence genus on amount of Illnesses
r.g = aov(log(Illnesses) ~ Genus)
summary(r.g)
res.g = resid(r.g)
qqnorm(res.g)
qqline(res.g)
plot(Year, res.g)
plot(fitted(r.g), res.g)
# Assumptions are alright, genus matters very significantly!

# influence state on amount of Illnesses
r.s = aov(log(Illnesses) ~ State)
summary(r.s)
res.s = resid(r.s)
qqnorm(res.s)
qqline(res.s)
plot(Year, res.s)
plot(fitted(r.s), res.s)
# Assumptions are alright, State matters also very significantly!
# blocking is possible, but necessary? Test is already very sign.
# now I want to know where the differences are!

## Graphical ANOVA ##
raw.total = c(Data$Illnesses)
total = log(raw.total)
ga = mean(total)  # grand average
par(mfrow = 2:1)

# Locations
meanL = aggregate(x = log(Illnesses), by = list(Location = Location), mean)
devL = meanL$x - ga
stripchart(sqrt(6786/4) * devL, main = "Locations")
stripchart(res.tl, main = "Residuals", method = "stack", offset = 0.005)

# Genus
meanG = aggregate(x = log(Illnesses), by = list(Genus = Genus), mean)
devG = meanG$x - ga
stripchart(sqrt(6783/7) * devG, main = "Genera")
stripchart(res.g, main = "Residuals", method = "stack", offset = 0.005)

# State
meanS = aggregate(x = log(Illnesses), by = list(State = State), mean)
devS = meanS$x - ga
stripchart(sqrt(6737/53) * devS, main = "States")
stripchart(res.s, main = "Residuals", method = "stack", offset = 0.005)