Measuring Youth Program Quality: A Guide to Assessment Tools, Second Edition

Nicole Yohalem and Alicia Wilson-Ahlstrom, The Forum for Youth Investment, with Sean Fischer, New York University, and Marybeth Shinn, Vanderbilt University. Published by The Forum for Youth Investment, January 2009.


© January 2009 The Forum for Youth Investment

About the Forum for Youth Investment

The Forum for Youth Investment is a nonprofit, nonpartisan "action tank" dedicated to helping communities and the nation make sure all young people are Ready by 21® – ready for college, work and life. Informed by rigorous research and practical experience, the Forum forges innovative ideas, strategies and partnerships to strengthen solutions for young people and those who care about them. A trusted resource for policymakers, advocates, researchers and practitioners, the Forum provides youth and adult leaders with the information, connections and tools they need to create greater opportunities and outcomes for young people.

The Forum was founded in 1998 by Karen Pittman and Merita Irby, two of the country's top leaders on youth issues and youth policy. The Forum's 25-person staff is headquartered in Washington, D.C. in the historic Cady-Lee House, with a satellite office in Michigan and staff in Missouri, New Mexico, Virginia and Washington.


Suggested Citation: Yohalem, N. and Wilson-Ahlstrom, A. with Fischer, S. and Shinn, M. (2009, January). Measuring Youth Program Quality: A Guide to Assessment Tools, Second Edition. Washington, D.C.: The Forum for Youth Investment.

© 2009 The Forum for Youth Investment. All rights reserved. Parts of this report may be quoted or used as long as the authors and the Forum for Youth Investment are recognized. No part of this publication may be reproduced or transmitted for commercial purposes without prior permission from the Forum for Youth Investment.

Please contact the Forum for Youth Investment at The Cady-Lee House, 7064 Eastern Ave., NW, Washington, D.C. 20012-2031; Phone: 202.207.3333; Fax: 202.207.3329; Web: www.forumfyi.org; Email: youth@forumfyi.org for information about reprinting this publication and about other publications.


The authors would like to thank the following project advisors, who helped develop the original scope of work, provided input into the interview protocol and report outline and reviewed the original report in draft form:

• Carol Behrer, Iowa Collaboration for Youth Development
• Priscilla Little, Harvard Family Research Project
• Elaine Johnson, National Training Institute for Community Youth Work
• Jeffrey Buehler, Missouri Afterschool State Network
• Bob Pianta, University of Virginia
• Marybeth Shinn, Vanderbilt University

We are especially grateful for the contributions of Sean Fischer and Marybeth Shinn, who took the lead on reviewing the technical properties of each instrument and drafted the technical sections of both the first (2007) and second editions of this report.

The developers of each of the tools described in this report also deserve thanks for their willingness to share their materials, talk with us and review drafts. Thanks to Julie Goldsmith, Amy Arbreton, Beth Miller, Wendy Surr, Ellen Gannett, Judy Nee, Peter Howe, Suzanne Goldstein, Ajay Khashu, Liz Reisner, Ellen Pechman, Christina Russell, Rhe McLaughlin, Sara Mello, Thelma Harms, Charles Smith, Deborah Vandell and Kim Pierce.

Thanks to Karen Pittman for her guidance and suggestions throughout the project and to several Forum staff members, including Nalini Ravindranath and Laura Mattis, for their assistance in the layout, design and editing process.

Finally, thanks to the William T. Grant Foundation for supporting this work and in particular to Bob Granger, Vivian Tseng and Ed Seidman, whose ideas, suggestions and encouragement were critical in transforming this from an idea to a final product.


Introduction
Updated Content
Cross-Cutting Comparisons
At-a-Glance Summaries
    Assessing Afterschool Program Practices Tool
    Communities Organizing Resources to Advance Learning Observation Tool
    Out-of-School Time Observation Instrument
    Program Observation Tool
    Program Quality Observation Tool
    Program Quality Self-Assessment Tool
    Promising Practices Rating Scale
    Quality Assurance System®
    School-Age Care Environment Rating Scale
    Youth Program Quality Assessment
Individual Tool Descriptions
    Assessing Afterschool Program Practices Tool
    Communities Organizing Resources to Advance Learning Observation Tool
    Out-of-School Time Observation Instrument
    Program Observation Tool
    Program Quality Observation Tool
    Program Quality Self-Assessment Tool
    Promising Practices Rating Scale
    Quality Assurance System®
    School-Age Care Environment Rating Scale
    Youth Program Quality Assessment
References
Appendix


Introduction

The following tools are included in the guide at this time:

Assessing Afterschool Program Practices Tool (APT), National Institute on Out-of-School Time and Massachusetts Department of Elementary & Secondary Education

Communities Organizing Resources to Advance Learning Observation Tool (CORAL), Public/Private Ventures

Out-of-School Time Observation Tool (OST), Policy Studies Associates, Inc.

Program Observation Tool (POT), National AfterSchool Association

Program Quality Observation Scale (PQO), Deborah Lowe Vandell and Kim Pierce

Program Quality Self-Assessment Tool (QSA), New York State Afterschool Network

Promising Practices Rating Scale (PPRS), Wisconsin Center for Education Research and Policy Studies Associates, Inc.

Quality Assurance System® (QAS), Foundations, Inc.

School-Age Care Environment Rating Scale (SACERS), Frank Porter Graham Child Development Institute and Concordia University, Montreal

Youth Program Quality Assessment (YPQA), David P. Weikart Center for Youth Program Quality

With the after-school and youth development fields expanding and maturing over the past several years, program quality assessment has emerged as a central theme. This interest in program quality is shared by practitioners, policymakers and researchers in the youth-serving sector.

From a research perspective, more evaluations are including an assessment of program quality, and many have incorporated setting-level measures (where the object of measurement is the program, not the participants) in their designs. At the policy level, decision-makers are looking for ways to ensure that resources are allocated to programs likely to have an impact and are increasingly building quality assessment and improvement expectations into requests for proposals and program regulations. At the practice level, programs, organizations and systems are looking for tools that help concretize what effective practice looks like and allow practitioners to assess, reflect on and improve their programs.

With this growing interest in program quality has come an increase in the number of tools available to help programs and systems assess and improve quality. Given the size and diversity of the youth-serving sector, it is unrealistic to expect that any one quality assessment tool will fit all programs or circumstances. While diversity in available resources is positive and reflects the evolution of the field, it also makes it important that potential users have access to good information to help guide their decision-making.

Over the last several years, we at the Forum have found ourselves regularly fielding questions related to program quality assessment, including what tools exist, what it takes to use them and what might work best under what conditions. The need to offer guidance to the field in terms of available resources has become increasingly clear.

This guide was designed to compare the purpose, structure, content and technical properties of several youth program quality assessment tools. It builds on work we began in this area five years ago, as well as recent work conducted by the Harvard Family Research Project


to document and compile quality standards for middle school programs (Westmoreland, H. & Little, P., 2006).

Criteria for Inclusion

With any compendium comes the challenge of determining what to include. Our first caveat is that we plan to continue revising this guide over time, in part because in its current form it is not inclusive of the universe of relevant tools and in part because a great deal of innovation is currently underway. Many of the tools included in the review will be revised or will undergo further field testing in the next 1–2 years.

Our criteria for inclusion in the guide were as follows:

• Tools that are or that include setting-level observational measures of quality. We are particularly interested in direct program observation as a means for gathering specific data about program quality and, in particular, staff practice. Therefore this review does not feature other methodological approaches to measuring quality (e.g., surveying participants, staff or parents about the program).

• Tools that are applicable in a range of school- and community-based program settings. We did not include tools that are designed to measure how well a specific model is being implemented (sometimes referred to as fidelity) or that have limited applicability beyond specific organizations or approaches.

• Tools that include a focus on social processes within programs. Many of the tools in this guide address some static regulatory or licensing issues (e.g., policies related to staffing, health and safety). However, we are particularly interested in tools that address social processes, or the interactions between and among people in the program.

• Tools that are research-based. All of the tools included are "research-based" in the sense that their development was informed by relevant child/youth development literature. Although we are particularly interested in instruments with established technical properties (e.g., reliability, validity), not all of those included fit this more rigorous definition of "research-based."

Purpose and Contents of the Guide

We hope this compendium will provide useful guidance to practitioners, policymakers, researchers and evaluators in the field as to what options are available and what issues to consider when selecting and using a quality assessment tool. It focuses on the purpose and history, content, structure and methodology, technical properties and user considerations for each of the instruments included, as well as a brief description of how they are being used in the field. For each tool, we aim to address the following key questions:

Purpose and History. Why was the instrument developed – for whom and in what context? Is its primary purpose program improvement? Accreditation? Evaluation? For what kinds of programs, serving what age groups, is it appropriate?

Content. What kinds of things are measured by the tool? Is the primary focus on the activity, program or organization level? What components of the settings are emphasized – social processes, program resources, or the arrangement of those resources (Seidman, Tseng & Weisner, 2006)? How does it align with the National Research Council's positive developmental settings framework (2002)?1

Structure and Methodology. How is the tool organized and how do you use it? How are data collected and by whom? How do the rating scales work and how are ratings determined? Can the tool be used to generate an overall program quality score?

Technical Properties. Is there any evidence that different observers interpret questions in similar ways (reliability)? Is there any evidence that the tool measures what it is supposed to measure (validity)? See the Appendix for a "psychometrics dictionary" that defines relevant terminology and explains why technical properties are an important consideration.

1 This report included a list of "features of positive developmental settings" culled from frequently cited literature. It has contributed to the emerging consensus about the components of program quality.


User Considerations. How easy is the tool to access and use? Does it come with instructions that are understandable for practitioners as well as researchers? Is training available on the instrument itself or on the content covered by it? Are data collection, management and reporting services available? What costs are associated with using the tool?

In the Field. How is the tool being applied in specific programs or systems?

To ensure that the guide is useful to a range of audiences with different purposes and priorities, we have provided both in-depth and summary-level information in a variety of formats.

For each tool, we provide both a one-page "at-a-glance" summary as well as a longer description. The at-a-glance summaries or longer tool descriptions can stand alone as individual resources. Should you decide to use one of these instruments or want to take a closer look at two or three, you could pull these sections out and share them with key stakeholders.

We also provide cross-instrument comparison charts and tables for those who want to get a sense of what the landscape of program quality assessment tools looks like. The Cross-Cutting Comparisons section that follows compares the instruments across most of the categories listed above (purpose, content, structure, technical properties, user considerations). While definitions of quality do not differ dramatically across the instruments, there are notable differences in some of these other areas, which we try to capture.


Updated Content

In this edition of the guide, we update the summaries of nine assessment tools featured in the original March 2007 edition and add an additional tool – the Communities Organizing Resources to Advance Learning (CORAL) Observation Tool – developed by Public/Private Ventures. This edition also includes refined definitions of validity and a discussion regarding some of the limitations of traditional methods of establishing reliability.

Since our original publication, there has been a flurry of activity related to the development and use of the various tools. Almost all of the tool developers have continued to work on either technical or practical aspects of their assessment tools, as well as on related resources to support practitioner use of these tools.

These changes demonstrate continued investment on the part of developers in making tools more accessible and user-friendly to programs and systems trying to implement quality assessment and improvement. Changes that have been made or are in development since 2007 include:

• Further psychometric testing of the reliability and validity of measures (OST; YPQA)
• Development and/or expansion of resources to support the use of various tools (APT; POT; QSA; QAS)
• Development and/or expansion of the availability of web-based tools and resources (QAS; QSA; YPQA)
• Aligning quality assessment tools with other measures to create a package of compatible tools (APT)
• Restructuring of the framework and/or scales (APT; OST)
• Expanding access by translating a tool into different languages (SACERS)
• Development of brother/sister tools targeting different age groups (YPQA; SACERS)

We hope this compendium will provide useful guidance to practitioners, policymakers, researchers and evaluators in the field as to what options are available and what issues to consider when selecting and using a quality assessment tool. We look forward to updating the compendium again as this work advances.


Tool Developers Key

APT: Assessing Afterschool Program Practices Tool, National Institute on Out-of-School Time and Massachusetts Department of Elementary & Secondary Education

CORAL: Communities Organizing Resources to Advance Learning Observation Tool, Public/Private Ventures

OST: Out-of-School Time Observation Tool, Policy Studies Associates, Inc.

POT: Program Observation Tool, National AfterSchool Association

PQO: Program Quality Observation Scale, Deborah Lowe Vandell and Kim Pierce

QSA: Program Quality Self-Assessment Tool, New York State Afterschool Network

PPRS: Promising Practices Rating Scale, Wisconsin Center for Education Research and Policy Studies Associates, Inc.

QAS: Quality Assurance System®, Foundations, Inc.

SACERS: School-Age Care Environment Rating Scale, Frank Porter Graham Child Development Institute and Concordia University, Montreal

YPQA: Youth Program Quality Assessment, David P. Weikart Center for Youth Program Quality

Cross-Cutting Comparisons

Although the individual tool descriptions include what we hope is useful information about several different program quality assessment instruments, their level of detail may be daunting, particularly without a sense of the broader landscape of resources. Some of the individualized information about each tool can be further distilled in ways that may help readers understand both the broader context of program quality assessment and where individual tools fall within that context. We were not able to collect completely comparable information about all instruments in every topic area, but in those cases where we were, we have summarized and compared that information in narrative and charts.

Figure 1: Target Age and Purpose
Figure 2: Common and Unique Content
Figure 3: Methodology
Figure 4: Strength of Technical Properties
Additional Technical Considerations
Figure 5: Technical Glossary
Figure 6: Training and Support for Users


Most of the tools included in this review were developed primarily for self-assessment and program improvement purposes. Some, however, were developed with program monitoring or accreditation as a key goal, and several were developed exclusively for use in research. Many have their roots in early childhood assessment (SACERS, POT, PQO), while others draw more heavily on youth development and/or education literature (APT, CORAL, OST, PPRS, QAS, QSA, YPQA). While the majority of tools were designed to assess programs serving a broad range of children (often K–12 or K–8), some are tailored for more specific age ranges.

Figure 1: Target Age and Purpose

Each tool is listed with the grades served and its primary purpose(s): improvement, monitoring/accreditation and/or research/evaluation.

Assessing Afterschool Program Practices Tool (APT): Grades K–8; Improvement, Research/Evaluation
Communities Organizing Resources to Advance Learning Observation Tool (CORAL): Grades K–5; Improvement, Research/Evaluation
Out-of-School Time Observation Tool (OST): Grades K–12; Research/Evaluation
Program Observation Tool (POT): Grades K–8; Improvement, Monitoring/Accreditation
Program Quality Observation Scale (PQO): Grades 1–5; Research/Evaluation
Program Quality Self-Assessment Tool (QSA): Grades K–12; Improvement
Promising Practices Rating Scale (PPRS): Grades K–8; Research/Evaluation
Quality Assurance System (QAS): Grades K–12; Improvement
School-Age Care Environment Rating Scale (SACERS): Grades K–6; Improvement, Monitoring/Accreditation, Research/Evaluation
Youth Program Quality Assessment (YPQA): Grades 4–12; Improvement, Monitoring/Accreditation, Research/Evaluation


There is reasonable consensus across instruments about the core features of settings that matter for development. All of the tools included in this review measure six core constructs (at varying levels of depth): relationships, environment, engagement, social norms, skill-building opportunities and routine/structure. The content of most of the instruments aligns well with the National Research Council's features of positive developmental settings framework (2002), which has helped contribute to the growing consensus around elements of quality that has emerged since then. In terms of what components of settings the tools emphasize (Seidman et al., 2006), all include a focus on social processes. Although only a subset emphasize program resources, several include items related to the arrangement of resources within the setting.

Figure 2: Common and Unique Content

All tools measure: relationships; environment; engagement; social norms; skill-building opportunities; routine/structure.

Content measured by only some tools:
• Management (CORAL, POT, QAS, QSA)
• Staffing (APT, YPQA, QSA, SACERS, POT)
• Youth Leadership/Participation (APT, YPQA, OST, QSA, PPRS)
• Linkages to Community (APT, YPQA, SACERS, QSA, QAS, POT)


2 The time sampling method has observers go through a cycle of selecting individual participants (ideally at random) to observe for brief periods of time and document their experiences.

Many of the tools included in this review follow a similar structure. They tend to be organized around a core set of topics or constructs, each of which is divided into several items, which are then described by a handful of more detailed indicators. Some variation does exist, however. For example, the PQO includes a unique time sampling component.2 While most tools are organized around features of quality, some are not.

For example, while the APT addresses a core set of quality features, the tool itself is organized around the program's daily routine (e.g., arrival, transitions, pick-up). Observation is the primary data collection method for each of the instruments in this review, although several rely upon interview, questionnaire or document review as additional data sources.
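This nested construct-item-indicator structure can be pictured in code. The sketch below is a minimal illustration assuming a simple averaging roll-up; the construct, items, indicators and the 1–5 rating scale are all invented for illustration and are not drawn from any instrument in this guide.

```python
# Hypothetical sketch of a construct -> items -> indicators hierarchy.
# All names and the 1-5 rating scale are invented, not taken from any
# instrument reviewed in this guide.

construct = {
    "name": "Supportive Environment",  # construct (invented)
    "items": [
        {
            "name": "Warm staff-youth interactions",
            "indicators": [  # detailed indicators, each rated 1-5
                {"text": "Staff greet youth by name", "rating": 5},
                {"text": "Staff use a warm tone", "rating": 4},
            ],
        },
        {
            "name": "Encouragement",
            "indicators": [
                {"text": "Staff acknowledge effort", "rating": 3},
                {"text": "Staff avoid sarcasm and put-downs", "rating": 5},
            ],
        },
    ],
}

def item_score(item):
    """Average an item's indicator ratings."""
    ratings = [ind["rating"] for ind in item["indicators"]]
    return sum(ratings) / len(ratings)

def construct_score(construct):
    """Average the item scores into one construct-level score."""
    scores = [item_score(item) for item in construct["items"]]
    return sum(scores) / len(scores)

print(construct_score(construct))  # 4.25
```

Whether such a roll-up into a single score is meaningful varies by instrument; as noted above, some tools deliberately do not generate an overall program quality score.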

Figure 3: Methodology

Each tool is listed with its target users (program staff and/or external observers) and its data collection methods (observation, interview, questionnaire, document review):

Assessing Afterschool Program Practices Tool (APT): program staff, external observers; observation, questionnaire
Communities Organizing Resources to Advance Learning Observation Tool (CORAL): external observers; observation
Out-of-School Time Observation Tool (OST): external observers; observation
Program Observation Tool (POT): program staff, external observers; observation, questionnaire, document review
Program Quality Observation Scale (PQO): external observers; observation
Program Quality Self-Assessment Tool (QSA): program staff; observation, questionnaire
Promising Practices Rating Scale (PPRS): external observers; observation
Quality Assurance System (QAS): program staff, external observers; observation, interview, document review
School-Age Care Environment Rating Scale (SACERS): program staff, external observers; observation, interview
Youth Program Quality Assessment (YPQA): program staff, external observers; observation, interview


Most of the instruments have some information showing that if different observers watch the same program practices, they will score the instrument similarly (internal consistency and interrater reliability). Few, however, have looked at other aspects of reliability that are of interest when assessing the strength of a program quality measure. Several of the instruments have promising findings to consider in terms of validity, meaning they have made some effort to demonstrate that the instrument accurately measures what it is supposed to measure. See the accompanying glossary (Figure 5) and the Appendix for more detailed definitions of psychometric terms.

Figure 4: Strength of Technical Properties

Each tool is rated on the evidence available for seven technical properties: score distributions, interrater reliability, test-retest reliability, internal consistency,* convergent validity, concurrent/predictive validity and validity of scale structure.* Ratings for the APT and the POT carry the † caveat below, and no evidence is reported for the QSA or the QAS.

Key:
(blank) = No evidence
✓✓✓ = Evidence of this property is strong by general standards
✓✓ = Evidence of this property is moderate by general standards; promising but limited or mixed (strong on some items or scales, weaker on others)
✓ = Evidence of this property is weaker than desired

* This type of evidence is only relevant for instruments with a lot of items that would be useful if organized into scales.
† Psychometric information is not based on the instrument in its current form, so its generalizability may be limited.


Figure 5: Technical Glossary

Score Distributions
What is it? The dispersion or spread of scores from multiple assessments for a specific item or scale.
Why is it useful? In order for items and scales (sets of items) to be useful, they should be able to distinguish differences between programs. If almost every program scores low on a particular scale, it may be that the items make it "too difficult" to obtain a high score and, as a result, don't distinguish between programs on this dimension very well.

Interrater Reliability
What is it? How much assessments by different trained raters agree when observing the same program at the same time.
Why is it useful? It is important to use instruments that yield reliable information regardless of the whims or personalities of individual observers. If findings depend largely on who is rating the program (rater A is more likely to give favorable scores than rater B), it is hard to get a sense of the program's actual strengths and weaknesses.

Test-Retest Reliability
What is it? The stability of an instrument's assessments of the same program over time.
Why is it useful? If an instrument has strong test-retest reliability, then the score it generates should be stable over time. This is important because we want changes in scores to reflect real changes in program quality. The goal is to avoid situations where an instrument is either too sensitive to subtle changes that may hold little significance, or insensitive to important long-term changes.

Internal Consistency
What is it? The cohesiveness of items forming an instrument's scales.
Why is it useful? Scales are sets of items within an instrument that jointly measure a particular concept. If, however, the items within a given scale are actually conceptually unrelated to each other, then the overall score for that scale may not be meaningful.

Convergent Validity
What is it? The extent to which an instrument compares favorably with another instrument (preferably one with demonstrated validity strengths) measuring identical or highly similar concepts.
Why is it useful? It is important to use an instrument that generates accurate information about what you are trying to measure. If two instruments are presumed to measure identical or highly similar concepts, we would expect programs that receive high scores on one measure to also receive high scores on the other.

Concurrent/Predictive Validity
What is it? The extent to which an instrument is related to distinct, theoretically important concepts and outcomes in expected ways.
Why is it useful? If an instrument accurately measures high program quality, then one can expect it to predict better outcomes for the youth participating in the program. The instrument's findings should also be related to distinct, theoretically important variables and concepts in expected ways.

Validity of Scale Structure
What is it? The extent to which items statistically group together in expected ways to form scales.
Why is it useful? It is helpful to know exactly which concepts an instrument is measuring. Factor analysis can help determine if one scale actually incorporates more than one related concept or if different items can be combined because they are essentially measuring the same thing.
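Two of the glossary concepts lend themselves to a small worked example. The sketch below computes an exact-agreement rate between two raters (a simplified stand-in for interrater statistics such as kappa or intraclass correlations) and Cronbach's alpha, a common internal consistency statistic, on invented 1–5 ratings; none of the data or procedures come from the tools reviewed here.

```python
# Illustrative psychometrics on invented data (1-5 ratings).
# These are simplified textbook calculations, not the procedures
# used by any instrument in this guide.

def exact_agreement(rater_a, rater_b):
    """Share of observations where two raters give the identical score."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """items: one list of scores per item, each with one entry per program."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-program scale totals
    item_var = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

# Two raters scoring the same six program sessions:
rater_a = [4, 3, 5, 2, 4, 4]
rater_b = [4, 3, 4, 2, 4, 5]
print(exact_agreement(rater_a, rater_b))  # 4 of 6 sessions match

# Four items of a hypothetical "relationships" scale across five programs:
items = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 2, 3],
    [5, 3, 4, 1, 4],
    [4, 3, 5, 2, 5],
]
print(round(cronbach_alpha(items), 2))  # 0.94: the items hang together well
```

The high alpha here reflects that every item rises and falls with the others across programs, which is exactly the cohesiveness the glossary entry describes.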


Additional Technical Considerations

Many instruments in this report have strong reliability and validity evidence using traditionally accepted techniques. However, as with any field, new methods are often introduced to advance our understanding of reliability and validity. In this section, we discuss some of the limitations of traditional methods that have been highlighted by researchers at the David P. Weikart Center for Youth Program Quality3 (YPQA developer), what these methods can and cannot tell us about an instrument's reliability and validity, and how new methods are addressing these issues.

Variations in Quality Across Different Contexts

In an ideal world, scores we obtain from program quality instruments would always be perfectly accurate. Unfortunately, reality tends to be messier because many factors influence assessments. For example, different raters may not perceive observations in the same way and thus give different scores to the same questions. Different staff or different activities might get different scores, if we could observe them all, but typically we cannot do so. Another possible issue could be that program staff interact with children differently at the beginning of the year (when they do not know the children well yet) versus the end of the year. When these influences are unaccounted for when using an instrument, they are collectively known as "error variance." When an instrument is reliable, its scores are not influenced by much error.

Unfortunately, traditional reliability methods, including the ones used by instruments in this report, do not account for all possible sources of variation in scores (thereby increasing the inaccuracy of the instrument), as discussed by Steve Raudenbush and his colleagues (for a readable treatment of this topic, see Martinez & Raudenbush, 2008). Charles Smith at the Weikart Center has done preliminary work that examines sources of variation for the YPQA, including whether ratings are different during earlier program sessions versus later program sessions. He has also examined whether it is enough to observe one type of activity within an agency versus observing a broad range of activities (readers who are interested in specific findings should refer to the technical summary in the YPQA tool description).

When we know how an instrument is influenced by all these factors, we can take steps to reduce error. For example, if an instrument's scores vary widely depending on which activities we are observing, then we should observe a wide range of activities. If scores depend on the time of day, then we should conduct observations at multiple times throughout the day. By accounting for these additional influences on program quality, we reduce error and obtain more accurate scores. At this point in time, the YPQA is the only instrument to have preliminary information on external sources of variation beyond interrater reliability.
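A toy simulation can make this concrete. In the sketch below, each observed score mixes a program's true quality with activity-to-activity noise (the "error variance" discussed above), and averaging over more activities shrinks the error; all numbers are invented for illustration.

```python
import random

# Toy illustration: observed scores mix a program's true quality with
# activity-level "error variance." Averaging observations across more
# activities yields a more accurate program-level estimate.
# All effect sizes are invented for illustration.

random.seed(0)

TRUE_QUALITY = 3.5   # the program's "real" score on a 1-5 scale
ACTIVITY_SD = 0.8    # how much scores swing from activity to activity

def observe():
    """One observation: true quality plus activity-level noise."""
    return TRUE_QUALITY + random.gauss(0, ACTIVITY_SD)

def estimate(n_activities):
    """Program-level estimate from averaging n observed activities."""
    return sum(observe() for _ in range(n_activities)) / n_activities

def mean_abs_error(n_activities, trials=2000):
    """Average distance between the estimate and the true quality."""
    return sum(abs(estimate(n_activities) - TRUE_QUALITY)
               for _ in range(trials)) / trials

for n in (1, 4, 16):
    print(n, round(mean_abs_error(n), 2))
# The error shrinks as more activities are observed (roughly by 1/sqrt(n)).
```

The same averaging logic applies to observing at multiple times of day or with multiple raters: each additional accounted-for source of variation trades observation effort for accuracy.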

Understanding Assumptions about Internal Consistency

Several instruments in this report have internal consistency information on their scales. As explained in other parts of this report, scales are composed of items meant to measure a particular concept. Measuring internal consistency is often the first step in evaluating whether the items form a meaningful domain by determining whether they are cohesive (readers who would like a more extensive explanation with examples should refer to the Appendix).
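
Internal consistency is most often summarized with Cronbach’s alpha, which can be computed directly from item scores. The ratings below are invented for illustration (six hypothetical programs rated on three items of a cohesive scale); a high alpha, conventionally above about 0.7, suggests the items hang together.

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns (one list per item).

    alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total score)
    """
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(items[i][j] for i in range(k)) for j in range(n)]
    return (k / (k - 1)) * (1 - sum(var(it) for it in items) / var(totals))

# Hypothetical 4-point ratings from six programs on three items of a
# "Supportive Environment"-style scale; the items rise and fall together.
item1 = [4, 3, 2, 4, 1, 3]
item2 = [4, 4, 2, 3, 1, 3]
item3 = [3, 3, 1, 4, 2, 3]
print(round(cronbach_alpha([item1, item2, item3]), 2))  # prints 0.9
```

Because these made-up items move together across programs, alpha comes out high; items that vary independently would drive it toward zero.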

However, Smith points out that internal consistency is only appropriate when the items are reflections of a non-tangible concept (called reflective for short). As an example, consider the concept “Supportive Environment.” Although this might be an important concept to assess, one cannot measure it the same way one would measure temperature or weight. Instead, researchers must rely on a set of questions to approximate how supportive the environment is for children. One analogy for a reflective concept could be an art sculpture – to truly appreciate it, one must look at the sculpture from multiple angles.

³ The Weikart Center is a joint venture between the High/Scope Educational Research Foundation and the Forum for Youth Investment.

Page 17: Measuring Youth Program Quality: A Guide to Assessment Tools, Second Edition, © January 2009 The Forum for Youth Investment (forumfyi.org/files/MeasuringYouthProgramQuality_2ndEd.pdf)

Similarly, to truly measure a reflective concept, one must examine it using a group of similar items (which provide the different angles). Theoretically speaking, researchers interested in developing a scale could probably generate hundreds of possible items to measure a particular concept. Of course, it is impractical to use them all, so researchers choose a manageable subset.

In contrast, internal consistency scores are not appropriate when concepts are formative, which means that a single concept is a composite of multiple, separate components (MacKenzie, Podsakoff, & Jarvis, 2005). A good analogy for this type of concept is a puzzle. To assess a formative concept, you need to gather all of the pieces and put them together. For example, imagine that we wish to measure “Program Resources.” Unlike a reflective concept, this type of concept is a composite of several important components (the puzzle pieces).

Our items would inquire about things like money, time, space, and number of staff members. Each of these resources may be an important component of the overall concept and is essential to include in the scale if we are to obtain a clear picture and complete the puzzle. Unlike with reflective concepts, researchers cannot choose a subset of items from a large list of possibilities. Rather, each item is an important component of the whole. Because the concept is a composite of separate and potentially unrelated parts, the cohesiveness of the items is not important, and therefore internal consistency procedures are not appropriate (as stated in the Appendix, internal consistency measures the relatedness of items, which assumes that the items are reflective).
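
The distinction matters in practice. In the sketch below, three made-up “Program Resources” components for five hypothetical sites barely correlate (funding and space are strongly negatively related), so any internal consistency statistic would look poor, yet each component still belongs in the composite index.

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical "Program Resources" components for five sites, each scored 1-5.
# A site can be cash-rich but space-poor, so the components need not correlate.
funding = [5, 1, 3, 2, 4]
space   = [2, 4, 3, 5, 1]
staff   = [3, 3, 1, 4, 5]

# The composite (the finished "puzzle") is simply the sum of the pieces.
resources_index = [f + sp + st for f, sp, st in zip(funding, space, staff)]

print(pearson(funding, space))  # close to -0.8 for these invented numbers
print(resources_index)
```

Low item correlations here do not mean the index is broken; they simply mean the concept is formative, so cohesiveness is the wrong criterion.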

The Weikart Center has been reexamining the YPQA scales to assess whether they are reflective versus formative. Although this work is still in progress, the results will have important implications for how we think about evaluating both reliability and validity for observation-based metrics.


* A fee structure may be developed over time, once additional materials are completed.
† Training and data services have only been made available in the context of specific research projects.
†† These are estimates of time necessary to prepare observers; developers of these tools have not trained “to reliability.”

Six of the ten instruments included in this review are free to users and available to download from the Internet; the other four have various costs associated with their use. In most, but not all cases, training is available (at a fee) for those interested in using the tool. Many come with user-friendly manuals that explain how to use the instrument; in some cases these materials are still under development. In several cases, the developers of the tools also provide data collection, management and reporting services at additional cost to users. Details about such considerations are included in the individual tool descriptions.

Tool | Cost | Training Available | Estimated Time Necessary to Train Observers to Generate Reliable Scores | Estimated Minimum Observation Time Needed to Generate Sound Data | Data Collection, Management and Reporting Available
Assessing Afterschool Program Practices Tool (APT) | Free* | Yes | 4-hour training plus 2 program observations | 1 afternoon (2-3 hours) | No
Communities Organizing Resources to Advance Learning Observation Tool (CORAL) | Free | No | 2 days | 3-4 hours | No
Out-of-School Time Observation Tool (OST) | Free | No† | 8-18 hours, depending on experience | 3 hours | No†
Program Observation Tool (POT) | $300 Advancing Quality Kit | Yes | 2.5-3 days | 3-5 hours (for self-assessment) | No
Program Quality Observation Scale (PQO) | Free | No† | 2 hours plus 2-4 observations & 2-4 time samples, depending on experience | 1.5 hours observation & .5 hours time sampling | No†
Program Quality Self-Assessment Tool (QSA) | Free | Yes | 2 hours†† | N/A | No
Promising Practices Rating Scale (PPRS) | Free | No† | 2 hours plus 2-4 observations, depending on experience | 2 hours | No†
Quality Assurance System (QAS) | $75 Annual Site License | Yes | 2-3 hours†† | 1 afternoon (2-3 hours) | Yes
School-Age Care Environment Rating Scale (SACERS) | $15.95 SACERS Booklet | Yes | 4-5 days | 3 hours | Yes
Youth Program Quality Assessment (YPQA) | $39.95 YPQA Starter Pack | Yes | 2 days | 4 hours | Yes


Detailed descriptions of the ten assessment tools are provided in the next section. Here we offer one-page summaries to copy and share. Each summary follows a common format.

Assessing Afterschool Program Practices Tool (APT)
National Institute on Out-of-School Time and Massachusetts Department of Elementary & Secondary Education

Communities Organizing Resources to Advance Learning Observation Tool (CORAL)
Public/Private Ventures

Out-of-School Time Observation Tool (OST)
Policy Studies Associates, Inc.

Program Observation Tool (POT)
National AfterSchool Association

Program Quality Observation Scale (PQO)
Deborah Lowe Vandell and Kim Pierce

Program Quality Self-Assessment Tool (QSA)
New York State Afterschool Network

Promising Practices Rating Scale (PPRS)
Wisconsin Center for Education Research and Policy Studies Associates, Inc.

Quality Assurance System® (QAS)
Foundations, Inc.

School-Age Care Environment Rating Scale (SACERS)
Frank Porter Graham Child Development Institute and Concordia University, Montreal

Youth Program Quality Assessment (YPQA)
David P. Weikart Center for Youth Program Quality


Assessing Afterschool Program Practices Tool (APT)
Developed by NIOST and the Massachusetts Department of Elementary & Secondary Education

Overview: The Assessing Afterschool Program Practices Tool (APT) is designed to help practitioners examine and improve what they do in their program to support young people’s learning and development. It examines those program practices that research suggests relate to youth outcomes (e.g., behavior, initiative, social relationships). A research version of the APT (the APT-R) was developed in 2003-2004. This more user-friendly self-assessment version was developed in 2005.

Primary Purpose(s): Program Improvement; Monitoring/Accreditation

Program Target Age: Grades K–8

Relevant Settings: Both structured and unstructured programs that serve elementary and middle school students during the non-school hours.

Content: The APT measures a set of 15 program-level features and practices that can be summarized into five broad categories – program climate, relationships, approaches and programming, partnerships and youth participation.

Structure: The 15 program features addressed by the APT are measured by two tools – the observation instrument (APT-O) and questionnaire (APT-Q). The APT-O guides observations of the program in action, while the APT-Q examines aspects of quality that are not easily observed and guides staff reflection on those aspects of practice and organizational policy.

Methodology: Items that are observable within a given program session (typically one full afternoon) are assessed in the APT-O. The APT-Q is a questionnaire to gather information about planning, frequency and regularity of program offerings and opportunities, and frequency of connections with families and school. Both the APT-O and APT-Q have four-point scales, though flexibility is encouraged for users who find the scales not useful for their purposes. Depending on what part of the tool(s) is being used, the scales measure how characteristic an item is of the program, the consistency of an item or the frequency of an item. For each item, concrete descriptors illustrate what a score of 1, 2, 3 or 4 looks like.

Technical Properties: While no psychometric information is available for the current self-assessment version of the APT, some is available on the research version (APT-R) on which it is based. For the APT-R, interrater reliability was moderate and preliminary evidence of concurrent and predictive validity is available. NIOST has plans for further testing of the APT.

User Considerations:
Ease of Use
• “Cheat sheets” demonstrate the link between quality and outcomes.
• Instrument is extremely flexible in terms of administration, use of scales, number of observations, etc.
• The instrument is designed for users to make observations in just one program session.
• The instrument can be used as part of a package including an outcomes tool and data tracking system.

Available Supports
• Training on both the APT itself and the youth development principles embedded in the instrument is available through NIOST.
• Packaging and pricing information about training on the instrument is available from NIOST for organizations not already affiliated with the APT.

For More Information: www.niost.org/content/view/1572/282/ or www.doe.mass.edu/21cclc/ta


Communities Organizing Resources to Advance Learning Observation Tool (CORAL)
Developed by Public/Private Ventures

Overview: The CORAL observation tool was designed by Public/Private Ventures (P/PV) for the CORAL after-school initiative funded by the James Irvine Foundation. The tool was developed for research purposes and was primarily used in a series of evaluation studies on the CORAL after-school initiative. The primary purpose of the observations was to monitor fidelity to the Balanced Literacy Model and change in quality and outcomes over time. The tool was used in two ways: 1) observation of literacy instruction and 2) observation of programming in support of literacy. Though the CORAL observation tool was designed to help observers measure the impact of after-school programs on academic achievement, it has applications for observing quality in a wide variety of settings.

Primary Purpose: Research/Evaluation

Program Target Age: Grades K–5

Relevant Settings: Structured literacy-based programs, both school and community-based.

Content: The CORAL observation tool documents the connection between the quality of the program, fidelity to the Balanced Literacy Model and the academic outcomes of participants.

Structure: The CORAL observation tool is structured around five key constructs of quality – adult-youth relations, effective instruction, peer cooperation, behavior management and literacy instruction. The tool is divided into five parts. The first three – the activity description form, characteristics form and the activity checkbox form – are focused on describing the activity as well as participant and staff behavior. The last two components include an activity scale and an overall assessment form, and are completed after a 90-minute observation period.

Methodology: Each construct is based on a five-point rating scale. The activity description form, characteristics form and activity checkbox form are filled out before an activity is observed, and contain the most informative aspects of the activity. The activity scale and overall assessment form are completed after a 90-minute observation session.

Technical Properties: Evidence for score distributions and predictive validity is strong by general standards, and evidence for internal consistency and the validity of scale structure is promising but limited.

User Considerations:
Ease of Use
• Contains detailed instructions for conducting observations.
• Includes space for open-ended narratives.
• Scoring takes 3-4 hours, including completing the rating scales, related narratives and the overall assessment.

Available Supports
• Currently, training is limited to individuals involved in specific evaluations that employ the instrument.
• Public/Private Ventures’ website features a free download of materials in their Afterschool Toolkit.

For More Information: www.ppv.org/ppv/initiative.asp?section_id=0&initiative_id=29


Out-of-School Time Program Observation Tool
Developed by Policy Studies Associates, Inc.

Overview: The Out-of-School Time Program Observation Tool (OST) was developed in conjunction with several research projects related to out-of-school time programming, with the goal of collecting consistent and objective data about the quality of activities through observation. Its design is based on several assumptions about high-quality programs – first, that certain structural and institutional features support the implementation of high-quality programs and second, that instructional activities with certain characteristics – varied content, mastery-oriented instruction and positive relationships – promote positive youth outcomes.

Primary Purpose: Research/Evaluation

Program Target Age: Grades K–12

Relevant Settings: Varied school- and community-based after-school programs.

Content: The OST documents and rates the quality of the following major components of after-school activities: interactions between youth and adults and among youth, staff teaching processes, and activity content and structures.

Structure: The first section of the OST allows for detailed documentation of activity type, number and demographics of participants, space used, learning skills targeted, type of staff and the environmental context. The remainder of the tool assesses the quality of activities along five key domains including relationships, youth participation, staff skill building and mastery strategies, and activity content and structure.

Methodology: The OST observation instrument uses a seven-point scale to assess the extent to which each indicator is or is not present during an observation. Qualitative documentation, recorded on site, supplements the rating scales. Activity and quality indicator data from the OST observation instrument is used in conjunction with related survey measures.

Technical Properties: Evidence for interrater reliability is strong by general standards, as is evidence for score distributions and internal consistency. Evidence for concurrent validity and the validity of the scale structure is promising but limited.

User Considerations:
Ease of Use
• Free and available online.
• Tool includes an introduction and basic procedures for use.
• Includes some technical language but has been used by both researchers and practitioners.
• Raters must observe approximately 3 hours of programming to generate sound data.
• Observers can be trained to generate reliable observations through 8-16 hours of training, depending on level of experience.

Available Supports
• Training is limited to individuals involved in specific evaluations that employ the instrument.
• Additional non-observational measures related to after-school programming are available from PSA that can be used in conjunction with the OST.

For More Information: www.policystudies.com/studies/youth/OST%20Instrument.html


Program Observation Tool (POT)
Developed by the National AfterSchool Association

Overview: The Program Observation Tool is the centerpiece of the National AfterSchool Association’s (NAA) program improvement and accreditation process and is designed specifically to help programs assess progress against the Standards for Quality School-Age Care. Developed in 1991 by NAA and the National Institute on Out-of-School Time, the tool was revised and piloted before the accreditation system began in 1998.

Primary Purpose(s): Program Improvement; Monitoring/Accreditation

Program Target Age: Grades K–8

Relevant Settings: School and center-based after-school programs.

Content: The Program Observation Tool measures 36 “keys of quality,” organized into six categories. Five are assessed primarily through observation: human relationships; indoor environment; outdoor environment; activities; and safety, health and nutrition. The sixth – administration – is assessed through questionnaire/interview. The tool reflects NAA’s commitment to holistic child development and its accreditation orientation.

Structure: The five quality categories that are the focus of the tool are measured using one instrument that includes the 20 relevant keys and a total of 80 indicators (four per key). If a program is going through the accreditation process, the administration items are assessed separately, through questionnaire/interview.

Methodology: The rating scale captures whether each indicator is true all of the time, most of the time, sometimes or not at all. Specific descriptions of what a 0, 1, 2 or 3 looks like are not provided, but descriptive statements help clarify the meaning of each indicator. Programs seeking accreditation must assign an overall program rating based on individual scores, and guidelines are provided for observers to reconcile and combine scores. For accreditation purposes, the program/activities and safety/nutrition categories are “weighted.”

Technical Properties: No psychometric evidence is available on the POT itself, but there is information about the ASQ (Assessing School-Age Childcare Quality), from which the POT was derived. Overall, evidence for interrater and test-retest reliability is strong by general standards. Following revisions to the scales, evidence for internal consistency was also strong. Preliminary evidence of concurrent validity is also available for the ASQ.

User Considerations:
Ease of Use
• Accessible language and format developed with input from practitioners.
• When used for self-assessment, observation and scoring takes roughly 3-5 hours.
• A self-study manual provides detailed guidance on instrument administration.
• The package costs approximately $300 (additional costs for full accreditation).

Available Supports
• The POT is part of an integrated set of resources for self-study and accreditation.
• The full accreditation package provides detailed guides, videos and other supports.
• Beginning in September 2008, accreditation is offered through the Council on Accreditation.
• NAA currently offers training that covers the Program Observation Tool through its day-long Endorser Training (NAA recommends two and a half days of training in order to ensure reliability).
• Some NAA state affiliates offer training for programs interested in self-assessment and improvement.

For More Information: http://naaweb.yourmembership.com/?page=NAAAccreditation


Program Quality Observation Scale (PQO)
Developed by Deborah Lowe Vandell & Kim Pierce

Overview: The Program Quality Observation Scale (PQO) was designed to help observers characterize the overall quality of an after-school program environment and to document individual children’s experiences within programs. The PQO has been used in a series of research studies and has its roots in Vandell’s observational work in early child care settings.

Primary Purpose: Research/Evaluation

Program Target Age: Grades 1–5

Relevant Settings: Varied school- and community-based after-school programs.

Content: The PQO focuses primarily on social processes and in particular, three components of the quality of children’s experiences inside programs: relationships with staff, relationships with peers and opportunities for engagement in activities.

Structure: The tool has two components – qualitative ratings focused on the program environment and staff behavior (referred to as “caregiver style”) and time samples of children’s activities and interactions. While program environment ratings are made of the program as a whole, caregiver style ratings are made separately for each staff member observed.

Methodology: All items are assessed through observation (although the PQO has always been used in tandem with other measures that rely on different kinds of data). Program environment and caregiver style ratings are made using a four-point scale, and users are given descriptions of what constitutes a 1, 2, 3 or 4 for three aspects of environment and four aspects of caregiver style. In the time sample of activities, activity type is recorded using 19 different categories and interactions are assessed and coded along several dimensions.

Technical Properties: Evidence for interrater reliability, score distributions, internal consistency and convergent validity is strong by general standards, and evidence for test-retest reliability and concurrent/predictive validity is promising but mixed.

User Considerations:
Ease of Use
• Free and available for use.
• The PQO was developed with a research audience in mind. The manual includes basic instructions for conducting observations and completing forms but has not been tailored for general or practitioner use at this time.
• Qualitative ratings of environment and staff require a minimum of 90 minutes of observation time. Completing the time samples as outlined takes a minimum of 30 minutes for an experienced observer.

Available Supports
• Training has only been made available in the context of a specific research study.
• Data collection, management or reporting have only been available in the context of a specific study.
• The authors have developed a range of related measures that can be used in conjunction with the PQO (e.g., physical environment questionnaire; staff, student and parent surveys).

For More Information: http://childcare.gse.uci.edu/des4.html


Program Quality Self-Assessment Tool (QSA)
Developed by the New York State Afterschool Network

Overview: The Program Quality Self-Assessment Tool (QSA) was developed exclusively for self-assessment purposes (use for external assessment and formal evaluation purposes is discouraged). The QSA is intended to be used as the focal point of a collective self-assessment process that involves all program staff. Soon after it was created in 2005, the state of New York began requiring that all 21st CCLC-funded programs use it twice a year for self-assessment purposes.

Primary Purpose: Program Improvement

Program Target Age: Grades K–12

Relevant Settings: The full range of school and community-based after-school programs. The QSA is particularly relevant for programs that intend to provide a broad range of services as opposed to those with either a very narrow focus or no particular focus (e.g., drop-in centers).

Content: The QSA is organized into 10 essential elements of effective after-school programs, including environment/climate; administration/organization; programming/activities; and youth participation/engagement, among others. A list of standards describes each element in greater detail. The elements represent a mix of activity-level, program-level and organizational-level concerns.

Structure: Each of the QSA’s 10 essential elements is further defined by a summary statement, which is then followed by between 7 and 18 quality indicators. The four-point rating scale used in the QSA is designed to capture performance levels for each indicator. Indicators are also considered standards of practice, so the goal is to determine whether the program does or does not meet each of the standards.

Methodology: While most essential elements are assessed through observation, the more organizationally focused elements such as administration, measuring outcomes/evaluation and program sustainability/growth are assessed primarily through document review. Users are not encouraged to combine scores for each element to determine a global rating, because the tool is intended for self-assessment only.

Technical Properties: Beyond establishing face validity, the instrument’s psychometric properties have not been researched.

User Considerations:
Ease of Use
• Practitioners led the development of the QSA; language and format are clear and user-friendly.
• The tool is free and downloadable and includes an overview and instructions.
• The tool is scheduled for a revision which will target length and guidance on determining ratings.

Additional Supports
• The New York State Afterschool Network has developed a user guide, which provides a self-guided walk-through of the tool.
• Programs can contact the New York State Afterschool Network to receive referrals for technical assistance in using the instrument.
• Programs are encouraged to use the QSA in concert with other formal or informal evaluative efforts.
• NYSAN trainings are organized around the 10 elements featured in the instrument, so practitioners can easily find professional development opportunities that connect to the findings in their self-assessment.

For More Information: www.nysan.org


Promising Practices Rating Scale (PPRS)
Developed by the Wisconsin Center for Education Research & Policy Studies Associates, Inc.

Overview: The Promising Practices Rating Scale (PPRS) was developed in the context of a study of the relationship between participation in high-quality after-school programs and child and youth outcomes. The tool was designed to help researchers document the type of activity, the extent to which promising practices are implemented within activities and overall program quality. The PPRS builds directly on earlier work by Deborah Lowe Vandell and draws upon several other observation instruments included in this report.

Primary Purpose: Research/Evaluation

Program Target Age: Grades K–8

Relevant Settings: Varied school- and community-based after-school programs.

Content: The PPRS focuses primarily on social processes occurring at the program level (other tools in the PP assessment system are available to collect other kinds of information). The tool addresses activity type, implementation of promising practices and overall program quality. The practices at the core of the instrument include supportive relations with adults, supportive relations with peers, level of engagement, opportunities for cognitive growth, appropriate structure, over-control, chaos and mastery orientation.

Structure: The first part of the instrument focuses on activity context. Observers code things like activity type, space, skills targeted, and number of staff and youth involved. Observers then add a brief narrative description of the activity. The core of the PPRS is where observers document to what extent certain exemplars of promising practice are present in the program.

Methodology: All items in the scale are addressed through observation, with an emphasis first on activities and then more broadly on the implementation of promising practices by staff within the program. Each area of practice is divided into specific exemplars (positive and negative) with detailed indicators. Ratings are assigned at the overall practice level using a four-point scale. Observers then review their ratings of promising practices across multiple activities and assign an overall rating for each practice area and the overall program.

Technical Properties: Strong evidence for score distribution and internal consistency of the average overall score has been established. Promising but limited evidence of moderate interrater reliability and predictive validity has also been established.

User Considerations:
Ease of Use
• Free and available for use.
• The PPRS was developed with a research audience in mind. The manual includes basic instructions for conducting observations and completing forms but has not been tailored for general or practitioner use at this time.
• In the study the PPRS was developed for, formal observation time totaled approximately two hours per site, with additional hours spent reviewing notes and assigning ratings.

Available Supports
• Training has only been made available in the context of a specific study.
• Data collection, management or reporting has only been available in the context of a specific study.
• The authors have developed a range of related measures that can be used in conjunction with the PPRS (e.g., physical environment questionnaire; staff, student and parent surveys).

For More Information: http://childcare.gse.uci.edu/des3.html


Quality Assurance System®
Developed by Foundations, Inc.

Overview: The Quality Assurance System® (QAS) was developed to help programs conduct quality assessment and continuous improvement planning. Based on seven “building blocks” that are considered relevant for any after-school program, this Web-based tool is expandable and has been customized for particular organizations based on their particular focus. The QAS focuses on quality at the “site” level and addresses a range of aspects of quality, from interactions to program policies and leadership.

Primary Purpose: Program Improvement

Program Target Age: Grades K–12

Relevant Settings: A range of school- and community-based programs.

Content: The various components of quality that the QAS focuses on are considered “building blocks.” The seven core building blocks include: program planning and improvement; leadership; facility and program space; health and safety; staffing; family and community connections; and social climate. Three additional “program focus building blocks” that reflect particular foci within programs are also available.

Structure: The QAS is divided into two parts. Part one – program basics – includes the seven core building blocks. For each, users are given a brief description of the importance of that aspect of quality, and then the building block is further subdivided into between five and eight elements, each of which gets rated. Part two of the tool – program focus – consists of the three additional building blocks, and its structure parallels that of part one. Ratings for the QAS are made using a four-point scale from unsatisfactory (1) to outstanding (4).

Methodology: Filling out the QAS requires a combination of observation, interview and document review. Users follow a five-step process for conducting a site visit and collecting data, which includes observation of the program in action and a review of relevant documents. Once ratings for each element are entered into the computer, scores are generated for each building block – rather than a single score for the overall program – reflecting the tool’s emphasis on identifying specific areas for improvement.

Technical Properties: Beyond establishing face validity, research about the instrument’s psychometric properties has not been conducted.

User Considerations:
Ease of Use
• The QAS is flexible and customizable, with built-in user-friendly features.
• The instruction guide walks the user through basic steps for using the system.
• The $75 annual licensing fee covers two assessments and cumulative reports.
• Multi-site programs can generate site comparison reports.

Available Supports
• Foundations, Inc. offers online sessions and in-person trainings.
• Once a QAS site license is purchased, programs can receive light phone technical assistance free of charge from staff.
• Programs that wish to have their assessment conducted by trained assessors can purchase this service under contract with Foundations, Inc.
• The QAS is available in a Web-based format allowing users to enter data and immediately generate basic graphs and analyses.

For More Information: http://qas.foundationsinc.org/start.asp?st=1


School-Age Care Environment Rating Scale (SACERS)
Developed by Frank Porter Graham Child Development Institute & Concordia University, Montreal

Overview: The School-Age Care Environment Rating Scale (SACERS), published in 1996 and updated periodically, is one of a series of quality rating scales developed by researchers at the Frank Porter Graham Child Development Institute. SACERS focuses on “process quality,” or social interactions within the setting, as well as features related to space, schedule and materials that support those interactions. The SACERS can be used by program staff as well as trained external observers or researchers.

Primary Purpose(s): Program Improvement; Monitoring/Accreditation; Research/Evaluation

Program Target Age: Grades K–8

Relevant Settings: A range of program environments including child care centers, school-based after-school programs and community-based organizations.

Content: SACERS is based on the notion that quality programs address three “basic needs”: protection of health and safety, positive relationships and opportunities for stimulation and learning. The seven sub-scales of the instrument include space and furnishings; health and safety; activities; interactions; program structure; staff development; and a special needs supplement.

Structure: The SACERS scale includes 49 items, organized into seven subscales. All 49 items are rated on a seven-point scale, from “inadequate” to “excellent.” Concrete descriptions of what each item looks like at different levels are provided. All of the sub-scales and items are organized into one booklet that includes directions for use and scoring sheets.

Methodology:Whileobservationisthemainformofdatacollection,severalitemsarenotlikelytobeobservedduringprogramvisits.Ratersareencouragedtoaskquestionsofadirector

orstaffpersoninordertoratetheseandareprovidedwithsamplequestions.Formanyitems,clarifyingnoteshelptheuserunderstandwhattheyshouldbelookingfor.Observersenterscoresonasummaryscoresheet,whichencouragesuserstocompileratingsandcreateanoverallprogramqualityscore.

TechnicalProperties:Evidenceforinterraterreliabilityandinternalconsistencyisstrongbygeneralstandards.Convergentandconcurrentvalidityevidenceislimitedbutpromising.

User Considerations: Ease of Use

• Accessible format and language.

• Includes full instructions for use, clarifying notes and a training guide.

• The cost of the SACERS booklet is $15.95.

• Suggested time needed: three hours to observe a program and complete the form.

• Guidance is offered on how to sample, observe and score to reflect multiple activities within a program.

Available Supports

• Additional score sheets can be purchased in packages of 30.

• Three- and five-day trainings on SACERS structure, rationale and scoring.

• Guidance on how to conduct your own training is provided in the booklet.

• Training to reliability takes 4-5 days, with reliability checks throughout.

• Access to a listserv through the Frank Porter Graham Institute Web site.

• Large-scale users can now use commercial software to enter and score data.

• With the Web-based reporting system, individual assessments can be routed to a supervisor for quality assurance and feedback, and aggregate analyses and reporting can be provided.

For More Information: www.fpg.unc.edu/~ecers/


Developed by the David P. Weikart Center for Youth Program Quality 4

Overview: The Youth Program Quality Assessment (YPQA) was developed by the High/Scope Educational Research Foundation and has its roots in a long lineage of quality measurement rubrics for preschool, elementary and now youth programs. The overall purpose of the YPQA is to encourage individuals, programs and systems to focus on the quality of the experiences young people have in programs and the corresponding training needs of staff. While some structural and organizational management issues are included in the instrument, the YPQA is primarily focused on what the developers refer to as the "point of service" – the delivery of key developmental experiences and young people's access to those experiences.

Primary Purpose(s): Program Improvement; Monitoring/Accreditation; Research/Evaluation

Program Target Age: Grades 4-12

Relevant Settings: Structured programs in a range of school- and community-based settings.

Content: Because of the focus on the "point of service," the YPQA emphasizes social processes – or interactions between people within the program. The majority of items are aimed at helping users observe and assess interactions between and among youth and adults, the extent to which young people are engaged in the program, and the nature of that engagement. However, the YPQA also addresses program resources (human, material) and the organization or arrangement of those resources within the program.

Structure: The YPQA assesses seven domains using two overall scales. Topics covered include engagement, interaction, supportive environment, safe environment, high expectations, youth-centered policies and practices, and access.

Methodology: Items at the program offering level are assessed through observation. Organization-level items are assessed through a combination of guided interview and survey methods.

The scale used throughout is intended to capture whether none of something (1), some of something (3) or all of something (5) exists. For each indicator, concrete descriptors illustrate what a score of 1, 3 or 5 looks like.

Technical Properties: Evidence for score distributions, test-retest reliability, convergent validity and validity of scale structure is strong. Evidence for interrater reliability is mixed, and evidence is promising but limited in terms of internal consistency and concurrent validity.

User Considerations: Ease of Use

• Language and format of the tool are accessible.

• Administration manual with definitions of terms and scoring guidelines.

• The tool can be ordered online.

• Raters must observe for roughly four hours to generate sound data.

• Observers can be trained to generate reliable observations in two days.

Available Supports

• One-day basic and two-day intermediate YPQA trainings are available, with additional technical assistance available upon request.

• Youth development training that is aligned with tool content is available.

• Online "scores reporter" and a Web-based data management system are available.

For More Information: www.highscope.org/content.asp?contentid=117

4 The Weikart Center is a joint venture between the High/Scope Educational Research Foundation and the Forum for Youth Investment.


At-a-glance descriptions of the ten assessment tools are provided in the previous section. Here we offer more detailed descriptions. Each write-up follows a common format.

Assessing Afterschool Program Practices Tool (APT)
National Institute on Out-of-School Time and Massachusetts Department of Elementary & Secondary Education

Communities Organizing Resources to Advance Learning Observation Tool (CORAL)
Public/Private Ventures

Out-of-School Time Observation Tool (OST)
Policy Studies Associates, Inc.

Program Observation Tool (POT)
National AfterSchool Association

Program Quality Observation Scale (PQO)
Deborah Lowe Vandell and Kim Pierce

Program Quality Self-Assessment Tool (QSA)
New York State Afterschool Network

Promising Practices Rating Scale (PPRS)
Wisconsin Center for Education Research and Policy Studies Associates, Inc.

Quality Assurance System (QAS)
Foundations, Inc.

School-Age Care Environment Rating Scale (SACERS)
Frank Porter Graham Child Development Institute and Concordia University, Montreal

Youth Program Quality Assessment (YPQA)
David P. Weikart Center for Youth Program Quality


Developed by NIOST and the Massachusetts Department of Elementary & Secondary Education

Purpose and History
The Assessing Afterschool Program Practices Tool (APT) is a set of observation and questionnaire tools designed to help practitioners examine and improve what they do in their after-school program to support young people's learning and development. It was specifically designed to examine those program practices that research suggests may be related to key youth outcomes (e.g., behavior, initiative, social relationships), and it is a core component of the Afterschool Program Assessment System (APAS).5

The research version of the APT (the APT-R) was developed in 2003-2004 for use in the Massachusetts Afterschool Research Study (MARS). Based on extensive field testing by grantees, as well as some additional testing of the scales using MARS data, a more user-friendly self-assessment version of the tool was developed during 2005 for use by the Massachusetts Department of Elementary and Secondary Education 21st Century Community Learning Centers (21st CCLC) grantees and other programs interested in quality assessment. The self-assessment version is the focus of this description.

The instrument can be used to measure quality in a wide variety of program models that serve elementary and middle school students during the non-school hours. In addition to serving as a self-assessment tool, the APT defines desirable program practices in concrete terms that can be used to communicate with staff and others, help stimulate reflection and discussion regarding program strengths and weaknesses, guide the creation of professional development priorities and improvement goals, and help gauge progress toward those goals.

The APT focuses on the experiences of youth in programs and is not intended to evaluate individual staff performance or produce definitive global quality scores for programs. While the APT includes a four-point rating scale, assigning ratings is not required; programs are encouraged to use the tool in ways that will yield the most useful information to guide program improvement.

Content
The APT measures a set of 15 program-level features and practices that can be summarized into five broad categories:

• Program climate

• Relationships

• Approaches and programming

• Partnerships

• Youth participation

While it does address some broader organizational policy issues (e.g., connections with schools, staff-youth ratios), it was designed to focus primarily on things that program staff have control over and that are relevant across a range of different organizational contexts and facilities (e.g., schools, community centers).

The APT emphasizes some aspects of settings more than others, and in particular places a strong emphasis on relationships, as research has shown that relationships have the greatest impact on youth outcomes. The primary focus is therefore on social processes – or interactions between people within the program. Several items help users observe and assess youth-adult relationships and interactions, as well as peer interactions and connections with families and school personnel. To a lesser extent than social processes, the APT also addresses programs' human and material resources and how those resources are organized or arranged within the setting.

In developing the APT, the authors reviewed relevant literature to identify program features that relate to outcomes and also looked at existing definitions and measures of program quality. One such definition commonly referenced in the field is the National Research Council's features of positive developmental settings, a framework which itself is focused primarily on social processes. Items on the APT address each of the eight features identified by the National Research Council (2002).

5 The APT was designed to address program practices that research suggests lead to youth outcomes measured by the Survey of Afterschool Youth Outcomes (SAYO) – an evaluation system developed by NIOST under contract with the MA Department of Education. The SAYO includes pre- and post-participation surveys for teachers and program staff and measures things like behavior in the classroom and program, initiative, engagement in learning, relations with peers and adults, homework, analysis and problem-solving, and academic performance. For more information, see www.niost.org/training/APASbrochureforweb.pdf

Structure and Methodology
The 15 program features addressed by the APT are measured by one or both of two tools, the observation instrument (APT-O) and the questionnaire (APT-Q). The APT-O guides observations of the program in action, while the APT-Q examines aspects of quality that are not easily observed and guides staff reflection on those aspects of practice and organizational policy.

Although the 15 program features drive the content of the tool, the APT-O is organized by daily routine. Five sections are intended to follow what the developers consider a typical program day. While these sections most closely reflect the daily routine in elementary and middle school 21st CCLC programs, the tool is designed to be flexible, and users are encouraged to use whichever sections are most relevant, in whatever order makes sense. These five sections include both informal program times (arrival, transitions, pick-up) and formal program times (homework, activities).

Within each section, sub-sections focus on particular practices and behaviors during those time periods (for example, sub-sections under "homework" include homework organization, youth participation in homework time, staff effectively manage homework time, and staff provide individualized homework support). Finally, each sub-section includes between two and eight specific items that can be observed and rated.

An important structural aspect of the APT is its explicit connection to a specific youth outcome measurement tool – the Survey of Afterschool Youth Outcomes (SAYO). Programs can use APT findings to look at how they may be contributing to specific outcome areas included in the SAYO, and users are provided with charts connecting particular APT items to specific outcome areas. Despite this linkage, the APT is also useful as a stand-alone tool.

Recently, a new set of resources, the APT-SAYO Links, was developed as quick guides for practitioners to understand the research base connecting APT program practices and specific SAYO outcome areas.

Program Feature                                               APT-O   APT-Q

Welcoming & Inclusive Environment                               ✓       ✓
Positive Behavior Guidance                                      ✓
High Program & Activity Organization                            ✓
Supportive Staff-Youth Relationships                            ✓       ✓
Positive Peer Relations                                         ✓
Staff/Program Supports Individualized Needs & Interests         ✓       ✓
Staff/Programming Stimulates Engagement & Thinking              ✓       ✓
Targeted SAYO Skill Building/Activities                         ✓       ✓
Youth are Positively Engaged in Program/Skill Building          ✓
Varied/Flexible Approaches to Programming                       ✓       ✓
Space is Conducive to Learning                                  ✓
Connections with Families                                       ✓       ✓
Opportunities for Responsibility, Autonomy & Leadership                 ✓
Connections with Schools                                                ✓
Program Supports Staff                                                  ✓


The APT-O
The APT-O rating scale, should users decide to assign ratings to their observations, is a four-point scale designed to answer the question, "How true is it that this statement describes what I observed?" Definitions of each point on the scale differ slightly depending on whether you are observing a program or staff practice vs. youth behaviors. A detailed description of the rating scale, as well as other rating options and considerations, is included in the instruction manual. Some "conditional" items are included, which are only to be rated should they occur (e.g., when youth behavior is inappropriate, staff use simple reminders to redirect behavior).

Raters are asked to assign a 1-4 (or N/A) rating to each of the individual items within each sub-section. For most items, a specific description of what a "1" looks like is provided. The wording of the item itself constitutes a "4," since the question driving the ratings is, "How true is this?" The instruction manual provides general guidance (not item-specific) for how to think about 2s (e.g., desired practice was only partially met, some minor evidence of negative expressions of the practice, or the practice is observed infrequently) and 3s (observed practice mostly reflected desired practice, or the desired practice was observed but perhaps not at all expected times). This year, NIOST will begin developing more specific anchors for 2s and 3s on the APT rating scale for each item. These new anchors will be field-tested, but not psychometrically tested, this academic year.
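Should a program choose to assign ratings, the 1-4-or-N/A scheme lends itself to a simple representation. The sketch below is a hypothetical illustration only (the item names and the averaging step are assumptions for illustration, not part of the APT, which does not require ratings at all): items are scored 1-4, conditional or unobserved items are marked N/A, and N/A items are simply excluded from any summary.

```python
# Hedged sketch: one way to represent APT-O-style item ratings in code.
# The item names and the averaging step are illustrative assumptions,
# not part of the APT itself.

from typing import Optional

def summarize(ratings: dict[str, Optional[int]]) -> Optional[float]:
    """Average the 1-4 ratings in a sub-section, skipping N/A (None)."""
    scored = [r for r in ratings.values() if r is not None]
    return sum(scored) / len(scored) if scored else None

# Hypothetical "arrival time" sub-section; the conditional item was not
# triggered during the observation, so it stays N/A and is not counted.
arrival = {
    "established arrival routine": 4,
    "activities available on arrival": 3,
    "staff redirect inappropriate behavior": None,  # conditional item
    "staff greet arriving youth": 2,
}
print(summarize(arrival))  # 3.0
```

Treating N/A as "excluded" rather than "zero" matters: a conditional item that never occurred says nothing about quality, and averaging it in as a low score would penalize programs where, for example, no misbehavior happened to be observed.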

The APT-Q
The APT Program Questionnaire (APT-Q) helps programs reflect upon the aspects of quality that are not necessarily observed, such as program planning, frequency of offerings, and connections with parents and schools. As is the case with the APT-O, flexibility is built into the questionnaire component of the tool. Users

Arrival Time                                                How true?      Notes

1. There is an established arrival routine that seems familiar to staff and youth.
   1 2 3 4 N/A

2. Activities are available to youth to become engaged in as they arrive (may include snack). (e.g., Wide variety of activities are available to arriving youth.)
   1 = There are no activities available for arriving youth. Youth have nothing to do (e.g., stand around waiting for staff to begin programming).
   1 2 3 4 N/A

3. Staff acknowledge children/youth when they arrive. (e.g., Offer a greeting, slap hands, ask "How's it going?", exchange hellos, etc.)
   1 = Staff do not acknowledge any arriving youth.
   1 2 3 4 N/A

4. Staff engage in 1:1 conversations with youth. (e.g., Talk about youth's day, ask about something they brought or made.)
   1 = Staff are not seen conversing or interacting with individual youth.
   1 2 3 4 N/A


are encouraged to assign ratings only to the extent it is useful to do so, and to review the various sections of the questionnaire before use to select those that best match a program's priorities. The APT-Q, which is divided into eight sections (see box), provides opportunities to rate the consistency and/or frequency of certain practices. It also provides lists of specific program practices that support various quality features (e.g., ways to create a welcoming and inclusive environment), encouraging users to check those that are in use in the program while at the same time offering a broad range of concrete, positive practices that can encourage program development and innovation.

The APT-Q includes three different rating scales. A four-point "how consistently" scale (rarely/never; once in a while/sometimes; often/a lot of the time; almost always/always) is used with the section focused on program planning and the use of specific planning practices. A six-point "how frequently" scale (rarely/never; a few times per year; about once per month; about once per week; more than once per week; usually every day) is used for two sections that look at program offerings and the extent to which the program promotes responsibility, autonomy and leadership. A simpler four-point "how frequently" scale (about once per year; several times per year; about once per month; weekly or more often) is used for the sections that address how the program connects with families and schools.

Technical Properties
The psychometric information that is available on the APT comes from the version used in the Massachusetts Afterschool Research Study (MARS), conducted by the Intercultural Center for Research in Education and the National Institute on Out-of-School Time (2005).6 The extent to which trained raters agree when observing the same program at the same time, or interrater reliability, was moderate, and preliminary evidence for concurrent and predictive validity suggests the APT-R yields accurate information about the concepts it measures. As mentioned in the previous section, anchors for the 2 and 3 ratings will be developed with the intent of improving interrater reliability. If funding for further testing comes through, NIOST will be re-testing interrater reliability.

While the current self-assessment version of the tool has no psychometric data, NIOST is currently seeking funding to conduct further psychometric testing of the APT and SAYO tools, including the extent to which the two tools can work together as an integrated measurement system and allow practitioners to target key practices and track expected outcomes.

APT Program Questionnaire Sections

1. How you plan and design program offerings
2. Your program offerings
3. How your program promotes responsibility, autonomy and leadership
4. How your program creates a welcoming and inclusive environment
5. How your program supports youth as individuals
6. How your program connects with families
7. How your program partners with schools to support youth
8. How your program supports and utilizes staff to promote quality

6 The developers have conducted a detailed comparison of the two versions. Roughly half of the APT-R items in the current self-assessment version appear exactly as they were worded in the research version. Roughly one quarter of the original items were taken out, and roughly one quarter were revised slightly.

Interrater Reliability
Researchers examined interrater reliability for 78 programs in the MARS study and found that paired raters agreed on their ratings (within one score point) 85 percent of the time. If the range of response options for the research and self-assessment versions is similar, then we can expect, simply by chance, agreement between raters to be at least 62.5 percent, yielding a maximum kappa score of 0.60 (a high kappa is generally considered 0.70). Although interrater reliability has not yet been established for the self-assessment version of the APT, existing data suggest that agreement was moderately better than chance.
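The 62.5 percent and 0.60 figures above can be reproduced with a few lines of arithmetic. The sketch below is illustrative (it is not code from the MARS analysis): it computes how often two raters choosing independently and uniformly among four response options would land within one point of each other by chance, then applies the standard chance-correction formula to the observed 85 percent agreement.

```python
# Reproduces the interrater reliability arithmetic cited in the text:
# on a 4-option scale, two independent raters agree "within one point"
# by chance 62.5 percent of the time, so an observed 85 percent
# agreement corresponds to a chance-corrected (kappa-style) 0.60.
# Illustrative sketch only; not code from the MARS study.

def chance_within_one(k: int) -> float:
    """P(|a - b| <= 1) for two independent raters, uniform on 1..k."""
    pairs = [(a, b) for a in range(1, k + 1) for b in range(1, k + 1)]
    return sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)

def kappa(observed: float, expected: float) -> float:
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    return (observed - expected) / (1 - expected)

p_e = chance_within_one(4)           # 10 of 16 rating pairs -> 0.625
print(p_e)                           # 0.625
print(round(kappa(0.85, p_e), 2))    # 0.6
```

This is the sense in which the agreement is "moderate": the raw 85 percent figure is flattered by how easy within-one-point agreement is on a short scale, and the chance correction brings it below the conventional 0.70 benchmark.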

Face Validity
The developers received extensive and systematic feedback on the APT, about usability as well as perceived validity. Twenty-six grantees representing over 100 program sites responded to a set of questions about the tool, most staff participated in focus groups, and in-depth interviews were conducted with 12 grantees. Positive feedback from this range of key stakeholders suggested that the items make intuitive sense for measuring program quality. However, this is the weakest form of validity, as it is not based on empirical evidence.

Concurrent Validity
MARS authors compared findings from the APT-R to observations of program characteristics and found that certain items were related to program and staff characteristics in expected ways. For example, programs that scored high on items related to staff engagement and engaging activities tended to have smaller group sizes, stronger connections with schools, a higher staff-to-child ratio and a higher percentage of staff with college degrees. Programs that scored high on items relating to youth engagement tended to be well-paced and organized with clear routines, and had a higher staff-to-child ratio and a higher percentage of staff with college degrees. Better family relations were related to stronger connections between programs and parents and the community. Programs that offered high-quality homework time tended to offer more project-based learning activities, were more organized with clear routines, had a higher staff-to-child ratio and had more staff who were certified teachers. NIOST's proposed study includes concurrent validity testing of the APT using the YPQA.

This evidence of concurrent validity should be regarded as preliminary because many items were not related to program characteristics. For example, youth engagement in programs was unrelated to smaller group sizes, and engaging and challenging activities were not related to programs being well-paced and organized with clear routines. Because the authors did not explicitly identify which relationships were most important and which findings ran contrary to their expectations, we cannot be certain that the observed findings indicate unequivocally strong concurrent validity.

MARS authors also examined the association between APT-R scales and five student characteristics that would be expected from theory and prior research: students' improvement in their (1) behavior, (2) initiative, (3) homework, (4) relations with peers, and (5) relations with adults. The authors found that the APT-R Youth Engagement scale was related to all student characteristics in expected ways except Relations with Adults. Higher scores on APT-R Challenging Activities were related to lower scores on Relations with Peers, and higher scores on APT-R Relations with Families were marginally related to higher scores on Relations with Adults, but both APT-R scales were unrelated to the other student characteristics. The APT-R Homework Time scale was unrelated to all student characteristics. MARS authors point out that the programs in their sample did not score high on Challenging Activities or Homework Time, indicating that the results could be different in a sample with more diverse programs on these dimensions. In addition, correlational evidence did not show any significant relationships between student characteristics and the APT-R scales concerning Appropriate Space and Staff Engagement. Given these findings, and keeping in mind that some, but not all, items used in the APT-R appear in the self-assessment version of the tool, this evidence should be regarded as preliminary, with additional evidence needed to firmly establish the concurrent validity of the APT.

Validity of Scale Structure
The authors created several scales in the APT-R using a statistical technique known as factor analysis. However, the extent to which these findings can be generalized to the self-assessment version is unclear given the differences between the two instruments. Testing of the validity of the scale structure is planned in NIOST's upcoming study.


User Considerations: Ease of Use
To date, the APT has been used by 36 school districts offering 21st CCLC programs in over 150 sites, and in a four-city pilot (Charlotte; Boston; Middlesex County, New Jersey; and Atlanta). In response to its broad use, NIOST has developed many products related to the APT that make it both user-friendly and adaptable to a variety of systems and settings.

One such product is a package of tools, including the APT, the SAYO and a data management tracking system which tracks individual youth's program participation (an example of such a system is Kidtrax). When combined, these tools are referred to as the "Afterschool Program Assessment System" (APAS). The system can be used to support online data collection, management, analysis and reporting, allowing programs to link quality data from the APT and youth outcomes data from the SAYO with information on daily attendance and program participation in a comprehensive, flexible and integrated fashion.

NIOST has developed other products to support the use of the APT in conjunction with its other tools as part of a full assessment system (although the APT may be used as a stand-alone tool). For example, in addition to the SAYO, NIOST has created a series of SAYO-APT Links, or "cheat sheets," to support programs' efforts to link quality and outcomes. The SAYO-APT Links describe each SAYO youth outcome area and connect it with the related program practices found in the APT. Another product is a detailed, practitioner-friendly training notebook that guides programs in the use of the APT and related tools. Interested programs should contact NIOST for training options and costs; trainings are customized to meet the needs and interests of organizations.

In another development, NIOST is finalizing an online Youth Survey (SAYO-Y) designed to complement the APT and SAYO. The survey measures youth's program experiences in five areas (e.g., inclusive environment, choice & autonomy); youth's sense of competence in six areas (e.g., math, reading); and youth's future planning and expectations. This survey has been tested with over 6,000 Massachusetts youth participating in 21st CCLC programs and will be fully piloted again this year.

Flexibility is a hallmark of the APT, so although the developers provide some guidance as to when to conduct observations, for how long, etc., they emphasize that the APT-O can be used in many different ways, and that decisions about how many observers, how many observations and whether to use numerical ratings should be driven by what users intend to do with the data in the end. The general intention behind the design is for an observer to observe one full program session (typically a full afternoon), taking notes during the observation and using the time immediately afterwards to complete all relevant sections, including an open-ended "impressions" section at the end of the tool. Many programs using the APT as a self-assessment, however, have preferred to obtain at least two days of observation data. In its training sessions, NIOST guides participants in understanding how they may want to use the tools and, therefore, how they will tailor them to support their evaluation and assessment objectives.

Available Supports
Training on both the APT itself and processes for guiding program improvement is available through NIOST. Most recently, a two-day APAS training module has been developed to prepare site directors and others to use the SAYO and APT. Training for the APT alone is one full day. NIOST also provides a Quality Advisor training to help coaches and other technical specialists use the APT to work with programs. An online tutorial is also available to prepare sites to use the SAYO outcome tool.

In mid-2007, packaging and pricing information about training on the instrument became available for organizations that are interested but not already affiliated with the APT through statewide efforts in Massachusetts.

In The Field
The City of Cambridge, MA Agenda for Children is a city-wide out-of-school time initiative bringing together city departments, community-based organizations,


businesses, funders and residents to positively impact the lives of Cambridge youth. Specifically, the initiative works to improve access to, and the quality of, Cambridge out-of-school time programs. In pursuit of that goal, the Cambridge Agenda for Children has been refining a multi-site program improvement process over three years, using the APT program improvement tool in that effort. Their Self-Assessment Support initiative supports programs to engage in observation and self-reflection, and to take targeted action based on what they see.

The APT gives program coordinators, site coordinators and front-line staff a common language for talking about their goals, and detailed descriptions of effective practice.

For More Information
Information about the APT is available online at: www.niost.org/content/view/1572/282/ or www.doe.mass.edu/21cclc/ta

Contact:
Kathy Schleyer, Training Operations Manager
National Institute of Out-of-School Time
Wellesley Centers for Women
Wellesley College
106 Central Street
Wellesley, MA
[email protected]


Communities Organizing Resources to Advance Learning Observation Tool

Purpose and History
The CORAL observation tool was designed by Public/Private Ventures (P/PV) for the CORAL after-school initiative funded by the James Irvine Foundation. The tool was developed for research purposes and was primarily used in a series of evaluation studies of the CORAL after-school initiative. The primary purpose of the observations was to monitor fidelity to the Balanced Literacy Model and change in quality and outcomes over time.

Under the CORAL initiative, after-school programming was provided to elementary school aged children in five cities in California. In each of these cities, programming was different and consisted of a variety of activities ranging in focus from science-based programs to art and cultural enrichment programming. All CORAL programs included the common core element of Balanced Literacy programming.

The observation tool was used in two ways: first, to observe Balanced Literacy instruction in CORAL after-school programs, and second, to observe the integration of literacy programming in a variety of other activities, including homework help and academic enrichment programs ranging in focus from science to art and cultural enrichment. Though the CORAL observation tool was designed to help observers measure the impact of after-school programs on academic achievement, it has applications for observing quality in a wide variety of settings.

The CORAL observation tool has five components:

• an activity description form, used to gather descriptive information on the observed activity prior to the observation;

• an activity characteristics form, completed during the observation, used to collect general information about the activity and type of instruction (i.e., length of time, number of participants, number of staff, teaching methods, etc.);

• an activity checkbox form that is divided into the five overarching categories for observation – this is the primary method for recording information during the activity;

• the activity scales form, completed after the observation, that is used to rate each of the constructs observed in the activity;

• and the overall assessment component (completed after three observations for literacy programs and after two observations for non-literacy programs), which measures both aspects of the activity and participant improvement in skill areas.

Because the CORAL initiative emphasized best practices in youth development (including positive adult/youth relationships and ongoing youth participation), P/PV developed the CORAL tool based on an observation tool used for evaluation of the San Francisco Beacon initiative. P/PV also designed a similar observational tool containing the updated academic components from the CORAL tool for the Philadelphia Beacon initiative.

Content
The CORAL observation tool was designed to help researchers collect data through an ongoing program observation process to measure the connection between the quality of the program, fidelity to the Balanced Literacy Model, and the academic outcomes of participants. As a result, the main sections of the tool focus on five components of quality: adult-youth relations, effective instruction, peer cooperation, behavior management, and literacy instruction. Each aspect of quality has several elements that are rated and captured in subcategories, which are called constructs. A number of these core constructs (such as behavior management, youth/adult relationships, peer cooperation, etc.) are relevant for both formal program settings as well as informal, adult-supervised settings. The characteristics rated within each of the constructs are listed on page 38.

Because the focus of the CORAL observation tool is on fidelity to the Balanced Literacy Model – a structured literacy approach that uses a variety of modalities aimed at developing competent readers – it tends to focus on social processes and skill development in support of literacy gains, and less on program resources or the organization


ofthoseresourceswithinaprogram.Literacyfocusedactivitiesareassessedfortheirfidelitytothemodelandtheirassociationwithchangeinparticipantinterestandmotivation.Non-literacyfocusedactivitiesareassessedfortheirintegrationofliteracyskillsintothecurriculum.

Structure and Methodology

The first three components of the CORAL observation tool – the activity description form, the characteristics form and the activity checkbox form – are focused on describing the activity as well as participant and staff behavior. The activity description form is completed before (or after) the activity, based on information gleaned from the instructor in a 10-15 minute conversation. The activity characteristics form, where general information is recorded, is completed during the first ten minutes of the activity. The activity checkbox form is a running list of observations of behavior and activity characteristics, and is filled out by the observer while the activity is ongoing. In the activity checkbox form, observers can choose between several examples of positive and negative behavior for each of the constructs above. If they observe a behavior not captured in the examples, it can be recorded in the notes section and considered in the scoring after the observation is completed. Observations are conducted over a 90-minute period (in CORAL, the literacy activity took place over a 90-minute period, so the observation covered its entire duration in order to assess fidelity).

Each construct is rated on a five-point scale, with 1 representing the lowest score (definitely needs improvement) and 5 the highest (outstanding). The CORAL developers suggest that 5s be awarded sparingly, as the intent is to indicate that there is no room for improvement. Meanwhile, a 1 rating can be given for several reasons, including observed negative staff behaviors or an activity not fitting the appropriate construct. Activity ratings are assigned after each 90-minute observation, for a cumulative total of three observations for a literacy program and two for a non-literacy activity.

Characteristics rated within each construct:

Adult-Youth Relationships
• Adult support for the activity
• General adult responsiveness
• Emotional quality of the relationship

Behavior Management
• Appropriateness of behavioral demands
• Adult management
• Staff's inclusiveness of youth

Instruction
• Clarity
• Organization
• Motivation
• Challenge
• Connection to other material
• Connection between youth & material
• Cultural awareness
• Responsiveness to English language learners

Literacy
• Literacy-rich environment
• Read aloud
• Book talk/discussion/shout out
• Writing
• Independent reading
• Skill development activities/games
• Build vocabulary/spelling
• Connections between youth & text

Peer Cooperation
• Cooperative activity

After the 90-minute observation, observers complete the activity scales form within 24 hours of each observation, including the descriptive narrative. The checkbox form does not translate into a one-to-one score on the scales form. Instead, observers are required to consider the activities recorded on the checkbox form along with the duration/frequency of the observed behavior, the quality of the behavior and the importance of the behavior to the activity when deriving a score for each construct. Additionally, observers complete descriptive narrative summaries (using sample narratives as a guide) which capture the most informative aspects of the activity.

Upon completion of the required series of observations, the overall assessment form is used to rate the overall quality of the activity. The form contains 11 narrative questions in which observers describe the strengths, weaknesses and areas of improvement for the activity, the cultural awareness of the instructor, modifications for varying linguistic needs, and the classroom environment. The observers are also asked to record the improvements they observed; this question contains a pre-selected list of skills which range from academic to visual arts and performance.

The developers suggest three or four hours for completing the rating scales, related narratives and the overall assessment.

For their research purposes, Public/Private Ventures additionally required observers to write a narrative description for each component and used the descriptions as part of their quantitative analysis and as a method of compiling "best practices" that exemplified specific constructs. In addition, observers were asked to identify in a narrative the aspects of each of these skills in which participants showed improvement.

Technical Properties

Technical evidence for the CORAL observation tool comes from its use during the evaluation of the CORAL Initiative, a two-year study that included 56 observations in 23 after-school programs in the first year and 43 observations in 21 after-school programs in the second year. The 90-minute literacy activities were observed two to four times each during the first year of the evaluation, and a minimum of three times each during the second. Evidence is drawn from the study's initial report after the first year (Arbreton, Goldsmith & Sheldon, 2005) as well as the final report, which draws on evidence from both years (Arbreton et al., 2008). Data from the non-literacy activity observations were used to create qualitative profiles of the activities, and no statistical analysis was conducted. For the literacy activities in the CORAL evaluation, statistical analysis was conducted to identify the relationship between program quality and participant academic gains. The technical properties described below pertain to P/PV's analysis of literacy observation data only.

Some users may be interested in summarizing data from the CORAL into scales. In their analysis from the CORAL initiative (Arbreton, Goldsmith, & Sheldon, 2005; Arbreton et al., 2008), the developers created four scales using items from the instrument's Activity Scales Form: (1) Adult-Youth Relations (average of items 1 through 3); (2) Instructional Quality (average of items 5 through 8); (3) Group Management (average of items 16 and 17); and (4) Connection Between Youth and Activities (average of items 9 and 10).

Developers also created an Overall Lesson Rating, which is an average of scores from three scales (Adult-Youth Relations, Instructional Quality and Group Management) and several items: Read Aloud (item 19), Book Talk (item 20), Writing (item 21), Independent Reading (item 22) and Connection to Youth (item 10).

A sample item from the tool, showing the positive (++) and negative (––) behavioral examples that guide ratings:

Q8. Challenge (++) – The staff:
• encourage youth to push beyond their present level of competency.
• try to sustain the motivation of youth who are discouraged or reluctant to try.
• continuously move to the next step as soon as youth progress.
• reinforce and encourage youth's efforts in order to maintain their involvement.

Q8. Challenge (––) – The staff:
• discourage youth who tried to push beyond their present level of competency.
• miss opportunities to sustain the motivation of youth who are discouraged or reluctant to try.
• miss opportunities to move to the next step as soon as youth progressed (e.g., pace was too slow).
• do not reinforce or encourage youth's efforts in order to maintain their involvement.

N/A because:
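Since each scale is simply an average of Activity Scales Form items, the arithmetic can be sketched in a few lines of Python. The item numbers below follow the groupings described above, but the ratings are invented, and treating the Overall Lesson Rating as a flat average of its eight components is an illustrative assumption rather than the developers' documented procedure.

```python
# Hypothetical sketch of CORAL scale scoring; ratings are invented.

def scale_score(ratings, items):
    """Average the 1-5 ratings for the given item numbers."""
    return sum(ratings[i] for i in items) / len(items)

# One observation's item ratings (item number -> rating on the 1-5 scale)
ratings = {1: 4, 2: 3, 3: 5, 5: 4, 6: 4, 7: 3, 8: 4,
           9: 3, 10: 4, 16: 2, 17: 3, 19: 4, 20: 3, 21: 4, 22: 5}

adult_youth = scale_score(ratings, [1, 2, 3])       # Adult-Youth Relations
instructional = scale_score(ratings, [5, 6, 7, 8])  # Instructional Quality
group_mgmt = scale_score(ratings, [16, 17])         # Group Management
connection = scale_score(ratings, [9, 10])          # Connection Between Youth & Activities

# Overall Lesson Rating: three scales plus items 19-22 and item 10,
# averaged equally (an assumption about how the components combine).
overall = (adult_youth + instructional + group_mgmt +
           ratings[19] + ratings[20] + ratings[21] + ratings[22] +
           ratings[10]) / 8
```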

Score Distributions

Items and scales should be able to detect meaningful differences across settings, and therefore should exhibit a range of scores. Score distributions were examined for the four domain-specific scales as well as six individual items representing the Balanced Literacy strategies measured in the CORAL tool. Most items exhibited good score distributions, although the item "Skill Development Activities" (item 24) was on the low end of the scale for both years of measurement (average scores were 1.5 and 1.8, respectively, on a scale of 1 to 5).

Internal Consistency

Responses to items comprising scales should be highly related, suggesting that the items form meaningful domains. The internal consistency of the Overall Lesson Rating was quite strong, with an alpha of .94. Of the four scales measuring specific domains from the CORAL Initiative, three (Adult Support, Instructional Quality and Group Management) exhibited excellent internal consistency, with Cronbach's alpha ranging from .84 to .88 (exceeding the recommended value of .70), suggesting that the scales are cohesive and composed of related items. However, the scale Connection to Youth and Activities was less cohesive (alpha = .54), suggesting that the two items composing this scale are only moderately related and more items may be needed to fully capture this domain.
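Cronbach's alpha, the internal-consistency statistic cited above, can be computed directly from item-level ratings. The sketch below is a generic, minimal Python implementation; the ratings are invented for illustration and are not CORAL data.

```python
# Minimal Cronbach's alpha sketch; item ratings below are invented.

def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding scores across observations."""
    k = len(item_scores)
    totals = [sum(obs) for obs in zip(*item_scores)]  # total score per observation
    item_var = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

# Three related items rated across five observations
items = [[4, 3, 5, 2, 4],
         [4, 2, 5, 3, 4],
         [5, 3, 4, 2, 4]]
alpha = cronbach_alpha(items)
```

An alpha at or above the recommended .70 would indicate a cohesive scale; a value like the .54 reported for Connection to Youth and Activities would flag a scale whose items are only moderately related.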

Validity of Scale Structure

An analysis that examines scale structure validity for a single scale tests whether the scale truly measures a distinct and coherent domain (rather than several different concepts or domains). Developers examined whether several items and scales from the CORAL tool could be combined to represent an Overall Lesson Rating. Findings indicated that the Overall Lesson Rating represents a cohesive summary of multiple domains from the CORAL measure. The developers did not examine scale structure validity for other scales measuring specific domains (such as Instructional Quality).

Predictive Validity

If the CORAL tool truly measures effective literacy strategies, we can expect that scores from the literacy strategy items will be related to gains in reading and English language skills. To examine the instrument's predictive validity, the authors examined the relationship between scores on the CORAL tool and outcomes for 234-383 children across the two years of the study.

As discussed in the CORAL initiative reports (Arbreton, Goldsmith, & Sheldon, 2005; Arbreton et al., 2008), the authors classified programs into five literacy profiles based on how the programs scored on the six items measuring Balanced Literacy strategies. Readers who are interested in more information on how the literacy profiles were created should consult the CORAL initiative reports cited in this document. At the end of the first year, children who attended programs with better literacy profiles had greater reading improvements on the Informal Reading Inventory (IRI) but were not more likely to have positive outcomes on the California Standards Test-English-Language Arts (CST-ELA). One reason why literacy profiles may have predicted better scores on the IRI but not the CST-ELA is that the IRI focuses largely on reading abilities and comprehension, whereas the CST-ELA is somewhat broader and also incorporates writing skills and word analysis. Classroom practices (as measured by the four scales) did not by themselves predict reading improvements on either the IRI or the CST-ELA over the first year of the evaluation.

At the end of the second year, the authors created a scale called the Overall Lesson Rating that is a combination of items measuring literacy strategies as well as classroom practices. The Overall Lesson Rating predicted more positive outcomes on the CST-ELA (scores on the IRI were not examined in the second year). The findings from both years suggest that the CORAL tool successfully measures quality in programs with a strong literacy component.

User Considerations

Ease of Use

Although the CORAL tool can be used by anyone, it was designed exclusively for research purposes and has not been specifically adapted for practitioner use at this time. However, the tool does contain detailed instructions for conducting observations and completing the forms.

Available Supports

At this time, Public/Private Ventures is not offering training on use of the CORAL observation tool, though a two-day training was offered to the observers participating in the evaluation study. This training included a review of the observation materials, mock observations and write-ups on day one, and a field observation in which a new observer was paired with an experienced guide on day two. The trainee and guide compared notes to develop consistency. Trainees received ongoing monitoring and support.

In the Field

In the 2004-2005 school year, the CORAL observation tool was used as part of an evaluation of the programs' implementation of the Balanced Literacy Model. The evaluation was conducted by Public/Private Ventures. Live observations of the participants' experiences, the impact of balanced literacy on the academic gains of participants, and fidelity to the Balanced Literacy Model were at the center of the research. Observations were conducted during the program participants' third-grade and fourth-grade years, and each participant was observed three times by an observer assigned from a pool of observers.

In analyzing the data, researchers found that although the link between quality and academic gains was inconclusive, the findings indicate that the participants with the greatest academic gains were those who participated in higher quality programs. Additionally, all youth, including those reading below grade level, had greater gains in the higher quality programs than did their counterparts in lower quality programs. In the second year of observations, the researchers observed consistency in the implementation of the literacy model as well as higher quality implementation. In the same time period, researchers also observed reading gains that were 39 percent higher than the gains recorded in the first year.

For More Information

Information about the CORAL observation tool is available at: www.ppv.org/ppv/initiative.asp?section_id=0&initiative_id=29

Contact

Amy Arbreton, Senior Research Fellow
Public/Private Ventures
Lake Merritt Plaza
1999 Harrison Street, Suite 1550
Oakland, CA 94612
510.273.4600


Developed by Policy Studies Associates, Inc.

Purpose and History

Policy Studies Associates (PSA) developed the Out-of-School Time Program Observation Tool (OST Observation Tool) over a five-year period, in conjunction with several research projects related to after-school programming, including a major study of promising practices in high-performing programs (Birmingham, Pechman, Russell & Mielke, 2005). A third edition of the tool, revised in 2008, was reviewed for this compendium. The instrument was recently used in studies of the New York City Department of Youth and Community Development's Out-of-School Time (OST) Programs for Youth and of the New Jersey After 3 initiative.

The tool was developed with research goals in mind – in particular, the desire to collect consistent and objective data about the quality of after-school activities through observation. Its design is based on two assumptions about high-quality programs: first, that certain structural and institutional features support the implementation of high-quality programs; and second, that instructional activities with certain characteristics – varied content, mastery-oriented instruction and positive relationships – promote positive youth development outcomes.

The OST Observation Tool can be used in varied after-school contexts, including school- or center-based programs, and with youth participants in kindergarten through 12th grade. While the tool can provide program staff with a framework for observing and reflecting on their practice, it was developed for, and thus far has primarily been used for, research purposes. In its current design, it is not intended to be used to assign overall quality scores for programs or staff.

Content

The OST Observation Tool was designed to provide researchers and other users with a framework for observing essential indicators of positive youth development. It focuses on three major components of programs: activity type, activity structures, and interactions between youth and adults and among youth. The first section captures a range of in-depth information about the type of activity being observed and the skills emphasized through that activity; the remainder focuses on what the youth development literature points to as critical components of programs.

Because of its developmental grounding and its focus on what young people experience inside of programs, the OST Observation Tool has an activity- and program-level focus and does not address organizational issues related to management, leadership or policy. The primary focus is on social processes – including relational issues and many items that speak specifically to instruction and learning. Beyond one item related to materials, the instrument does not focus on program resources or the organization of those resources within the setting.

The content of the OST Observation Instrument aligns very closely with the SAFE framework (Durlak and Weissberg, 2007), which outlines features of programs that contribute to positive outcomes for youth in out-of-school time programs. SAFE refers to out-of-school time activities that are:

• Sequenced: Content and instruction are designed to increasingly advance skills and knowledge and help youth achieve goals;

• Active: Activities lend themselves to active engagement in learning;

• Personally Focused: Activities strengthen relationships among youth and between staff and youth;

• Explicit: Activities explicitly target specific learning or developmental goals.

The 2008 version of the tool was updated to fully align with the SAFE framework. The changes are most obvious in the reorganization of the qualitative portion of the instrument, in which observational notes are recorded and synthesized. This framework was also used in the evaluators' analyses of their Year 2 findings for the New York and New Jersey studies, which demonstrated that the OST indicators map well to the SAFE framework.


Additional changes between the second and third editions involve the inclusion of the academic and technology features of programs. This new section features items related to literacy, math instruction and the use of technology, guiding users to note the presence or absence of activities that meet literacy, math or technology goals.

Structure and Methodology

The first part of the instrument, which focuses on activity type, provides observers with detailed definitions for documenting:

• Type of activity (e.g., tutoring, visual arts, music, sports, community service)

• Type of space (e.g., classroom, gym, library, auditorium, hallway, playground)

• Primary skill targeted (e.g., artistic, physical, literacy, numeracy, interpersonal)

• Number and education level of staff involved in the activity

• Environmental context (e.g., supervision, space, materials)

• Number, gender and grade level of participants

In addition, the instrument provides detailed descriptions of each of the four SAFE features for rating purposes. The above observations are recorded on a cover sheet that also includes other basic information about the observer, program, date, time, etc.

The remainder of the tool addresses five key youth development "domains," including relationships (youth- and staff-directed are considered separately), youth participation, skill building and mastery, and activity content and structure. Each domain is subdivided into four to seven specific indicators or practices. For each indicator, a detailed "exemplar" is offered to guide ratings. For example:

• Domain: Relationship-building: Youth.

• Indicator: All or most youth are friendly and relaxed with one another.

• Exemplar: Youth socialize informally. They are relaxed in their interactions with each other. They appear to enjoy one another's company.

The rating scale in the OST Observation Instrument asks users to assess the extent to which each indicator is or is not present during an observation. While the developers have experimented with both three- and five-point scales in various studies, the third edition of the instrument uses a seven-point rating scale, which gives more room for capturing subtleties, where 1 = not evident and 7 = highly evident and consistent (see below). A "5" rating is considered basic quality. Observers are instructed to first select the odd number that most closely reflects the level of evidence observed and then, if necessary, to move up or down to the adjacent even number if that more accurately reflects the presence of the indicator within the activity.

The seven-point scale is anchored at the odd values:

1 – Exemplar is not evident
3 – Exemplar is rarely evident
5 – Exemplar is moderately evident or implicit
7 – Exemplar is highly evident and consistent

Developers of the OST Observation Tool have structured it flexibly so that users can arrange scales differently for different purposes. Although definitive rules for constructing scales do not exist, in their report on the validation study (Pechman et al., 2008), the authors present four different methods for creating scales. The four sets of scales were used in different studies across several years and were each guided by separate theories and evaluation questions. The authors present the scale sets as different options for users to summarize data from the OST. Users who are interested in summarizing data should refer to the report of the validation study for information on which items compose specific scales.

The first scale set includes four scales: (1) Youth relationship-building and participation, (2) Staff-youth relationships, (3) Skill building and mastery and (4) Activity content and structure. These scales were used in a study of Shared Features of High-Performing After-School Programs conducted on behalf of The After-School Corporation and the Southwest Educational Development Laboratory (Birmingham et al., 2005).

The second scale set is similar and also includes four scales: (1) Youth relationship-building, (2) Staff relationship-building, (3) Instructional methods and (4) Activity content and structure. These scales were used in the first year of the evaluation of the New Jersey After 3 initiative (Kim et al., 2006).

The third scale set includes three scales: (1) Relationships, (2) Instructional strategies and (3) Activity content and structure. These scales were used in the first year of the evaluation of the New York City Department of Youth and Community Development's Out-of-School Time Programs for Youth initiative (Russell et al., 2006).

The fourth scale set is based on the SAFE framework, which emphasizes activities that are Sequenced, Active, Focused and Explicit, as described by Durlak and Weissberg (2007). The four scales in this set correspond to these four SAFE domains. These scales were used in the second-year evaluations of both the New Jersey After 3 initiative and the New York City Department of Youth and Community Development's Out-of-School Time Programs for Youth initiative (Walking Eagle et al., 2008; Russell et al., 2008).

Technical Properties

Psychometric information presented here for the OST Observation Instrument comes from the three studies previously mentioned (Shared Features of High-Performing After-School Programs, New Jersey After 3, and the New York City Department of Youth and Community Development's Out-of-School Time Programs for Youth initiative).

Score Distributions

Pechman and colleagues (2008) examined the score distributions of all scales in each of the four scale sets. The authors generally found good variability in the scores across raters' observations of 159 to 238 activities in 10 to 15 programs observed across the three studies. One exception was the Active scale from the SAFE scale set. The average score for this scale was somewhat low (1.9 on a scale of 1 to 7), making it difficult to determine whether the scale has difficulty capturing differences across programs or whether most programs simply do not keep youth very active. However, findings suggest that all other scales capture meaningful differences across a variety of activities and programs.

Interrater Reliability

Observers using this instrument reached high levels of agreement. Examining data from five assessments across three separate studies, pairs of researchers co-observed between 19 and 40 activities within 10 to 15 programs at each assessment. Using Pearson and intraclass correlations, researchers examined the interrater reliability for the overall score as well as the average score within each of the instrument's five domains. When available, intraclass correlations were generally above the recommended value of .50, indicating strong agreement. Strong agreement was also supported by the Pearson correlations, which were close to or above the recommended value of .70.⁷ Therefore, these findings suggest that trained raters can achieve high overall agreement across each of the five domains.⁸
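The Pearson side of this agreement check is straightforward to reproduce. The Python sketch below computes the correlation between two raters' domain scores across co-observed activities; the paired scores are invented for illustration, and the intraclass correlation (which requires an ANOVA variance decomposition) is omitted.

```python
# Illustrative interrater agreement check; paired rater scores are invented.
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Two raters' domain scores across six co-observed activities
rater_a = [3.5, 4.0, 2.5, 5.0, 3.0, 4.5]
rater_b = [3.0, 4.5, 2.5, 5.0, 3.5, 4.0]
r = pearson(rater_a, rater_b)
# A value close to or above the .70 threshold cited above would
# indicate strong agreement between the two raters.
```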

⁷ The one exception was the Activity Content and Structure domain, which was lower for one of the five assessments, but agreement on the other four assessments was strong.
⁸ The interrater reliabilities for specific items and scales were not reported and could be lower.

Internal Consistency

Internal consistency was strong for all scales across the four sets, with Cronbach's alphas exceeding the recommended value of .70. These findings suggest that each of the scales' items are highly related and form meaningful domains. The first scale set summarized TASC study data collected from 173 independent observations in 10 programs and had alpha levels ranging from .73 to .88. The second scale set summarized New Jersey After 3 data collected from 179 independent observations in 10 programs and had alpha levels ranging from .81 to .83. The third scale set summarized New York City OST study data collected from 238 independent observations in 15 programs and had alphas ranging from .80 to .87. The fourth scale set summarized both New York City OST study data and New Jersey After 3 data collected from a combined total of 358 observations and had alpha levels ranging from .84 to .88.

Concurrent Validity

OST developers examined the concurrent validity of the third scale set, drawing on 1,444 youth surveys from the DYCD OST initiative in New York City and the New Jersey After 3 initiative. Specifically, using Spearman's rho rank-order correlation coefficients, researchers examined the associations between the OST Relationships, Instructional Strategies, and Activity Content and Structure scales and responses from a separate youth survey on Interactions with Staff, Interactions with Peers, Sense of Belonging, Exposure to New Experiences, and Academic Benefits.
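Spearman's rho is simply a Pearson correlation computed on ranks, with ties receiving averaged ranks. A minimal Python version, using invented scale and survey scores rather than study data:

```python
# Illustrative Spearman rank-order correlation; data below are invented.

def ranks(xs):
    """1-based ranks, with tied values receiving the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of tied positions, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(xs, ys):
    """Pearson correlation of the rank vectors of xs and ys."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical Relationships-scale scores vs. youth-survey scores
scale = [2.0, 3.5, 3.0, 4.5, 4.0]
survey = [2.2, 2.8, 3.1, 4.9, 4.0]
rho = spearman_rho(scale, survey)
```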

Higher scores on the Relationships scale were related to higher scores for Exposure to New Experiences, Interactions with Peers, and Interactions with Staff for both years of the study, as well as higher scores for Sense of Belonging in the first year and Academic Benefits in the second year. Higher scores on the Instructional Strategies scale were related to higher scores on Exposure to New Experiences, Sense of Belonging, and Interactions with Staff in the first year of the study but not the second. Instructional Strategies was not related to Interactions with Peers or Academic Benefits, and the Activity Content and Structure scale was not related to any youth experiences from the youth survey in either year.

Based on these findings, the available concurrent validity evidence is mixed. In addition, concurrent validity evidence does not currently exist for the other scale sets.

Validity of Scale Structure

An analysis examining scale structure validity tests whether items forming multiple scales truly measure distinct and coherent domains as expected. The instrument's developers conducted a factor analysis to examine the structural validity of the fourth scale set. Findings suggested that the items could be categorized into the four SAFE domains, although the Sequenced and Explicit domains were moderately related, suggesting that they are not entirely distinct from one another. In addition, the Active domain appeared to be a combination of two distinct categories, suggesting that this domain is not completely cohesive.

Although the instrument has some evidence of scale structure validity for the SAFE domains, evidence for the other three scale sets is currently unavailable.

User Considerations

Ease of Use

While the OST Observation Instrument is available online and is free for anyone to download and use, it is important to recognize that it was developed with primarily a research audience in mind. The introduction to the tool includes an overview and review of basic procedures for conducting observations and completing the form, but the materials have not been tailored for practitioners at this time and use language (e.g., sampling, reliability) that may not be accessible to some audiences.

Its developers consider the OST Observation Tool to be highly efficient to complete in the field. Users observe 15 minutes of an activity and score it immediately in less than five minutes. Users are advised to observe a total of 8-10 activities over at least two afternoons (or approximately three hours of program observation) to adequately sample program offerings. Additional guidance about how to organize observations on site, sample activities appropriately and manage multiple observers is provided in the instrument's procedures section.

Available Supports

At this time, training related to the OST Observation Instrument is limited to individuals involved in a specific study that employs the instrument. Data collectors participate in trainings that provide a detailed overview of the instrument, its indicators and the theoretical framework. Following a review of the operational definitions for each category and group of indicators, researchers participate in practice rating sessions using videotaped samples of after-school activities to build interrater reliability prior to fieldwork. Additional reliability checks are conducted in the field and in follow-up meetings to ensure common interpretation of terms and items.

Researchers typically use the observation data collected with the OST instrument in conjunction with supplementary (but not formally linked) measures such as interviews, surveys and focus groups. As research continues, validity data will become available about the relationship between program quality features and youth outcomes as measured by some of these other instruments.

In the Field

In 2005, the New York City Department of Youth and Community Development (DYCD) contracted with PSA to conduct a comprehensive evaluation of its 536 OST programs serving 69,000 participants under its Out-of-School Time Programs for Youth initiative. Participating service providers, serving all grade levels, operated under one of three funding mechanisms: a) Option I, targeted toward a general pool of service providers operating programs in neighborhoods throughout New York City; b) Option II, for programs using a 30% private funding match; and c) Option III, for programs operated in collaboration with the Department of Parks and Recreation and offered at Parks sites.

The OST Observation Tool was used in a sample of 15 of the 536 OST programs as part of the overall evaluation. The evaluation combined this sampling data with other data sources, including a participation database and program director surveys, to round out the picture. The first-year evaluation findings identified avenues for improving the effectiveness of OST programming in several areas. For example, although programs successfully enrolled students in the first year, they struggled to maintain high youth participation rates, suggesting a need to establish program policies and activity offerings that encouraged regular participation. Additionally, while programs in the first year consistently provided safe and structured environments for participants, they experienced challenges in delivering innovative, content-based learning opportunities that engaged youth.

PSA’ssecondyearfindingscenteredonevidenceofprograms’effortstoimproveprogramqualityandscale.ThefindingssuggestthatOSTprogramsincreasedboththeirenrollmentandparticipationrates.Programsscaledupenrollmentfrom51,000youthinthepreviousyeartoservemorethan69,000youththroughoutNewYorkCity.RatesofindividualyouthparticipationalsoincreasedsubstantiallycomparedtoYear1,indicatingthatprogramsweresuccessfullyrecruitingandretainingparticipants.Inaddition,programsreportedthattheyimprovedthequalityandcapacityoftheirprogramstaffthroughimprovedhiringandprofessionaldevelopmentopportunities.

In year three, the evaluation will continue to collect data from OST programs to explore the associations among program-quality features, youth participation patterns and youth outcomes.

For More Information

The OST Observation Instrument is available online at: www.policystudies.com/studies/youth/OST%20Instrument.html

Contact

Christina Russell or Ellen Pechman
Policy Studies Associates
1718 Connecticut Avenue, NW
Washington, [email protected]


Purpose and History

The Program Observation Tool is the centerpiece of the National AfterSchool Association's program improvement and accreditation process and is designed specifically to help programs assess progress against its Standards for Quality School-Age Care. The instrument was developed by the National AfterSchool Association (NAA) and the National Institute on Out-of-School Time in 1991 and was based on the Assessing School-Age Child Care Quality Program Observation Instrument developed by Susan O'Connor, Thelma Harms, Debby Cryer and Kathryn Wheeler. The instrument was revised in 1995 and piloted between 1995 and 1997. Additional revisions were then made before NAA's accreditation system became active in 1998.

The NAA Standards, on which the Program Observation Tool is based, are meant to provide a baseline of quality for after-school programs serving children and youth between ages 5 and 14. They are intended for use in group settings – primarily school- and center-based – where children participate regularly and where the goal is supporting and enhancing overall development.

Rooted in the frameworks of the early childhood and school-age care fields of the early 1990s, the instrument and the NAA standards reflect much of the thinking of the time, particularly in terms of licensing and monitoring, and have been used as part of a seven-step accreditation process for the past decade. Pre-dating the creation of the federal 21st Century Community Learning Centers program, the NAA standards have played a significant role in the field and have been adopted by a range of programs and systems across the country. There are now 20,000 copies of the standards book in print, and over 500 programs across the country are in some stage of the accreditation process.

In 2008, NAA shifted away from its role as an accrediting body and is now offering accreditation through the Council on Accreditation. NAA will complete the accreditation process through the end of 2009 for all agencies that applied and were at some stage of the process before September 2008. For agencies applying after September 2008, a program relationship manager from the Council will assist them through the process.

The Program Observation Tool will still be available for agencies interested in using it for self-assessment and improvement purposes (as has always been the case for agencies not seeking accreditation). NAA will retain the rights to the standards and materials, and continue to provide supports for technical assistance.

Content
The Program Observation Tool measures 36 "keys of quality" that are organized into six categories. Five of those categories are considered observable and are assessed primarily through observation: human relationships; indoor environment; outdoor environment; activities; and safety, health and nutrition. The sixth category – administration – is assessed through a questionnaire.

BecauseofNAA’scommitmenttosupportingchilddevelopmentinaholisticway,theinstrumentmeasuresarangeofsocialprocesses–howchildrenandstaffwithinthesettinginteract.Becauseofthelinktoaccreditation,italsofocusesquiteabitonprogramresourcesandthearrangement(spatial,socialandtemporal)ofthoseresourceswithintheprogram.Unlikesomeoftheothertoolsinthiscompendium,theProgramObservationToolalsoaddressesprogrampoliciesandproceduresthatarebelievedtoinfluencequality.

The Program Observation Tool pre-dates the National Research Council's features of positive developmental settings framework (2002) by over a decade and draws more heavily on the early childhood literature than the youth development literature. However, it does address many of the NRC features, placing the least emphasis on "support for efficacy and mattering" and "skill-building opportunities."

Developed by the National AfterSchool Association

Structure and Methodology
The five quality categories that are the focus of the Program Observation Tool are measured using one overall instrument that includes the 20 relevant keys and a total of 80 indicators (four per key). If a program is going through the accreditation process, the administration items (included in the Standards, but not the Observation Tool) are assessed separately, through questionnaire/interview.

The rating scale used throughout the Program Observation Tool (see example below) is intended to capture whether each indicator is true all of the time (3), most of the time (2), sometimes (1) or not at all (0). Although specific descriptions of what a 0, 1, 2 or 3 looks like for each indicator are not provided, between one and eight descriptive bullet statements are included under each indicator to clarify meaning.

Space is provided for observers to take notes on each indicator. At the bottom of each page, observers are encouraged to total their numerical scores for each quality key to achieve an overall program rating. Tally sheets and instructions are provided for multiple observers to reconcile and combine their scores. There are two "weighted" categories – program/activities and safety/nutrition – in which programs must meet a certain threshold in order to be accredited.

6. Children and youth generally interact with one another in positive ways.

Guiding Questions: Do children seem to enjoy spending time together? Do they talk about friends at the program? Do they tend to include others from different backgrounds or with different abilities in their play?

(Each indicator below is rated 0 1 2 3, with space alongside for comments.)

a. Children appear relaxed and involved with each other.
• Group sounds are pleasant most of the time.

b. Children show respect for each other.
• Teasing, belittling or picking on particular children is uncommon.
• Children show sympathy for each other and help each other.

c. Children usually cooperate and work well together.
• Children willingly share materials and space.
• They suggest activities, negotiate roles and jointly work out rules.
• Children include others with developmental, physical or language differences in their play.
• Children often help each other.
• There is a strong sense of community.

d. When problems occur, children often try to discuss their differences and work out a solution.
• Children listen to each other's point of view and try to compromise (e.g., if two children want to use the same equipment, they may decide to take turns as a solution).
• Children know how to solve problems.
• Their solutions are generally reasonable and fair.
• They do not try to solve disagreements by bullying or acting aggressively.


Technical Properties
Although no psychometric evidence is available on the Program Observation Tool itself, there is information available about the ASQ (Assessing School-Age Child Care Quality), from which the POT was derived. Users should note that the ASQ's psychometric properties may not be completely consistent with those of the POT.9 Overall, evidence for interrater and test-retest reliability is strong for the ASQ, meaning the assessments of the same program practices by different observers are consistent and assessments are stable over time. Following revisions to the scales, evidence of internal consistency, or the degree to which items fit together in meaningful ways, was strong. Validity data are limited, although preliminary evidence for concurrent validity suggests the instrument may yield accurate information about the concepts it measures.10

The field study which provides psychometric support for the ASQ involved a sample of 40 after-school programs in Massachusetts and North Carolina (Knowlton & Cryer, 1994). Two versions of ASQ scales were examined: original and revised. The revised version's scales are comparable to those in the POT: Human Relationships, Indoor Environment, Outdoor Environment, Activities, and Safety, Health and Nutrition. Of the original scales, only two overlapped with the POT, namely human relationships and activities. When appropriate, we state which set of scales exhibits specific properties.

Interrater Reliability
To examine interrater reliability, paired raters evaluated 40 programs using the measure. ASQ indicators are organized into 21 items and those items are further organized into five scales. Knowlton and Cryer examined agreement among raters at both the item and scale levels. The kappa statistic measures the degree to which raters agree and corrects for cases where raters agree simply by chance. All items had kappa scores above .70, generally considered the threshold for high agreement. The authors also computed intraclass correlations, and all of the ASQ original scales and the total score were near or above .70, showing good agreement on these scores.11 However, because only the original ASQ scales, not the revised versions, were examined, we can generalize only to those scales that are similar (Human Relationships, Activities – and the total score).
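The kappa statistic described above can be illustrated with a short computation. This is a generic sketch of Cohen's kappa for two raters, not code from the ASQ study, and the indicator scores are hypothetical.

```python
# Illustrative sketch only (not from the ASQ study): Cohen's kappa for two
# raters who scored the same indicators on a 0-3 scale. Kappa corrects the
# raw percent agreement for agreement that would occur by chance.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: both raters independently picking the same category.
    expected = sum((counts_a[c] / n) * (counts_b[c] / n)
                   for c in set(rater_a) | set(rater_b))
    return (observed - expected) / (1 - expected)

# Two hypothetical raters scoring ten indicators:
a = [3, 2, 3, 1, 0, 2, 3, 2, 1, 3]
b = [3, 2, 3, 1, 1, 2, 3, 2, 1, 3]
print(round(cohens_kappa(a, b), 2))  # prints 0.86
```

Raw agreement in the example is 90 percent, but kappa comes out lower because part of that agreement would be expected even if the raters scored at random.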

Test-Retest Reliability
Ideally, instruments should be able to assess major changes over time but should exhibit stability in scores across multiple assessments in the short term. For the ASQ, 25 programs were reassessed two weeks after their initial assessment to determine the instrument's test-retest reliability. Knowlton and Cryer (1994) found that all items demonstrated acceptable stability, with kappa scores above .70.12 The authors also computed intraclass correlation coefficients to examine stability of the original scales and total score over time. All scales and the total score were above .70, but we can only generalize to the scales that overlap with the POT – Human Relationships and Activities – and the total score.

Internal Consistency
To determine whether items within the scales fit together in meaningful ways, Knowlton and Cryer examined the internal consistency of the original scales and the total score by computing a statistic called Cronbach's alpha. The alpha for one of the original scales (Safety) was very low, so the authors revised the scales (and the revisions more closely match the POT). Results from the revised scales and the total score demonstrated good internal consistency, with alphas near or above the recommended cutoff of .70.
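As a concrete illustration of the statistic, here is a minimal sketch of Cronbach's alpha for a set of items scored across several programs. The items and scores are hypothetical, not ASQ data.

```python
# Illustrative sketch only (hypothetical data, not the ASQ): Cronbach's alpha
# for a scale of several items, each scored 0-3 across a set of programs.
# alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def cronbachs_alpha(item_scores):
    # item_scores: one list per item, aligned across the same programs
    k = len(item_scores)
    totals = [sum(program) for program in zip(*item_scores)]
    return k / (k - 1) * (1 - sum(variance(item) for item in item_scores)
                          / variance(totals))

# Four hypothetical items scored across six programs:
items = [
    [3, 2, 3, 1, 2, 3],
    [3, 3, 2, 1, 2, 3],
    [2, 2, 3, 0, 1, 3],
    [3, 2, 2, 1, 2, 2],
]
print(round(cronbachs_alpha(items), 2))  # prints 0.9
```

When items rise and fall together across programs (as in the example), total-score variance dwarfs the summed item variances and alpha approaches 1; unrelated items drive it toward 0.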

Convergent Validity
To determine the extent to which the ASQ yields accurate information about the aspects of programs

9 There are slightly more indicators in the ASQ (84) than in the POT (80). It is unclear how many indicators are identical or similar.
10 The technical section only evaluates evidence from the observational portion of the instrument, not the administration questionnaire.

11 Readers should note that Knowlton and Cryer also looked at the interrater reliability of the individual indicators that composed the items. Many indicators exhibited poor agreement. However, summing the indicators into items creates more reliable measures because it cancels out some of the measurement problems. For this reason, users should evaluate programs based on the items and scales, not the individual indicators.
12 Similar to tests on interrater reliability, the authors found that 40 percent of the indicators had poor short-term stability. However, the measurement problems associated with individual indicators likely cancel out when creating an item score. Again, users should examine the items and scales, not the indicators, when evaluating programs.


it is supposed to measure, Knowlton and Cryer (1994) compared the ASQ scores for 11 programs with subjective ratings by experts. Specifically, two experts ranked a set of programs in terms of overall quality within each of the five original ASQ domains using their own criteria. Using what is called the Spearman correlation, ASQ rankings were moderately to strongly related to the expert rankings, with the exception of the Safety and Health and Nutrition areas. This validity evidence should be regarded as preliminary, based on the small number of programs and experts included in the analysis and the fact that estimates were computed on the original, unrevised ASQ scales.
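The Spearman correlation mentioned above compares rank orderings rather than raw scores. A minimal sketch, with hypothetical programs and assuming no tied values so the classic formula applies:

```python
# Illustrative sketch only (hypothetical data): Spearman rank correlation
# between instrument scores and an expert's quality ranking. Assumes no
# tied values, so the classic 1 - 6*sum(d^2)/(n*(n^2-1)) formula applies.

def to_ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, i in enumerate(order):
        ranks[i] = position + 1  # 1 = lowest value
    return ranks

def spearman(xs, ys):
    n = len(xs)
    d_sq = sum((a - b) ** 2 for a, b in zip(to_ranks(xs), to_ranks(ys)))
    return 1 - 6 * d_sq / (n * (n ** 2 - 1))

# Six hypothetical programs: instrument scores and an expert's 1-6 ranking.
instrument = [2.1, 2.8, 1.5, 3.0, 2.4, 1.9]
expert = [2, 5, 1, 6, 4, 3]
print(round(spearman(instrument, expert), 3))  # prints 0.943
```

Because only the orderings matter, the statistic is insensitive to the two raters using different scales, which is what makes it suitable for comparing instrument scores against experts ranking "using their own criteria."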

User Considerations

Ease of Use
The Program Observation Tool and NAA standards were developed with significant input from practitioners, resulting in accessible language and a user-friendly format.

Programs wishing to undertake the accreditation process can contact the Council on Accreditation (see contact information). For self-assessment purposes, observing the program and scoring the full instrument takes roughly 3-5 hours. The self-study manual provides very detailed guidance to program directors and staff on how long and how much of the program to observe, how to determine ratings and how to combine scores from different raters.

The observation tool is one of a package of products related to accreditation – the Advancing and Recognizing Quality Kit – which includes the standards book; the guide to program accreditation; self-study manuals that include the observation tool as well as staff, family and child/youth questionnaires; and a training video. The team leader's manual walks program directors or staff through the various steps of the accreditation process in detail and includes specific tools for developing an action plan for improvement based on observational data. As a package, these resources cost approximately $300. There are additional costs related to the full accreditation process.

Available Supports
It is important to reiterate that while this summary has focused specifically on the Program Observation Tool, that instrument is just one piece of an integrated set of resources related to self-study and accreditation. NAA offers training that covers the Program Observation Tool through its day-long Endorser Training (NAA recommends two and a half days of training in order to ensure reliability). Some NAA state affiliates offer local training related to the instrument for programs interested in using it for self-assessment and improvement.

In the Field
The University of Missouri-Adventure Club is a district-wide after-school initiative for elementary school students in Missouri's Columbia public school district. The National AfterSchool Association's standards and observation tool, as well as the larger Advancing School-Age Quality (ASQ) process within which these are embedded, serve as the organizing framework for Adventure Club's 18 programs. Institutionalization of the standards has resulted in a common language and understanding of program quality that spans the individual staff, program and cross-site levels.

The Program Observation Tool is used by each of the 18 programs several times each year and is a core piece of the new staff orientation process, which includes conducting and discussing a program observation with more senior colleagues. Line staff are well-versed in the 36 "keys of quality," and each week during cross-site directors' meetings one key is the focus of in-depth discussion.

In addition to regular observations – by staff, administrators and parents – each program has an ASQ team made up of these stakeholders (parents, staff, administrators and sometimes children). Teams meet monthly or bi-monthly to review new observation data and revisit the program's improvement plan. "This is a continuous process – it doesn't start and stop each year. Each program developed a plan when we first started using the standards and those get revisited and updated several times a year based on ongoing observation,"


explained Chrissy Poertner, who coordinates the accreditation and improvement process for the 18 programs. Observation data and program improvement plans are also used to guide staff development.

Initially some staff expressed concern that the tool was long and would be cumbersome to work with, but Poertner says the overall response has been very positive, especially because everyone is involved in and owns the process. "These tools give staff a guide, and when you're out there working in the field the autonomy can feel overwhelming. Because we've created the buy-in and they are part of the improvement process, people respond really positively."

For More Information
Additional information about NAA's observation tool and accreditation process is available online at: http://naaweb.yourmembership.com/?page=NAAAccreditation

Contact
Judy Nee, President and CEO
The National AfterSchool Association
529 Main Street, Suite 214
Charlestown, [email protected]


Purpose and History
The Program Quality Observation Scale (PQO), funded by the National Institute of Child Health and Human Development (NICHD) as part of an initiative to study out-of-school time, was designed to help observers characterize the overall quality of an after-school program environment and to document individual children's experiences within programs. The tool has two components – qualitative ratings focused on the program environment and staff behavior, and time samples of children's activities and their interactions with staff and peers.

The PQO was developed for research purposes by Deborah Vandell and Kim Pierce and has been used in a series of studies, primarily looking at the quality of school- and center-based after-school programs serving first- through fifth-grade elementary school children. The instrument has its roots in Vandell's observational work in early childhood care settings, including the NICHD Study of Early Child Care, and her work in after-school programs, including the Ecological Study of After-School Care funded by the Spencer Foundation.

The primary focus of the time sample procedure is on three components of individual children's experiences in programs – relationships with staff, relationships with peers and opportunities for engagement in activities. The qualitative ratings focus on all children's experiences in the program in terms of staff behavior and the program environment. The qualitative ratings of program environment are best suited for use in formal school- or center-based after-school programs, while the qualitative ratings of staff behaviors and the time sampling of children's activities and interactions are relevant in both formal program settings and informal, adult-supervised settings.

Content
The PQO was designed to help researchers understand the quality of children's experiences inside programs and focuses on three components of quality – relationships with staff, relationships with peers and opportunities for

engagement in activities. As noted above, the instrument has two major components – qualitative ratings and time samples of children's activities and interactions. Ratings are made of the program environment and staff behavior, or what the developers call "caregiver style." The following three aspects of the program environment are rated:

• Programming flexibility
• Appropriateness and diversity of the available activities
• Chaos

Four characteristics of caregiver style are rated:

• Positive behavior management
• Negative behavior management
• Positive regard for children
• Negative regard for children

The time sample component of the tool is designed to record the activities and interactions of individual children within the program. There are 19 different activity categories for observers to select from (e.g., arts/crafts, large motor, snack, academic/homework). In addition, the tool provides observers with six different types of interactions to look for: positive, neutral and negative interactions with peers, and positive, neutral and negative interactions with staff.

Because the focus of the PQO is on children's experiences inside of programs, it tends to focus primarily on social processes and less on resources or the organization of those resources within programs. However, Vandell and colleagues have developed a number of related measures that do capture aspects of these other components, such as a physical environment scale. Developed long before the National Research Council's features of positive developmental settings framework (2002), some aspects of the PQO align well with that framework while others more clearly reflect its early childhood roots.

Developed by Deborah Lowe Vandell & Kim Pierce


Structure and Methodology
The first component of the PQO – the qualitative ratings – is focused on program environment and staff behavior, or "caregiver style." Ratings are assigned based on a minimum of 90 minutes of continuous observation. While program environment ratings are made of the program as a whole, caregiver style ratings are made separately for each staff member observed (but could be adapted to rate all staff members collectively).

Program environment and caregiver style ratings are made using a four-point scale. Users are given descriptions of what constitutes a 1, 2, 3 or 4 rating

for three distinct aspects of program environment – flexibility, activities and chaos – and four different aspects of caregiver style – positive behavior management, negative behavior management, positive regard and negative regard. A "4" rating means that particular aspect of the environment (or staff behavior) is highly characteristic of the program (see example below).

The time sampling component of the PQO is focused on the activities and interactions that individual children engage in at an after-school program. Activity type is recorded using 19 different categories. Interactions are assessed in terms of whether they are positive, neutral or negative and whether they happen with peers or with staff. In addition, staff interactions are further coded to note whether they are one-on-one, small group or large group.

Time sampling entails documenting the activities and interactions that a number of individual children have in a program for short periods of time. The developers of the PQO suggest that 30-minute time samples be conducted in 30-second intervals (for a total of 60 intervals). During each interval, the rater observes a child for 20 seconds and then spends 10 seconds recording or coding what they observed. Because time sample observations will sometimes involve fewer than 60 intervals, scores need to be adjusted for the total number of intervals actually observed. This time sampling component has been adjusted for use in different studies (for example, with longer observation periods, fewer cycles, etc.).13
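The adjustment described above – dividing category tallies by the number of intervals actually observed rather than the planned 60 – can be sketched as follows. The function name and activity codes are illustrative assumptions, not part of the PQO materials.

```python
# Illustrative sketch only (not PQO code): tallying one child's time sample
# and adjusting category frequencies for samples that end early.
from collections import Counter

def adjusted_frequencies(codes):
    # codes: one activity/interaction code per interval actually completed.
    # Dividing by the observed count (rather than the planned 60 intervals)
    # keeps shortened samples comparable to full-length ones.
    observed = len(codes)
    return {code: count / observed for code, count in Counter(codes).items()}

# A hypothetical sample cut short at 40 of the planned 60 intervals:
sample = ["arts/crafts"] * 22 + ["snack"] * 10 + ["transition"] * 8
print(adjusted_frequencies(sample))
# {'arts/crafts': 0.55, 'snack': 0.25, 'transition': 0.2}
```

Expressing each category as a proportion of observed intervals means a 40-interval and a 60-interval sample yield directly comparable scores.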

Technical Properties
Available psychometric evidence supporting the PQO addresses score distribution, interrater reliability, test-retest reliability, convergent validity and concurrent validity, mostly from a report by Vandell and Pierce

Chaos

4 = Chaos and disorganization are highly characteristic, persisting across multiple activities and settings. The children are out of control. They may be fighting with one another, yelling, or behaving inappropriately, jumping on furniture, ruining materials, or just generally running around. Activities do not seem organized; disorder is evident.

3 = There is chaos and disorganization in the environment, but it is not characteristic of many children or all activities. A group of children may exhibit the behaviors that merit a rating of 4, or some activities and transition times may be chaotic and disorganized such that the progress or beginning of activities for some children is impeded.

2 = One or two children's behavior may be out of control, but in general, children's behavior is appropriate and reasonably controlled. Transitions and activities generally go smoothly, although there may be exceptions.

1 = No chaos or disorganization is observed in the environment. Children's behavior is appropriate, and activities and transitions proceed smoothly.

13 Readers should note that the developers did not design the PQO for self-assessment, but rather for a large study that required that time sample observations center on a single child of interest. The time sample component of the instrument could be modified for general use by observers randomly choosing children for each assessment. However, it is unclear if the available psychometric findings on the time sample observations will extend to this modified instrument. This caveat does not apply to the qualitative ratings, which were designed to measure the program as a whole and do not require modification for self-assessment.


(2006) based on multiple observations of after-school programs over several years in the NICHD Study of After-School Care and Children's Development. The study included a broad sample of 46 for-profit and nonprofit programs located in schools, child care centers and community centers. Each program was observed three or four times a year over a five-year period. Predictive validity evidence comes from a study conducted by Pierce, Bolt and Vandell (2008), which examined social and academic outcomes for 120 children enrolled in 46 after-school programs. Children were assessed during their 2nd and 3rd grades.

Score Distributions
Score distributions help users determine whether items adequately distinguish between programs on specific dimensions. Vandell and Pierce examined the average scores and ranges for overall observed quality and the individual qualitative rating scales obtained in formal programs in the Study of After-School Care. The overall quality score was created by averaging the individual qualitative ratings (after reversing the scores for Chaos, Negative Regard and Negative Behavior Management). For both the overall quality score and the individual ratings, annual composites are averages of all observations conducted within a school year.
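A minimal sketch of how such a composite can be computed from one observation's ratings. The reversal arithmetic (subtracting each negatively worded rating from 5 on the 1-4 scale) and the example values are assumptions for illustration, not taken from the Vandell and Pierce materials.

```python
# Illustrative sketch only: an overall quality composite from one
# observation's seven qualitative ratings on the 1-4 scale. Negatively
# worded domains are reverse-scored (5 - score flips 1<->4 and 2<->3)
# before averaging, so a higher composite always means higher quality.
REVERSED = {"Chaos", "Negative Regard", "Negative Behavior Management"}

def overall_quality(ratings):
    # ratings: domain name -> 1-4 rating from a single observation
    adjusted = [5 - score if domain in REVERSED else score
                for domain, score in ratings.items()]
    return sum(adjusted) / len(adjusted)

observation = {
    "Programming Flexibility": 3,
    "Available Activities": 4,
    "Chaos": 2,
    "Positive Behavior Management": 3,
    "Negative Behavior Management": 1,
    "Positive Regard": 4,
    "Negative Regard": 1,
}
print(round(overall_quality(observation), 2))  # prints 3.57
```

Averaging composites like this one across all observations within a school year would give the annual composite described above.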

The overall score and most of the qualitative ratings and time-sampled activities and interactions had wide variability, suggesting the instrument can detect differences among a variety of programs. Across multiple observations in several years of study, the full range of scores was obtained for most of the sampled activities. Among the qualitative ratings, low variability was found for Negative Behavior Management and Negative Regard for Children. However, the strong validity evidence suggests that the instrument is detecting meaningful differences in these domains despite their low frequencies.

Forchildren’sinteractions,thefullrangeofscoreswasobtainedforneutralinteractionswithstaffandwithpeers;aswouldbeexpected,therangewasmorerestrictedforinteractionsthatwereclearlypositiveornegative.

Interrater Reliability
The degree to which different raters agree when observing the same program was tested for both the qualitative ratings and time sampling components of the instrument. For the qualitative ratings, kappa coefficients were computed once a year over four years. All of the domains had scores above .70, the benchmark for strong interrater reliability, except Staff Negative Regard, for which the lowest coefficient was .59. The proportion of Negative Regard scores on which observers achieved exact agreement was high, however, suggesting that the moderate kappa score may be due to the relative infrequency with which negative regard was observed. The average kappa score for Staff Negative Regard was .82, suggesting that trained raters can reach acceptable agreement on all domains.

Agreement was also computed for all domains of the time sample observations except group size. Kappa scores at all time points were either above .70 or very close, indicating strong interrater reliability.

Internal Consistency
To determine whether items fit together to form a meaningful overall score, the authors computed a statistic called Cronbach's alpha. Vandell and Pierce found that alpha levels for the annual overall observed quality scores averaged .81, well above the recommended cutoff of .70.

Test-Retest Reliability
In order to determine whether the quality composite and individual qualitative ratings generated by the PQO are stable over time, Vandell and Pierce correlated the ratings made in adjacent observations during the second year of the study, when the sample was largest and most representative of the range of programs in the community (N=45). Four observations were conducted in each program, approximately two months apart. Ratings from the first observation were correlated with those from the second; the ratings from the second observation were correlated with those from the third; and the ratings from the third observation were correlated with those from the fourth. Correlations


for overall observed quality were near or above .70, suggesting the instrument detects changes in program quality and is not overly sensitive to minor changes. Correlations for the individual ratings were lower, with the average for all domains ranging from .34 to .59. This suggests that programs are only somewhat stable in their scores for particular domains over periods of two months. It is unclear whether this reflects short-term variability (as might be seen from one day to the next) or meaningful changes over the course of two months.

Convergent Validity
To examine whether the PQO yields accurate information about the aspects of programs it is supposed to measure, the authors compared findings for the qualitative ratings to findings from the SACERS (also reviewed in this report). If both instruments truly measure program quality, one can reasonably expect that the findings will be related. The following relationships between PQO and SACERS scales were found: (1) PQO Programmatic Flexibility was positively related to SACERS Program Structure; (2) PQO Available Activities was positively related to SACERS Activities; and (3) PQO Staff Positive Regard and Positive Behavior Management were positively related to SACERS Interactions, while PQO Staff Negative Regard and Negative Behavior Management were negatively related to SACERS Interactions. These findings provide strong evidence that the instrument adequately measures program quality. Although convergent validity is supported for most qualitative items, we cannot infer the validity of the Chaos rating because there was no comparable SACERS question to compare it to.

Concurrent Validity
Another way to examine whether the PQO yields accurate information about the aspects of programs it is supposed to measure is to compare the instrument's ratings to other distinct but theoretically important concepts. Developers compared the findings assessed by the PQO to structural features of after-school programs (Pierce, K. M., Hamm, J. V., Sisco, C., & Gmeinder, K., 1995). Similar to what is reported in the early childhood literature, Positive Regard ratings were higher and Negative Regard scores were lower in nonprofit

programs compared to for-profit programs, when child-staff ratios were smaller and when program staff had more formal education. Programming flexibility was higher in nonprofit compared to for-profit programs and when child-staff ratios were smaller. Ratings of available activities were higher in nonprofit programs.

Time-sampled activities and interactions were also associated with program characteristics, as well as with child reports of their experiences in the programs. For example, children were observed to have more frequent positive/neutral interactions with staff and less frequent negative interactions with peers in programs with smaller group sizes; and smaller staff-child ratios were associated with children having more frequent positive/neutral interactions with staff and spending less time in transition (e.g., standing in line) and in large motor activities. Although this evidence is quite strong, it is unclear whether researchers had expected additional relationships that they did not find.

Predictive Validity
In a study conducted by Pierce, Bolt and Vandell (2008), researchers examined the relationships between three PQO scales (Staff-Child Relations, Available Activities and Programming Flexibility) and social and academic outcomes of 120 children enrolled in 46 after-school programs. Program observations were conducted several times per school year for two years, and children's outcomes were assessed at the end of each school year, when they were in 2nd and 3rd grades, respectively. Better Staff-Child Relations was associated with higher reading scores (both grade levels), math scores (Grade 3 only) and better social skills (2nd-grade boys only), but the scale was unrelated to work habits. Available Activities was related to better math grades and work habits (both for Grade 3 only), but the scale was unrelated to reading grades and social skills. Programming Flexibility was not related to any outcomes. Predictive validity appears mixed, but the evidence should be regarded as preliminary. The authors state that little research currently exists that examines the relationships of specific program strategies (rather than overall program quality) with children's academic and social outcomes.


User Considerations

Ease of Use
While the PQO is available for anyone to use, it is important to recognize that it was developed exclusively with a research audience in mind. While the manual includes basic instructions for conducting observations and completing the forms, it was written for researchers participating in data collection related to a particular study. The materials have not been tailored for general or practitioner use at this time and therefore include some concepts and language (e.g., adjusted frequencies, sampling, qualitative) that may not be particularly accessible for non-research audiences.

In the context of the studies the PQO was developed for, formal observation time at sites was fairly limited, but some additional time should be factored in for reviewing notes and assembling ratings. It is recommended that the qualitative ratings of environment and staff behavior be made based on a minimum of 90 minutes of observation. Completing the time sample process as outlined in the manual takes a minimum of 30 minutes (60 30-second cycles) for an experienced observer. Some guidance about how to conduct observations and develop ratings is provided in the manual.

Available Supports
At this time, training is not regularly available on how to use the PQO, but it has been conducted with data collectors involved in the studies it was developed for. Trainings include reviewing the contents of the instrument and pairing new raters with trained raters to do an observation in the field, compare scores and build inter-observer agreement.

Observation data collected using the PQO have always been coupled with supplementary data sources such as a questionnaire about the physical environment as well as staff, student and parent surveys. However, formal links do not exist between the observation tool and other measures, and the PQO could be used independently.

In the Field
In the Study of After-School Care and Children's Development, conducted by Deborah Vandell and Kim Pierce in the mid-1990s, live observation of children's experiences in programs was at the center of the research (Pierce, K., Hamm, J., & Vandell, D. L., 1999). Observations were conducted during the program participants' first-grade year and each child was observed three times by an individual observer who was randomly assigned from a pool of observers. The observers used both components of the PQO – the time sample procedure and qualitative ratings of the program environment and caregiver style. Other types of information were collected using different methods and measures.

In analyzing the data, the researchers looked for associations between the various measures of program quality and also at associations between program quality and children's adjustment at school. In terms of how aspects of program quality relate, staff positivity was negatively correlated with staff negativity, as one might expect. Staff positivity was higher in programs that were more flexible and offered more activities. Staff negativity was associated with less programmatic flexibility. They also found associations between the program quality indicators and children's adjustment in the first-grade classroom, primarily for boys. Staff positivity was associated with boys earning higher reading and math grades and exhibiting less externalizing behavior at school. Greater programming flexibility was associated with boys exhibiting better social skills at school. Greater availability of age-appropriate activities was associated with boys earning poorer reading and math grades, and exhibiting poorer work habits and more externalizing behavior at school.

Pierce, Bolt, and Vandell (in press) recently examined associations between program quality indicators as measured by several PQO qualitative ratings (staff positive regard, activities, flexibility) and children's adjustment at school (grades, work habits, social skills) in Grades 2 and 3, controlling for child and family characteristics and child prior adjustment. The researchers found that greater staff positivity in after-school programs was associated with both boys and girls earning better reading grades in Grades 2 and 3 and better math grades in Grade 3. Boys also exhibited better social skills in Grade 2 when their after-school programs were characterized by greater staff positivity. Availability of multiple activities in the after-school programs was associated with boys and girls earning better math grades and exhibiting better work habits at school in Grade 3. Programming flexibility was not associated with child outcomes in Grades 2 and 3.

Vandell and Pierce (2001) also reported long-term associations between overall program quality, as measured by annual composites of the qualitative ratings, and children's outcomes. They looked at cumulative program quality (averaged across two years, three years and four years) in relation to children's adjustment at school. Controlling for child and family characteristics and for children's functioning at the end of first grade, the researchers found that children who experienced greater cumulative program quality in Grades 1–3 were reported by their teachers to have better academic grades at school. Girls whose after-school programs had higher cumulative quality across Grades 1–3 or 1–4 had better work habits and better social skills with peers at school in Grades 3 and 4.

For More Information
The PQO is available online at: http://childcare.gse.uci.edu/des4.html

Contact
Deborah Lowe Vandell
Department of Education
University of California, Irvine
2001 Berkeley Place
Irvine, CA 92697
949.824.7840

Purpose and History
In 2003, the New York State Afterschool Network (NYSAN) began a two-year process of developing the Program Quality Self-Assessment Tool (QSA). A Quality Assurance committee involving key stakeholders from practice, policy and research reviewed relevant literature, drafted the instrument, conducted field tests and incorporated feedback from practitioners across the state. Soon after the instrument was completed in 2005, New York State began requiring that all 21st CCLC-funded programs use it twice a year for self-assessment purposes.

The QSA was developed exclusively for self-assessment purposes; programs are discouraged from using it for external assessment or formal evaluation. It is intended to be used in its entirety, ideally as the focal point of a collective self-assessment process that involves all program staff. The QSA is also used by new after-school programs during their initial development; specific items that are considered "foundational" indicators for the start-up stage are identified.

The QSA was designed to be used in the full range of school- and community-based after-school programs and is particularly relevant for programs that intend to provide a broad range of services, as opposed to those with either a very narrow focus or no particular focus (e.g., drop-in centers). It was also designed to be used by programs serving a broad range of students, from kindergarten through high school.

Content
The Program Quality Self-Assessment Tool is organized into 10 essential elements of effective after-school programs (see below). Each element contains a list of standards of practice or quality indicators that describe each element in greater detail. The elements represent a mix of activity-level, program-level and organizational-level concerns:

• Environment/Climate
• Administration/Organization
• Relationships
• Staffing/Professional Development
• Programming/Activities
• Linkages Between Day and After-School
• Youth Participation/Engagement
• Parent/Family/Community Partnerships
• Program Sustainability/Growth
• Measuring Outcomes/Evaluation

Because the QSA was designed with an eye toward programs receiving 21st CCLC funding, there was an intentional effort to capture aspects of programming that, although they may not relate directly to academics, will enhance programs' ability to address students' educational needs. The developers are exploring options that would allow programs to address a subset of items based on their level of readiness; however, the ultimate goal is to assess the program or organization in its entirety.

Because of its broad focus, extending from the activity level to the organization as a whole, the QSA emphasizes several different components of program settings, including social processes, program resources and the organization or arrangement of those resources inside the program. Social processes addressed by the tool include relationships, climate and pedagogy. Resource issues include facilities and staffing requirements; arrangements such as effective transitions, policies and procedures, and relationships with schools are also addressed.

Structure and Methodology
Given the tool's commitment to child and youth development broadly defined, it is not surprising that the items included in the QSA reflect each of the features of positive developmental settings identified by the National Research Council (2002).

EachoftheQSA’s10essentialelementsofeffectiveafter-schoolprogrammingisfurtherdefinedbya

DevelopedbytheNewYorkStateAfterschoolNetwork

Page 60: Measuring Youth Program Quality: A Guide to Assessment ...forumfyi.org/files/MeasuringYouthProgramQuality_2ndEd.pdf · guide their decision-making. Over the last several years, we

©January2009TheForumforYouthInvestment

MeasuringYouthProgramQuality:AGuidetoAssessmentTools,SecondEdition

60

summarystatement,whichisthenfollowedbybetween7and18qualityindicators–statementsaimedatillustratingwhataparticularelementlookslikeinpractice.Whilemostessentialelementsareassessedthroughobservation,themoreorganizationallyfocusedelementssuchasadministration,measuringoutcomes/evaluationandprogramsustainability/growthareassessedprimarilythroughdocumentreview.

The rating scale used in the QSA (see example below) is designed to capture performance levels for each indicator. Indicators are also considered standards of practice, so the goal is to determine whether the program does or does not meet each of the standards. Staff are asked to determine whether their performance in each indicator area is:

4 = Excellent / Exceeds Standards
3 = Satisfactory / Meets Standards
2 = Some Progress Made / Approaching Standard
1 = Must Address & Improve / Standard Not Met

Example – Relationships: A QUALITY program develops, nurtures and maintains positive relationships and interactions among staff, participants, families and communities.

Each indicator below is rated for Performance Level (1, 2, 3 or 4) and flagged with a Plan to Improve (Right Now / This Year / Next Year).

A Quality Program:
• Has staff who respect and communicate with one another and are role models of positive adult relationships.
• Interacts with families in a comfortable, respectful and welcoming way.
• Treats participants with respect and listens to what they say.
• Teaches participants to interact with one another in positive ways.
• Teaches participants to make responsible choices and encourages positive outcomes.
• Is sensitive to the culture and language of the participants.
• Establishes meaningful community collaboration.
• Has scheduled meetings with its major stakeholders.
• Encourages former participants to contribute as volunteers or staff.

While some additional guidance is provided to staff in the tool's introduction about how to determine ratings, the developers acknowledge that this is one of the areas they may revisit in the future, based on feedback from the field. Users are not encouraged to combine scores for each element or to determine a global rating, because the tool is intended for internal self-assessment purposes only. In addition to assigning a rating for each indicator, users are given space on the form to note and prioritize their plans for improvement.

Technical Properties
Beyond establishing face validity (people with expertise in the after-school field agree this measures important features of program quality), research related to the instrument's psychometric properties has not been conducted.

User Considerations
Ease of Use
Practitioners led the development of the QSA and represent its primary target audience. The language and format of the instrument are straightforward and user-friendly. The tool consists of one document, free and downloadable from the Web, that includes an overview, instructions and the instrument itself.

NYSAN has developed a new user guide, published in April 2008, to assist programs in utilizing the QSA. This tool provides guidance on how to engage staff in the assessment process in addition to outlining the basic guidelines for administering the tool.

Programs are expected to go through the self-assessment process twice a year. Some in the field have concerns about the tool being too lengthy; this feedback will be taken into an upcoming revision process.

Additional Supports
The user guide, mentioned above, was created in consultation with a wide range of stakeholders – including NYSAN staff, a state-wide Quality Assurance Committee, practitioner-based focus groups and an advisory group. The guide serves as a "self-guided walkthrough" of the QSA tool; the tool is embedded in the second half of the guide. NYSAN is currently developing phase two of the guide – an online version which will allow users to click on links to other web-based tools, articles and resources related to any one of the ten essential elements or the overall quality assessment and improvement process. The online version will also provide a descriptive example of optimal performance for every single indicator (the current hard copy guide features only select examples). Programs can contact NYSAN to receive additional referrals for technical assistance in using the instrument.

While no centralized mechanism for collecting or analyzing results currently exists, with the development of the online version of the tool and user guide it will be possible to enter data by computer. This could lead to efficient opportunities to track and analyze data over time.

Although additional instruments are not provided with the tool, users are encouraged to consider QSA results one important source of data to inform program planning and to use the tool in concert with other formal or informal evaluative efforts such as participant, parent and staff surveys, staff meetings and community forums. In the future, users will be able to link to other tools from the online version of the QSA and guide.

All NYSAN training is now organized by the 10 elements featured in the tool, so practitioners can easily find professional development opportunities that connect with the results of their self-assessment. Regular trainings that are conducted twice a year with 21st CCLC grantees are also organized around the 10 elements.

In the Field
The Niagara Falls School District has funding through the 21st CCLC program to run after-school programs at four sites – three middle schools and one high school. While all after-school programs receiving 21st CCLC funding in the state of New York are required to conduct and submit QSA assessments twice a year, these programs in Niagara Falls have extended their use of the tool well beyond self-assessment. They see the QSA as central to staff and program development efforts.

Susan Ross, the Program Director within the school district, described how site coordinators use the tool. "We see the QSA as a staff development resource. About three weeks after the school year starts, site coordinators begin sitting down with all of their staff – teachers and community partners – and walking through the tool, one page per staff meeting. It gives us a collective sense of what's working and what we need to improve. It's a great focal point for discussions among staff."

Ross emphasized that one of the important benefits of this process is that it helps to level the playing field between staff from external community partner organizations and school teachers who work in after-school programs. "This really gives our partners an opportunity to feel their opinions are valued. Often when CBO staff come into schools they feel like guests as opposed to full-fledged partners. Through this process, they see their opinions are equally valued and that helps build overall staff morale."

Site directors and staff find the tool accessible and user-friendly. Ross summed up her assessment of the QSA in a matter-of-fact way. "We like it. It's easy to use, self-explanatory and understandable. In fact, I wouldn't change anything about it."

ForMoreInformationNYSAN’sProgramQualitySelf-AssessmentToolisavailableonlineat:www.nysan.org/content/document/detail/1991/

Contact
Ajay Khashu, Director of Research
NYSAN
925 Ninth Avenue
New York, NY
[email protected]

Purpose and History
The Promising Practices Rating Scale (PPRS) was developed for research purposes and is designed for use in school- and community-based after-school programs that serve elementary and middle school students. The tool allows observers to document the type of activity, the extent to which promising practices are implemented within activities and overall program quality.

The 2005 version of the PPRS, the version that is currently available, was developed by Deborah Vandell, Liz Reisner, Kim Dadisman, Kim Pierce and Ellen Pechman in the context of a specific study focused on the relationship between participation in "typically performing" programs and child and youth outcomes (Vandell, Reisner, Pierce, Brown, Lee, Bolt, Dadisman & Pechman, 2006; Vandell, Reisner & Pierce, 2007). Because of this, the tool was initially designed to verify whether or not programs were high-quality rather than to look at variations in quality across programs.

This instrument builds directly on earlier work by Vandell and colleagues focused at the elementary level (see the write-up of the Program Quality Observation Scale in this report) as well as the features of positive developmental settings identified by the National Research Council (2002). Its authors also drew upon several other observation instruments included in this report as they developed the exemplars of promising practices: the School-Age Care Environment Rating Scale, the Program Observation Tool and the OST Observation Tool designed by Policy Studies Associates.

Although the focus of this summary is the PPRS specifically, other components of the Promising Practices quality assessment system include interviews and questionnaires completed by program directors and staff. These tools obtain information about structural features of programs such as staff qualifications and ongoing training, material and financial resources and connections between the program and school, family and community.

Content
The PPRS provides researchers with a framework for observing essential indicators of high-quality programs. It addresses three different aspects of programming: activity type, implementation of promising practices and overall program quality. The first section, which closely mirrors the OST Observation Tool developed by Policy Studies Associates, focuses on documenting a range of in-depth information about the type of activity being observed and the skills emphasized through that activity. The promising practices ratings that constitute the core of the instrument focus on the following eight areas of quality:

• Supportive Relations with Adults
• Supportive Relations with Peers
• Level of Engagement
• Opportunities for Cognitive Growth
• Appropriate Structure
• Over-control
• Chaos
• Mastery Orientation

Because of its emphasis on what children and youth experience in programs, the PPRS has an activity- and program-level focus and does not address organizational issues related to management, leadership or policy. The primary focus is on social processes – including interactions between and among youth and staff and some aspects of instruction.

As mentioned above, the developers drew heavily on the Community Programs to Promote Youth Development report (National Research Council, 2002), so the features of positive development outlined in that report are quite visible within the tool's definition of promising practices. Although the PPRS itself does not include a focus on connections between the program and school, family or community (one of the features described in the NRC report), companion tools are available to capture this type of information.

Developed by the Wisconsin Center for Education Research & Policy Studies Associates, Inc.

Structure and Methodology
The first part of the PPRS, which focuses on the activity context, has observers watch an activity for 15 minutes and code several aspects of what they are observing, including:

• Activity type (e.g., tutoring, visual arts, music, sports, community service);
• Space (e.g., classroom, gym, library, cafeteria, auditorium, hallway, playground);
• Primary skill targeted (e.g., artistic, physical, literacy, numeracy, interpersonal);
• Number of staff involved in the activity; and
• Number, gender and grade level of participants.

These observations are recorded on a cover sheet that also includes other basic information about the observer, the program, date, time, etc.

Next, observers are asked to write down a brief narrative description of the activity they are observing, following a set of specific guiding questions (see below). This description supplements the activity context coding with a richer description of what is going on.

• What are youth doing?
• What kinds of materials are used?
• What kinds of instructional processes are used?
• What, if any, specific skills does the activity's leader(s) have that support the instruction involved in the activity he/she is conducting?
• What is the overall affective tone?
• To what extent are youth engaged?
• Describe observed promising practices as appropriate and raise concerns about quality, if there are any.

Example: Level of Engagement (in intended experiences)

High: Students appear engaged, focused and interested in their activities.
• Engaged in the focal activity and/or using free time appropriately.
• Appear to be interested in the activity.
• Follow staff directions in an agreeable manner.

Low: Students appear bored or distracted.
• Ignore staff who are talking to them.
• "Pretend" to listen.
• Wander aimlessly.

High: Markers of engagement are appropriate to the activity (e.g., intense concentration witnessed during a computer activity, high levels of affect during sports activities); can be solitary or group activities.

Low: Markers of engagement are inappropriate to the activity (e.g., picking flowers while playing a sport activity).

High: Students contribute to discussions.
• Discuss back and forth and offer comments.
• Ask "on-task" questions.
• Are comfortable initiating conversation.

Low: Students do not contribute to discussions.
• Do not participate in discussions.
• Do not ask questions.

Rating Indicators:
1 = Most students are not engaged appropriately, may appear bored.
2 = Students are participating in activities but do not appear to be concentrating or affectively involved.
3 = Students are focused on activities with some evidence of affective involvement or sustained concentration.
4 = Students are concentrating on activities, focused, interacting pleasantly when appropriate and affectively involved in the activity.

The next section, and the core of the PPRS, is the Promising Practices Ratings section, where observers document to what extent certain exemplars of practice are present in the program. This section of the tool addresses the eight key areas of practice listed previously.

Each area of practice is subdivided into two to five specific exemplars, with more detailed indicators provided under each. Observers are given both positive and negative exemplars and indicators for each practice area in order to help guide determination of ratings (see the Level of Engagement example above).

In the PPRS, ratings are only assigned at the overall practice level (not for individual exemplars or indicators). Practices are considered either highly characteristic (4), somewhat characteristic (3), somewhat uncharacteristic (2) or highly uncharacteristic (1). Additional guidance as to what each of these terms means is provided in the instrument. At the bottom of the description of each practice area, observers are given tailored guidance as to what might lead to a 1, 2, 3 or 4 rating for that practice.

Finally, observers are asked to review their ratings of promising practices across multiple activities and assign an overall rating for each promising practice area. An overall program quality score is computed as the mean of the ratings on the eight scales, after reversing the scores for Over-control and Chaos. For each practice area there is space to write down notes to "justify" the overall rating assigned.
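The scoring rule above can be made concrete with a short sketch. This is an illustration only, not code from the PPRS materials; the dictionary keys are our own labels for the eight scales, and the reverse-scoring of the two negatively worded scales (Over-control and Chaos) on the 1–4 scale uses the standard 5 − r mapping:

```python
# Illustrative sketch (not the authors' code): overall PPRS program quality
# as the mean of the eight scale ratings, with the negatively worded scales
# reverse-scored so that a higher score always means higher quality.

SCALES = [
    "supportive_relations_adults",
    "supportive_relations_peers",
    "level_of_engagement",
    "cognitive_growth",
    "appropriate_structure",
    "over_control",
    "chaos",
    "mastery_orientation",
]
REVERSED = {"over_control", "chaos"}  # higher raw rating = lower quality


def overall_quality(ratings: dict) -> float:
    """Mean of the eight 1-4 ratings, reverse-scoring the negative scales."""
    total = 0.0
    for scale in SCALES:
        r = ratings[scale]
        if not 1 <= r <= 4:
            raise ValueError(f"{scale}: rating {r} is outside the 1-4 scale")
        # Reverse-scoring maps 1<->4 and 2<->3.
        total += (5 - r) if scale in REVERSED else r
    return total / len(SCALES)
```

So a program rated 4 on every positive scale and 1 on Over-control and Chaos receives the maximum overall score of 4.0.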

Technical Properties
Available psychometric evidence supporting the PPRS addresses interrater reliability, score distribution, internal consistency and predictive validity, drawing on a study of 35 after-school elementary and middle school programs (Vandell, Reisner et al., 2006).

Score Distributions
Score distributions help users determine whether items adequately distinguish between programs on specific dimensions. Vandell, Reisner et al. (2006) examined the average scores for overall program quality and the individual rating scales obtained with the sample of high-quality elementary and middle school programs that participated in the Study of Promising After-School Programs at two time points. Generally, it is important to have a range of scores across programs, as that would suggest the measure can detect meaningful differences between programs. Because this sample included only high-quality programs, however, the scores naturally fell toward the positive extremes of each dimension. Score distributions on the PPRS obtained in 37 observations in programs of varying quality are more widely distributed, suggesting that the instrument detects meaningful differences among programs.

In that study, the authors theorized that scores would exhibit a wider range and would show low, moderate and high quality (as opposed to most programs scoring on the high end of the scale). As expected, scores for each of the scales generally had a wider distribution, with averages across the programs falling in the middle for most of the eight scales. Two scales, Opportunities for Cognitive Growth and Mastery Orientation, had low scores overall (averages were 1.65 and 1.78 on a scale of 1 to 4, respectively), although their scores in the Study of Promising Practices were closer to the center (average scores across two observation periods within programs serving elementary and middle school students, respectively, ranged from 2.6 to 2.9). The distribution differences for these two scales suggest that they may be better suited to differentiate among programs of higher quality. The Over-control scale was the only item that was consistently low for both the Study of Promising Practices and the follow-up study, which may simply suggest that staff in most programs do not exhibit a great deal of over-control. Taken together, the two studies provide strong evidence that the instrument captures meaningful differences across a variety of programs.

Interrater Reliability
The authors examined rater agreement for each of the instrument's eight items by calculating intraclass correlation coefficients between ratings of 24 programs made by two observers. Coefficients for the individual scales ranged from .58 for Opportunities for Cognitive Growth to .86 for Structure (average = .74). The intraclass correlation for the overall program quality score was .90. Interrater agreement represented by kappa scores in work conducted by other research teams in programs of varying quality ranged from .63 for Over-control to .94 for Supportive Relations with Adults (average = .77) across 37 observations made by two observers. These scores indicate acceptable interrater reliability, meaning the instrument's items are clear enough for raters to understand and agree on.
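For readers unfamiliar with the kappa statistic, it compares the agreement two raters actually show to the agreement expected by chance. The sketch below is our illustration (not code from these studies) of Cohen's kappa for two observers assigning the same 1–4 ratings to a set of observations:

```python
# Illustrative sketch: Cohen's kappa, a chance-corrected index of agreement
# between two raters. Shown only to make the reported kappa values concrete.
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Paired categorical ratings from two raters -> kappa in [-1, 1]."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Proportion of observations on which the raters simply agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if each rater assigned categories independently,
    # based on how often each rater used each category.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

Identical rating lists yield a kappa of 1.0, while agreement no better than chance yields 0; the .63–.94 range reported above therefore reflects substantial agreement between observers.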

There is currently no information regarding interrater reliability for the overall program quality rating, which ranges on a three-point scale from low program quality to high program quality. Therefore, evidence does not exist to support whether raters agree on the degree of quality that individual programs exhibit.

Additional Reliability Evidence
Additional rater agreement information was obtained by comparing two sets of ratings by the same rater conducted on consecutive days for each program. The authors found that the percent agreement for ratings of each feature over two days was between 81 percent and 97 percent, with an average of 90 percent. This translates into an average kappa score of 0.80, indicating that the average item's rating for Day 2 is not very different from Day 1.

Internal Consistency
To determine whether the items fit together to form a meaningful overall score, the authors computed a statistic called Cronbach's alpha. In the Study of Promising After-School Programs, alpha coefficients for the overall program quality score ranged from .74 to .77, indicating acceptable internal consistency.
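Cronbach's alpha can be computed directly from item-level scores using the standard formula α = k/(k−1) · (1 − Σσ²ᵢ / σ²ₜ), where k is the number of items, σ²ᵢ are the individual item variances and σ²ₜ is the variance of the total scores. The sketch below is our own illustration, not the study's code:

```python
# Illustrative sketch: Cronbach's alpha from item-level scores.
# item_scores is a list of k items, each a list of scores for the same set
# of programs (all inner lists have equal length).
from statistics import pvariance


def cronbach_alpha(item_scores):
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Sum of the individual item (population) variances.
    sum_item_var = sum(pvariance(scores) for scores in item_scores)
    # Variance of the per-program total scores across all items.
    totals = [sum(scores) for scores in zip(*item_scores)]
    total_var = pvariance(totals)
    return (k / (k - 1)) * (1 - sum_item_var / total_var)
```

Perfectly correlated items yield an alpha of 1.0; values in the .74–.77 range reported above indicate that the eight scales hang together reasonably well as a single overall score.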

Predictive Validity
Initial evidence of predictive validity is available for the PPRS, meaning that the instrument predicts youth outcomes that would be expected from prior theory or research. Specifically, Vandell, Reisner and colleagues (2005) found that youth attending high-quality programs (as measured by the PPRS) had better educational and behavioral outcomes by the end of the academic year than unsupervised youth who did not regularly attend any after-school program, including better academic performance, task persistence, social skills and pro-social behaviors with peers, and less misconduct, substance abuse and aggressive behavior.14 Vandell, Reisner et al. (2006) reported similar findings for longer-term outcomes after two years of program participation. Improvements in math achievement scores have also been reported (Vandell, Reisner & Pierce, 2007).

Thefactthattheinstrument’sratingsrelatedtoexpectedoutcomesofferssomereassurancetousersthatitaccuratelymeasuresaspectsofprogramquality.However,thevalidityevidenceshouldbetakenaspreliminaryforseveralreasons.First,theauthorshavenotexaminedPPRSratingsoflow-qualityprograms.Noevidenceexiststhattheinstrumentdistinguishesbetweenexpertratingsoflow-andhigh-qualityprograms,orwhetherlow-qualityprogramratingspredictyouthoutcomesdifferentlythanhighqualityprogramratings.Itwouldalsobeusefultounderstandthepredictivevalidityofeachspecificscale(e.g.,levelofengagement,appropriatestructure)andtheoverallscore.

User Considerations
Ease of Use
While the PPRS is available online and free for anyone to download and use, it is important to recognize that it was developed with primarily a research audience in mind. While the observation manual includes basic instructions for conducting observations and completing the forms, it was written for researchers participating in data collection related to a particular study. The materials have not been tailored for general use or for practitioner use at this time and therefore include some language (e.g., construct, exemplar) that may not necessarily be accessible for non-research audiences.

14 Results were found using two advanced statistical techniques known as cluster analysis and hierarchical linear modeling.

In the context of the study the PPRS was developed for, site visits were fairly time-intensive (spread over the course of two days). However, formal observation time totaled approximately two hours per site, with several additional hours spent reviewing notes and assigning ratings. Some additional guidance about how to conduct observations, develop ratings and complete the forms is provided in the manual.

Available Supports
At this time, training is not regularly available on how to use the PPRS, but has been conducted with data collectors involved in research. Trainings have included reviewing the contents of the instrument and pairing new raters with trained raters to do an observation in the field, compare scores and build inter-observer agreement.

Observation data collected with the PPRS have always been coupled with supplementary data sources such as a questionnaire about the physical environment as well as staff, student and parent surveys. However, formal links do not exist between the observation tool and other measures, and the PPRS could be used independently. Additional measures are also available at the same website as the PPRS.

In the Field
The Promising Practices Rating System was developed specifically for use in the Study of Promising After-school Programs, a national study funded by the C.S. Mott Foundation that focused on the short- and long-term impacts of high-quality after-school programs on the cognitive, academic, social and emotional development of children and youth in high-poverty communities. The research was led by Deborah Vandell of UC Irvine (formerly of the University of Wisconsin-Madison) and Elizabeth Reisner of Policy Studies Associates.

Two-day site visits to participating programs were conducted in fall 2002, spring 2003, fall 2003 and spring 2005 to assess the quality of each program. During site visits, researchers conducted observations using the PPRS on two afternoons, for a minimum of one hour per day. Observers focused on the activities of the target age groups (grades 3 and 4 and grades 6 and 7) and observed as many different types of activities as possible, with a minimum of 15 minutes per activity. At the end of the first day of the site visit, observers assigned tentative ratings to each of the eight practice areas; at the end of the second day, the final ratings were determined.

As analyses got underway, the authors revised their conceptual scheme based on the idea that sets of experiences should be taken into consideration, as opposed to labeling students as program vs. non-program. Elementary students with high rates of participation in quality after-school programs but low levels of participation in other after-school arrangements (the program-only cluster) outperformed the low-supervision cluster (self-care + limited activities) on every measure of academic and social competence assessed. The supervised-at-home group outpaced the self-care + activities cluster on all academic measures and social skills. Among middle school students, the program + activities cluster had better work habits than the low-supervision group, and both program groups (program + activities, program only) reported less misconduct and substance use compared to the low-supervision group. Similar results were found for program involvement across three years. Additional findings are available at http://childcare.gse.uci.edu/des3.html.

For More Information
The PPRS is available online at: http://childcare.gse.uci.edu/des3.html

Contact
Deborah Lowe Vandell
Department of Education
University of California, Irvine
2001 Berkeley Place
Irvine, CA
[email protected]

Purpose and History
The Quality Assurance System® was developed by Foundations, Inc. to help after-school programs conduct quality assessment and continuous improvement planning. Based on Foundations, Inc.'s experience running after-school programs, offering professional development activities and providing technical assistance and publications for the field, the QAS was designed to help programs develop and sustain a commitment to quality.

In its first incarnation, the QAS was a simple checklist designed to assess the quality of after-school programs operated by the organization itself. Roughly five years ago, staff at Foundations reconstructed and expanded the tool for broader use, with input from practitioners both inside and outside of the organization. The QAS was developed to be general enough for use in a range of school- and community-based programs serving children and youth in grades pre-K–12.

Based on seven "building blocks" that are considered relevant for any after-school program, this Web-based tool is expandable and has been customized for particular organizations based on their focus. The QAS focuses on quality at the "site" level and addresses a range of aspects of quality, from interactions to program policies and leadership. Filling out the QAS requires a combination of observation, interview and document review. Scores are generated for each building block rather than the overall program, reflecting the tool's emphasis on identifying specific areas for improvement.

Content
The various components of quality that the QAS focuses on are called "building blocks." The seven core building blocks, which describe what Foundations considers to be the fundamental features that underlie effective after-school programming, include:

• Program planning and improvement;
• Leadership;
• Facility and program space;
• Health and safety;
• Staffing;
• Family and community connections; and
• Social climate.

In addition to these seven, three "program focus building blocks" reflecting the particular goals or focus of a program are available for users to select from:

• Academics;
• Recreation; and
• Youth development.

The QAS puts roughly equal emphasis on three different components of settings: social processes, program resources and the arrangement or organization of those resources within programs. There are items on the QAS that address all of the features of positive developmental settings outlined by the National Research Council (2002), with somewhat more emphasis on matters related to structure and skill-building than on features such as "support for efficacy and mattering" and "supportive relationships."

Structure and Methodology
The structure of the QAS is clear and straightforward. Part one – program basics – includes the seven core building blocks. For each one, users are given a brief description of the importance of that aspect of quality. The building block is further subdivided into five to eight specific elements, each of which is assigned a rating by assessors. For example, the elements of the social climate building block include: behavioral expectations, staff/participant interactions, diversity, social time and environment. For each element, more specific descriptions (also referred to as a "rubric") are provided. Part two of the tool – program focus – consists of the three additional building blocks, and its structure parallels that of part one. Programs are encouraged to use one, two or all three of the program focus building blocks in conducting their assessment.

Quality Assurance System®
Developed by Foundations, Inc.


Ratings for the QAS are made using a four-point scale from unsatisfactory (1) to outstanding (4). For each element of a building block, specific descriptions of what might lead to a 1, 2, 3 or 4 rating are provided (see example below).

In terms of data collection, users are provided with a document checklist that identifies what kinds of specific documents might be useful in filling out the QAS and are encouraged to gather and examine such documents prior to observing the program. The "program profile" section of the tool asks users to upload important basic information about the program and can also be filled out, for the most part, prior to visiting.

Once on-site, the users' guide encourages observers to go through five steps:

• Meet people to establish rapport and hear from staff and youth about the program;
• Wander with purpose to develop a sense of the entire facility;
• Observe activities to see the program in action, the level of engagement and the nature of activities;
• Gather materials to ensure that all of the documents in the checklist and any other relevant materials are collected; and
• Take notes to ensure you have a running record of your observations and questions.

Example rubric: Staffing building block. Each element is rated from unsatisfactory (1) to outstanding (4), and a total score is recorded for the building block.

5.1 Staff to Participant Ratio
  1 (Unsatisfactory): Insufficient staff are hired for the number of participants.
  2 (Needs Improvement): Sufficient staff are hired for some levels of participation, but staffing is sometimes insufficient due to attendance fluctuations.
  3 (Satisfactory): Appropriate participant-to-staff ratios are maintained consistently.
  4 (Outstanding): Staff number and attendance exceed required ratios.

5.2 Qualifications
  1: Fewer than half the staff have the required training and/or experience.
  2: More than half the staff have the required training and/or experience.
  3: All staff have the training and/or experience required by the program.
  4: Many staff members exceed the training and/or experience required by the program.

5.3 Professional Growth
  1: Professional development is not provided, nor is time allocated for staff to pursue individual professional growth.
  2: Some professional development opportunities are provided, but they are poorly attended.
  3: Staff attend professional development sessions at least twice a year.
  4: Staff identify professional development needs and attend professional development sessions more than twice a year.

5.4 Attendance
  1: Staff absenteeism is an ongoing problem (e.g., a significant number of staff are routinely absent).
  2: Staff absences are an occasional problem.
  3: Staff are reliable and absences are infrequent.
  4: Staff absences are rare.

5.5 Retention
  1: Staff turnover is identified as a problem.
  2: Staff turnover occasionally affects program offerings.
  3: Staff retention is not identified as a problem.
  4: Staff retention is excellent and provides stability.

Once scores for each element are entered into the QAS, the program electronically generates overall building block scores. The program's quality profile then begins to emerge through summary graphs the software generates for each building block, as well as a program summary graph that contains scores for each building block assessed. The graphs and building block scores help users target areas for improvement as part of the assessment process. A follow-up QAS assessment enables users to identify areas of progress and then refine goal-setting and improvement planning.
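The aggregation step can be sketched in a few lines. This is a hypothetical illustration of averaging element ratings into building block scores and flagging low scorers, not Foundations, Inc.'s actual software; the element ratings and the improvement threshold are assumptions:

```python
# Minimal sketch of QAS-style score aggregation (hypothetical data).
# Each building block holds element ratings on the 1 (unsatisfactory)
# to 4 (outstanding) scale.
ratings = {
    "Staffing": {"Staff to Participant Ratio": 3, "Qualifications": 4,
                 "Professional Growth": 2, "Attendance": 3, "Retention": 2},
    "Social climate": {"Behavioral expectations": 4,
                       "Staff/participant interactions": 4,
                       "Diversity": 3, "Social time": 3, "Environment": 4},
}

def building_block_scores(ratings):
    """Average element ratings within each building block."""
    return {block: sum(elems.values()) / len(elems)
            for block, elems in ratings.items()}

def improvement_targets(scores, threshold=3.0):
    """Flag building blocks scoring below a chosen (assumed) threshold."""
    return sorted(block for block, s in scores.items() if s < threshold)

scores = building_block_scores(ratings)
print(scores["Staffing"])            # 2.8
print(improvement_targets(scores))   # ['Staffing']
```

A follow-up assessment could be compared against these scores block by block to show where progress has been made.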

Technical Properties
Beyond establishing face validity (people with expertise in the after-school field agree this measures important features of program quality), research related to the instrument's psychometric properties has not yet been conducted.

User Considerations
Ease of Use
The QAS is a straightforward, flexible tool with several built-in features that make it particularly user-friendly. The instruction guide is written in clear, accessible language and walks users through the necessary background and basic steps for using the system. The standard cost for the QAS has recently been reduced to $75 for an annual site license. This license is good for two official uses (or assessments), which is what its developers suggest programs conduct annually: once toward the beginning of the year and once toward the end. After two uses the system generates a cumulative report comparing the initial and follow-up assessments. For programs with multiple sites, a cumulative report comparing site results is available with the initial assessment. When the QAS is used as part of a professional development package related to quality improvement, discounts are available.

Available Supports
Foundations, Inc. offers online sessions and in-person training options to assist organizations in using the tool. Multi-site organizations may contract for individualized technical assistance and training, which may include options for customization of the tool. Trainings addressing quality elements reflected in the building blocks are available online, in technical assistance, and in professional development sessions.

For self-assessment purposes, once a QAS site license is purchased, programs can receive light phone technical assistance free of charge from Foundations, Inc. staff if they have questions while using the system. Programs that wish to have trained assessors conduct their assessment can purchase this service under contract with Foundations, Inc.

The QAS is available in a Web-based format, allowing users to enter data and immediately generate basic graphs and analyses. The site-specific reports generated are specifically designed to help site staff and leaders use the information to guide improvement planning.

In the Field
Foundations, Inc. is working with the U.S. Dream Academy, a national program serving elementary and middle school students who are children of prisoners. In 10 centers around the country, the U.S. Dream Academy provides a comprehensive program including academic support, enrichment, and a one-on-one mentoring relationship. The U.S. Dream Academy chose to use the QAS and a technical assistance strategy with Foundations during 2008-2010 to build and support program quality, and to establish an ongoing process of continuous improvement. During initial meetings, U.S. Dream leaders and staff worked with Foundations to clarify quality indicators for the program and customize the tool. The processes of co-assessment and self-assessment are designed to build the capacity of sites to target specific improvement goals and concrete steps, identify site strengths and innovations, and share strengths organization-wide. U.S. Dream Academy national headquarters will use the QAS findings to regularly identify where and how they can best support their centers.

Directors anticipate that the QAS and surrounding processes will direct staff to ask critical questions about their program environments and staff practice. At the same time, it will allow centers to highlight and share strengths and accomplishments across the organization, building internal resources for quality. After joint assessments are conducted with U.S. Dream staff and Foundations at each site, individual scores will be aggregated and presented at a national meeting. Each center will conduct a follow-up self-assessment at the end of the 2008-2009 school year, at which point they will be able to analyze the data and evaluate their own program development. The QAS tool also will be available the following school year to allow each site to continue the self-assessment process.

The QAS design, coupled with technical assistance processes, allows for customization of the tool. With the addition of an eleventh building block focused on U.S. Dream Academy's mentoring component, the tool encompasses the organization's full range of essential program components. C. Diane Wallace Booker, Executive Director of the U.S. Dream Academy, stated that, "the beauty of QAS is that it is designed with evidence-based quality indicators yet is customizable and able to capture the unique elements of our program and helped us to more clearly define what quality looks like for us versus any other afterschool program. Further, the process of guided, self assessment and continuous improvement planning is critical to our ongoing efforts to achieve impact in the lives of the children we serve." Establishing an ongoing process for quality-building tailored to the specifics of the program is particularly important for multi-site programs. As U.S. Dream Academy expands to serve more children, the sustained quality assurance component becomes ever more critical.

For More Information
Additional information about the QAS, including ordering information, is available online at: http://qas.foundationsinc.org/start.asp?st=1 or by visiting www.afterschooled.org

Contact
Rhe McLaughlin, Associate Executive Director
Center for Afterschool Education, Foundations, Inc.
Moorestown West Corporate Center
2 Executive Drive, Suite 4
Moorestown
[email protected]


Purpose and History
The School-Age Care Environment Rating Scale (SACERS) is designed to assess before- and after-school care programs for elementary school age children (5- to 12-year-olds), as well as whole-day programs in communities with year-round schools. It focuses on "process quality" – the social and educational interactions going on in the setting – as well as program features related to space, schedule, materials and activities that support those interactions.

The SACERS was developed for self-assessment, program monitoring and program improvement planning, as well as for research and evaluation. It can be used by program staff as well as trained external observers or researchers. While self-described as appropriate for "group care programs," the SACERS has been used in a range of program environments beyond child care centers, including school-based after-school programs and community-based organizations such as YMCAs and Boys and Girls Clubs.

The SACERS, published in 1996 but updated periodically since then, is one of a series of program assessment instruments developed by researchers affiliated with the Frank Porter Graham Child Development Institute (FPG). As such, the SACERS is an adaptation of the Early Childhood Environment Rating Scale (ECERS) and is quite similar in format and mechanics to the ECERS, the Family Day Care Rating Scale (FDCRS) and the Infant/Toddler Environment Rating Scale (ITERS). Some states and localities have used several scales within the series to create continuity across accreditation or accountability systems, given the consistent orientation, language, format and scoring techniques.

Content
The SACERS measures process quality as well as corresponding structural features of programs. Its content reflects the notion that quality programs address three "basic needs" of children: protection of their health and safety, positive relationships and opportunities for stimulation and learning. These three basic components of quality care are considered equally important. They manifest themselves in tangible, observable ways and constitute the key aspects of process quality included in the SACERS. The seven sub-scales of the SACERS include:

• Space and Furnishings;
• Health and Safety;
• Activities;
• Interactions;
• Program Structure;
• Staff Development; and
• Special Needs Supplement.

By addressing both process quality as well as structural features that relate to process quality (and other structural matters not directly related to process quality, such as health policy), the SACERS puts as much emphasis, if not more, on program resources and the organization of those resources as it does on social processes that occur within the setting. This reflects its roots in the assessment and monitoring of environments serving young children. There are items on the SACERS that address each of the features of positive developmental settings outlined by the National Research Council (2002), with the most emphasis (the largest number of relevant items) clustering under the "physical and psychological safety" feature.

Interactions Sub-Scale Items
• Greeting/Departing
• Staff-Child Interactions
• Staff-Child Communication
• Staff Supervision of Children
• Discipline
• Peer Interactions
• Interactions Between Staff & Parents
• Staff Interaction
• Relationships Between Program Staff & Classroom Teachers

Developed by the Frank Porter Graham Child Development Institute & Concordia University, Montreal


Structure and Methodology
The structure of the SACERS is straightforward and consistent with the other tools in the Environment Rating Scales series. The scale includes 49 items in the seven sub-scales mentioned above (see box for the items in the "Interactions" sub-scale). All of the sub-scales and items are organized into one booklet that includes the items, directions for use and scoring sheets.

While observation is the main form of data collection the instrument is built around, there are several items that are not likely to be observed during program visits. While the SACERS does not separate those items out into a separate interview scale or form, raters are encouraged to ask questions of a director or staff person in order to rate these items and are provided with specific sample questions that will help them get the necessary information to complete the form.

All 49 items are rated on a seven-point scale, with one being "inadequate" and seven being "excellent." Concrete descriptions of what each item looks like at a one, three, five and seven are provided (see examples below). Notes for clarification that help the user understand what they should be looking for are also provided for many items. Observers compile their scores onto a summary score sheet, which encourages users to compile ratings and create an overall average program quality score.

The SACERS is meant to be used while observing one group at a time, for a period of three hours. A sample of one-third to one-half of groups (when programs have children divided into groups or classrooms) is required to establish a score for an entire program.
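As a rough sketch, the sampling and averaging rule described above might look like the following. This is illustrative only; the minimum-sample function and the example group scores are assumptions, not part of the SACERS materials:

```python
import math

# Sketch of the SACERS sampling rule (illustrative): observe one group
# at a time, sample at least one-third of a program's groups, then
# average the 1-7 observation scores into a program-level score.
def groups_to_observe(n_groups):
    """Smallest sample meeting the one-third-of-groups minimum."""
    return max(1, math.ceil(n_groups / 3))

def program_score(group_scores):
    """Average observation scores across the sampled groups."""
    return sum(group_scores) / len(group_scores)

print(groups_to_observe(9))                      # 3
print(round(program_score([5.2, 4.6, 5.0]), 2))  # 4.93
```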

Technical Properties
In the case of the SACERS, psychometric evidence demonstrates that observations by different raters are consistent (interrater reliability) and that the instrument's scales consist of items that cluster together in meaningful ways (internal consistency). Preliminary evidence also exists for concurrent validity, suggesting the SACERS may be an accurate measure of program practices that predict related outcomes.[15] The information presented here is reported by Harms, Jacobs and White (1996).

Example item: Staff-Child Communications. Each item is rated on a seven-point scale, with descriptors anchored at inadequate (1), minimal (3), good (5) and excellent (7).

1 (Inadequate):
• Staff-child communication is used primarily to control children's behavior & manage routines.
• Children's talk not encouraged.

3 (Minimal):
• Staff initiate brief conversations (e.g., ask questions that can be answered with yes/no, limited turn-taking in conversations).
• Limited response by staff to child-initiated conversations & questions.

5 (Good):
• Staff-child conversations are frequent.
• Turn-taking in conversation between staff & child is encouraged (e.g., staff listen as well as talk).
• Language is used primarily by staff to exchange information with children & for social interactions.

7 (Excellent):
• Children are asked "why, how, what is" questions which require longer, more complex answers.
• Staff make an effort to talk with each child (e.g., listen to child's description of school day, including problems & successes).
• Staff verbally expand on ideas presented by children (e.g., add information, ask questions to encourage children to explore ideas).

Interrater Reliability
To examine interrater reliability, or the degree to which different raters agree when observing the same program, paired raters assessed 24 programs using the measure. Researchers tested interrater reliability with the SACERS scales and total score using kappa scores and intraclass correlation coefficients. All reliability coefficients were near or above 0.70, suggesting strong agreement. In other words, with adequate training for raters, scores will not depend on which rater is evaluating a given program.
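For readers who want to see what such an agreement statistic involves, here is a minimal Cohen's kappa computation for two raters. The data are hypothetical, and this is a simplified stand-in for the kappa and intraclass-correlation analyses the developers report:

```python
from collections import Counter

# Cohen's kappa: agreement between two raters, corrected for the
# agreement expected by chance. Hypothetical ratings, not SACERS data.
def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    # Proportion of programs where the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal score frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Two raters scoring the same eight programs on a seven-point item.
a = [5, 6, 3, 7, 5, 4, 6, 5]
b = [5, 6, 4, 7, 5, 4, 6, 5]
print(round(cohens_kappa(a, b), 2))  # 0.83
```

A value near or above 0.70, as reported for the SACERS, would indicate that trained raters largely interchange.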

Internal Consistency
Researchers examined how consistent individual item scores are within each respective SACERS scale, since all of the items within a particular scale are intended to measure a particular concept (e.g., Health and Safety). Internal consistency of the scales and the total score was strong, with alpha values ranging from .67 to .95. High internal consistency strengthens the argument that the items jointly represent the central concept of interest.
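Cronbach's alpha, the statistic behind these internal consistency figures, can be computed directly from item scores. A small sketch with made-up scores (not the SACERS data):

```python
# Cronbach's alpha for one scale: alpha = k/(k-1) * (1 - sum of item
# variances / variance of total scores). Hypothetical item scores.
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """items: one inner list per item, aligned across the same programs."""
    k = len(items)
    totals = [sum(col) for col in zip(*items)]  # per-program scale totals
    item_var = sum(variance(it) for it in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

items = [
    [5, 6, 4, 7, 5],   # item 1 scores across five programs
    [5, 7, 4, 6, 5],   # item 2
    [6, 6, 3, 7, 4],   # item 3
]
print(round(cronbach_alpha(items), 2))  # 0.91
```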

Convergent Validity
Convergent validity is examined by comparing the findings from the instrument of interest to a similar assessment tool, to help demonstrate the instrument's ability to measure what it is supposed to measure. Findings from three of the SACERS scales were compared to ratings with Vandell and Pierce's Program Quality Observation Scale (by the authors and colleagues of the PQO, also reviewed in this report). Evidence indicated that each of these three SACERS scales (Program Structure, Activities and Interactions) was related to similar PQO items in expected ways. Specifically, Vandell and Pierce (1998) found the following relationships between PQO and SACERS scales in 46 after-school programs: (1) SACERS Program Structure was positively related to PQO Programming Flexibility, (2) SACERS Activities was positively related to PQO Available Activities, and (3) SACERS Interactions was positively related to PQO Staff Positive Regard and Positive Behavior Management, and negatively related to PQO Staff Negative Regard and Negative Behavior Management. Convergent validity evidence is unavailable on the other SACERS scales.
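The underlying check is a simple correlation between paired scale scores from the two instruments. A sketch with hypothetical data (not the Vandell and Pierce figures):

```python
import math

# Pearson correlation between one SACERS scale and a similar scale on
# another instrument, across the same programs. Hypothetical scores.
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

sacers_interactions = [5.5, 4.0, 6.5, 3.5, 5.0]
pqo_positive_regard = [4.8, 3.9, 6.1, 3.2, 4.6]
print(round(pearson_r(sacers_interactions, pqo_positive_regard), 2))
```

A strong positive r between scales intended to measure similar constructs is what convergent validity evidence looks like in practice.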

Concurrent Validity
To determine whether the SACERS accurately measures program quality, developers examined whether the instrument's ratings were related to distinct, theoretically important concepts in expected ways. Additional concurrent validity evidence covers all of the scales and the total score. Specifically, because prior research suggests program quality is related to staff education/training, researchers expected that if the SACERS scales were adequately measuring quality, they would be positively related to staff education/training. As Harms, Jacobs and White (1996) expected, Space and Furnishings, Interactions, and Program Structure, as well as the overall SACERS score (which can be thought of as general program quality), were moderately, positively correlated with a measure of staff education and training. However, they did not report parallel correlations with measures of three additional scales (Health and Safety, Activities, or Staff Development); it is unclear whether they did not test these scales or whether they found them to be unrelated to staff education/training. The researchers also tested the validity of the scales by examining their relationship to staff-child ratio. As expected, they found that Health and Safety, Activities, and Staff Development were moderately related to child-staff ratio. They did not report correlations between the other scales and total score with staff-child ratio, and it is unclear whether they did not test these or whether they were uncorrelated with staff-child ratio.

Additional Validity Evidence
To explore the extent to which the SACERS adequately measures program quality, the developers asked nine experts to rate how much each item in the instrument related to their definition of high quality. Using a five-point scale (with five being a very important aspect of quality), the minimum average score was around a four, and experts rated most items close to a five. These scores suggest that items adequately measure aspects of quality. However, since experts were not asked whether any aspects of quality were absent from the instrument, this should not be taken as evidence that program quality as a whole is adequately represented.

[15] Except when noted, psychometric information is not available for the supplementary "special needs" items at the end of the instrument because none of the programs tested had exceptional children.

User Considerations
Ease of Use
The SACERS is very easy to use in terms of accessibility of format and language (it is currently available in English, French, German, and most recently, Spanish). Full instructions for using the scale are included in the booklet along with the items themselves, notes clarifying many of the items and a training guide with advice on preparing to use the scale, conducting a practice observation and determining interrater reliability. One blank score sheet is included in the booklet and additional score sheets can be ordered in packages of 30. The SACERS booklet is available for purchase through Teachers College Press at $15.95.

Developers suggest it takes approximately three hours to observe a program and complete the form (users are encouraged to check off indicators and make at least initial scoring decisions while observing). Acknowledging that quality can vary within the same center or program, the developers advise that the approach to observation and scoring reflect how programs are structured. If a program has children broken into several different classrooms, observers are encouraged to observe one-third to one-half of the groups in the program before creating an overall score.

Available Supports
Three- and five-day training workshops focused on the structure, rationale and scoring of the SACERS are available through the FPG Institute, as is additional information about the instrument and the other rating scales in the series. Specific guidance for how to conduct your own training with staff or other observers is provided in the SACERS booklet. Training to reliability takes an estimated 4-5 days, with reliability checks throughout.

FPG is currently soliciting input from users in the field to develop a practical manual for adult educators using any of the rating scales, which will include specific materials such as course syllabi and outlines. Forms have also been developed to assist with reporting and applying observations to program improvement plans. Users can sign up to join a listserv through the FPG Web site to interact with other users in the field and to hear about updates and other relevant developments.

Large-scale users of the rating scales can now work with a commercial software package – the ERS Data System – to enter and score their data. The Tablet PC version displays the items as seen in the print version, and scores are made by tapping on the screen. Notes can also be written with a special pen, are automatically translated into print text and can be incorporated into the summary reports. The software also has a module on interrater reliability which can be used to compare scores, reach consensus and determine reliability. Using the Web-based system, individual assessments can be automatically routed to a supervisor for quality assurance and feedback, and aggregate data analysis and organization- and program-level reporting can be provided.

Important information for updating the SACERS is available at www.fpg.unc.edu/~ecers, including additional Notes for Clarification and an expanded score sheet. Also, a revision of the SACERS is forthcoming, as is a Youth rating scale for programs serving middle and high-school age youth.

In the Field
The state of Tennessee passed legislation in 2001 requiring all licensed child care centers and family/group homes in the state to be assessed using the Environment Rating Scales (including SACERS). The resulting Child Care Evaluation and Report Card program has two parts, one mandatory and one voluntary, both of which are structured around the Environment Rating Scales to assess the quality of care provided at specific facilities. In the mandatory part of the program, the ERS assessment is one of several components of an overall "report card" given to each provider that must be posted along with their annual license.

The voluntary part of the program ties the ERS-based assessment to reimbursements. In the Star-Quality Child Care program, overall assessment scores for participating providers are converted into one, two or three stars, which in turn can increase the provider's state reimbursement by 5, 10 or 15 percent, respectively. To support participation in both the mandatory and voluntary programs, local Technical Assistance Units provide assistance, at no charge, to any provider that wants information on how to improve quality and thereby increase its assessment score.

The Tennessee Department of Human Services (TDHS) works with the University of Tennessee and several other organizations to implement and manage this program. TDHS and UT's Social Work Office of Research and Public Service manage the program, and Tennessee State University prepares and delivers the initial training for assessors. Eleven resource centers around the state house an Assessment and Technical Assistance Unit. These units, which are responsible for conducting all the ERS-based assessments, hire and employ about 60 assessors statewide. Assessors receive ongoing training and frequent reliability checks by assessment specialists at UT.

The assessment process takes place in conjunction with license renewal. A database has been developed that provides access to regularly updated statistical and demographic information about the program's success in promoting, supporting and increasing quality child care across the state.

The SACERS and other scales in this series are part of many other state quality rating systems, including those in North Carolina, Mississippi, Arkansas and Pennsylvania.

For More Information
Additional information about the SACERS, supplementary materials and ordering information is available online at: www.fpg.unc.edu/~ecers/

Contact
Thelma Harms, Director of Curriculum Development
Frank Porter Graham Child Development Institute
517 S. Greensboro Street
Carrboro
[email protected]


Purpose and History
The Youth Program Quality Assessment (YPQA) is an instrument designed to evaluate the quality of youth-serving programs. While its practical uses include both program assessment and program improvement, its overall purpose is to encourage individuals, programs and systems to focus on the quality of the experiences young people have in programs and the corresponding training needs of staff.

While some quality assessment tools and processes focus on the whole organization, the YPQA is primarily focused on what the developers refer to as the "point of service" – the delivery of key developmental experiences and young people's access to those experiences. Although some structural and organizational management issues are included in the instrument, it focuses primarily on those features of programs that can be observed, that staff have control over and that staff can be empowered to change. While these social processes have not always been emphasized in licensing and regulatory processes, research suggests they are critical in influencing program quality and outcomes for youth. Given this focus, the YPQA is expected to assess program quality most accurately when users observe program offerings (programmatic experiences consisting of the same staff, children and learning purpose across multiple sessions).

The YPQA has its roots in a long lineage of quality measurement rubrics developed by the High/Scope Educational Research Foundation over the past several decades for preschool, elementary and now youth programs. In its initial iteration, the instrument was developed specifically to assess implementation of the High/Scope participatory learning approach. In its current form, the tool is relevant for a wide range of community- and school-based youth-serving settings that serve grades 4–12. It has been used in a range of after-school, camp, youth development, prevention and juvenile justice programs. It is not necessarily appropriate for use in highly unstructured settings that lack facilitated activities.

Content
The YPQA measures factors at the Program Offering level and the Organizational level that affect quality at the "point of service." The seven major domains (called sub-scales in the tool) that are covered include Engagement, Interaction, Supportive Environment, Safe Environment, Youth-centered Policies and Practices, High Expectations and Access.

Because of the focus on the "point of service," the YPQA emphasizes social processes – or interactions between people within the program. The majority of items are aimed at helping users observe and assess interactions between and among youth and adults, the extent to which young people are engaged in the program and the nature of that engagement. However, the YPQA also addresses program resources (human, material) and the organization or arrangement of those resources within the program.

The content of the YPQA aligns well with the National Research Council's features of positive developmental settings (2002), with the least emphasis on what is referred to by the NRC as "integration of family, school and community efforts." The content of the YPQA has also been reviewed against, and appears compatible with, Jim Connell and Michelle Gambone's youth development framework (2002).

Structure and Methodology
The seven topics or domains covered by the YPQA are measured by two different overall scales (groups of related items) that require different data collection methods. The program offering items are included in Form A and are assessed through observation. Form B includes the organization-level items, which essentially assess the quality of organizational support for the program offering-level items that are the focus of Form A. Evidence for Form B is gathered through a combination of guided interview and survey methods.

The seven domains can be graphically represented by the "pyramid of program quality" (see below), which represents both an empirical reality and a unified framework for understanding and improving quality. From an empirical perspective, assessments using the YPQA thus far follow a distinct pattern: most programs score highest in safety and then progressively lower as you move up the levels of the pyramid through support, interaction and engagement. Programs that score high in engagement and interaction appear most able to influence positive youth outcomes (see technical properties for more detail on the validity study).

Developed by the David P. Weikart Center for Youth Program Quality[16]

[16] The Weikart Center is a joint venture between the High/Scope Educational Research Foundation and the Forum for Youth Investment.

The scale used throughout the YPQA is intended to capture whether none of something (1), some of something (3) or all of something (5) exists. For each indicator, very concrete descriptors are provided to illustrate what a score of 1, 3 or 5 looks like (see example on next page). The scoring for Forms A and B is consistent, but in the case of Form B, evidence to drive the scoring is based on an interview as opposed to observations. Observers are encouraged to write down evidence or examples that support the score that has been applied.

[Figure: the pyramid of program quality. The observable domains, from the base of the pyramid up, with their components:]
• Safe Environment: Psychological & Emotional Safety; Physically Safe Environment; Emergency Procedures & Supplies; Program Space & Furniture; Healthy Food & Drinks
• Supportive Environment: Welcoming Atmosphere; Appropriate Session Flow; Active Engagement; Skill Building; Encouragement; Reframing Conflict
• Interaction: Experience a Sense of Belonging; Be in Small Groups; Partner With Adults; Lead & Mentor
• Engagement: Set Goals & Make Plans; Make Choices; Reflect

[Alongside the pyramid, the organizational domains:]
• High Expectations: Staff Development; Supportive Social Norms; High Expectations for Young People; Committed to Program Improvement
• Youth Centered Policies & Practices: Staff Qualifications Support Positive Youth Development; Tap Youth Interests & Build Skills; Youth Influence Settings & Activities; Youth Influence Structure & Policy
• Access: Staff Availability & Longevity; Program Schedules; Barriers Addressed; Families, Other Organizations, Schools

Technical Properties

Extensive psychometric evidence about the YPQA is primarily available from three studies. The first, referred to as the Validation Study, examined the reliability and validity of the instrument's scales with a sample of 59 organizations, most of which were after-school programs (Smith & Hohmann, 2005). The findings suggest the instrument has many good psychometric properties. Three of the seven scales, however, did not perform well in one or more psychometric areas.

The second study, referred to as the Self-Assessment Pilot Study, included a sample of 24 sites and specifically examined the YPQA's use as a self-assessment tool for after-school programs (Smith, 2005). This is the only study mentioned in this report that asked programs to assess themselves rather than relying on independent researchers to collect data. This study examined the concurrent validity of the YPQA and found preliminary support for the total score and several scales. Similar to the first study, some scales exhibited problems with internal consistency.

The third study, referred to as the Palm Beach Quality Improvement System (QIS) Pilot Study, used a modified form of the YPQA known as the PBQ-PQA to assess program quality in 38 sites. The PBQ-PQA had similar, but not identical, scales compared to the YPQA (Smith, Akiva, Blazevski & Pelle, 2008).

In addition to these three studies, the developers also conducted additional interrater reliability analyses for the program offerings section of the instrument. They have also begun using techniques that provide more refined and detailed analyses of reliability and validity than traditional methods (see pages 16-17). In related work, CYPQ has just finished a validity study on a younger youth version of the PQA (grades K–4).

Score Distributions

Score distributions help users determine whether items adequately distinguish between programs on specific

II. Supportive Environment
II-I. Staff Support Youth in Building New Skills

Indicators (supporting evidence/anecdotes are recorded beside each rating):

Score 1: Youth are not encouraged to try out new skills or attempt higher levels of performance.
Score 3: Some youth are encouraged to try out new skills or attempt higher levels of performance but others are not.
Score 5: All youth are encouraged to try out new skills or attempt higher levels of performance.
(n/o = 1)

Score 1: Some youth who try out new skills with imperfect results, errors or failure are informed of their errors (e.g., "That's wrong.") and/or are corrected, criticized, made fun of, or punished by staff without explanation.
Score 3: Some youth who try out new skills receive support from staff who problem-solve with youth despite imperfect results, errors, or failure, and/or some youth are corrected with an explanation.
Score 5: All youth who try out new skills receive support from staff despite imperfect results, errors, or failure; staff allow youth to learn from and correct mistakes and encourage youth to keep trying to improve their skills.
(n/o = 1)


dimensions. Smith and Hohmann (2005) examined average scores and spread for each of the scales and total scores for the Program Offerings and Organization items and found that all of the scales and the total score had good distributions except for Safe Environment and Access (which each had means of 4.4 out of a possible 5.0). Most programs scored very high on these scales, making it hard to capture reliable differences. For Safe Environment, it may be realistic to assume that nearly all programs are relatively safe, particularly since the scores from this scale were validated by findings from a youth survey (see section on concurrent validity). However, additional evidence is needed to determine whether nearly all programs are high on Access, or whether there are meaningful differences that are not being picked up because the items are "too easy." In the latter case, the items could be revised to better capture differences between programs.

Interrater Reliability

Recent analyses suggest that the current version of the tool paired with improved training techniques produces moderate to high levels of interrater reliability. For the Program Offering items, High/Scope researchers have captured four paired-rater datasets over the past two years, for a total of 32 rater pairs, using live and video methods for testing agreement. One of these datasets was produced independently by the Children's Institute at the University of Rochester. All raters used the current version of the YPQA. Researchers found that across the rater pairs there was an average of 78 percent perfect agreement at the indicator level, which translates to an average maximum kappa coefficient of .66, close to the .70 benchmark for high interrater reliability. Similarly, the average item-level maximum kappa for the Program Offering items was also high at 0.72.

Findings suggest that the current version of the tool paired with rater training produces acceptable levels of interrater reliability for three of the four scales in the Program Offerings section. Specifically, the Safety, Support, and Engagement scales had acceptable reliabilities ranging between 0.66 and 0.73. The Interaction scale had moderate reliability (0.54).

Information for the Organization items (scales five through eight) comes from an earlier validation study by Smith and Hohmann (2005). The authors compared pairs of raters who examined the same programs at the same points in time. They examined the percentage of agreement across these items and found that the highest possible kappa was 0.68, very close to the .70 benchmark for high reliability.

Smith and Hohmann (2005) also examined interrater reliabilities of the three Organization scales, which is important because users will ultimately draw most of their conclusions from the scales, not the individual items. They examined agreement using a statistic known as the intraclass correlation coefficient (ICC), which examines the degree to which differences among all ratings have to do with differences between raters or differences among the programs themselves. The Youth Centered Policies and Practices, High Expectations and Access scales all had high interrater reliability (ICC = .51, .90 and .73, respectively).
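The ICC logic can be made concrete with a minimal sketch. The rater-pair data below are invented, and this is the simple one-way form of the statistic; the study itself may have used a different ICC variant:

```python
from statistics import fmean

def icc_oneway(ratings_by_program):
    """One-way intraclass correlation for single ratings: the share of total
    score variance attributable to real differences between programs rather
    than to disagreement among the raters scoring each program."""
    n = len(ratings_by_program)      # number of programs
    k = len(ratings_by_program[0])   # raters per program
    grand = fmean(x for row in ratings_by_program for x in row)
    # One-way ANOVA mean squares: between-program and within-program.
    ms_between = k * sum((fmean(row) - grand) ** 2
                         for row in ratings_by_program) / (n - 1)
    ms_within = sum((x - fmean(row)) ** 2
                    for row in ratings_by_program for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Invented data: five programs, each scored by two raters on a 1-5 scale.
# The raters track each other closely, so most variance is between programs.
pairs = [[2, 3], [3, 3], [4, 5], [1, 2], [5, 5]]
print(round(icc_oneway(pairs), 2))  # 0.87
```

An ICC near 1 means the ratings mostly reflect program differences; an ICC near 0 means they mostly reflect rater disagreement.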

Internal Consistency

Internal consistency indicates how closely related scores are for theoretically similar items. The Validation Study found that most of the YPQA scales exhibited acceptable internal consistency except for Safe Environment and Access. As noted above, this may have to do with the distributions of scores. Two items from an internally consistent scale go together, so that when item A is rated as high, item B is rated as high, and when A is low, B is also low. However, if A is always high (because all programs do well on it), whether or not B is high, internal consistency will be low.
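The item A/item B logic can be sketched with Cronbach's alpha, the most common internal consistency statistic. The scores below are invented, and the report does not specify which statistic the developers computed:

```python
from statistics import pvariance

def cronbach_alpha(item_columns):
    """Cronbach's alpha for a scale; item_columns holds one score list per item."""
    k = len(item_columns)
    totals = [sum(scores) for scores in zip(*item_columns)]
    item_var = sum(pvariance(col) for col in item_columns)
    return (k / (k - 1)) * (1 - item_var / pvariance(totals))

# Invented scores for six programs on a two-item scale. When items A and B
# rise and fall together, internal consistency is high ...
item_a = [1, 2, 3, 3, 4, 5]
item_b = [1, 3, 3, 4, 4, 5]
print(round(cronbach_alpha([item_a, item_b]), 2))  # 0.96
# ... but if item A is uniformly high (every program does well on it), it
# cannot co-vary with item B, and alpha collapses, as the text describes.
item_a_high = [5, 5, 5, 5, 5, 5]
print(round(cronbach_alpha([item_a_high, item_b]), 2))  # 0.0
```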

One example of an item on which most organizations received the highest possible score in the Self-Assessment Pilot Study was, "The physical environment is safe and healthy for youth." If items such as this one are always high, we may not need to keep measuring them. However, if researchers believe that there is meaningful variation among programs, then these scales may need additional revision before we can be confident that their scores reliably measure the concepts that


they are supposed to measure. Similarly, Smith (2005) found in the Self-Assessment Pilot Study that these two scales had low internal consistency, but that study also showed low internal consistency for two other scales: Youth Centered Policies and Practice and High Expectations for All Students and Staff. A possible explanation is that staff participating in the Self-Assessment Pilot Study were only given one day of training, whereas trained raters in the Validation Study may have been given more.

One additional explanation for why internal consistency may have been lower on some scales could be that the concepts forming these scales are formative rather than reflective. As explained in the section on Additional Technical Considerations (pages 16-17), internal consistency tests are only appropriate when items are reflective, meaning that they all reflect the same underlying concept. Such items are closely related to one another and each represents a unique "attempt" to measure the concept of interest. However, internal consistency should not be used when items are formative, meaning that different components together make up or form a coherent set. For example, the Safe Environment scale may be more formative than reflective. A program that provides healthy food and drinks (as assessed by one item) may not necessarily have appropriate emergency procedures and supplies present (another item on the scale). However, even though these two items tap different underlying concepts (nutrition, safety in emergencies) and may not be closely related, their combination provides an important index of how a program promotes safety and health.

YPQA developers have begun examining whether some YPQA scales are formative versus reflective, and they are currently exploring whether certain items can be combined to form new, reflective scales.

Test-Retest Reliability

The Validation Study examined how much scores changed on multiple ratings over a period of three months. Correlations between assessments ranged from 0.81 to 0.98, indicating that ratings do not fluctuate widely over short periods of time. Long-term stability was not assessed, so we cannot offer any evidence on whether the YPQA is sensitive enough to detect long-term change.
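As a sketch of what such test-retest correlations capture, here is a Pearson correlation computed over invented two-wave scores:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented total quality scores for eight programs rated twice, three months
# apart. A correlation in the range the Validation Study reports (0.81-0.98)
# means programs largely keep their rank order from one wave to the next.
wave1 = [2.1, 3.4, 4.0, 2.8, 4.6, 3.1, 3.9, 2.5]
wave2 = [2.3, 3.2, 4.2, 2.9, 4.4, 3.3, 3.7, 2.7]
print(round(pearson(wave1, wave2), 2))  # 0.98
```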

Validity of Scale Structure

Each of the scales in the YPQA is supposed to measure a separate concept. A factor analysis examines which items are similar to each other and which are different. Smith and Hohmann (2005) conducted a factor analysis at both observation periods and found preliminary evidence that the Program Offering items (scales two through four) grouped together in ways similar to the scales. Safe Environment was not included in the factor analysis, and the authors acknowledge that the factor analysis did not support their expectations until they removed these items. Without the Safe Environment items, findings indicated that Supportive Environment and Opportunities for Interaction overlap and may not be entirely distinct. Validity support was strong for the Organization items (scales five through seven), which generally grouped together according to the theorized structure of the scales.

Convergent Validity

One way to examine whether an instrument actually measures aspects of program quality is to compare its scores to measures of identical or highly similar concepts. The Validation Study tested convergent validity by comparing all YPQA scales except Access and High Expectations to similar scales on a separate youth survey. For example, the Supportive Environment scale was compared to a Belonging scale on the youth survey. Correlational evidence indicates that the YPQA is moderately to strongly related to findings from the youth survey. The YPQA total scores for the observation and interview scales were also related to the youth survey total score. These results are encouraging for establishing validity.

In the Palm Beach Quality Improvement System Pilot Study, researchers examined the relationship between youth perceptions of program quality and modified versions of the four YPQA Form A domain scales. Scales on this form were similar to the original scales, but


not identical. The authors found that youth perceptions of quality were related to the Interaction scale, but were unrelated to the Safe Environment, Supportive Environment and Engagement scales. Beyond being mixed, this validity evidence may not apply to the current YPQA scales, since the YPQA and the instrument used in this study are not completely identical.

Concurrent Validity

Concurrent validity is established when an instrument's items and scales are related to distinct but theoretically important concepts that are measured in the same time period. The Validation Study measured the validity of the total program quality score (created by averaging the various scale scores) by examining its relationship to expert ratings of the programs that were being evaluated. Specifically, experts rated programs based on youth centeredness and availability of resources. It is reasonable to expect that if the YPQA is indeed measuring program quality, then the total score would be related to these two expert-rated concepts. Using Pearson correlations as a measure of relatedness, Smith and Hohmann (2005) found strong evidence that the YPQA total score is related to expert ratings for these two domains, lending additional support that the instrument is indeed measuring program quality. They also tested the validity of the global program quality scores by comparing programs with trained staff to programs without trained staff. As expected, the programs with trained staff had higher global quality scores than those without, again lending support that the instrument can validly measure overall program quality.

The Validation Study also examined how well the instrument was associated with student experiences assessed by a separate youth survey (Smith & Hohmann, 2005). The following relationships were examined between YPQA and youth survey scales: (1) YPQA total score with the youth survey measure of overall program experiences, (2) YPQA Engaged Learning with measures of giving back to the community, youth growth, interest in the program, and challenging experiences, and (3) YPQA Interaction Opportunities with a measure of decision making in the program.

The authors found strong evidence for concurrent validity in that all of their hypothesized relationships were supported except for two (the Engagement scale was not related to youths' interest in the program or challenging experiences). However, this evidence is limited in that theoretically important relationships involving Form A's Safe Environment and Supportive Environment scales and the three Form B scales were not examined.

The Self-Assessment Pilot Study examined concurrent validity by correlating findings from the Supportive Environment and Engagement scales and the Program Offerings total score with a youth survey measure of staff support. Findings indicated a strong relationship between Supportive Environment and the youth survey. The Engagement scale was related in expected ways to a measure of program governance on the youth survey, and the Program Offerings total score was related in expected ways to academic support and peer relations. None of these relationships were statistically significant, perhaps because the sample size was so small (12 programs). Thus, these relationships should be considered promising but not definitive.

The Palm Beach Quality Improvement System Pilot Study evaluated the concurrent validity of modified versions of the four Form A scales by examining their relationships with two scales from a youth survey: positive affect and challenging experiences. Higher positive affect scores were related to higher scores on YPQA Interaction Opportunities, but positive affect was unrelated to the other three scales. Higher scores on challenging experiences were related to higher scores on YPQA Engaged Learning, but challenging experiences were unrelated to the other three scales. In addition to the results being mixed, this validity evidence may not apply to the YPQA scales, since the scales used in this study were not identical to the YPQA's.

The concurrent validity evidence is promising but limited at this point. Additional support is needed for several of the instrument's scales.


Variations in Quality Across Different Contexts

Program quality may vary across different contexts, such as different offerings and how many sessions children and youth have had with one another. It is important to know if an instrument is sensitive to these types of differences because, if so, users will need to conduct observations across a range of contexts. For example, if quality scores vary across different types of activities within a program, then users will need to observe a wide range of activities to obtain a complete picture of quality.

Developers have begun examining how the YPQA performs across three important program components: individual offerings, the content of these offerings, and how many sessions the children and staff have had together. Developers also examined variation across two combinations of these components. For example, does quality for some offerings stay constant throughout the year whereas quality for other offerings improves from the beginning to the end of the year? Are quality scores relatively similar in certain content areas regardless of which agencies are being observed, whereas quality scores in other content areas vary across agencies? Currently, evidence is only available for the Interactions scale. Findings indicated that in addition to detecting quality differences across agencies, the Interactions scale was sensitive to differences across various types of offerings and content. However, measured quality did not vary by the number of sessions children and staff had together or across different combinations of programmatic components. These findings suggest that users of the YPQA should conduct observations across different types of offerings and content areas to obtain an accurate Interactions score.

In addition, even though agreement among raters was acceptable in other studies (as indicated in the section on interrater reliability), the developers found reliable differences among ratings given by different raters. This suggests that even good reliability among raters does not mean that differences between raters can be ignored – a finding that probably extends to all the instruments in this compendium.

No evidence on how quality varies across contexts is currently available for the other YPQA scales, and it is also possible that the instrument is sensitive to other differences besides the ones already examined (e.g., time of day).

User Considerations

Ease of Use

The YPQA was developed with and for both practitioners and researchers; as a result, the language is accessible and the format and scoring process are user-friendly. The administration manual and the introductions to Forms A and B offer users a summary of the purpose and benefits of the tool, definitions of key terms used (e.g., scale, sub-scale, offering, item) and clear steps that walk users through the observation and scoring process. While training is recommended, the manuals themselves are self-explanatory. A "starter pack" that includes an administration manual, Form A and Form B can be ordered online for $39.95.

Users of the YPQA are encouraged to conduct a running record of what occurs during a relatively extensive program observation, as opposed to capturing several short snapshots of programming, because developers believe activities have a certain flow that is important to try to observe. This is particularly important if the goal is to come up with a reliable and valid score for an individual program as opposed to aggregating a large sample of observations for research purposes. Developers estimate that generating a score for a program, based on both Forms A and B, takes a minimum of approximately six hours for a single staff person. Roughly four of those hours are typically spent observing and interviewing within the program and another two hours writing up and scoring the instrument.

Available Supports

In addition to an online training, the Weikart Center offers YPQA training periodically around the country (which will soon be available online). The one-day workshop, YPQA Basics, introduces the observation and evidence gathering method, familiarizes participants with each item and indicator and prepares staff to


conduct the program self-assessment method of evidence gathering and scoring. The two-day YPQA Intermediate workshop covers all the material from the one-day workshop, gives participants substantial practice scoring the tool using written scenarios and video, brings participants to acceptable levels of interrater reliability and prepares staff to conduct the external assessment method of evidence gathering and scoring. The three-day workshop covers all the material from these two trainings and includes a site visit (during which the participants score a youth program) and an analysis of the scoring efforts.

In the past year, the Weikart Center has developed a set of management-focused trainings to assist site managers in leading their programs through a data-driven quality improvement process.

The Weikart Center also offers 12 youth development trainings that are aligned with the content of the YPQA. Following a self-assessment or evaluation process, for example, program directors can assemble a tailored staff training experience based on the specific areas within the YPQA where the assessment showed work was needed.

An electronic "scores reporter" is currently available from the Weikart Center (and is free to those who purchase the instrument). A more sophisticated Web-based data management system is currently under development. It will allow individual programs or networks to join, go online to enter and analyze data, and see their results at various levels of aggregation.

In the Field

The Rhode Island State 21st CCLC program has partnered with the Center for Youth Program Quality in a multi-year quality assessment process using a customized tool based on the research-validated PQA. The Rhode Island Program Quality Assessment (RIPQA), through a joint partnership of the Rhode Island AfterSchool Plus Alliance, the Providence After School Alliance and the Rhode Island Department of Education, is currently used by after-school programs across the city of Providence and throughout the state, including

all 21st CCLC funded programs. Participating programs conduct an annual self-assessment using the RIPQA. To support their efforts, a Weikart Center-trained Quality Advisor jointly observes program offerings with site staff and then works one-on-one with agencies to develop quality improvement plans based on those observations.

As an additional component of this effort, the Weikart Center has also conducted a randomized field trial to test out its full training model. Based on 100 interviews with site supervisors, researchers have found that engaging providers in the observation and reflection process has been well-received across the board. The quality advisor and site-based technical support have been a very important part of the process, especially for those providers with limited capacity. Aggregated system-wide quality data are used to design and coordinate system-wide professional development offerings around the needs surfaced through assessment.

According to Elizabeth Devaney, Director of Quality Initiatives at the Providence After School Alliance, the quality improvement effort has "strengthened our position and ability to attract public and private resources to grow the system, and is an important strategy for sustainability going forward."

For More Information

Information about the YPQA, including ordering information, is available online at: www.highscope.org/content.asp?contentid=117

Contact: Charles Smith, Director, David P. Weikart Center for Youth Program Quality, Centennial Plaza Building, Suite 601, 124 Pearl Street, Ypsilanti, MI 48197; [email protected]


References

Arbreton, A., Goldsmith, J. & Sheldon, J. (2005). Launching literacy in after-school programs: Early lessons from the CORAL initiative. Philadelphia, PA: Public/Private Ventures.

Arbreton, A., Sheldon, J., Bradshaw, M., & Goldsmith, J. with Jucovy, L. & Pepper, S. (2008). Advancing achievement: Findings from an independent evaluation of a major after-school initiative. Philadelphia, PA: Public/Private Ventures.

Birmingham, J., Pechman, E., Russell, C., & Mielke, M. (2005). Shared features of high-performing after-school programs: A follow-up to the TASC evaluation. Washington, DC: Policy Studies Associates, Inc.

Connell, J., & Gambone, M. (2002). Youth development in community settings: A community action framework. Philadelphia, PA: Youth Development Strategies Inc.

Durlak, J. & Weissberg, R. (2007). The impact of after-school programs that promote personal and social skills. Chicago, IL: Collaborative for Academic, Social, and Emotional Learning.

Harms, T., Jacobs, E., & White, D. (1996). School-age care environment scale. New York, NY: Teachers College Press.

Intercultural Center for Research in Education, & National Institute on Out-of-School Time (2005). Pathways to success for youth: What works in afterschool: A report of the Massachusetts Afterschool Research Study (MARS). Boston, MA: United Way of Massachusetts Bay.

Kim, J., Miller, T., Reisner, E. & Walking Eagle, K. (2005). Evaluation of New Jersey After 3: First-year report on programs and participants. Washington, DC: Policy Studies Associates, Inc.

Knowlton, J., & Cryer, D. (1994). Field test of the ASQ program observation for reliability and validity. Chapel Hill, NC: Authors.

MacKenzie, S., Podsakoff, P., & Jarvis, C. (2005). The problem of measurement model misspecification in behavioral and organizational research and some recommended solutions. Journal of Applied Psychology, 90(4), 710-730.

Martinez, A., & Raudenbush, S. W. (2008). Measuring and improving program quality: Reliability and statistical power. In M. Shinn & H. Yoshikawa (Eds.), Toward positive youth development: Transforming schools and community programs (pp. 333-349). New York, NY: Oxford University Press, Inc.

National Research Council and Institute of Medicine. (2002). Community programs to promote youth development. Eccles, J. and Gootman, J., eds. Washington, DC: National Academy Press.

Pechman, E., Mielke, M., Russell, C., White, R. & Cooc, N. (2008). Out-of-school time observation instrument: Report of the validation study. Washington, DC: Policy Studies Associates, Inc.

Pierce, K. M., Bolt, D. M., & Vandell, D. L. (in press). Specific features of after-school program quality: Associations with children's functioning in middle childhood. American Journal of Community Psychology.

Pierce, K., Hamm, J., Sisco, C., & Gmeinder, K. (1995). A comparison of formal after-school program types. Poster session presented at the biennial meeting of the Society for Research in Child Development, Indianapolis, IN.

Pierce, K., Hamm, J., & Vandell, D. (1999). Experiences in after-school programs and children's adjustment in first-grade classrooms. Child Development, 70(3), 756-767.

Raudenbush, S., Martinez, A., Bloom, H., Zhu, P., & Lin, F. (2008). An eight-step paradigm for studying the reliability of group-level measures. Chicago: University of Chicago.

Russell, C., Mielke, M., & Reisner, E. (2008). Evaluation of the New York City Department of Youth and


Community Development: Out-of-school time programs for Youth Initiative: Results of efforts to increase program quality and scale in year 2. Washington, DC: Policy Studies Associates, Inc.

Russell, C., Reisner, E., Pearson, L., Afolabi, K., Miller, T., & Mielke, M. (2006). Evaluation of the Out-of-School Time Initiative: Report on the first year. Washington, DC: Policy Studies Associates.

Seidman, E., Tseng, V., & Weisner, T. (2006, February). Social setting theory and measurement. In William T. Grant Foundation Report and Resource Guide 2005-2006. New York, NY: William T. Grant Foundation.

Smith, C. (2005). Measuring quality in Michigan's 21st Century afterschool programs: The Youth PQA self-assessment pilot study. Ypsilanti, MI: High/Scope Educational Research Foundation.

Smith, C., Akiva, T., Blazevski, J., & Pelle, L. (2008, January). Final report on the Palm Beach Quality Improvement System pilot: Model implementation and program quality improvement in 38 after-school programs. Ypsilanti, MI: High/Scope Educational Research Foundation.

Smith, C., & Hohmann, C. (2005). Youth program quality assessment youth validation study: Findings for instrument validation. Ypsilanti, MI: High/Scope Educational Research Foundation.

Spielberger, J. & Lockaby, T. (2008). Palm Beach County's Prime Time initiative: Improving the quality of after-school programs. Chicago: Chapin Hall Center for Children at the University of Chicago.

Vandell, D. L., Reisner, E. R., & Pierce, K. M. (2007). Outcomes linked to high-quality afterschool programs: Longitudinal findings from the study of promising afterschool programs. Unpublished manuscript. Policy Studies Associates, Inc.

Vandell, D., Pierce, K., Brown, B., Lee, D., Bolt, D., Dadisman, K., Pechman, E., & Reisner, E. (2006). Developmental outcomes associated with the after-school contexts of low-income children and youth. Unpublished manuscript.

Vandell, D., & Pierce, K. (2006). Study of after-school care: Program quality observation. Retrieved online at www.wcer.wisc.edu/childcare/pdf/asc/program_quality_observation_manual.pdf.

Vandell, D., & Pierce, K. (2001, April). Experiences in after-school programs and child well-being. In J. L. Mahoney (Chair), Protective aspects of after-school activities: Processes and mechanisms. Paper symposium conducted at the biennial meeting of the Society for Research in Child Development, Minneapolis, MN.

Vandell, D., Reisner, E., Pierce, K., Brown, B., Lee, D., Bolt, D., & Pechman, E. (2006). The study of promising after-school programs: Examination of longer term outcomes after two years of program experiences. Wisconsin Center for Education Research, University of Wisconsin-Madison.

Vandell, D. & Pierce, K. (1998). Measures used in the study of after-school care: Psychometric properties and validity information. Unpublished manual, University of Wisconsin-Madison.

Walking Eagle, K., Miller, T., Reisner, E., LeFleur, J., Mielke, M., Edwards, S., & Farber, M. (2008). Increasing opportunities for academic and social development in 2006-07: Evaluation of New Jersey After 3. Washington, DC: Policy Studies Associates, Inc.

Westmoreland, H. & Little, P. (2006). Exploring quality standards for middle school afterschool programs: What we know and what we need to know: A summit report. Cambridge, MA: Harvard Family Research Project. Retrieved online at www.gse.harvard.edu/hfrp/content/projects/afterschool/conference/summit-2005-summary.pdf.


Psychometrics: What are they and why are they useful?

The youth program Janice works for is interested in self-assessment and is looking for a tool that measures the overall quality of the program. After looking over several options, she settles on an instrument that seems easy to use, with questions that seem relevant to the organization's goals. Unfortunately, she encounters a number of problems once she starts using the instrument. First, the observers interpret questions very differently, leading to disputes over their assessments of quality. Second, the individual item scores don't seem to form a coherent picture of the program. Third, the findings are unrelated to youth outcomes that should be directly related to program quality. All of these issues make Janice question whether the instrument measures program quality as well as it should.

The instrument Janice chose looked useful on the surface, but its field performance was not particularly helpful. Psychometric information might have helped Janice understand the strengths and weaknesses of the instrument before she used it. Psychometrics are statistics that help researchers evaluate instruments' field performance. Psychometric information can be divided into several categories.

Reliability

An instrument's ability to generate consistent answers or responses. The most common analogy used to understand reliability is a game of darts. If a player's darts consistently land on the same location on the board, we would say that the dart player has excellent reliability (whether or not that place is the center of the board). The same is true for research instruments that yield predictable and consistent information. There are various types of reliability, discussed below.

Interrater Reliability

The extent to which trained raters agree when evaluating the same program at the same time.

For accurate program assessments, users should choose instruments that yield reliable information regardless of the whims or personalities of individual raters. When findings depend largely on who is rating the program (e.g., if Rater A is more likely to give favorable scores than Rater B), it is hard to get a sense of the program's actual strengths and weaknesses. For this reason, organizations should consider the interrater reliability of various measures even if only one rater will be rating the program. Poor interrater reliability often stems from ambiguous questions that leave a lot of room for individual interpretation, and such ambiguity is not always immediately apparent from looking at the instrument.

Several methods exist to measure interrater reliability. Many of the instruments in this report give the percentage of the time that raters agree for a given item (allowing a one-point difference to count as agreement). While this method is common, it is not as useful as other statistics. When available, we instead report two other statistics, known as kappa and the intraclass correlation. Values of kappa near or above .70 indicate high reliability, and this value is often considered the benchmark for a strong, reliable instrument. Other researchers state that kappa values starting at .60 indicate substantial/strong agreement, whereas values ranging from .40 to .59 indicate moderate agreement. Similar guidelines do not yet exist for the intraclass correlation, but this report considers values close to or above .50 to indicate high reliability.

The reason that percentage agreement does not sufficiently represent reliability is that it does not account for those instances where raters agree simply by chance, whereas kappa scores and intraclass correlations do. In many cases, what looks like high interrater agreement may actually have a low kappa score or intraclass correlation coefficient. When kappa scores or intraclass correlations are not available for an instrument, we provide an estimate of kappa. Readers should know that the estimate is the best possible score based on the available information, and it is possible the actual kappa score is much lower (indicating worse reliability).
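A small sketch makes this concrete. With invented ratings that cluster at the top of a 1/3/5 scale, raw agreement looks high while Cohen's kappa, which subtracts the agreement expected by chance, is only moderate:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement: the probability that two independent raters with
    # these marginal rating frequencies would happen to match.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented ratings for ten indicators on a 1/3/5 scale. Because scores
# cluster at 5, much of the raw agreement could occur by chance.
a = [5, 5, 5, 5, 5, 5, 5, 3, 5, 1]
b = [5, 5, 5, 5, 5, 5, 5, 5, 3, 1]
print(sum(x == y for x, y in zip(a, b)) / len(a))  # raw agreement: 0.8
print(round(cohens_kappa(a, b), 2))                # kappa: 0.41
```

Here 80 percent agreement yields a kappa in the "moderate" band, which is exactly why this report prefers chance-corrected statistics.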

It is important to note that interrater reliability statistics assume that all raters have been adequately trained on the instrument. Some instruments' developers offer training for raters. If you cannot receive formal training on an instrument, it is still important to train raters yourself before conducting an evaluation. Organizations can hold meetings to review each question individually and discuss what criteria are necessary to assign a score of 1, 2 or 3, etc. If possible, raters should go through "test evaluations" to practice using the instrument with scenarios that could occur in the program (ideally through videos, but such scenarios could also be written if detailed enough). When disagreement occurs on individual questions, raters should discuss why they chose to rate the program the way they did and come to a consensus. Practice evaluations will help raters get "on the same page" and have a mutual understanding of what to look for.

Test-Retest Reliability
The stability of an instrument's assessments of the same program over time. If several after-school programs are each assessed two times, one month apart, the respective scores at both assessments would differ very little if the instrument had strong test-retest reliability. The strength of an instrument's test-retest reliability depends on both the sensitivity of the instrument and how much the program changes over time. If instruments are too sensitive to subtle changes in a program, test-retest reliability will be low and scores may differ widely between assessments, even though the subtle changes driving this difference may hold little practical significance. At the other extreme, instruments with extremely high test-retest reliability may be insensitive to important long-term changes. As is the case with interrater reliability, several methods to measure test-retest reliability exist, including percentage agreement, kappa and intraclass correlations, with the latter two being preferred.
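As a rough sketch of how test-retest stability might be checked (all scores invented; real studies often use the intraclass correlations noted above), one can correlate the two rounds of scores:

```python
# Hypothetical sketch: five programs assessed twice, one month apart.
# A simple stability check is the correlation between the two sets of scores;
# high values indicate that programs kept roughly the same rank and level.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

month_1 = [3.2, 4.1, 2.5, 3.8, 2.9]  # invented scores on a 1-5 scale
month_2 = [3.0, 4.3, 2.6, 3.7, 3.1]
print(round(pearson_r(month_1, month_2), 2))  # a high value suggests stable scores
```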

Very few of the instruments in this report have undergone testing for this type of reliability. Because the time span between assessments has been relatively short for these instruments, test-retest reliability should be high.

Internal Consistency
The cohesiveness of the items forming the instrument's scales. An item is a specific question or rating, and a scale is a set of items within an instrument that jointly measure a particular concept. For example, an instrument might include 10 items that are supposed to measure the friendliness of program staff, and users would average or sum the 10 scores to get an overall "friendliness score." Because items forming a scale jointly measure the same concept, we can expect that the scores for each item will be related to all of the other items. For example, say that three of our "friendliness" items include: (1) How much does the staff member smile at children? (2) How much does the staff member compliment children? (3) How much does the staff member criticize children in a harsh manner? If the scale had high internal consistency, the scores for each question would "make sense" compared to the others (e.g., if the first question received a high score, we would expect that the second would also receive a high score and the third would receive a low score). In a scale with low internal consistency, the items' scores are unrelated to each other. Low internal consistency suggests the items may not fit together in a meaningful way, and therefore the overall score (e.g., average friendliness) may not be meaningful either.

The analogy of the dartboard is useful for understanding internal consistency. Think about the individual items as the darts: the aim of the thrower is meaningless if the darts land haphazardly across the board. In the same way, an overall score like average friendliness is meaningless if the items' scores do not relate to each other. The statistic that determines internal consistency is called Cronbach's alpha. For a scale to have acceptable internal consistency, its alpha should be near or over the conventional cutoff of 0.70. Whereas interrater and test-retest reliabilities are important information for all instruments, internal consistency is only relevant for instruments with scales.
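A minimal sketch of Cronbach's alpha, using invented scores for the three friendliness items above (assuming the harsh-criticism item has already been reverse-scored so that higher always means friendlier):

```python
# Cronbach's alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)).
# Scores are hypothetical 1-5 ratings of five programs on three friendliness items.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """items: one list per item, each holding that item's scores across programs."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))

items = [
    [1, 2, 3, 4, 5],  # smiles at children
    [1, 2, 3, 4, 5],  # compliments children
    [2, 2, 3, 4, 4],  # criticizes harshly (reverse-scored)
]
print(round(cronbach_alpha(items), 2))  # prints 0.97 -- above the 0.70 cutoff
```

Because the three items rise and fall together across programs, the total-score variance is large relative to the item variances, which is exactly what alpha rewards.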

The Weikart Center (YPQA developer), among others (MacKenzie, Podsakoff & Jarvis, 2005), has noted that internal consistency is only appropriate when the items are reflective of a larger concept rather than formative. For a more in-depth discussion of this requirement, readers should refer to the section on Additional Technical Considerations, found on pages 16-17 of this report.

Variation in Quality Across Different Contexts
Program quality may not be entirely uniform across different staff, different activities, or even different days of the week or months of the year. Even when two observers agree on the level of quality when both are observing precisely the same activity at the same time, they might come up with different ratings if they observe a different activity at a different time. Some instruments may also be particularly sensitive to some types of variation. As the Weikart Center and others have noted (Raudenbush, Martinez, Bloom, Zhu & Lin, 2008), evidence about the ways that scores on a particular instrument vary within a program is important so that users know how to account for this variation (e.g., if an instrument's scores depend on the activity, then it is important to assess a wide range of activities in the program). For a more in-depth discussion of these issues, readers should refer to the section Additional Technical Considerations, found on pages 16-17 of this report.

Validity [17]

An instrument's ability to measure what it is supposed to measure. If an instrument is supposed to measure program quality, then it would be valid if it yielded accurate information on this topic. Researchers have devised several different methods for establishing validity. The most common analogy used to understand validity is, again, the game of darts. While reliability is about the player consistently throwing darts to the same location, validity relates to whether or not the player is hitting the bull's eye. The bull's eye is the topic an instrument is supposed to measure. While reliability is essential, it is also important to know if an instrument is valid (dart players that consistently miss the board entirely may be reliable, hitting the same spot over and over, but they are sure to lose the game!).

Sometimes an instrument may look like it measures one concept when in fact it measures something rather different, or nothing at all. For example, an instrument might claim to measure after-school program quality, but it would not be particularly valid if it focused solely on whether children liked the program and were having fun.

Validity can be tricky to assess because the concepts of interest (e.g., program quality) are often not tangible or concrete. Unlike the case of reliability, there is no specific number that tells us about validity. Instead, researchers rely on several methods, each assessing a different type of relationship, that together give us confidence that the instrument is measuring what we think it measures. Next, we describe the different subtypes of validity.

Face Validity
Individuals' opinions of an instrument's quality. This is the weakest form of validity because it does not involve direct testing of the instrument and is based on appearance only. One example of face validity in a medical context concerns taking a temperature. Today we know to do this with a thermometer. But think back a couple of hundred years. At that time, feeling a patient's forehead would have seemed a much more valid measure of temperature than sticking a glass tube filled with mercury into the patient's mouth. How hot a forehead feels is a face-valid measure of temperature, but few people today consider this method alone to be adequate. Instead, doctors rely on thermometers because they have been scientifically proven to be more accurate. Similarly, researchers and practitioners should consider other forms of validity, when available, before choosing an instrument.

Convergent Validity
The extent to which an instrument compares favorably with another instrument (preferably one with demonstrated validity strengths) measuring identical or highly similar concepts. If two instruments are presumed to measure the same or similar concepts, we would expect programs that receive high scores on one measure to also receive high scores on the other. For example, imagine researchers have developed a new instrument (Instrument A) that is supposed to measure staff behavior management techniques in after-school programs. To determine its validity, researchers might compare Instrument A to Instrument B, which is already known to accurately measure staff's discipline strategies in after-school programs. Assuming that Instrument A is a valid measurement, we can expect that when Instrument B finds that programs rarely use appropriate discipline strategies, Instrument A will find that the same programs utilize poor behavior management techniques (and vice versa). If this were not the case, we would conclude that Instrument A probably does not adequately measure behavior management.

[17] Researchers often refer to the type of validity discussed in this report as Construct Validity, because it addresses whether an instrument adequately measures a specific concept or construct. Although other forms of validity exist, they are not addressed in this report.

Concurrent and Predictive Validity
The extent to which an instrument is related to distinct, theoretically important concepts and outcomes in expected ways. If an instrument measures the quality of homework assistance in after-school programs, then children who attend high-quality programs should have higher rates of homework completion (or perhaps grades) than children who attend low-quality programs (assuming there is no difference between the children before starting the programs). Usually, theory and prior research findings help researchers determine which outcomes are most appropriate to examine with each instrument. Validity evidence is strongest when differences in the outcomes are detected after the initial program observations have been conducted (known as Predictive Validity). For example, imagine that two after-school programs are designed to improve children's grades, and that children attending these programs had similar grades at the beginning of the school year. After conducting program observations, researchers determined that one program was of high quality and the other was of low quality. If children attending the high-quality program had higher grades at the end of the school year compared to the children attending the low-quality program, this makes us more confident that the instrument accurately detected quality differences between the two programs.

Sometimes observations and related concepts are measured in the same time period (known as Concurrent Validity), particularly when the related concepts are expected to change simultaneously. However, researchers generally prefer to see the hypothesized cause (program quality) come before the effect. When both are measured at the same time, it is more likely that there may be another explanation for the relationship.

Although similar in some ways, concurrent and predictive validity are separate from convergent validity. Whereas convergent validity compares two instruments that measure identical or highly similar concepts, concurrent and predictive validity refer to relationships between distinct concepts that we expect to be strong based on theory and prior research.

Validity of Scale Structure
The extent to which items statistically group together in expected ways to form scales. As already stated, scales are composed of several items that, when averaged or summed, create an overall score for a specific concept. Determining whether scales adequately measure the concepts they claim to measure can be difficult, though conducting a factor analysis is one helpful way to do so. Factor analysis verifies that items go together the way the developers thought they would by examining which items are similar to each other and which are different.

For example, imagine an instrument with two scales: Staff Communication Style and Staff Patience. Next, imagine that whenever staff are rated as having a harsh communication style toward children, they are also always rated as having little patience with children. Because of their high similarity, we would say that we are actually measuring one concept, not two, and it would make more sense to have one overall score (perhaps renamed Staff Attitudes Toward Children).
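Real scale-structure work relies on factor analysis in statistical software, but a crude first look (purely illustrative, with invented scale scores; not a substitute for factor analysis) is to check whether two scales' scores move in lockstep across programs:

```python
# Illustrative rough check only: if scores on two scales are almost perfectly
# correlated across programs, they may be capturing a single concept.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x) *
                  sum((b - my) ** 2 for b in y)) ** 0.5

communication_style = [1.5, 4.0, 2.5, 4.5, 3.0]  # hypothetical scale scores
patience = [1.6, 4.1, 2.4, 4.4, 3.1]
r = pearson_r(communication_style, patience)
if r > 0.9:  # illustrative threshold, not a published cutoff
    print("Scales may reflect a single concept; consider combining them.")
```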

Factor analysis can also help determine if one scale actually incorporates more than one related concept. Imagine that we have an instrument with a scale called Homework Assistance, but our factor analysis finds that we actually have two separate concepts. We might discover that some items relate to Tutoring on Specific Subject Matter, whereas another set relates to Teaching Study Skills. The validity of scale structure is important because we want to know exactly which concepts our instrument measures.

Score Distribution
The dispersion or spread of scores from multiple assessments for a specific item or scale, including features such as the average score, the range of observed values and their concentration around particular point(s). In order for items and scales to be useful, they should be able to distinguish differences between programs on a range of qualities. To achieve this, scores should not be "bunched up" at any particular place on the scale. For example, imagine that a particular instrument has a scale called Positive Child Behavior, and users must rate, from 1 to 5, how true statements like "Children never stop helping each other" and "Children thank staff at every opportunity" are for a large number of programs. If almost every program scored low on this particular scale, we might argue the items are making it "too difficult" to obtain a high score and do not meaningfully distinguish between programs on this dimension. One solution would be to revise the items to better reflect program differences. The two sample items above might be revised to say "Children help each other when needed" and "Children appreciate help from staff."

Several important statistics help researchers understand whether scores are bunching up at the ends, including the average score (sometimes called the mean) and how spread out the scores are. For example, a scale or item would not be very useful for distinguishing between programs if the average score across many different programs was a 4.8 out of a possible 5.0. In addition, a scale or item might have an average of 3.5, but it would be less useful if the scores only ranged between 3 and 4 instead of a larger spread between 1 and 5.
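These summary statistics are easy to compute; the sketch below uses invented scores to show how a ceiling effect appears in the mean, standard deviation and range:

```python
# Hypothetical scores on a 1-5 scale for one item across ten programs.
# If the mean sits near the top of the scale and the spread is narrow,
# the item is not distinguishing programs well (a "ceiling effect").
from statistics import mean, pstdev

scores = [4.8, 4.9, 5.0, 4.7, 4.9, 5.0, 4.8, 4.6, 5.0, 4.9]
avg = mean(scores)
spread = pstdev(scores)
print(f"mean={avg:.2f}, sd={spread:.2f}, range={min(scores)}-{max(scores)}")
# prints: mean=4.86, sd=0.13, range=4.6-5.0 -- scores are bunched at the ceiling
```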
