THE BEHAVIORAL AND EMOTIONAL SCREENING SYSTEM – STUDENT FORM AS A
PREDICTOR OF BEHAVIORAL OUTCOMES IN YOUTH
AN ABSTRACT
SUBMITTED ON THE FOURTEENTH DAY OF JULY 2016
TO THE DEPARTMENT OF PSYCHOLOGY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
OF THE SCHOOL OF SCIENCE AND ENGINEERING
OF TULANE UNIVERSITY
FOR THE DEGREE
OF
DOCTOR OF PHILOSOPHY
Kathryn M. Jones, M.S., M.A.
APPROVED:
Chair
Constance Patterson, Ph.D.
AN ABSTRACT
Early identification and intervention are key to decreasing the short- and long-term
negative outcomes associated with behavioral and emotional difficulties in youth.
Universal screening in schools has been found to be an effective and proactive means of
identifying youth at-risk for or currently experiencing behavioral and emotional
difficulties (Burke et al., 2012). It is imperative that schools have access to measurement
tools that are capable of making accurate predictions regarding youth outcomes that can
inform tailored prevention and intervention efforts. One such tool is the BASC-2
Behavioral and Emotional Screening System – Student Form (BESS SF; Kamphaus &
Reynolds, 2007). Although support for the predictive validity of the BESS SF overall risk
score was found, examinations of classification accuracy for suspensions and absences
call its effectiveness at predicting negative behavioral outcomes into question as the
BESS SF was a better predictor of which students were not at risk for negative behavioral
outcomes than of which students were at risk for such outcomes. Initial support for the
utility of alternate behavioral outcomes (e.g., Major Discipline Citations, Positive
Behaviors) was found but concerns over the reliability of the teacher-collected outcome
data merit further investigation. Two BESS SF domain-specific factors
(Inattention/Hyperactivity and School Problems) were found to predict behavioral
outcomes, indicating room for improvement in the precision of BESS SF predictions. At
this time, caution is urged regarding the use of the BESS SF to identify low-income African
American students in need of prevention and intervention efforts until further validation
can be completed.
Key words: universal screening, BESS SF, predictive validity, classification accuracy, domain-specific factors, behavioral outcomes, African American youth
BESS-SF AS A PREDICTOR OF BEHAVIOR
THE BEHAVIORAL AND EMOTIONAL SCREENING SYSTEM – STUDENT FORM AS A
PREDICTOR OF BEHAVIORAL OUTCOMES IN YOUTH
A DISSERTATION
SUBMITTED ON THE FOURTEENTH DAY OF JULY 2016
TO THE DEPARTMENT OF PSYCHOLOGY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
OF THE SCHOOL OF SCIENCE AND ENGINEERING
OF TULANE UNIVERSITY
FOR THE DEGREE
OF
DOCTOR OF PHILOSOPHY
Chair
Kathryn M. Jones, M.S., M.A.
APPROVED:
Constance Patterson, Ph.D.
© Copyright by Kathryn M. Jones, 2016 All Rights Reserved
TABLE OF CONTENTS
LIST OF TABLES
Chapter
I. THE BEHAVIORAL AND EMOTIONAL SCREENING SYSTEM – STUDENT FORM AS A PREDICTOR OF BEHAVIORAL OUTCOMES IN YOUTH
Evidence Supporting the Use of the BESS Student Form as a Universal Screener
The BASC-2 Behavioral and Emotional Screening System (BESS)
Criterion-Related Validity of BESS SF
Factor Structure of the BESS SF
Current Study
II. METHODS
Participants
Procedure
Measures
BESS Student Form
Behavioral Outcome Variables
Suspensions
Absences
Specific Behavioral Outcomes
Major Discipline Citations
Minor Discipline Citations
Positive Behaviors
III. RESULTS
Data Screening
Descriptive Analyses
Aim One: Predictive Validity of the BESS SF Overall Risk Score
Aim Two: Classification Accuracy of the BESS SF
Absences
Suspensions
Aim Three: Predictive Utility of the Four-Factor Bifactor Model of the BESS SF
Absences
Suspensions
Minor Discipline Citations
Major Discipline Citations
Positive Behaviors
IV. DISCUSSION
Limitations
Implications and Future Directions
TABLES
APPENDIX
LIST OF REFERENCES
LIST OF TABLES
Table 1. Descriptive Statistics
Table 2. Correlations Between Demographic, Predictor, and Outcome Variables
Table 3. Prediction of Absences by Overall BESS SF Risk Score
Table 4. Prediction of Minor Discipline Citations by Overall BESS SF Risk Score
Table 5. Prediction of Number of Suspensions by Overall BESS SF Risk Score
Table 6. Prediction of Days Suspended by Overall BESS SF Risk Score
Table 7. Prediction of Major Discipline Citations by Overall BESS Risk Score
Table 8. Prediction of Positive Behaviors by Overall BESS SF Risk Score
Table 9. Classification Accuracy Using BESS SF Overall Score
Table 10. BESS SF Bifactor Model Standardized Weight Estimates
Table 11. Prediction of Absences by the BESS Domain-Specific Factors
Table 12. Prediction of Number of Suspensions by the BESS Domain-Specific Factors
Table 13. Prediction of Days Suspended by the BESS Domain-Specific Factors
Table 14. Prediction of Minor Discipline Citations by the BESS Domain-Specific Factors
Table 15. Prediction of Major Discipline Citations by the BESS Domain-Specific Factors
Table 16. Prediction of Positive Behaviors by the BESS Domain-Specific Factors
I. THE BEHAVIORAL AND EMOTIONAL SCREENING SYSTEM – STUDENT
FORM AS A PREDICTOR OF BEHAVIORAL OUTCOMES IN YOUTH
By emphasizing the importance of behavioral and emotional skill development,
schools can help prepare students to function effectively within the larger society, while
giving them tools that strengthen their academic performance at the same time (Zins,
Bloodworth, Weissberg, & Walberg, 2007). One important component of promoting
behavioral and emotional health is effectively and efficiently attending to students who
are at-risk for behavioral and emotional problems or are already experiencing difficulties
(Burke et al., 2012; Walker, Nishioka, Zeller, Severson, & Feil, 2000). As early
identification and intervention are key to decreasing the short- and long-term negative
outcomes associated with behavioral and emotional difficulties in youth, it is important
that schools implement methods that best facilitate timely identification of youth in need
of additional assessment and services.
Traditionally, referrals for social, emotional and behavioral services in schools
have relied on teacher reports and the use of disciplinary records such as office discipline
referrals (ODRs) and suspension data (King & Reschly, 2014; Renshaw et al., 2009;
Walker, Cheney, Stage, & Blum, 2005). One major problem with traditional
identification methods is that they utilize a wait-to-fail approach where students are not
identified as in need of intervention until they are already presenting problem behaviors
(Burke et al., 2012; Schanding & Nowell, 2013; Walker, 2010). Additionally, use of
traditional methods may over-identify those exhibiting overt problem behaviors while
missing students struggling with internalizing symptoms (Walker et al., 2005). More
recently, there has been a call for increased use of universal screening using standardized tools
designed to proactively identify students in need of additional supports to promote
positive behavioral and emotional functioning (e.g., Glover & Albers, 2007; Walker et
al., 2000; Walker et al., 2005). Universal screening allows educators to take a proactive
approach by identifying youth who are at-risk for behavioral and emotional difficulties as
well as those who may already be experiencing problems but have not come to the
attention of school staff (Albers & Kettler, 2014; Burke et al., 2012; Walker et al., 2000).
In order to promote school use of universal screening, it is imperative that schools be able
to select measurement tools that have been thoroughly evaluated psychometrically, are
appropriate to use with the demographic population of the school, and fit with the
practical needs of the school (e.g., time constraints, administrative ease, cost; Glover &
Albers, 2007; Young, Sabbah, Young, Reiser, & Richardson, 2010).
One such tool is the BASC-2 Behavioral and Emotional Screening System
(BESS; Kamphaus & Reynolds, 2007). The BESS was designed to assess behavioral and
emotional functioning in youth, using selected items from the Behavior Assessment
System for Children, Second Edition (BASC-2; Reynolds & Kamphaus, 2004) that reflect the
domains of Inattention/Hyperactivity, Internalizing Problems, School Problems, and
Personal Adjustment. The overall risk score produced by the BESS demonstrates
concurrent and longitudinal relationships with important student outcomes including
increased school disciplinary actions (e.g., office discipline referrals [ODRs] and
suspensions), decreased academic performance, and decreased academic engagement
(e.g., teacher ratings of work effort and cooperation; Chin, Dowdy, & Quirk, 2013;
Kamphaus, DiStefano, Dowdy, Eklund, & Dunn, 2010; King & Reschly, 2014; King,
Reschly, & Appleton, 2012; Renshaw et al., 2009).
Although the overall risk score has some utility, it provides limited information
regarding the specific nature of an individual’s behavioral and emotional difficulties that
could guide next steps in assessment and allow for the use of targeted interventions.
Factor analytic studies provide preliminary data to support a hierarchical factor structure
of the BESS (Chen, West, & Sousa, 2006; Naser, Hitti, & Overstreet, 2016; Schanding &
Nowell, 2013; Wiesner & Schanding, 2013), but to date, the predictive utility of those
factors, including whether they predict outcomes over and above the general factor, has not been
studied. Therefore, an examination of the unique contributions of lower order factors as
predictors of external criteria is an important next step in the evaluation of the BESS as a
universal screening tool. The current study sought to examine the utility of a hierarchical
factor structure applied to the student report form of the BESS in predicting behavioral
outcomes. Although there is initial support for this type of factor structure (Naser et al.,
2016), there have been no studies examining the predictive ability of the factor scores
over and above the overall risk score produced by the BESS. This study provides an
important initial investigation of the utility of domain-specific factors as predictors of
behavioral outcomes.
The current study utilized student reports because students are able to provide
important information related to how they act and emote in different situational contexts
as opposed to teachers, who only interact with students within the school context
(Achenbach, McConaughy, & Howell, 1987; King & Reschly, 2014). Additionally, the
exclusive use of teacher measures may result in less consistent identification of students
with internalizing problems than externalizing problems (Achenbach et al., 1987). It is
also possible that ethnic and racial disproportionality in special education services may be
reduced through the use of self-report universal screening tools by removing the
influence of implicit teacher bias on referrals (Raines, Dever, Kamphaus, & Roach,
2012). Although research has shown that valuable information can be gained from the use
of the BESS TF (e.g., Chin et al., 2013; King & Reschly, 2014; Wiesner & Schanding,
2013), the BESS SF may provide unique information about student risk status. It is,
therefore, imperative that the BESS SF be fully validated as a tool for obtaining
information regarding youth behavioral and emotional functioning in order to provide a
complete view of overall personal functioning.
The following literature review provides an overview of the BESS, summarizes
studies examining the predictive ability of the BESS SF overall risk score, and reviews
results from factor analytic studies. Based on those findings, the rationale for examining the
predictive ability of domain-specific factor scores is presented, followed by a description
of the proposed study.
Evidence Supporting the Use of the BESS Student Form as a Universal Screener
In order to make informed decisions, school personnel need access to information
that allows them to evaluate how universal screening tools fit their individual needs and
contextual considerations, including the relevance of results to guiding identification and
intervention as well as the practicalities of administration and scoring (Feeney-Kettler,
Kratochwill, Kaiser, Hemmeter, & Kettler, 2010; Glover & Albers, 2007). Although the
current research base regarding the BESS is limited, the research that does exist
demonstrates its promise as a universal screening tool. This section will review the
characteristics of the BESS before summarizing the currently available research
regarding the predictive validity and factor structure of the BESS Student Form (SF).
The BASC-2 Behavioral and Emotional Screening System (BESS)
The BESS (Kamphaus & Reynolds, 2007) is a brief (5 – 10 minutes), broadband
screener of behavioral and emotional risk for youth populations. There are
complementary parent, teacher, and student report versions. Informants use a 4-point
Likert scale ranging from 1 (never) through 4 (almost always) to indicate the frequency
with which the student experiences different behavioral and emotional problems using items
taken from the parent, teacher, and student versions of the Behavior Assessment System
for Children – Second Edition (BASC-2; Reynolds & Kamphaus, 2004). In order to create
the BESS SF, unrotated principal components analyses (PCAs) were performed on the
items composing the four composite scales of the BASC-2 Self-Report of Personality
(BASC-2 SRP; Internalizing Problems, Inattention/Hyperactivity, Personal Adjustment,
and School Problems). Items chosen for inclusion were those that best represented each
composite scale according to the PCA while also covering the range of content
represented by each composite. The internal consistency of the items representing each
composite was also evaluated and additional items were added until all dimensions
reached a minimum internal consistency of .80. As a result, the BESS SF is a self-report measure consisting
of 30 items (six Inattention/Hyperactivity, six School Problems, eight Personal
Adjustment, and ten Internalizing Problems).
Despite the presence of items representing four separate constructs, the official
BESS manual does not provide instructions on scoring or interpreting any subscales;
rather, for each version of the BESS, an overall risk score in the form of a T-score is
obtained that signifies overall behavioral and emotional risk status (Kamphaus &
Reynolds, 2007). Individuals scoring at or below 60 are classified as normal risk and are
not considered at-risk for behavioral or emotional difficulties. Those scoring between 61
and 70 are considered at elevated risk and those scoring 71 or above are considered at
extremely elevated risk. The severity of an individual’s current risk status serves to guide
next steps in assessment and intervention.
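The cut points described above amount to a simple lookup. A minimal Python sketch follows, assuming integer T-scores; the function name `bess_risk_level` is hypothetical and is not part of the BESS manual or its scoring software.

```python
def bess_risk_level(t_score: int) -> str:
    """Map a BESS overall T-score to the risk level described in the text.

    Assumes an integer T-score (illustrative helper, not official scoring code).
    """
    if t_score <= 60:
        return "normal"              # at or below 60
    if t_score <= 70:
        return "elevated"            # 61 through 70
    return "extremely elevated"      # 71 or above


# Hypothetical scores used only to show the three categories.
for score in (55, 65, 75):
    print(score, bess_risk_level(score))
```

The boundaries are inclusive at 60 and 70, matching the wording of the cut points above.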
Although the usability and acceptability of the BESS to schools, parents, and
other stakeholders have not been formally assessed, it seems that the BESS meets many
of the standards set by Glover and Albers (2007) regarding the evaluation of universal
screeners. With the limited time and financial resources available in schools to address
mental health concerns, it is very important that universal screeners be able to meet the
needs of a school in a cost- and time-efficient manner (Glover & Albers, 2007). As a brief
broadband measure, the BESS is designed to provide information regarding overall
behavioral and emotional difficulties and the student version can be administered by
teachers in classroom settings in approximately 5 to 10 minutes (Kamphaus & Reynolds,
2007). There are many benefits associated with the use of this tool, including the
availability of forms for multiple informants allowing for comparisons of risk status
across settings, Spanish-language and audio versions, and availability of electronic
scoring programs. As an efficient provider of risk status information that can be used to
guide further assessment and intervention, the BESS is a strong candidate for an
appropriate and practical universal screener within school settings.
Criterion-Related Validity of BESS SF
Another important factor that schools must consider when choosing a universal
screener is the psychometric adequacy of the screener (Glover & Albers, 2007). BESS SF
norms were developed using the BASC-2 SRP normative sample, which was designed to
be representative of the population of the United States (Kamphaus & Reynolds, 2007).
Initial evaluations of the BESS SF conducted during test development found it to exhibit
strong internal consistency and test-retest reliability (Kamphaus & Reynolds, 2007). The
overall BESS SF risk score was strongly correlated in the expected directions with
BASC-2 and the Achenbach System of Empirically Based Assessment – Youth Self
Report (ASEBA YSR; Achenbach & Rescorla, 2001) scales representing internalizing
and externalizing concerns (Kamphaus & Reynolds, 2007). Based on this information,
the BESS SF appears to be both a reliable and valid method of assessing youth behavioral and
emotional risk, but continued validation of the BESS SF by outside researchers is needed
to provide further evidence for its psychometric adequacy.
Criterion-related validity refers to how accurately an assessment tool predicts
performance on another related tool or measure (Glover & Albers, 2007). There are two
types of criterion-related validity: concurrent validity and predictive validity. Concurrent
validity concerns how well an assessment tool predicts current performance on related
outcome measures (Glover & Albers, 2007; Michel, Schultze-Lutter, & Schimmelmann,
2014). Although universal screeners are important as tools of early identification of
behavioral and emotional difficulties that can inform prevention and intervention efforts
employed by schools, it is also important to identify youth who are already engaging in
risky behavior or experiencing emotional difficulties so that appropriate interventions can
be instituted in a timely manner.
One investigation of the utility of the BESS SF as a tool for identifying those
currently engaging in problematic behaviors was conducted by Dowdy, Furlong, and
Sharkey (2012). They examined the concurrent validity of the BESS SF with a primarily
Latino (64.8%) sample of 3,331 students (51.5% female, 48.5% male) in eighth, tenth,
and twelfth grades in four school districts in California. Engagement in problematic
behaviors was assessed using the California Healthy Kids Survey (CHKS; California
Department of Education, 2010), which measured the frequency of eight specific
behaviors in the past 30 days (i.e., cigarette use, alcohol use, binge drinking, marijuana
use, and skipping school out of fear) or the past year (i.e., fighting at school, being
injured or threatened with a weapon at school, and contemplation of suicide).
The survey also assessed feelings of chronic sadness (i.e., 2 weeks of sadness in the past
year). Scores on the BESS SF and chronic sadness measure served as mental health
indicators and were represented dichotomously (i.e., no sadness reported/sadness reported
and normal/elevated risk) and were used to predict problematic behaviors, which were
also represented dichotomously (i.e., presence or absence of each specific behavior).
Students categorized as at-risk based on the BESS score were more likely than
students not at-risk to engage in all eight problematic behaviors assessed even after
controlling for the presence of chronic sadness (Dowdy et al., 2012). In fact, odds ratios
revealed that at-risk status was a stronger predictor than chronic sadness of seven of the
eight problematic behaviors with contemplation of suicide being the lone exception. At-
risk status was an even stronger predictor when used in combination with chronic
sadness; students who were categorized both as at-risk and chronically sad were more
likely to engage in problematic behaviors than youth with either elevated risk or chronic
sadness alone. This study highlights the utility of the BESS SF in the identification of
youth who are already engaging in risky behavior in order to begin intervention services
or make appropriate changes to services already in place. This study could be improved
by examining the utility of the BESS in making predictions of future student outcomes
rather than looking strictly at concurrent behaviors. The addition of outcomes reported by
someone other than the student could improve this study by decreasing the possible
impact of social desirability on responding.
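For readers less familiar with odds ratios, the following Python sketch shows how an unadjusted odds ratio is obtained from a 2 x 2 table of risk status by behavior. The counts are hypothetical and are not taken from Dowdy et al. (2012), whose estimates additionally controlled for chronic sadness; the function name `odds_ratio` is likewise illustrative.

```python
def odds_ratio(at_risk_yes: int, at_risk_no: int,
               normal_yes: int, normal_no: int) -> float:
    """Unadjusted odds ratio for a behavior, at-risk group versus normal-risk group.

    Each argument is a cell count from a 2 x 2 table (risk status by behavior).
    Cells used as divisors (at_risk_no, normal_yes, normal_no) must be nonzero.
    """
    odds_at_risk = at_risk_yes / at_risk_no   # odds of the behavior if at-risk
    odds_normal = normal_yes / normal_no      # odds of the behavior if normal risk
    return odds_at_risk / odds_normal


# Hypothetical counts: 30 of 100 at-risk students and 10 of 100 normal-risk
# students report the behavior.
print(round(odds_ratio(30, 70, 10, 90), 2))
```

An odds ratio above 1 indicates that the behavior is more likely among students screened as at-risk; a value of 1 indicates no difference between groups.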
Predictive validity has been identified as particularly important in relation to
universal screening as it concerns how well an assessment tool predicts future
performance (Glover & Albers, 2007; Michel et al., 2014). As the goal of behavioral and
emotional universal screening is to identify youth at risk for future difficulties, it is
important that schools use measures that are proven in their ability to identify youth who
later go on to experience said difficulties as well as correctly identifying those youth who
are not in need of services. Classification accuracy is an important component of
predictive validity and indicates the degree to which the assignment of test takers to
specific categories is accurate and avoids false positives and false negatives. Four
statistical concepts are routinely used to evaluate classification accuracy: sensitivity,
specificity, positive predictive power/value, and negative predictive power/value (Glaros
& Kline, 1988; Glover & Albers, 2007; Hill, Lochman, Coie, Greenberg, & The Conduct
Problems Prevention Research Group, 2004; Levitt, Saka, Romanelli, & Hoagwood,
2007; Streiner, 2003).
Sensitivity is “the capacity of an assessment instrument to yield a positive result
for a person with the attribute of interest” (Glaros & Kline, 1988, p. 1014). Specificity is
“the capacity of an assessment instrument to yield a negative result for a person without”
the attribute of interest (Glaros & Kline, 1988, p. 1014). By examining sensitivity and
specificity with respect to a specific behavioral outcome, it is possible to determine how
well the BESS identified youth who did and did not engage in problem behaviors.
Although this is useful in the examination of validity of a measure when outcomes are
clearly known, screening to determine risk status for potential outcomes does not involve
such concrete attributes (Glaros & Kline, 1988). Instead, at the time of screening, the
outcomes that members of each risk group will go on to experience are unknown. Since provision of prevention and
intervention efforts is contingent upon screening outcomes, it is important to determine
the likelihood that youth have been correctly sorted into their respective groups, which
can be assessed by examining positive and negative predictive powers/values (Glaros &
Kline, 1988). Positive predictive power “is the likelihood that a person with a positive
test finding actually has the predicted attribute” (Glaros & Kline, 1988, p. 1016).
Negative predictive power “is the likelihood that a person with a negative test sign does
not” have the predicted attribute of interest (Glaros & Kline, 1988, p. 1016). By
examining these statistics, researchers and schools can examine what proportion of youth
identified as at-risk on the BESS go on to exhibit problem behaviors and what proportion
of youth identified as normal risk do not go on to exhibit problem behaviors.
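The four statistics defined above can be computed from a 2 x 2 table that crosses screening result with later outcome. The Python sketch below is illustrative only: the class and function names are hypothetical, and the counts in the usage example are invented for demonstration rather than drawn from any study reviewed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Cell counts of a 2 x 2 screening table (hypothetical container)."""
    true_pos: int   # screened at-risk, later showed the problem behavior
    false_pos: int  # screened at-risk, did not show the problem behavior
    false_neg: int  # screened normal risk, later showed the problem behavior
    true_neg: int   # screened normal risk, did not show the problem behavior


def classification_accuracy(c: ScreeningCounts) -> dict:
    """Return sensitivity, specificity, and positive/negative predictive power.

    Every row and column total of the table must be nonzero.
    """
    return {
        # Of youth who showed the behavior, the share screened as at-risk.
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        # Of youth who did not show the behavior, the share screened as normal risk.
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        # Of youth screened as at-risk, the share who showed the behavior.
        "positive_predictive_power": c.true_pos / (c.true_pos + c.false_pos),
        # Of youth screened as normal risk, the share who did not show the behavior.
        "negative_predictive_power": c.true_neg / (c.true_neg + c.false_neg),
    }


# Invented counts for illustration only.
example = ScreeningCounts(true_pos=20, false_pos=20, false_neg=20, true_neg=80)
for name, value in classification_accuracy(example).items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity condition on the eventual outcome, whereas the two predictive powers condition on the screening result, which is why the latter are the more direct guide when deciding whether to act on a screening classification.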
Due to the high risk of negative outcomes for at-risk youth who are not accurately
identified by screening tools, some authors have proposed that schools may prefer to use
measures that tend to over-identify rather than under-identify at-risk youth (Glover &
Albers, 2007; Levitt et al., 2007). Consistent with this purpose, acceptable psychometric
standards for classification accuracy of screening instruments are generally lower than for
diagnostic tests, ranging from 70% through 80% (American Academy of Pediatrics,
2012; Glover & Albers, 2007). Glover and Albers (2007) argue that low positive
predictive power and high sensitivity may be ideal for screening instruments as this
decreases the risk of missing youth who are in need of prevention and intervention,
although it also increases the risk of identifying youth who are not in need. As a screener
meant to be used as part of a comprehensive system rather than a diagnostic tool, the
BESS SF was designed with this standard in mind and is meant to err on the side of over-
identifying youth (Kamphaus & Reynolds, 2007; King et al., 2012). As this may result in
increased need for follow-up assessments and potential misapplication of limited
resources to youth who do not need services, it is important for schools to consider their
ability to deal with the consequences of over-identification before choosing to use the
BESS SF as part of their system of identification (Glover & Albers, 2007). With these
considerations and when used as part of a comprehensive screening and intervention
system, the BESS SF can be considered a valid tool for screening for at-risk youth with
limited false negatives despite its tendency to over-identify youth who are at-risk of
negative outcomes (Glover & Albers, 2007; Kamphaus & Reynolds, 2007; Levitt et al.,
2007). Even though the importance of validity is widely recognized, studies examining
the predictive validity of the BESS SF are few and far between. The studies that do exist
demonstrate the utility of the BESS SF overall risk score as a predictor of future
behavioral and emotional difficulties although there is room for improvement in its
identification of youth who are in need of services.
As part of a larger evaluation of the BESS forms as universal screening tools,
King et al. (2012) examined the predictive validity of the BESS SF on student behavioral
outcomes with elementary-aged students. The study was conducted in a rural community
with 207 students attending third through fifth grade at one elementary school. The
authors do not provide specific information on the ethnicity, gender, or socioeconomic
status (SES) for the youth who completed the BESS SF, but the overall sample was
primarily European American (64.7%), gender was evenly split (52.4% female), and
68.3% were from low SES homes as indicated by receiving free or reduced lunch. BESS
data was collected 10 weeks into the school year. Behavior outcome data (e.g., ODRs,
attendance, suspensions) was collected midway through the school year. Academic
performance data was assessed using benchmark measures of oral reading fluency
obtained in November of the school year of administration.
Using Spearman’s rho correlations, BESS SF T-scores were positively correlated
with ODRs and negatively correlated with attendance and oral reading fluency. The
overall risk score on the BESS SF was not significantly correlated with suspensions.
Next, similar to Dowdy et al. (2012), the authors collapsed the elevated and extremely
elevated groups into a single at-risk group, as their goal was to evaluate the utility of the
BESS as a screener to identify students who were at risk of negative outcomes (King et
al., 2012). Nonparametric Independent Samples Kruskal-Wallis Tests were used to
predict behavioral and academic outcomes using BESS SF classification as normal or at-
risk. At-risk youth were found to have significantly higher rates of ODRs and
significantly lower oral reading fluency and attendance than youth who were not at-risk.
Suspension rates approached significance with at-risk youth tending to have higher rates
of suspension.
To provide information on classification accuracy, the authors used the presence
of an office disciplinary referral as the indicator of problematic behavior (King et al.,
2012). Of those students who did not demonstrate the problematic behavior by mid-year,
73.184% were identified as being at normal risk at the beginning of the year (specificity).
Similarly, of those students who were identified as being at normal risk at the beginning
of the year, 90.345% did not demonstrate the problematic behavior by mid-year (negative
predictive power). These results indicate that the BESS does a good job of identifying
students who are unlikely to go on to develop problematic behaviors. However, the
classification accuracy of the BESS is not as strong for students who develop problematic
behaviors. For example, of those students who demonstrated problematic behavior at
mid-year, just 57.576% were identified as being at elevated risk at the beginning of the
year (sensitivity). Furthermore, of those students who were identified as being at elevated
risk for problem behaviors, only 28.358% went on to demonstrate problematic behavior
by mid-year (positive predictive power). Despite the fact that the BESS SF predicted
ODRs, attendance, and oral reading fluency in the expected directions, additional analysis
of ODRs as a behavioral outcome indicated that the overall score of the BESS SF was
better at identifying those youth who were not at-risk of negative behaviors than
identifying those who are at risk (King et al., 2012).
In their examination of the BESS SF and TF as predictors of student outcomes,
Chin et al. (2013) expanded on King et al. (2012) by using the BESS SF to predict
outcomes over a full school year. They used a sample of 694 sixth and seventh grade
students (age 11 through 14; 46.5% female). Participants were primarily Latino (88.3%)
and attended a single middle school in Southern California. The BESS SF and TF were
administered universally to all students in the fall. Dichotomized behavioral outcomes
(e.g., ODRs, suspensions, unsatisfactory behavioral grades) represent data from the full
academic year. Logistic regressions were used to examine the relationship between BESS
scores and behavior outcomes.
The overall T-score on the BESS SF was found to significantly predict all
behavioral outcomes (Chin et al., 2013). Specifically, as the BESS SF score increased, so
did the likelihood of students receiving one or more ODRs and/or one or more
suspensions during the examined school year. Additionally, higher overall scores were
associated with unsatisfactory behavioral grades reflecting poorer work habits and
cooperation. When divided into risk status groups based on their overall BESS SF risk
score, students in the extremely elevated risk group had the worst outcomes (i.e., ODRs,
suspensions, unsatisfactory behavioral grades), followed by those in the elevated group,
then those in the normal risk group.
Although data on classification accuracy was not specifically provided by Chin et
al. (2013), information provided in the manuscript was used to evaluate classification
accuracy of the BESS SF. The presence of an office discipline referral and total number
of suspensions during the course of the school year were used to indicate the
demonstration of problematic behavior. Similar to the findings reported by King et al.
(2012), the BESS SF showed the best accuracy in classifying students at normal risk for
both outcomes; for ODRs, 89.231% of students who did not demonstrate problematic
behavior were identified as normal risk at the beginning of the year (specificity) and
79.700% of students identified as normal risk at the beginning of the year did not
demonstrate problematic behavior (negative predictive power); for suspensions, a
specificity of 85.107% and a negative predictive power of 94.300% were obtained.
Accuracy for at-risk students was not as strong; for ODRs, just 32.107% of students who
demonstrated problematic behavior over the course of the year were identified as being at
elevated risk at the beginning of the year and only 50.018% of students identified as
being at elevated risk at the beginning of the year went on to demonstrate problematic
behaviors; for suspensions, a sensitivity of 32.485% and positive predictive power of
14.252% were obtained. Similar to King et al. (2012), although the BESS SF predicted
ODRs, suspensions, and unsatisfactory behavior grades in the expected directions,
additional analysis of ODRs and suspensions as behavioral outcomes indicated that the
overall score of the BESS SF was better at accurately identifying those youth who were
not at-risk of negative behaviors than identifying those who are at-risk.
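The four indices reported above can be derived from a 2 x 2 table crossing screening status with the observed outcome. The sketch below is illustrative only: the counts are hypothetical and are not taken from Chin et al. (2013) or King et al. (2012), and the names `ConfusionCounts` and `classification_accuracy` are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cell counts of a 2 x 2 screening-by-outcome table (hypothetical names)."""
    true_pos: int   # screened at-risk and showed the problem behavior
    false_pos: int  # screened at-risk but did not show the problem behavior
    false_neg: int  # screened normal risk but showed the problem behavior
    true_neg: int   # screened normal risk and did not show the problem behavior


def classification_accuracy(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, and positive/negative predictive power."""
    return {
        # Of students who showed the behavior, the share flagged at-risk
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        # Of students who did not show the behavior, the share screened normal risk
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        # Of students flagged at-risk, the share who showed the behavior
        "positive_predictive_power": c.true_pos / (c.true_pos + c.false_pos),
        # Of students screened normal risk, the share who did not show the behavior
        "negative_predictive_power": c.true_neg / (c.true_neg + c.false_neg),
    }


# Hypothetical counts chosen only to illustrate the arithmetic
example = ConfusionCounts(true_pos=30, false_pos=40, false_neg=70, true_neg=360)
indices = classification_accuracy(example)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

With counts like these, specificity and negative predictive power are high while sensitivity and positive predictive power are low, which is the same pattern described for the BESS SF above.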
Although the studies discussed above demonstrate initial evidence of the BESS
SF as a predictor of behavioral outcomes for youth, the current evidence is less than
definitive. When ODRs served as the behavioral outcome, the BESS SF was actually
better at identifying youth who were not at-risk than youth who were. While this may be
due to characteristics inherent in the BESS itself, classification accuracy has to do with
both the test and the outcome (Glaros & Kline, 1988; Streiner, 2003). If the outcome is
not the most reliable or if the specific cut point is not clinically meaningful, then
classification accuracy can appear to be worse than if another outcome and/or cut
point were chosen. Therefore, in addition to considering the validity of the BESS SF
itself, we need to examine what we should be using as markers of problem behaviors. It is
possible that the tendency of the BESS SF to be better at classifying “normal” youth than
at-risk youth is a remnant of the decision of King et al. (2012) and Chin et al. (2013) to
use the presence of at least one office discipline referral as a marker for problem behavior
and, in the case of Chin et al. (2013), the receipt of one or more suspensions. A cut point
based on one occurrence of a behavior or an event may be meaningful for a variable like
suspensions, which is indicative of a serious infraction, but may not be meaningful for
variables like ODRs or school absences, which could be associated with less serious risk.
In other words, a present/absent cut point for such variables may be too low to truly be
reflective of problem behavior. Instead, it may be necessary to identify a more “clinically
significant” cut point to indicate the presence of the behavior at a meaningful level. In the
current study, this will be examined with respect to absences. In the state where the
participating school is located, students who miss more than 10 days of school are
considered ineligible for promotion to the next grade (Louisiana Department of
Education, n.d.). Therefore, a cut point of 10 absences was chosen as a “clinically
significant” indicator of absences.
In addition to examining ways to improve classification accuracy, the current
study also expands the literature on predictive validity by examining the association
between the BESS and important future outcomes and strategies for strengthening those
associations. One possible strategy for doing so involves improving the strength of the
BESS SF as a predictor variable. Findings from Dowdy et al. (2012) suggest
that although the BESS SF was a strong predictor of problematic behaviors, it became even
stronger when more specific information about sadness was added to it. This
demonstrates that there is room for improvement in the predictive ability of the BESS
overall score. Recent thinking about the hierarchical structure of the BESS could make it
possible to use the BESS itself to refine the predictor variable. Through the addition of
the domain-specific factors, the ability of the BESS SF overall risk score to predict
outcomes may be improved upon. The current study will examine the utility of using
lower order, domain-specific factors to predict experiencing negative behavioral
outcomes to enhance the predictions made using the overall BESS SF risk score.
Factor Structure of the BESS SF
As the BESS SF was specifically designed to reflect the four main composites of
the BASC-2 SRP (Internalizing Problems, Inattention/Hyperactivity, School Problems,
Adaptive Behavior; Kamphaus & Reynolds, 2007), it is possible that the BESS has an
underlying factor structure that reflects these components beyond the unidimensional
factor represented by the overall score. In fact, studies examining the factor structure of
the BESS SF and TF have found evidence supporting the existence of an underlying
factor structure (Dowdy et al., 2011; Harrell-Williams, Raines, Kamphaus, & Dever,
2015; Naser et al., 2016; Wiesner & Schanding, 2013). These studies have used a variety
of statistical methodologies to investigate the factor structure of the BESS, including
nonhierarchical factor analytic techniques and more complex methodologies designed to
explore hierarchical structures.
Using a combination of exploratory factor analysis and confirmatory factor
analyses, Dowdy et al. (2011) tested the factor structure of the BESS SF using three
different samples, two from the BASC-2 norming sample (N = 994 and N = 1,466, both
representative of U.S. population, ages 6 through 11) and an independent verification
sample (N = 273, 81.4% Latino, ages 7 through 12). An exploratory factor analysis
(EFA) was completed with the first sample, investigating model solutions with one
through six factors. A unidimensional model was identified and found to have adequate
factor loadings; however, analyses most strongly supported a four-factor solution
consistent with the BASC-2 scales from which items were drawn (Personal Adjustment,
Inattention/Hyperactivity, Internalizing Problems, School Problems). In order to achieve
best fit, three items (9, 11, and 22) were removed, resulting in a final model based on 27
rather than 30 BESS SF items. Next, two confirmatory factor analyses (CFAs) were
conducted, examining the fit of the four-factor solution using the second sample from the
norming group and the independent verification sample. Both CFAs supported the
previously identified four-factor structure, with a range in goodness of fit from acceptable
to good in all three analyses. Internal consistency for each factor was found to be
acceptable. Dowdy et al. (2011) concluded that the BESS successfully measures the
constructs it was designed to assess. As many of these analyses were in the acceptable
range, additional research is warranted, especially research that employs independent
samples (Dowdy et al., 2011).
Answering the call for continuing validation studies of the factor structure of the
BESS SF, Harrell-Williams et al. (2015) used CFAs to evaluate the four-factor solution
previously found by Dowdy and colleagues with three high school samples, one from
southern California (N = 1,688, 94% Latino), one from central Georgia (N = 1,857,
72.8% African American), and one subsample of the BESS national norming sample (N
= 1,261, representative of U.S. population). The authors tested a one-factor model and a
four-factor model using all 30 BESS SF items, unlike the final model identified by
Dowdy et al. (2011) that excluded three items. Harrell-Williams et al. (2015) chose not to
remove these items after an item analysis conducted using all three samples combined did
not find that it was warranted. Using chi-square difference tests, the four-factor solution
was determined to fit the BESS SF better than the unidimensional model for all three
samples. Consistent with Dowdy et al. (2011), these four factors aligned with the original
BASC-2 SRP composite scales from which the BESS SF items were drawn: Internalizing
Problems, Hyperactivity/Inattention, School Problems, and Personal Adjustment.
However, only the Internalizing Problems factor demonstrated adequate expected a
posteriori over person variance (EAP/PV) reliability estimates across all three samples
using the .80 threshold recommended by Lance, Butts, and Michels (2006).
In response to these findings, Harrell-Williams et al. (2015) endorsed the
continued use of the overall BESS SF score for universal screening rather than the
individual factor scores due to concerns about the reliability and usefulness of the factors.
The authors were concerned that by relying on the predictive ability of the individual
factors rather than the overall risk score, schools may overly narrow their screening to
look at specific concerns. As a result, they worried that schools may be less likely to
identify all at-risk youth and/or use interventions that are too limited in scope based on
the “diagnosis” provided by the BESS SF (Harrell-Williams et al., 2015). Given the
preliminary nature of the findings and lack of guidance for schools to switch to a new
method of using the BESS SF for screening, it is reasonable and appropriate to suggest
that practitioners continue to use the BESS SF overall risk score to guide prevention and
intervention decisions.
However, researchers should view these findings as a way to expand our
understanding of the BESS and explore whether domain-specific factors can provide
more subtle information that could help guide further assessment and intervention. Rather
than dismissing the possibility that factors may provide additional useful information in
the identification of at-risk youth and treating these factor reliabilities as evidence that
factor scores should not be utilized to predict behavior, it is imperative that researchers
actually test the ability of the factors to predict student outcomes. It is possible that
predicting outcomes using factor scores could serve to refine identification and
intervention procedures as part of a comprehensive system. In order to facilitate this
process, further research should be conducted examining the utility of domain-specific
factors as predictors of student outcomes, which may be useful in guiding interventions
and further assessment as well as choosing outcomes for progress monitoring.
Furthermore, the strength of the Internalizing Problems factor, an area that is often under-
identified by traditional methods of screening, merits additional investigation.
Researchers must investigate whether or not alternate factor structures, such as
hierarchical and bifactor models, can produce more reliable factor structures.
When researchers and practitioners are interested in both a general construct as
well as domain-specific constructs, bifactor models may provide a better structural
approach than non-hierarchical or unidimensional models (Brunner, Nagy, & Wilhelm,
2012; Chen et al., 2006; Naser et al., 2016; Reise, 2012). Bifactor models allow modeling
of constructs that are thought to be both hierarchical and multifaceted. These models
consist of a general factor explaining common variance between all items as well as
orthogonal lower order, domain-specific factors that include items with more
conceptually specific content clusters (Brunner et al., 2012; Chen et al., 2006; Reise,
2012). This combination is ideal for psychological measurement tools designed to assess
psychological constructs that are theoretically multidimensional, like the behavioral and
emotional risk construct assessed by the BESS SF (Brunner et al., 2012; Chen et al.,
2006; Naser et al., 2016; Reise, 2012; Wiesner & Schanding, 2013). By utilizing a
bifactor model to examine the BESS SF, it is possible to separate out the common
variance accounted for by the overall factor from the unique variance associated with
each specific factor (Chen et al., 2006; Reise, 2012; Wiesner & Schanding, 2013).
Theoretically, this allows more specific predictions of student outcomes to be made
through the examination of the predictive validity of the factors controlling for the effects
of the other factors.
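In general notation, a bifactor measurement model of the kind described above can be written as follows. The symbols are assumptions introduced here for illustration and are not drawn from the cited sources: x_j is the response to item j, G is the general factor, S_k is the domain-specific factor to which item j belongs, the lambdas are factor loadings, and epsilon_j is item-specific error.

```latex
x_{j} = \lambda_{jG}\, G + \lambda_{jk}\, S_{k} + \varepsilon_{j},
\qquad
\operatorname{Cov}(G, S_{k}) = 0,
\quad
\operatorname{Cov}(S_{k}, S_{l}) = 0 \;\; (k \neq l)
```

Each item loads on the general factor and on exactly one specific factor, and the orthogonality constraints are what allow the common variance to be separated from the variance unique to each domain.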
As universal screening of behavior in youth aims to use a single tool to identify
risk associated with related but unique areas of behavioral and emotional functioning,
bifactor modeling is a theoretically appealing option (Chen et al., 2006; Naser et al.,
2016; Wiesner & Schanding, 2013). In fact, bifactor models have been used to model
another commonly utilized universal screening tool for youth behavioral and emotional
functioning, the Strengths and Difficulties Questionnaire (SDQ; Goodman, 1997). Using
a nonclinical population of Hungarian youth ages 8 through 13, Kóbor, Takács, and
Urbán (2013) found that a five-factor bifactor model yielded excellent fit statistics. The
model consisted of a higher order factor, labeled “General Problems”, that reflected
overall behavioral functioning, and five lower order factors that generally corresponded
with the five scales of the SDQ (i.e., Emotional Symptoms, Behavioral Problems,
Hyperactivity, Peer Problems, and Prosocial). By examining the factor scores in addition
to the overall score on the SDQ, the authors believe that a stronger picture of the
behavioral and emotional strengths and weaknesses of individual children can be
developed than by relying on the overall score alone.
At this time, the application of bifactor modeling to the BESS has been limited to
two studies. Wiesner and Schanding (2013) explored non-hierarchical unidimensional
and multidimensional models, as well as hierarchical multidimensional models, to
identify the most appropriate and meaningful structure of the BESS TF. A total of 1,885
first through fifth grade students attending a suburban Southern school district were
screened using the BESS TF. CFAs testing a unidimensional model, a four-factor model,
and a higher order CFA model were not found to fit the data adequately. In contrast, two
other measurement models did adequately fit the data, one of which was a bifactor model
with three factors. This model consisted of a general factor, Maladaptive Problems, and
three domain-specific factors: Internalizing Problems, Externalizing Problems, and
combined Low Adaptive Skills/School Problems. Although a non-hierarchical four-factor
model also fit the data adequately, the authors deemed the bifactor model as the best fit
due to the inclusion of a global factor accounting for substantial covariation among
screener items that corresponds with the overall T-score obtained by the BESS TF, as
well as the existence of the orthogonal (i.e., uncorrelated) specific factors that represent
distinct concepts being measured by the BESS separate from the overall risk factor. It is
imperative that researchers examine the overall and specific factors as predictors of youth
behavioral outcomes in order to assess their relationship and usefulness in identifying at-
risk youth within the screening process.
These results indicate that traditional CFA may not be the best statistical approach
to examine the factor structure of the BESS TF (and, by extension, the BESS SF;
Wiesner & Schanding, 2013). Instead, additional factor analyses should be completed
beyond EFAs and CFAs, including a wider variety of statistical techniques. Specifically,
bifactor models are a theoretically ideal fit to psychological assessment tools such as the
BESS as they show support for an overall factor that can be used for identification of all
at-risk youth as well as providing more specific information regarding the particular areas
in which individual children may be in most need of intervention through the existence of
domain-specific factors (Chen et al., 2006; Naser et al., 2016; Wiesner & Schanding,
2013).
At this time, only one known study has been conducted applying bifactor
modeling to the BESS SF. Naser et al. (2016) utilized BESS SF scores obtained at the
beginning of the academic year from 893 African American fourth through eighth grade
students attending two urban public schools in a Southeastern state. The authors tested
three model types: unidimensional, multidimensional, and hierarchical, multidimensional.
Each of the multidimensional models utilized a factor structure based upon the four
BASC-2 SRP composite scales (Inattention/Hyperactivity, School Problems, Personal
Adjustment, and Internalizing Problems) from which the items were originally drawn.
The unidimensional model demonstrated a poor fit, while the fit of the nonhierarchical
multidimensional model was acceptable. However, the bifactor model representing a
hierarchical, multidimensional structure had the best fit out of the tested models. This
model consisted of an overall, general factor, which corresponds with the overall risk
score, and four orthogonal domain-specific factors, which correspond with the BASC-2
SRP scales from which the BESS SF items were drawn. These findings extend the
previous research on the factor structure of the BESS SF by applying advanced statistical
models that are more theoretically appropriate to evaluating the factor structure of
universal screeners than unidimensional and nonhierarchical multidimensional models
(Brunner et al., 2012; Chen et al., 2006; Naser et al., 2016; Reise, 2012; Wiesner &
Schanding, 2013). The next step is to look at the predictive validity of the overall and
domain-specific factors in order to determine their utility in predicting behavioral and
emotional difficulties in youth. In fact, behavioral predictions made using domain-
specific factor scores may provide more insight into the type of behavioral outcomes that
students with different risk presentations are vulnerable to, facilitating the selection of
appropriate prevention and intervention efforts.
Current Study
The BESS SF has proven to be a strong candidate for universal screening in
schools, allowing for the identification of youth in need of intervention to proactively
work towards prevention of negative behavioral outcomes. However, we need more
information on classification accuracy, especially within African American populations.
The majority of the previous research regarding the BESS SF has been conducted with
Latino and European American samples. Although a differential item functioning analysis
conducted by Harrell-Williams et al. (2015) did not find evidence of measurement bias
based on ethnicity, socioeconomic status, or language proficiency, a lack of differential
item functioning does not rule out differential predictive validity for different groups of
students (cf. Helms, 1992, 2006). Therefore, it is imperative that the BESS SF be fully
investigated with diverse populations as the decisions made based on it can greatly
impact youth short- and long-term outcomes. Additional studies examining the predictive
validity and classification accuracy of the BESS SF with African American populations
are greatly needed in order to determine its appropriateness as a screener with this
population. By using a similar sample to Naser et al. (2016), this study continues to
expand the research base by examining the predictive validity of the BESS SF for a
sample of African American youth.
The first aim of the current study was to examine the predictive validity of the
BESS SF overall risk score for behavioral outcomes via longitudinal associations. The
current study sought to replicate past studies by utilizing commonly used variables.
Consistent with the work of King et al. (2012) and Chin et al. (2013), this study examined
the ability of the BESS SF to predict student suspensions and absences. Based on prior
findings, it was hypothesized that students demonstrating higher risk scores on the BESS
would exhibit higher rates of absences and suspensions than those with lower risk scores.
In order to extend past research, the current study also examined the ability of the
BESS SF to predict student behavior based on citations for minor and major discipline
violations (Educational and Community Supports, 2016; Gion, McIntosh, & Horner,
2014) as well as citations for positive behavior. Prior research has only examined total
number of discipline referrals, which tells us nothing about the severity of the behavior
the child demonstrated. Depending on school policies, which can vary a great deal, an
ODR could be for anything from a uniform violation to fighting. The current study
sought to use more specific behavioral outcomes by examining the severity of student
behaviors and including both positive and negative behaviors. To do so, data gathered as
part of the school-wide positive behavioral intervention and supports system (SWPBIS)
was utilized. SWPBIS is intended to promote positive student outcomes and decrease
behavior problems through the emphasis of reinforcement for positive behaviors instead
of punishment for inappropriate behaviors (e.g., Bear, 2008; Positive Behavioral
Interventions and Supports [PBIS], 2016). Teachers recorded instances of positive
behaviors and violations of school rules electronically using Kickboard (Kickboard,
2016) with students receiving positive or negative points for specified behaviors (e.g.,
High Academic Achievement = positive points, Off Task = negative points). Students
were then able to “purchase” rewards with their earned points. Using these data had two
important advantages: 1) they were already being gathered, resulting in no additional burden
to teachers, and 2) they are clinically meaningful to the school as the behaviors represent
school-endorsed rules and values.
One way to group behavioral citations into meaningful categories is based upon
the severity of the demonstrated behavior (Educational and Community Supports, 2016;
Gion et al., 2014). Behavioral violations can be considered Major Discipline Citations
(Bullying/Taunting, Stealing, Lying, Willful Disobedience) or Minor Discipline Citations
(Off Task, Not Following Directions, Gossiping/Ribbing), with Major offenses being
generally consistent with behaviors for which students can be suspended according to the
statutes of the state in which the study was conducted (Child Trends & EMT Associates,
Inc., 2016). By using more specific behavior outcomes than simple counts of ODRs, the
BESS SF may gain more predictive power. It was hypothesized that students with higher
risk scores on the BESS SF would receive more Major and Minor Discipline Citations
than those with lower risk scores. As Major Discipline Citations are granted for more
extreme, less common behaviors than Minor Discipline Citations, Major Discipline
Citations are likely a more clinically meaningful indicator of negative behavioral
outcomes. Therefore, it was also predicted that the BESS SF would be a better predictor of
Major Discipline Citations than Minor Discipline Citations.
In addition to improving our understanding of problematic student behavior, it is
beneficial to examine the ability of the BESS SF to predict positive student behaviors
(Kaufman et al., 2010). As students are expected to engage in positive behaviors such as
achieving academically, participating in class, and being a leader, failing to exhibit
positive behaviors may represent another way to conceptualize risk. In these cases, youth
with lower BESS SF scores might engage in more positive behaviors while those with
elevated risk may engage in fewer positive behaviors. It was hypothesized that higher risk
scores on the BESS SF would be associated with lower rates of citations for Positive
Behavior than lower risk scores.
The second aim of the study was to examine classification accuracy by calculating
the sensitivity, specificity, positive predictive power, and negative predictive power of
the BESS SF. This study sought to utilize more “clinically significant” outcomes than
past studies by using cut points to indicate severity of problem behavior. As suspensions
are indicative of engaging in serious problematic behavior, a cut point of one was
selected for suspensions, which is consistent with prior research (Chin et al., 2013). For
absences, a cut point of 10 was selected as that is the point at which students become
ineligible for promotion to the next grade in the state in which the study was conducted
(Louisiana Department of Education, n.d.). It was hypothesized that measures of
classification accuracy would demonstrate improvements upon those obtained by King et
al. (2012) and Chin et al. (2013) due to the use of more clinically significant cut points.
Despite this improvement, the use of risk group status based on the overall risk T-score was
predicted to produce sensitivity and positive predictive power scores below acceptable
limits (e.g., 70%) due to problems with the lack of precision inherent in the overall score
on the BESS SF.
The third aim of the current study was to investigate the predictive utility of the
four-factor bifactor model developed by Naser et al. (2016). Although past research
found that the BESS SF overall risk score performed better at identifying youth who are
not at risk of poor behavioral outcomes than those who are (Chin et al., 2013; King et al.,
2012), it is possible that its predictive abilities may be enhanced through the use of its
underlying factor structure. The current study sought to examine the predictive
relationship between the domain-specific factors of the BESS SF (Internalizing Problems,
Inattention/Hyperactivity, School Problems, and Personal Adjustment) and student
outcomes over and above the general risk score produced by the BESS. It was
hypothesized that the BESS SF factors would predict behavioral outcomes above and
beyond what is predicted by the overall BESS SF score.
II. METHODS
Participants
Participants for this study were drawn from archival data collected at an urban
public charter school in a Southeastern state during the 2013 – 2014 academic year. Out
of the 447 students enrolled across kindergarten through eighth grade that school year,
97.3% identified as African American, 1.6% as Latino/Hispanic, 0.4% as Caucasian,
0.4% as Hawaiian/Pacific Islander, and 0.2% as Multi-Racial (New Orleans Parents’
Guide, 2014). The majority of the student body (94.2%) qualified for free or reduced
lunch. All students in fourth through eighth grade completed the BESS SF as part of the
school’s fall universal screening data collection. A total of 230 students (92% response rate)
completed the BESS SF. After removing participants with missing data (see
below for more information), 220 students were included in the final sample (52.273%
female, age 8 – 15, M = 11.430, SD = 1.666). Students from fourth through eighth grade
were represented in roughly equal proportions in the sample (20.909% in fourth grade, 17.273% in fifth,
19.091% in sixth, 20.909% in seventh, and 21.818% in eighth).
Procedure
The participating school provided archival data for use in this study. As the data
were de-identified and there was minimal risk to participants, the Institutional Review
Board of the sponsoring university deemed this study exempt from human subjects
review. Ethical standards endorsed by the American Psychological Association guided
the collection and handling of obtained data.
All students in fourth through eighth grade completed the BESS SF as part of the
school’s fall universal screening program in October 2013. Passive consent procedures
were utilized. Administration of the BESS SF occurred approximately two months into
the school year. Within one week after the initial administration, school representatives
attempted to administer the BESS SF to all students who were absent on the day of initial
administration; those who were unable to complete the measure within this timeframe
were not included in this study.
Students completed the measure in a group format during their enrichment period;
surveys were read aloud and students followed along. Student responses were recorded
on BESS SF Scantron forms, which were reviewed for completion and readability. These
forms were scored using BESS software, which produced an overall BESS SF risk score
for each student and scores for each individual item. Archival student records including
suspensions, absences, and student behavioral outcomes were provided at the conclusion
of the 2013 – 2014 school year.
Measures
BESS Student Form. The BESS SF (Kamphaus & Reynolds, 2007) is a 30-item
broadband screener of behavioral and emotional risk for youth populations. Items
representing four composite scales (six Inattention/Hyperactivity, six School Problems,
eight Personal Adjustment, and ten Internalizing Problems) were taken from the BASC-2
SRP to create the BESS SF (Reynolds & Kamphaus, 2004). Students respond to items on
a 4-point Likert scale ranging from 1 (never) through 4 (almost always) to indicate the
frequency of different behaviors and emotions. An overall risk score in the form of a T-
score is obtained that signifies overall behavioral and emotional risk status. Individuals
scoring at or below 60 are classified as being at “normal risk” for behavioral or emotional
difficulties. Those scoring between 61 and 70 are considered at elevated risk and those
scoring 71 or above are considered at extremely elevated risk.
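The T-score bands described above amount to a simple lookup. The following is a minimal sketch of that rule; the function name `bess_risk_level` and the label strings are hypothetical and are not part of the BESS scoring software.

```python
def bess_risk_level(t_score: int) -> str:
    """Map an overall BESS SF T-score to the risk band described in the text."""
    if t_score <= 60:
        return "normal"              # at or below 60
    if t_score <= 70:
        return "elevated"            # 61 through 70
    return "extremely elevated"      # 71 or above


# Boundary values taken from the cut points stated in the text
for score in (60, 61, 70, 71):
    print(score, bess_risk_level(score))
```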
Initial evaluations of the BESS SF conducted during test development found
strong internal consistency (split-half reliability = .90 - .93) and test-retest reliability (.80;
Kamphaus & Reynolds, 2007). The overall BESS SF risk score was strongly correlated in
the expected directions with BASC-2 and the Achenbach System of Empirically Based
Assessment – Youth Self Report (ASEBA YSR; Achenbach & Rescorla, 2001) scales
representing internalizing and externalizing concerns (Kamphaus & Reynolds, 2007).
Analysis revealed strong internal consistency (α = .878) of the BESS SF for the
current sample. Risk status groups (e.g., Normal, Elevated, and Extremely Elevated) were
determined based on T-scores as recommended by Kamphaus and Reynolds (2007), with
185 participants being classified as normal risk (84.091%), 28 classified as elevated
(12.727%), and 7 classified as extremely elevated (3.182%). The elevated and extremely
elevated categories were collapsed into one “At-Risk” group (n = 35; 15.909% of the
sample) following the procedure used by Dowdy et al. (2012) and King et al. (2012).
To investigate the underlying factor structure of the BESS SF, four domain-
specific factors were constructed that align with the BASC-2 composites from which the
items were drawn (Internalizing Problems, Inattention/Hyperactivity, School Problems,
Personal Adjustment; Kamphaus & Reynolds, 2007). These factors also align with the
structure identified by Naser et al. (2016). Specific factors were computed by summing
the items representing each composite. All factors demonstrated acceptable internal
consistency: Personal Adjustment (α = .744; possible score range = 8 – 32),
Inattention/Hyperactivity (α = .705; possible score range = 6 – 24), Internalizing
Problems (α = .816; possible score range = 10 – 40), School Problems (α = .811;
possible score range = 6 – 24). T-scores were computed individually for each factor for
use in regression analyses. Please see the discussion of Aim Three in the Results section
for additional discussion of the factor development process.
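As an illustration of the summed factor scores and internal consistency estimates described above, the sketch below computes raw sum scores, the possible score range for a factor with a given number of items on the 1 to 4 response scale, and coefficient alpha using the standard formula. All function names and the toy responses are hypothetical, and reverse-scoring and T-score conversion are not shown because the text does not specify those steps here.

```python
from statistics import variance


def sum_scores(responses):
    """Sum each student's item responses into one raw factor score."""
    return [sum(row) for row in responses]


def possible_range(n_items, low=1, high=4):
    """Minimum and maximum raw score for n_items rated from low to high."""
    return (n_items * low, n_items * high)


def cronbach_alpha(responses):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    n_items = len(responses[0])
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    total_variance = variance(sum_scores(responses))
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: four students (rows) by three items (columns)
toy = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 3, 4]]
print(sum_scores(toy))
print(possible_range(6))   # six items rated 1 to 4
print(round(cronbach_alpha(toy), 3))
```

The `possible_range` helper reproduces the score ranges listed above for each factor, for example 6 to 24 for a six-item factor.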
Behavioral outcome variables. Behavioral outcome variables included
suspensions, absences, and behavioral indicators derived from the school-wide positive
behavioral intervention and supports system (SWPBIS) employed by the study school. As
the goal of this study was to predict longitudinal behavior outcomes, only outcome data
representing quarters two through four were used in analyses, thereby restricting
behavioral outcomes to those occurring after BESS SF administration.
Suspensions. Suspensions reflect out-of-school suspensions. They were utilized
in three ways: two continuous variables were computed to represent total number of
suspensions (Number of Suspensions) and total days of suspensions (Days Suspended); a
dichotomous variable was created to represent students with no suspensions and students
with one or more.
Absences. Absences were utilized in two ways: a continuous variable was
computed to represent total number of absences and a dichotomous variable was
computed to represent students with excessive absences (10+) and students with non-
excessive absences (less than 10). Students who miss more than 10 days of school are
considered ineligible for promotion to the next grade (Louisiana Department of
Education, n.d.).
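The dichotomous outcome variables described above can be sketched as simple threshold rules. This is an illustration under stated assumptions: the function names are hypothetical, and the absence rule follows the "10+" versus "less than 10" coding given in the text.

```python
def has_suspension(number_of_suspensions: int) -> bool:
    """True for students with one or more out-of-school suspensions."""
    return number_of_suspensions >= 1


def has_excessive_absences(days_absent: int, cut_point: int = 10) -> bool:
    """True for students at or above the cut point (10+ absences)."""
    return days_absent >= cut_point


# Hypothetical student records: (number of suspensions, days absent)
records = [(0, 3), (1, 9), (0, 10), (2, 14)]
for suspensions, absences in records:
    print(has_suspension(suspensions), has_excessive_absences(absences))
```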
Specific Behavioral Outcomes. Specific behavioral outcomes were derived from
the Kickboard (Kickboard, 2016) electronic recording system as part of the school-wide
positive behavioral intervention and supports system (SWPBIS). Kickboard allows
teachers and administrators to instantly award students points for positive behavior and
subtract points for inappropriate behavior. The system tracks students’ total points,
allowing students to earn rewards over time with the intention of improving overall
student behavior.
Teachers recorded student behavior on Kickboard each day. A total of 57
behaviors were tracked by the study school, representing both appropriate (e.g., Doing
More Than Asked, High Academic Achievement) and inappropriate behaviors (e.g.,
Causing Distractions/Disturbances, No Homework, Throwing). The 57 behaviors were
categorized into Positive Behavior as well as Minor and Major Discipline Citations based
on the classification system that has been incorporated into the School-Wide Information
System Suite (SWIS), an electronic system designed by Horner and colleagues at the
University of Oregon to assist schools in implementing positive behavioral interventions
(see Appendix for categorizations; Educational and Community Supports, 2016). Major
Discipline Citations (e.g., Bullying/Taunting, Stealing, Lying, Willful Disobedience)
were generally consistent with suspendable offenses according to the statutes of the state
in which the study was conducted (Child Trends & EMT Associates, Inc., 2016;
Educational and Community Supports, 2016; Gion et al., 2014). Minor Discipline
Citations (e.g., Off Task, Not Following Directions, Gossiping/Ribbing) generally related
to disrespect, disruption, and dress code violations.
Behaviors tracked on Kickboard were examined for fit with Major versus Minor
categories and a newly created category for Positive Behavior. Two variables were
removed because they did not represent student behavior (Parental Involvement, Signed
Paycheck), one variable because it was not defined (LTS), and one because the school did
not start tracking this variable until the fourth quarter (Kindness, Empathy, and Respect).
The three behavior categories are described below. Descriptions do not include internal
consistency estimates because these behaviors represent discrete events that are not
assessing a specific construct (e.g., Gray, Litz, Hsu, & Lombardo, 2004; Netland, 2001).
Major Discipline Citations. Major Discipline Citations includes 13 behaviors that
demonstrate severe behavioral violations (e.g., Bullying/Taunting, Stealing, Lying,
Willful Disobedience). Instances of the 13 behaviors during quarters two through four
were summed to represent frequency of Major Discipline Citations; total scores were
converted to z-scores for use in analyses.
Minor Discipline Citations. Minor Discipline Citations includes 29 behaviors that
demonstrate minor behavioral violations (e.g., Off Task, Not Following Directions,
Gossiping/Ribbing). Instances of the 29 behaviors during quarters two through four were
summed to represent frequency of Minor Discipline Citations; total scores were
converted to z-scores for use in analyses.
Positive Behaviors. Positive Behaviors includes 11 behaviors representing
positive school values (e.g., High Academic Achievement, Exemplary Effort, High
Enthusiasm). Instances of the 11 behaviors during quarters two through four were
summed to represent frequency of Positive Behaviors; total scores were converted to z-
scores for use in analyses.
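The scoring of the three behavior categories (summing instances within a category, then standardizing the totals) can be sketched as below. The counts are hypothetical, and the use of the sample standard deviation is an assumption, as the text does not specify how the z-scores were computed.

```python
from statistics import mean, stdev

# Hypothetical Kickboard counts per student for quarters two through four.
counts = {
    "student_a": {"Stealing": 1, "Lying": 1},
    "student_b": {"Stealing": 2, "Lying": 2},
    "student_c": {"Stealing": 3, "Lying": 3},
}

def category_total(behavior_counts):
    """Sum instances across all behaviors in one category for one student."""
    return sum(behavior_counts.values())

def z_scores(values):
    """Standardize totals using the sample mean and sample SD (assumed)."""
    m, sd = mean(values), stdev(values)
    return [(v - m) / sd for v in values]

totals = [category_total(c) for c in counts.values()]
major_z = z_scores(totals)
```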
III. RESULTS
Data Screening
Data screening was completed prior to statistical analyses in order to assess the
overall accuracy of the data and results (Tabachnick & Fidell, 2007). In terms of missing
data, screening revealed that ten students lacked attendance data (4.348%), and eight of
these students also lacked at least one quarter of Kickboard data, suggesting that they did
not complete the school year at the study school. As this represented less than 5% of
cases, a decision was made to remove these cases from the sample (Tabachnick & Fidell,
2007). An examination of the BESS data revealed that 20 students (9.091%) were
missing at least one item score, with eight of those cases missing two item scores; there
was no obvious pattern to the missing data. The BESS scoring system was used to
compute overall T-scores and risk group status prior to estimation of missing data as the
software is capable of doing so with up to two missing scores (Kamphaus & Reynolds,
2007). To compute factor scores, mean substitution for each missing BESS item was
chosen to estimate the missing data (Tabachnick & Fidell, 2007), allowing for the
calculation of domain-specific factor scores with no missing items.
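A minimal sketch of item-level mean substitution follows. It assumes that each missing item score is replaced with the mean of the observed scores on that item across the sample; the text does not state which mean was used, and the data shown are hypothetical.

```python
from statistics import mean

def mean_substitute(rows):
    """Replace None entries with the mean of observed scores for that item."""
    n_items = len(rows[0])
    item_means = [
        mean(r[i] for r in rows if r[i] is not None) for i in range(n_items)
    ]
    return [
        [item_means[i] if r[i] is None else r[i] for i in range(n_items)]
        for r in rows
    ]

# Hypothetical item responses; None marks a missing item score.
raw = [[1, None], [3, 4], [2, 2]]
filled = mean_substitute(raw)
```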
Outliers (e.g., values over three standard deviations from the mean) were identified
for composite variables (e.g., BESS scores, behavioral outcomes). Outlier cases were
maintained in the data set through the use of winsorizing procedures (Field, 2013).
Specifically, values over three standard deviations from the mean were replaced with the
next highest obtained value. After two rounds of winsorizing, no variables exhibited
scores farther than three standard deviations from the mean. Despite these efforts, there
continued to be some evidence that some cases were demonstrating stronger than
expected influence over the predictive values (e.g., high leverage, high Mahalanobis
distances; Field, 2013). As the goal of this study is to investigate the predictions of
students who are at-risk for behavioral outcomes, including those with high levels of risk
and extreme behavioral outcomes, the decision was made to not delete any further
participants from the data set.
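The winsorizing procedure described above can be sketched as follows. This is an illustration rather than the exact routine used: the symmetric handling of low outliers and the use of the sample standard deviation are assumptions, and the values are hypothetical.

```python
from statistics import mean, stdev

def winsorize_once(values, n_sd=3.0):
    """Replace values beyond n_sd SDs with the nearest non-outlying value."""
    m, sd = mean(values), stdev(values)
    low, high = m - n_sd * sd, m + n_sd * sd
    inside = [v for v in values if low <= v <= high]
    floor, ceiling = min(inside), max(inside)
    return [ceiling if v > high else floor if v < low else v for v in values]

def winsorize(values, n_sd=3.0, max_rounds=2):
    """Repeat winsorizing until no outliers remain or max_rounds is reached."""
    for _ in range(max_rounds):
        new = winsorize_once(values, n_sd)
        if new == values:
            break
        values = new
    return values

# Hypothetical scores with one extreme value (100), not study data.
scores = [10, 11, 12, 9, 10, 11, 10, 12, 9, 10,
          11, 10, 9, 12, 10, 11, 10, 9, 11, 100]
cleaned = winsorize(scores)
```

In this example the extreme value is pulled in to 12, the highest value within three standard deviations of the mean, and the case is retained rather than deleted.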
Descriptive Analyses
Means, standard deviations, and the observed range for continuous variables are
presented in Table 1. Although several variables demonstrated significant skew and/or
kurtosis (p < .05), the impact of skewness and kurtosis is reduced in sample sizes over
200 (Tabachnick & Fidell, 2007) and therefore, no corrections were made.
During quarters two through four, participants were absent on average 7.709 days
(SD = 5.639) with 32.727% of students being reported as absent from school 10 days or
more in this time period. In comparison, 15.9% of fifth through eighth grade students in
public schools within the city were reported to have missed 10 or more days of school
during the 2013 – 2014 school year (Sims & Vaughn, 2014). Participants averaged 0.982
suspensions (SD = 1.567) for an average of 1.864 days total (SD = 3.136). Out of all
participants, 39.545% were suspended on at least one occasion during this time period.
On average, schools within the city suspended 9.998% of students within the full school
year (New Orleans Parents’ Guide, 2015).
Correlation analyses were completed in order to assess the relationship between
demographic variables, predictors, and outcome variables (see Table 2). With respect to
demographic variables, gender was significantly correlated with Minor Discipline
Citations, Major Discipline Citations, and Positive Behaviors. Age was significantly
positively correlated with School Problems, Absences, Number of Suspensions, Days
Suspended, Minor Discipline Citations, Major Discipline Citations, and Positive
Behaviors. Based on these analyses and the findings of Kaufman et al. (2010) that gender
and age are strongly associated with school disciplinary experiences, gender and age were
controlled for in regression analyses.
Aim One: Predictive Validity of the BESS SF Overall Risk Score
Two types of analyses were conducted to assess the relationship between the
BESS SF overall risk score and behavioral outcomes. First, correlation analyses were
completed to assess associations between the BESS SF overall risk score and outcome
variables. Next, the predictive ability of the BESS SF overall risk score was assessed
through the use of linear regressions, controlling for gender and age.
The BESS overall risk score was positively correlated with Number of
Suspensions and Days Suspended and negatively correlated with Positive Behaviors (see
Table 2). No significant association was found between the BESS overall risk score and
Absences, Minor Discipline Citations, or Major Discipline Citations.
Next, linear regressions were completed to assess the predictive power of the
BESS SF overall risk score. Due to past findings about the influence of age and gender
on behavioral outcomes in schools (Kaufman et al., 2010) and their significant correlation
with many of the outcome variables (see Table 2), age and gender were controlled for in
all linear regressions. See Tables 3 through 8 for detailed results.
Consistent with correlational results, the BESS SF overall risk score was not
found to significantly predict Absences (F [3, 216] = 2.422, p > .05; see Table 3) or
Minor Discipline Citations (F [3, 216] = 23.718, p < .001; β = .097, p > .05; see Table
4) after controlling for age and gender. However, the BESS SF overall risk score was
found to be a significant predictor of Number of Suspensions (F [3, 216] = 14.893, p <
.001; see Table 5), Days Suspended (F [3, 216] = 13.525, p < .001; see Table 6), Major
Discipline Citations (F [3, 216] = 18.602, p < .001; see Table 7), and Positive Behaviors (F [3,
216] = 6.814, p < .001; see Table 8) after controlling for age and gender. As the overall
risk score on the BESS SF increased, participants were suspended more frequently and
for longer periods of time during quarters two through four, providing support for the
hypothesis that the BESS SF overall risk score predicts suspensions. The hypotheses that
the overall risk score would predict Absences and Minor Discipline Citations were not
supported.
Aim Two: Classification Accuracy of the BESS SF
To assess the ability of the BESS SF to predict whether or not students will
exhibit problematic behavioral outcomes, indices of classification accuracy were
calculated using the two risk groups (Normal and Elevated) based on the BESS SF
overall risk score as the predictor variable and the dichotomized Absences and
Suspensions variables as outcome variables (see Table 9). Logistic regressions were also
conducted to further examine the ability of the BESS SF overall risk score to predict
whether or not participants exhibited excessive absences or suspensions, supplementing
the classification accuracy analyses, which require the use of a categorical rather than a
continuous predictor. Age and gender were controlled for in both logistic regressions.
Absences. For Absences, 82.432% of participants who did not demonstrate
excessive absences were identified as being at normal risk at the beginning of the year
(specificity). Similarly, 65.946% of those students who were identified as being at normal
risk at the beginning of the year did not demonstrate excessive absences (negative
predictive power). In contrast, only 12.500% of those students who demonstrated
excessive absences were identified as being at elevated risk at the beginning of the year
(sensitivity). Furthermore, only 25.714% of those students who were identified as being
at elevated risk went on to demonstrate excessive absences (positive predictive power).
Similar to previous studies (Chin et al., 2013; King et al., 2012), the BESS SF was better
at predicting students who did not demonstrate negative outcomes than predicting those
who did demonstrate negative outcomes. The use of excessive versus non-excessive
absences as an indicator of negative behavioral outcomes did not demonstrate a
meaningful improvement in the classification accuracy of the BESS SF.
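The four classification accuracy indices reported for Absences can be reproduced from a 2 × 2 table, as sketched below. The cell counts are not reported in the text; they are back-calculated here from the reported percentages and the sample of 220 participants, so they should be treated as an inference rather than as the author's tabled values.

```python
def classification_indices(tp, fp, fn, tn):
    """Classification accuracy indices (as percentages) from a 2 x 2 table."""
    return {
        "sensitivity": 100 * tp / (tp + fn),  # at-risk among those with outcome
        "specificity": 100 * tn / (tn + fp),  # normal risk among those without
        "ppv": 100 * tp / (tp + fp),          # outcome among those at elevated risk
        "npv": 100 * tn / (tn + fn),          # no outcome among those at normal risk
    }

# Back-calculated (inferred) counts for excessive absences, N = 220:
# tp = elevated risk and excessive absences, tn = normal risk and non-excessive.
absences = classification_indices(tp=9, fp=26, fn=63, tn=122)
```

With these inferred counts, the sketch yields a sensitivity of 12.500%, a specificity of approximately 82.432%, positive predictive power of approximately 25.714%, and negative predictive power of approximately 65.946%, matching the values reported above.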
Logistic regression revealed that the Overall BESS SF risk score did not
significantly predict group membership based on Absences (χ2 = 5.739, p > .05) when
controlling for age and gender. For students with non-excessive absence rates, 99.324%
were correctly classified in contrast to only 4.167% of those with excessive absences,
representing an overall classification accuracy of 68.182%. This is consistent with
classification accuracy statistics in previous research and the current study, which found
that the BESS SF is better at predicting who will not have problematic outcomes than
who will have problematic outcomes.
Suspensions. For Suspensions, 85.714% of participants who were not suspended
were identified as being at normal risk at the beginning of the year (specificity).
Similarly, 61.622% of those students who were identified as being at normal risk at the
beginning of the year were not suspended (negative predictive power). In contrast, just
18.391% of those students who were suspended were identified as being at elevated risk
at the beginning of the year (sensitivity). Furthermore, only 45.714% of those students
who were identified as being at elevated risk went on to be suspended (positive
predictive power). The BESS SF was better at predicting students who did not
demonstrate negative outcomes than predicting those who did demonstrate negative
outcomes.
Using logistic regression, the Overall BESS SF risk score was able to significantly
predict group membership based on Suspensions (χ2 = 29.736, p < .001, R2 = .171
[Nagelkerke]) when controlling for age and gender. Students with higher overall risk
scores were more likely to have been suspended than those with lower overall risk scores
(β = .035, Wald p < .05). The classification table revealed that although overall risk
predicted those who received zero suspensions at a rate of 86.466%, only 42.529% of
those who were suspended were correctly classified. Overall, this model accurately
predicted group membership for 69.091% of participants. Although the overall BESS SF
risk score significantly predicted group membership for suspensions, supporting the
hypothesis regarding the classification accuracy of the BESS, further analysis revealed
that the BESS SF is better at predicting who will not be suspended than who will
be suspended, consistent with the results found above.
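The overall classification rates from the two logistic regressions can be understood as weighted averages of the group-specific rates, which is why a model can classify roughly two thirds of participants correctly while identifying few of the students who experienced the outcome. The sketch below illustrates this arithmetic. The group sizes (148 and 72 for Absences; 133 and 87 for Suspensions) are back-calculated from the reported percentages and N = 220, and are therefore an inference.

```python
def overall_accuracy(n_neg, pct_neg_correct, n_pos, pct_pos_correct):
    """Overall percent correct as a size-weighted average of group rates."""
    correct = n_neg * pct_neg_correct / 100 + n_pos * pct_pos_correct / 100
    return 100 * correct / (n_neg + n_pos)

# Group sizes are inferred from the reported percentages, not tabled values.
absences_acc = overall_accuracy(148, 99.324, 72, 4.167)
suspensions_acc = overall_accuracy(133, 86.466, 87, 42.529)
```

Under these assumptions the sketch returns approximately 68.182% for Absences and 69.091% for Suspensions, in line with the overall rates reported above; in both cases the figure is driven largely by the larger group of students without the outcome.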
Aim Three: Predictive Utility of the Four-Factor Bifactor Model of the BESS SF
A confirmatory factor analysis was run using AMOS 18 in order to assess the fit
of the bifactor model of the BESS SF identified by Naser et al. (2016) to the current data
set. This was done prior to the exclusion of 10 cases that were missing absence and
suspension data (see above), resulting in the inclusion of 230 rather than 220 participants.
This model consisted of an overall factor and four orthogonal domain-specific factors
consistent with the BASC-2 composites (Internalizing Problems,
Inattention/Hyperactivity, School Problems, Personal Adjustment). Although the factor
loadings were generally consistent with Naser et al. (2016; see Table 10), the model did
not demonstrate acceptable fit to the current data, χ2 (376) = 787.841, p < .001, RMSEA
= .141, 90% CI = (.136, .147), pclose < .001, CFI = .000, and TLI = .000.
The unacceptable fit of the bifactor model precludes testing Aim 3 as proposed. It
is not appropriate to include the overall risk score and domain-specific factors in the same
linear regression due to multicollinearity. Instead, the decision was made to examine the
predictive ability of the domain-specific factors separately from the overall risk score. To
do this, four domain-specific factors were constructed that align with the BASC-2
composites from which the items were drawn (Internalizing Problems,
Inattention/Hyperactivity, School Problems, Personal Adjustment; Kamphaus &
Reynolds, 2007). These factors also align with the structure identified by Naser et al.
(2016). As described in the Methods section, specific factors were computed by summing
the items representing each composite. All factors demonstrated acceptable internal
consistency: Personal Adjustment (α = .744; possible score range = 8 – 32),
Inattention/Hyperactivity (α = .705; possible score range = 6 – 24), Internalizing
Problems (α = .816; possible score range = 10 – 40), School Problems (α = .811;
possible score range = 6 – 24). T-scores were computed individually for each factor for
use in regression analyses.
Linear regressions were completed to assess the predictive power of the BESS SF
domain-specific factors. Due to past findings about the influence of age and gender on
behavioral outcomes in schools (Kaufman et al., 2010) and their significant correlation
with many of the outcome variables (see Table 2), age and gender were controlled for in
all linear regressions.
Absences. The model with all four domain-specific factors was not found to
significantly predict Absences (F [6, 213] = 1.508, p > .05; see Table 11) after
controlling for age and gender.
Suspensions. The model with all four domain-specific factors was found to be a
significant predictor of Number of Suspensions (F [6, 213] = 10.005, p < .001; see Table
12) and Days Suspended (F [6, 213] = 8.743, p < .001; see Table 13), after controlling
for age and gender. Results indicated that the significant variance accounted for in the
model for both Suspension variables was due to Inattention/Hyperactivity (Number β =
.285, p < .001 and Days β = .270, p < .001). None of the other domain-specific factors
accounted for a significant amount of variance in the model. As the t-score on the
Inattention/Hyperactivity factor increased, participants were suspended more frequently
and for longer periods of time during quarters two through four.
Minor Discipline Citations. The model with all four domain-specific factors was
found to be a significant predictor of Minor Discipline Citations (F [6, 213] = 16.540, p
< .001; see Table 14) after controlling for age and gender. Results indicated that the
significant variance accounted for in the model was due to Inattention/Hyperactivity (β =
.263, p < .001). None of the other domain-specific factors accounted for a significant
amount of variance in the model. As the t-score on the Inattention/Hyperactivity factor
increased, participants received more Minor Discipline Citations.
Major Discipline Citations. The model with all four domain-specific factors was
found to be a significant predictor of Major Discipline Citations (F [3, 216] = 12.199, p <
.001; see Table 15) when controlling for age and gender. Results indicated that the
significant variance accounted for in the model was due to Inattention/Hyperactivity (β =
.270, p < .001). None of the other domain-specific factors accounted for a significant
amount of variance in the model. As the t-score on the Inattention/Hyperactivity factor
increased, participants received more Major Discipline Citations.
Positive Behaviors. The model with all four domain-specific factors was found to
be a significant predictor of Positive Behaviors (F [3, 216] = 5.513, p < .001; see Table
16) when controlling for age and gender. Results indicated that the significant variance
accounted for in the model was due to Inattention/Hyperactivity (β = -.201, p < .01) and
School Problems (β = -.170, p < .05). As the t-scores on Inattention/Hyperactivity and
School Problems increased, participants received fewer citations for Positive Behaviors.
IV. DISCUSSION
As schools strive to improve proactive identification and intervention with youth
at risk for negative behavioral outcomes, it is imperative that they have access to
validated universal screeners that fit their population needs. The current study sought to
assess the predictive validity and classification accuracy of the BESS SF overall risk
score within a school serving a largely African American student population.
Additionally, this study sought to identify how improving the specificity of outcome
variables impacted the overall predictive validity of the BESS SF. Finally, this study
sought to investigate the ability of the BESS SF factors to predict behavioral outcomes
above and beyond what is predicted by the overall BESS SF score through the application
of the bifactor model identified by Naser et al. (2016).
Despite the high rates of poverty at the school of interest and known connections
between living in poor, urban environments and high rates of life stressors and traumatic
experiences (New Orleans Parents’ Guide, 2014; Overstreet & Mazza, 2003), the number
of students identified as at-risk in this study is generally on par with what is expected
using a three-tiered model of socioemotional functioning (e.g., Splett, Fowler, Weist,
McDaniel, & Dvorsky, 2013). Based on these models, it is expected that 80% of students
in any given school exhibit normal levels of risk, 15% are at-risk and/or exhibiting low
levels of problematic behavior, and 5% exhibit significant behavioral problems. In
comparison, in the current study 84.091% were classified as normal risk, 12.727% were
classified as elevated risk, and 3.182% were classified as extremely elevated risk on the
BESS SF. These relatively normative rates of students at risk for negative behavioral and
emotional outcomes may be indicative of high levels of resilience in the face of stress
amongst these students. However, students’ self-reported resilience does not seem to be
reflected in their teachers’ responses to them in the school environment. The high rates of
suspension and citations for major and minor behaviors indicated that the school
environment may be particularly “reactive” as evidenced by the high rates of disciplinary
action. This may be due to implicit bias of the teachers in the interpretation of the
behavior of their students as African American students frequently receive
disproportionate rates of disciplinary actions even for similar actions (Gregory, Skiba, &
Noguera, 2010; Skiba, Michael, Nardo, & Peterson, 2002; Skiba et al., 2011; U.S.
Department of Education, 2016). At a system-wide level, emphasis on rules and
discipline in school policies may create an environment where teachers feel required to
issue high levels of disciplinary citations as part of administrator efforts to promote
student behavior through development of strict rules (e.g., American Psychological
Association [APA] Zero Tolerance Policy Task Force, 2008; Bear, 2008; Fenning &
Rose, 2007). Therefore, behavioral outcomes may have more to do with the adults in the
environment and their perception of and reaction to behavior than with the student’s own
reported risk. Nevertheless, student reported risk was predictive of some important
outcomes.
There was some support for the predictive validity of the BESS SF overall risk
score and behavioral outcomes as demonstrated via longitudinal associations. Consistent
with the work of King et al. (2012) and Chin et al. (2013), the BESS SF overall risk score
obtained at the beginning of the school year significantly predicted the number of times
that students were suspended throughout the rest of the school year. Building on past
research, the BESS SF was also found to predict the total number of days that students
were suspended. Students with higher risk were suspended more frequently and for more
days than students with lower risk, supporting the study hypothesis and providing
evidence of the ability of the BESS SF to predict negative student outcomes. Providing
evidence in support of expanding conceptualizations of behavioral outcomes used in
validation studies, students scoring higher on the BESS SF were found to receive more
Major Discipline Citations than those with low levels of risk. As indicated by their name,
Major Discipline Citations represent severe behaviors that are consistent with
suspendable offenses (Child Trends & EMT Associates Inc., 2016; Educational and
Community Supports, 2016; Gion et al., 2014). Although hypervigilance may play a part
in increasing the overall number of suspensions and Major Discipline Citations within
this population, the severity of many of the behaviors necessary to warrant these actions
(e.g., Bullying/Taunting, Defacing School Property, Stealing) indicates that students are
likely exhibiting behaviors that warrant concern. The serious nature of these infractions is
associated with difficulties with behavioral and emotional control even if less severe
disciplinary actions may have been appropriate, allowing for the prediction of these
behaviors based on overall risk. Together, these results support study hypotheses and
provide evidence that the BESS SF is able to predict “serious” and clinically meaningful
negative behavioral outcomes, including suspensions and major violations of school
behavioral expectations.
Consistent with this finding and in support of the study hypothesis, overall risk
was negatively associated with Positive Behaviors. This paints a picture of students who
are cited for negative behaviors while receiving less recognition for positive
behaviors. It is possible that children exhibiting inappropriate behaviors are not likely to
receive as much positive attention and recognition from their teachers. Instead, they are
disciplined which may, in turn, cause them to disengage further from school, resulting in
a reciprocal relationship between disciplinary actions and engagement (Wang &
Fredricks, 2014). As they become more disengaged from school, they may be less prone
to exhibiting positive behaviors, falling into a dangerous cycle of problematic behavior,
discipline, and disengagement from school.
Although these data provide support for the predictive validity of the BESS SF
overall risk score, examinations of classification accuracy call its effectiveness at
predicting negative behavioral outcomes into question. Similar to past research (e.g.,
King et al., 2012; Chin et al., 2013), the BESS SF was a better predictor of which
students were not at risk for negative behavioral outcomes than of which students were at
risk for such outcomes. This result was consistent across classification accuracy statistics
and logistic regressions. Although the BESS SF overall risk score had better positive
predictive power than in Chin et al. (2013) with respect to suspensions (Current:
45.714%, Chin: 14.252%), results in the current study demonstrated lower sensitivity and
negative predictive power than found by Chin and colleagues (Current: 18.391% and
61.622%, Chin: 32.485% and 94.300%). Specificity was almost identical for suspensions
in both studies (Current: 85.714%, Chin: 85.107%). These results do not support the
utility of the BESS SF at identifying at-risk youth within this population.
With the exception of specificity, the classification accuracy statistics for
suspensions fall below the recommended standards of 70% to 80% classification
accuracy for screening instruments (American Academy of Pediatrics, 2012; Glover &
Albers, 2007), indicating problems with the use of the BESS SF as a clinical measure of
negative behavioral outcomes in youth despite the predictive ability of the overall BESS
SF score. High suspension rates in the current study may have decreased the
meaningfulness of suspensions as an indicator of negative behavioral outcomes as it
failed to represent a clinically meaningful cut-off point (Glaros & Kline, 1988; Streiner,
2003). In contrast to Chin et al. (2013) who reported that 7.080% of participants were
suspended one or more times, 39.545% of students participating in the current study were
suspended at least once. Due to the high rate of suspensions within the population, a
higher cut-off than one suspension may be necessary to distinguish at-risk students from
those who are not at-risk.
Although hypervigilance towards behavior may not have inhibited the prediction
of severe disciplinary actions due to the serious nature of the infractions and associated
difficulty with behavioral and emotional control, the same may not be true for more
minor disciplinary concerns. This is reflected in the inability of the BESS SF to predict
Minor Discipline Citations which indicates that some factor or factors outside of the child
may better account for these disciplinary actions. One way that student control over
outcomes may be restricted is through teacher hypervigilance to minor behaviors,
resulting in the issuance of citations based on low thresholds of inappropriate behavior
and/or misinterpretation of behaviors as problematic. It is possible that implicit bias on
behalf of teachers played a role in the high rates of disciplinary actions. Past research has
found that African American students are disciplined at higher rates than their peers, even
for similar actions (Gregory et al., 2010; Skiba et al., 2002; Skiba et al., 2011; U.S.
Department of Education, 2016). Unlike the severe behaviors resulting in suspensions
and Major Discipline Citations, those resulting in Minor Discipline Citations tend to be
more normative rule violations (e.g., Causing Distractions/Disturbances, Gum Chewing,
Running in Hallway/Stairway) rather than true indicators of risk without knowing the
duration and intensity of these events. Using Off Task as an example, a student who is
given a citation for being off task for briefly daydreaming who still managed to complete
his/her work demonstrates a different severity of behavior from a student who received a
citation for being off task who was unfocused for long periods of time, resulting in work
incompletion. If a teacher immediately cites both students, the teacher’s hypervigilance to
the behavior may obscure the utility of the citation as an indicator of risk. If a student is
cited as Not Following Directions for being out of his/her seat without the teacher
recognizing that he/she was retrieving a dropped pencil, this behavior is misinterpreted as
problematic when it is not. It is also important to consider the larger system in which the
teachers are operating due to the influence of school policies on the behavior of
individual teachers (Bronfenbrenner, 1977, 1986; Fenning & Rose, 2007; Foreman &
Zins, 2008; Nastasi, Moore, & Varjas, 2004). School-wide emphasis on strict behavioral
codes may result in hostile school environments with high rates of discipline citations
such as seen here (e.g., APA Zero Tolerance Policy Task Force, 2008; Bear, 2008).
Additionally, some of the included minor behaviors may be more indicative of
life circumstances outside of the control of students that commonly impact youth from
low socioeconomic status backgrounds (Overstreet & Mazza, 2003). For example, a
student may be out of uniform due to lack of funds to complete laundry that week or
sleeping during class because noise in their neighborhood kept them up the night before.
Rather than looking solely at the risk associated with factors internal to students, it is
important to consider how external factors including teacher interpretation of behavior
through a lens of implicit bias, school policies, and socioeconomic status (SES) influence
student behavior.
A lack of student control in determining school attendance may also explain the
fact that contrary to past research (King et al., 2012), overall risk did not predict school
absences despite the fact that 32.727% of students were absent from school for 10 or more
days during quarters two through four. In comparison, 15.9% of fifth through
eighth grade students attending public schools within the same city missed 10 or more
days during the entire school year (Sims & Vaughn, 2014). Rather than representing
behavioral risks specific to the child, the high absence rate may be influenced by factors
outside of their control, such as factors related to SES (Chang & Romero, 2008;
Morrissey, Hutchison, & Winsler, 2014). Children growing up in low SES families may
have more difficulty attending school regularly due to several concerns, including
housing instability and neighborhood safety. Additionally, many low-SES families do not
have access to reliable transportation. In cities where neighborhood-based schools have
been replaced with charter systems, as has happened in the city where this study was
conducted (Kamenetz, 2014), families can experience problems with school access due to
transportation issues. For example, if something happens to their method of
transportation (e.g., car breaks down, bus arrives early), students may have to miss a day
of school due to lack of alternate means to get them there. Therefore, absences may mean
something different in the current study than in King et al. (2012); rather than
representing attendance problems related to self-reported risk (e.g., skipping), absences
for this population may reflect the influence of factors outside of student control such as
SES and associated difficulties.
Examinations of classification accuracy were consistent in that excessive
absences (10+) versus non-excessive absences were not found to be a clinically
meaningful indicator of behavioral risk, reaching an acceptable standard for specificity
only (82.432%). Although absences had better specificity than ODRs in King et al.
(2012; Current: 85.714%, King: 73.184%) and better positive predictive power than past
examinations of suspensions (Current: 25.714%, Chin et al., 2013: 14.252%), it was
weaker on all other previously gathered classification accuracy measures. Due to the high
rate of absences within the population, a higher cut-off than 10 absences may be
necessary to distinguish students at risk for behavioral outcomes from those who are not
at risk. Another possible explanation for the differential findings is the
operationalization of absences/attendance in the current study versus King et al. (2012). In
the current study, the variable of interest was number of absences, looking specifically at
quarters two through four, while King et al. (2012) looked at the percentage of days that
students attended school on-time, including tardies. This difference in measurement may
have changed the nature of the outcome variable from what was employed in the current
study.
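The classification accuracy indices discussed above can be illustrated with a minimal sketch. The cell counts below are hypothetical and are not taken from the current study's data; the class and function names are likewise illustrative assumptions. The sketch only shows how sensitivity, specificity, and positive and negative predictive power are derived from a two-by-two table of screening result by outcome.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 screening table (all counts here are hypothetical)."""
    true_pos: int   # screened at risk, had the outcome
    false_pos: int  # screened at risk, did not have the outcome
    false_neg: int  # screened not at risk, had the outcome
    true_neg: int   # screened not at risk, did not have the outcome


def classification_accuracy(c: ConfusionCounts) -> Dict[str, float]:
    """Return the four classification accuracy indices as proportions."""
    return {
        # Of students with the outcome, the share the screener flagged.
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        # Of students without the outcome, the share the screener cleared.
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        # Of flagged students, the share who had the outcome.
        "positive_predictive_power": c.true_pos / (c.true_pos + c.false_pos),
        # Of cleared students, the share who did not have the outcome.
        "negative_predictive_power": c.true_neg / (c.true_neg + c.false_neg),
    }


# Hypothetical counts chosen for illustration only.
example = ConfusionCounts(true_pos=10, false_pos=25, false_neg=60, true_neg=125)
indices = classification_accuracy(example)
for name, value in indices.items():
    print(f"{name}: {value:.1%}")
```

With counts of this shape, specificity can exceed 80% while sensitivity remains very low, which is the pattern of a measure that is better at identifying who will not show an outcome than who will.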
In summary, although linear regressions indicate that the BESS SF is able to
predict engagement in severe inappropriate behaviors (e.g., Suspensions, Major
Discipline Citations), further investigation into its classification accuracy revealed that
the BESS SF is better at predicting who will not demonstrate problematic outcomes than
predicting those who will, which is consistent with past research (Chin et al., 2013; King
et al., 2012). Glover and Albers (2007) argue that low positive predictive power and high
sensitivity may be ideal for screening instruments as this decreases the risk of missing
youth who are in need of prevention and intervention, but the current results were far
below the 70 – 80% standard for classification accuracy (American Academy of
Pediatrics, 2012). Although the BESS SF was designed to over-identify youth so as to
decrease the possibility of failing to identify youth in need of prevention and intervention
(Kamphaus & Reynolds, 2007; King et al., 2012), use of the BESS SF in the current
population as a screening tool may overwhelm the resources of the school to complete
necessary follow-up assessments. As the purpose of universal screeners is to provide
efficient and effective means of proactively identifying youth in need of prevention and
intervention efforts in order to decrease the possibility of negative behavioral outcomes,
the current study calls into question the acceptability of the BESS SF as a screener
for use within a comprehensive system for low SES, African American youth.
One way to address possible concerns about the predictive validity of the BESS
SF overall score is to examine how the underlying factor structure of the BESS can be
used to enhance its ability to predict behavioral outcomes. This study intended to
investigate the predictive utility of the four-factor bifactor model developed by Naser et
al. (2016) but was precluded from doing so due to the unacceptable fit of the bifactor
model to the current data. Instead, the current study examined the predictive ability of the
domain-specific factors based on the BASC-2 composites from which the items were
drawn (Internalizing Problems, Inattention/Hyperactivity, School Problems, Personal
Adjustment; Kamphaus & Reynolds, 2007) separately from the overall risk score.
Through these analyses, Inattention/Hyperactivity emerged as the main predictor
of student behavioral outcomes, serving as the sole predictive factor for Suspensions and
Major and Minor Discipline Citations. On examination, it is clear that the items
comprising the Major and Minor Discipline Citations categories are largely associated
with externalizing behaviors (e.g., Causing Distractions/Disturbances, Throwing,
Bullying/Taunting). Although the specific incidents for which students were suspended
are unavailable, a review of state-endorsed suspendable behaviors also reflects an emphasis
on externalizing behaviors (e.g., Cursing/Vulgar Language, Fighting, Willful
Disobedience; Child Trends & EMT Associates, Inc., 2016). As the
Inattention/Hyperactivity factor represents risk related to externalizing behaviors such as
talking while others are talking and having difficulty staying still, it is not surprising that
student-endorsement of risk related to this factor would be associated with these
outcomes. In fact, these items specifically address school-related behaviors such as having
trouble paying attention to the teacher and standing in lines. Student responses to such
questions may reflect their experiences being corrected in school for these specific
behaviors whether or not students agree that their particular behaviors are problematic
(Phares & Compas, 1990). At this time, research examining the relationship between
youth-reported risk related to specific domains of functioning and behavioral outcomes is
lacking, with the majority of available research on school-occurring externalizing
behaviors and behavioral consequences relying solely upon teacher-reported information
(e.g., McIntosh, Campbell, Carter, & Zumbo, 2009). Although past research has found
that correlations between teacher- and youth-reported externalizing behaviors/risk tend to
be at the moderate level at best (Achenbach et al., 1987; De Los Reyes & Kazdin, 2005;
Salbach-Andrae, Lenz, & Lehmkuhl, 2009), this research has largely been conducted
using longer scales with a wider variety of questions than those included on the domain-
specific factors of the BESS SF. By restricting the included questions to those more
specifically relevant to school settings as occurs on the Inattention/Hyperactivity domain-
specific factor, the reliability of youth reports of their behavior may have been enhanced,
resulting in the observed associations. It is imperative that future researchers make efforts
to include assessments of youth-reported risk such as the BESS SF rather than relying
solely on teacher- or parent-reported risk related to externalizing behaviors to assess this
possibility.
Inattention/Hyperactivity and School Problems were both significant predictors of
Positive Behaviors. One way to make sense of this relationship is through the lens of
school engagement (e.g., Fredricks, Blumenfeld, & Paris, 2004; Wang & Fredricks,
2014). There are multiple related domains composing the construct of school
engagement, including behavioral engagement (e.g., following school rules, exhibiting
academically relevant behaviors such as effort and enthusiasm, participating in activities),
emotional engagement (e.g., identification with and affect towards school), and cognitive
engagement (e.g., desire and focus on learning). As items on the School Problems factor
are largely associated with lack of engagement and enjoyment of school, it can be
conceptualized as a measure of emotional engagement in school.
Inattention/Hyperactivity can be conceptualized as a measure of behavioral engagement
as it focuses on difficulties paying attention and conforming to school behavioral
expectations. The items composing the Positive Behavior composite, such as Exemplary
Effort and Doing More Than Asked, can also be conceptualized as indicators of student
engagement. As indicators of various aspects of school engagement, the association
between Inattention/Hyperactivity, School Problems, and Positive Behavior makes
conceptual sense. Although research specifically examining the relationship between
school engagement and behaviors outside of academic achievement is in its infancy,
initial findings indicate a negative and reciprocal association between school engagement
and problematic behavior in adolescents (e.g., substance use, delinquency; Wang &
Fredricks, 2014). In the current study, students who received fewer citations for positive
behaviors were also found to receive more negative disciplinary actions. Applying the
reciprocal relationship found by Wang and Fredricks (2014), students who demonstrate
fewer positive behaviors associated with school engagement demonstrate more
problematic behaviors, for which they are disciplined. Disciplinary actions may involve
missed class time and/or negative interactions with the school, which then further
decrease school engagement, and the cycle continues. By tapping into school
engagement, low citations for Positive Behaviors may provide an alternate way to
identify students who are at risk for negative behavioral outcomes outside of the
examination of disciplinary actions.
None of the domain-specific factors significantly predicted student absences. This
was unexpected, especially with respect to Internalizing Problems as youth with
internalizing problems frequently have higher rates of absences from school (e.g., Zolog
et al., 2011) than youth without internalizing problems. The fact that absences were not
predicted by Internalizing Problems provides further evidence of the need to examine the
role of influences outside of student control on absence rates within this population as
discussed above.
In summary, the domain-specific factors demonstrate the potential to enhance the
predictive ability of the BESS SF. Specifically, Inattention/Hyperactivity and School
Problems may serve to predict those at-risk for specific types of behavioral outcomes. As
classification accuracy associated with the domain-specific factors was not investigated,
it remains to be seen whether or not domain-specific risk demonstrates an improvement
in prediction of who is at-risk versus who is not at-risk over that obtained using the BESS
SF overall risk score. Although further validation studies are necessary as are
investigations into factors that impact the relationship between risk and observed
behavioral outcomes, the current study represents an important step in this line of
research.
Limitations
A major limitation for this study was the relatively small sample size, which
limited the ability to fit a bifactor model to the BESS SF data. Although bifactor models
can be run successfully with sample sizes similar to the current study (N = 230), smaller
sample sizes are more prone to estimation problems (Brunner et al., 2012; MacCallum,
Widaman, Zhang, & Hong, 1999; Yang & Green, 2010). It is possible that with a larger
sample size the bifactor model would have fit the current data. Alternatively, it is
possible that an alternate bifactor structure would have been a better fit to the current data
than the model found by Naser et al. (2016). By conducting a confirmatory factor
analysis without performing any exploratory factor analyses, possible alternative models
may have been overlooked. The lack of a bifactor model with acceptable fit indices
limited the ability to test the predictive validity of the domain-specific factors over and
above the overall BESS SF risk score as it was not possible to weight the factors.
Therefore, the factors were not orthogonal with the overall risk score, resulting in
multicollinearity. Although it was possible to complete analyses examining the predictive
ability of the domain-specific factors without the inclusion of the overall factor, the
sample size and statistical composition of the data thwarted true examination of the
predictive power of a bifactor model of the BESS SF.
The predictive ability of the domain-specific factors may also have been hindered
by reliance on outcome variables that focused on observable behaviors recorded by
teachers, which generally consisted of behaviors associated with externalizing rather than
internalizing concerns. Although one of the advantages of using universal screening over
traditional methods of identification of students in need of prevention and intervention
efforts is the inclusion of items assessing risk for internalizing behaviors (Achenbach et
al., 1987; Walker et al., 2005), it is not possible to assess the ability of the BESS SF to
predict internalizing problems without representing them among outcomes.
The current study also lacked an indicator of academic performance such as grade
point average. Due to the connection between behavioral and emotional risk and impaired
academic performance, identifying students in need of socioemotional interventions
through universal screening can facilitate the provision of interventions that also serve to
improve academic performance (e.g., Eklund & Dowdy, 2014; King et al., 2012; Zins et
al., 2007). Although some indicators of academic performance were included as part of
the Positive Behavior variable (e.g., High Academic Achievement), the failure to include
a specific academic indicator inhibits the ability to evaluate the BESS SF as a predictor of
academic performance in African American youth.
Efforts to improve the clinical meaningfulness of behavioral outcomes in the
current study may have been hampered by the high rates of absences, suspensions, and
citations for minor and major behaviors amongst the participants. As a result, chosen
suspension and absence cut-off points may have lost their clinical significance (Glaros &
Kline, 1988; Streiner, 2003), resulting in the poor classification accuracy of the BESS SF.
This is despite the fact that the current study may actually underrepresent suspensions and
absences as data represented quarters two through four rather than the whole school year.
Although the goal of this study was to make longitudinal predictions of negative
behavioral outcomes, concurrent validity is also important to identifying students in need
of prevention and intervention (Dowdy et al., 2012; Glover & Albers, 2007). Future
studies should seek to assess both concurrent and predictive validity of the BESS SF.
Another area of limitation for this study is the use of teacher-gathered data for the
Major and Minor Discipline Citations and Positive Behaviors. As teachers are responsible
for the education and supervision of large groups of students, it is highly likely that they
miss occurrences of behaviors as they work to complete the large variety of tasks that are
required as part of their job (e.g., Putnam, Luiselli, Handler, & Jefferson, 2003). Even the
most conscientious teacher is not going to be able to observe and record every instance of
every behavior included within the Kickboard system. Additionally, if individual teachers
conceptualize behaviors differently or have different thresholds for issuing citations, the
integrity of the data could be compromised (Educational and Community Supports, 2016;
Kaufman et al., 2010; McIntosh et al., 2009; Putnam et al., 2003). For example, one
teacher may include both passive and active behaviors in their consideration of whether
or not a student is off task, while another teacher may focus only on active off-task
behaviors. As classification into Major or Minor categories was completed retroactively
based on behavioral categorization without specific knowledge of the behavior leading to
the citation, differences in conceptualization of categories could result in incorrect
assumptions regarding severity of incident. For example, one teacher could cite a
noncompliant student for Not Following Directions, a Minor Discipline Citation, while
another one views that same behavior as Willful Disobedience, a Major Discipline
Citation. Finally, one of the proposed reasons for using a self-report measure such as the
BESS SF instead of teacher report screeners is to reduce the influence of implicit teacher
bias on referrals (Raines et al., 2012). By focusing mainly on outcomes that are recorded
by teachers and other school employees, the element of bias is introduced back into the
equation. Hypervigilance to behaviors, whether due to individual bias or application of
school policies, may result in misinterpretation of non-problematic behaviors as
problematic and citations for behaviors at such a low threshold that they fail to indicate
true risk. This likely impacted Minor Discipline Citations more than Major ones as the
severity of behavior necessary to warrant Major Discipline Citations may indicate
difficulties with behavioral and emotional control even if less severe disciplinary actions
may have been appropriate. Therefore, for all these reasons, relying on teachers to input
data constitutes a limitation for the current study.
Implications and Future Directions
As schools seek ways to proactively identify students who are at-risk for negative
behavioral and emotional outcomes through universal screening, it is imperative that they
have access to measures that are appropriate to their needs and have been validated for
their population (Glover & Albers, 2007; Young et al., 2010). Towards this goal, the
current study sought to examine the predictive validity of the BESS SF with a low
socioeconomic status, African American population attending a public charter school in a
Southeastern city. Although the BESS SF was able to predict disciplinary actions related
to severe problematic behaviors (e.g., Suspensions, Major Discipline Citations), it proved
better at predicting those who will not demonstrate problematic outcomes than predicting
those who will as demonstrated by problematically low classification accuracy for
suspensions and attendance. This calls into question its utility as an effective and efficient
tool for use as part of a comprehensive system for identifying students in need of
prevention and intervention services. To some degree, the low classification accuracy of
the BESS SF is intentional in order to limit false negatives that result in students in need
not receiving appropriate services with the intention being that follow-up assessment as
part of a comprehensive screening and intervention will separate the students who are
truly at-risk from those who were falsely identified (Glover & Albers, 2007; Kamphaus
& Reynolds, 2007; Levitt et al., 2007). The BESS SF is not intended to be the sole
decision point for service provision. However, the high rates of over-identification
indicated in the current study cause concerns about the possibility of overwhelming
schools that are already strapped for resources (Glover & Albers, 2007), as many urban
schools are. As the BESS SF demonstrated the ability to predict behavioral outcomes
using regressions, it is imperative that researchers continue to investigate why this does
not translate to acceptable classification accuracy.
One area for further investigation is the clinical meaningfulness of indicators of
behavioral outcomes. Determinations of classification accuracy are dependent on the
clinical meaningfulness of included variables, both those that are used to predict
outcomes and those that are used to measure outcomes (Glaros & Kline, 1988; Streiner,
2003). If the specific cut point chosen to separate those exhibiting problematic behavior
from those who do not is not clinically meaningful, then classification accuracy can
appear to be worse than if another outcome and/or cut point were chosen. Rather than
representing a failure of the BESS SF to identify at-risk students, the low classification
accuracy statistics for absences and suspensions could be due to the decision to use cut
points that are too low to indicate clinically meaningful problematic behavior outcomes.
When 39.545% of students were suspended at least once and 32.727% were absent at least
ten days, cut points of one or more suspensions and ten or more absences may no longer
separate problematic from non-problematic behavioral outcomes. Instead, it may be
necessary to employ stricter standards of what constitutes problematic behavioral
outcomes in this population. For example, Chang and Romero (2008) advocate
conceptualizing “chronic absence as missing 10 percent or more of the school year …
regardless of whether absences are excused or unexcused” (p. 3). Alternatively, rather
than relying on predetermined cut-off points, future studies can employ statistical
methodologies such as the application of receiver operating characteristic (ROC) curves to
identify clinically meaningful cut-off points for suspensions and absences that are
relevant to the specific population of interest (see Burke et al., 2012 for a demonstration
of this process). Additionally, this process can be used to determine clinically meaningful
cut points for new conceptualizations of problematic behavioral outcomes such as Major
and Minor Discipline Citations and Positive Behaviors. Once validated and normed, the
classification accuracy of the BESS SF domain-specific factors should also be
investigated.
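A minimal sketch of how a ROC-based approach might select a cut-off point follows. It is not the procedure used by Burke et al. (2012) or by the current study. It assumes a count variable (here, hypothetical absence counts), an independent binary criterion marking which students truly show a problematic outcome (also hypothetical), and Youden's J (sensitivity + specificity - 1) as the selection rule, which is one common choice among several. All data and function names are illustrative assumptions.

```python
from typing import List, Tuple


def roc_points(counts: List[int], criterion: List[bool]) -> List[Tuple[int, float, float]]:
    """For each candidate cut-off k, flag students with count >= k and
    return (k, sensitivity, specificity) against the binary criterion."""
    n_pos = sum(criterion)
    n_neg = len(criterion) - n_pos
    points = []
    for k in sorted(set(counts)):
        true_pos = sum(1 for c, r in zip(counts, criterion) if c >= k and r)
        true_neg = sum(1 for c, r in zip(counts, criterion) if c < k and not r)
        points.append((k, true_pos / n_pos, true_neg / n_neg))
    return points


def best_cutoff(points: List[Tuple[int, float, float]]) -> Tuple[int, float, float]:
    """Select the cut-off that maximizes Youden's J (an assumed selection rule)."""
    return max(points, key=lambda p: p[1] + p[2] - 1)


# Hypothetical absence counts and a hypothetical binary criterion.
absences = [2, 4, 5, 7, 9, 11, 12, 14, 15, 18, 20, 25]
criterion = [False, False, False, False, True, False,
             False, True, False, True, True, True]

cutoff, sens, spec = best_cutoff(roc_points(absences, criterion))
print(f"cut-off: {cutoff}+ absences, sensitivity {sens:.1%}, specificity {spec:.1%}")
```

In this invented example the selected cut-off is higher than ten absences, illustrating how a data-driven cut point can differ from a predetermined one when the outcome is common in the population.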
As part of investigations of the appropriateness and clinical utility of chosen
outcome variables, it is important that future researchers investigate other factors that
could impact the observed relationship between student-reported risk and problematic
behavioral outcomes. Despite the high likelihood of exposure to life stressors and
traumatic experiences connected to growing up in poor, urban environments (New
Orleans Parents’ Guide, 2014; Overstreet & Mazza, 2003), students were identified as at-
risk at generally normative rates based on a three-tiered model (e.g., Splett et al., 2013).
This could indicate that the BESS SF is not an appropriate tool for universal screening in
schools serving urban, low socioeconomic status (SES), African American youth.
Alternatively, low self-reported risk could also indicate true resiliency on the part of these
students; however, students’ self-reported resilience does not seem to be reflected in their
teachers’ responses to them in the school environment as indicated by the high rates of
disciplinary actions. Instead, the school environment may be particularly “reactive”
and characterized by teacher hypervigilance to behavioral infractions. Although the use
of self-report measures may serve to decrease the over-identification of at-risk youth
(Raines et al., 2012), the assessed outcomes with the exception of absences were
determined by teachers, and, therefore, are subject to potential bias. If teachers hold
unconscious biases, they may be more likely to perceive behaviors as violations worthy
of disciplinary citations for African American students that would not result in
disciplinary action for European American students, resulting in higher rates of office
discipline referrals and suspension for African American students (Gregory et al., 2010;
Skiba et al., 2002; Skiba et al., 2011; U.S. Department of Education, 2016). As a result,
even students with low risk of problematic behavioral outcomes may receive high
numbers of disciplinary actions, explaining the problems with classification accuracy for
suspensions observed in this study. In order to assess the role of teacher perceptions of
behaviors, future research should investigate differential classification accuracy and
predictive validity of the BESS SF using a variety of behavioral outcomes as mediated by
implicit bias of teachers. It is also possible that students see themselves as more resilient
than teachers perceive them to be. This possibility is supported by past findings of low
correlations between BESS SF and TF overall risk scores (interrater reliability = .393, p
< .01; King et al., 2012). Future studies could benefit from including assessments of risk
from multiple informants to evaluate differences in perceived risk and the impact of these
differences on demonstrated behavioral outcomes.
Future research should utilize systems-based approaches to consider how school
disciplinary policies impact the disciplinary behavior of individual teachers
(Bronfenbrenner, 1977, 1986; Bear, 2008; Fenning & Rose, 2007; Foreman & Zins, 2008;
Nastasi et al., 2004). Even if teachers do not perceive certain policies as necessary or fair,
they may feel pressured to conform to administrative policies, resulting in hypervigilance
to behaviors. This school-wide emphasis on strict behavioral codes may result in hostile
school environments with high rates of discipline citations such as found in the current
study (e.g., APA Zero Tolerance Policy Task Force, 2008; Bear, 2008; Fenning & Rose,
2007). Future studies should assess the role of administrative policies emphasizing strict
application of rules on the relationship between student risk status and behavioral
outcomes.
In addition to the race/ethnicity of the students, this sample differed from many
others due to the high representation of students from low socioeconomic backgrounds.
Children growing up in low SES families are exposed to circumstances outside of their
control that may impact their ability to comply with behavioral expectations (Overstreet
& Mazza, 2003). Noncompliance with several of the behaviors resulting in Minor
Disciplinary Citations may be more indicative of such circumstances than of personal
risk. For example, students may be cited for “No Learning Supplies” not because they
forgot to bring pencils to class or because they do not care about school, but because all
their pencils are broken and their families cannot afford to buy new ones. The context of
growing up in poverty may also impact student absences as transportation issues, housing
instability, and neighborhood safety can make it difficult to get to school (Chang &
Romero, 2008; Morrissey et al., 2014). By learning more about the contextual factors
influencing student behavioral outcomes, appropriate prevention and intervention
strategies can be implemented that target the true cause of their behaviors rather than
focusing on behavioral and emotional risk factors specific to the student.
Another area for future investigation is the role of school engagement as a
potential mediator of the relationship between risk and problematic behaviors. On
examination of the questions composing the BESS SF domain-specific factors, it was
possible to make comparisons between Inattention/Hyperactivity and School Problems
and the concepts of behavioral engagement and emotional engagement, respectively.
These two factors were significant predictors of Positive Behaviors, which can be
conceptualized as reflective of overall engagement in school. The reciprocal association
between low school engagement and problematic behavior is concerning (Chang &
Romero, 2008; Fredricks et al., 2004; Wang & Fredricks, 2014), especially in light of the
high disciplinary actions in this school. Students who experience high rates of
disciplinary action miss academic time and may become further disengaged from school,
resulting in escalating problematic behaviors (Gregory et al., 2010; Wang & Fredricks,
2014). In contrast, engaged students may exhibit more positive behaviors, for which they
receive praise rather than discipline, which may, in turn, increase their engagement in
school. By tapping into school engagement, Positive Behaviors may provide an
alternate way to identify students who are at risk for negative behavioral outcomes
outside of the examination of disciplinary actions. Future studies should examine the role
that student-reported school engagement plays in the relationship between risk status and
problematic behavioral outcomes including low receipt of citations for Positive
Behaviors.
The high number of disciplinary actions taken within this school is concerning
as restrictive discipline policies are associated with higher rather than lower rates of
problematic behaviors (Gregory et al., 2010; Wang & Fredricks, 2014; Way, 2011). As it
is known that positive reinforcement of appropriate behaviors is more effective at
promoting engagement in positive behaviors than punishment of inappropriate behaviors,
one possible way to provide prevention and intervention to students at-risk for
problematic behaviors may be to focus on providing specific instruction in the behaviors
required to earn merit awards. Instead of implementing strict disciplinary policies,
increasing positive behavioral supports through policies focusing on the importance of
positive reinforcement in schools and clarity of expectations may help students
experience improved connections with teachers and the school as a whole resulting in
increased engagement and motivation to perform academically and behaviorally (APA
Zero Tolerance Policy Task Force, 2008; Bear, 2008; Fredricks et al., 2004).
Additionally, instruction in social-emotional and behavioral self-management within the
context of a supportive system of reinforcements and rewards for all students should help
reduce high levels of disciplinary action seen in this school. By reducing hypervigilance
to undesired behaviors, schools may enhance the clinical meaningfulness of disciplinary
actions as indicators of problematic outcomes as their issuance will better reflect
occurrences of problematic behaviors. As specific behavioral outcome data including
Positive Behaviors for the current study were collected as part of the school-wide positive
behavioral intervention and supports system (SWPBIS), it is clear that the school is
attempting to implement such a system, but the fidelity with which it is being
implemented is unclear at this time. Future research should seek to explore how SWPBIS
implementation can impact the relationship between risk status and problematic behavior
outcomes, specifically examining how reinforcement of positive behaviors and
disciplinary actions impact school engagement.
The current study provided initial evidence for new conceptualizations of
problematic behaviors, especially Major Discipline Citations and Positive Behaviors.
Despite this, the reliance on teacher-gathered data may have negatively impacted the
reliability of these data due to the influence of implicit bias, school policy issues, potential
differences in variable conceptualization, and difficulty gathering data in light of other
responsibilities (Educational and Community Supports, 2016; Fenning & Rose, 2007;
Gregory et al., 2010; Kaufman et al., 2010; McIntosh et al., 2009; Putnam et al., 2003;
Skiba et al., 2002; Skiba et al., 2011). One way to lessen the impact of this problem
would be to provide teacher trainings on the operationalization of variables, including a
focus on appropriate use of the Kickboard system with implementation integrity checks
occurring in classrooms throughout the year. Special attention addressing implicit bias
and hypervigilance to behaviors can be included as part of these trainings. Consultation
and trainings must also include school administration and other stakeholders who have a
role in determining school policies in order to improve buy-in and support of teachers as
they strive to change the way they discipline students and improve their cultural
competence (Bear, 2008; Fenning & Rose, 2007; Foreman & Zins, 2008; Nastasi et al.,
2004). Without institutional acceptance of the recommended changes, teachers may
receive conflicting messages regarding disciplinary expectations, decreasing the
effectiveness of any consultation efforts. Doing so may also serve to improve the overall
implementation of the SWPBIS system, which in turn, should serve to increase student
engagement and motivation to perform academically and behaviorally (APA Zero
Tolerance Policy Task Force, 2008; Bear, 2008). The perceived need for high rates of
disciplinary actions may be decreased through efforts to improve the reliability of data
collection.
Additionally, work can be done to improve the specificity of the Kickboard
variables themselves. For example, behaviors could be classified as Major or Minor at the
time of incident to decrease the possibility of misclassification. Alternative categorization
schemas should also be tested. The Major versus Minor classification system based on
the work of Horner and colleagues (Educational and Community Supports, 2016) utilized
by this study is only one way to categorize types of behavioral citations that can be issued
to students. Instead of categorizing referrals based on severity of incident, type of
behavior could be used to guide classification. For example, Putnam et al. (2003)
recorded whether behaviors resulting in ODRs were considered aggressive, disruptive,
disrespectful, noncompliant, or other. Using another system, Kaufman et al. (2010)
examined the occurrence of ODRs related to attendance, delinquency, aggression, and
disrespect. Future research should explore the relative merits of classifying behaviors by
severity versus behavior type with respect to feasibility of application and as clinically
meaningful indicators of problematic behavioral outcomes.
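As a rough illustration of how competing categorization schemas could be compared on the same records, the sketch below tallies one set of citations under a severity-based mapping. It is a minimal Python sketch and not part of the study's procedures: the mapping covers only a handful of citation labels taken from the Appendix, and the function name, the list-of-labels input, and the "Unclassified" fallback are assumptions made for illustration rather than features of Kickboard.

```python
from collections import Counter

# Partial severity mapping; labels are a small subset of those in the Appendix.
SEVERITY_SCHEMA = {
    "Stealing": "Major",
    "Skipping": "Major",
    "Willful Disobedience": "Major",
    "Off Task": "Minor",
    "Gum Chewing": "Minor",
    "No Homework": "Minor",
    "Exemplary Effort": "Positive",
    "High Enthusiasm": "Positive",
}


def tally_citations(labels, schema):
    """Count citations per category; unmapped labels fall into 'Unclassified'."""
    return Counter(schema.get(label, "Unclassified") for label in labels)


# Hypothetical record for one student (not data from the study).
example_labels = ["Off Task", "Off Task", "Stealing", "Exemplary Effort", "LTS"]
example_counts = tally_citations(example_labels, SEVERITY_SCHEMA)
print(dict(example_counts))
```

Passing a behavior-type mapping (e.g., aggressive, disruptive, disrespectful) as the schema argument would let the same records be re-tallied without changing how they were collected.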
To address concerns over the predictive validity of the BESS SF overall
score, efforts should be made to improve its precision. One way to do this is to examine
whether factors representing the underlying structure of the BESS can
enhance its utility. Efforts to examine the predictive ability of a four-factor
bifactor model of the BESS SF were hindered by the poor fit of the bifactor model
found by Naser et al. (2016) to the current data set. As a result, it was not possible to
determine the predictive ability of the bifactor factors above and beyond the BESS SF overall risk score.
Future validation studies with larger sample sizes applying the bifactor model obtained
by Naser et al. (2016) and using exploratory factor analyses to determine model structure
should be completed in order to examine the predictive ability of a bifactor model of the
BESS SF. Despite this, analyses of the domain-specific factors consistent with the
BASC-2 domains from which the BESS items are drawn (Kamphaus & Reynolds, 2007)
demonstrated initial evidence of their utility as predictors of negative behavioral
outcomes. Specifically, these analyses demonstrated that students endorsing risk related
to Inattention/Hyperactivity have higher occurrences of negative behavioral outcomes in
school including suspensions and disciplinary citations focused on externalizing
concerns. These students may benefit from interventions designed to improve their
attention and behavioral control. Additionally, students at risk for
Inattention/Hyperactivity and/or School Problems received fewer citations for positive
behaviors than those who were not at risk on these factors. Therefore, students exhibiting
high scores on the BESS SF may benefit from interventions designed to encourage more
positive behaviors through efforts to improve their overall school engagement and the
application of SWPBIS as discussed above. Validating the domain-specific
factors of the BESS SF as predictors of student behavioral outcomes, and establishing their
utility in developing targeted interventions based on specific areas of risk, represent
important areas for further research.
The significant association between the BESS SF and Positive Behaviors
supports expanding the behavioral outcomes examined beyond
disciplinary outcomes and absences. Special attention should be paid to identifying
outcomes that may be predicted by the domain-specific factors, because the
predictive ability and classification accuracy of a predictor cannot be properly examined unless
appropriate and clinically meaningful indicators of outcomes are used (Glaros & Kline,
1988). For example, the predictive ability of the Internalizing Problems factor can only
be examined through the presence of an outcome associated with internalizing concerns.
Although such measures may not be feasible for individual schools to implement on a large scale, future
research could include specific measures designed to assess outcomes related to anxiety
and depression. Such efforts are necessary to complete validation studies of the BESS SF.
In conclusion, the current study represents another step towards the validation of
the BESS SF as a universal screening tool that can be used as part of a comprehensive
system for identifying students at risk for negative behavioral and emotional
outcomes. Although the association of the overall BESS SF score with specific
behavioral outcomes seems promising, classification accuracy statistics continue to be
lacking, leading to the recommendation of caution when using the BESS SF to identify
students in need of prevention and intervention efforts. It should not be used, nor is it
intended to be used, without a comprehensive system in place to help separate those who
are truly at risk from the false positives. Despite this, the potential for improvement is
there. Further validation studies must be completed to determine whether meaningful and
appropriate cut-off points for behavioral outcomes can be established and to explore
alternative behavioral outcomes such as positive behaviors and internalizing concerns.
Other factors that may account for the high rates of negative behavior outcomes despite
the low levels of overall risk such as implicit teacher bias, school policies, and
socioeconomic status should be explored. The predictive utility of a bifactor BESS model
remains to be seen; however, the predictive utility of the Inattention/Hyperactivity and
School Problems factors shows potential for the usefulness of domain-specific factors. As
early identification and intervention through the use of universal screeners such as the
BESS SF is key to decreasing the short- and long-term negative outcomes associated with
behavioral and emotional difficulties in youth, it is imperative that these validation efforts
be continued.
TABLES
Table 1 Descriptive Statistics
M SD Lowest Highest
Age 11.430 1.666 8.000 15.000
Overall Risk Score 51.200 9.603 31.000 81.000
Minor Discipline Citations 64.209 53.265 0.000 221.000
Major Discipline Citations 9.614 10.470 0.000 40.000
Positive Behaviors 168.286 69.459 38.000 364.000
Inattention/Hyperactivity 11.716 3.608 6.000 23.000
School Problems 12.081 4.168 6.000 23.000
Internalizing Problems 18.449 5.626 10.000 36.000
Personal Adjustment 14.904 4.474 8.000 29.000
Absences (Days) 7.709 5.639 0.000 26.000
Number of Suspensions 0.982 1.567 0.000 6.000
Days Suspended 1.864 3.136 0.000 12.000
Table 2 Correlations Between Demographic, Predictor, and Outcome Variables
Variables 2 3 4 5 6 7 8 9 10 11 12 13
1. Gender^a .081 .008 -.008 .003 .033 -.020 .103 -.058 -.053 -.168* -.151* .196**
2. Age -.006 .208** -.041 -.130 .127 .150* .375*** .359*** .443*** .396*** .165*
3. Overall Risk Score .684*** .686*** .858*** .650*** .041 .147* .147* .093 .118 -.159*
4. School Problems .236*** .415*** .442** .101 .181** .157* .239*** .188** -.171*
5. Personal Adjustment .536*** .202** .026 .010 .035 -.055 -.016 -.038
6. Internalizing Problems .406*** .009 .014 .021 -.067 -.006 -.075
7. Inattention/Hyperactivity -.010 .310*** .291*** .298*** .300*** -.210**
8. Absences .251*** .252*** .183** .210** -.314***
9. Number of Suspensions .937*** .674*** .660*** -.435***
10. Days Suspended .601*** .611*** -.427***
11. Minor Discipline Citations .892*** -.477***
12. Major Discipline Citations -.409***
13. Positive Behaviors
^a 0 = Boys, 1 = Girls
Table 3 Prediction of Absences by Overall BESS SF Risk Score
Variable b SE b β ΔR2
Step 1 .031*
Gender 1.025 0.091 .091
Age 0.484 0.143 .143*
Step 2 .002
Overall BESS SF Risk Score 0.024 0.039 .041
* p < .05, ** p < .01, *** p < .001
Final model statistics: R2 = .033, F (3, 216) = 2.422, p > .05
Table 4 Prediction of Minor Discipline Citations by Overall BESS SF Risk Score
Variable b SE b β ΔR2
Step 1 .238***
Gender -0.410 0.119 -.205**
Age 0.276 0.036 .460***
Step 2 .009
Overall BESS SF Risk Score 0.010 0.006 .097
* p < .05, ** p < .01, *** p < .001
Final model statistics: R2 = .248, F (3, 216) = 23.718, p < .001
Table 5 Prediction of Number of Suspensions by Overall BESS SF Risk Score
Variable b SE b β ΔR2
Step 1 .149***
Gender -0.278 0.197 -.089
Age 0.360 0.059 .383***
Step 2 .023*
Overall BESS SF Risk Score 0.025 0.010 .151*
* p < .05, ** p < .01, *** p < .001
Final model statistics: R2 = .171, F (3, 216) = 14.893, p < .001
Table 6 Prediction of Days Suspended by Overall BESS SF Risk Score
Variable b SE b β ΔR2
Step 1 .136***
Gender -0.520 0.397 -.083
Age 0.689 0.119 .366***
Step 2 .022*
Overall BESS SF Risk Score 0.049 0.020 .150*
* p < .05, ** p < .01, *** p < .001
Final model statistics: R2 = .158, F (3, 216) = 13.525, p < .001
Table 7 Prediction of Major Discipline Citations by Overall BESS Risk Score
Variable b SE b β ΔR2
Step 1 .191***
Gender -0.369 0.122 -.185**
Age 0.247 0.037 .411***
Step 2 .015*
Overall BESS SF Risk Score 0.013 0.006 .121*
* p < .05, ** p < .01, *** p < .001
Final model statistics: R2 = .205, F (3, 216) = 18.602, p < .001
Table 8 Prediction of Positive Behaviors by Overall BESS SF Risk Score
Variable b SE b β ΔR2
Step 1 .061**
Gender 0.368 0.132 .184**
Age 0.090 0.040 .150*
Step 2 .025*
Overall BESS SF Risk Score -0.016 0.007 -.159*
* p < .05, ** p < .01, *** p < .001
Final model statistics: R2 = .086, F (3, 216) = 6.814, p < .001
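The final-model statistics in Tables 3 through 8 can be cross-checked against one another: with k predictors and n cases, the overall F statistic follows from R2 as F = (R2 / k) / ((1 - R2) / (n - k - 1)), and the F for a later step follows analogously from ΔR2. The Python sketch below applies these standard identities. The function names are illustrative, n = 220 is an assumption inferred from the reported degrees of freedom (3, 216) rather than a value stated in these tables, and small discrepancies are expected because the tabled R2 values are rounded to three decimals.

```python
def overall_f(r2, n, k):
    """Overall F for a regression with k predictors and n cases."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def f_change(delta_r2, r2_full, n, k_full, k_added):
    """F for the R2 increment when k_added predictors enter at a later step."""
    return (delta_r2 / k_added) / ((1.0 - r2_full) / (n - k_full - 1))


N_CASES = 220  # assumption: inferred from F (3, 216), i.e. 216 + 3 + 1

# Table 5 (Number of Suspensions): final R2 = .171, Step 2 delta R2 = .023
f_table5 = overall_f(0.171, N_CASES, 3)
f_step2 = f_change(0.023, 0.171, N_CASES, k_full=3, k_added=1)
print(f"overall F(3, 216) = {f_table5:.2f}; Step 2 F-change(1, 216) = {f_step2:.2f}")
```

The overall value computed this way lands close to the reported F (3, 216) = 14.893, with the remaining gap attributable to rounding of R2.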
Table 9 Classification Accuracy Using BESS SF Overall Score
Outcome Sensitivity Specificity Positive Predictive Power Negative Predictive Power
Absences 12.500% 82.432% 25.714% 65.945%
Number of Suspensions 18.391% 85.714% 45.714% 61.622%
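The four statistics in Table 9 all derive from a 2 x 2 cross-classification of screening status (at risk or not at risk) against outcome status (present or absent). The Python sketch below shows the arithmetic. The cell counts are not reported in the source; they are back-calculated here from the Absences percentages on the assumption that n = 220 (inferred from the regression degrees of freedom), so they should be read as an illustrative reconstruction. The function and variable names are hypothetical.

```python
def classification_accuracy(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPP, and NPP as proportions."""
    return {
        "sensitivity": tp / (tp + fn),  # at-risk screens among students with the outcome
        "specificity": tn / (tn + fp),  # not-at-risk screens among students without it
        "ppp": tp / (tp + fp),          # outcome present among at-risk screens
        "npp": tn / (tn + fn),          # outcome absent among not-at-risk screens
    }


# Assumed counts for Absences, back-calculated from Table 9 (not reported in the source).
absences = classification_accuracy(tp=9, fp=26, fn=63, tn=122)
for name, value in absences.items():
    print(f"{name}: {value:.3%}")
```

Under these assumed counts, 9 of the 72 students with the outcome screened as at risk, which corresponds to the 12.5% sensitivity in Table 9.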
Table 10 BESS SF Bifactor Model Standardized Weight Estimates
Item Description Overall Specific Factor
Personal Adjustment
9. I am liked by others. .457 .654
21. People think I'm fun to be with. .328 .574
30. Others have respect for me. .414 .459
15. My parents trust me. .333 .287
26. My parents are proud of me. .384 .274
4. I like the way I look. .246 .273
18. My parents listen to what I say. .435 .122
1. I am good at making decisions. .381 .045
Inattention/Hyperactivity
8. I have trouble paying attention to the teacher. .360 .538
25. I get into trouble for not paying attention. .329 .525
2. I talk while other people are talking. .272 .480
28. I have trouble standing still in lines. .262 .447
11. I have trouble sitting still. .294 .363
24. People tell me that I am too noisy. .309 .344
Internalizing Problems
13. I feel like people are out to get me. .496 .873
14. I worry about what is going to happen. .405 .250
3. I worry but I don’t know why. .441 .155
10. I feel like my life is getting worse and worse. .714 .148
7. People get mad at me, even when I don't do anything wrong. .505 .126
16. I am left out of things. .534 .117
27. Even when I try hard, I fail. .617 -.082
23. I get blamed for things I can't help. .601 -.027
5. I feel out of place around people. .588 .025
20. I want to do better but can't. .489
School Problems
12. School is boring. .212 .767
17. I hate school. .380 .687
19. Teachers are unfair. .363 .573
29. My school feels good to me. .286 .500
6. I feel like I want to quit school. .472 .395
22. Teachers make me feel stupid. .545 .210
Table 11 Prediction of Absences by the BESS Domain-Specific Factors
Variable b SE b β ΔR2
Step 1 .031*
Gender 1.025 0.755 .091
Age 0.484 0.227 .143*
Step 2 .010
Internalizing Problems -0.001 0.051 -.001
School Problems 0.057 0.045 .102
Inattention/Hyperactivity -0.042 0.044 -.074
Personal Adjustment 0.013 0.045 .023
* p < .05, ** p < .01, *** p < .001
Final model statistics: R2 = .041, F (6, 213) = 1.508, p > .05
Table 12 Prediction of Number of Suspensions by the BESS Domain-Specific Factors
Variable b SE b β ΔR2
Step 1 .149***
Gender -0.278 0.197 -.089
Age 0.360 0.059 .383***
Step 2 .071**
Internalizing Problems -0.009 0.013 -.057
School Problems 0.001 0.011 .009
Inattention/Hyperactivity 0.045 0.011 .285***
Personal Adjustment -0.001 0.011 -.006
* p < .05, ** p < .01, *** p < .001
Final model statistics: R2 = .220, F (6, 213) = 10.005, p < .001
Table 13 Prediction of Days Suspended by the BESS Domain-Specific Factors
Variable b SE b β ΔR2
Step 1 .136***
Gender -0.520 0.397 -.083
Age 0.689 0.119 .366***
Step 2 .062**
Internalizing Problems -0.015 0.026 -.049
School Problems -0.005 0.023 -.016
Inattention/Hyperactivity 0.085 0.022 .270***
Personal Adjustment 0.008 0.023 .024
* p < .05, ** p < .01, *** p < .001
Final model statistics: R2 = .198, F (6, 213) = 8.743, p < .001
Table 14 Prediction of Minor Discipline Citations by the BESS Domain-Specific Factors
Variable b SE b β ΔR2
Step 1 .238***
Gender -0.410 0.119 -.205**
Age 0.276 0.036 .460***
Step 2 .079***
Internalizing Problems -0.014 0.008 -.141
School Problems 0.011 0.007 .110
Inattention/Hyperactivity 0.026 0.007 .263***
Personal Adjustment -0.004 0.007 -.043
* p < .05, ** p < .01, *** p < .001
Final model statistics: R2 = .318, F (6, 213) = 16.540, p < .001
Table 15 Prediction of Major Discipline Citations by the BESS Domain-Specific Factors
Variable b SE b β ΔR2
Step 1 .191***
Gender -0.369 0.122 -.185**
Age 0.247 0.037 .411***
Step 2 .065**
Internalizing Problems -0.006 0.008 -.058
School Problems 0.002 0.007 .023
Inattention/Hyperactivity 0.027 0.007 .270***
Personal Adjustment -0.003 0.007 -.029
* p < .05, ** p < .01, *** p < .001
Final model statistics: R2 = .256, F (6, 213) = 12.199, p < .001
Table 16 Prediction of Positive Behaviors by the BESS Domain-Specific Factors
Variable b SE b β ΔR2
Step 1 .061**
Gender 0.368 0.132 .184**
Age 0.090 0.040 .150*
Step 2 .073**
Internalizing Problems 0.010 0.009 .103
School Problems -0.017 0.008 -.170*
Inattention/Hyperactivity -0.020 0.007 -.201**
Personal Adjustment 0.000 0.008 -.003
* p < .05, ** p < .01, *** p < .001
Final model statistics: R2 = .134, F (6, 213) = 5.513, p < .001
APPENDIX
Development of Specific Behavioral Outcomes Based on Kickboard Data
Category: Major Discipline Citations
Referral Types (Educational and Community Supports, 2016): Abusive Language/Inappropriate Language/Profanity, Bullying, Defiance/Insubordination/Non-Compliance, Fighting, Forgery/Theft/Plagiarism, Inappropriate Location/Out of Bounds Area, Lying/Cheating, Property Damage/Vandalism, Skip Class
Specific Citations: Bullying/Taunting; Cursing/Vulgar Language; Defacing School Property; Forgery; Improper Touching; Leaving Early; Lying; Skipping; Skipping Detention; Stealing; Throwing; Unauthorized Area; Willful Disobedience
Category: Minor Discipline Citations
Referral Types (Educational and Community Supports, 2016): Defiance, Disrespect, Disruption, Dress Code Violation, Property Misuse, Technology Violation
Specific Citations: Causing Distractions/Disturbances; Cell Phone or Electronic Device; Disrespect to Adults, Peers, or Property; Eating in Computer Lab; Giving Up/Making Excuses; Gossiping/Ribbing; Gum Chewing; Horseplay/Play Fighting; Improper Use of Materials; Incomplete Work; Littering; Low/No Participation; No Do Now; No Homework; No Learning Supplies; Not Following Directions; Off Task; Running in Hallway/Stairway; Safe - Lining Up Incorrectly; Sleeping; Talking During Level 0; Tardy to Class; Unauthorized Food or Drinks; Uniform Violation - Jacket Coat Sweater; Uniform Violation - Shirt; Uniform Violation - Shirt Untucked; Uniform Violation - Shoes/Sneakers; Uniform Violation - Socks; Uniform Violation - Wearing Hood
Category: Positive Behaviors
Referral Types: None
Specific Citations: Dedication and Drive; Doing More than Asked; Exemplary Effort; Exemplary Leadership; Exemplary Service to Others; Exemplary Work; High Academic Achievement; High Enthusiasm; Major Improvement; Responsible - Lining Up Correctly; Taking Responsibility for Actions
Category: Removed (Ambiguous)
Referral Types: None
Specific Citations: LTS; Parental Involvement; Signed Paycheck
Category: Removed (Not start using until 4th quarter)
Referral Types: None
Specific Citations: Kindness, Empathy, and Respect for Others
LIST OF REFERENCES
Achenbach, T.M., McConaughy, S.H., & Howell, C.T. (1987). Child/adolescent
behavioral and emotional problems: Implications of cross-informant correlations
for situational specificity. Psychological Bulletin, 101, 213 – 232. doi:
10.1037/0033-2909.101.2.213
Achenbach, T. M., & Rescorla, L. A. (2001). Manual for the ASEBA School-Age Forms
& Profiles. Burlington, VT: University of Vermont, Research Center for Children,
Youth, & Families.
Albers, C.A., & Kettler, R.J. (2014). Best practices in universal screening. In P. Harrison
and A. Thomas (Eds.), Best practices in school psychology: Data-based and
collaborative decision making (pp. 121 – 131). Bethesda, MD: NASP.
American Academy of Pediatrics (2012). Addressing mental health concerns in primary
care: A clinician’s toolkit. Mental health screening and assessment tools for
primary care. Retrieved from https://www.aap.org/en-us/advocacy-and-policy/aap-health-initiatives/Mental-Health/Documents/MH_ScreeningChart.pdf
American Psychological Association Zero Tolerance Task Force (2008). Are zero
tolerance policies effective in the schools? An evidentiary review and
recommendations. American Psychologist, 63, 852 – 862. doi: 10.1037/0003-066X.63.9.852
Bear, G.G. (2008). School-wide approaches to behavior problems. In B. Doll and J.A.
Cummings (Eds.), Transforming school mental health services: Population-based
approaches to promoting the competency and wellness of children (pp. 103 –
141). Thousand Oaks, CA: NASP and Corwin Press.
Bronfenbrenner, U. (1977). Toward an experimental ecology of human development.
American Psychologist, 32, 513-531. doi: 10.1037/0003-066X.32.7.513
Bronfenbrenner, U. (1986). Ecology of the family as context for human development:
Research perspectives. Developmental Psychology, 22, 723-742. doi:
10.1037/0012-1649.22.6.723
Brunner, M., Nagy, G., & Wilhelm, O. (2012). A tutorial on hierarchically structured
constructs. Journal of Personality, 80, 796 – 846. doi: 10.1111/j.1467-6494.2011.00749.x
Burke, M.D., Davis, J.L., Lee, Y.-H., Hagan-Burke, S., Kwok, O.-M., & Sugai, G.
(2012). Universal screening for behavioral risk in elementary schools using
SWPBS expectations. Journal of Emotional and Behavioral Disorders, 20, 38 –
54. doi: 10.1177/1063426610377328
California Department of Education (2010). California Healthy Kids Survey. Retrieved
from http://chks.wested.org
Chang, H.N., & Romero, M. (2008). Present, engaged and accounted for: The critical
importance of addressing chronic absence in the early grades. Retrieved from
http://www.nccp.org/publications/pdf/text_837.pdf
Chen, F.F., West, S.G., & Sousa, K.H. (2006). A comparison of bifactor and second-
order models of quality of life. Multivariate Behavioral Research, 41, 189 – 225.
doi:10.1207/s15327906mbr4102_5
Child Trends & EMT Associates, Inc. (2016). Louisiana compilation of school discipline
laws and regulations. Retrieved from https://safesupportivelearning.ed.gov/sites/default/files/discipline-compendium/Louisiana%20School%20Discipline%20Laws%20and%20Regulations.pdf
Chin, J.K., Dowdy, E., & Quirk, M.P. (2013). Universal screening in middle school:
Examining the Behavioral and Emotional Screening System. Journal of
Psychoeducational Assessment, 31, 53 – 60. doi: 10.1177/0734282912448137.
De Los Reyes, A., & Kazdin, A.E. (2005). Informant discrepancies in the assessment of
childhood psychopathology: A critical review, theoretical framework, and
recommendations for further study. Psychological Bulletin, 131, 483 – 509. doi:
10.1037/0033-2909.131.4.483
Dowdy, E., Furlong, M.J., & Sharkey, J.D. (2012). Using surveillance of mental health to
increase understanding of youth involvement in high-risk behaviors: A value-added
analysis. Journal of Emotional and Behavioral Disorders, 21, 33 – 44. doi:
10.1177/1063426611416817
Dowdy, E., Twyford, J.M., Chin, J.K., DiStefano, C.A., Kamphaus, R.W., & Mays, K.L.
(2011). Factor structure of the BASC-2 Behavioral and Emotional Screening
System Student Form. Psychological Assessment, 23, 379 – 387. doi:
10.1037/a0021843
Educational and Community Supports (2016). PBIS apps. Retrieved from
https://www.pbisapps.org/Pages/Default.aspx
Eklund, K., & Dowdy, E. (2014). Screening for behavioral and emotional risk versus
traditional school identification methods. School Mental Health, 6, 40 – 49. doi:
10.1007/s12310-013-9109-1
Feeney-Kettler, K.A., Kratochwill, T.R., Kaiser, A.P., Hemmeter, M.L., & Kettler, R.J.
(2010). Screening young children’s risk for mental health problems: A review of
four measures. Assessment for Effective Intervention, 35, 218 – 230. doi:
10.1177/1534508410380557
Fenning, P., & Rose, J. (2007). Overrepresentation of African American students in
exclusionary discipline: The role of school policy. Urban Education, 42, 536 –
559. doi: 10.1177/0042085907305039
Field, A. (2013). Discovering statistics using IBM SPSS Statistics and sex and drugs and
rock ‘n’ roll (4th ed.). Washington, DC: Sage Publications Ltd.
Foreman, S.F., & Zins, J.E. (2008). Section commentary: Evidence-based consultation:
The importance of context and the consultee. In W.P. Erchul and S.M. Sheridan
(Eds.), Handbook of research in school consultation (pp. 361 – 371). New York:
Routledge.
Fredricks, J.A., Blumenfeld, P.C., & Paris, A.H. (2004). School engagement: Potential of
the concept, state of the evidence. Review of Educational Research, 74, 59 – 109.
doi: 10.3102/00346543074001059
Gion, C.M., McIntosh, K., & Horner, R. (2014). Patterns of minor office discipline
referrals in schools using SWIS. Retrieved from https://www.pbis.org/blueprint/evaluation-briefs/patterns-of-minor-odrs
Glaros, A.G., & Kline, R.B. (1988). Assessing the accuracy of tests with cutting scores:
The sensitivity, specificity, and predictive value model. Journal of Clinical
Psychology, 44, 1013 – 1023. doi: 10.1002/1097-4679(198811)44:63.0.CO;2-Z
Glover, T.A., & Albers, C.A. (2007). Considerations for evaluating universal screening
assessments. Journal of School Psychology, 45, 117 – 135. doi:
10.1016/j.jsp.2006.05.020
Gray, M.J., Litz, B.T., Hsu, J.L., & Lombardo, T.W. (2004). Psychometric properties of
the Life Events Checklist. Assessment, 11, 330 – 341. doi:
10.1177/1073191104269954
Gregory, A., Skiba, R.J., & Noguera, P.A. (2010). The achievement gap and the
discipline gap: Two sides of the same coin? Educational Researcher, 39, 59 – 68.
doi: 10.3102/0013189X09357621
Goodman, R. (1997). The Strengths and Difficulties Questionnaire: A research note.
Journal of Child Psychology and Psychiatry, and Allied Disciplines, 38, 581 –
586. doi: 10.1111/j.1469-7610.1997.tb01545.x
Harrell-Williams, L.M., Raines, T.C., Kamphaus, R.W., & Dever, B.V. (2015).
Psychometric analysis of the BASC-2 Behavioral and Emotional Screening
System (BESS) Student Form: Results from high school student samples.
Psychological Assessment. Advance online publication.
http://dx.doi.org/10.1037/pas0000079
Helms, J.E. (1992). Why is there no study of cultural equivalence in standardized
cognitive ability testing? American Psychologist, 47, 1083 – 1101. doi:
10.1037/0003-066X.47.9.1083
Helms, J.E. (2006). Fairness is not validity or cultural bias in racial-group assessment: A
quantitative perspective. American Psychologist, 61, 845 – 859. doi:
10.1037/0003-066X.61.8.845
Hill, L.G., Lochman, J.E., Coie, J.E., Greenberg, M.R., & the Conduct Problems
Prevention Research Group (2004). Effectiveness of early screening for
externalizing problems: Screening accuracy and utility. Journal of Consulting and
Clinical Psychology, 72, 809 – 820. doi: 10.1037/0022-006X.72.5.809
Kamenetz, A. (2014). The end of neighborhood schools. Retrieved from
http://apps.npr.org/the-end-of-neighborhood-schools/
Kamphaus, R.W., DiStefano, C., Dowdy, E., Eklund, K., & Dunn, A.R. (2010).
Determining the presence of a problem: Comparing two approaches for detecting
youth behavioral risk. School Psychology Review, 39, 395 – 407.
Kamphaus, R.W., & Reynolds, C.R. (2007). BASC-2 Behavioral and Emotional
Screening System manual. Minneapolis, MN: Pearson.
Kaufman, J.S., Jaser, S.S., Vaughan, E.L., Reynolds, J.S., Di Donato, J., Bernard, S.N., &
Hernandez-Brereton, M. (2010). Patterns in office referral data by grade,
race/ethnicity, and gender. Journal of Positive Behavior Interventions, 12, 44 –
54. doi: 10.1177/1098300708329710
Kickboard (2016). Kickboard. Retrieved from https://www.kickboardforschools.com/
King, K.R., & Reschly, A.L. (2014). A comparison of screening instruments: Predictive
validity of the BESS and the BSC. Journal of Psychoeducational Assessment, 32,
687 – 698. doi: 10.1177/0734282914531714
King, K., Reschly, A.L., & Appleton, J.J. (2012). An examination of the validity of the
Behavioral and Emotional Screening System in a rural elementary school:
Validity of the BESS. Journal of Psychoeducational Assessment, 30, 527 – 538.
doi: 10.1177/0734282912440673
Kóbor, A., Takács, Á., & Urbán, R. (2013). The bifactor model of the Strengths and
Difficulties Questionnaire. European Journal of Psychological Assessment, 29,
299 – 307. doi: 10.1027/1015-5759/a000160
Lance, C.E., Butts, M.M., & Michels, L.C. (2006). The sources of four commonly
reported cutoff criteria: What did they really say? Organizational Research
Methods, 9, 202 – 220. doi: 10.1177/1094428105284919
Levitt, J.M., Saka, N., Romanelli, L.H., & Hoagwood, K. (2007). Early identification of
mental health problems in schools: The status of instrumentation. Journal of
School Psychology, 45, 163 – 191. doi: 10.1016/j.jsp.2006.11.005
Louisiana Department of Education (n.d.). Louisiana believes: Attendance requirements.
Retrieved from https://www.louisianabelieves.com/courses/attendance-requirements
MacCallum, R.C., Widaman, K.F., Zhang, S., & Hong, S. (1999). Sample size in factor
analysis. Psychological Methods, 4, 84 – 99. doi: 10.1037/1082-989X.4.1.84
McIntosh, K., Campbell, A.L., Carter, D.R., & Zumbo, B.D. (2009). Concurrent validity
of office discipline referrals and cut points used in schoolwide positive behavior
support. Behavioral Disorders, 34, 100 – 113.
Michel, C., Schultze-Lutter, F., & Schimmelmann, B.G. (2014). Screening instruments in
child and adolescent psychiatry: General and methodological considerations.
European Child & Adolescent Psychiatry, 23, 725 – 727. doi:
10.1007/s00787-014-0608-x
Morrissey, T.W., Hutchison, L., & Winsler, A. (2014). Family income, school
attendance, and academic achievement in elementary school. Developmental
Psychology, 50, 741 – 753. doi: 10.1037/a0033848
Naser, S., Hitti, A., & Overstreet, S. (2016). The Behavioral and Emotional Screening
System Student Form: Is there evidence of a global at-risk factor in a sample of
African American youth? Manuscript submitted for publication.
Netland, M. (2001). Assessment of exposure to political violence and other potentially
traumatizing events. A critical review. Journal of Traumatic Stress, 14, 311 –
326. doi: 10.1023/A:1011164901867
New Orleans Parents’ Guide (2014). New Orleans parents’ guide to public schools:
Spring 2014 Edition. Retrieved from http://neworleansparentsguide.org/files/NOPG2014.pdf
New Orleans Parents’ Guide (2015). New Orleans parents’ guide to public schools:
Spring 2015 Edition. Retrieved from http://neworleansparentsguide.org/
Overstreet, S., & Mazza, J.J. (2003). An ecological-transactional understanding of
community violence: Theoretical perspectives. School Psychology Quarterly, 18,
66 – 87. doi: 10.1521/scpq.18.1.66.20874
Phares, V., & Compas, B.E. (1990). Adolescents’ subjective distress over their
emotional/behavioral problems. Journal of Consulting and Clinical Psychology,
58, 596 – 603. doi: 10.1037/0022-006X.58.5.596
Positive Behavioral Interventions & Supports (PBIS; 2016). PBIS: Positive behavioral
interventions & supports: OSEP technical assistance center. Retrieved from
http://www.pbis.org/
Putnam, R.F., Luiselli, J.K., Handler, M.W., & Jefferson, G.L. (2003). Evaluating student
discipline practices in a public school through behavioral assessment of office
referrals. Behavior Modification, 27, 505 – 523. doi: 10.1177/0145445503255569
Raines, T.C., Dever, B.V., Kamphaus, R.W., & Roach, A.T. (2012). Universal screening
for behavioral and emotional risk: A promising method for reducing
disproportionate placement in special education. The Journal of Negro Education,
81, 283 – 296.
Reise, S.P. (2012). The rediscovery of bifactor measurement models. Multivariate
Behavioral Research, 47, 667 – 696. doi: 10.1080/00273171.2012.715555
Renshaw, T.L., Eklund, K., Dowdy, E., Jimerson, S.R., Hart, S.R., Earhart, Jr., J., &
Jones, C.N. (2009). Examining the relationship between scores on the Behavioral
and Emotional Screening System and student academic, behavioral, and
engagement outcomes: An investigation of concurrent validity in elementary
school. The California School Psychologist, 14, 81 – 88. doi:
10.1007/BF03340953
Reynolds, C.R., & Kamphaus, R.W. (2004). Behavior Assessment System for Children –
second edition (BASC-2). Circle Pines, MN: AGS.
Nastasi, B.K., Moore, R.B., & Varjas, K.M. (2004). School-based mental health services:
Creating comprehensive and culturally specific programs. Washington, DC:
American Psychological Association.
Salbach-Andrae, H., Lenz, K., & Lehmkuhl, U. (2009). Patterns of agreement among
parent, teacher and youth ratings in a referred sample. European Psychiatry, 24,
345 – 351. doi: 10.1016/j.eurpsy.2008.07.008
Schanding, Jr., G.T., & Nowell, K.P. (2013). Universal screening for emotional and
behavioral problems: Fitting a population-based model. Journal of Applied School
Psychology, 29, 104 – 119. doi: 10.1080/15377903.2013.751479
Sims, P., & Vaughn, D. (2014). The state of public education in New Orleans: 2014
report. Retrieved from www.speno2014.com/wpcontent/uploads/2014/08/SPENO-HQ.pdf
Skiba, R.J., Horner, R.H., Chung, C.-G., Rausch, M.K., May, S.L., & Tobin, T. (2011).
Race is not neutral: A national investigation of African American and Latino
disproportionality in school discipline. School Psychology Review, 40, 85-107.
Skiba, R.J., Michael, R.S., Nardo, A.C., & Peterson, R.L. (2002). The color of discipline:
Sources of racial and gender disproportionality in school punishment. The Urban
Review, 34, 317 – 342. doi: 10.1023/A:1021320817372
Splett, J.W., Fowler, J., Weist, M.D., McDaniel, H., & Dvorsky, M. (2013). The critical
role of school psychology in the school mental health movement. Psychology in
the Schools, 50, 245 – 258. doi: 10.1002/pits.21677
Streiner, D.L. (2003). Diagnosing tests: Using and misusing diagnostic and screening
tests. Journal of Personality Assessment, 81, 209 – 219. doi:
10.1207/S15327752JPA8103_03
Tabachnick, B.G., & Fidell, L.S. (2007). Using multivariate statistics (5th ed.). Boston,
MA: Pearson Education.
U.S. Department of Education: Office for Civil Rights. (2016). 2013 – 2014 civil rights
data collection: A first look. Retrieved from http://www2.ed.gov/about/offices/list/ocr/docs/CRDC2013-14-first-look.pdf
Walker, B.A. (2010). Effective schoolwide screening to identify students at risk for social
and behavioral problems. Intervention in School and Clinic, 46, 104 – 110. doi:
10.1177/1053451210374989
Walker, B., Cheney, D., Stage, S., & Blum, C. (2005). Schoolwide screening and positive
behavior supports: Identifying and supporting students at risk for school failure.
Journal of Positive Behavior Interventions, 7, 194 – 204. doi:
10.1177/10983007050070040101
Walker, H.M., Nishioka, V.M., Zeller, R., Severson, H.H., & Feil, E.G. (2000). Causal
factors and potential solutions for the persistent underidentification of students
having emotional or behavioral disorders in the context of schooling. Assessment
for Effective Intervention, 26, 29 – 39. doi: 10.1177/073724770002600105
Wang, M.-T., & Fredricks, J.A. (2014). The reciprocal links between school engagement,
youth problem behaviors, and school dropout during adolescence. Child
Development, 85, 722 – 737. doi: 10.1111/cdev.12138
Way, S.M. (2011). School discipline and disruptive classroom behavior: The moderating
effects of student perceptions. The Sociological Quarterly, 52, 346 – 375. doi:
10.1111/j.1533-8525.2011.01210.x
Wiesner, M., & Schanding, G. T. (2013). Exploratory structural equation modeling,
bifactor models, and standard confirmatory factor analysis models: Application to
the BASC-2 Behavioral and Emotional Screening System Teacher Form. Journal
of School Psychology, 51, 751 – 763. doi: 10.1016/j.jsp.2013.09.001
Yang, Y., & Green, S.B. (2010). A note on structural equation modeling estimates of
reliability. Structural Equation Modeling: A Multidisciplinary Journal, 17, 66 –
81. doi: 10.1080/10705510903438963
Young, E.L., Sabbah, H.Y., Young, B.J., Reiser, M.L., & Richardson, M.J. (2010).
Gender differences and similarities in a screening process for emotional and
behavioral risks in secondary schools. Journal of Emotional and Behavioral
Disorders, 18, 225 – 235. doi: 10.1177/1063426609338858
Zins, J.E., Bloodworth, M.R., Weissberg, R.P., & Walberg, H.J. (2007). The scientific
base linking social and emotional learning to school success. Journal of
Educational and Psychological Consultation, 17, 191 – 210. doi:
10.1080/10474410701413145
Zolog, T.C., Jane-Ballabriga, M.C., Bonillo-Martin, A., Canals-Sans, J., Hernandez-
Martinez, C., Romero-Acosta, K., & Domenech-Llaberia, E. (2011). Somatic
complaints and symptoms of anxiety and depression in a school-based sample of
preadolescents and early adolescents, functional impairment and implications for
treatment. Journal of Cognitive and Behavioral Psychotherapies, 11, 191 – 208.
BIOGRAPHY
Kathryn Jones received her Bachelor of Science in Psychology, Bachelor of Arts in
Sociology, and Master of Science in Psychology from Tulane University. She also has a
Master of Arts in Forensic Psychology from Marymount University. She is currently a
doctoral candidate in School Psychology at Tulane University. She completed her
internship through the Psychological Services Center with the Illinois School Psychology
Internship Consortium. On internship, she had the opportunity to work in schools and in
primary care, which allowed her to work across settings to benefit the mental health of
youth and their families. Her research interests focus on the relationship between
psychosocial and physical factors in the development and maintenance of pediatric
somatic symptoms. Additionally, she is interested in the role of universal screenings in
schools and medical settings to improve identification, prevention, and intervention of
behavioral and emotional difficulties in youth. Kathryn will complete her post-doctoral
training in integrated pediatric primary care with Geisinger Medical Systems starting in
August 2016.