ANRA Manual
Copyright © 2006 NCS Pearson, Inc. All rights reserved.
Advanced Numerical Reasoning Appraisal TM
(ANRA)
Manual
John Rust
888-298-6227 • TalentLens.com
Copyright © 2006 by NCS Pearson, Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the copyright owner. The Pearson and TalentLens logos, and Advanced Numerical Reasoning Appraisal are trademarks, in the U.S. and/or other countries, of Pearson Education, Inc. or its affiliate(s). Portions of this work were previously published. Printed in the United States of America.
Table of Contents

Acknowledgements

Chapter 1 Introduction
   Numerical Reasoning and Critical Thinking

Chapter 2 History and Development of ANRA
   Description of the Test
   Adapting RANRA
   Development of RANRA

Chapter 3 Directions for Administration
   General Information
   Preparing for Administration
   Testing Conditions
   Answering Questions
   Administering the Test
   Scoring and Reporting
   Test Security
   Concluding Test Administration
   Administering ANRA and Watson-Glaser Critical Thinking Appraisal® in a Single Testing Session
   Accommodating Examinees with Disabilities

Chapter 4 ANRA Norms Development
   Using ANRA as a Norm- or Criterion-Referenced Test
   Using Norms to Interpret Scores
   Converting Raw Scores to Percentile Ranks
   Using Standard Scores to Interpret Performance
      Converting z Scores to T Scores
   Using ANRA and Watson-Glaser Critical Thinking Appraisal Together

Chapter 5 Evidence of Reliability
   Reliability Coefficients and Standard Error of Measurement
   RANRA Reliability Studies
   ANRA Reliability Studies
      Evidence of Internal Consistency
      Evidence of Test-Retest Stability

Chapter 6 Evidence of Validity
   Face Validity
   Evidence Based on Test Content
   Evidence Based on Test-Criterion Relationships
   Correlations Between ANRA Test 1 and Test 2
   Evidence of Convergent and Discriminant Validity
      Correlations Between ANRA and Watson-Glaser Critical Thinking Appraisal—Short Form
      Correlations Between ANRA and Other Tests

Chapter 7 Using ANRA as an Employment Selection Tool
   Employment Selection
   Using ANRA in Making a Hiring Decision
   Differences in Reading Ability, Including the Use of English as a Second Language
   Using ANRA as a Guide for Training, Learning, and Education
   Fairness in Selection Testing
      Legal Considerations
      Group Differences and Adverse Impact
      Monitoring the Selection System

References

Appendices
   Appendix A Description of the Normative Sample
   Appendix B ANRA Total Raw Scores, Mid-Point Percentile Ranks, and T Scores by Norm Group
   Appendix C Combined Watson-Glaser and ANRA T Scores and Percentile Ranks by Norm Group

Tables
   Table 5.1 Coefficient Alpha, Odd-Even Split-Half Reliability, and Standard Error of Measurement (SEM) for RANRA (from Rust, 2002, p. 85)
   Table 5.2 ANRA Means, Standard Deviations (SD), Standard Errors of Measurement (SEM), and Internal Consistency Reliability Coefficients (Alpha)
   Table 5.3 ANRA Test-Retest Stability (N = 73)
   Table 6.1 Evidence of ANRA Criterion-Related Validity (Total Raw Score) of Job Incumbents in Various Finance-Related Occupations and Position Levels
   Table 6.2 Correlations Between Watson-Glaser Critical Thinking Appraisal—Short Form and ANRA (N = 452)
   Table 6.3 Correlations Between ANRA, the Miller Analogies Test for Professional Selection (MAT for PS), and the Differential Aptitude Tests for Personnel and Career Assessment—Numerical Ability (DAT for PCA—NA)

Figure
   Figure 4.1 The Relationship of Percentiles to T Scores
Acknowledgements
Pearson’s Talent Assessment group would like to recognize and thank Professor John Rust, Director of the Psychometrics Center at the University of Cambridge, United Kingdom, for his seminal efforts that led to his development of the Rust Advanced Numerical Reasoning Appraisal (RANRA). This manual details our adaptation of RANRA for use in the United States—the Advanced Numerical Reasoning Appraisal (ANRA).
We are indebted to numerous professionals and organizations for their assistance during several phases of our work—project design, data collection, statistical data analyses, editing, and publication.
We acknowledge the efforts of Julia Kearney, Sampling Projects Coordinator; Jane McDonald, Sampling Recruiter; Terri Garrard, Study Manager; David Quintero, Clinical Handscoring Supervisor; Hector Solis, Sampling Manager; and Victoria Locke, Director, Field Research, in driving the data collection activities. Nishidha Goel helped to collate and prepare the data.
We thank Zhiming Yang, PhD, Psychometrician, and JJ Zhu, PhD, Director of Psychometrics, Clinical Products. Dr. Yang’s technical expertise in analyzing the data and Dr. Zhu's psychometric leadership ensured the high level of psychometric integrity of the results.
Our thanks also go to Toby Mahan and Troy Beehler, Project Managers, for diligently managing the logistics of this project. Toby and Troy worked with several team members from the Technology Products Group, Pearson to ensure the high quality and accuracy of the computer interface. These dedicated individuals included Paula Oles, Manager, Software Quality Assurance; Christina McCumber, Software Quality Assurance Analyst; Matt Morris, Manager, System Development; Maurya Buchanan, Technical Writer; and Alan Anderson, Director, Technology Products Group. Dawn Dunleavy, Senior Managing Editor; Konstantin Tikhonov, Project Editor; and Marion Jones, Director, Mathematics, provided editorial guidance. Mark Cooley assisted with the design of the cover.
Finally, we wish to acknowledge the leadership, guidance, support, and commitment of the following people through all the phases of this project: Jenifer Kihm, PhD, Senior Product Line Manager, Talent Assessment; John Toomey, Director, Talent Assessment; Paul McKeown, International Product Development Director; Judy Chartrand, PhD, Director, Test Development; Gene Bowles, Vice President, Publishing and Technology; Larry Weiss, PhD, Vice President, Psychological Assessment Products Group; and Aurelio Prifitera, PhD, Group President and CEO of Clinical Assessment/Worldwide.
Kingsley C. Ejiogu, PhD, Research Director
John Trent, M.S., Research Director
Mark Rose, PhD, Research Director
Chapter 1
Introduction
The Advanced Numerical Reasoning Appraisal (ANRA) measures the ability to recognize,
understand, and apply mathematical and statistical reasoning. Specifically, ANRA measures
numerical reasoning abilities that involve deduction, interpretation, and evaluation. Numerical
reasoning, as measured by ANRA, is operationally defined as the ability to correctly perform the
domain of tasks represented by two sets of items—Comparison of Quantities and Sufficiency of
Information. Both require the use of analytical skills rather than straightforward computational
skills. The key attribute ANRA measures is an individual’s ability to apply numerical reasoning
to everyday problem solving in professional and business settings.
Starkey (1992) describes numerical reasoning as comprising “a set of abilities that are used to
operate upon or mentally manipulate representations of numerosity” (p. 94). Research suggests
that numerical reasoning abilities exist even in infancy, before children begin to receive explicit
instruction in mathematics in school (Brannon, 2002; Feigenson, Dehaene, & Spelke, 2004;
Spelke, 2005; Starkey, 1992; Wynn, Bloom, & Chiang, 2002). As Spelke (2005) observed,
children harness these core abilities when they learn mathematics, and adults use the core abilities
to engage in mathematical and scientific thinking.
Numerical reasoning skill is the foundation of all other numerical abilities (Rust, 2002). This
skill enables individuals to learn how to evaluate situations, how to select and apply strategies for
problem solving, how to draw logical conclusions from numerical data, how to describe and
develop solutions, and how to recognize when and how to apply those solutions. Eventually, one is able
to reflect on solutions to problems and determine whether the solutions make sense.
The nature of work is changing significantly and there is an increased demand for a new kind of
worker—the knowledge worker (Hunt, 1995). As Facione (2006) observed, though the ability to
think critically and make sound decisions does not absolutely guarantee a life of happiness and
economic success, having this ability equips an individual to improve his or her future and
contribute to society. As the Internet has transformed home life and leisure time, people have
been deluged with data of ever-increasing complexity. They must select, interpret, digest,
evaluate, learn, and apply information.
Employers are typically interested in tests that measure candidates' ability to apply constructively
and critically, rather than by rote, what they have learned. A person can be trained or educated to
engage in numerical reasoning; as a result, tests that measure the ability to use mathematical
reasoning within the context of work have an important function in career development. Such
tests enable an organization to identify candidates who may need to improve their skills to
enhance their work effectiveness and career success.
Numerical Reasoning and Critical Thinking
In a skills search of the O*Net OnLine database, both “Mathematics” (defined by O*Net OnLine as
“using mathematics to solve problems”) and “Critical Thinking” (defined by O*Net OnLine as
“using logic and reasoning to identify the strengths and weaknesses of alternative solutions,
conclusions, or approaches to problems”) were rated as “Very Important” for
as many as 99 occupations (accountant, actuary, auditor, financial analyst, government service
executive, management analyst, occupational health and safety specialist, etc.). Numerical
reasoning and critical thinking are essential parts of the cognitive complexity that is a basic factor
for understanding group differences in work performance (Nijenhuis & Flier, 2005).
Both numerical reasoning and critical thinking are higher-order thinking skills—“fundamental
skills that are essential to being a responsible, decision-making member of the work-place”
(Paul & Nosich, 2004, p. 5). Paul and Nosich contrasted the higher-order thinking skills with such
lower-order thinking skills as rote memorization and recall, and they noted that critical thinking
could be applied to any subject matter and any situation where reasoning is relevant. Such a
subject matter or situation could range from accounting (Kealy, Holland, & Watson, 2005;
American Institute of Certified Public Accountants, 1999), through medicine (Vandenbroucke,
1998), to truck driving (Nijenhuis & Flier, 2005). As Paul and Nosich (2004) stated, in any
context where we are thinking well, we are thinking critically.
The enhancement of critical thinking in U.S. college students is a national priority (National
Educational Goals Panel, 1991). In a paper commissioned by the United States Department of
Education, Paul and Nosich (2004) highlighted what the National Council for Excellence in
Critical Thinking Instruction regarded as a basic principle of critical thinking instruction as
applied to subject-matter teaching: “to achieve knowledge in any domain, it is essential to think
critically” (Paul & Nosich, p. 33). Critical thinking is the skill that is required to increase the
probability of desirable outcomes in our lives, such as making the right career choice, using
money wisely, or planning our future. Such critical thinking is reasoned, purposeful, and goal
directed. At the cognitive level, such critical thinking involves solving problems, formulating
inferences, calculating likely outcomes and decision-making. Once people have developed this
critical thinking skill, they are able to apply it in a wide variety of circumstances. Critical
thinking can involve proper language use, applied logic, and practical mathematics.
Because solving ANRA items requires higher-order numerical reasoning skills rather than rote
calculation, using the Watson-Glaser Critical Thinking Appraisal® (a reliable and valid test of verbal
critical thinking) in conjunction with ANRA provides a demanding, high-level measurement of
numerical reasoning and verbal critical thinking skills, respectively. These two skills are
important when recruiting in the competitive talent assessment market.
In response to requests from Watson-Glaser Critical Thinking Appraisal customers in the United
Kingdom, The Psychological Corporation (now Pearson) in the UK developed the Rust Advanced
Numerical Reasoning Appraisal (RANRA) in 2000 as a companion numerical reasoning test for
the Watson-Glaser Critical Thinking Appraisal. In 2006, Pearson adapted RANRA to enhance
the suitability and applicability of the test in the United States. This manual contains detailed
information on the U.S. adaptation—ANRA.
Chapter 2
History and Development of ANRA
Description of the Test
ANRA consists of a set of two tests: Test 1—Comparison of Quantities and Test 2—Sufficiency
of Information. The candidate must apply his or her numerical reasoning skills to decisions that
reflect the wide variety of numerical estimation and analytic tasks frequently encountered in
many everyday situations at work or in a learning environment.
The two ANRA tests are designed to measure different, but interdependent, aspects of numerical
reasoning. The tests require the candidate to consider alternatives (either by comparing quantities
or judging information to be sufficient) in relation to given problems. The examinee's task is to
study each problem and to evaluate the appropriateness or validity of the alternatives. The ANRA
maximum total raw score is 32.
Because ANRA is intended as a test of numerical reasoning power rather than speed, there is no
rigid time limit for taking the test. Candidates should be given as much time as they reasonably
need to finish the test. An individual typically completes the test in about 45 minutes. About 90%
of the 452 individuals in the normative group who were employed in professional, management,
and higher-level positions completed the test within 75 minutes.
Adapting RANRA
The Rust Advanced Numerical Reasoning Appraisal (RANRA) was adapted to reflect U.S.
English and U.S. measurement units. Because RANRA measures reasoning more than
computation, only the measurement units were changed and the original numbers were kept,
except in cases where it affected the realism of the situation. For example, “82 kilograms” was
changed to “82 pounds,” though 82 kg ≈ 180.8 lbs. Similarly, “5,000 British pounds sterling” was
changed to “5,000 U.S. dollars,” though 5,000 British pounds sterling ≠ 5,000 U.S. dollars.
ANRA contains the original 32 RANRA items plus additional items for continuous test
improvement purposes. All the items were reviewed by a group comprising 16 individuals—
researchers in test development, financial analysts, business development professionals,
industrial/organizational psychologists, and editors in test publishing. Item sentence construction
was modified in some items, based on input from the American reviewers.
Development of RANRA
In developing RANRA, Rust (2002) first conducted a conceptual analysis of the role of critical thinking
in the use of mathematics. Through this conceptual analysis, he identified the two subdomains of
comparison of quantities and sufficiency of information as the key concepts in developing an
assessment of mathematical reasoning. Rust then constructed 80 items and had a panel of
educators and psychologists evaluate and modify them, and then generated the pilot version of
RANRA. This pilot version of RANRA was administered to 76 students and staff from diverse
subject backgrounds within the University of London. The data were subjected to detailed
analysis at the item level. Distractor analysis led to the modification of some items. Item-
difficulty values were calculated for each item, based on the proportion of examinees passing
each item. The discrimination index was also calculated, and those items that showed they were
measuring a common quality in numerical reasoning were identified and retained. This approach
led to the development of the 32-item RANRA.
Chapter 3
Directions for Administration
General Information
ANRA is administered through the online testing platform at TalentLens.com, an Internet-based
testing system designed by Pearson for the administration, scoring, and reporting of professional
assessments. Instructions for administrators on how to order and access the test online are
provided at TalentLens.com. Instructions for accessing ANRA interpretive reports are provided
on the website. After a candidate has taken ANRA online, the test administrator can use the link
Pearson provides to review the candidate’s results in an interpretive report.
Preparing for Administration
Being thoroughly prepared before administering the test results in a more efficient administration
session. Test administrators should take ANRA prior to administering the test and comply with
the directions. Candidates are not allowed to use calculators or similar calculation devices while
completing the test. Test administrators should provide candidates with pencils, an eraser, and a
sheet of paper to write their calculations if needed.
Test administration must comply with the code of practice of the testing organization, applicable
government regulations, and the recommendations of the test publisher. Candidates should be
informed before the testing session about the nature of the assessment, why the test is being used,
the conditions under which they will be tested, and the nature of any feedback they will receive.
Test administrators need to assure candidates that their test results will remain confidential.
The test administrator must obtain informed consent from the candidate before testing. The
informed consent is a written statement, signed by the candidate, that explains the type of test to
be administered, the purpose of the test, and who will have access to the test data. It is
the responsibility of the test user to ensure that candidates understand the testing procedure. The
test administrator should also ensure that all relevant background information from the candidate
is collected and verified (e.g., name, gender, educational level, current employment, occupational
history, and so on).
Testing Conditions
The test administrator has a significant responsibility to ensure that the conditions under which
the test is taken do not introduce undesirable influences on the test performance of candidates. Such
undesirable influences can either inflate or reduce the test scores of candidates. Poor
administration of a test undermines the value of test scores and makes an accurate interpretation
of results very difficult, if not impossible.
It is important to ensure that the test is administered in a quiet, well-lit room. The following
conditions are necessary for accurate scores and for maintaining the cooperation of the examinee:
good lighting, comfortable seating, adequate desk or table space, comfortable positioning of the
computer screen, keyboard and mouse, and freedom from noise and other distractions.
Interruptions and distractions from outside should be kept to a minimum, if not eliminated.
Answering Questions
The test administrator may answer examinees' questions about the test before giving the signal to
begin. To maintain standard testing conditions, answer such questions by re-reading the
appropriate section of these directions. Do not volunteer new explanations or examples. The test
administrator is responsible for ensuring that examinees understand the correct way to indicate
their answers and what is required of the examinees. The question period should never be rushed
or omitted.
If any examinees have routine questions after the testing has started, try to answer them without
disturbing the other examinees. However, questions about the test items should be handled by
telling the examinee to do his or her best.
Administering the Test
After the examinee is seated at the computer and the initial instruction screen for ANRA appears,
say,
The on-screen directions will take you through the entire process that begins with some demographic questions. After you have completed these questions, the test will begin. You will have as much time as you reasonably need to complete the test items. The test ends with a few additional demographic questions. Do you have any questions before starting the test?
Answer any questions and say, Please begin the test.
Once the examinee clicks the “Start Your Test” button, administration begins with the first page
of questions. The examinee may review test items at the end of the test. Allow examinees as
much time as they reasonably need to complete the test. Average completion time is about 45
minutes. About 90% of candidates are finished with the test within 75 minutes.
If an examinee’s computer develops technical problems during testing, the test administrator
should move the examinee to another suitable computer location. If the technical problems cannot
be solved by moving to another computer location, the administrator should contact Pearson’s
Technical Support at 1-888-298-6227 for assistance.
Scoring and Reporting
Scoring is automatic, and the report is typically available within a minute after the test is
completed. A link to the report will be available on the online testing platform at TalentLens.com.
Adobe® Acrobat Reader® is required to open the report. The test administrator may view, print, or
save the candidate’s report.
Test Security
ANRA scores are confidential and should be stored in a secure location accessible only to
authorized individuals. It is unethical and poor test practice to allow test-score access to
individuals who do not have a legitimate need for the information. Storing test scores in a locked
cabinet or password-protected file that can only be accessed by designated test administrators will
help ensure the security of the test scores. The security of testing materials (e.g., access to online
tests) and protection of copyright must also be maintained by authorized individuals. Avoid
disclosure of test access information such as usernames or passwords, and only administer ANRA
in proctored environments. All the computer stations used in administering ANRA must be in
locations that can be easily supervised and that have an adequate level of security.
Concluding Test Administration
At the end of the testing session, thank each candidate for his or her participation and check the
computer station(s) to ensure that the test is closed.
ANRA can be a demanding test for some candidates. It may be constructive to clarify what part
the test plays within the context of the selection or assessment procedures. It is also constructive
to reassure candidates about the confidentiality of their test scores.
Administering ANRA and Watson-Glaser Critical Thinking Appraisal in a Single Testing Session
When administering the ANRA and the Watson-Glaser in a single testing session, administer the
Watson-Glaser first. Just as ANRA is intended as a test of numerical reasoning power rather than
speed, the Watson-Glaser is intended as a test of critical thinking power rather than speed. Both
tests are untimed; administration of ANRA and the Watson-Glaser Short Form in one session
should take about 1 hour and 45 minutes.
Accommodating Examinees With Disabilities
The Americans with Disabilities Act (ADA) of 1990 requires an employer to reasonably
accommodate the known disability of a qualified applicant, provided such accommodation would
not cause an “undue hardship” to the operation of the employer’s business.
The test administrator should provide reasonable accommodations to enable candidates with
special needs to comfortably take the test. Reasonable accommodations may include, but are not
limited to, modifications to the test environment (e.g., high desks) and medium (e.g., having a
reader read questions to the examinee, or increasing the font size of questions) (Society for
Industrial and Organizational Psychology, 2003). In situations where an examinee’s disability is
not likely to impair his or her job performance, but may hinder the examinee’s performance on
ANRA, the organization may want to consider waiving the test or de-emphasizing the score in
lieu of other application criteria. Interpretive data as to whether scores on ANRA are comparable
for examinees who are provided reasonable accommodations are not available at this time due to
the small number of examinees who have requested such accommodations.
Chapter 4
ANRA Norms Development
Norms provide a basis for evaluating an individual's score relative to the scores of other
individuals who took the same test. Norms allow for the conversion of raw scores to more useful
comparative scores, such as percentile ranks. Typically, norms are constructed from the scores of
a large sample of other individuals who took the test under similar conditions. This group of
individuals is called the norm group.
The characteristics of the sample used for preparing norms are critical in determining the
usefulness of those norms. For such purposes as selecting from among applicants to fill a
particular job, normative information derived from a specific, relevant, well-defined group might
be most useful. However, the composition of the sample of job applicants is influenced by a
variety of situational factors, including the job demands and local labor market conditions.
Because such factors can vary across jobs, locations, and over time, the limitations on the
usefulness of any set of published norms should be recognized.
When a test is used to make employment decisions, the most appropriate norm group is one that
is representative of those who will be taking the test in the local situation. It is best, whenever
possible, to prepare local norms by accumulating the test scores of applicants, trainees, or
employees. One of the factors that must be considered in establishing norms is sample size. Data
from small samples tend to be unstable and the presentation of percentile ranks for all possible
scores is imprecise. As a result, the use of in-house norms is only recommended when the sample
is sufficiently large (about 100 or more people). Until a sufficient and representative number of
cases has been collected, the test user should consider norms based on other similar groups rather
than from local data with a small sample size. In the absence of adequate local norms, the norms
provided in Appendixes B and C should be used to guide the interpretation of scores.
Using ANRA as a Norm- or Criterion-Referenced Test
ANRA may be used as a norm-referenced or as a criterion-referenced instrument. A norm-
referenced test enables a human resource professional to interpret an individual's test performance
in comparison to a particular normative group. An individual's performance on a criterion-
referenced instrument can only indicate whether or not that individual meets certain, predefined
criteria. It is appropriate to use ANRA as a norm-referenced instrument in the process of
employment selection. For optimal results in such decisions, the overall total score, rather than
the subtest scores, should be used. Subtest scores represent fewer items and, therefore, are less
stable than the total score. However, as a criterion-referenced measure, it is feasible to use subtest
scores to analyze the numerical reasoning abilities of a class or larger group and to determine the
types of numerical reasoning or critical thinking training that may be most appropriate.
In norm-referenced situations, raw scores need to be converted before they can be compared.
Though raw scores may be used to rank candidates in order of performance, little can be inferred
from raw scores alone. There are two main reasons for this. First, raw scores cannot be treated as
having equal intervals. For example, it would be incorrect to assume that the difference between
raw scores of, say, 20 and 21 is of the same significance as the difference between raw scores of
30 and 31. Second, ANRA raw scores may not be normally distributed. Hence, they are not
subject to the psychometric principles of parametric statistics required for the proper evaluation
of validity.
Using Norms to Interpret Scores
The ANRA norms presented in Appendix B and Appendix C were derived from data collected
February 2006 through June 2006, from 452 adults in a variety of employment settings. The
tables in Appendix B (Tables B.1 and B.2) show the ANRA total raw scores with corresponding
percentile ranks and T scores for the identified norm groups.
When using the norms tables in Appendix B, look for a group that is similar to the individual or
group tested. For example, you would compare the test score of a person who applied for a
Manager position with norms derived from the scores of other managers. When using the norms
in Appendix B to interpret candidates’ scores, keep in mind that norms are affected by the
composition of the groups that participated in the normative study. Therefore, it is important to
examine specific position level and occupational characteristics of a norm group.
By comparing an individual’s raw score to the data in a norms table, it is possible to determine
the percentile rank corresponding to that score. The percentile rank indicates an individual's
relative position in the norm group. Percentiles should not be confused with percentage scores
that represent the percentage of correct items. Percentiles are derived scores that are expressed in
terms of the percent of people in the norm group scoring equal to or below a given raw score.
Percentiles have the advantage of being readily understood and universally applicable. However,
although percentiles are useful for expressing an examinee’s performance relative to other
candidates, percentiles have limitations. For example, percentile ranks do not have equal
intervals. While percentiles indicate the relative position of each candidate in relation to the
normative sample, they do not show the amount of difference between scores. In a normal
distribution of scores, percentile ranks tend to cluster around the 50th percentile. This clustering
affects scores in the average range the most because a difference of one or two raw score points
may change the percentile rank. Extreme scores are affected less; a change in one or two raw
score points at the extremes typically does not produce a large change in percentile ranks. These
factors should be considered when interpreting percentile ranks.
Converting Raw Scores to Percentile Ranks
To find the percentile rank of a candidate’s raw score, locate the ANRA total raw score in Table
B.1 or B.2. The corresponding percentile rank is read from the selected norm group column. For
example, if a person applying for a job as a Director had a score of 25 on ANRA, it is appropriate
to use the Executives/Directors norms in Table B.1 for comparison. In this case, the percentile
rank corresponding to a raw score of 25 is 67. This percentile rank indicates that about 67% of the
people in the norm group scored lower than or equal to a score of 25 on ANRA, and about 33%
scored higher than a score of 25 on ANRA. The lowest raw score will lie at the 1st percentile; the
median raw score will fall at the 50th percentile, and the highest raw score will lie at the 99th
percentile.
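To make the lookup procedure concrete, the following Python sketch automates it against a hypothetical norms dictionary. Only the raw score 25 mapping to the 67th percentile comes from the Executives/Directors example above; the remaining entries are illustrative placeholders, not values from Table B.1.

# Sketch of a norms-table lookup; entries other than 25 -> 67 are hypothetical.
EXECUTIVES_DIRECTORS_NORMS = {
    23: 55,  # hypothetical
    24: 61,  # hypothetical
    25: 67,  # from the Executives/Directors example in the text
    26: 72,  # hypothetical
}

def percentile_rank(raw_score, norms):
    """Return the percentile rank for a raw score in the chosen norm group."""
    if raw_score not in norms:
        raise ValueError("No norm entry for raw score %d" % raw_score)
    return norms[raw_score]

print(percentile_rank(25, EXECUTIVES_DIRECTORS_NORMS))  # prints 67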
Each group’s size (N), raw score mean, and raw score standard deviation (SD) are shown at the
bottom of the norms tables. The group raw score mean or average is calculated by summing the
raw scores and dividing the sum by the total number of examinees. The standard deviation
indicates the amount of variation in a group of scores. In a normal distribution, approximately
two-thirds (68.26%) of the scores are within the range of 1 SD below the mean to 1 SD above the
mean. These statistics are often used in describing a sample and setting cut scores. For example, a
cut score may be set as one SD below the mean. In compliance with the Civil Rights Act of 1991,
Section 5 (a) (1), as amended, the norms provided in Appendix B and Appendix C combine data
for males and females, and for white and minority candidates.
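As a minimal illustration of these group statistics, the sketch below computes a mean, a standard deviation, and a cut score set one SD below the mean. The raw scores are hypothetical, not data from the normative sample.

import statistics

raw_scores = [18, 21, 24, 20, 25, 19, 23, 22]  # hypothetical examinee raw scores

mean = statistics.mean(raw_scores)   # sum of scores divided by number of examinees
sd = statistics.stdev(raw_scores)    # sample standard deviation
cut_score = mean - sd                # e.g., a cut score one SD below the mean

print(round(mean, 1), round(sd, 1), round(cut_score, 1))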
Using Standard Scores to Interpret Performance
Test results can be reported in many different formats. Examples of these formats include raw
scores, percentiles, and various forms of standard scores. Standard scores express the score of
each individual in terms of its distance from the mean. Examples of standard scores are z scores
and T scores. Standard scores do not suffer from the drawbacks associated with percentiles. The
advantage of percentiles is that they are readily understood and, therefore, immediately
meaningful. As indicated above, however, there is a risk of percentiles being confused with
percentage scores, or of percentiles being interpreted as an interval scale. Standard scores avoid
the unequal clustering of scores by adopting a scale based on standard deviation units.
The basic type of standard score is the z score, which is a raw score converted to a standard
deviation unit. Thus a raw score that is 0.53 standard deviations below the mean score for the
group receives a z score of –0.53. z scores are generally in the –3.00 to +3.00 range. However,
there are certain disadvantages in saying that a person has a score of –0.53 on a test. From the
point of view of presentation, the use of decimal points and the negative symbol is unappealing.
Hence, certain transformations are commonly applied to present standard scores in a more
user-friendly form.
Converting z Scores to T Scores
To convert a z score to a T score, multiply the z score by 10 and add 50. Thus, a z score of –0.53
becomes a T score of 44.7, which is then rounded, as a matter of convention, to the nearest whole
number, that is, 45. A set of T scores has a mean of 50 and at each standard deviation point there
is a score difference of 10. Thus, a T score of 30 is at two standard deviations below the mean,
while a T score of 60 is one standard deviation above the mean. The T score transformation
results in a scale that runs from 10 to 90, with each 10th interval coinciding with a standard
deviation point. Appendix B shows ANRA T scores. Appendix C shows the sum of Watson-
Glaser and ANRA T scores and their corresponding percentiles. Because the Watson-Glaser and
ANRA do not measure identical constructs, their combined T scores must be derived by first
transforming separate Watson-Glaser and ANRA raw score pairs to their respective T scores, and
then summing the T scores. Figure 4.1 illustrates the relationship between percentiles and
T scores.
Figure 4.1 The Relationship of Percentiles to T Scores
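The z and T transformations described above are simple enough to express directly. This sketch reproduces the worked example (z = –0.53 becoming a T score of 45); the group mean and SD passed to z_score are hypothetical inputs.

def z_score(raw, mean, sd):
    """Express a raw score as standard deviation units from the group mean."""
    return (raw - mean) / sd

def t_score(z):
    """T = 10z + 50, rounded to the nearest whole number by convention."""
    return round(10 * z + 50)

print(t_score(-0.53))           # 44.7 rounds to 45
print(t_score(z_score(25, 28.2, 6.0)))  # hypothetical mean and SD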
Using ANRA and Watson-Glaser Critical Thinking Appraisal Together
The ANRA and Watson-Glaser combined score reflects a broader range of critical reasoning
skills than would be obtained from either test alone. Scores from ANRA and the Watson-
Glaser can be combined by first converting each total raw score to a T score and then adding the
two T scores together. The sum of the T scores can also be converted to percentile ranks.
Appendix C (Tables C.1 and C.2) shows the percentile ranks of the sum of ANRA and
Watson-Glaser Short Form T scores.
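A sketch of the combination procedure, assuming two hypothetical raw-score-to-T-score tables (the ANRA 25 to 57 pairing follows the Table B.1 example used later in this manual; the Watson-Glaser entry is a placeholder):

# Hypothetical lookup tables; real values come from the norms tables.
anra_t_table = {25: 57}  # ANRA raw 25 -> T 57, per the Table B.1 example
wg_t_table = {30: 54}    # hypothetical Watson-Glaser entry

def combined_t(anra_raw, wg_raw):
    """Convert each raw score to a T score first, then sum the T scores."""
    return anra_t_table[anra_raw] + wg_t_table[wg_raw]

print(combined_t(25, 30))  # 111; Appendix C maps such sums to percentile ranks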
Another potential benefit from using ANRA and the Watson-Glaser together is in the expected
difference between scores on the two tests. This expected difference depends on the type of norm
group to which the candidate belongs. Generally speaking, candidates in financial or scientific
occupations are expected to score higher on ANRA than on the Watson-Glaser. On the other
hand, managers, particularly in fields where critical thinking using language is a key skill, and
employees in occupations that do not require a great deal of numeracy, will be expected to
perform better on the Watson-Glaser than on ANRA. By examining the difference between a
candidate’s Watson-Glaser and ANRA scores, the user can make appropriate development
suggestions to the candidate.
Chapter 5
Evidence of Reliability
The reliability of a measurement instrument refers to the accuracy, consistency, and precision of
test scores across situations (Anastasi & Urbina, 1997). Test theory posits that a test score is an
estimate of an individual’s hypothetical true score, or the score an individual would receive if the
test were perfectly reliable. In actual practice, however, some measurement error is to be
expected. A reliable test has relatively small measurement error.
The methods most commonly used to estimate test reliability are test–retest (the stability of test
scores over time), alternate forms (the consistency of scores across alternate forms of a test), and
internal consistency of the test items (e.g., Cronbach’s alpha coefficient; Cronbach, 1970).
Decisions about the form of reliability to be used in comparing tests depend on a consideration of
the nature of the error that is involved in each form. Different types of error can be operating at
the same time, so it is to be expected that reliability coefficients will differ in different situations
and on different groupings and samplings of respondents. An appropriate estimate of reliability
can be obtained from a large representative sample of the respondents to whom the test is
generally administered.
Reliability Coefficients and Standard Error of Measurement
The reliability of a test is expressed as a correlation coefficient, which represents the consistency
of scores that would be obtained if a test could be given an infinite number of times. Reliability
coefficients provide an estimate of the amount of error associated with test scores and can range
from .00 to 1.00. The closer the reliability coefficient is to 1.00, the more reliable the test. A
perfectly reliable test would have a reliability coefficient of 1.00 and no measurement error. A
completely unreliable test would have a reliability coefficient of .00. The U.S. Department of
Labor (1999) provides the following general guidelines for interpreting a reliability coefficient:
above .89 is considered “excellent,” .80–.89 is “good,” .70–.79 is considered “adequate,” and
below .70 “may have limited applicability.”
Repeated testing leads to some variation. Consequently, no single test event effectively measures
an examinee’s actual ability with complete accuracy. Therefore, an estimate of the possible
amount of error present in a test score, or the amount that scores would probably vary if an
examinee were tested repeatedly with the same test, is necessary. This estimate of error is known
as the standard error of measurement (SEM). The SEM decreases as the reliability of a test
increases. A large SEM denotes less reliable measurement and less reliable scores. The standard
error of measurement is calculated with the formula:
SEM = SD√(1 − rxx)
In this formula, SEM represents the standard error of measurement, SD represents the standard
deviation of the distribution of obtained scores, and rxx represents the reliability coefficient of the
test (Cascio, 1991, formula 7-11).
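A quick check of this formula in Python, using the Executives/Directors values reported later in Table 5.2 (SD = 6.0, alpha = .85):

import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - rxx), per the formula above (Cascio, 1991)."""
    return sd * math.sqrt(1 - reliability)

print(round(standard_error_of_measurement(6.0, 0.85), 2))  # 2.32, matching Table 5.2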
The SEM is a quantity that is added to and subtracted from an examinee’s standard test score to
create a confidence interval or band of scores around the obtained standard score. The confidence
interval is a score range that, in all likelihood, includes the examinee’s hypothetical “true” score
that represents the examinee’s actual ability. A true score is a theoretical score entirely free of
error. Since the true score is a hypothetical value that can never be obtained because testing
always involves some measurement error, the score obtained by an examinee on any test will vary
somewhat from administration to administration. As a result, any obtained score is considered
only an estimate of the examinee’s “true” score. Approximately 68% of the time, the observed
standard score will lie within +1.0 and –1.0 SEM of the true score; 95% of the time, the observed
standard score will lie within +1.96 and –1.96 SEM of the true score; and 99% of the time, the
observed standard score will lie within +2.58 and –2.58 SEM of the true score.
Using the SEM means that standard scores are interpreted as bands or ranges of scores, rather
than as precise points (Nunnally, 1978). To illustrate the use of SEM with an example, assume a
director candidate obtained a total raw score of 25 on ANRA, with SEM = 2.32. From the
information in Table B.1, the standard score (T score) for this candidate is 57. We can, therefore,
infer that if this candidate were administered a large number of alternate forms of ANRA, 95%
of this candidate’s T scores would lie within the range between 57 − 1.96 × 2.32 ≈ 52 T score
points and 57 + 1.96 × 2.32 ≈ 62 T score points. We can further infer that the expected average of
this person’s T scores from a large number of alternate forms of ANRA would be 57.
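The same confidence band can be computed directly. This sketch reproduces the director-candidate example (T = 57, SEM = 2.32):

def confidence_interval(t, sem, z=1.96):
    """Band of scores around an obtained T score (z = 1.96 for 95%)."""
    return t - z * sem, t + z * sem

low, high = confidence_interval(57, 2.32)
print(round(low), round(high))  # approximately 52 and 62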
Thinking in terms of score ranges serves as a check against overemphasizing small differences
between scores. The SEM may be used to determine if an individual’s score is significantly
different from a cut score, or if the scores of two individuals differ significantly. An example of
one general rule of thumb is that the difference between two scores on the same test should not be
interpreted as significant unless the difference is equal to at least twice the standard error of the
difference (SED), where SED = SEM × √2 (Gulliksen, as cited in Cascio, 1991, p. 143).
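Applying this rule of thumb in code, with SEM = 2.32 carried over from the example above:

import math

def significant_difference(score_a, score_b, sem):
    """Flag a difference only if it is at least 2 * SED, where SED = SEM * sqrt(2)
    (Gulliksen, as cited in Cascio, 1991)."""
    sed = sem * math.sqrt(2)
    return abs(score_a - score_b) >= 2 * sed

print(significant_difference(57, 52, 2.32))  # False: 5 < 2 * 3.28
print(significant_difference(57, 50, 2.32))  # True: 7 > 2 * 3.28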
RANRA Reliability Studies
Because ANRA is a U.S. adaptation of RANRA, the information on previous studies refers to
RANRA. For the sample used in the initial development of RANRA in the United Kingdom
(N = 1546), Cronbach’s alpha coefficient and split-half reliability were .78 for the overall
RANRA score (Rust, 2002). The reliability coefficients of RANRA for both Test 1 and Test 2
and for the overall RANRA score are shown in Table 5.1.
Table 5.1 Coefficient Alpha, Odd-Even Split-Half Reliability, and Standard Error of Measurement (SEM) for RANRA (from Rust, 2002, p. 85)

                                     Alpha   Split-Half   SEM
Test 1: Comparison of Quantities      .63       .60       6.32
Test 2: Sufficiency of Information    .70       .71       5.39
RANRA Score                           .78       .78       4.69
The RANRA score reported in Table 5.1 is a T score transformed from the total raw score, while
the standard error of measurement reported in the table was based on the split-half reliability
(Rust, 2002).
ANRA Reliability Studies

Evidence of Internal Consistency
Cronbach’s alpha and the standard error of measurement (SEM) were calculated for the sample
used for the ANRA norm groups reported in this manual. The internal consistency reliability
estimates for ANRA total raw score and ANRA subtests are shown in Table 5.2.
Table 5.2 ANRA Means, Standard Deviations (SD), Standard Errors of Measurement (SEM), and Internal Consistency Reliability Coefficients (Alpha)

ANRA Total Raw Score
Norm Group                              N     Mean   SD    SEM    Alpha
Executives/Directors                    91    21.3   6.0   2.32   .85
Managers                                88    20.1   5.6   2.38   .82
Professionals/Individual Contributors   200   22.1   6.4   2.22   .88
Employees in Financial Occupations      198   21.9   6.4   2.22   .88

ANRA Test 1: Comparison of Quantities
Norm Group                              N     Mean   SD    SEM    Alpha
Executives/Directors                    91    10.9   3.4   1.63   .77
Managers                                88    10.3   3.4   1.70   .75
Professionals/Individual Contributors   200   11.4   3.6   1.53   .82
Employees in Financial Occupations      198   11.3   3.5   1.57   .80

ANRA Test 2: Sufficiency of Information
Norm Group                              N     Mean   SD    SEM    Alpha
Executives/Directors                    91    10.4   3.3   1.60   .75
Managers                                88     9.9   2.9   1.67   .67
Professionals/Individual Contributors   200   10.7   3.3   1.62   .76
Employees in Financial Occupations      198   10.6   3.3   1.58   .77
The values in Table 5.2 show that the ANRA total raw score possesses good internal consistency
reliability. The ANRA subtests showed lower internal consistency reliability estimates than the
ANRA total raw score. Consequently, the ANRA total score, not the subtest scores, should be
used for optimal hiring results.
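The manual does not reproduce the alpha formula, but the standard computation is shown in this sketch; the 0/1 item matrix is hypothetical (rows are examinees, columns are items scored correct/incorrect).

import numpy as np

def cronbach_alpha(item_matrix):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    X = np.asarray(item_matrix, dtype=float)
    k = X.shape[1]
    item_variances = X.var(axis=0, ddof=1)
    total_variance = X.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses of 4 examinees to 5 items
responses = [[1, 1, 0, 1, 1],
             [1, 0, 0, 1, 0],
             [0, 0, 1, 0, 0],
             [1, 1, 1, 1, 1]]
print(round(cronbach_alpha(responses), 2))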
Evidence of Test-Retest Stability
ANRA was administered on two separate occasions to determine the stability of performance on
the test over time. A sample of 73 job incumbents representing various occupations and
organizational levels took the test twice. The average test-retest interval was two weeks. The test-
retest stability was evaluated using Pearson’s product-moment correlation of the standardized T
scores from the first and second testing occasions. The test-retest correlation coefficient was
corrected for the variability of the sample (Allen & Yen, 1979). Furthermore, the standard
difference (i.e., effect size) was calculated using the mean score difference between the first and
second testing occasions divided by the pooled standard deviation (Cohen, 1996, Formula 10.4).
This difference (d), proposed by Cohen (1988), is useful as an index to measure the magnitude of
the actual difference between two means. The corrected test-retest stability coefficient was .85.
The difference in mean scores between the first testing and the second testing was
small (d = –0.03). As the data in Table 5.3 indicate, ANRA demonstrates good test-retest stability
over time.
Table 5.3 ANRA Test-Retest Stability (N = 73)

                            First Testing    Second Testing
                            Mean     SD      Mean     SD      r12    Corrected r12    Standard Difference (d)
ANRA Standardized T score   50.1     9.2     49.8     10.0    .82    .85              –0.03
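A sketch of the two statistics reported in Table 5.3, computed from hypothetical paired T scores (the actual study used N = 73); the pooled-SD form shown assumes equal group sizes, as in a test-retest design:

import numpy as np

def retest_stats(first, second):
    """Pearson r between occasions, plus Cohen's d using the pooled SD."""
    first, second = np.asarray(first, float), np.asarray(second, float)
    r = np.corrcoef(first, second)[0, 1]
    pooled_sd = np.sqrt((first.var(ddof=1) + second.var(ddof=1)) / 2)
    d = (second.mean() - first.mean()) / pooled_sd
    return r, d

# Hypothetical paired T scores for a handful of examinees
t1 = [48, 52, 55, 43, 60, 50]
t2 = [47, 53, 54, 45, 59, 49]
r, d = retest_stats(t1, t2)
print(round(r, 2), round(d, 2))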
Chapter 6
Evidence of Validity
Validity refers to the degree to which specific data, research, or theory support the interpretation
of test scores entailed by proposed uses of tests (American Educational Research Association
[AERA], American Psychological Association [APA], & National Council on Measurement in
Education [NCME], 1999). Cronbach (1970) observed that validity is high if a test gives the
information the decision maker needs. Several sources of validity evidence are discussed next in
relation to ANRA.
Face Validity
Face validity refers to a test's appearance and what the test seems to measure, rather than what the
test actually measures. Face validity is not validity in any technical sense and should not be
confused with content validity. Face validity refers to whether or not a test looks valid to
candidates, administrators, and other observers. If test content does not seem relevant to the
candidate, the result may be a lack of cooperation, regardless of the actual validity of the test. For a
test to function effectively in practical situations, it must be not only objectively valid
but also face valid.
However, a test cannot be judged solely on whether it “looks right.” Appearance and graphic
design of a test are no guarantee of quality. Face validity should not be considered a substitute for
objectively determined validity. As mentioned in the chapter on the development of ANRA,
ANRA items were reviewed by a group of individuals who provided feedback on the test. The
reviewers commented on issues such as the clarity of the items, the extent to which the items
appeared to measure numerical reasoning, the extent to which the test content appeared relevant to
jobs that required numerical reasoning, and the extent to which they thought the test would yield
useful information. From the responses by this group, it was evident that ANRA had high face
validity and participants recognized its relevance to the skills required by employees who deal
with numbers or project planning. Although the item content of ANRA could not reflect every
work situation for which the test would be appropriate, the operations and processes required in
each subtest represent abilities that are valued and readily appreciated.
Evidence Based on Test Content
Evidence based on the content of a test exists when the test includes a representative sample of
tasks, behaviors, knowledge, skills, abilities, or other characteristics necessary to perform the job.
Evidence of content validity is usually gathered through job analysis and is most appropriate for
evaluating knowledge and skills tests.
Evaluation of content-related evidence is usually a rational, judgmental process
(Cascio & Aguinis, 2005). In employment settings, the principal concern is with making
inferences about how well the test samples a job performance domain—a segment or aspect of
the job performance universe that has been identified and about which inferences are to be made
(Lawshe, 1975). Because most jobs have several performance domains, a standardized test
generally applies only to one segment of the job performance universe (e.g., a typing test
administered to a secretary applies to typing—one job performance domain in the job
performance universe of a secretary). Thus, the judgment of whether content-related evidence
exists depends on an evaluation of whether the same capabilities are required in both the job
performance domain and the test (Cascio & Aguinis, 2005).
When considering content validity, it is important to recognize that a test attempts to sample the
area of behavior being measured. It is rarely the purpose of a test to be exhaustive in assessing
every possible manifestation of a domain. While content exhaustiveness may seem feasible in
some highly specific areas of achievement, in other measurement situations it would simply not
be possible. Aptitude, ability, and personality tests always aim to achieve representative sampling
of the behaviors in question, and the evaluation of content validity relates to the degree to which
this representation has been achieved.
Evidence of content validity is most easily shown with reference to achievement tests where the
relationship between the items and the expected manifestation of that ability in real-life situations
is very clear. Achievement tests are designed to measure how well an individual has mastered a
particular skill or course of study. From this perspective, it might seem that an informed
inspection of the contents of a test would be sufficient to establish its validity for such a purpose.
For example, a test of spelling should consist of spelling items. A careful analysis of the domain
will be necessary to ensure that all the important features are covered by the test items, and that
the features are appropriately represented in the test according to their significance.
The effect of speed on test scores also needs to be checked. Participants may perform differently
under the additional pressure of a timed test. There are also implications for test design and
scoring arising from the interaction of speed and accuracy and from situations where candidates
fail to finish a timed test. In any case, ANRA is not a speed test and it is unlikely that anyone
failing to complete the test within a reasonable amount of time would improve his or her score
significantly if given extra time.
In an employment setting, evidence of ANRA content-related validity should be established by demonstrating that the jobs require the numerical reasoning skills measured by ANRA. In instructional settings, content-related validity may be examined by determining the extent to which ANRA measures a sample of the specified objectives of the instructional program.
Evidence Based on Test-Criterion Relationships

One of the primary reasons for using tests is to be able to make an informed prediction about an examinee’s potential for future success. For example, selection tests are used to hire or promote the individuals most likely to be productive employees. The rationale behind using selection tests is that the better an individual performs on the test, the better that individual will perform as an employee.
Evidence of criterion-related validity addresses the inference that individuals who score better on
tests will be successful on some criterion of interest. Criterion-related validity evidence indicates
the statistical relationship (e.g., for a given sample of job applicants or incumbents) between
scores on the test and one or more criteria, or between scores on the test and independently
obtained measures of subsequent job performance. By collecting test scores and criterion scores
(e.g., job performance results, grades in a training course, supervisor ratings), one can determine
how much confidence may be placed in using test scores to predict job success. Typically,
correlations between criterion measures and scores on the test serve as indicators of criterion-
related validity evidence. Provided the conditions for a meaningful validity study have been met
(e.g., sufficient sample size and adequate criteria), these correlation coefficients are important
indicators of the utility of the test.
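To make the computation concrete, the following minimal sketch (in Python, with invented scores and ratings; it is not part of any ANRA procedure) correlates test scores with a criterion measure in the way described above:

# Minimal sketch of a criterion-related validity computation.
# All score and rating values below are invented for illustration.
from scipy.stats import pearsonr

test_scores = [28, 31, 22, 25, 30, 19, 27, 24, 33, 21]        # e.g., total raw scores
ratings = [4.0, 4.5, 3.0, 3.5, 4.5, 2.5, 4.0, 3.0, 5.0, 3.5]  # supervisor ratings

r, p = pearsonr(test_scores, ratings)
print(f"validity coefficient r = {r:.2f}, p = {p:.3f}")

A real study would, of course, require a much larger sample and a reliable, job-relevant criterion, as discussed next.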
The conditions for evaluating criterion-related validity evidence are often difficult to fulfill in the
ordinary employment setting. Studies of test-criterion relationships should involve a sufficiently
large number of persons hired for the same job and evaluated for success using a uniform
criterion measure. The criterion itself should be reliable and job-relevant, and should provide a
wide range of scores. In order to evaluate the quality of studies of test-criterion relationships, it is
essential to know at least the size of the sample and the nature of the criterion.
Assuming that the conditions for a meaningful evaluation of criterion-related validity evidence
had been met, Cronbach (1970) characterized validity coefficients of .30 or better as having
“definite practical value.” The U.S. Department of Labor (1999) provides the following general
guidelines for interpreting validity coefficients: coefficients above .35 are considered “very beneficial,” .21–.35 “likely to be useful,” .11–.20 “depends on the circumstances,” and below .11 “unlikely to be useful.” It is important to point out that even relatively low validities (e.g., .20) may justify the use of a test in a selection program (Anastasi & Urbina, 1997). This is because the practical value of a test depends not only on its validity but also on other factors, such as the base rate for success on the job (i.e., the proportion of people who would be
successful in the absence of any selection procedure). If the base rate for success on the job is low
(i.e., few people would be successful on the job), tests with low validity can have considerable
utility or value. When the base rate is high (i.e., selected at random, most people would succeed
on the job), even highly valid tests may not contribute significantly to the selection process.
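The interaction between validity and base rate can be made concrete with a small simulation. The sketch below is illustrative only; the validity value, the top-half selection rule, and the base rates are assumptions chosen for the example, not ANRA data:

# Illustrative simulation: how the base rate for job success moderates the
# practical value of a test with a given validity.
import numpy as np

rng = np.random.default_rng(0)
validity = 0.30  # assumed test-criterion correlation
n = 100_000

# Simulate standardized test scores and job performance correlated at r = validity.
test = rng.standard_normal(n)
perf = validity * test + np.sqrt(1 - validity**2) * rng.standard_normal(n)

for base_rate in (0.10, 0.50, 0.90):
    success_cut = np.quantile(perf, 1 - base_rate)  # who would succeed without selection
    hired = test >= np.quantile(test, 0.50)         # hire the top half on the test
    p_success_hired = np.mean(perf[hired] >= success_cut)
    print(f"base rate {base_rate:.0%}: success rate among hires = {p_success_hired:.0%}")

At a low base rate the test noticeably improves the proportion of successful hires relative to chance, whereas at a high base rate the improvement is small, which is the point made above.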
In addition to the practical value of validity coefficients, the statistical significance of coefficients
should be noted. Statistical significance refers to the probability that a non-zero correlation as large as the one observed could have occurred by chance when no true relationship exists. If the odds are 1 in 20 or less that such a correlation could have occurred by chance, the correlation is considered statistically significant. Some experts prefer even more stringent odds, such as 1 in 100, although the generally accepted standard is 1 in 20. In statistical
analyses, these odds are designated by the lower case p (probability) to signify whether a non-
zero correlation is statistically significant. When p is less than or equal to .05, the odds are
presumed to be 1 in 20 (or less) that a non-zero correlation of that size could have occurred by
chance. When p is less than or equal to .01, the odds are presumed to be 1 in 100 (or less) that a
non-zero correlation of that size occurred by chance.
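The sketch below shows how such a p value is obtained from a correlation and its sample size using the standard t transformation for a Pearson correlation (an illustration, not an ANRA-specific procedure); the example values anticipate the first row of Table 6.1:

# Two-tailed p value for a Pearson correlation, via the t transformation.
from math import sqrt
from scipy.stats import t as t_dist

def correlation_p_value(r: float, n: int) -> float:
    """Two-tailed p value for a Pearson r based on n paired observations."""
    t_stat = r * sqrt(n - 2) / sqrt(1 - r**2)
    return 2 * t_dist.sf(abs(t_stat), df=n - 2)

print(f"p = {correlation_p_value(0.32, 89):.4f}")  # well below .01, hence the ** flag in Table 6.1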
In a study of ANRA criterion-related validity, we examined the relationship between ANRA
scores and on-the-job performance of job incumbents in various occupations (mostly finance-
related occupations) and position levels (mainly professionals, managers, and directors). Job
performance was defined as supervisory ratings on behaviors determined through research to be
important to most professional, managerial, and executive jobs. The study found that ANRA
scores correlated .32 with supervisory ratings on a dimension made up of Analysis and Problem
Solving behaviors, and .36 with supervisory ratings on a dimension made up of Judgment and
Decision Making behaviors (see Table 6.1). Furthermore, ANRA scores correlated .36 with
supervisory ratings on a dimension composed of job behaviors dealing with Quantitative/
Professional Knowledge and Expertise. Supervisory ratings from the sum of ratings on 24 job
performance behaviors (“Total Performance”), as well as ratings on a single-item measure of
“Overall Potential” were also obtained. The ANRA scores correlated .44 with Total Performance
and .31 with ratings of Overall Potential. The correlation between ANRA scores and a single-item
supervisory rating of “Overall Performance” was .38.
Table 6.1 Evidence of ANRA Criterion-Related Validity (Total Raw Score) of Job Incumbents in Various Finance-Related Occupations and Position Levels
Criterion N Mean SD r
Analysis and Problem Solving 89 37.6 7.0 .32**
Judgment and Decision Making 91 32.2 5.9 .36**
Quantitative/Professional Knowledge and Expertise 59 53.6 8.9 .36**
Total Performance (24 items) 58 127.0 22.0 .44**
Overall Performance (single item) 94 5.6 1.1 .38**
Overall Potential 94 3.4 1.1 .31**
** p < .01
In Table 6.1, the column labeled N gives the number of cases with valid supervisory ratings on every job behavior included in the specified criterion. The means and standard deviations refer to the criterion ratings shown in the table. The validity coefficients appear in the last column.
The criterion-related validity coefficients reported in Table 6.1 apply to the specific sample of job
incumbents mentioned in the table. These validity coefficients clearly indicate that ANRA is
likely to be very beneficial as an indicator of the criteria shown in Table 6.1. However, test users
should not automatically assume that these data constitute sole and sufficient justification for use
of ANRA. Inferring validity for one group of employees or candidates from data reported for
another group is not appropriate unless the organizations and job categories being compared are
demonstrably similar.
Careful examination of Table 6.1 can help test users make an informed judgment about the
appropriateness of ANRA for their own organization. However, the data presented here are not
intended to serve as a substitute for locally obtained validity data. Local validity studies, together
with locally derived norms, provide a sound basis for determining the most appropriate use of
ANRA. Hence, whenever technically feasible, test users should study the validity of ANRA, or
any selection test, at their own location or organization.
Sometimes it is not possible for a test user to conduct a local validation study. There may be too
few incumbents in a particular job, an unbiased and reliable measure of job performance may not
be available, or there may not be a sufficient range in the ratings of job performance to justify the
computation of validity coefficients. In such circumstances, evidence of a test’s validity reported
elsewhere may be relevant, provided that the data refer to comparable jobs.
Correlations Between ANRA Test 1 and Test 2
The correlation between Test 1 (Comparison of Quantities) and Test 2 (Sufficiency of Information) of ANRA was .71 (N = 452, p < .0001). This correlation is clearly significant, yet lower than the reliability of either test shown in Table 5.2 in chapter 5. This evidence
suggests that ANRA effectively samples both of these reasoning domains within the broader
conception of numerical reasoning (Rust, 2002).
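One way to see this point is to correct the intertest correlation for the unreliability of the two subtests: if the corrected value remains clearly below 1.0, the subtests cannot be measuring exactly the same domain. The sketch below applies the standard correction for attenuation; the reliability values are placeholders, since the actual coefficients appear in Table 5.2 of chapter 5:

# Correction for attenuation: r_corrected = r_12 / sqrt(rel_1 * rel_2).
from math import sqrt

r_12 = 0.71                # observed Test 1-Test 2 correlation (N = 452)
rel_1, rel_2 = 0.85, 0.85  # placeholder reliabilities; see Table 5.2 for actual values

r_corrected = r_12 / sqrt(rel_1 * rel_2)
print(f"disattenuated correlation = {r_corrected:.2f}")  # < 1.0 suggests distinct domains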
Evidence of Convergent and Discriminant Validity

Convergent evidence is provided when scores on a test relate to scores on other tests or variables
that purport to measure similar traits or constructs. Evidence of relations with other variables can
involve experimental (or quasi-experimental) as well as correlational evidence (AERA et al.,
1999). Discriminant evidence is provided when scores on a test do not relate closely to scores on
tests or variables that measure different traits or constructs.
Correlations Between ANRA and Watson-Glaser Critical Thinking Appraisal—Short Form

Correlations between ANRA and the Watson-Glaser Critical Thinking Appraisal®—Short Form
(see Table 6.2) suggest that the tests are measuring a common general ability. Evidence for the
validity of the Watson-Glaser as a measure of critical thinking and reasoning appears in the
Watson-Glaser Short Form Manual (Watson & Glaser, 2006). The data in Table 6.2 suggest that
ANRA also measures reasoning ability.
The fact that the correlations between ANRA and the Watson-Glaser Short Form tests are lower
than the inter-correlation between the two ANRA tests suggests that ANRA also measures some
distinct aspect of reasoning that is not measured by the Watson-Glaser (Rust, 2002).
Table 6.2 Correlations Between Watson-Glaser Critical Thinking Appraisal—Short Form and ANRA (N = 452)

Watson-Glaser  ANRA Test 1: Comparison of Quantities  ANRA Test 2: Sufficiency of Information  ANRA Total Raw Score
Watson-Glaser Short Form Total Raw Score .65 .61 .68
Test 1: Inference .48 .47 .52
Test 2: Recognition of Assumptions .40 .36 .41
Test 3: Deduction .53 .51 .56
Test 4: Interpretation .60 .51 .60
Test 5: Evaluation of Arguments .35 .36 .39
Note. For all the correlations, p < .001.
Correlations Between ANRA and Other Tests
In addition to the correlations with the Watson-Glaser, we also examined the correlations between
ANRA and two other tests: the Miller Analogies Test for Professional Selection (N = 67) and the
DAT for Personnel and Career Assessment–Numerical Ability (N = 80). As would be expected,
ANRA correlated higher with the Numerical Ability test of the DAT for PCA (r = .70, p < .001)
than with the MAT for PS (r = .57, p < .001). Details of these results, which suggest convergent
as well as discriminant validity, are shown in Table 6.3.
Table 6.3 Correlations Between ANRA, the Miller Analogies Test for Professional Selection (MAT for PS), and the Differential Aptitude Tests for Personnel and Career Assessment—Numerical Ability (DAT for PCA—NA)

ANRA  MAT for PS (N = 67)  DAT for PCA—NA (N = 80)
ANRA Total Raw Score .57 .70
ANRA Test 1: Comparison of Quantities .50 .69
ANRA Test 2: Sufficiency of Information .50 .57
Note. For all the correlations, p < .001.
Chapter 7
Using ANRA as an Employment Selection Tool
ANRA was developed for use in adult employment selection. It may be used to predict success in
jobs that require application of numerical reasoning skills. ANRA can also be useful in
monitoring the effectiveness of numerical reasoning instruction and training programs, and in
researching the relationship between numerical reasoning and other abilities or skills.
Employment Selection
Many organizations use testing as a component of their employment selection process.
Employment selection programs typically use cognitive ability tests, aptitude tests, personality
tests, basic skills tests, and work values tests to screen out unqualified candidates, to categorize
prospective employees according to their probability of success on the job, or to rank order a
group of candidates according to merit.
ANRA was designed to assist in the selection of employees for jobs that require numerical
reasoning. Many finance-related, project-management, and technical professions require the type
of numerical reasoning ability measured by ANRA. The test is useful to assess applicants for a
variety of jobs, such as Accountant, Account Manager, Actuary, Banking Manager, Business
Analyst, Business Development Manager, Business Unit Leader, Finance Analyst, Loan Officer,
Project Manager, Inventory Planning Analyst, Procurement or Purchasing Manager, and
leadership positions with financial responsibilities.
It should not be assumed that the type of numerical reasoning required in a particular job is
identical to that measured by ANRA. Job analysis and local validation of ANRA for selection
purposes should follow accepted human resource research procedures, and conform to existing
guidelines concerning fair employment practices. In addition, no single test score can capture all of the requisite knowledge and skills necessary for success in a job.
Using ANRA in Making a Hiring Decision

It is ultimately the responsibility of the hiring authority to determine how it uses ANRA scores.
We recommend that if the hiring authority establishes a cut score, examinees’ scores should be
considered in the context of appropriate measurement data for the test, such as the standard error
of measurement and data regarding the predictive validity of the test. In addition, we recommend
that selection decisions be based on multiple job-relevant tools rather than relying on any single
test (e.g., using only ANRA scores to make employment decisions).
Human resource professionals can use the percentile rank that corresponds to a candidate’s raw score in several ways. Candidates’ scores may be rank ordered by percentile so that those with the highest scores are considered further. Alternatively, a cut score (e.g., the 50th percentile) may be established so that candidates who score below it are not considered further. In general, the higher the cut score is set, the higher the likelihood that a candidate who scores above it will be successful. However, the need to select high-scoring candidates typically must be balanced against situational factors, such as the need to keep jobs filled and the supply of talent in the local labor market.
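As a concrete illustration of a percentile-based screen, the sketch below applies a 50th-percentile cut using a few rows of the Professionals/Individual Contributors norms from Appendix B. The abbreviated lookup is for illustration only; an operational system would load the full table for the appropriate norm group:

# Percentile-based screening sketch using an abbreviated norm-table fragment
# (Professionals/Individual Contributors rows from Table B.1).
RAW_TO_PERCENTILE = {25: 60, 24: 53, 23: 47, 22: 42, 21: 38, 20: 34}

CUT_PERCENTILE = 50  # e.g., screen out candidates below the 50th percentile

def passes_screen(raw_score: int) -> bool:
    # Scores outside the fragment default to 0 here; a full table avoids this.
    return RAW_TO_PERCENTILE.get(raw_score, 0) >= CUT_PERCENTILE

for raw in (25, 22, 20):
    print(raw, "->", "consider further" if passes_screen(raw) else "screen out")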
When interpreting ANRA scores, it is useful to know the specific behaviors that an applicant with
a high ANRA score may be expected to exhibit. These behaviors, as rated by supervisors, were
consistently found to be related to ANRA scores across different occupations requiring numerical
reasoning. In general, candidates who score low on ANRA may find it challenging to effectively
demonstrate these behaviors. Conversely, candidates who score high on ANRA are likely to
display a higher level of competence in the following behaviors:
• Uses quantitative reasoning to solve job-related problems.
• Learns new numerical concepts quickly.
• Applies sound logic and reasoning when making decisions.
• Demonstrates knowledge of financial indicators and their implications.
• Breaks down information into essential parts or underlying principles.
• Readily integrates new information into problem-solving and decision-making processes.
• Recognizes differences and similarities in situations or events.
• Engages in a broad analysis of relevant information before making decisions.
• Probes deeply to understand the root causes of problems.
• Reviews financial statements, sales reports, and/or other financial data when planning.
• Accurately assesses the financial value of things (e.g., worth of assets) or people
(e.g., credit worthiness).
Human resource professionals who use ANRA should document and examine the relationship
between applicants’ scores and their subsequent performance on the job. Using locally obtained
criterion-related validity information provides the best foundation for interpreting scores and
most effectively differentiating examinees who are likely to be successful from those who are not.
Pearson does not establish or recommend a passing score for ANRA.
Differences in Reading Ability, Including the Use of English as a Second Language

Though ANRA is a mathematical test, a level of reading proficiency in the English language is assumed and reflected in the items. When ANRA is used to measure the numerical reasoning capabilities of a group that includes examinees whose first language is not English, reasonable precautions should be taken. If a candidate experiences difficulty with the language or the
reading level of the test, note this information and consider it when interpreting the test scores. In
some cases, it may be more appropriate to test such individuals with another assessment
procedure that fully accommodates their language of preference or familiarity.
Using ANRA as a Guide for Training, Learning, and Education
Critical thinking, numerical or otherwise, is trainable (Halpern, 1998; Paul & Nosich, 2004).
Thus, when interpreting test scores on ANRA, it is important to bear in mind the extent to which
training may have influenced the scores. The ability to think critically has long been recognized
as a desirable educational objective, and studies conducted in educational settings
demonstrate that critical thinking can be improved as a result of training directed to this end (Hill,
1959; Kosonen & Winne, 1995; Nisbett, 1993; Perkins & Grotzer, 1997).
Scores on ANRA are likely to be influenced by factors associated with training. Typically,
individuals will differ in the extent to which such training has been made available to them.
Although traditional classes in math and science in school are important, many of these classes
involve computational arithmetic and other lower-order thinking skills, such as the rote
application of rules that have been learned. Training in higher-order numerical reasoning during
the school years will often have been indirect and largely dependent on the overall quality of
education available to the individual. Consequently, this indirect training would likely depend on
the amount of time spent in education or learning. Furthermore, the extent to which numerical
reasoning skills are trainable will likely differ between individuals.
Fairness in Selection Testing

Fair employment regulations and their interpretation are continuously subject to changes in the
legal, social, and political environments. Therefore, ANRA users should consult with qualified
legal advisors and human resources professionals as appropriate.
Legal Considerations

Governmental and professional regulations cover the use of all personnel selection procedures.
Relevant source documents that the user may wish to consult include the Standards for
Educational and Psychological Testing (AERA et al., 1999); the Principles for the Validation and
Use of Personnel Selection Procedures (Society for Industrial and Organizational Psychology,
2003); and the federal Uniform Guidelines on Employee Selection Procedures (Equal
Employment Opportunity Commission, 1978). For an overview of the statutes and types of legal
proceedings that influence an organization’s equal employment opportunity obligations, the user
is referred to Cascio and Aguinis (2005) or the U.S. Department of Labor’s (1999) Testing and
Assessment: An Employer’s Guide to Good Practices.
Group Differences and Adverse Impact

Local validation is particularly important when a selection test may have adverse impact.
According to the Uniform Guidelines on Employee Selection Procedures (Equal Employment
Opportunity Commission, 1978), adverse impact is indicated when the selection rate for one group is less than 80% (or 4 out of 5) of the rate for the group with the highest selection rate. Adverse impact is likely to occur with cognitive ability tests such as ANRA.
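The four-fifths rule can be computed directly from selection rates, as in the sketch below (the applicant and selection counts are hypothetical):

# Four-fifths (80%) rule sketch with hypothetical selection data.
rate_a = 30 / 60  # group A: 30 of 60 applicants selected -> .50
rate_b = 18 / 50  # group B: 18 of 50 applicants selected -> .36

impact_ratio = rate_b / rate_a
print(f"impact ratio = {impact_ratio:.2f}")  # 0.72 < 0.80, so adverse impact is indicated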
Although it is within the law to use a test with adverse impact (Equal Employment Opportunity
Commission, 1978), the testing organization must be prepared to demonstrate that the selection
test is job-related and consistent with business necessity. The Civil Rights Act of 1991, as
amended, defined “business necessity” to mean that, “in the case of employment practices
involving selection …, the practice or group of practices must bear a significant relationship to
successful performance of the job” (Section 3 (o) (1) (A)). In deciding whether the standards for
business necessity have been met, the Civil Rights Act of 1991 states that “demonstrable
evidence is required.” The Act provides examples of “demonstrable evidence” as “statistical
reports, validation studies, expert testimony, prior successful experience and other evidence as
permitted by the Federal Rules of Evidence” (Section 3 (o) (1) (B)).
A local validation study, in which ANRA scores are correlated with job performance indicators,
can provide evidence to support the use of the test in a particular job context. An evaluation that
demonstrates that ANRA (or any employment assessment tool) is equally predictive for protected
subgroups, as outlined by the Equal Employment Opportunity Commission, will assist in the
demonstration of fairness of the test. For example, from the results of their review of 22 cases in
U.S. Appellate and District Courts involving cognitive ability testing in class-action suits,
Shoenfelt and Pedigo (2005, p. 6) reported that “organizations that utilize professionally
developed standardized cognitive ability tests that are validated and that set cutoff scores
supported by the validation study data are likely to fare well in court.”
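One common way to examine whether a test is comparably predictive across subgroups is to compare the test-criterion regression lines fitted within each group. The sketch below illustrates the idea with invented data; a formal analysis would test slope and intercept differences statistically, for example with moderated multiple regression:

# Compare within-group regression lines of a criterion on test scores.
# All data are invented for illustration.
import numpy as np

group_a = ([22, 27, 30, 19, 25, 33], [3.1, 3.8, 4.2, 2.9, 3.6, 4.5])
group_b = ([21, 26, 29, 18, 24, 32], [3.0, 3.7, 4.1, 2.8, 3.5, 4.4])

for name, (scores, criteria) in (("A", group_a), ("B", group_b)):
    slope, intercept = np.polyfit(scores, criteria, deg=1)
    print(f"group {name}: slope = {slope:.3f}, intercept = {intercept:.2f}")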
Monitoring the Selection System

An organization’s abilities to evaluate selection strategies and to implement fair employment
practices depend on its awareness of the demographic characteristics of applicants and
incumbents. Monitoring these characteristics and accumulating test score data are clearly
necessary for establishing legal defensibility of a selection system, including those systems that
incorporate ANRA. The most effective use of ANRA is with a local norms database that is
regularly updated and monitored.
The hiring organization should ensure that its selection process is clearly job related and focuses
on characteristics that are important to job success. Good tests that are appropriate to the job in
question can contribute a great deal towards monitoring and minimizing the major sources of bias
in the selection procedures. ANRA is a reliable and valid instrument for the assessment of
numerical reasoning. When used for the assessment of candidates or incumbents for work that
requires this skill, ANRA can be useful in selecting the better candidates. However, where candidates from different subgroups of the population are deficient in numerical reasoning skills because they have not been provided the necessary educational environment during schooling, there is a risk of overlooking candidates who could develop this skill but have not had the opportunity to do so. Employers can reasonably expect that
candidates should have achieved all the necessary basic skills before applying for the job.
However, in circumstances where adverse impact is manifest, an organization might wish to
consider ways in which it can contribute to the reduction of adverse impact. This approach might
take the form of providing training courses to employees in the deficient skill areas, or of
increasing involvement with the local community to identify ways in which the community might
assist, or of re-evaluating recruitment strategy, for example, by advertising job positions more
widely or through different media.
References
Allen, M.J., & Yen, W.M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, DC: Author.
American Institute of Certified Public Accountants, AICPA (1999). Broad business perspective competencies. Retrieved February 27, 2006, from http://www.aicpa.org/edu/bbfin.htm
Americans With Disabilities Act of 1990, Titles I & V (Pub. L. 101-336). United States Code, Volume 42, Sections 12101–12213.
Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). New York: Macmillan.
Brannon, E.M. (2002). The development of ordinal numerical knowledge in infancy. Cognition, 83, 223–240.
Cascio, W.F. (1991). Applied psychology in personnel management (4th ed.). Englewood Cliffs, NJ: Prentice Hall.
Cascio, W. F., & Aguinis, H. (2005). Applied psychology in human resource management (6th ed.). Upper Saddle River, NJ: Prentice Hall.
Civil Rights Act of 1991. 102nd Congress, 1st Session, H.R.1. Retrieved August 4, 2006, from http://usinfo.state.gov/usa/infousa/laws/majorlaw/civil91.htm
Cohen, B.H. (1996). Explaining psychological statistics. Pacific Grove, CA: Brooks/Cole.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Cronbach, L.J. (1970). Essentials of psychological testing (3rd ed.). New York: Harper & Row.
Equal Employment Opportunity Commission. (1978). Uniform guidelines on employee selection procedures. Federal Register, 43(166), 38295–38309.
Facione, P.A. (2006). Critical Thinking: What It Is and Why It Counts–2006 Update. Retrieved July 28, 2006 from http://www.insightassessment.com/pdf_files/what&why2006.pdf
Feigenson, L., Dehaene, S., & Spelke, E. (2004). Core systems of number. Trends in Cognitive Sciences, 8, 307–314.
Halpern, D. F. (1998). Teaching critical thinking for transfer across domains: Dispositions, skills, structure training, and metacognitive monitoring. American Psychologist, 53, 449–455.
Hill, W. H. (1959). Review of Watson-Glaser Critical Thinking Appraisal. In O.K. Buros (Ed.), The fifth mental measurements yearbook. Lincoln: University of Nebraska Press.
Hunt, E. (1995). Will we be smart enough? New York: Russell Sage Foundation.
Kealy, B.T., Holland, J., & Watson, M. (2005). Preliminary evidence on the association between critical thinking and performance in principles of accounting. Issues in Accounting Education, 20 (1), 33–47.
Kosonen, P. & Winne, P. H. (1995). Effects of teaching statistical laws on reasoning about everyday problems. Journal of Educational Psychology, 87, 33–46.
Lawshe, C.H. (1975). A quantitative approach to content validity. Personnel Psychology, 28, 563–575.
National Education Goals Panel. (1991). The national education goals report. Washington, DC: U.S. Government Printing Office.
Nijenhuis, J., & Flier, H. (2005). Immigrant-majority group differences on work-related measures: the case for cognitive complexity. Personality and Individual Differences, 38, 1213–1221.
Nisbett, R. E. (Ed.). (1993). Rules for reasoning. Hillsdale, NJ: Lawrence Erlbaum.
Nunnally, J.C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
O*Net OnLine. (2005). Skill searches for: Mathematics, Critical Thinking. Occupational Information Network: O*Net OnLine. Retrieved July 17, 2006, from http://online.onetcenter.org/skills/result?s=2.A.2.a&s=2.A.1.e&g=Go
Paul, R., & Nosich, G.M. (2004). A Model for the National Assessment of Higher Order Thinking. Retrieved July 13, 2006, from http://www.criticalthinking.org/resources/articles/a-model-nal-assessment-hot.shtml
Perkins, D. N. & Grotzer, T. A. (1997). Teaching intelligence. American Psychologist, 52, 1125–1133.
Rust, J. (2002). Rust Advanced Numerical Reasoning Appraisal manual. London: The Psychological Corporation.
Shoenfelt, E.L., & Pedigo, L.C. (2005, April). A review of court decisions on cognitive ability testing, 1992–2004. Poster presented at the 20th Annual Conference of the Society for Industrial and Organizational Psychology, Los Angeles, CA.
Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of personnel selection procedures (4th ed.). Bowling Green, OH: Author.
Spelke, E. S. (2005). Sex differences in intrinsic aptitude for mathematics and science? A critical review. American Psychologist, 60, 950–958.
Starkey, P. (1992). The early development of numerical reasoning. Cognition, 43, 93–126.
U.S. Department of Labor. (1999). Testing and assessment: An employer’s guide to good practices. Washington, DC: Author.
Vandenbroucke, J. P. (1998). Clinical investigation in the 20th century: The ascendancy of numerical reasoning. The Lancet, 352, 12–16.
Watson, G. B., & Glaser, E. M. (2006). Watson-Glaser Critical Thinking Appraisal Short Form manual. San Antonio, TX: Pearson.
Wynn, K., Bloom, P., & Chiang, W. (2002). Enumeration of collective entities by 5-month-old infants. Cognition, 83, B55–B62.
Appendix A
Description of the Normative Sample
The normative information provided below is based on data collected during the period of
February 2006 through June 2006.
Table A.1 Description of the Normative Sample by Occupation

Employees in Various Financial Occupations
N = 198
Mean = 21.9
SD = 6.4

Occupations in the Financial Occupations norm group:
Accountants = 6.1%
Accounting Analysts = 1.5%
Actuaries = 32.3%
Auditors = 1.0%
Banking Supervisors/Managers = 5.1%
Billing Coordinators = 1.0%
Bookkeepers = 2.0%
Business Analysts = 3.5%
Business Specialists = 0.5%
Buyers = 2.5%
Chief Financial Officers = 2.5%
Claims Adjusters = 1.0%
Collections Supervisors/Managers = 1.0%
Comptrollers/Controllers = 2.0%
Finance Analysts/Managers = 17.7%
Finance or Budget Estimators = 0.5%
Financial Planners = 3.0%
Insurance Agents = 2.5%
Insurance Analysts = 0.5%
Insurance Brokers = 2.0%
Loan Officers = 2.0%
Procurement or Purchasing Officers/Managers = 9.6%
Table A.2 Description of the Normative Sample by Position Level

Executives/Directors
Executive- and Director-level positions within various industries.
N = 91
Mean = 21.3
SD = 6.0
Industry Characteristics
Financial Services/Banking/Insurance = 53.9%
Government/Public Service/Defense = 7.7%
Professional Business Services/Consulting = 6.6%
Publishing/Printing = 12.1%
Real Estate = 1.1%
Retail/Wholesale = 2.2%
Other (unspecified) = 16.5%
Managers
Manager-level positions within various industries.
N = 88
Mean = 20.1
SD = 5.6
Industry Characteristics
Financial Services/Banking/Insurance = 38.6%
Government/Public Service/Defense = 19.3%
Professional Business Services/Consulting = 10.2%
Publishing/Printing = 12.5%
Real Estate = 2.3%
Retail/Wholesale = 1.1%
Other (unspecified) = 14.8%
Professionals/Individual Contributors
Professional-level and individual-contributor positions within various industries.
N = 200
Mean = 22.1
SD = 6.4
Industry Characteristics
Financial Services/Banking/Insurance = 23.0%
Government/Public Service/Defense = 36.5%
Professional Business Services/Consulting = 12.5%
Publishing/Printing = 7.5%
Real Estate = 1.0%
Retail/Wholesale = 1.5%
Other (unspecified) = 16.5%
Appendix B
ANRA Total Raw Scores, Mid-Point Percentile Ranks, and T Scores by Norm Group
Table B.1 ANRA Total Raw Scores, Mid-Point Percentile Ranks, and T Scores by Position Level

Percentile Ranks by Position Level
ANRA Total Raw Score  Executives/Directors  Managers  Professionals/Individual Contributors  T Score
32 ≥99 ≥99 ≥99 ≥68
31 ≥99 ≥99 98 66
30 96 98 94 65
29 92 95 88 63
28 87 91 81 62
27 81 87 73 60
26 76 82 66 58
25 67 77 60 57
24 58 72 53 55
23 54 66 47 54
22 48 61 42 52
21 43 53 38 50
20 40 46 34 49
19 34 43 29 47
18 30 37 26 46
17 26 31 23 44
16 23 27 19 43
15 18 21 16 41
14 13 15 13 39
13 9 12 11 38
12 7 10 9 36
11 6 6 7 35
10 5 3 6 33
9 4 2 5 31
8 3 ≤1 4 30
7 2 ≤1 2 28
6 ≤1 ≤1 ≤1 27
5 ≤1 ≤1 ≤1 25
4 ≤1 ≤1 ≤1 23
3 ≤1 ≤1 ≤1 22
2 ≤1 ≤1 ≤1 20
1 ≤1 ≤1 ≤1 19
0 ≤1 ≤1 ≤1 17

Executives/Directors: Raw Score Mean = 21.3, SD = 6.0, N = 91
Managers: Raw Score Mean = 20.1, SD = 5.6, N = 88
Professionals/Individual Contributors: Raw Score Mean = 22.1, SD = 6.4, N = 200
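For reference, T scores are conventionally obtained from the linear transformation T = 50 + 10z, where z is the raw score expressed in standard deviation units. The sketch below applies that transformation using the Professionals/Individual Contributors mean and SD; because the published T column may be derived from a different reference sample and rounded, reproduced values are approximate:

# Conventional raw-score-to-T-score transformation (T = 50 + 10z).
def t_score(raw: float, mean: float, sd: float) -> float:
    return 50 + 10 * (raw - mean) / sd

print(round(t_score(22, mean=22.1, sd=6.4)))  # about 50, near the group mean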
Table B.2 ANRA Total Raw Scores, Mid-Point Percentile Ranks, and T Scores for Employees in Various Financial Occupations (see Table A.1 for a list of the occupations in this norm group)

ANRA Total Raw Score  Percentile Ranks for Employees in Financial Occupations  T Score
32 ≥99 ≥68
31 98 66
30 93 65
29 86 63
28 78 62
27 71 60
26 65 58
25 60 57
24 55 55
23 52 54
22 47 52
21 42 50
20 37 49
19 33 47
18 30 46
17 26 44
16 22 43
15 19 41
14 15 39
13 11 38
12 8 36
11 6 35
10 4 33
9 3 31
8 2 30
7 ≤1 28
6 ≤1 27
5 ≤1 25
4 ≤1 23
3 ≤1 22
2 ≤1 20
1 ≤1 19
0 ≤1 17
Raw Score Mean = 21.9, SD = 6.4, N = 198
Appendix C
Combined Watson-Glaser and ANRA T Scores and Percentile Ranks by Norm Group
Table C.1 Combined Watson-Glaser Short Form and ANRA T Scores and Percentile Ranks by Position Level

Percentile Ranks by Position Level
Combined T Scores  Executives/Directors  Managers  Professionals/Individual Contributors  Combined T Scores
≥135 ≥99 ≥99 ≥99 ≥135
134 ≥99 ≥99 ≥99 134
133 ≥99 ≥99 ≥99 133
132 ≥99 ≥99 ≥99 132
131 ≥99 ≥99 ≥99 131
130 98 ≥99 ≥99 130
129 98 ≥99 ≥99 129
128 97 ≥99 ≥99 128
127 95 98 97 127
126 93 97 95 126
125 92 97 93 125
124 91 96 90 124
123 88 94 88 123
122 85 91 84 122
121 82 89 81 121
120 82 88 77 120
119 80 87 74 119
118 77 86 72 118
117 75 86 70 117
116 73 84 67 116
115 71 81 65 115
114 69 79 63 114
113 66 78 60 113
112 63 78 58 112
111 62 77 57 111
110 61 75 54 110
109 60 73 51 109
108 58 71 48 108
107 56 70 46 107
106 55 67 44 106
105 53 64 42 105
104 51 61 41 104
103 48 60 39 103
102 46 57 37 102
101 43 55 36 101
100 42 54 35 100
99 41 52 33 99
98 40 51 31 98
97 38 48 30 97
96 37 44 29 96
95 34 42 28 95
94 31 39 27 94
93 29 36 26 93
92 27 34 25 92
91 27 32 23 91
90 25 29 22 90
89 22 27 22 89
88 21 26 21 88
87 20 25 19 87
86 20 24 18 86
85 19 23 17 85
84 18 21 17 84
83 17 19 16 83
82 16 18 14 82
81 16 16 13 81
80 16 15 12 80
79 16 14 11 79
78 14 14 10 78
77 11 13 9 77
76 10 12 8 76
75 9 12 8 75
74 8 11 8 74
73 7 9 8 73
72 6 7 7 72
71 6 5 7 71
70 6 3 7 70
69 5 3 7 69
68 4 2 6 68
67 4 2 5 67
66 3 2 5 66
65 3 2 5 65
64 3 2 4 64
63 2 ≤1 4 63
62 2 ≤1 3 62
61 ≤1 ≤1 3 61
60 ≤1 ≤1 2 60
59 ≤1 ≤1 2 59
58 ≤1 ≤1 2 58
57 ≤1 ≤1 2 57
56 ≤1 ≤1 2 56
55 ≤1 ≤1 ≤1 55
54 ≤1 ≤1 ≤1 54
53 ≤1 ≤1 ≤1 53
52 ≤1 ≤1 ≤1 52
51 ≤1 ≤1 ≤1 51
≤50 ≤1 ≤1 ≤1 ≤50
Executives/Directors N = 91; Managers N = 88; Professionals/Individual Contributors N = 200
Table C.2 Combined Watson-Glaser Short Form and ANRA T Scores and Percentile Ranks for Employees in Various Financial Occupations (See Table A.1 for a list of the occupations in this group.)

Combined T Scores  Percentile Ranks for Employees in Financial Occupations  Combined T Scores
≥135 ≥99 ≥135
134 ≥99 134
133 ≥99 133
132 ≥99 132
131 ≥99 131
130 ≥99 130
129 ≥99 129
128 97 128
127 95 127
126 92 126
125 89 125
124 87 124
123 84 123
122 80 122
121 77 121
120 74 120
119 73 119
118 71 118
117 69 117
116 67 116
115 65 115
114 64 114
113 62 113
112 60 112
111 60 111
110 59 110
109 58 109
108 57 108
107 55 107
106 54 106
105 52 105
104 50 104
103 48 103
102 47 102
101 45 101
100 43 100
99 40 99
98 38 98
97 37 97
96 37 96
95 35 95
94 34 94
93 32 93
92 31 92
91 29 91
90 28 90
89 27 89
88 27 88
87 26 87
86 25 86
85 24 85
84 22 84
83 20 83
82 19 82
81 18 81
80 17 80
79 16 79
78 14 78
77 12 77
76 10 76
75 9 75
74 8 74
73 7 73
72 5 72
71 4 71
70 4 70
69 4 69
68 4 68
67 3 67
66 3 66
65 2 65
64 2 64
63 ≤1 63
62 ≤1 62
61 ≤1 61
60 ≤1 60
59 ≤1 59
58 ≤1 58
57 ≤1 57
56 ≤1 56
55 ≤1 55
54 ≤1 54
53 ≤1 53
52 ≤1 52
51 ≤1 51
≤50 ≤1 ≤50
N = 198