Large Scale Assessments and High Stakes Decisions: Facts, Cautions and Guidelines
National Association of School Psychologists
A basic premise of standards-based reform is that all children can learn.
Although some students may require more time or varied instruction, standards-based
reform articulates the expectation that all students will be provided
the opportunity to meet a common set of instructional goals. Shaped by
legislation and challenged by research, large-scale assessment programs
have generated considerable controversy and inconsistency as states and
districts attempt to measure student attainment of high standards. The
purpose of this document is to highlight the factors influencing large-scale
assessment, summarize cautions in implementing 'high stakes'
testing programs, and offer some basic guidelines to policymakers and administrators.
Background
In the 1994 reauthorization of the Elementary and Secondary Education
Act (ESEA), states were required to set challenging standards
for student achievement and to develop and administer assessments to measure
student progress towards those standards. According to the National Research
Council's 1999 guide ('Testing, Teaching and Learning'),
the intended outcome of these requirements is higher student achievement.
Mandated core components of 'standards-based reform' include
a) content and performance standards set for all students; b) development
of tools to measure the progress of all students toward the standards;
and c) accountability systems that require continuous improvement of student
achievement. The 2002 reauthorization of ESEA, the No Child Left Behind
Act, ups the ante for states and requires annual assessments in grades
3 through 8. It further requires states and schools to make 'adequate
yearly progress' by increasing test scores. Schools that fail to
meet these goals will face a series of graduated sanctions.
Students with disabilities are specifically included in
the definition of 'all' students in ESEA. Additionally, the
Individuals with Disabilities Education Act (IDEA) requires states to
include children with disabilities in general state and district-wide
assessment programs, with appropriate accommodations where necessary,
and to report annually on the participation rates, performance, and progress
of students with disabilities. When students with disabilities cannot
participate in testing, even with accommodations, states are required
to include these students through alternate assessments. Estimates
of the prevalence of severe disabilities indicate that only 1-2% of all
students will need to take alternate assessments.
Over the past several decades, Congress has also enacted civil rights statutes
to ensure equal access to a quality education for many targeted populations
of students, such as students of color, economically disadvantaged students,
students with disabilities, students with limited English proficiency
and females (e.g., Civil Rights Act of 1964; Americans with Disabilities
Act; Section 504 of the Rehabilitation Act of 1973). The focus of these
statutes is on the right of all students to have an equal
opportunity to achieve high academic standards as measured by appropriate
assessment processes. Fairness or integrity of the assessment process
is indicated by careful alignment of standards, curriculum and instruction,
assessment, and opportunity to learn.
Students with disabilities, students from disadvantaged backgrounds,
and students who do not speak English as a first language have
struggled to overcome low educational expectations for some time. Federal
laws such as ESEA and IDEA can be seen as legislated attempts to 'raise
the bar.' Research, such as that conducted by the National Center
on Educational Outcomes, has documented a number of positive consequences
of including students with disabilities, such as increased levels of performance,
higher expectations for student achievement, increased access to the general
education curriculum, and improved teaching and instruction. However,
as standards-based reform is implemented, concerns have been raised that,
despite good intentions, the potential exists for unintended negative
outcomes.
Concerns and Cautions Regarding Large-Scale Assessment
Recognizing multiple purposes of large-scale assessment.
Nearly all states have established large-scale assessment programs to
measure student progress toward standards. However, multiple stakeholders
want assessments to meet a variety of needs: educators want test results
to inform instruction; taxpayers want to know that the money they spend
translates into student learning; governors want assurances that their
students are achieving at a level similar to or better than students in
other states. Yet, we know that tests should be designed for the specific
purpose that they are intended to serve and for the population
that they will measure. While some states have tried to meet the demands
for accountability by modifying existing large-scale assessments or developing
new tests, many other states continue to use a single test for multiple
purposes (system accountability, school improvement, and measurement
of individual student or group performance), regardless of the test's intended
use and inherent limitations. As states rush to meet new mandates,
this inappropriate practice is likely to increase.
High stakes and negative consequences. Tests are considered
high stakes for students when the results are used to make critical
decisions about the individual's access to educational opportunity,
grade-level retention or promotion, graduation from high school, or receipt
of a standard or alternative diploma. These kinds of decisions all have
immediate as well as long-range impact on the student. In some states,
high stakes also are attached to test results for school systems: teachers,
administrators and schools are rewarded or sanctioned based on student
performance. When such high stakes are attached to assessment scores,
there is greater potential for manipulation of data and negative consequences:
1) Use of a single test score in making promotion/retention decisions.
Test development experts agree that it is not appropriate to use performance
on a single standardized test for making high-stakes decisions for individuals.
Yet, increasingly, states are requiring schools and school districts to
use state test scores to determine whether students should be promoted
to the next grade level, resulting in higher numbers of retained students
each year. Extensive research over many years indicates that repeating
a grade does not usually improve student achievement and further demonstrates
a strong relationship between retention and increased dropout rates.
2) Use of a single test score in graduation decisions.
Some states have adopted exit exams for high school graduation, resulting
in the denial of a diploma to thousands of students based on a single,
standardized test, without regard to their classroom performance, teachers'
recommendations, or access to adequate classroom resources, quality instruction,
or pupil services support. Although states may allow students to take
these tests several times, multiple administrations of the same type of
measure do not improve the reliability of the scores or reduce the general
limitations of such testing.
3) Use of test performance as a basis for systems-level rewards
and sanctions. There is strong political support for the use of
assessment results for system accountability, as reflected in the
new provisions of ESEA. Administrators and teachers are rewarded or sanctioned
based on student test performance. In some schools, these consequences
negatively affect instruction for all students, including students with
disabilities, by dramatically narrowing the curriculum and encouraging
the use of generally inappropriate 'quick fix' approaches to
student learning.
4) Impact on mainstream education. Large-scale testing
programs can also have unintended negative effects on the education
provided to all students by unduly emphasizing basic skills to the exclusion
of the arts, sciences, and humanities; creating a 'teach to the test' culture;
increasing the psychological stress on children and families; and decreasing
teacher job satisfaction. Further, schools may focus limited resources
on efforts they believe will directly improve test scores, rather
than on strategies to improve school climate and student learning.
Interpreting Results from Large Scale Assessments: Cautions and Considerations
Districts and states need to take great care when applying results of
large-scale assessments to high stakes decisions such as graduation, retention,
merit pay, etc. Factors that influence the accurate interpretation of
standards test results include the following:
Who is assessed? There may be inconsistency in the groups
of students included in the state assessment reports over time. For example,
when students are retained or drop out, the group of students included
in testing changes. Further, some states and districts continue to (illegally)
exclude some students with disabilities and/or limited English proficiency
from their assessment systems. New mandates and funding incentives may
further pressure states to exclude groups of students who might tend to
score below standards or require extensive accommodations.
Additionally, due to high student mobility in some areas, the group of
students tested may vary significantly from one year to the next. In
some schools, 30% or more of the students turn over annually. Therefore,
many students tested in one school in a given year may have received much
of their instruction elsewhere. Measuring effectiveness of instruction
across schools or over time is severely compromised with highly mobile
populations.
What tests are used and what do they measure? Assessment
programs vary in many ways across states. Some states use assessments
to compare individual student performance to a national group, while others
compare individual student performance to established performance standards.
Further, states differ in the content measured and how proficiency is
defined and demonstrated. For example, some states may use 'minimum
standards' while others use 'high standards.' Although
trends within states are more reliable for comparison than cross-state
trends, even comparisons within a given state must be reported carefully
to assure similar data and standards are used. Additionally, it is essential
that parents and the community understand what skills are addressed by
testing programs. While academic skills (reading, math, writing) may be
the presumed content measured, states' and districts' assessments
often include 'critical thinking skills' components. Because
these components are more related to aptitude or ability than to
attainment of academic standards, inclusion of such measures may
lead to misinterpretation or inappropriate use of test results. Where
such tests are used, additional negative outcomes might include ability
grouping and fixed expectations.
What accommodations were provided? States have different
rules about the kinds of accommodations that can and cannot be used for
students with disabilities and students with limited English proficiency.
It is important to know not only that students were given appropriate
accommodations, but also the kinds of accommodations given and how reliably
these accommodations were provided in order to make accurate interpretations
of results.
Recommended Guidelines for Large Scale Assessments
High stakes testing for individual students. Performance
on a standardized test (or on multiple administrations of the same test)
should not be the sole determinant in any 'either/or' decision
about instructional placement, promotion, or graduation. Rather, results
should be used as indicators of the need for early intervention, programmatic
changes, or more specific evaluation of learning problems. Multiple measures
of academic achievement, as well as teacher and family input, must be
utilized in making such important decisions.
Test design and selection. Tests must meet professional
standards for technical adequacy, must be reliable and valid for the purpose
for which they are being used, and must be designed to measure progress
towards standards. Further, tests must be appropriately aligned with standards,
curriculum, and instruction, and be administered on a timeline that allows
for adequate instruction. Tests should be critically reviewed to determine
whether they are appropriate and valid for the widest range of children
and youth, including students with disabilities and students with limited
English proficiency. It is critically important that tests used for making decisions
about individuals be more reliable than those used for comparing
groups. Assessments that are universally designed can significantly
reduce the need for accommodations and increase comparability of scores.
School districts and state policymakers should consult assessment experts
such as school psychologists to assure that their testing programs properly
address technical issues tied to test construction and selection. States
must distribute information about the amount of error in the test scores
and caution educators and parents about the limitations of tests.
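As a brief illustration of what "the amount of error in the test scores" means (this example is not part of the original guidance, and all numbers are hypothetical), classical test theory expresses that error as the standard error of measurement (SEM). Here σ_X is the standard deviation of scores on the test and r_XX' is its reliability coefficient; the 1.96 multiplier assumes approximately normally distributed measurement error.

```latex
% Classical test theory SEM; sigma_X = 15, r_{XX'} = 0.91, and the observed score 100 are hypothetical
\begin{aligned}
\mathrm{SEM} &= \sigma_X \sqrt{1 - r_{XX'}} \\
             &= 15\sqrt{1 - 0.91} = 15 \times 0.3 = 4.5 \\
\text{approx. 95\% band} &= 100 \pm 1.96 \times 4.5 = 100 \pm 8.82 = [91.18,\ 108.82]
\end{aligned}
```

Under these assumed values, a student with an observed score of 100 on a test with a hypothetical cut score of 105 could plausibly have a true score above the cut. This is one reason a single administration should not be the sole basis for a promotion or graduation decision.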
Including all students. To support the necessary inclusion
of all students in standards-based assessment programs, schools must appropriately
implement accommodations, modifications, or alternate assessments when
necessary. In addition, data on all students' performance should
be included in all reports, clearly identifying which students are included
in each data set. Educators must exercise caution in interpreting the
results of large-scale assessments for all individuals and groups of students,
particularly those with disabilities or limited English proficiency, as
these tests may not adequately reflect the content or level of their instruction
or address realistic instructional goals.
Training. Research (e.g., National Center on Educational
Outcomes) indicates that the rapid pace of implementing inclusive large-scale
assessments has not allowed sufficient opportunities for training. Ongoing
staff development and training opportunities for educators are critical
as this reform initiative moves forward.
Evaluation and research. All standards testing programs
must have a systematic evaluation plan to address appropriate selection
and implementation of procedures as well as student and system outcomes.
Evaluation must consider the match between the assessment's purpose
and its design; differences in performances across groups of students
and possible sources of bias; the degree to which all students are included;
compliance with intended accommodations, modifications, and alternative
procedures; and the intended and unintended consequences of the testing
program for individual students, staff, schools, districts, and states.
Ongoing research is essential to address many unanswered questions about
large-scale assessment and to assure development and implementation of
accurate, fair, and useful measures of student and system progress.
Funding. Large-scale assessment is a complex and costly
endeavor when appropriately designed and implemented. However, such assessment
is even more costly when inadequate or inappropriate procedures are used.
Any mandated state-wide or district-wide assessment program must have
sufficient funds and timelines to ensure that a high quality process is
developed, implemented, maintained, and evaluated.
Resources
American Educational Research Association (2000). AERA position statement
concerning high-stakes testing in PreK-12 education. Available:
www.aera.net/about/policy/stakes.htm.
Heubert, J. P., & Hauser, R. M. (Eds.). (1999). High stakes: Testing
for tracking, promotion, and graduation. Washington, DC: National Academy
Press. Available: www.nap.edu/books/0309062802/html/index.html
National Center on Educational Outcomes, University of Minnesota (www.coled.umn.edu/NCEO)
National Association of School Psychologists (www.nasponline.org)
National Research Council (1999). Testing, Teaching, and Learning:
A Guide for States and School Districts. Washington, DC: Author.
U.S. Department of Education, Office for Civil Rights (2000).
The use of tests when making high stakes decisions for students:
A resource guide for educators and policy makers. Washington, DC:
Author. Available: www.ed.gov/offices/OCR/testing/TestingResource.pdf.
Some of this material is excerpted or adapted from articles in the
NASP Communiqué, authored by staff of the National Center on
Educational Outcomes, and from 'Students with Disabilities in
Standards-Based Reform' by Martha Thurlow, published by OSEP (2000).
NASP acknowledges the significant contributions of Dr. Cammy Lehr (National
Center on Educational Outcomes) to the development of this document.
© 2002, National Association of School Psychologists, 4340 East West
Hwy #402, Bethesda, MD 20814