Position Statement on Using Large Scale Assessment for High Stakes Decisions
A basic premise of standards-based
reform is that all children can learn. The National Association of School
Psychologists (NASP) recognizes that, although some students may require more
time or varied instruction, all students should be provided the opportunity
to reach a set of high educational standards. In recent years, federal legislation
(such as the No Child Left Behind [NCLB] Act [ESEA] of 2002; IDEA) and civil
rights statutes (such as the Americans with Disabilities Act and Section 504)
have shaped standards-based reform. These mandates, particularly the No Child
Left Behind Act, have prompted the development of state and district "large
scale assessments," which test all students in a given population to measure attainment
of uniform, high academic standards. NASP supports intended positive consequences
of these efforts, including improved teaching and instruction, higher achievement
for all students, higher standards for students who have struggled to overcome
low expectations and increased access to the general education curriculum
for all students. In particular, NASP notes that students with disabilities,
students from disadvantaged backgrounds and students who do not speak English
as a first language have struggled to overcome low educational expectations
for some time.
However,
despite good intentions as well as documented positive results in some settings,
NASP is concerned by the potential for unintended negative outcomes of large
scale assessment applied to both systemic and individual student decisions.
NASP urges caution in the use of large scale assessments for high stakes decision-making
at all levels, from school to district to state. Furthermore, NASP strongly
opposes the use of large-scale testing as the sole determinant for
making critical, high stakes decisions about individual students and educational
systems, including access to educational opportunity, retention or promotion,
graduation or receipt of a diploma.
Concerns and Cautions Regarding Large Scale Assessment
Recognizing multiple purposes of large scale assessment.
All states are required to establish large-scale assessment programs to measure
student progress toward standards. However, different stakeholders want assessments
to meet a variety of needs--educators want test results to inform instruction;
taxpayers want to know that the money they spend translates into student learning;
governors want assurances that their students are achieving at a level similar
to or better than students in other states.
Yet, NASP and other experts acknowledge that tests should be
designed for the specific purpose that they are intended to serve and
for the population that they will measure (National Research Council,
1999). While some states have tried to meet the demands for accountability
by modifying existing large scale assessments or developing new tests, many
other states continue to use single tests for multiple purposes of
system accountability, school improvement, and measurement of individual student
or group performance, regardless of their intended use and inherent limitations.
High stakes and negative consequences for systems and individuals.
Tests are considered high stakes for students when the results are
used to make critical decisions about the individual's access to educational
opportunity, grade-level retention or promotion, graduation from high school,
or receipt of a standard or alternative diploma. These kinds of decisions all
have immediate as well as long-range impact on the student. In some states,
high stakes also are attached to test results for school systems--teachers,
administrators, and schools are rewarded or sanctioned based on student performance.
NASP recognizes that, when high stakes are attached to test scores, there is
greater potential for misuse of data and negative consequences:
1) Use of a single test score in making promotion/retention
decisions. NASP and test development experts agree that it is not appropriate
to use performance on a single test (or composite test battery) for making high-stakes
decisions for individuals. Yet, increasingly, states are requiring schools and
school districts to use state test scores to determine whether students should
be promoted to the next grade level, resulting in higher numbers of retained
students each year. Extensive research over many years indicates that student
achievement rarely improves when repeating a grade and, further, demonstrates
a strong relationship between retention and increased dropout rates.
2) Use of a single test score in graduation decisions.
Many states have adopted exit exams for high school graduation, in some cases
resulting in the denial of a diploma to thousands of students based on a single
test, without regard to their classroom performance, teachers' recommendations,
or access to adequate classroom resources, quality instruction, or pupil services
support. Although states may allow students to take these tests several times,
multiple administrations of the same type of measure do not improve the reliability
of the scores or reduce the general limitations of such testing.
3) Use of test performance as a basis for systems level rewards and sanctions.
There is strong political support for the use of assessment results for system
accountability, as reflected in the new provisions of NCLB. Administrators
and teachers are rewarded or sanctioned based on student test performance, despite
having little or no influence on some factors that significantly impact student
achievement, such as student mobility and parent involvement. In some schools,
these consequences could negatively affect instruction for all students, including
students with disabilities, by dramatically narrowing the curriculum to emphasize
test content and encouraging the use of generally inappropriate "quick fix"
approaches to student learning.
4) Impact on mainstream education. High stakes
testing programs can also have unintended but negative effects on the education
provided to all students by narrowing the curriculum and unduly emphasizing
basic skills to the exclusion of the arts, technology, sciences and humanities;
creating a culture of "teach-to-the-test"; increasing the psychological
stress on children and families; and decreasing teacher job satisfaction.
Further, schools may focus limited resources on efforts to directly improve
test scores, rather than on strategies to improve school climate
and student learning. Tests should inform instruction, not dictate
what is taught.
5) Impact on referrals to special education. Some
schools respond to increased pressure from accountability associated with
high stakes testing by increasing the number of children they identify as
needing special education supports. Systematic methods of collecting data
on special education referrals and placement are critical in order to accurately
monitor this trend across time and make comparisons within and among schools
from year to year.
6) Impact on student mental health. When
"failing" the test means failing the grade, failing to graduate, or even lesser
consequences such as attending summer school or loss of certain privileges,
students may experience long-term anxiety, low self-esteem, depression, etc.
At a more systemic level, class-wide and building-wide testing can put students,
teachers and administrators at risk for anxiety and other forms of emotional
distress. These consequences can impact not only test-taking but also learning
and motivation.
Interpreting Results From Large Scale Assessments: Cautions and Considerations
NASP strongly urges districts and states to take great care
when applying test results from large-scale assessments to high stakes decisions
such as graduation, retention, merit pay, etc. School psychologists have expertise
in assessment, and can play a key role in helping others to appropriately
interpret and use results from large scale assessments. Factors that influence
the accurate interpretation of standards test results include the following:
Who is assessed?
There may be inconsistency in the groups of students included in the state
assessment reports over time. For example, when students are retained or drop
out, the group of students included in testing changes. Further, some states
and districts continue to exclude some students with disabilities and/or limited
English proficiency from their assessment systems, in violation of civil rights
statutes. New mandates and funding incentives may further pressure states
to exclude groups of students who might tend to score below standards or require
extensive accommodations. Additionally, due to high student mobility in some
areas, many students tested in one school in a given year may have received
much of their instruction elsewhere. Measuring effectiveness of instruction
across schools or over time is severely compromised with highly mobile populations.
What tests are used and what do they measure?
Assessment programs vary in many ways across states, as some states compare
individual student performance to a national group, while others compare individual
student performance to established performance standards. Further, states
differ in the content measured and how proficiency is defined and demonstrated.
For example, some states may use "minimum standards" while others use "high
standards." Although trends within states are more reliable for comparison
than cross-state trends, even comparisons within a given state must be reported
carefully to assure similar data and standards are used.
What accommodations were provided?
States have different rules about the kinds of accommodations
that can and cannot be used for students with disabilities and students with
limited English proficiency. It is important to know not only that students
were given appropriate accommodations, but also the kinds of accommodations
given, how reliably these accommodations were implemented, and if accommodations
were provided across all testing situations. The interpretation of test data
may be unreliable when accommodation practices are inconsistent.
How are test results used?
While following
recognized standards for test development and standardization will help to
assure reliable and valid results, administrators and other school personnel
should exercise extreme caution when applying results of large scale assessments
to decision making about individual students. Myriad factors can impact the
performance of any one student at a single point in time, significantly reducing
the reliability of test scores. Therefore, decisions regarding the promotion,
graduation, placement or referral of individual students should be based on
multiple sources of individually obtained data rather than the results of
large scale assessment.
Recommended Guidelines for Large Scale Assessments
School
psychologists must take an active role in promoting the appropriate use of
large scale assessments and test results. The National Association of School
Psychologists recommends the following principles to guide large scale assessment
programs at the school, district, state and federal levels:
Including All Students
-
To support the necessary inclusion of all students in standards-based
assessment programs, schools must appropriately implement accommodations,
modifications, or alternate assessments when necessary, as determined by the
Special Education Team. In addition, all reports should include the results
for all students tested, clearly identifying which students are included in
each data set while protecting the personal identities of individuals. Further,
interpretation of student progress should be based upon high but realistic
expectations. While all students can learn, rates and styles of learning differ
widely even within the "normal" range of ability.
-
NASP urges careful consideration when interpreting the results
of large scale assessments for individuals or groups of students with disabilities
or limited English proficiency, as these tests may not adequately reflect
the content or level of their instruction or address realistic instructional
goals. Other strategies, such as curriculum-based measurement and progress
monitoring of skill performance, should be used in combination with large
scale assessments to reliably and validly measure progress in the general
education curriculum for these students.
Decision Making for Individual Students
-
NASP opposes the use of performance on a single test or standards
test battery as the sole determinant in any decision about promotion, retention,
instructional placement, scholarship or graduation for any individual student.
Rather, NASP supports the use of multiple measures of academic achievement,
including grades, curriculum-based procedures and teacher evaluations, as
well as parent input, in making such decisions.
-
If tests are used as components of high stakes decisions such
as graduation, NASP urges consideration of options such as phasing in testing
requirements over time, providing opportunities for retesting, or developing
procedures for appeals and waivers, as well as other sources of information
about student performance.
-
Consequences of high stakes testing for individual students
should not be posed as either-or choices, but as indicators of the need for
and implementation of educational strategies such as early intervention, programmatic
changes, or specific evaluation of learning problems.
-
If an assessment is to be used as part of a high stakes decision
(e.g., promotion, graduation), the test content must be aligned to curriculum
and instruction. Students must have adequate opportunity to learn the material
covered by the test.
Test Design and Selection
-
NASP
asserts that tests must meet professional standards of technical adequacy
and must be reliable and valid for the purpose for which they are being used.
Further, tests designed to measure progress towards standards must be appropriately
aligned with standards, curriculum and instruction, and opportunity to learn.
School psychologists should provide consultation to districts and policymakers
to assure that technical issues tied to test construction and selection are
addressed.
-
Tests should be critically reviewed to determine whether they
are designed and developed to be accessible and valid for the widest range
of children and youth, including students with disabilities and students with
limited English proficiency. When tests are so designed, fewer accommodations
will be needed and scores can be more validly compared.
-
States should distribute information about the appropriate
use of large scale assessments and caution educators and parents about the
limitations of such tests.
Training
-
Policymakers must fund and support ongoing staff development
and training opportunities for educators as required by NCLB. The rapid pace
of implementing inclusive large scale assessments has not allowed sufficient
opportunities for training. In many cases, parents, educators, and administrators
are making decisions related to testing and inclusion without clear procedures
in place or an understanding about the effects of their decisions.
Evaluation and Research
-
All standards testing programs must have a systematic evaluation
plan to assess appropriate selection and implementation of procedures as well
as student and system outcomes. Evaluation must consider the match between
the use of a test and its design; differences in performances across groups
of students and possible sources of bias; the degree to which all students
are included; compliance with intended accommodations, modifications and alternative
procedures; and the intended and unintended consequences of the testing program
for individual students, staff, schools, districts and states. Evaluation
plans should be continuously monitored and modified as needed.
-
NASP supports ongoing research to address many unanswered questions
about large scale assessment and to assure development and implementation
of accurate, fair and useful measures of student and system progress.
Funding
-
Policymakers must ensure that mandated state-wide or district-wide
assessment programs have sufficient funds and timelines to develop, implement,
maintain, and evaluate a high quality process. Large scale assessment is a
complex and costly endeavor when appropriately designed and implemented. However,
such assessment is even more costly when inadequate or inappropriate procedures
are used.
Summary
It is the position of the National Association of School Psychologists
that standards-based tests be used as global indicators of student and program
progress, and to highlight the need for additional resources, not to determine
educational placement or graduation eligibility for an individual child, or
to establish rewards or sanctions for any personnel, school or district.
Policymakers are urged to carefully monitor and evaluate the actual consequences
of large-scale assessment programs and to implement essential guidelines for
the development and application of these accountability systems.
Adopted by the NASP
Delegate Assembly, April 12, 2003
Key Supporting Resources
Almond,
P.J., Lehr, C.A., Thurlow, M.L., & Quenemoen, R. (in press). Participation
in large-scale state assessment and accountability systems. Chapter in
Large-Scale Assessment Programs for All Students: Development, Implementation
and Analysis.
American Educational Research Association (2000). AERA position statement
concerning high-stakes testing in PreK-12 education. Available: www.aera.net/about/policy/stakes.htm
Elliott,
S.N., Braden, J.P., & White, J.L. (2001). Assessing one and all: Education
and accountability for students with disabilities. Washington, DC: Council
for Exceptional Children.
Heubert, J.P., & Hauser, R.M. (1999). High stakes testing for tracking,
promotion, and graduation. Washington, DC: National Academy Press. Available:
www.nap.edu/books/0309062802/html/index.html
National Association of School Psychologists (2002). Large
scale assessments and high stakes decisions: Facts, cautions and guidelines.
Bethesda, MD: Author.
National
Research Council (1999). Testing, teaching, and learning: A guide for states
and school districts. Washington, DC: Author.
Quenemoen,
R.F., Lehr, C.A., Thurlow, M.L., & Massanari, C.B. (2001). Students with
disabilities in standards based assessment and accountability systems: Emerging
issues, strategies, and recommendations. (Synthesis Report 37). Minneapolis,
MN: University of Minnesota, National Center on Educational Outcomes.
Ysseldyke, J.E., & Bielinski, J. (2001). Critical questions
to ask when interpreting or reporting trends in the large-scale test performance
of students with disabilities. Washington, DC: Council of Chief State School Officers, State Collaborative
on Assessment and Student Standards.
Additional Resources
Council for Exceptional Children (www.cec.sped.org)
National
Association of School Psychologists (www.nasponline.org)
National
Center on Educational Outcomes, University of Minnesota (www.coled.umn.edu/NCEO)
© 2003, National Association of School Psychologists, 4340 East
West Hwy #402, Bethesda, MD 20814