The Wyoming Comprehensive Assessment System

Design Report

 

 

 

Prepared for the Wyoming State Legislature

 

by

 

The Statewide Assessment Design Team

 

 

 

 

 

January 16, 1998

 

 

 

 

 

 

 

 

Wyoming Department of Education

Judy Catchpole, Superintendent of Public Instruction

Hathaway Building, Second Floor

2300 Capitol Avenue

Cheyenne, WY 82002-0050

 

 

 

The Wyoming Department of Education does not discriminate on the basis of race, color, national origin, sex, age, or disability in admission or access to, or treatment or employment in its educational programs or activities. Inquiries concerning Title VI, Title IX, Section 504, and the Americans with Disabilities Act may be referred to the Wyoming Department of Education, Office for Civil Rights Coordinator, 2nd floor, Hathaway Building, Cheyenne, Wyoming 82002-0050 or (307) 777-6198, or the Office for Civil Rights, Region VIII, U. S. Department of Education, Federal Building, Suite 310, 1244 Speer Boulevard, Denver, CO 80204-3582, or (303) 844-5695 or TDD (303) 844-3417. This publication will be provided in an alternative format upon request.

 

 

The Statewide Assessment Design Team Members

 

 

Senator Irene Devin, Co-Chair
Wyoming Senate
3601 Grays Gables Rd.
Laramie, WY 82070
742-3901

Donn McCall, Co-Chair
Wyoming School Boards Association
405 East 10th
Casper, WY 82601
234-1000

Mary Ann Burton
Citizen
809 Olympus Dr.
Sheridan, WY 82801
672-9178

Kathy Milburn
Teacher, Wilson Elementary School
P.O. Box 729
Wilson, WY 83001
733-3077

Von Dahl
State Board of Education
601 West Lott St.
Buffalo, WY 82034
684-9571

Representative Pat Nagel
Wyoming House of Representatives
1105 South Durbin
Casper, WY 82601
265-1421

Lisa Halsey
Parent / Elementary Guidance Counselor
Box 1299
Wheatland, WY 82201
771-2570

Bill Schilling
Wyoming Heritage Society
139 West Second St.
Casper, WY 82601
577-8000

Jacque Harrod
Vocational Education Teacher
Worland High School
17th & Washakie
Worland, WY 82401
347-2412

Ruth Sommers
Parent
756 Silver Sage
Cheyenne, WY 82009
632-0157

Sharon Knudson
Principal
Jessup Elementary School
6113 Evers Blvd.
Cheyenne, WY 82009
771-2570

 

 

 

 

Technical Advisors to the Statewide Assessment Design Team

Mr. Joe Simpson, Wyoming Department of Education
Dr. Alan Sheinker, Wyoming Department of Education
Dr. Linda Hansche, Georgia State University
Dr. Dale Carlson, California Department of Education
Mr. Scott Marion, University of Colorado, Boulder

 

Support Staff

Ms. Sharon Brokaw, Wyoming Department of Education

 

Wyoming Comprehensive Assessment System Design Report Prepared by:

Scott Marion, Alan Sheinker, Linda Hansche, & Dale Carlson
under direction of the Statewide Assessment Design Team.

 

 

Table of Contents

The Statewide Assessment Design Team Members
Executive Summary
Recommendations from the Statewide Assessment Design Team
Some notes about costs
Questions and Answers
Introduction and Background
Introduction
The Statewide Assessment Design Team Process
An Overview of Standards-Based Assessment
Types of Standards
Assessment Systems
The Proposed Wyoming Comprehensive Assessment System
Overview
Assumptions
Timing
Purposes of the Wyoming Comprehensive Assessment System
Rationale
The State Assessment System Model
Special Populations
Financial Considerations
The District Role in the Wyoming Comprehensive Assessment System
The National Role in the Wyoming Comprehensive Assessment System
Reporting System
The Stakes Associated with the Assessment System
Implications
The Request For Proposal (RFP) Process
Evaluation of the Wyoming Comprehensive System
Conclusions
Appendix A: Glossary of Terms

 

 

Wyoming State Assessment System Design Report

Executive Summary

 

Introduction

In 1992, Wyoming's fourth-grade students ranked tenth of the forty-eight participating states on the National Assessment of Educational Progress mathematics test, but dropped to twenty-third in 1996. There was a similar pattern for eighth-grade students. What happened? Why did students from so many states surpass Wyoming's fourth and eighth graders during this four-year interval? It is difficult to conclusively explain this relative drop in Wyoming student achievement, but it is helpful to examine the states that have surpassed Wyoming. Montana, Texas, North Carolina, Michigan, and Vermont are examples of states that ranked lower than Wyoming in 1992, but higher in 1996. Many of the states now ahead of Wyoming in mathematics achievement have implemented massive statewide educational reform programs, including large-scale assessments.

 

There have been countless national reports in recent years calling for major improvements in the United States educational system. Closer to home, several prominent organizations, especially business-related groups such as the Wyoming Heritage Society, have called for improvements in the education of Wyoming school children. Many calling for reform have advocated a systemic approach for improving public education. A systemic strategy recognizes the importance of each level of the organizational structure, from the highest-level state policy makers to students and teachers in individual classrooms. Many of the states now ahead of Wyoming on national achievement rankings have instituted systemic educational reform programs. The proposed Wyoming Comprehensive Assessment System is one important piece of a statewide standards-based reform strategy. The proposed state-level assessment system is NOT designed to be the only assessment component of the full assessment system. Most educational testing will still be designed and implemented at the district level. The state system is not intended to replace or usurp the role of district and classroom assessments! It will complement district and classroom assessments and provide a more complete picture of Wyoming's educational system.

 

Standards-Based Education

A standards-based educational system contains several key components. First, content standards are established for specific subject areas. Wyoming school districts began this process in 1990. The development of state content standards, based on local district standards, is nearing completion. These standards define the important knowledge and skills for students to master. Once content standards are written, performance standards are developed to define "how good is good enough?" State performance standards are currently under development. Assessments are designed to measure students' progress toward meeting the content standards at specific benchmarks. The assessments provide information about the achievement level of the students and help determine which performance level they reached.

 

Types of Assessments

Assessments can take many forms. Those associated with standards-based educational systems are referred to as standards-based or standards-led assessments. Quality in a standards-based system is judged according to a predetermined criterion of excellence in meeting the content standards. School-level reports usually emphasize the percentage of students actually meeting a particular performance level (e.g., partially proficient, proficient, advanced). Standards provide meaningful targets for the educational system. When students can meet the highest standards, there is evidence of real educational achievement. In norm-referenced assessment systems, on the other hand, students are compared to one another and to a representative "norming" sample. Quality is judged in relation to other students and is often reported as a percentile rank, which is the percentage of students (or schools) that an individual (or school) outscored. To illustrate the differences between the two approaches, consider a fictional group of climbers making an ascent of the Grand Teton. If we were measuring the mountaineers’ progress using a norm-referenced approach, we would focus solely on the climbers’ positions on the mountain relative to one another or to the records of some other group that had climbed the Grand previously. A standards-based scheme would establish performance levels at specific elevations, with the highest standard being the summit. A progress report would inform us of the climbers’ particular achievement compared to these specific elevations and to the ultimate goal of the peak (see Table 1, p. 7).
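The contrast between the two reporting approaches can be sketched in a few lines of Python. This is an illustration only, not part of the proposed system: the norming sample, the school scores, the cut scores, and the name of the lowest level are all hypothetical values chosen for the example. Only the three named levels (partially proficient, proficient, advanced) and the definition of percentile rank come from the text above.

```python
from bisect import bisect_left

# Hypothetical norming sample of scale scores (assumption, for illustration).
NORM_SAMPLE = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]

# Hypothetical cut scores, highest level first (assumption, for illustration).
CUT_SCORES = [
    (40, "advanced"),
    (30, "proficient"),
    (20, "partially proficient"),
]
LOWEST_LEVEL = "below partially proficient"  # hypothetical label


def percentile_rank(score, norm_sample):
    """Norm-referenced view: percentage of the norming sample outscored."""
    ordered = sorted(norm_sample)
    outscored = bisect_left(ordered, score)  # count of scores strictly below
    return 100.0 * outscored / len(ordered)


def performance_level(score):
    """Standards-based view: level reached against fixed cut scores."""
    for cut, level in CUT_SCORES:
        if score >= cut:
            return level
    return LOWEST_LEVEL


def percent_at_or_above(scores, cut):
    """School-level report: percentage of students meeting a given cut score."""
    return 100.0 * sum(s >= cut for s in scores) / len(scores)


if __name__ == "__main__":
    school_scores = [18, 24, 31, 36, 42]  # hypothetical students in one school
    for s in school_scores:
        print(s, percentile_rank(s, NORM_SAMPLE), performance_level(s))
    print("Percent proficient or above:", percent_at_or_above(school_scores, 30))
```

In this sketch a student's percentile rank changes whenever the comparison group changes, while the performance level depends only on the fixed cut scores, which is the distinction the climbing analogy draws.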

 

The Statewide Assessment Design Team

As part of the continuing efforts to implement a standards-based educational system in Wyoming, the Wyoming Supreme Court mandated that the State develop an assessment system to monitor the academic performance of Wyoming students. Enrolled Act No. 2, which was enacted during the 1997 special session of the Wyoming Legislature, provided the statutory authority for the Statewide Assessment Design Team and established the committee's charge. The primary mission of this group was to propose a statewide system of educational assessment that measures student achievement of the Wyoming content standards; that is, the assessments must be standards-based. The legislation also specifies that the statewide assessment system focus first and foremost on measuring mathematics, reading, and writing achievement in grades 4, 8, and 11.

 

Purposes of the Statewide Assessment System

After working for four months, including four two-day public meetings, the design team formulated several recommendations and proposed an assessment system. Before proposing an assessment system, the design team had to be clear about its purposes. The design team considered the following purposes most important, and they will drive the entire assessment system:

 

 

Of the four, school improvement was the consensus choice as the single most important rationale for developing the Wyoming Comprehensive Assessment System. To fulfill these purposes, but most importantly school improvement, it was crucial to design a system that provides detailed information about school-level academic achievement. This includes information about performance related to specific Wyoming content standards. These data are vital to school improvement, but the design team also strongly supports providing test results to individual students and their parents.

 

It is virtually impossible to fulfill multiple purposes with a single test, especially when the purposes differ in the demands and stakes they place on schools and in the type of information needed to make decisions. For example, school improvement requires detailed information about performance across most aspects of the curriculum so that information can be used to tailor instructional plans. These data should be returned in a timely manner. On the other hand, summary information collected from a sample of students on selected parts of a program is all that is necessary for monitoring. Accountability, depending on the stakes involved, falls somewhere between these two extremes. Finally, if a test is used to provide national comparisons, it must be administered to a nationally representative sample of students at least once every 5-8 years. Because the Statewide Assessment Design Team wanted to develop an assessment strategy faithful to these four purposes, a multi-faceted system is proposed which includes both norm-referenced and standards-based assessments. With the purposes and rationale defined, the design team’s recommendations can now be presented.

 

Recommendations from the Statewide Assessment Design Team

 

1. The Statewide Assessment Design Team recommends developing and implementing a comprehensive assessment system that includes both standards-based and norm-referenced tests. The standards-based assessments for reading/language arts and mathematics devote approximately equal time to multiple-choice items, short, open-ended (constructed-response) problems, and longer, extended-response tasks. The multiple-choice items will be common to each student, while the open-ended problems will use both common and matrix-sampled tasks. The Statewide Assessment Design Team supports a writing assessment where each student responds to two prompts requiring extended responses: one common to all students and one as part of a matrix sample.¹ The Statewide Assessment Design Team recommends the use of an analytic scoring rubric statewide because it provides more information for school improvement purposes. Using multiple approaches provides a valid means of assessing students all along the continuum of cognitive complexity. This model is designed to assess fourth, eighth, and eleventh grade students’ progress toward meeting the Wyoming Content Standards in reading, writing, and mathematics. The assessment system also includes a nationally norm-referenced exam to provide national comparisons. The norm-referenced exam relies on the use of multiple-choice questions. The recommended model is designed to provide detailed information (i.e., performance for each standard) at the school and district level and accurate summary information to students.

 

2. The Statewide Assessment Design Team strongly recommends that the Legislature and other policy makers recognize the importance of the multi-faceted nature of a comprehensive assessment system and support the local and national components of this system. The state-level assessment system described above is only one piece of the Wyoming Comprehensive Assessment System. Assessments conducted at the local level, especially by teachers in individual classrooms, are the most vital components of a comprehensive assessment system. The National Assessment of Educational Progress (NAEP), already being used in Wyoming, represents the national level assessment. Having well-articulated components of the comprehensive system will increase the likelihood of improving schools and student achievement.

 

3. The Statewide Assessment Design Team recommends that the Legislature authorize the Department of Education to issue a Request for Proposal (RFP) to hire a testing company to develop and implement the state assessment system. The design team further recommends the following:

a. This design report serve as the framework for the RFP;

b. The authorization for the RFP be granted as soon as possible so that a testing contractor can be hired in time to conduct initial testing during spring 1999;

c. The statutory authority of the Statewide Assessment Design Team be continued so that members could advise Department of Education personnel when writing the RFP;

d. In addition to providing advice during the writing of the RFP, the design team along with additional personnel from Wyoming school districts should serve as advisors to the Department of Education when selecting the assessment contractor;

e. In order to assess ALL students, the RFP should require the contractor to include the development of alternate assessments and versions of the tests to meet the needs of Limited English Proficient (LEP) students;

f. The RFP should require the assessment contractor to include a norm-referenced test as part of the assessment system and the bidders should have to provide evidence of the match between their norm-referenced test and Wyoming Content Standards;

g. The successful bidder should be required to provide evidence of a plan for maintaining the security of the assessment system;

h. The costs of ongoing system maintenance, such as the development of new test questions, reporting, and equating scores for year-to-year comparisons, should be included in the assessment contract; and

i. The importance of consistency in the assessment system, especially during the first few years, leads the Statewide Assessment Design Team to recommend the Department of Education enter into a five-year agreement with an assessment contractor, contingent on continued state funding.

 

4. While there are four purposes of the assessment system, the design team supports the use of the data for school improvement, first and foremost. The assessment system should always be evaluated, and modified if necessary, in terms of the quality of data provided to those educators responsible for school improvement decisions and actions.

 

5. The Statewide Assessment Design Team recommends that these data be incorporated into school report cards as part of the Uniform Reporting System. The assessment results are not intended for accountability decisions, in and of themselves. Rather, the test results should contribute useful achievement information to overall judgments about school quality.

 

6. The Statewide Assessment Design Team recommends that a systematic evaluation of the Wyoming Comprehensive Assessment System be commissioned in the legislation authorizing the RFP. Additionally, the design team recommends that the system be open to researchers, subject to Department of Education approval, for conducting studies of the assessment system so that Wyoming policy makers and citizens can develop a greater understanding of the assessment system.

 

7. The Statewide Assessment Design Team recommends that the Legislature include enough funding in the education appropriation to support at least five days of professional development during the summer or other times when substitute teachers would not be needed. The Statewide Assessment Design Team recognizes that merely writing content standards and implementing assessments is not enough to bring about educational improvement. A concerted effort at many layers of the educational organization is required. One of the most important components of standards-based education is a systematic professional development program. Research clearly indicates the need for sustained professional development, and five days is only a minimal amount. Therefore, the design team encourages school districts to use a portion of their funding from the Cost-Based Block Grant Model (also known as the Management Analysis & Planning Associates--MAP--Model), some of which is already designated for in-service education, to support staff development related to standards and assessment.

 

8. The Statewide Assessment Design Team recommends that the Wyoming State Board of Education study several potential proposals designed to give more credibility to high school diplomas in Wyoming. The design team would like the State Board to consider implementing a "high school graduation guarantee" similar to proposals instituted in several Wyoming school districts. Students who have satisfactorily met Wyoming's Common Core of Knowledge and Skills with grades of "B" or better would be eligible for free remediation if they were not ready for appropriate college-level courses. Similarly, the design team suggests having a signed diploma certifying that the student has completed the Common Core of Knowledge and Skills and is ready for the workplace or postsecondary education. These two types of certification fit the philosophy of a standards-based educational system.

 

9. The Statewide Assessment Design Team recommends that the results of these assessments be widely distributed through the news media and as a component of the Uniform Reporting System. The design team recognizes that the consequences associated with public reporting will vary by district and may result in a variety of community responses. The comprehensive nature of the assessment at the school level allows profiles of strengths and weaknesses to be constructed and permits monitoring of achievement trends over time. Local educators can use this information to make efficient choices about expending district resources, particularly professional development resources. The assessment results and subsequent school improvement decisions are crucial aspects of the school accreditation review and will be evaluated through this process. Further, the design team is committed to recognizing excellence and supports honoring exemplary schools and districts for high and/or improved performance. These schools could be studied to learn about and share "best practices" in Wyoming education.

 

10. The Statewide Assessment Design Team recommends that the Wyoming Legislature appropriate not more than $1.775 million per year ($3.55 million per biennium) for each of the next five years to fund this assessment system. The committee recommends that the contract, subject to continued funding, be awarded for five years to ensure consistency in the program, thereby making the year-to-year results more comparable. Another reason for a long-term contract is that few, if any, contractors would bid on a proposal without the expectation of entering into at least a four-year agreement. The estimated cost of this system for five years would not exceed $8.875 million. While these costs are higher than the $950,000 per year allocated through Enrolled Act 2, the Statewide Assessment Design Team strongly encourages the Legislature to consider the educational value of the proposed system.
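The funding figures in recommendation 10 can be checked with simple arithmetic. The short Python sketch below uses only the dollar amounts stated in the recommendation; the variable names are illustrative, and the annual difference from the Enrolled Act 2 allocation is a derived figure rather than one stated in the report.

```python
# Figures stated in recommendation 10 (whole dollars).
ANNUAL_CAP = 1_775_000            # recommended maximum per year
CONTRACT_YEARS = 5                # recommended contract length
ENROLLED_ACT_2_ANNUAL = 950_000   # amount allocated per year through Enrolled Act 2

# Derived totals.
biennium_cap = ANNUAL_CAP * 2                      # two-year appropriation
five_year_cap = ANNUAL_CAP * CONTRACT_YEARS        # full contract period
annual_gap = ANNUAL_CAP - ENROLLED_ACT_2_ANNUAL    # derived, not stated in the report

if __name__ == "__main__":
    print(f"Per biennium: ${biennium_cap:,}")
    print(f"Five-year total: ${five_year_cap:,}")
    print(f"Annual amount above Enrolled Act 2: ${annual_gap:,}")
```

The biennium and five-year totals reproduce the $3.55 million and $8.875 million figures given in the recommendation.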

 

Some notes about costs

To compute approximate costs associated with implementing a comprehensive assessment system, the design team solicited preliminary estimates from four national testing contractors. These contractors were asked to include the costs associated with test development, scoring, reporting (to the state, school, and individual levels; including disaggregated results), setting performance standards, producing multiple forms of the test, developing an alternative assessment for students with severe disabilities to meet the requirements of the Individuals with Disabilities Education Act by the year 2000, printing, and mailing. As specified in the design, contractors were asked to include the cost of administering, scoring, and reporting the results of a norm-referenced test. The cost for the norm-referenced test is included in the estimate presented above, but if estimated separately, the norm-referenced test would cost approximately $5-$8/year/student tested or $250,000 - $400,000 for the biennium.
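As a rough consistency check on the norm-referenced test estimate, the per-student and biennium figures above can be related to each other. The Python sketch below is an inference from the stated ranges, not a figure from the contractors: it assumes the low per-student cost pairs with the low biennium total and the high with the high, and that a biennium covers two annual administrations. The implied number of students tested per year is therefore a derived value and should be read as approximate.

```python
# Figures stated in the report (whole dollars).
COST_PER_STUDENT_LOW, COST_PER_STUDENT_HIGH = 5, 8      # per student per year
BIENNIUM_LOW, BIENNIUM_HIGH = 250_000, 400_000          # norm-referenced test, per biennium
BIENNIUM_CAP = 3_550_000                                # recommended total per biennium
YEARS_PER_BIENNIUM = 2

# Implied students tested per year (assumption: low pairs with low, high with high).
students_low = BIENNIUM_LOW // (COST_PER_STUDENT_LOW * YEARS_PER_BIENNIUM)
students_high = BIENNIUM_HIGH // (COST_PER_STUDENT_HIGH * YEARS_PER_BIENNIUM)

# Share of the recommended biennium cap attributable to the norm-referenced test.
share_low = round(100 * BIENNIUM_LOW / BIENNIUM_CAP, 1)
share_high = round(100 * BIENNIUM_HIGH / BIENNIUM_CAP, 1)

if __name__ == "__main__":
    print("Implied students tested per year:", students_low, "to", students_high)
    print("Share of biennium cap (%):", share_low, "to", share_high)
```

Under these assumptions both ends of the range imply the same number of students tested per year, which suggests the two sets of figures in the report are internally consistent.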

Contractors were also asked to include the cost of replacing extended-response items each year so that a few of these tasks could be released each year. The design team hopes that teachers will incorporate these types of assessments in their classroom activities. Each summer, teachers could re-score a percentage of the extended-response items to learn more about scoring this type of assessment. Other ongoing costs include equating year-to-year assessment results, helping to teach educators and the public about assessment results, and public reporting. Committees of educators and citizens will be needed to review items for content and bias. Special population issues will consume more time and training during the first few years. Training in the areas of participation and accommodations will be critical to ensure comparability of the assessment across the state and to make sure that this truly is an assessment system for ALL students.

 

Wyoming citizens and policy makers need to realize that a political and financial commitment to a standards-based educational system is necessary to positively influence the academic achievement of Wyoming students. Many of the states that surpassed Wyoming on the National Assessment of Educational Progress have implemented successful standards-based educational programs. Their success is related to their financial commitment, without which activities such as teacher professional development programs--perhaps the most important piece of the reform--could not be adequately supported. Related to financial commitment, political support from legislators and other key state-level policy makers is crucial to the long-term success of any educational improvement activity. For example, two major standards-based reform efforts in Kentucky and Vermont have come under intense scrutiny from both outside and within the respective states, but their policy makers have continued to publicly support these programs. Consequently, these are two of the most successful state-level educational programs in place today. The proposed statewide assessment system results from Wyoming Supreme Court and State Legislature actions and also serves to fulfill pending federal requirements, but these legal rationales alone are not enough to ensure a successful assessment system. The Statewide Assessment Design Team urges the Legislature, the Governor, the Superintendent of Public Instruction, parents, and other influential public figures to fully support the Wyoming Comprehensive Assessment System.

 

Questions and Answers

 

Ten commonly asked questions about standards-based education and statewide assessment in Wyoming....

 

 

 

1. Why is the state mandating this requirement?

There are several reasons for instituting this state requirement. First, the Wyoming Supreme Court ruled that in order to evaluate the equality of Wyoming's education, a simple input model (e.g., spending, teacher qualifications) would not suffice. The Supreme Court ruled that a statewide assessment system is necessary to evaluate the equality of educational opportunity for Wyoming school children. Further, the Court mandated this assessment system must be designed to measure the achievement of Wyoming content standards.

Second, Wyoming receives approximately $38 million from the federal government for such programs as Title I of the Improving America’s Schools Act (IASA) and the Individuals with Disabilities Education Act (IDEA). New federal laws require these programs to be evaluated through the use of multiple measures, some of which must be standards-based. Wyoming must be in compliance with federal regulations by 2000-2001.

Finally, if Wyoming citizens want ALL Wyoming students to keep pace with those from other states and countries, it is necessary to adopt a systematic educational improvement strategy. Standards-based educational approaches have proven effective in many other states, albeit with certain limitations, and Wyoming citizens need to take advantage of "best practices" to move our public educational system into the 21st century. Wyoming is one of three states that have not yet adopted content standards and a state assessment program. Wyoming citizens and policy makers cannot afford to wait any longer before taking action.

 

2. What is an assessment system?

 

An assessment system is designed to provide a comprehensive picture of student and school achievement. The terms assessment and test are essentially interchangeable, except that assessment tends to include a greater variety of methods. A statewide assessment system should include multiple components and be targeted to various organizational levels (i.e., local, state, national) of the educational system. The most important characteristic distinguishing an assessment system from a collection of tests is that a system is designed to provide a cohesive array of information for educators and policy makers. A systems approach allows the assessments to serve multiple purposes. Multiple sources of information are useful for making policy decisions, while some data are vital for teachers as they plan their instructional programs. The "system" ensures that multiple assessments focus on the same content standards and report results in compatible ways.

 

3. What is standards-based assessment?

 

Standards-based or "standards-led" assessments are closely linked to what is taught, i.e., content standards. This produces a tight coupling between the curriculum and the test. Standards-based assessments are designed to measure student achievement against a pre-established level or standard, whereas norm-referenced tests primarily compare one student's performance to that of other similar-aged students. Standards-based exams often incorporate performance assessments requiring students to demonstrate thinking skills at a higher level than is typically required on multiple-choice tests. These items can be scored very reliably through the use of well-developed scoring guides and extensive scorer training.

 

4. Will standards-based education guarantee improvement in Wyoming’s educational system?

 

No, not necessarily. Standards-based education is a useful and effective method of aligning the multiple components of an educational system, but in and of itself, it is unlikely to lead to large gains in student achievement. In the states with the most successful reform programs, there has been financial and, most importantly, political commitment to a larger reform effort. For example, monetary resources to support professional development, so that teachers can receive the proper training to teach and assess the new content standards, are crucial to ensuring a successful program. However, political support of the new standards-based system is just as important. If the public and educators sense that the reform does not have substantial political support, many might "hunker down" with the hopes that "this too shall pass." Two states, in particular, serve as models in this regard. Both Kentucky and Vermont have weathered many attacks on their reforms because their public leaders continued to support their programs fiscally and politically. There is no question that the money is important, but political support (perhaps the financial commitment is evidence of political support) helps to make sure all players in this systematic reform, from the highest-level policy makers to classroom teachers, share the same goals.

 

5. Why not just require all school districts to use the same norm-referenced test as the state assessment system?

 

There are two crucial reasons why this approach could not be adopted. Norm-referenced tests are designed to measure the achievement of students relative to one another and to a nationally representative sample of similar-aged students. Norm-referenced tests do not measure performance against a specific set of criteria. They are not standards-based and typically test only a narrow range of knowledge and skills. If all districts adopted the same norm-referenced test, it would be a de facto state test, but it would not serve to adequately monitor student progress toward meeting state content standards.

 

Norm-referenced tests are designed to survey achievement based on a wide variety of curricular approaches. They tend to focus on measuring as many common curricular areas as possible, but this often translates into assessing a low-skills common denominator. While many norm-referenced test items could be matched to Wyoming standards, the converse is not true. The content coverage, in terms of both depth and breadth, is much broader in the standards than would be found on any of the "off-the-shelf" norm-referenced tests. Therefore, many of the Wyoming state content standards would not be assessed at all if a norm-referenced test were used.

 

There is a great deal of evidence that what gets tested gets taught. Traditional norm-referenced, multiple-choice tests, with high stakes attached, tend to narrow the curriculum in ways counter to the goals associated with having rigorous content standards. Norm-referenced tests can be useful as a component of a state assessment system, but are unable to meet the legal or educational requirements as the centerpiece of a state testing program.

 

6. What is the relationship between the development of the state assessment system and the development of state standards?

 

Since 1990, Wyoming school districts have been developing content standards. A separate committee has been charged with developing the Wyoming State Content Standards based on these district standards. This bottom-up approach honors local efforts, and the state benefits from previous district work. The Statewide Assessment Design Team only needed to see drafts of the standards to get a sense of the types of knowledge and thinking skills that will be required of students. The standards will need to be formally adopted by the time a Request for Proposal for an assessment contractor is developed. At that time, the contractor and personnel from the Department of Education will need to align the assessments with the Wyoming content standards.

 

 

7. Will this assessment system replace local assessments?

 

No. Many districts have well-developed assessment systems currently in place, and there is no intention to have the state system replace local tests. It is the design team's goal, however, to have the local and state assessments become two aspects of a well-articulated assessment system. The state will be responsible for testing only mathematics, reading, and writing in fourth, eighth, and eleventh grades. The assessment of the remaining Common Core of Knowledge and Skills, as well as testing at grades other than 4, 8, and 11, is the responsibility of local authorities. The local assessment system, especially classroom assessment, is the most important component of the system for influencing teaching and learning. Classroom and district assessments are able to provide finer-grained and more timely information to teachers and students. This information is crucial for helping teachers meet the academic needs of individual students.

 

8. Why not just adopt another state's assessment system?

 

In essence, Wyoming would be doing this. When an assessment contractor is hired, Wyoming will be able to benefit from the contractor's experience with many other states. We can use the best items, matched to Wyoming standards, from well-tested item banks developed as part of many other state assessments.

 

Pragmatically, adopting another state's assessment in its entirety is not a viable option because few, if any, states would allow another state to use their existing assessment system. Additionally, as other states modify their systems in response to public input, Wyoming’s system would be forced to change along with the other state's. For example, the design, grade levels, and content areas assessed in Colorado's new state assessment program have changed substantially during its relatively short life span. If Wyoming had decided to adopt Colorado's assessment program, Wyoming policy makers and educators would have been forced to change by either following Colorado's lead or finding a new state assessment program to adopt. This would probably be unacceptable to Wyoming citizens.

 

9. How do you include all students in the assessment system?

 

There are many approaches to maximize student participation in the testing program. First, clear expectations that ALL students be included in the state assessment system are crucial. With this political commitment, we expect that up to 90-95% of Wyoming students, in the three grades tested, could participate in regular testing situations.

 

For those students unable to participate in regular testing due to a specific disability, accommodations will be provided to help them complete the regular assessment. For example, students with certain types of attention disorders might be allowed to take the tests in a smaller, less distracting room or they might be given extra time to complete the assessment. In general, assessment accommodations reflect the type of instructional accommodations listed on the student’s Individualized Educational Plan (IEP). Through the use of accommodations, an additional 3-5% of Wyoming students will now be able to participate in the state assessments.

 

For students with more severe cognitive disabilities, an alternate assessment would be used to provide them with a meaningful test of their educational achievement. These alternate assessments should include an additional 0.5-2% of Wyoming students. The few remaining students would be only those with the most severe disabilities.

 

 

10. How will second language learners or Limited English Proficient students (LEP) participate in the state assessment system?

 

Fewer than two percent of Wyoming’s students are classified as Limited English Proficient (LEP), but in spite of this small percentage, the design team recommends including as many of these students as possible in the assessment system. Most test publishers offer Spanish versions of their norm-referenced tests (Spanish speakers are the predominant non-English-speaking group). The testing contractor would also be expected to develop a version of the standards-based assessments to meet the needs of LEP students according to "best practices."

 

 

Wyoming State Assessment System Design Report

 

Introduction and Background

 

Introduction

The 1995 Wyoming Supreme Court decision mandated that the State develop an assessment system to monitor the academic performance of Wyoming students. The Statewide Assessment Design Team was established by the legislation resulting from this court decision. The design team's primary mission was to propose a system of educational assessment to monitor how well Wyoming schools are meeting the educational needs of Wyoming students.

 

Enrolled Act No. 2, House of Representatives of the Fifty-Fourth Legislature of the State of Wyoming, 1997 Special Session, provides the statutory authority of the committee and establishes the design team's charge:

. . . The statewide assessment design team shall establish the assessment system it is required to implement under this section by reviewing existing assessment systems being used by Wyoming school districts and those available nationally and select or design from among those systems studied the most suitable one for Wyoming so as to measure Wyoming student progress and performance in a manner that is understandable to Wyoming citizens. The assessment system shall be aligned to the statewide educational program standards, shall specifically assess student performance in reading, writing, and mathematics at grades four (4), eight (8), and eleven (11) and may measure other common core of knowledge and skills established under W. S. 21-9-101 (b) which can be quantified.

 

The outcome of the Design Team's efforts to develop an assessment system for statewide assessment in Wyoming is reflected in this report. While an assessment system involves many components, there are facets of an assessment system that fall outside of the team's responsibility. The Statewide Assessment Design Team is NOT charged with developing content or performance standards, selecting specific assessments to be used at the local level, writing actual test items, writing the Request for Proposal (RFP),2 selecting a contractor through the RFP process, or designing a statewide reporting system. These functions are the purview of other committees and/or the Department of Education. However, this design report provides the framework for developing the RFP.

 

The composition of the Statewide Assessment Design Team was specified in the enabling legislation that follows (in italics). Individuals selected to serve are identified in the same list.

The statewide assessment design team shall be comprised of the following:

(i) One (1) member appointed by the superintendent of public instruction (Mary Ann Burton);

(ii) One (1) member who is a school principal from a Wyoming school district appointed by the person's professional organization (Sharon Knudson);

(iii) Two (2) members who are teachers in Wyoming school districts appointed by the governor (Jacque Harrod and Kathy Milburn);

(iv) One (1) member who is serving as a trustee on a Wyoming school district board of trustees appointed by the person's professional organization (Donn McCall, Co-Chair);

(v) One (1) member who is serving as a member on the Wyoming state board of education appointed by the board of education (Von Dahl);

(vi) One (1) member appointed by the governor to represent private business (Bill Schilling);

(vii) One (1) member of the Wyoming senate appointed by the president of the senate (Senator Irene Devin, Co-Chair);

(viii) One (1) member of the Wyoming house of representatives appointed by the speaker of the house (Representative Pat Nagel); and

(ix) Two (2) members who are parents of school children appointed by the governor (Lisa Halsey and Ruth Sommers). (Wyoming House of Representatives, 1997, pp. 22-23)

 

The Governor, State Superintendent, President of the Senate, Speaker of the House, State Board of Education, and state professional organizations are to be commended for appointing committee members who took their responsibilities seriously. The Statewide Assessment Design Team members officially volunteered their time from September 1997 through January 1998 to attend four two-day meetings and a video conference, plus additional planning, travel, and "homework" time. More than 1,000 citizen hours were "donated" in developing this assessment system.

 

The interests in and experience with educational assessment among committee members were diverse. Because individual committee members wore various hats, a crossover of perspectives permitted the committee to function effectively as a unit with each member being able to relate to the other members of the committee.

 

The Statewide Assessment Design Team Process

To aid committee work, experts from within and outside the Wyoming Department of Education provided technical support to the Statewide Assessment Design Team. The technical support group included university researchers, assessment developers, and state personnel. The technical support group provided a wealth of information in educational assessment, including an overview of federal requirements related to state assessment systems, research on assessment issues, information on "best practices," and information on specific topics in response to the Statewide Assessment Design Team's requests for information. Members of the technical support group also helped to facilitate the Statewide Assessment Design Team meetings.

 

Wyoming District Assessments. As part of the fact finding, the Statewide Assessment Design Team examined local Wyoming school districts' standards and assessments. This "grass-roots" approach respected the work completed by local districts and avoided "re-inventing the wheel." Wyoming school districts were surveyed to find out which norm-referenced testing program they were using to fulfill a State Board of Education requirement and how much money they were spending on their norm-referenced programs.

 

There was strong interest among several districts for the committee to examine one particular testing program--Northwest Evaluation Association's (NWEA) Levels Testing program. This program is an excellent component of a district assessment system, but it does not appear to meet enough of the goals of the legislation and the Statewide Assessment Design Team to be "scaled-up" for use at the state level. The Levels testing program uses only multiple-choice questions; therefore, it is doubtful that it could be used to adequately measure student progress toward all of the content standards. Further, NWEA's testing program is based on one particular Item Response Theory (IRT) model called a Rasch model. This IRT model has several advantages in terms of simplicity, but it comes at a price. All items are assumed to differentiate between higher and lower achieving examinees equally well, and no guessing is assumed to occur. Other models can incorporate items that might not fit into such a narrow spectrum of acceptability. From NWEA’s own document, "Once the scale is defined for the initial set of test questions, it is imperative that new test items be added to the scale in a way that doesn’t disrupt the original scale." This type of approach indicates an undue adherence to a specific model instead of trying to best model the data. Student performance is often "messy," and we might be able to design a more valid test if we are open to the use of other models. Finally, the Statewide Assessment Design Team wanted a norm-referenced component included in the Wyoming Assessment System. NWEA does not yet have the capacity to provide national norms. Again, NWEA's Levels testing program can be very useful to districts, but we doubt it could be used effectively as part of the state assessment system.
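The trade-off described above can be illustrated with a minimal sketch. It compares the Rasch model with the three-parameter logistic model, which is used here only as one commonly cited example of the "other models"; the report does not name a specific alternative. The function names, the ability scale, and the item values are assumptions made for illustration.

```python
import math

def rasch_probability(ability, difficulty):
    """Rasch model: the chance of a correct answer depends only on the gap
    between examinee ability and item difficulty. Every item has the same
    slope, and there is no allowance for guessing."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def three_parameter_probability(ability, difficulty,
                                discrimination=1.0, guessing=0.0):
    """Three-parameter logistic model: adds an item-specific slope
    (discrimination) and a lower bound on success due to guessing."""
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))
    return guessing + (1.0 - guessing) * logistic

# Hypothetical four-option multiple-choice item of average difficulty (0.0),
# answered by a low-achieving examinee (ability -3.0 on an arbitrary scale).
low_ability = -3.0
p_rasch = rasch_probability(low_ability, 0.0)
p_three = three_parameter_probability(low_ability, 0.0, guessing=0.25)

print(f"Rasch model:           {p_rasch:.3f}")
print(f"Three-parameter model: {p_three:.3f}")
```

Under the Rasch model the predicted chance of success for a very low-achieving examinee approaches zero, whereas a model with a guessing parameter never drops below the guessing rate. With a slope of 1 and no guessing, the two functions give the same result.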

 

Several Wyoming educators were interested in having the design team explore using the New Standards Project's assessments. The New Standards Project (NSP), a joint creation of the Learning Research and Development Center at the University of Pittsburgh (LRDC) and the National Center on Education and the Economy (NCEE), is a national effort to bring about systemic educational reform by writing content standards, setting performance standards for students, measuring progress with authentic assessments, and providing professional development to help prepare teachers for these changes. The New Standards Project, organized as a consortium of approximately 20 states and several large-city school districts, was initiated in early 1991 with the intention of having the standards and assessments operational by 1995. In the mid 1990s, when it became clear that it would be difficult to have the project operational in enough states to keep the project financially viable, the New Standards Project's assessments were purchased by Harcourt Brace.

 

These are some of the best performance assessments on the market today because of the New Standards Project's extensive research and development program. To the extent that these assessments match Wyoming Content Standards, they could fulfill some requirements of the Wyoming Comprehensive Assessment System. However, it is unlikely that the match between Wyoming's content standards and the New Standards Project's assessments will be as good as with a custom designed testing program. Additionally, the New Standards Project’s assessments are fairly expensive. The design team estimates, based on information in Harcourt Brace’s catalogue, that the tests and appropriate reporting would cost approximately $22 per student per subject area per year. While these costs are within range of the proposed Wyoming Comprehensive Assessment System, they do not cover the costs of such components as a national norm-referenced test, the development of alternative assessments, standard setting, designing additional items/tasks to make sure all Wyoming content standards are assessed, and custom score reporting. The design team supports the use of New Standards Project assessments as part of district assessment systems because of their high quality and likely similarity to open-ended problems that will eventually be found on the Wyoming state assessments.

 

Almost all districts use some type of formal district writing assessment. Some use a holistic scoring approach--a single summary score for each paper--and others use an analytical scoring method. The analytical approach contains a summary score in addition to subscores for each of several writing features such as mechanics, voice, organization, etc. The writing assessment is one component of district assessment programs that will be included in the state assessment design. The selection of writing prompts and scoring designs for the statewide system will necessarily be more standardized compared with district tests, but many features of the program will be very familiar to students and teachers. The Statewide Assessment Design Team supports the use of an analytic scoring rubric because we think it provides more information for school improvement purposes.

 

Assessment Programs In Other States. After investigating local testing programs, the design team examined assessment systems from several other states to get a sense of the range of programs currently in use. While forty-seven states have some type of state assessment program in place, only about a dozen are using a true standards-based system. The design team discussed assessment programs in Kentucky, Vermont, and Colorado in great detail.

 

Both Kentucky's and Vermont's assessment systems combine local- and state-level components that systematically assess students across a range of content areas and grade levels. The mandated local components were designed to support teaching and student learning, while the state-level assessments served as accountability measures for schools. Other standards-based assessment systems in Connecticut, Maine, Maryland, Missouri, and Rhode Island were also discussed. These systems include constructed- and/or extended-response tasks in addition to multiple-choice questions (except Maryland, which uses no multiple-choice items). Each of these state programs has gone through tremendous revisions during the course of its development. We have benefited from the other states' experiences in framing our proposal. These investigations into other state assessment systems reinforced the Statewide Assessment Design Team’s beliefs that: (1) individual student scores are crucial for parental information and buy-in; (2) a mix of assessment formats best balances the technical, political, and practical considerations associated with large-scale testing; and (3) a comprehensive assessment system requires the incorporation of both local and state components.

 

An Overview of Standards-Based Assessment

 

Types of Standards

A standards-based educational system contains several key components. First, content standards are established for specific subject areas. Wyoming school districts began this process in 1990 and the development of state content standards is nearing completion. These standards define the important content for students to know and the skills they should be able to perform. Content standards typically are focused on grade ranges, i.e., K-4, 5-8, 9-12. However, benchmarks representing intermediate points --sometimes at the end of each grade level--are established to help with instructional planning. Once content standards are written, performance standards describe "how good is good enough?" State performance standards are currently under development. Assessments are designed to measure students' progress toward meeting the content standards at specific benchmarks. Assessment results are reported in terms of the performance standards. The assessments provide information about the achievement level of the students and help determine which performance level they have reached. Most importantly, they inform instruction.

 

Implementing content and performance standards linked to assessments is not enough to bring about real improvements in teaching and learning. Opportunity-to-learn (OTL) is critical. Opportunity-to-learn means each student is provided with instruction in the content standards. It is more than simply "verifying" that material was covered in class, or that a course was taken. It means that students were presented with reasonable instruction such that the instruction fosters learning and constitutes a valid opportunity to learn. Opportunity-to-learn can be used in determining the effectiveness of schools and districts in ensuring that their students meet the performance standards. When opportunity-to-learn has been established relative to the state content standards, an assessment system can have real impact.

 

Included in opportunity-to-learn is a staff development component that enables teachers and thus school systems to provide the quality of instruction necessary for students to meet these goals. Opportunity-to-learn reflects an educational system that has teachers in every classroom who are capable of teaching to high content and performance standards. Additionally, opportunity-to-learn includes administrators who are qualified instructional leaders and who are able to evaluate the quality of standards-based education in their schools.

Assessment Systems

An assessment system is designed to provide a comprehensive picture of student achievement and school progress. A system is not a single test at a single grade level. A comprehensive assessment system includes multiple components targeting local, state, and national levels of the educational system. The most important characteristic distinguishing an assessment system from a sometimes random collection of tests is that an assessment system is designed to provide a cohesive array of information in order to make decisions regarding the specific purposes for which the system was designed. The various components of the system across the different educational levels provide complementary information so that decisions can be based on valid inferences.

 

Various tests in an assessment system can be designed to fulfill multiple purposes ranging from instructional feedback for teachers and students to high stakes accountability information for boards of education, the business community, and parents. The design of the system and the specific tests within the assessment system need to be aligned with whatever purpose(s) are agreed to be most important. The validity of an assessment system can only be evaluated in terms of its identified purpose(s).

 

For assessments to be useful for school improvement and instructional feedback, students need to be assessed regularly, both formally and informally. Teachers need routine assessments and quick turnaround of data in order to make sound instructional decisions for their students. When the "close-up" data derived from classroom assessments is used in conjunction with the large-scale snapshots of state-level exams, an accurate and comprehensive picture of student and school achievement emerges. For this reason, any assessment system designed to support student learning should include multiple components to produce a valid and reliable system. It is crucial that the assessments across the various levels, i.e., state and local, work cohesively together to send a similar message about expectations for teaching and learning. If both local and state assessments are based on the same content standards, the "message" is more focused; thus, both levels of testing can influence local classroom practice.

 

Standards-Based and Norm-Referenced Assessments. The major difference between standards-based and norm-referenced assessment systems is how examinee performance is evaluated. In a norm-referenced assessment system, students are simply compared to one another and to a representative "norming" sample. Quality is judged in relation to other students and is often reported as percentile rank, which is the percentage of students (or schools) that an individual (or school) outscored. Quality in a standards-based system, on the other hand, is judged according to a predetermined criterion of excellence in meeting the content standards. The emphasis in reporting (at the school level) is often on the percentage of students actually meeting a particular performance level (e.g., basic, proficient, advanced). Standards provide meaningful targets for the educational system, and when students can meet the highest standards, there is evidence of real educational achievement. To illustrate the differences between the two approaches, consider a fictional group of climbers making an ascent of the Grand Teton. If we were measuring the mountaineers’ progress using a norm-referenced approach, we would focus solely on the climbers’ position on the mountain relative to one another or to the records of some other group that had climbed the Grand previously. A standards-based scheme would establish performance levels at specific elevations with the highest standard being the summit. A progress report would inform us of the climbers’ particular achievement compared to these specific elevations and to the ultimate goal of the peak (see Table 1).
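The two ways of reporting the same score can be sketched in a few lines. The norming sample, the cut scores, and the "below basic" label are hypothetical values chosen for illustration; they are not drawn from any Wyoming assessment.

```python
from bisect import bisect_left

def percentile_rank(score, norming_scores):
    """Norm-referenced view: percentage of the norming sample outscored."""
    ordered = sorted(norming_scores)
    outscored = bisect_left(ordered, score)  # scores strictly below this one
    return 100.0 * outscored / len(ordered)

def performance_level(score, cut_scores):
    """Standards-based view: highest level whose cut score has been reached."""
    level = "below basic"  # hypothetical label for scores under every cut
    for name, cut in sorted(cut_scores.items(), key=lambda item: item[1]):
        if score >= cut:
            level = name
    return level

# Hypothetical data, for illustration only.
norming_sample = [12, 15, 18, 20, 22, 25, 27, 30, 33, 38]
cuts = {"basic": 15, "proficient": 25, "advanced": 35}

student_score = 26
print(percentile_rank(student_score, norming_sample))  # relative standing
print(performance_level(student_score, cuts))          # standing against the standard
```

The first function answers "how did this student do compared with others?" and the second answers "has this student reached the standard?" The same raw score can look quite different under the two views.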

 

Matrix-sampled and common tasks/items. Matrix-sampling is a method to reduce testing time for each individual student while providing detailed school information. A set of tasks is divided into relatively equal subsets and each student completes only one of these subsets. For example, if 16 tasks were divided into four relatively equal test forms, each student would complete four of sixteen open-ended problems. These four-problem test forms are systematically distributed across all children in a classroom; therefore, local educators would get performance data about all sixteen problems. Common tasks, by contrast, are completed by all students, and these are useful for computing individual student scores. A particular task can be used as either a common or a matrix-sampled item; the only difference is whether all or a portion of the students are administered the item.
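The 16-task example can be sketched as follows. This is a minimal illustration assuming a simple rotation of forms through a class list; the task names, student names, and class size of 24 are hypothetical, and an actual contractor may distribute forms differently.

```python
def build_forms(tasks, number_of_forms):
    """Split the task pool into forms of (nearly) equal size."""
    return [tasks[i::number_of_forms] for i in range(number_of_forms)]

def assign_forms(students, forms):
    """Hand out forms in rotation so each form is used about equally often."""
    return {student: forms[i % len(forms)]
            for i, student in enumerate(students)}

# Sixteen matrix-sampled tasks divided into four forms of four tasks each.
tasks = [f"task_{n}" for n in range(1, 17)]
forms = build_forms(tasks, 4)

# Hypothetical classroom of 24 students.
students = [f"student_{n}" for n in range(1, 25)]
assignment = assign_forms(students, forms)

# Each student sees only four tasks, yet the class as a whole covers all sixteen.
tasks_per_student = len(assignment["student_1"])
tasks_covered = {task for form in assignment.values() for task in form}
print(tasks_per_student, len(tasks_covered))
```

Any common tasks would simply be added to every form, which is why they can support individual student scores while the matrix-sampled tasks support school-level reporting.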

 

 

Table 1. Critical Differences Between Norm-Referenced Tests (NRTs) and Standards-Based Assessments (SBAs).

 

 

 

Purposes best served

NRTs: National or other comparisons, monitoring in relationship to other students, schools, or states; accountability.

SBAs: Instructional feedback to teachers, students, and parents; school improvement; monitoring achievement; accountability; providing district-level comparisons.

Potential negative consequences

NRTs: If high stakes, can narrow the curriculum to a focus on basic skills. Potential alienation of teachers and inflated impressions of academic achievement (e.g., Lake Wobegon effect).

SBAs: Higher cost and greater efforts required of educators can lead to programs being abandoned. Can encounter resistance from some educators unwilling to change.

Ability to measure the full range of the content standards

NRTs: Can measure a variety of skills, especially the more basic skills.

SBAs: Can measure a range of knowledge and skills, including higher level thinking skills.

Types of Questions

NRTs: Primarily short answer and multiple-choice questions.

SBAs: Uses a variety of types of questions, including multiple-choice, essays, and extended problem-solving exercises.

Time and Cost Factors

NRTs: Tests can be administered and scored quickly and inexpensively.

SBAs: Take longer and are more expensive to develop, administer, and score.

How Results are Reported

NRTs: Students are compared with national samples; for example, "this student is at the 45th national percentile in reading."

SBAs: Students are compared with the state’s performance standards; for example, "this student performs at the proficient level in reading."

Usefulness of Results

NRTs: Limited usefulness in improving instruction.

SBAs: Results can aid school improvement by illuminating strengths and weaknesses in instruction and curriculum.

 

Item formats. Multiple-choice items permit broad content coverage because numerous questions can be effectively administered in a relatively short period of time. Well-written multiple-choice items can tap students' ability to apply their understanding of content to novel problems, but multiple-choice items typically focus on cognitive processes such as recall or knowledge of basic principles. Multiple-choice items can be routinely scored by computer.

 

Extended-response tasks are well suited to measuring higher level understanding. Extended-response tasks cover content in depth. These tasks are time consuming, requiring approximately 20 – 40 minutes per task, and students are typically asked to produce a written response requiring an explanation of how and/or why they completed or responded to the task as they did. Such responses can probe the depth of student understanding. The limitation of extended-response tasks is that relatively few tasks can be asked in a given testing session, so it is difficult to adequately assess all of the standards if only extended-response tasks are used. Further, scoring extended-response tasks involves careful training of raters and can be costly.

 

Constructed-response tasks are perhaps the most versatile of the item types considered appropriate for large-scale assessments. Constructed-response tasks tap thinking processes at a higher level than most multiple-choice items. Since they require less time to complete than extended-response tasks, approximately 5-15 minutes per task, more constructed-response tasks can be asked in the same amount of time. They are also relatively easy and quick to score using trained raters.

 

In addition to understanding the differences in cost, time, and technical qualities, it is also important to consider how different item types influence learning and instruction. As an assessment tool, multiple-choice items can serve a distinct purpose by being effective, inexpensive, and technically sound. But multiple-choice items are not necessarily desirable as models for instruction and informal assessments. Preferably, instruction should model the type of thinking required on extended- and constructed-response tasks as students are asked to explain or justify their responses (see Figure 1 for examples of the three item/task types).

 

Figure 1. Sample Standards-Based Assessment Tasks.

 

Three sample items/tasks are presented below. The three different item types are all designed to assess student understanding of the draft content standard for eighth grade mathematics presented below.

Wyoming Mathematics Content Standard (draft, August, 1997).

Statistics and Probability: Use statistics and probability to analyze given situations and the results of experiments. Communicate the reasoning used in arriving at the conclusion.

 

Multiple-choice item

Three coins are tossed at the same time. What is the probability that exactly two of the coins will land with the same face (e.g., 2 heads or 2 tails) showing?

A. 1/3

B. 2/3

C. 1/2

D. 3/4 (correct answer)

 

This item is a good application-level item, but doesn’t ask the student to show their work or explain their results. We can make this same item into a constructed response task and test a few additional skills, particularly communication.

 

Short constructed response task

Three coins are tossed at the same time. What is the probability that exactly two of the coins will land with the same face (e.g., 2 heads or 2 tails) showing? Show your work using pictures and/or mathematical equations and explain why you believe your answer is correct.

 

This task tests essentially the same content as the multiple-choice item but requires the student to show and explain their work. In this case, partial credit can be awarded for students who demonstrate some evidence of understanding but who might have missed the correct answer.

 

Extended-response task.

With the four coins provided, flip the coins 20 times (one flip counts as having all four coins tossed) and record your results in a data table. Next, graph the results of your twenty coin tosses. Based on your data, what is the most likely result of your coin tosses? What is the probability of this result? Show your work and explain your results.

 

What do you think should be the most likely result? Do your results agree with this theoretical prediction? If there is a difference, explain why you think this difference occurred.

 

Again, this task tests similar content, but requires students to construct data presentations (tables and graphs), calculate a probability, and then communicate the results of their work. Further, it asks students to discuss the differences between their empirical results and theoretical predictions.
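The theoretical answers behind these sample tasks can be checked by listing every equally likely outcome. The sketch below is an illustrative check only, not part of any scoring guide; it assumes fair coins, and the variable names are chosen for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def outcomes(number_of_coins):
    """Every equally likely result of tossing the given number of fair coins."""
    return list(product("HT", repeat=number_of_coins))

# Three coins: exactly two alike means two heads and a tail, or two tails and a head.
three = outcomes(3)
favorable = [o for o in three if o.count("H") in (1, 2)]
p_exactly_two_alike = Fraction(len(favorable), len(three))

# Four coins (extended-response task): theoretical spread of the number of heads.
four = outcomes(4)
heads_counts = Counter(o.count("H") for o in four)
most_likely_heads, ways = heads_counts.most_common(1)[0]
p_most_likely = Fraction(ways, len(four))

print(p_exactly_two_alike)               # matches option D in the multiple-choice item
print(most_likely_heads, p_most_likely)  # theoretical prediction for the four-coin task
```

With only 20 tosses, a student's empirical results will often differ somewhat from the theoretical prediction, which is exactly the discrepancy the extended-response task asks students to explain.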

The Proposed Wyoming Comprehensive Assessment System

Overview

The proposed assessment model is one component of a comprehensive standards-based educational system (see Figure 2). State accreditation and school improvement drive the entire system. These factors led to the development of local and state content and performance standards. The proposed statewide assessment system is designed to be standards-based, i.e., to measure student progress toward achievement of these standards. At the state level, students will be assessed in grades four, eight, and eleven in mathematics, reading/language arts, and writing. The local districts will be responsible for assessing all other subjects in the Common Core of Knowledge and Skills. These assessment results will be disseminated through a variety of methods, many of which will come under the purview of the Uniform Reporting Act. These results will lead to a variety of community responses, but because this is a school improvement model, educators will be expected to devise professional development plans in order to improve student achievement. Not only will educators be expected to devise plans for professional development, but community members will be expected to support staff development efforts. These staff development efforts cycle back to school accreditation. While the model is depicted as a cycle, it is actually much more iterative. The fairly linear presentation in Figure 2 was created for simplicity, but it is important to recognize that each component is related to and informs every other facet of the process.

 

Figure 2. Overview of the Wyoming Comprehensive Assessment System.

 

Assumptions

There are many assumptions or "givens" that result either from legislative mandates or from consensus decisions among members of the Statewide Assessment Design Team. These parameters have been included as essentially non-negotiable features of the assessment system and provide the framework for the entire system.

 

The recently reauthorized Individuals with Disabilities Education Act (IDEA), the Improving America's Schools Act (IASA) Title I, and Goals 2000 all contain specific assessment requirements. IASA Title I requires states to develop content and performance standards in at least reading/language arts and mathematics. The assessments of these standards must measure complex skills and involve multiple approaches. The assessments must be administered some time during grades 3 through 5, grades 6 through 9, and grades 10 through 12. In addition, data must be disaggregated at the state, district, and school levels by the following categories: Limited English Proficiency (LEP), disability, migrant status, economic disadvantage, gender, and ethnicity.

 

The State of Wyoming passed Enrolled Act 2, which requires the state to develop an assessment system in reading, writing, and mathematics. The assessment system must be standards-based and administered in grades 4, 8, and 11. The Wyoming Legislature also indicated that other content areas may be assessed, but only mathematics, writing, and reading were mandated. The results of the assessment must be reported at the state, district, and school levels in a way that clearly communicates student performance to all Wyoming constituents. Like IASA Title I, Enrolled Act 2 requires these assessment results to be disaggregated by ethnicity, socioeconomic status, and gender.

 

In addition to these federal and state requirements, several other assumptions, presented below, were instrumental in framing the design of the Wyoming Comprehensive Assessment System.

1. The state-level assessment program is one piece of a comprehensive system that also includes classroom, district, and national assessments. The state-level tests are not at all intended to replace local assessments. The design team and the Department of Education do not intend to usurp local educators’ capacity to decide on the most appropriate district, school, and classroom assessments.

2. All students will be included with appropriate accommodation(s). This requirement follows from the legislation, but is also a consensus ethical position of the Statewide Assessment Design Team.

3. Assessments will be designed to measure Wyoming content standards as accurately as possible.

4. The system will meet or exceed important federal requirements--IASA Title I, Goals 2000, IDEA--as discussed above.

5. The Wyoming Comprehensive Assessment System should support the School Accreditation Process by providing accurate and meaningful achievement information to school officials and accreditation auditors.

 

6. Communication associated with the assessment system, including, but not limited to, reporting of results, should be done in such a way as to support parental and community involvement in education.

7. The assessment system should be designed to support all students in meeting their full educational potential.

8. The Wyoming Comprehensive Assessment System should be designed to positively influence teaching and learning. Further, the assessment system should be evaluated in terms of this criterion, among others.

 

A comprehensive statewide assessment system will allow us to meet all of these assumptions and requirements.

Timing

The federal laws discussed above require that assessments be administered during the spring. The Statewide Assessment Design Team also favored timing the tests in the spring so that results could be returned to schools before the beginning of the next school year. This timing is congruent with the school improvement purpose in that teachers and administrators will be able to use the test results to adjust their instructional plans. The final decision about when the assessments are administered will be negotiated with the testing contractor, but the current intention is to have the assessments administered sometime during the spring semester so that scores can be returned at least one month prior to the beginning of school in August. The reason for timing the tests in this way is to facilitate school improvement decisions. This schedule will be different in the first year because of the length of time required for standard-setting.

 

The standards-based assessment will require approximately 2.5 hours for each content area, although the writing assessment will probably require less time. The norm-referenced survey batteries should require approximately two hours to assess both mathematics and reading/language arts.
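
As a rough check of these time estimates, the short Python sketch below adds up the section times listed in Table 2 of this report. The variable and function names are illustrative only; the minutes are copied from the table.

```python
# Section times in minutes, copied from Table 2 of this report.
STANDARDS_BASED_MINUTES = {
    "Reading & Lang. Arts": [60, 45, 45],  # multiple-choice, constructed-, extended-response
    "Writing": [60],                        # extended-response prompts
    "Mathematics": [60, 45, 45],
}
NORM_REFERENCED_MINUTES = {"Reading & Lang. Arts": 70, "Mathematics": 40}


def total_hours(minutes):
    """Convert a list of section times in minutes to total hours."""
    return sum(minutes) / 60


standards_based_hours = {
    subject: total_hours(sections)
    for subject, sections in STANDARDS_BASED_MINUTES.items()
}
norm_referenced_hours = total_hours(list(NORM_REFERENCED_MINUTES.values()))

for subject, hours in standards_based_hours.items():
    print(f"{subject}: {hours:.2f} hours")
print(f"Norm-referenced battery: {norm_referenced_hours:.2f} hours")
```

Under these figures, reading/language arts and mathematics each come to 2.5 hours, writing to 1 hour, and the norm-referenced battery to a little under 2 hours.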

 

Purposes of the Wyoming Comprehensive Assessment System

Four major purposes were identified for the Wyoming Comprehensive Assessment System. School improvement was the clear choice as the highest priority purpose. Assessment to support school improvement should provide administrators and teachers information enabling them to increase student achievement. An assessment system with school improvement as a primary purpose implies that the assessments should be capable of monitoring school and student achievement over time and be capable of providing relevant and timely information to teachers and instructional leaders.

 

The remaining three high priority purposes all received approximately the same amount of support. Accountability to stakeholder groups proved to be important for funding a statewide assessment system. The legislators on the design team indicated that other members of the Wyoming Legislature wanted a method to determine if increased education funding in recent years is positively related to student achievement. This priority relates to school improvement in that it is another way for various constituencies to get information about the performance of Wyoming schools and students.

 

Statewide monitoring of student achievement is similar to accountability, but refers to lower stakes decisions. Monitoring student achievement statewide is important for meeting federal program requirements, but is also important for keeping track of the progress of educational reform in Wyoming.

 

The capacity of the assessment system to provide national comparisons was considered an important reason for implementing a statewide assessment system. Fulfilling this goal implies that a nationally norm-referenced test be a part of the state assessment system. The Statewide Assessment Design Team felt it was important to include national comparisons to "keep ourselves honest" about how well Wyoming students are performing in comparison to students across the country. Because Wyoming students will be competing with citizens nationwide for jobs and recognition, a test with national norms was deemed important.

 

Rationale

The proposed system was designed to provide accurate information about student achievement toward meeting the state content standards. Yielding accurate, systematic information at the school level is one of the ways assessment can contribute to school improvement and be accountable to stakeholders. Providing national comparisons requires a test that has been administered to a nationally representative sample of students. It is virtually impossible to meet all of these purposes with a single test. An assessment system aligned with the state content standards will be most useful for providing information for school improvement, accountability, and monitoring, but this type of assessment system cannot provide accurate national comparisons. Therefore, a separate norm-referenced test is required to fulfill this goal.

 

As reported earlier in this document, Wyoming districts are currently using a locally selected norm-referenced test to fulfill a State Board of Education requirement. Therefore, asking districts to administer a state-selected norm-referenced test will not be an imposition. Most districts also are conducting some form of a writing assessment; participating in a statewide on-demand writing assessment will not be an additional burden. The standards-based reading and mathematics assessments, on the other hand, will be new to districts.

The State Assessment System Model

The Statewide Assessment Design Team wanted to develop an assessment strategy faithful to the four primary purposes identified. Therefore, a multi-faceted system is proposed which includes both norm-referenced and standards-based assessments. The design team discussed many models before settling on the model proposed here. The first models considered by the design team met the legislative requirements but reported results only at the school level. It quickly became clear that the design team wanted results available for individuals, in spite of the additional costs of doing so. This decision leads to an assessment design that has the capability to report accurate individual results.

 

In order to capture enough detailed information at the school level for making school improvement decisions, the committee suggests using a mix of matrix-sampled tasks and common tasks for the extended- and constructed-response portions of the assessments. The multiple-choice sections of the tests will include enough multiple-choice items for each student so that a matrix-sampling approach will not be necessary for this component. In order to fulfill one of the Statewide Assessment Design Team’s priorities of providing national comparisons, a norm-referenced test must be included in the assessment system.
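
To make the matrix-sampling idea concrete, the following Python sketch builds test forms in which every student sees the same common tasks plus one form-specific block of matrix tasks. The task labels, the function names, and the round-robin assignment of students to forms are illustrative assumptions, not part of the proposed design; only the counts (4 common and 4 matrix constructed-response tasks per form, with 3 to 6 forms) are taken from Table 2.

```python
from collections import Counter


def build_forms(n_forms, n_common, n_matrix):
    """Return a list of forms; each form is the shared common tasks plus its own matrix tasks."""
    common = [f"C{i + 1}" for i in range(n_common)]
    forms = []
    for f in range(n_forms):
        matrix = [f"M{f + 1}-{i + 1}" for i in range(n_matrix)]
        forms.append(common + matrix)
    return forms


def assign_forms(student_ids, n_forms):
    """Rotate forms through the student list (round-robin), an assumed scheme."""
    return {sid: i % n_forms for i, sid in enumerate(student_ids)}


def school_task_pool(forms):
    """Distinct tasks on which a school as a whole is measured."""
    return sorted({task for form in forms for task in form})


forms_low = build_forms(n_forms=3, n_common=4, n_matrix=4)
forms_high = build_forms(n_forms=6, n_common=4, n_matrix=4)
assignment = assign_forms([f"student{i}" for i in range(12)], n_forms=3)

print(len(school_task_pool(forms_low)))   # distinct tasks across 3 forms
print(len(school_task_pool(forms_high)))  # distinct tasks across 6 forms
print(Counter(assignment.values()))       # number of students per form
```

With these counts, the school-level pool is 4 + 4 x 3 = 16 tasks for three forms and 4 + 4 x 6 = 28 tasks for six forms, which matches the range shown for constructed-response tasks in Table 2. Each student still answers only one form, so the broader coverage comes at no extra testing time per student.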

 

In addition to using both standards-based and norm-referenced tests, the design team recommends using a variety of assessment formats to assess reading/language arts and mathematics on the standards-based assessments. A core set of multiple-choice items will be used to provide broad, efficient coverage across the content standards. A smaller number of short, open-ended (constructed-response) problems will be incorporated to provide in-depth coverage of the content standards by assessing higher level thinking skills. Relatively few longer, extended-response tasks will be included to probe students' highest levels of thinking for the most advanced levels of content (refer to Figure 1 for examples of these three item types). Although the quantity of each item type differs, the Statewide Assessment Design Team intends to have approximately equal time devoted to each type of task for the reading/language arts and mathematics assessments.

 

The writing assessment should be composed of prompts requiring extended responses. This type of assessment should help support writing instruction. The Statewide Assessment Design Team supports an assessment where each student responds to two prompts--one common to all students and one as part of a matrix sample.

 

Many Wyoming districts conduct writing assessments. Some use a holistic scoring approach--a single summary score for each paper--and others use an analytic scoring method. An analytic scoring guide includes subscores for each of several writing features, such as mechanics, voice, and organization. The writing assessment is one component of district assessment programs that will be recommended for inclusion in the state assessment design. The selection of writing prompts and scoring designs for the statewide system will necessarily be more standardized than current district programs, but many features of the state program will be familiar to students and teachers. The Statewide Assessment Design Team recommends the use of an analytic scoring rubric statewide because it provides more information for school improvement purposes.

 

Finally, to fulfill one of the Statewide Assessment Design Team’s priorities of providing national comparisons, a single norm-referenced test should be included in the assessment system. The norm-referenced test will contain only multiple-choice items. The norm-referenced test will be included in the Request for Proposal, and the testing contractor will be expected to provide evidence of the match between the norm-referenced test and the Wyoming Content Standards. The degree of match will be an important criterion for the selection of the testing contractor.

 

 

Table 2. State-level components of the Wyoming Comprehensive Assessment System.

 

Subject Area         | Standards-based or | Item Type            | Time      | Approx. Number of       | Number   | Total Tasks or Items
                     | Norm-referenced    |                      | (minutes) | Items/Tasks per Student | of Forms | for Schools (Students)
Reading & Lang. Arts | Standards-based    | Multiple-choice      | 60        | 50 common               | 1        | 50 (50)
                     | Standards-based    | Constructed-response | 45        | 4 matrix, 4 common      | 3-6      | 16 (4) - 28 (4)
                     | Standards-based    | Extended-response    | 45        | 1 matrix, 1 common      | 3-6      | 4 (1) - 8 (1)
                     | Norm-referenced    | Multiple-choice      | 70        | 55 common               | 1        | 55 (55)
Writing              | Standards-based    | Extended-response    | 60        | 1 matrix, 1 common      | 3-6      | 4 (1) - 7 (1)
Mathematics          | Standards-based    | Multiple-choice      | 60        | 50 common               | 1        | 50 (50)
                     | Standards-based    | Constructed-response | 45        | 4 matrix, 4 common      | 3-6      | 16 (4) - 28 (4)
                     | Standards-based    | Extended-response    | 45        | 1 matrix, 1 common      | 3-6      | 4 (1) - 8 (1)
                     | Norm-referenced    | Multiple-choice      | 40        | 32 common               | 1        | 32 (32)

Special Populations

Typically, "all" has NOT meant all. In the not-too-distant past, many students were excluded from the assessment system because these students might lower school or state scores. There are both moral and legal reasons for including all students in the assessment system.

 

The Individuals with Disabilities Education Act requires that states develop guidelines for the participation of disabled students in the regular and alternate assessments. Children with disabilities must be included in state and district-wide assessment with accommodations as appropriate. Alternate assessments must be developed and be available by July 1, 2000 for those students for whom the regular assessment is not appropriate.

 

The state must report the number of students with disabilities participating in the regular assessments and the number participating in the alternate assessments. By July 1, 1998, the performance of students with disabilities must be reported on regular, or statewide, assessments. Every other year the state must submit a report to the Secretary of Education on the progress of the state overall, and of children with disabilities specifically, in meeting state standards. This progress report will be the focus of the state’s program improvement grant. The individualized education plan (IEP) has been expanded to include determination of the use of modifications in the state or local district assessment. In addition, if the IEP team determines that the assessment is not appropriate for the child, the team must document and explain its rationale. The team must then determine how the child will be assessed.

 

There are many approaches for maximizing involvement of all students in the testing program. First, there should be clear expectations that as many students as possible be included in the state assessment system. With this political commitment, we expect that up to 90-95% of Wyoming students could participate in regular testing situations. For those students unable to sit for regular testing due to a specific disability, accommodations will be provided to help them complete the regular assessment. For example, students with certain types of attention disorders might be allowed to take the test in a smaller (and hopefully less distracting) room or they might be given extra time to complete the assessment. In general, assessment accommodations reflect the type of instructional accommodations found on the student’s Individualized Educational Plan (IEP). Through the use of accommodations, an additional 3-5% of Wyoming students will now be able to participate in the state assessments. For students with more severe disabilities, an alternate assessment would be used to provide them with a meaningful test of their achievement level. These alternate assessments should help to include an additional 0.5-2% of Wyoming students. The remaining students would include those with the most severe disabilities.

 

Only one percent or so of Wyoming’s students are classified as Limited English Proficient (LEP), but in spite of this small percentage, the design team recommends including as many of these students as possible in the assessment system. Most test publishers have Spanish versions of their tests (Spanish speakers are the predominant non-English-speaking group), and these would be used for the norm-referenced component. The testing contractor would also be expected to develop a Spanish version of the standards-based assessments according to "best practices."

 

Financial Considerations

To compute an approximate cost associated with implementing the type of assessment system described above, the design team solicited preliminary estimates from four national testing contractors. The contractors were asked to include the costs associated with test development, scoring, reporting (to the state, school, and individual levels including disaggregated results), setting performance standards, producing multiple forms of the test, developing an alternative assessment for students with severe disabilities as part of federal IDEA requirements, printing, and mailing. Additionally, some of the extended-response items will be released each year and new ones developed. Our hope is that teachers will incorporate these types of assessments in their classroom practice. Each summer teachers could re-score a percentage of the extended-response items to learn more about scoring this type of assessment. Other ongoing costs would include equating year-to-year assessment results, helping to teach educators and the public about assessment results, and public reporting. There will also be committees necessary for reviewing items for content and bias. Special population issues will consume more time and training during the first few years. Training in the areas of participation and accommodations will be critical to ensure comparability of the assessment across the state and to make sure that this truly is an assessment system for ALL students.

 

The committee strongly recommends that a contract be awarded for five years to ensure consistency in the program, thereby making the year-to-year results more comparable. Additionally, few, if any, contractors would bid on a proposal without the expectation of entering into at least a four-year agreement. The estimated cost of this system, with development amortized over the period of the contract (i.e., five years), would be approximately $1.775 million per year or $3.55 million per biennium. This translates into an $8.875 million sum for a five-year contract. While these costs are higher than the $950,000 per year allocated through Enrolled Act 2, the Statewide Assessment Design Team strongly encourages the Legislature to consider the educational value of the proposed system. As specified in the design, contractors were asked to include the cost of administering, scoring, and reporting the results of a norm-referenced test. The cost for the norm-referenced test is included in the estimate presented above, but if estimated separately, the norm-referenced test would cost approximately $5-$8/year/student tested or $250,000 - $400,000 for the biennium.
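
The cost figures above can be reproduced with a few lines of arithmetic. In the Python sketch below, the dollar amounts are taken from this section; the annual shortfall and the number of students tested per year are derived values that the report does not state, and the names are illustrative.

```python
ANNUAL_COST = 1_775_000      # dollars per year, estimated above
CONTRACT_YEARS = 5           # recommended contract length
ANNUAL_ALLOCATION = 950_000  # dollars per year allocated through Enrolled Act 2

biennium_cost = ANNUAL_COST * 2
contract_cost = ANNUAL_COST * CONTRACT_YEARS
annual_shortfall = ANNUAL_COST - ANNUAL_ALLOCATION  # derived, not stated in the report


def implied_students_per_year(biennium_dollars, dollars_per_student_year):
    """Students tested per year implied by a biennium total and a per-student rate."""
    return biennium_dollars / (2 * dollars_per_student_year)


print(f"Biennium cost:    ${biennium_cost:,}")
print(f"Five-year cost:   ${contract_cost:,}")
print(f"Annual shortfall: ${annual_shortfall:,}")
print(implied_students_per_year(250_000, 5))  # low end of the norm-referenced estimate
print(implied_students_per_year(400_000, 8))  # high end of the norm-referenced estimate
```

Both ends of the norm-referenced estimate imply the same number of students tested per year (about 25,000 across the three grades), which suggests the two ranges were computed from a single enrollment figure.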

 

Approximately $25 per student per year has been appropriated to districts for assessment purposes. There was some concern among members of the Wyoming Legislature’s Joint Appropriations Committee that the districts would be receiving "extra" money if the costs of norm-referenced testing for mathematics and language arts were covered by the state assessment system. To estimate the costs that might be considered double-covered, we used the assessment matrix presented in Table 3 as an "average" district assessment plan. Excluding the national assessments, there are 46 assessments listed on this matrix. Of these 46, one could argue that six are being paid for twice (reading/language arts & mathematics in three grades). Six of forty-six is 13.04%. This percentage of the $25 in MAP funding is $3.26/student/year. We believe this to be a reasonable estimate of potential "double-spending." Instead of removing this money from district funds, the design team encourages the Wyoming Legislature to earmark this money for the district contribution to assessment staff development.
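
The "double-spending" estimate is a simple proportion, sketched below in Python. The three input values come from the paragraph above; the variable names are illustrative.

```python
PER_STUDENT_FUNDING = 25.00  # dollars per student per year
LISTED_ASSESSMENTS = 46      # assessments on the matrix, excluding national ones
DOUBLE_COVERED = 6           # reading/language arts and mathematics in three grades

share = DOUBLE_COVERED / LISTED_ASSESSMENTS
double_covered_dollars = PER_STUDENT_FUNDING * share

print(f"Share of assessments covered twice: {share:.2%}")
print(f"Per-student amount: ${double_covered_dollars:.2f} per year")
```

The result depends directly on the count of 46 assessments, so a district whose plan lists more or fewer local assessments would arrive at a different per-student amount.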

 

The District Role in the Wyoming Comprehensive Assessment System

The ability of an assessment system to influence school improvement is related to the quality and timeliness of information provided to local educators. We intend for the state-level assessments to contribute to school improvement, but testing only at three grade levels cannot yield enough information in and of itself for teachers to fine-tune their instruction. A local assessment system, assuming it is well-designed, can provide instructionally relevant data to teachers and educational leaders. The Statewide Assessment Design Team recognizes that in order to promote school improvement, it is important to have a local assessment system well-integrated with the state system.

 

In states such as Kentucky and Vermont, the local assessment component is able to use a variety of measures and can test content and grade levels not covered on the state-mandated exams. Several draft Wyoming standards (e.g., those related to listening and speaking and/or conducting scientific experiments) cannot be assessed easily through the use of an on-demand state test. If these standards are never assessed, it is unlikely they will receive much value in the curriculum. Similarly, waiting until a student is in fourth grade to test his/her reading achievement for the first time, for example, misses important chances for early intervention.

 

Local assessment systems should have more flexibility than the state system because it is easier to use multiple assessment methods. At the state level, we have included multiple item formats including multiple-choice questions and constructed and extended-response tasks. It would be difficult within current cost constraints to incorporate other methods such as portfolios or presentations in the state system. However, these types of methods are just two examples that could be potentially incorporated in a local assessment plan. We would like to encourage local authorities to use multiple methods for measuring student performance.

 

The local assessment system should include specifications for the ways in which educators intend to assess content areas other than reading, writing, and mathematics and in grades other than 4, 8, & 11. The Statewide Assessment Design Team does not want the Legislature to mandate a specific district assessment plan. Rather, the design team would like to suggest some broad parameters for district assessment models. The school accreditation process includes provisions for evaluating district standards and assessments, and the design team believes the responsibility for examining district assessment plans should remain part of the accreditation process. Nevertheless, the Statewide Assessment Design Team offers the following template in hopes that it will aid district personnel as they develop assessment plans and help school accreditation committees develop a framework for evaluating district assessments (see Table 3).

 

The National Role in the Wyoming Comprehensive Assessment System

The most important national component of the Wyoming Comprehensive Assessment System is the National Assessment of Educational Progress (NAEP), particularly the "state NAEP." NAEP is the only ongoing nationally representative assessment of student achievement in various subject areas. NAEP is authorized by Congress, but policy is overseen by an independent body, the National Assessment Governing Board (NAGB). The National Center for Education Statistics (NCES) directs the assessment as well as the related research and evaluation [3].

 

NAEP has been monitoring educational achievement of students aged nine, thirteen, and seventeen (generally, grades 4, 8, & 11) in the United States since 1969. NAEP’s mission was expanded in 1990 to provide state-by-state (i.e., the "state NAEP") results of achievement testing in mathematics and reading. Science was added in the 1996 assessment; writing will be added in 1998. Each content area will be tested once every four years. Reading and writing will be tested in the same year. Science and mathematics will be tested concurrently, but offset from reading and writing by two years. Participation in the state NAEP has been voluntary, but 48 states were included in the 1996 administration [4]. NAEP is distinct from the National Voluntary Test program currently being debated in Washington.
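
The testing cycle described above can be sketched as a small Python function. This is only a reading of the schedule as stated in this section, with 1996 and 1998 taken as the anchor years; it is not an official NAEP calendar, and the function and constant names are illustrative.

```python
FIRST_MATH_SCIENCE_YEAR = 1996     # science was added in the 1996 assessment
FIRST_READING_WRITING_YEAR = 1998  # writing will be added in 1998
CYCLE_YEARS = 4                    # each content area is tested once every four years


def subjects_tested(year):
    """Return the state NAEP subjects for a year under the cycle described above."""
    if year >= FIRST_MATH_SCIENCE_YEAR and (year - FIRST_MATH_SCIENCE_YEAR) % CYCLE_YEARS == 0:
        return ["mathematics", "science"]
    if year >= FIRST_READING_WRITING_YEAR and (year - FIRST_READING_WRITING_YEAR) % CYCLE_YEARS == 0:
        return ["reading", "writing"]
    return []


for year in range(1996, 2005):
    print(year, subjects_tested(year))
```

Under this reading, an assessment occurs every second year, but any one subject recurs only every fourth year.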

 

NAEP is well aligned with many of the national education standards such as the National Council of Teachers of Mathematics standards for K-12 mathematics education [5] or the National Research Council's Science Education Standards [6]. Similarly, Wyoming affiliates of these national groups have contributed to the development of the Wyoming content standards. Wyoming content standards will undoubtedly have much overlap with these and other national standards; therefore, NAEP can serve as a good indicator of Wyoming student achievement.

 

The national NAEP is designed to track student achievement on a national level, and while this is a very important monitor of national education goals, the state NAEP provides much more relevant information to states. In order to collect enough data to derive accurate and precise state-level data, several thousand students must participate in the assessment. In large states, this is a relatively small percentage of the student population, but in small population states like Wyoming, almost half of the students in a given grade participate in the state NAEP. This, in addition to our extremely high participation rate among those asked to take the test, means that Wyoming citizens get very good information about student achievement in grades 4, 8, and 11. In spite of the relatively high percentage of Wyoming students tested, NAEP does not report district- or school-level results. For this reason, the design team does not want NAEP used as the norm-referenced test.

 

There has been some controversy in recent years regarding the achievement (performance) levels used for NAEP [7]. In spite of the problems with these performance levels, we know they are some of the most rigorous standards being used anywhere in the United States. Therefore, participating in the state NAEP can serve to "keep us honest" about the achievement of Wyoming students. For example, there has been some contention regarding the academic performance of students in one particular southeastern state. Approximately 60% of the students were classified as "proficient" on the basis of that state’s assessment, but less than 25% were deemed proficient as a result of their performance on NAEP. This type of discrepancy cannot be explained solely by the extremely high standards used by NAEP.

 

In summary, there are several benefits from continued participation in the state NAEP. First, the only costs associated with NAEP for Wyoming are "in-kind" contributions for staff time related to logistics and administration. Second, it can help provide an external check of our performance levels. Most importantly, it can provide the most accurate nationally comparative data available about the relative academic performance of Wyoming students. We endorse its continued use and recommend that it be included in the Wyoming Comprehensive Assessment System.

 

 

Table 3. Wyoming Comprehensive System District Model

 

 

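Columns: Grade; Reading; Writing; Math; Science; Social Studies; Fine & Perf. Arts; Phys. Ed.; Health & Safety; Humanities; Career Options; Foreign Cultures; Applied Tech.

Entries for each grade are listed in column order; blank cells are not shown.

Grade K: (no entries)
Grade 1: D
Grade 2: D, D
Grade 3: D, D, D
Grade 4: N* S, N* S, N* S, N* D
Grade 5: D, D, D, D, D, D
Grade 6: D, D, D, D, D
Grade 7: D, D
Grade 8: N* S, N* S, N* S, N* D, D, D
Grade 9: D, D, D, D
Grade 10: D, D, D, D, D
Grade 11: N* S, N* S, N* S, N* D, D, D
Grade 12: D, D, D, D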

 

D = District and School Assessments

S = State Assessment

N = National Assessment of Educational Progress

Note: It is assumed that classroom assessment is ongoing in every grade in which the subject is taught.

*The State-level National Assessment of Educational Progress (NAEP) is administered every two years, but each subject is administered only once every four years.

Reporting System

Enrolled Act 2 specified that the state Department of Education "...shall develop a uniform reporting system on the statewide student assessment." Further, the Legislature stated that the reporting system should clearly indicate the level of student achievement and other performance measures. The public and ultimately political support for this assessment system will depend in large part on the quality of the reporting mechanism. The challenge in developing a reporting system for the assessment data will be to include enough necessary information for sophisticated users to find the data they need while still trying to make sure that members of the general public have accurate, yet easy to understand information. The design team intends to have both the norm-referenced and standards-based test results published in a single report to emphasize that both are part of a comprehensive system. The design team is especially concerned that the norm-referenced test might be given too much weight if the results were published separately. We discuss below how the data from each type of assessment would be presented in a single report.

 

Standards-based results. The results of the standards-based assessments will be presented in several ways. The amount of detail will depend on whether the report is for schools or for students. Undoubtedly the most important information reported for both schools and students will be the proficiency levels, which will be matched to Wyoming Performance Standards. Accurate descriptions of proficiency levels will be developed after the assessments are first administered. The percentage of students within each performance category--e.g., Advanced, Proficient, and Partially Proficient--at each school will be reported each year. The percentage of students within each proficiency level will be reported only for the full subject area tested and will not be reported for individual standards.
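
As an illustration of this kind of report, the Python sketch below classifies scale scores into performance categories and computes the percentage of students in each. The cut scores and the sample scores are hypothetical placeholders; actual cut scores would come from the standard-setting process after the first administration.

```python
from bisect import bisect_right
from collections import Counter

LEVELS = ["Partially Proficient", "Proficient", "Advanced"]
CUT_SCORES = [500, 600]  # hypothetical: 500 and above is Proficient, 600 and above is Advanced


def classify(scale_score):
    """Map a scale score to a performance category using the cut scores."""
    return LEVELS[bisect_right(CUT_SCORES, scale_score)]


def percent_by_level(scale_scores):
    """Percentage of students in each performance category for one school and subject."""
    counts = Counter(classify(score) for score in scale_scores)
    total = len(scale_scores)
    return {level: round(100 * counts[level] / total, 1) for level in LEVELS}


# Hypothetical scale scores for ten students at one school.
school_scores = [430, 480, 500, 520, 555, 590, 600, 610, 470, 540]
print(percent_by_level(school_scores))
```

For the sample above, three students fall below the first cut score, five fall between the two cut scores, and two reach the second, so the school report would show 30%, 50%, and 20% in the three categories.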

 

Scale scores also will be reported for each content area [8]. Scale scores tend to convey more accurate information than simple raw scores (i.e., number correct) because test questions can be weighted according to the amount of information they provide. Scale scores also allow more accurate comparisons of student or school performance across tests in different subject areas. For example, 70% correct on a mathematics test might be a "better" score than 70% correct on a reading test if the math test was considerably more difficult for the group of students tested. A scale score will adjust for differences in difficulty so that a scale score of 500, for example, would mean almost the same thing on both tests. At the school level, average scale scores will be reported for each content area and by each standard. Individuals will receive summary scale scores for each subject area. The scale score information, reported by standard, combined with the proficiency level information will allow the school to determine areas of strengths and weaknesses. Individuals will receive reports of their performance category for each content area.
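
The effect of scaling can be shown with a deliberately simplified Python sketch. It uses a plain linear standardization against the tested group, which is a stand-in chosen for illustration and not the scaling method the testing contractor would use (operational scale scores typically weight individual questions). The scale center of 500, the spread of 100, and the sample percentages are all hypothetical.

```python
from statistics import mean, pstdev


def linear_scale_score(percent_correct, group_percents, center=500.0, spread=100.0):
    """Place a percent-correct score on a common scale relative to the tested group.

    center and spread are hypothetical scale parameters chosen for illustration.
    """
    group_mean = mean(group_percents)
    group_sd = pstdev(group_percents)
    return center + spread * (percent_correct - group_mean) / group_sd


# Hypothetical results: the math test was harder for this group than the reading test.
math_percents = [40, 50, 60, 70]
reading_percents = [60, 70, 80, 90]

math_scale = linear_scale_score(70, math_percents)
reading_scale = linear_scale_score(70, reading_percents)

print(f"70% correct in math    -> scale score {math_scale:.0f}")
print(f"70% correct in reading -> scale score {reading_scale:.0f}")
```

In this example the same 70% correct lands above 500 on the harder mathematics test and below 500 on the easier reading test, which is the adjustment for difficulty that the paragraph above describes.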

 

Norm-referenced test results. Reporting these results will be similar to the standards-based reports, although norm-referenced test results will not include proficiency level reports. Scale scores will be reported for each subject area and for finer distinctions within each content area (e.g., computation within mathematics). The average national percentile ranking for these two levels of content also will be reported. For example, a school might receive a report that, on average, its students are at the 60th national percentile for mathematics, overall, and at the 55th percentile for computation. Individual students would receive information similar to the school reports.

 

Disaggregation of all scores. Both standards-based and norm-referenced assessment results need to be disaggregated by the following categories: (a) race/ethnicity, (b) gender, (c) free/reduced lunch, (d) Limited English Proficiency status, (e) migrant status, and (f) type of disability and/or accommodation. In order to protect the privacy rights of individual students, disaggregated results will not be reported if fewer than ten (10) students fall into a particular category. Disaggregated results are required for state-level reporting to fulfill both state and federal requirements. The design team believes that districts and schools will find this information useful in the interpretation of their results.
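
The suppression rule described above (no disaggregated results for groups of fewer than ten students) might be applied as in the following sketch. The record layout, field names, and scores are hypothetical; only the threshold of ten comes from this report.

```python
from collections import defaultdict

MIN_GROUP_SIZE = 10  # from the report: no results for fewer than ten students


def disaggregate(records, category):
    """Average scale score per subgroup; None marks a suppressed group.

    `records` is a list of dicts with a "scale_score" key plus one key per
    reporting category (a hypothetical layout chosen for illustration).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[category]].append(rec["scale_score"])

    report = {}
    for name, scores in groups.items():
        if len(scores) < MIN_GROUP_SIZE:
            report[name] = None  # too few students: protect privacy
        else:
            report[name] = sum(scores) / len(scores)
    return report


# Hypothetical school: 12 students in one group, 9 in another.
students = (
    [{"gender": "F", "scale_score": 510} for _ in range(12)]
    + [{"gender": "M", "scale_score": 490} for _ in range(9)]
)
gender_report = disaggregate(students, "gender")
```

The same function can be called once per reporting category (race/ethnicity, free/reduced lunch, and so on) by changing the `category` argument.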

 

Electronic reporting systems. A single written report will be designed for broad distribution, but this report is unlikely to contain enough information to meet the needs of all districts. Educators in each district may want access to very different types of information as they work to improve teaching and learning. The design team suggests that an electronic reporting system be developed with a "user-friendly" interface to allow district personnel to conduct additional analyses for their sites. This could take the form of a common spreadsheet or statistical program such as Microsoft Excel, or an appropriate World Wide Web interface could be designed to meet these needs.

 

Releasing scores to the news media. Enrolled Act 2 specifies that the assessment results be presented in a form suitable for publication in a statewide newspaper. There is little doubt that the news media will be extremely interested in reporting the assessment results. How they will be reported is another question. An easily understood reporting system needs to be designed so the press can accurately report key information without contributing to misconceptions about the assessment results. The Statewide Assessment Design Team encourages the Department of Education to develop a set of press releases to accompany the publication of each year’s assessment results. These press releases should contain enough information so that the public can better understand the scores, but be brief enough that the news media are likely to use them in their entirety. The design team supports having the media publish assessment results, but we want these results published as accurately as possible.

The Stakes Associated with the Assessment System

Whether explicitly used to drive instruction or simply to monitor learning, tests often have an impact on the attitudes and performance of school personnel and on the structure of the curriculum. Consideration of potential consequences is crucial in situations where testing programs are designed to influence the instructional decisions made by teachers and administrators. The type of testing program where policy makers intend for tests to influence or drive instruction is referred to as measurement-driven instruction9.

 

Many expect that high stakes, in and of themselves, can drive instruction, but research indicates that it is the interaction of stakes, standards, and content that leads to instructional changes. The greatest impact on instruction will occur when stakes and standards are both high10. Supporters of standards-based education argue that whether we like it or not, what is taught and what is tested are intimately related. Because every test used for accountability can affect the curriculum, the following principles can serve as guidelines for accountability assessments: "(1) you get what you assess... (2) you do not get what you do not assess... [and] (3) build assessments toward which you want educators to teach."11 Proponents of standards-based education report that since most accountability testing bears little resemblance to school curricula, the alignment between standards and assessment in standards-based systems is a considerable improvement12.

 

The Statewide Assessment Design Team was concerned about the potential negative consequences of extremely high stakes. This was one particular component of Kentucky’s system that we did not want to mimic. The design team recognizes that the consequences associated with public reporting will vary by district and may result in a variety of community responses. The comprehensive nature of the assessment at the school level will allow profiles of strengths and weaknesses to be constructed and will permit monitoring of achievement trends over time. This allows local educators to make efficient choices about expending district resources, particularly professional development resources. The assessment results and subsequent school improvement decisions are crucial aspects of the school accreditation review and will be evaluated through this process. Further, the design team is committed to recognizing excellence and supports honoring exemplary schools and districts for high and/or improved performance. These schools could be studied to learn about and share "best practices" in Wyoming education.

 

Statewide tests are occasionally used for high-stakes individual accountability decisions. For example, several states require students to score at a certain level on a state test in order to graduate from high school. The use of single tests for these purposes has been widely criticized, in large part because the measurement error for individual student scores is often large enough to introduce considerable doubt into the interpretation of results. If test results are used to make high-stakes decisions about individual students, they should be extremely precise. This generally requires a large sample of student behavior, which could involve many different assessments over a given time frame; if only one test is used, it should be very long and detailed. The Wyoming Comprehensive Assessment System was not designed with these purposes in mind. This is primarily a school improvement model. Therefore, the focus of the assessment results should be at the school level and not the student level. Individual results are primarily for parent information and for teachers making instructional decisions, and NOT for individual student accountability.
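
The link between test length and the precision of individual scores can be sketched with two standard formulas from classical test theory: the standard error of measurement and the Spearman-Brown formula. Neither formula is taken from this report, and the reliability, scale standard deviation, and lengthening factor below are hypothetical values chosen for illustration. The sketch also assumes scores are reported on a fixed scale, so the scale standard deviation does not change when the test is lengthened.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return score_sd * math.sqrt(1.0 - reliability)


def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`
    using comparable items (Spearman-Brown prophecy formula)."""
    return (length_factor * reliability) / (
        1.0 + (length_factor - 1.0) * reliability
    )


# Hypothetical values: scale SD of 100 and a short test with reliability 0.75.
scale_sd = 100.0
short_reliability = 0.75
sem_short = standard_error_of_measurement(scale_sd, short_reliability)

# Tripling the test length raises reliability and shrinks the error band.
long_reliability = spearman_brown(short_reliability, 3)
sem_long = standard_error_of_measurement(scale_sd, long_reliability)
```

With these assumed numbers the error band around an individual score narrows noticeably as the test grows, while a school average based on many students is already far more stable than any single student's score.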

Implications

Public education is a system. Like most other systems, if one part of the system is manipulated, other components of the system will be affected. In fact, the design team intends for these assessment results to lead to changes in teaching and learning. While this might occur, we cannot guarantee that these will be positive changes. We know, however, that simply instituting tests in the hopes that schools will improve without dealing with other parts of the educational system is just wishful thinking. In other words, there are several important implications of instituting a state assessment system with the primary goal of school improvement.

 

Professional development. A growing body of research indicates that teaching complex knowledge and skills is different in many ways from teaching lower-level rote behaviors. Higher-level skills take longer to teach and develop gradually over time rather than by quick association. Many teachers have not had the opportunity to learn how to teach to standards, especially those emphasizing complex knowledge and skills. Research suggests that shifting to new ways of teaching and assessing in reform-based classrooms will, in many cases, require major changes in teachers' knowledge and practices.

 

Asking teachers to make these major changes in how they teach leads to questions about the adequacy of typical in-service13 teacher education. Traditional, one-shot "dog and pony shows" were never very effective, and now that we are asking teachers to shift their practice in radically different ways, the inadequacy of these old models is that much more apparent. Research is clear that professional development must consist of sustained programs that are situated in the experiences of teachers' classrooms. Teachers, as respected professionals, must have a voice in designing these in-service activities. Similar to our expectations of students, teachers should be allowed to develop their own understandings and not simply parrot back new teaching methods. As might be expected, these types of in-service educational programs are not cheap. For example, Kentucky increased its state funding of professional development to $23 per student per year, up from $1 per student per year prior to Kentucky's educational reform. While we are not suggesting that the Wyoming Legislature appropriate a similar increase, the design team strongly supports an increase in state funding for professional development. We encourage the Legislature to include enough funding in the education appropriation to support at least five additional days of professional development during the summer or at other times when substitute teachers would not be needed. Developing curriculum units aligned to standards and learning how to develop appropriate classroom assessments are just two of the many activities that might be targeted during these staff development days. A block of five days would provide a good start, but the research indicates that a great deal of follow-up activity during the school year would be critical to a successful program.
One particularly effective model relies on up-front time in which teachers develop new curriculum and assessments, try these out in their classrooms, and then return for another professional development session to debrief these experiences14. In order to fund staff development beyond these five days, the design team encourages school districts to use a portion of their MAP funding to support staff development related to standards and assessment.

 

Many educational reform efforts fail in the first few years for a variety of reasons. They can come under political attack from groups who felt excluded from the reform process. They can also fail if teachers and other educators feel overwhelmed by the amount of work expected of them in a relatively short time. Therefore, the design team offers this word of caution. These staff development programs need to be sustained for several years so that educators do not have to try to do everything at once. Starting slowly and making continued progress will lead to more sustained efforts than trying to completely revamp the entire system in the first few years. Educational reform is a long-term endeavor.

 

The professional development ideas discussed above are focused on aligning curriculum and instruction with the standards-based assessment system. There is another component of staff development, directly related to the statewide assessment system, that needs to be incorporated into a systematic professional development model. The Department of Education should conduct outreach programs to ensure that district personnel have the knowledge to meaningfully interpret and use the results of the Wyoming Comprehensive Assessment System. This can occur through direct instruction, but would be more appropriate if Department personnel could work closely with local educators to interpret district and school assessment results.

 

Additionally, training teachers to score the open-ended questions on the state assessments and involving them in item-review committees are among the most effective means of developing teachers' knowledge of performance assessment in general and of the statewide assessment system in particular. These last two components of professional development will be built into the RFP with the intent of having the costs shared by the testing contractor and the state.

 

Remediation efforts for low-performing schools. It would be nice to think that by implementing this standards-based educational system, we could expect all schools to improve and eventually reach high levels of academic achievement. Reality checks from other states, however, indicate that we cannot expect such a rosy picture. Unfortunately, a proportion of schools will not improve as much as we might hope. In order to ensure that students in these schools receive adequate opportunity to learn the content standards, the design team suggests developing strategies to intervene by such means as providing additional professional development or by assigning a "distinguished educator" to help local educators develop appropriate instructional strategies. The Statewide Assessment Design Team believes, however, that developing such approaches is the responsibility of the State Board of Education and the Department of Education, and the design team recommends that they begin to address these issues in the near future.

 

The Request For Proposal (RFP) Process

In order to implement the proposed assessment, the Legislature needs to authorize the Department of Education to issue a Request for Proposal (RFP) to hire a testing contractor. The design team intends for this report to serve as the framework for the RFP, although we recognize that, by necessity, the RFP will contain considerably more technical detail than this document.

 

Writing the RFP, evaluating bidders, and selecting the contractor will be the responsibility of the Wyoming Department of Education. However, the design team believes that its members with the potential addition of a few more local educators can provide valuable public input into this process. We suggest that the statutory authority of the Statewide Assessment Design Team be continued so that members could help fulfill this role.

 

In addition to the basic specifications of the assessment system described throughout this report, the assessment design team wants the following points included in the RFP:

1. In order to assess ALL students, the RFP should require the contractor to include the development of alternate assessments and versions of the tests that best meet the needs of Limited English Proficient Students.

2. The RFP should require the assessment contractor to include a norm-referenced test as part of the assessment system and the bidders should have to provide evidence of the match between their norm-referenced test and Wyoming Content Standards.

3. The successful bidder should be required to provide evidence of a plan for maintaining the security of the assessment system.

4. The costs of ongoing system maintenance such as the development of new test questions, reporting, and equating scores for year-to-year comparisons should be included in the assessment contract.

 

Finally, the importance of consistency in the assessment systems, especially during the first few years, leads the Statewide Assessment Design Team to recommend that the Department of Education enter into a five-year agreement with an assessment contractor, contingent on continued state funding. The design team suggests that this authorization be granted as soon as possible so that a testing contractor can be hired in time to conduct initial testing during the spring of 1999.

 

Evaluation of the Wyoming Comprehensive Assessment System

This committee relied on the experiences, both positive and negative, of other state assessment programs in the development of the Wyoming system. In spite of these efforts, we will not know how well this system works until it has been operational for several years. The design team recommends that a systematic evaluation of the Wyoming Comprehensive Assessment System be commissioned in the legislation authorizing the RFP. This committee suggests that the evaluation be overseen by the Department of Education but conducted by independent professionals. Further, we recommend that the system be open to researchers, subject to Department of Education approval, for conducting studies of the assessment system. We briefly describe some of the issues to consider in the evaluation of this system.

 

An evaluation of the system would, in essence, be a validity investigation. One line of research and evaluation could focus solely on the technical quality of these assessments. At its simplest, this would entail checking the match between the Wyoming content standards and the questions on the assessments. Because validity refers to the interpretations we make from test results, the procedures that are used to set performance standards and establish "cut scores" between performance categories would be part of a validity investigation. Another related line of research could examine the accuracy of these judgments. Every score contains a certain amount of error and when making high stakes decisions, this error should be minimized as much as possible. Generalizability analyses that examine the consistency of scores (as well as trying to figure out the source(s) of the inconsistency) are important in the evaluation of technical accuracy.

 

A validity investigation would also focus on the consequences of the assessment system. There are many consequences that are intended (e.g., improving academic achievement, equalizing opportunity-to-learn), and the system should be evaluated to see whether or not these intentions were fulfilled. There are often unintended consequences that result when a state assessment is added to the educational system. The investigation should be geared toward searching for and evaluating these unintended consequences of the assessment system. One of the most common unintended consequences of an assessment program relates to the misuse of results. While the local district has the authority to use the results in a variety of ways, the design team has outlined some suggested uses, such as school improvement. Other uses fall outside that intent. For example, basing promotion decisions about individual students and/or teachers on these assessment results would undoubtedly create consequences unintended by the Statewide Assessment Design Team.

 

Conclusions

Wyoming citizens and policy makers need to realize that a political and financial commitment to a standards-based educational system is necessary to positively influence the academic achievement of Wyoming students. Many of the states that surpassed Wyoming on the National Assessment of Educational Progress have implemented successful standards-based educational programs. Their success is related to their financial commitment without which activities such as teacher professional development programs--perhaps the most important piece of the reform--could not be adequately supported. Related to financial commitment, political support from legislators and other key state-level policy makers is crucial to the long-term success of any educational improvement activity. For example, two major standards-based reform efforts in Kentucky and Vermont have come under intense scrutiny from both outside and within the respective states, but their policy makers continued to publicly support these programs. Consequently, these are two of the most successful state-level educational programs in place today. The Statewide Assessment Design Team urges the Legislature, the Governor, the Superintendent of Public Instruction, and other influential public figures to fully support the Wyoming Comprehensive Assessment System.

 


Footnotes
1 Matrix-sampling is a method to reduce testing time for each individual student while providing detailed school-level information. A set of tasks is divided into relatively equal subsets and each student completes only one of these subsets. Common tasks are ones that all students complete and are useful for computing individual student scores.


2 While the Statewide Assessment Design Team is not charged with writing the RFP -- that is clearly the role of the Department of Education -- we will recommend (see recommendations) having this design team provide input into the drafting of the RFP.

3 Reese, C. M., Miller, K. E., Mazzeo, J., & Dossey, J. A. (1997). NAEP 1996 Mathematics Report Card for the Nation and the States. Washington, DC: National Center for Education Statistics.

4 Reese et al. (1997).

5 National Council of Teachers of Mathematics (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.

6 National Research Council (1996). National Science Education Standards. Washington, DC: National Academy of Sciences.

7 Shepard, L. A., Glaser, R., Linn, R. L., & Bohrnstedt, G. (1993). Setting performance standards for student achievement. A report of the National Academy of Education Panel on the Evaluation of the NAEP Trial State Assessment: An evaluation of the 1992 achievement levels. Stanford, CA: National Academy of Education.

8 Scale scores are computed by a simple transformation of the raw scores by using one of several types of scaling methodology. Most testing companies use some type of item response theory scaling methods.

9 Popham, W. J. (1987). The merits of measurement-driven instruction. Phi Delta Kappan, 68, 679-682.

10 Airasian, P. W. (1988). Measurement driven instruction: A closer look. Educational Measurement: Issues and Practice, 7(4), 6-11.

11 Resnick, L. B. & Resnick, D. P. (1992). Assessing the thinking curriculum: New tools for educational reform. In B. R. Gifford & M. C. O'Connor (Eds.), Changing Assessments: Alternative Views of Aptitude, Achievement and Instruction. Boston: Kluwer Academic Publishers. (p. 59).

12 Linn, R. L. (1993). Educational assessment: Expanded expectations and challenges. Educational Evaluation and Policy Analysis, 15(1), 1-16.

13 In-service education is also known as staff development or professional development.

14 Borko, H., Mayfield, V., Marion, S. F., Flexer, R., & Cumbo, K. (1997). Teachers' developing ideas and practices about mathematics performance assessment: Successes, stumbling blocks, and implications for professional development. Teaching and Teacher Education, 13, 259-278.

Appendix A: Glossary of Terms.

 

Alternate Assessment: An assessment approach designed for students with fairly severe learning disabilities so that they can participate in the state mandated assessment program.

 

Alternative Assessment: Also referred to as authentic assessment, these measures often require direct examination of student performance using "real-world" tasks requiring complex thinking processes.

 

Articulation of Standards: Systematically interrelating and emphasizing essential parts of content and performance standards to ensure continuous advancement in learning.

 

Assessment: The process of collecting, analyzing, interpreting, and reporting information to aid classroom decision-making. Synonymous with test, but is generally considered a broader term and is often used when discussing performances or portfolios.

 

Basic Skills: The ability to read, write, and compute in a variety of content areas.

 

Benchmark: A concrete statement of skills and knowledge to be demonstrated at a specific performance level. Benchmarks are located on a performance continuum and are used as checkpoints to monitor progress from one level to the next.

 

Common Core of Knowledge: Areas of knowledge (specified as Language Arts, Social Studies, Math, Science, Fine and Performing Arts, P.E., Health and Safety, Humanities, Career Options, Foreign Culture including language, and Applied Technology) each student is expected to acquire at levels established by the district.

 

Common Core of Skills: Skills each student is expected to demonstrate (specified as problem solving, interpersonal communications, keyboarding and computer applications, critical thinking, creativity and life skills) at levels established by the district.

 

Criterion Referenced Test: Assessment that involves measurements that lead to information regarding whether a given student has mastered specific standards required, not how the student compares to other students. The test may consist of multiple choice, written performance and/or authentic tasks.

 

Equity: Equal opportunity for all students, who represent the complete diversity of the student population.

 

Generalizability: The extent to which a student’s knowledge or skill in a well-defined content domain can be inferred from performance on a sample of assessment tasks from that domain. Generalizability is an extension of reliability theory and relates to the consistency of measurement.

 

Norm Referenced Test: Assessments using a large representative sample of the general population to determine individual and group standing of those who took the test. This assessment compares the student's performance to the national group of students tested to establish norms.

 

Performance Standards: A description of student performance against a specific content standard. The standard describes measurable behaviors of student performance in levels such as advanced, proficient, or partially proficient.

 

Portfolio: A portfolio is a purposeful collection of student work that exhibits the student’s efforts, progress, and achievement in one or more areas. The collection must include student participation in selecting content, the criteria for selection, the criteria for judgment, and evidence of self-reflection.

 

Reliability: The consistency or stability of assessment results--across time (test-retest), within a test (internal consistency) or other assessment procedure, or across different forms of an assessment. A different type of reliability, known as inter-rater reliability, is critical for performance-based assessment. It is an estimate of the consistency of the scores assigned by two or more raters. High inter-rater reliability indicates that the raters used the same criteria to evaluate a performance and that they understood and applied the criteria similarly. (O.D.S.)

 

Rubric: A scoring guide for open-ended questions which contains a description of the requirements for varying degrees of success in responding to the question. These criteria may be pre-defined or created as a result of reviewing a number of papers.

 

Staff Development: A process involving evaluation, identification of needs, and planned activities for individuals, school and the entire district designed to improve those elements of professional knowledge and skills that affect student learning. Also known as professional development and in-service education.

 

Validity: The extent to which the interpretation of the results of an assessment measure relates to what the assessment is designed to measure. Validity should always be evaluated in the context of a particular use. Current validity theory suggests that all evidence about the validity of a test interpretation should be related to providing information about the construct of interest. A construct is a set of theoretical relationships that describe a particular type of knowledge or skill such as mathematics achievement.

 

