Standardized Tests Don’t Always Make the Grade

Times Staff Writer

High school seniors and their parents, along with college counselors and admissions officers, have been roiled by recent disclosures that at least 4,600 SAT exams taken in October were graded incorrectly.

But education testing experts point out that similar mistakes have happened before -- and inevitably will happen again.

“Tests are a fallible product,” said Kathleen Rhoades, a research associate with the Center for the Study of Testing, Evaluation and Educational Policy at Boston College. “The way we treat tests in this country, it’s as if the tests deliver these strong, solid, positive, irrefutable results. And that is not the case.”

By Rhoades’ count, there were 137 publicly disclosed cases of large-scale testing errors by educational testing companies from 1976 through early 2004, with most of them occurring since 1997.

“People are under the false illusion that because the tests are graded by a machine that the process is objective,” said Robert Schaeffer, a spokesman for the National Center for Fair and Open Testing, or FairTest. “But everything, including programming the machine and doing quality control on it, is, in fact, done by human beings. And all humans make mistakes.”

Rhoades and Schaeffer say a lack of regulation, cost pressures and tight deadlines aggravate the problems.

The consequences of grading mistakes and other mishaps have intensified over the last decade with the increasing reliance on so-called high-stakes testing. In recent years, President Bush’s No Child Left Behind Act has triggered expanded testing by many states.

At the college level, standardized tests play an important role in deciding who is admitted, who wins scholarships and who gets into honors programs.

In K-12 education, they help determine school rankings, teacher licensing and pay, and whether students graduate from high school.

The SAT and other tests are usually graded accurately, but when problems occur, “it gets people crazy because it’s high stakes and it makes important decisions,” said Eva L. Baker, co-director of the National Center for Research on Evaluation, Standards and Student Testing at UCLA.

The potential for problems was driven home last week with the disclosure that the Educational Testing Service, one of the nation’s biggest testing organizations, had agreed to pay $11.1 million to settle a class-action lawsuit.

The case was brought on behalf of plaintiffs who received incorrect scores on teacher-licensing tests known as the Praxis exams.

In all, 4,100 test takers were wrongly told they failed.

The tests are used in many states but not in California.

California has suffered other testing problems, however. They include a foul-up in which a company now known as Harcourt Assessment Inc. miscalculated results for 19,000 students and 22 schools on the Stanford 9 achievement test given six years ago.

The SAT grading problem, which came to light this month, involves incorrect scoring on at least 4,600 college entrance exams, less than 1% of the total. About 400 students received lower scores than they deserved.

The College Board, the owner of the exam, later disclosed that an additional 1,600 SAT exams from October, some of which may have been incorrectly graded, had been flagged for scrutiny because of concerns about possible cheating, among other reasons.

Many admissions experts said the effect on this year’s high school seniors would be muted. Although some test scores were off by nearly 400 points -- out of a possible 2,400 -- 83% of the errors were in the range of 10 to 40 points.

College Board officials have repeatedly expressed regret over the problem, which sent many colleges scrambling in the final days of the college admissions season to reevaluate applications from affected students.

But the officials said that with a test taken by 2.3 million students a year, including 495,000 in October, it was difficult to eliminate all errors.

“It is a very high-volume operation, and you are dealing with millions of tests a year, and you’re dealing with many questions on each test,” said Laurence Bunin, the College Board’s senior vice president for operations. “In any high-volume operation like this, there’s statistically the chance for some rare error.”

In this case, the problems were blamed on the scanning of the test’s multiple-choice sections, which was performed by a College Board contractor, Pearson Educational Measurement, one of the nation’s biggest educational testing and assessment businesses.

Pearson officials said the problems stemmed mainly from humid weather that caused answer sheets to expand.

Also, some answers were penciled in too lightly or too incompletely to be read by scanners, they said.

Some testing experts said they were perplexed by that explanation and by the fact that two students, not Pearson, discovered the problems.

Craig Hoyle, a research consultant at Boston College’s education school, said there were ways to prevent problems, such as using equipment to screen out humidity-tainted answer sheets. In addition, he said, “dummy” quality-check tests can be interspersed with the tests to detect scanning problems.

Pearson spokesman David Hakensen said his company had used quality-check tests, but not until the exams arrived at its processing center in Texas.

He said they were not mixed with exams at test sites -- where they might pick up the same humidity that affected the actual tests -- because any stray marks made on the quality-check tests would undermine their effectiveness.

Critics attribute scoring problems partly to cost-cutting by testing companies.

Pearson officials dispute that point.

The company has experienced problems in Minnesota and other states. In 2002, a Minnesota judge found that a company Pearson had acquired “continually short-staffed” a testing program.

But experts point out that no complex system, be it an airline or a testing program, is foolproof.

“Funny stuff happens,” said Robert Boruch, a professor of education and statistics at the University of Pennsylvania. “The main objective is to reduce the likelihood of it happening. You can’t reduce it to zero.”
