FLUNKING THE SATs: No faulty comparison left behind!


Part 4—Rising scores on the SAT and the ACT:
Are seniors in American high schools doing less well in reading and math?

In theory, that’s an important question. In theory, we the people want to know the answer. In theory, our journalists and “educational experts” are trying to help us get answers to questions like that.

In practice, things are quite different. In reality, no one gives a flying fig about questions like that, except to the extent that such questions can be used to advance prevailing narratives and agendas.

On the front page of last Thursday’s Washington Post, we saw the con happen here:
ANDERSON (9/3/15): The steady decline in SAT scores and generally stagnant results from high schools on federal tests and other measures reflect a troubling shortcoming of education-reform efforts. The test results show that gains in reading and math in elementary grades haven’t led to broad improvement in high schools, experts say. That means several hundred thousand teenagers, especially those who grew up poor, are leaving school every year unready for college.

“Why is education reform hitting a wall in high school?” asked Michael J. Petrilli, president of the Thomas B. Fordham Institute, a think tank. “You see this in all kinds of evidence. Kids don’t make a whole lot of gains once they’re in high school. It certainly should raise an alarm.”

It is difficult to pinpoint a reason for the decline in SAT scores, but educators cite a host of enduring challenges in the quest to lift high school achievement. Among them are poverty, language barriers, low levels of parental education and social ills that plague many urban neighborhoods.
Is it really difficult to “pinpoint a reason for the decline in SAT scores”? On the one hand, yes it is. In fact, it’s impossible to “pinpoint” some exact reason for the decline in (average) scores.

On the other hand, it isn’t even slightly hard to identify certain factors which have probably helped to lower average national scores. Doing so is the easiest task in the world. Everyone knows how to do it.

On the front page of last Thursday’s Post, Anderson skipped right past those obvious factors. Instead, he played a familiar game, misusing SAT scores as he did:

First, he assumed that the decline in average SAT scores means that high school seniors are doing less well in reading and math. From there, he proceeded to the only factor that entered his head as he tried to explain this decline. He assumed the decline must be related to efforts at “reform.”

Our view? In that passage, we’re looking at journalistic malpractice, of a familiar type.

In fact, the SAT is not designed to facilitate the type of year-to-year comparisons in which Anderson was engaging. The College Board makes no attempt to test representative samples of high school seniors. The tests are given to all who apply—and the demographics of the population taking the tests are changing every year.

Duh! These demographic changes affect average national scores in ways which everyone understands. Those changing demographics create a confounding statistical mess for those who want to use SAT scores in this ham-handed way.

How do demographic changes in the tested population create a statistical mess? Let us count the ways:

More kids taking the test: As the SAT notes in its basic materials, more students are taking the tests every year. Given the history of SAT testing, this tends to lower average national scores, although that won’t always be the case.

A changing racial and ethnic blend: Every year, more black and Hispanic students take the SATs. Given our nation’s brutal racial history—given some ongoing policy practices—this tends to lower average scores.

Everybody knows that, but there’s more! In some large urban districts (see below), all students are now required to take the SATs, whether they plan to go to college or not. This may be good educational practice, but it tends to lower national average scores.

The rise of the ACT: Over the course of the years in question, the ACT has supplanted the SAT as the nation’s predominant college admission test. Many students take both tests, but many students have switched from the SAT to the ACT.

This test-switching adds to the statistical chaos when we look at national average scores on these tests. By the way, average scores have not declined on the ACT, even as it has surged past the SAT as the nation’s predominant program.

Taking the SATs as a junior: A substantial number of students take the SATs as juniors. For whatever reason, the percentage has risen in recent years, to the current 35%. Presumably, this changing percentage adds to the difficulty in making valid year-to-year comparisons.

At this point, might we make the world’s most obvious statement? The SATs were not designed for year-to-year comparisons. In fact, the SATs weren’t designed to measure populations at all. The SATs were designed to measure individuals.

Full stop.

The SATs were not designed to measure populations! The College Board makes no attempt to test a representative sample of any population, whether it’s the entire national high school population or the entire national high school population of some particular group.

No representative samples are involved in this process at all! To understand how the changing demographics of the tested students can create a giant statistical mess, consider a recent news report in the Dallas Morning News.

Uh-oh! Average SAT scores have declined quite a bit in Texas in recent years. Last week, reporter Terrence Stutz asked a state official to explain why that is.

The state official in question may have been overstating. But she offered some obvious observations—the types of observations which were barred from Anderson’s front-page report:
STUTZ (9/3/15): State education officials have attributed the declining SAT scores in Texas to an increase in the number of minority students taking the exam. Minorities generally perform worse than white students on standardized achievement tests like the SAT and ACT, the nation's two leading college entrance exams.


Debbie Ratcliffe, a spokeswoman for the Texas Education Agency, said the lower SAT scores this year are at least partly the result of testing policies in two dozen school districts—including Dallas and Fort Worth—where all upperclassmen now take the SAT each year.

“The SAT takers in those districts include not only those who are college-bound, but the whole student population [of juniors and seniors],” Ratcliffe said. “That translates in lower average scores because the more test takers you have, the more scores will decline.”
Why have average statewide scores in Texas declined? We can’t “pinpoint” that! But Ratcliffe is citing one obvious possible explanation—a type of obvious explanation Anderson disappeared from the front page of the Post.

Duh! According to Ratcliffe, two dozen districts are now requiring all their students to take the SATs, not just those students who are college-bound as judged by traditional measures.

This may be good educational policy; we have no view about that. But in large urban districts like Dallas, this will mean that many low-achieving students will now be taking the tests. Previously, they would not have.

This will lower average scores; this fact is blindingly obvious. But the decline in average scores won’t tell us anything about the Texas student population as a whole, or about the role of “education reform” in their schools.

More precisely, this decline in average scores won’t “reflect a troubling shortcoming of education-reform efforts.” It will simply reflect the fact that a larger swath of students were tested. It will tell us nothing else.
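The arithmetic behind that point can be sketched in a few lines of Python. The figures below are hypothetical, invented purely for illustration; they are not taken from Dallas, from Texas, or from any real district. The sketch assumes a district where 400 college-bound students used to take the test, and where 600 additional students are now tested as well. No individual student's score changes.

```python
def mean_score(groups):
    """Weighted mean over (number_of_test_takers, average_score) pairs."""
    total_takers = sum(n for n, _ in groups)
    return sum(n * avg for n, avg in groups) / total_takers

# Hypothetical figures, not drawn from any real district.
college_bound = (400, 520.0)  # students who would have tested anyway
newly_tested = (600, 420.0)   # students tested only under the new policy

before = mean_score([college_bound])
after = mean_score([college_bound, newly_tested])

print(f"Average before universal testing: {before:.1f}")
print(f"Average after universal testing:  {after:.1f}")
```

Under these assumed numbers the district average falls by 60 points, even though every student scores exactly what he or she would have scored before. The drop reflects who was tested, not what was learned.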

To what extent was this year’s decline in nationwide average SAT scores caused by changes in the population tested? It’s very hard to answer that question. The SATs were not designed to let us “pinpoint” an answer.

That said, it’s journalistic malpractice when the Washington Post does what it did last week. When it offers a lengthy, front-page news report and fails to mention this obvious factor. When it moves instead to its favorite corporate theme:

Something has failed in the schools!

Has something failed in American high schools over the past few years? Because the SATs weren’t designed to answer such questions, there’s no way to tell from looking at average scores. But just to add to the fun, this is what some average scores look like on the SAT and the ACT in the last five years after you “disaggregate”—after you break average scores down by demographic group:
Average reading scores, SAT, 2011/2015
White students: 528/529
Black students: 428/431
Hispanic students: 451/450
Asian-American students: 517/525

Average composite scores, ACT, 2011/2015
White students: 22.4/22.4
Black students: 17.0/17.1
Hispanic students: 18.7/18.9
Asian-American students: 23.6/23.9
All of a sudden, we seem to be looking at small gains in average scores rather than small declines. But these comparisons are bogus too. Offering one example, here’s why:

Back in 2011, the SAT didn’t test a representative sample of the nation’s black students. It didn’t test a representative sample in 2015 either.

If the SAT had tested representative samples in each of those years, those reading scores would suggest that the nation’s black kids had shown some progress in reading. But the SAT didn’t test any such samples—and given all the statistical chaos involving the SAT and the ACT, you simply can’t make a valid comparison between the two groups of kids who did get tested those years.
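A similar sketch shows how group averages can hold steady or rise while the overall average falls. The group averages below are the SAT reading figures listed above. The shares of test takers are hypothetical, made up for illustration only; they are not College Board data, and the four groups listed do not cover every test taker.

```python
# Average SAT reading scores by group, as listed in the post.
scores_2011 = {"White": 528, "Black": 428, "Hispanic": 451, "Asian-American": 517}
scores_2015 = {"White": 529, "Black": 431, "Hispanic": 450, "Asian-American": 525}

# HYPOTHETICAL test takers per 100, invented for illustration only.
mix_2011 = {"White": 60, "Black": 13, "Hispanic": 15, "Asian-American": 12}
mix_2015 = {"White": 52, "Black": 15, "Hispanic": 22, "Asian-American": 11}

def overall_average(scores, mix):
    """Average across all test takers, weighting each group by its count."""
    total = sum(mix.values())
    return sum(scores[group] * mix[group] for group in scores) / total

overall_2011 = overall_average(scores_2011, mix_2011)
overall_2015 = overall_average(scores_2015, mix_2015)

print(f"Overall 2011: {overall_2011:.2f}")
print(f"Overall 2015: {overall_2015:.2f}")
```

With these assumed shares, the overall average drops by several points even though three of the four group averages rose and the fourth fell by a single point. That is only a demonstration that a changing mix can produce such a drop. It does not show that this is what happened, and it does nothing to make either year's sample representative.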

Having said that, let us also say this: Between 2011 and 2015, those composite ACT scores held steady (or slightly rose), even as the number of students taking the ACT increased by almost twenty percent.

By normal reckoning, you might expect average national scores to drop as the tested population increased to that extent. In this case, that didn’t happen. Might that suggest that something good occurred in the nation’s high schools?

That wouldn’t be a valid conclusion! Simply put, the SAT and the ACT aren’t designed for the purpose of making such comparisons.

The SAT isn’t designed to do that! But at newspapers like the Washington Post, invalid comparisons will be made—and narrative will prevail. When it comes to education testing, all roads will lead to this:

Something has gone wrong in the schools! Nothing seems to be working!

Last Thursday morning, the Washington Post committed a form of journalistic malpractice. Copying from Anderson’s paper, Laura Moser quickly made things worse with this gruesome report at Slate.

How did other newspapers do—the New York Times, for example? What did the AP write? What did “educational expert” Petrilli say when he expounded on these matters at greater length? And how in the world did Kevin Drum get dragged into this mess?

Meanwhile, what about the NAEP? Don’t they test representative samples of the nation’s twelfth-graders? What does their testing show or suggest?

We’ll continue our back-to-school series next week. In the meantime, please understand—the mainstream press will almost always do two things when it discusses, or pretends to discuss, educational testing and test scores:

Reporters will reliably bungle their work on a technical basis. Then, in obedience to Hard Pundit Law, they’ll also say that nothing is working in the nation’s schools.

First, they’ll flunk the SATs in their own front-page reporting. After that, they’ll lament the way the nation’s kids don’t know how to read, write and cipher!

Coming next week or even tomorrow: Petrilli and Drum and Slate oh my! Also, what did the New York Times say? What can we learn from the NAEP?

Then, it’s back to our pre-announced back-to-school topics:

Are black kids suspended more often in the South than in other parts of the country? Also, what sort of progress has occurred in the New Orleans schools?

In theory, those are important questions. In practice, it's clear that nobody cares!


  1. Which ongoing policy practices lower test scores?

    1. Is that a theoretical or practical question? Never mind. I just found out nobody cares.

  2. Oh, dear. The analysts must have been tearing up their love notes to Uncle Drum as they writhe in tears in their quarters.

    Uncle Drum seems to have been captured by the jihadists at the Post who have subjected him to Hard Pundit Law.

    Why Do High Schools Erase All the Test Score Gains of the Past 40 Years?

    —By Kevin Drum | Thu Sep. 3, 2015

    "I'm delighted to see an education story (He's talking about Nick Anderson's piece in the Post) that acknowledges the plain evidence of test score gains, even if just in an aside. The simple fact is that through middle school, standardized test scores have risen significantly over both the past decade and the past four decades. Elementary and middle school test scores have not been either stagnant or dropping, but based on the usual reporting of this stuff, I doubt that one person in a hundred is aware of this.

    But I'm also happy to see the flip side of this acknowledged: in general, all these gains wash away in high school. On the "gold standard" NAEP test, math scores have gone up just a few points among 17 year olds and reading scores have been flat. The usual explanation is that education reforms have initially been centered on elementary and middle schools, and scores will go up for older kids once those reforms start to become widespread in high schools.

    Maybe. But that excuse is starting to look old in the tooth. And even if high schools haven't seen a lot of reforms yet, why is it that they seem to have a negative effect on student performance? If math scores were up, say, ten points by the end of middle school and remained ten points up by the end of high school, that would be one thing. High schools wouldn't be adding anything, but they wouldn't be doing any harm either. But that's not the case. Kids come out of middle school better prepared today, but come out of high school no better than they did in 1971. High school is actually erasing gains.

    This is, needless to say, troubling. Poverty, language barriers, low levels of parental education and social ills are problems at all ages, so that explains little. Nor does disaggregating scores by race, since demographic changes have been similar at all age levels. But the plain truth is that the only thing that really matters is how well prepared kids are when they finish high school. All the test score gains in the world mean nothing if they're gone by age 17. This is something we really need to figure out."


    1. Somerby has known about the Drum article for over a week. He commented on it at Mother Jones. More to the point, he posted his mantra about the SAT over and over.

      Unfortunately for Somerby, while Drum quoted from the article by Anderson that included the reference to the SAT, the bulk of Drum's own writing, as evidenced by your comment, had nothing to do with SAT, but focused on results from NAEP.

      So, Bob Somerby:

      Rather than reply to your comment to Drum over at his place, I'll reply to your comment here. In fact, I'll use your own words, and just change them to fit the correct test references.

      "Repeat after me:

      "The NAEP is designed for this purpose. The NAEP is designed for this purpose. The NAEP is designed for this purpose. The NAEP is designed for this purpose."

      The NAEP is taken by a representative sample of students. Every time it is given, a larger percentage of students taking the test come from lower-scoring demographic groups (minority kids, low-income kids). This does not, however, explain the stagnant scores for all groups since 1992, which is why Bob Somerby may ignore this test.

      The voluminous data are publicly available. Somerby's comment is in keeping with his practice of doing exactly what he claims to deplore in today's journalism."

    Forget it Rat, she doesn't know you exist and never will.

[LINK] (But see footnote 4)




    3. Not sure who "she" is CMike, but since the words are Kevin Drum's, not mine, I hope he doesn't mind her inattention.

    4. Drum's article is short, but the first thing he mentions is SAT scores. Then Drum immediately quotes paragraphs from WaPo including the general SAT decline as a shortcoming of reform and reflection of school decline. He goes on to discuss what he acknowledges was an "aside" to the original piece: recent non-SAT score gains. That's fine, but only an ideologue would think it's unfortunate for Somerby.

      One might indeed though think it unfortunate that while Drum decries the absence (long-noted by Somerby) of media coverage of those gains, he nevertheless lends credence of his own to the misunderstanding and misuse of the SAT.

      Does the SAT, as bolded by Drum, to all appearances in approval, really tell us "that gains in reading and math in elementary grades haven’t led to broad improvement in high schools?"

      It may or may not be true that "gains in reading and math in elementary grades haven’t led to broad improvement in high schools."

      But no. No, the SAT does not tell us that.

    5. What Drum bolded in the quote from the Washington Post article by Anderson is a reference to federal tests, the NAEP. Anderson does combine both SAT and federal tests in his references. Drum does not.

      Drum himself does not return to SAT scores.

    6. population. In the case of NAEP, the population of interest is the entire collection of American students in public or private schools at grades 4, 8, or 12 (or in the case of the long-term trend assessments, at ages 9, 13, and 17 years). The small samples of students that NAEP selects for the assessment permit inferences about academic performance to be made for all school students at the three grade or age levels.


      I don't have a link, but I am certain that for 9- and 13-year-olds, the universe of "students in public or private schools" has been nearer to a constant percentage of all children those ages, and a higher percentage of them, than it has been for 17-year-olds. Seventeen-year-olds can drop out of school. In days of yore they did so at a higher rate than they do today. Those who were the least academically successful were the most likely to do so, and in most instances they did so before being tested in Grade 12 by the NAEP project.

      If secondary schools have been able to maintain the average that 17-year-olds score on the NAEP while increasing the percentage of the total 17-year-old population the sample is taken from, then that suggests secondary schools currently are doing a better job at educating students than they were in previous decades.

      (And that would be true regardless of the amount of umbrage "oh, dear" Rat has taken over the analysts' criticism of Rachel Maddow.)

    7. I find your logic odd @12:52. If kids have a growth spurt in early adolescence, so they grow several inches in a year, then reach close to adult height by age 17, does that mean something is wrong because they haven't maintained that several inches each year, and is something wrong because they wind up average height? There is no reason why improvement must be consistent or why those who start out ahead in early childhood must maintain the same advantage nearing adulthood. Some kids speak much earlier than others but they all speak by adulthood. Why shouldn't education be similar to those other developmental changes?

    8. It looks like CMike is beginning to display something of a Rachel Maddow fixation. I'm not sure how somebody quoting Drum at length indicates anything other than a criticism of Bob Somerby. Especially since, as Somerby frequently notes, Maddow doesn't cover testing of students as a topic.

    9. Jihads that target others have either heartfelt motivations or they're conducted by sadists. Quit the "whatever are you talking about?" act Rat, while you're still getting the benefit of the doubt.
