The Post revisits those DC score gains!

MONDAY, SEPTEMBER 23, 2013

Guess who lacks minimal skills: Be careful what you ask for!

In late July, the DC schools announced their largest score gains in five years on the district’s annual “statewide” tests. Emma Brown reported the ballyhooed gains in the Washington Post.

Heroically, we started our review of her report like this: “Everything we know about Emma Brown is good.” But we noted an obvious problem with her report:

Were this year’s system-wide tests “equivalent” to last year’s system-wide tests? That is, were the two sets of tests equally difficult? Unless you know that this is true, it doesn’t matter—it doesn’t mean squat—if passing rates go up!

Especially given other events unfolding around the nation, that was a blindingly obvious question. But in real time, Brown didn’t ask.

But always be careful what you ask for! Yesterday, Brown explored that very question in one of the most incoherent news reports we’ve ever seen in print.

Her report was very long, and it was very high-profile. Its 38 paragraphs burned 1,996 words. It was the featured report on page one of the Post’s Sunday Metro section.

The report was long and high-profile. It was also incoherent. This is the way Brown began. We include the Post’s main headline:
BROWN (9/22/13): Scoring decision aided math gains

The four-point gains D.C. public school students achieved citywide on the most recent annual math and reading tests were acclaimed as historic, as more evidence that the city's approach to improving schools is working.

But the math gains officials reported were the result of a quiet decision to score the tests in a way that yielded higher scores even though D.C. students got far fewer math questions correct than in the year before.

“Scoring decision aided math gains,” the principal headline said.

That didn’t sound very good. But neither did the start of Brown’s report.

According to Brown, this year’s tests were scored “in a way that yielded higher scores even though D.C. students got far fewer math questions correct than in the year before.” Plainly, it sounded like something was wrong with the way these tests were handled.

But as she continued, Brown got murky quite fast. It was no longer clear if something was wrong. Frankly, we now had little idea what Brown was talking about:
BROWN (continuing directly): The decision was made after D.C. teachers recommended a new grading scale—which would have held students to higher standards on tougher math tests—and after officials reviewed projections that the new scale would result in a significant decline in math proficiency rates.

Instead, city officials chose to discard the new grading approach and hold students to a level of difficulty similar to previous years', according to city officials as well as e-mails and documents obtained by The Washington Post.

It now sounds like students may have been held to “a level of difficulty similar to previous years,” which is pretty much what you might want to do. But good lord! The jumble of confusion Brown has introduced!

We pretty much know what a “tougher math test” is. We’re not real sure we know what “a new grading scale” on a tougher test is—a new grading scale which would “hold students to higher standards” on those tougher tests.

Nor do we feel we know what happened even after several forced marches through Brown’s 2,000 words. In our view, the piece is confusing from its start right on to its finish.

We’ve worked with these topics for forty years, but we feel heavily bollixed by what we have read. In our view, this is a tremendously confused and confusing news report.

Enter this forest ye who dare! We have struggled on two separate days with Brown’s lengthy, jumbled report. Our basic conclusion would be this:

Brown and her editor aren’t minimally competent in this subject area.

Brown graduated from Stanford in 2000. After teaching seventh grade in Juneau, she got a master’s degree in journalism from Berkeley in 2009.

That said, her hopelessly jumbled report took us back many years. Long ago, we were struck by the lack of technical competence on the part of many education reporters, even those at high-profile news orgs.

Everything else we know about Brown is still good. That said, the Washington Post is a major newspaper. Its education reporter, and her unnamed editor, seem to lack even minimal skills.

This is very much the way our society works.

The basic question under review: Did D.C. students show improvement on this year's system-wide tests?

After struggling with this report, we have no earthly idea. That may be the fault of the D.C. schools. It's clearly the fault of the Post.

37 comments:

  1. Emma Brown's article made clear that there are a number of moving parts in comparing test results from different years. Brown then described changes in the difficulty of questions along with a changing standard for the number of correct answers required for students to be deemed proficient. Further, Brown's reporting showed how politics affected the decisions on the grading system. Did anyone else who read Brown's article find it confusing?

    Can anyone tell what Somerby would deem proficient in writing about school testing? Obviously he gives failing grades to Brown... as he has done to Ravitch and Sirota in the past. Does anyone ever pass Somerby's test?

    ReplyDelete
    Replies
    1. If you change the level of difficulty of the questions then it doesn't matter whether you change the grading scale or leave it the same -- the scores cannot be directly compared to previous tests. THAT should have been made clear in Brown's report but instead Brown discusses how politics affected the choice of grading scale, as if such a comparison were possible no matter which scale was used. That problem should have been addressed in Brown's report but wasn't.
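
      To make that concrete, here's a minimal sketch in Python (invented numbers, nothing from the DC CAS) of why a fixed passing bar means different things on tests of different difficulty:

      # Hypothetical illustration: hold the cut score fixed and the
      # "proficient" share moves with the test's difficulty, even
      # though the students themselves haven't changed at all.
      import random

      random.seed(0)
      abilities = [random.gauss(0.0, 1.0) for _ in range(10_000)]

      def pass_rate(abilities, difficulty, cut=0.0):
          # A student clears the bar if ability minus the test's
          # difficulty exceeds the fixed cut score.
          passed = sum(1 for a in abilities if a - difficulty > cut)
          return passed / len(abilities)

      print(pass_rate(abilities, difficulty=0.0))  # easier test: ~50% pass
      print(pass_rate(abilities, difficulty=0.5))  # harder test: ~31% pass

      Same students, same cut score, different tests -- and the passing rate swings by nearly twenty points. That is why the scores cannot be compared unless someone first equates the tests.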

      Delete

    2. "Does anyone ever pass Somerby's test?"

      Yes: Richard Rothstein. He had an article in Slate not too long ago - which Somerby linked to but which I can't seem to find.

      http://www.epi.org/publication/us-student-performance-testing/

      Delete
    3. anon 900,

      That Rothstein piece was 40,000 words.

      Delete
  2. Emma Brown's article is incomprehensible, simple as that. A reader gains no sense at all of how well or poorly students are doing from the latest test results, and why there should be such confusion is of no evident concern to Emma Brown.

    LTR

    ReplyDelete
    Replies
    1. At present, only a fool would trust results from the statewide testing programs conducted by the fifty states over the past dozen years.

      Delete
    2. She reported the test results, the schools that did well, the schools that did not do so well, as well as the caveats about reading too much into test results, particularly one year's results.

      What should she have done to make it more comprehensible to you, LTR?

      Delete
    3. Simply reporting the test results is not enough in a report about testing because it provides no context about how to interpret the results. Readers are not experts in education and do not know what to make of such data. A reporter should seek out information to help readers understand what the results mean to them, as parents perhaps or taxpayers. This reporter didn't seem to be able to do that for readers. When a reporter doesn't have sufficient knowledge to do it herself, she is expected to seek out expert help, digest the information and present it clearly to readers. That is what journalism is about.

      Delete
    4. Well, I disagree. I thought Brown added "context" very well, and even interviewed several "experts" for this story.

      I find that "lacks context" and "incomprehensible" are easy charges to throw around about newspaper stories, which are very limited in what they can present and must gather information in a rather short period of time -- especially in these days of 24-hour Internet news cycles.

      Good grief, to provide all the context and to quote all the experts possible, Brown would have been writing a PhD thesis instead of a newspaper story.

      Delete
    5. If Brown had ever written a dissertation in education, she might have known how to explain this simpler content in a news article. The point of becoming expert in a field is to be able to apply that expertise in real-life situations. I don't think the 24-hour news cycle is any excuse.

      The difference between something thrown together in a quick-and-dirty manner, on the assumption that few people will know the difference and they don't matter anyway, and a truly informative, comprehensible piece is a matter of integrity and journalistic standards, since most people will not notice or be able to tell a good job from a poor one. Somerby does a service by telling us when a poor job is being done.

      If it doesn't look too bad to you, I assume you do not have a Ph.D. or Ed.D. and aren't following the technicalities or trying to make sense of the things that are incomprehensible about the article. Our papers count on there being more people like you and fewer like Somerby out there. Should they be doing that, or should they be doing a better job? Do you want to run your life, make decisions and vote based on things that make only superficial sense but don't hold up to closer examination?

      Do you want to do this when reporters also have a political axe to grind, such as when they point out that an easier grading scale was adopted for political reasons to make students look like they were doing better than they were (those failing schools, those terrible teachers unions)? Does it make sense that it was the teachers who wanted the stricter grading scale? Doesn't that contradict the idea that teachers have a vested interest in making students seem to be doing better than they are? Does this mean the administrators were trying to cheat in some way? Is that true, or does someone just want you to think it is true -- and who, and for what reason? I think these questions should matter to you.

      Delete
  3. Long ago, we were struck by the lack of technical competence on the part of many education reporters, even those at high-profile news orgs..... Our society staggers under the burden of this relentless conduct.

    Fortunately this has no impact on our public school students.

    Yet, that said, in reading and math, test scores have greatly improved among all three major student groups (whites, blacks and Hispanics) over that 40-year period. The score gains have been very large in the last two decades.

    The Asian tigers—Japan, Korea and Taiwan—do outscore American students by wide margins in math.

    On these recent tests, the students in our own Finland outscored the students in the real Finland.

    Not bad for a society staggering and suffering from paralysis!

    ReplyDelete
    Replies
    1. Speaking of narratives, this is a long-running narrative of Somerby's.

      To wit: There is absolutely no reporter in the United States capable of reporting competently on education.

      Delete
    2. Long-running narrative? Maybe. But all of those lines are from the last month.

      Delete
    3. "There is absolutely no reporter...capable of reporting competently on education."

      Somerby has cited Richard Rothstein, who, granted, is not a reporter (he works for a think tank), but who has written articles on testing and education. He had an article in Slate not too long ago - which Somerby linked to but which I can't seem to find.

      http://www.epi.org/publication/us-student-performance-testing/

      Delete
    4. You do get the difference between reporters working on deadlines and a guy cranking out articles from a think tank, don't you?

      Delete
    5. The reporters can't be reasonably expected to get right even those minimal matters (disaggregation) that pertain to their constant theme (decline)?

      Delete
  4. TDH is correct that Emma Brown's story is incomprehensible, but in all fairness we should examine her source, the 2012 Technical Report of the DC CAS, that is, the District of Columbia Comprehensive Assessment System.

    Let's go to Section 6: "Methods," where we find Critical Element 4.4: "When different test forms or formats are used, the State must ensure that the meaning and interpretation of results are consistent." Now, I'm not exactly sure what that means, just as I'm not sure why CE 4.4 appears in Section 6, but subparagraph (a) asks, "Has the State taken steps to ensure consistency of test forms over time?" I'm a trifle unsettled that this is phrased as a question, but I think it means that they want to be able to compare test scores from different years, so I think I may be in the right place.

    A little further down I read, "Assessment results must be expressed in terms of the achievement standards, not just scale scores or percentiles." This is clumsy writing, not least because it's in the passive voice, but I'm encouraged. I think this means that they're gonna tell us not just raw scores but what the scores mean.

    A little further on: "CTB uses Mantel-Haenszel statistics … to evaluate DIF for both operational and field test items…. As with all statistical tests, Mantel-Haenszel DIF statistics are subject to Type I and II errors. An item flagged for DIF may or may not provide an unfair advantage or disadvantage for one examinee subgroup compared with another. However, the flag does show when an item is more difficult for a particular focal subgroup of students than would be expected based on their total test scores, when compared with the difficulty of the item for the comparison or reference subgroup with equivalent total test scores."

    Now I know what Type I and Type II errors are. The first is a false positive; the second is a false negative. I have to look up DIF. It means "Differential Item Functioning," and apparently refers to the different probabilities that people from different groups will answer the same test question correctly. I have to look up Mantel-Haenszel statistics. Wikipedia tells me it's some kind of correlation test "used when the effect of the explanatory variable on the response variable is influenced by covariates that can be controlled."
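
    For what it's worth, the Mantel-Haenszel arithmetic is less scary than the jargon. Here is a minimal hand-rolled sketch in Python (the counts are invented for illustration, not taken from the DC CAS report): students are grouped into strata by total score, each stratum yields a 2x2 table of group membership versus right/wrong on one item, and the pooled odds ratio is compared with 1.

    # Mantel-Haenszel common odds ratio, computed by hand.
    # Each stratum: [[reference_right, reference_wrong],
    #                [focal_right,     focal_wrong]]
    strata = [
        [[40, 10], [30, 20]],  # low total-score stratum
        [[60, 15], [50, 25]],  # middle stratum
        [[80,  5], [70, 15]],  # high stratum
    ]

    num = den = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    print(num / den)  # ~1.0 means no DIF; far from 1 flags the item

    With these made-up counts the pooled odds ratio comes out near 2.5, so the item would be flagged: it is harder for the focal group than their total scores would predict.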

    I'm lost, and I haven't gotten to IRT (item response theory) or biserial coefficients. In case you were wondering, item response theory assumes that different questions on tests are not equally difficult.
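
    And the core of item response theory fits in a few lines. Under the common two-parameter logistic model (the parameters below are toy values, purely illustrative), the chance of a correct answer depends on the student's ability theta and on the item's own difficulty b and discrimination a:

    import math

    def p_correct(theta, a, b):
        # Two-parameter logistic (2PL) IRT model: probability that a
        # student of ability theta answers an item with difficulty b
        # and discrimination a correctly.
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))

    print(p_correct(theta=0.0, a=1.0, b=-1.0))  # easy item: ~0.73
    print(p_correct(theta=0.0, a=1.0, b=1.0))   # hard item: ~0.27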

    Good. I guess.

    This report goes on for 157 pages, filled with educationist jargon in badly written English and with tabular data generated by statistical packages. I defy anyone outside the educationist guild to make any sense of it before deadline.

    ReplyDelete
    Replies
    1. That's why you interview experts (or hire journalists with sufficient expertise). Education is not transparent as a field simply because we all attended school at one time or another. It is a profession with technicalities and a knowledge base. You wouldn't expect a reporter to become an expert on physics to discuss news about the space program, but you would expect that reporter to interview competent experts and sort out their answers until they made sense, then perhaps run it by someone with some training to make sure it still made sense, before publishing a report. If you do not understand something yourself, it is impossible to communicate it clearly to others, even when trying to address an audience at a very basic level. The discussion about DIF is to make sure none of the items are biased against members of minority groups, as test items have been in the past when they asked inner city kids about yachts (for example). This can be tested statistically as described.

      Delete
    2. Lindy,

      So you think the educationist establishment has "competent experts" who can be trusted to sort out answers "until they made sense."

      Really? Before deadline? These people couldn't write a coherent English sentence in the active voice if their lives depended on it.

      But I think your faith in them is adorable.

      Delete
    3. Stories like this don't have short deadlines the way breaking news does. Editors should insist that the stories make sense before publishing them. I think newspapers should have experts on staff, but I guess that is quaint too.

      Delete
    4. Lindy,

      Well, yes, you're right that stories like this don't have the immediacy of the latest mass shooting, but this isn't an in-depth study piece, either. Its hook is the joyous announcement by educationists that they've succeeded once again in lifting test scores. It will go stale quickly if not immediately.

      You're also right that editors should insist that stories make sense, but then the people who agreed to talk aren't making any sense. Here's a sentence from the article:

      <quote>
      The OSSE [the Superintendent's office] made an unwritten commitment years ago to maintain that trend line as a way to judge progress and the effectiveness of reform efforts, said Jeffrey Noel, who oversees testing at the agency.
      </quote>

      Noel's words aren't in quotes, which is odd, but his words, if they're his, are even odder. The "trend line" would mean student test scores over time, and that's not something the Sup can commit to. Perhaps he's talking about maintaining the comparability of the test scores over time.

      The people doing the testing from McGraw-Hill aren't talking, and the technical report says that the teachers involved sign nondisclosure agreements.

      Delete
    5. "Stories like this don't have short deadlines the way breaking news does."

      Uh, yes they do. This story is the first following the release of the test scores. It had to be pulled together in a matter of hours, at best, before the next edition.

      If the reporter and her editors want a more in-depth piece, then she's got another couple of days to pull that together. At best.

      A very wise man many years ago described journalism as "the first rough draft of history."

      Delete
    6. And "history is written by the winners," someone else said (look it up!).

      So, we should expect exactly what we see: That even the "first draft of history" consistently reflects the prejudices, concerns and goals of "the winners" -- in the present case, the prejudices, concerns and goals (and yes, the falsehoods) of those who intend to "win" the battle of US education.

      Delete
  5. Anyone who had difficulty understanding Brown's piece would also have difficulty keeping up with the surging Poles, much less have any hope of catching the runaway Koreans or Finns.

    Blame whom you want: parents, teachers, administrators, or politicians.

    ReplyDelete
  6. Read article. Harder test. Easier grading. Higher scores.

    ReplyDelete
    Replies
    1. Maybe, maybe not. I'd say, different test. Possibly harder. Grading is the same as always, certainly on the math test. Lower scores. Scale jiggered to maintain or increase percentage of those marked "proficient" or better.

      But who can tell for sure?
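
      If that reading is right, the mechanism is easy to demonstrate. A minimal sketch in Python, with invented raw scores rather than actual DC CAS numbers: lower the cut score and the "proficient" percentage rises even while every raw score falls.

      # Invented raw scores out of 60 -- not actual DC CAS data.
      last_year = [44, 38, 51, 29, 41, 36, 48, 33, 45, 40]
      this_year = [s - 5 for s in last_year]  # everyone gets fewer right

      def pct_proficient(scores, cut):
          return 100 * sum(1 for s in scores if s >= cut) / len(scores)

      print(pct_proficient(last_year, cut=40))  # old cut: 60% proficient
      print(pct_proficient(this_year, cut=33))  # lowered cut: 70% proficient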

      Delete
    2. Certainly not any loyal Somerby tribalist primed to think the article was incomprehensible before they actually read it.

      Delete
  7. Since D.C. governs itself, I wonder: which state's tests do they use?

    ReplyDelete
  8. Part 1

    This is not the first time that Emma Brown has confounded Post readers.

    In an August 21st article, Brown and co-author Lynh Bui regurgitate ACT propaganda, and then write this: “The ACT is a competitor of the SAT and is now the most popular college entrance exam in the country.”

    http://articles.washingtonpost.com/2013-08-21/local/41431944_1_act-test-takers-minimum-scores-college-readiness

    They might have told the truth – and informed the public – about the ACT and SAT.

    The SAT is a badly flawed and virtually worthless test. College enrollment specialists say that their research finds the SAT predicts between 3 and 15 percent of the variance in freshman-year college grades, and after that nothing. As one commented, "I might as well measure their shoe size."

    Matthew Quirk reported this in “The Best Class Money Can Buy”:

    “The ACT and the College Board don't just sell hundreds of thousands of student profiles to schools; they also offer software and consulting services that can be used to set crude wealth and test-score cutoffs, to target or eliminate students before they apply...That students are rejected on the basis of income is one of the most closely held secrets in admissions; enrollment managers say the practice is far more prevalent than most schools let on.”

    http://www.theatlantic.com/magazine/archive/2005/11/the-best-class-money-can-buy/4307/2/

    The authors of a study in Ohio found the ACT has minimal predictive power. For example, the ACT composite score predicts about 5 percent of the variance in freshman-year Grade Point Average at Akron University, 10 percent at Bowling Green, 13 percent at Cincinnati, 8 percent at Kent State, 12 percent at Miami of Ohio, 9 percent at Ohio University, 15 percent at Ohio State, 13 percent at Toledo, and 17 percent for all others. Hardly anything to get all excited about. 
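
    For readers keeping score at home: "percent of variance" is R-squared, so the underlying correlation between ACT composite and freshman GPA is just its square root. A quick check in Python, using the figures quoted above:

    # "Percent of variance explained" is R-squared; the correlation
    # between ACT composite and freshman GPA is its square root.
    for school, r_squared in [("Akron", 0.05), ("Ohio State", 0.15),
                              ("all others", 0.17)]:
        print(school, round(r_squared ** 0.5, 2))
    # Akron 0.22, Ohio State 0.39, all others 0.41 -- modest at best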

    ReplyDelete
  9. Part 2

    Here is what the authors say about the ACT in their concluding remarks: 
     
    "...why, in the competitive college admissions market, admission officers have not already discovered the shortcomings of the ACT composite score and reduced the weight they put on the Reading and Science components. The answer is not clear. Personal conversations suggest that most admission officers are simply unaware of the difference in predictive validity across the tests.”

    “They have trusted ACT Inc. to design a valid exam and never took the time (or had the resources) to analyze the predictive power of its various components. An alternative explanation is that schools have a strong incentive - perhaps due to highly publicized external rankings such as those compiled by U.S. News & World Report, which incorporate students’ entrance exam scores - to admit students with a high ACT composite score, even if this score turns out to be unhelpful."

    Maybe Brown and Bui don’t know about predictive validity either. But it’s their business to know. And to report accurately. As former Post owner Eugene Meyer wrote, “The first mission of a newspaper is to tell the truth as nearly as the truth can be ascertained.”

    In their article, however, Brown and Bui fall woefully short of that charge. So too does Education Secretary Arne Duncan, who is quoted as saying that “we must be honest about our students’ performance.” But naturally, Duncan is anything but honest. He’s become a national education embarrassment.

    One would hope that Emma Brown and Lynh Bui are smart enough to do better education reporting than what they’ve demonstrated thus far. Editors at The Post – and readers – should demand that they do so.

    ReplyDelete
  10. Ah, so let's apply the Somerby test of perfection and clarity, so that the average reader with no expertise in the field can understand:

    "Personal conversations suggest that most admission officers are simply unaware of the difference in predictive validity across the tests.”

    What on earth does that mean? We don't know. It's never explained.

    And the author is relying on "personal conversations"? Is that another way of saying "anecdotal evidence"?

    ReplyDelete
  11. Can you "anonymous" jerks pick a name, any name, even "anonymous 2042," so the rest of us can follow the damned discussion?

    ReplyDelete
    Replies
    1. If you can't follow it, the names aren't the problem. Get a new hobbyhorse.

      And no, *this* Anonymous never commented before 10 minutes ago.

      Delete