The Post revisits those DC score gains!

MONDAY, SEPTEMBER 23, 2013

Guess who lacks minimal skills: Be careful what you ask for!

In late July, the DC schools announced their largest score gains in five years on the district’s annual “statewide” tests. Emma Brown reported the ballyhooed gains in the Washington Post.

Heroically, we started our review of her report like this: “Everything we know about Emma Brown is good.” But we noted an obvious problem with her report:

Were this year’s system-wide tests “equivalent” to last year’s system-wide tests? That is, were the two sets of tests equally difficult? Unless you know that this is true, it doesn’t matter—it doesn’t mean squat—if passing rates go up!

Especially given other events unfolding around the nation, that was a blindingly obvious question. But in real time, Brown didn’t ask.

But always be careful what you ask for! Yesterday, Brown explored that very question in one of the most incoherent news reports we’ve ever seen in print.

Her report was very long, and it was very high-profile. Its 38 paragraphs burned 1,996 words. It was the featured report on page one of the Post’s Sunday Metro section.

The report was long and high-profile. It was also incoherent. This is the way Brown began. We include the Post’s main headline:
BROWN (9/22/13): Scoring decision aided math gains

The four-point gains D.C. public school students achieved citywide on the most recent annual math and reading tests were acclaimed as historic, as more evidence that the city's approach to improving schools is working.

But the math gains officials reported were the result of a quiet decision to score the tests in a way that yielded higher scores even though D.C. students got far fewer math questions correct than in the year before.

“Scoring decision aided math gains,” the principal headline said.

That didn’t sound very good. But neither did the start of Brown’s report.

According to Brown, this year’s tests were scored “in a way that yielded higher scores even though D.C. students got far fewer math questions correct than in the year before.” Plainly, it sounded like something was wrong with the way these tests were handled.

But as she continued, Brown got murky quite fast. It was no longer clear if something was wrong. Frankly, we now had little idea what Brown was talking about:
BROWN (continuing directly): The decision was made after D.C. teachers recommended a new grading scale—which would have held students to higher standards on tougher math tests—and after officials reviewed projections that the new scale would result in a significant decline in math proficiency rates.

Instead, city officials chose to discard the new grading approach and hold students to a level of difficulty similar to previous years', according to city officials as well as e-mails and documents obtained by The Washington Post.

It now sounds like students may have been held to “a level of difficulty similar to previous years,” which is pretty much what you might want to do. But good lord! The jumble of confusion Brown has introduced!

We pretty much know what a “tougher math test” is. We’re not real sure we know what “a new grading scale” on a tougher test is—a new grading scale which would “hold students to higher standards” on those tougher tests.

Nor do we feel we know what happened even after several forced marches through Brown’s 2,000 words. In our view, the piece is confusing from its start right on to its finish.

We’ve worked with these topics for forty years, but we feel heavily bollixed by what we have read. In our view, this is a tremendously confused and confusing news report.

Enter this forest ye who dare! We have struggled on two separate days with Brown’s lengthy, jumbled report. Our basic conclusion would be this:

Brown and her editor aren’t minimally competent in this subject area.

Brown graduated from Stanford in 2000. After teaching seventh grade in Juneau, she got a master’s degree in journalism from Berkeley in 2009.

That said, her hopelessly jumbled report took us back many years. Long ago, we were struck by the lack of technical competence on the part of many education reporters, even those at high-profile news orgs.

Everything else we know about Brown is still good. That said, the Washington Post is a major newspaper. Its education reporter, and her unnamed editor, seem to lack even minimal skills.

This is very much the way our society works.

The basic question under review: Did D.C. students show improvement on this year's system-wide tests?

After struggling with this report, we have no earthly idea. That may be the fault of the D.C. schools. It's clearly the fault of the Post.

37 comments:

  1. Emma Brown's article made clear that there are a number of moving parts in comparing test results from different years. Brown then described changes in the difficulty of questions along with a changing standard for the number of correct answers required for students to be deemed proficient. Further, Brown's reporting showed how politics affected the decisions on the grading system. Did anyone else who read Brown's article find it confusing?

    Can anyone tell what Somerby would deem proficient in writing about school testing? Obviously he gives failing grades to Brown... as he has done to Ravitch and Sirota in the past. Does anyone ever pass Somerby's test?

    ReplyDelete
    Replies
    1. If you change the level of difficulty of the questions then it doesn't matter whether you change the grading scale or leave it the same -- the scores cannot be directly compared to previous tests. THAT should have been made clear in Brown's report but instead Brown discusses how politics affected the choice of grading scale, as if such a comparison were possible no matter which scale was used. That problem should have been addressed in Brown's report but wasn't.
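
      To make that concrete, here's a minimal sketch in Python (invented numbers, nothing from the DC CAS) of why a fixed passing bar means different things on tests of different difficulty:

      # Hypothetical illustration: hold the cut score fixed and the
      # "proficient" share moves with the test's difficulty, even
      # though the students themselves haven't changed at all.
      import random

      random.seed(0)
      abilities = [random.gauss(0.0, 1.0) for _ in range(10_000)]

      def pass_rate(abilities, difficulty, cut=0.0):
          # A student clears the bar if ability minus the test's
          # difficulty exceeds the fixed cut score.
          passed = sum(1 for a in abilities if a - difficulty > cut)
          return passed / len(abilities)

      print(pass_rate(abilities, difficulty=0.0))  # easier test: ~50% pass
      print(pass_rate(abilities, difficulty=0.5))  # harder test: ~31% pass

      Same students, same cut score, different tests -- and the passing rate swings by nearly twenty points. That is why the scores cannot be compared unless someone first equates the tests.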

      Delete

    2. "Does anyone ever pass Somerby's test?"

      Yes: Richard Rothstein. He had an article in Slate not too long ago - which Somerby linked to but which I can't seem to find.

      http://www.epi.org/publication/us-student-performance-testing/

      Delete
    3. anon 900,

      That Rothstein piece was 40,000 words.

      Delete
  2. Emma Brown's article is incomprehensible, simple as that. A reader gains no sense at all of how well or poorly students are doing from the latest test results, and why there should be such confusion is of no evident concern to Emma Brown.

    LTR

    ReplyDelete
    Replies
    1. At present, only a fool would trust results from the statewide testing programs conducted by the fifty states over the past dozen years.

      Delete
    2. She reported the test results, the schools that did well, the schools that did not do so well, as well as the caveats about reading too much into test results, particularly one year's results.

      What should she have done to make it more comprehensible to you, LTR?

      Delete
    3. Simply reporting the test results is not enough in a report about testing because it provides no context about how to interpret the results. Readers are not experts in education and do not know what to make of such data. A reporter should seek out information to help readers understand what the results mean to them, as parents perhaps or taxpayers. This reporter didn't seem to be able to do that for readers. When a reporter doesn't have sufficient knowledge to do it herself, she is expected to seek out expert help, digest the information and present it clearly to readers. That is what journalism is about.

      Delete
    4. Well, I disagree. I thought Brown added "context" very well, and even interviewed several "experts" for this story.

      I find that "lacks context" and "incomprehensible" are easy charges to throw around about newspaper stories, which are very limited in what they can present and must gather information in a rather short period of time -- especially in these days of 24-hour Internet news cycles.

      Good grief, to provide all the context and to quote all the experts possible, Brown would have been writing a PhD thesis instead of a newspaper story.

      Delete
    5. If Brown had ever written a dissertation in education, she might have known how to explain this simpler content in a news article. The point of becoming expert in a field is to be able to apply that expertise in real-life situations. I don't think the 24-hour news cycle is any excuse.

      The difference between something thrown together in a quick-and-dirty manner, on the assumption that few people will know the difference and they don't matter anyway, and a truly informative, comprehensible piece is a matter of integrity and journalistic standards, since most people will not notice or be able to tell a good job from a poor one. Somerby does a service by telling us when a poor job is being done.

      If it doesn't look too bad to you, I assume you do not have a Ph.D. or Ed.D. and aren't following the technicalities or trying to make sense of the things that are incomprehensible about the article. Our papers count on there being more people like you and fewer like Somerby out there. Should they be doing that, or should they be doing a better job? Do you want to run your life, make decisions and vote based on things that make only superficial sense but don't hold up to closer examination?

      Do you want to do this when reporters also have a political axe to grind, such as when they point out that an easier grading scale was adopted for political reasons to make students look like they were doing better than they were (those failing schools, those terrible teachers unions)? Does it make sense that it was the teachers who wanted the stricter grading scale? Doesn't that contradict the idea that teachers have a vested interest in making students seem to be doing better than they are? Does this mean the administrators were trying to cheat in some way? Is that true, or does someone just want you to think it is true -- and who, and for what reason? I think these questions should matter to you.

      Delete
  3. Long ago, we were struck by the lack of technical competence on the part of many education reporters, even those at high-profile news orgs..... Our society staggers under the burden of this relentless conduct.

    Fortunately this has no impact on our public school students.

    Yet, that said, in reading and math, test scores have greatly improved among all three major student groups (whites, blacks and Hispanics) over that 40-year period. The score gains have been very large in the last two decades.

    The Asian tigers—Japan, Korea and Taiwan—do outscore American students by wide margins in math.

    On these recent tests, the students in our own Finland outscored the students in the real Finland.

    Not bad for a society staggering and suffering from paralysis!

    ReplyDelete
    Replies
    1. Speaking of narratives, this is a long-running narrative of Somerby's.

      To wit: There is absolutely no reporter in the United States capable of reporting competently on education.

      Delete
    2. Long-running narrative? Maybe. But all of those lines are from the last month.

      Delete
    3. "There is absolutely no reporter...capable of reporting competently on education."

      Somerby has cited Richard Rothstein, who, granted, is not a reporter (he works for a think tank), but who has written articles on testing and education. He had an article in Slate not too long ago - which Somerby linked to but which I can't seem to find.

      http://www.epi.org/publication/us-student-performance-testing/

      Delete
    4. You do get the difference between reporters working on deadlines and a guy cranking out articles from a think tank, don't you?

      Delete
    5. The reporters can't be reasonably expected to get right even those minimal matters (disaggregation) that pertain to their constant theme (decline)?

      Delete
  4. TDH is correct that Emma Brown's story is incomprehensible, but in all fairness we should examine her source, the 2012 Technical Report of the DC CAS, that is, the District of Columbia Comprehensive Assessment System.

    Let's go to Section 6: "Methods," where we find Critical Element 4.4: "When different test forms or formats are used, the State must ensure that the meaning and interpretation of results are consistent." Now, I'm not exactly sure what that means, just as I'm not sure why CE 4.4 appears in Section 6, but subparagraph (a) asks, "Has the State taken steps to ensure consistency of test forms over time?" I'm a trifle unsettled that this is phrased as a question, but I think it means that they want to be able to compare test scores from different years, so I think I may be in the right place.

    A little further down I read, "Assessment results must be expressed in terms of the achievement standards, not just scale scores or percentiles." This is clumsy writing, not least because it's in the passive voice, but I'm encouraged. I think this means that they're gonna tell us not just raw scores but what the scores mean.

    A little further on: "CTB uses Mantel-Haenszel statistics … to evaluate DIF for both operational and field test items…. As with all statistical tests, Mantel-Haenszel DIF statistics are subject to Type I and II errors. An item flagged for DIF may or may not provide an unfair advantage or disadvantage for one examinee subgroup compared with another. However, the flag does show when an item is more difficult for a particular focal subgroup of students than would be expected based on their total test scores, when compared with the difficulty of the item for the comparison or reference subgroup with equivalent total test scores."

    Now I know what Type I and Type II errors are. The first is a false positive; the second is a false negative. I have to look up DIF. It means "Differential Item Functioning," and apparently refers to the different probabilities that people from different groups will answer the same test question correctly. I have to look up Mantel-Haenszel statistics. Wikipedia tells me it's some kind of correlation test "used when the effect of the explanatory variable on the response variable is influenced by covariates that can be controlled."
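
    For what it's worth, the Mantel-Haenszel arithmetic is less scary than the jargon. Here is a minimal hand-rolled sketch in Python (the counts are invented for illustration, not taken from the DC CAS report): students are grouped into strata by total score, each stratum yields a 2x2 table of group membership versus right/wrong on one item, and the pooled odds ratio is compared with 1.

    # Mantel-Haenszel common odds ratio, computed by hand.
    # Each stratum: [[reference_right, reference_wrong],
    #                [focal_right,     focal_wrong]]
    strata = [
        [[40, 10], [30, 20]],  # low total-score stratum
        [[60, 15], [50, 25]],  # middle stratum
        [[80,  5], [70, 15]],  # high stratum
    ]

    num = den = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    print(num / den)  # ~1.0 means no DIF; far from 1 flags the item

    With these made-up counts the pooled odds ratio comes out near 2.5, so the item would be flagged: it is harder for the focal group than their total scores would predict.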

    I'm lost, and I haven't gotten to IRT (item response theory) or biserial coefficients. In case you were wondering, item response theory assumes that different questions on tests are not equally difficult.
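
    And the core of item response theory fits in a few lines. Under the common two-parameter logistic model (the parameters below are toy values, purely illustrative), the chance of a correct answer depends on the student's ability theta and on the item's own difficulty b and discrimination a:

    import math

    def p_correct(theta, a, b):
        # Two-parameter logistic (2PL) IRT model: probability that a
        # student of ability theta answers an item with difficulty b
        # and discrimination a correctly.
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))

    print(p_correct(theta=0.0, a=1.0, b=-1.0))  # easy item: ~0.73
    print(p_correct(theta=0.0, a=1.0, b=1.0))   # hard item: ~0.27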

    Good. I guess.

    This report goes on for 157 pages, filled with educationist jargon in badly written English and with tabular data generated by statistical packages. I defy anyone outside the educationist guild to make any sense of it before deadline.

    ReplyDelete
    Replies
    1. That's why you interview experts (or hire journalists with sufficient expertise). Education is not transparent as a field simply because we all attended school at one time or another. It is a profession with technicalities and a knowledge base. You wouldn't expect a reporter to become an expert on physics to discuss news about the space program, but you would expect that reporter to interview competent experts and sort out their answers until they made sense, then perhaps run it by someone with some training to make sure it still made sense, before publishing a report. If you do not understand something yourself, it is impossible to communicate it clearly to others, even when trying to address an audience at a very basic level. The discussion about DIF is to make sure none of the items are biased against members of minority groups, as test items have been in the past when they asked inner city kids about yachts (for example). This can be tested statistically as described.

      Delete
    2. Lindy,

      So you think the educationist establishment has "competent experts" who can be trusted to sort out answers "until they made sense."

      Really? Before deadline? These people couldn't write a coherent English sentence in the active voice if their lives depended on it.

      But I think your faith in them is adorable.

      Delete
    3. Stories like this don't have short deadlines the way breaking news does. Editors should insist that the stories make sense before publishing them. I think newspapers should have experts on staff, but I guess that is quaint too.

      Delete
    4. Lindy,

      Well, yes, you're right that stories like this don't have the immediacy of the latest mass shooting, but this isn't an in-depth study piece, either. Its hook is the joyous announcement by educationists that they've succeeded once again in lifting test scores. It will go stale quickly if not immediately.

      You're also right that editors should insist that stories make sense, but then the people who agreed to talk aren't making any sense. Here's a sentence from the article:

      <quote>
      The OSSE [the Superintendent's office] made an unwritten commitment years ago to maintain that trend line as a way to judge progress and the effectiveness of reform efforts, said Jeffrey Noel, who oversees testing at the agency.
      </quote>

      Noel's words aren't in quotes, which is odd, but his words, if they're his, are even odder. The "trend line" would mean student test scores over time, and that's not something the Sup can commit to. Perhaps he's talking about maintaining the comparability of the test scores over time.

      The people doing the testing from McGraw-Hill aren't talking, and the technical report says that the teachers involved sign nondisclosure agreements.

      Delete
    5. "Stories like this don't have short deadlines the way breaking news does."

      Uh, yes they do. This story is the first following the release of the test scores. It had to be pulled together in a matter of hours, at best, before the next edition.

      If the reporter and her editors want a more in-depth piece, then she's got another couple of days to pull that together. At best.

      A very wise man many years ago described journalism as "the first rough draft of history."

      Delete
    6. And "history is written by the winners," someone else said (look it up!).

      So, we should expect exactly what we see: That even the "first draft of history" consistently reflects the prejudices, concerns and goals of "the winners" -- in the present case, the prejudices, concerns and goals (and yes, the falsehoods) of those who intend to "win" the battle of US education.

      Delete
  5. Anyone who had difficulty understanding Brown's piece would also have difficulty keeping up with the surging Poles, much less have any hope of catching the runaway Koreans or Finns.

    Blame whom you want: parents, teachers, administrators, or politicians.

    ReplyDelete
  6. Read article. Harder test. Easier grading. Higher scores.

    ReplyDelete
    Replies
    1. Maybe, maybe not. I'd say, different test. Possibly harder. Grading is the same as always, certainly on the math test. Lower scores. Scale jiggered to maintain or increase percentage of those marked "proficient" or better.

      But who can tell for sure?
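
      If that reading is right, the mechanism is easy to demonstrate. A minimal sketch in Python, with invented raw scores rather than actual DC CAS numbers: lower the cut score and the "proficient" percentage rises even while every raw score falls.

      # Invented raw scores out of 60 -- not actual DC CAS data.
      last_year = [44, 38, 51, 29, 41, 36, 48, 33, 45, 40]
      this_year = [s - 5 for s in last_year]  # everyone gets fewer right

      def pct_proficient(scores, cut):
          return 100 * sum(1 for s in scores if s >= cut) / len(scores)

      print(pct_proficient(last_year, cut=40))  # old cut: 60% proficient
      print(pct_proficient(this_year, cut=33))  # lowered cut: 70% proficient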

      Delete
    2. Certainly not any loyal Somerby tribalist primed to think the article was incomprehensible before they actually read it.

      Delete
  7. Since D.C. governs itself, I wonder: which state's tests do they use?

    ReplyDelete
  8. Part 1

    This is not the first time that Emma Brown has confounded Post readers.

    In an August 21st article, Brown and co-author Lynh Bui regurgitate ACT propaganda, and then write this: “The ACT is a competitor of the SAT and is now the most popular college entrance exam in the country.”

    http://articles.washingtonpost.com/2013-08-21/local/41431944_1_act-test-takers-minimum-scores-college-readiness

    They might have told the truth – and informed the public – about the ACT and SAT.

    The SAT is a badly flawed and virtually worthless test. College enrollment specialists say that their research finds the SAT predicts between 3 and 15 percent of the variance in freshman-year college grades, and after that nothing. As one commented, "I might as well measure their shoe size."

    Matthew Quirk reported this in “The Best Class Money Can Buy”:

    “The ACT and the College Board don't just sell hundreds of thousands of student profiles to schools; they also offer software and consulting services that can be used to set crude wealth and test-score cutoffs, to target or eliminate students before they apply...That students are rejected on the basis of income is one of the most closely held secrets in admissions; enrollment managers say the practice is far more prevalent than most schools let on.”

    http://www.theatlantic.com/magazine/archive/2005/11/the-best-class-money-can-buy/4307/2/

    The authors of a study in Ohio found the ACT has minimal predictive power. For example, the ACT composite score predicts about 5 percent of the variance in freshman-year Grade Point Average at Akron University, 10 percent at Bowling Green, 13 percent at Cincinnati, 8 percent at Kent State, 12 percent at Miami of Ohio, 9 percent at Ohio University, 15 percent at Ohio State, 13 percent at Toledo, and 17 percent for all others. Hardly anything to get all excited about. 
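
    For readers keeping score at home: "percent of variance" is R-squared, so the underlying correlation between ACT composite and freshman GPA is just its square root. A quick check in Python, using the figures quoted above:

    # "Percent of variance explained" is R-squared; the correlation
    # between ACT composite and freshman GPA is its square root.
    for school, r_squared in [("Akron", 0.05), ("Ohio State", 0.15),
                              ("all others", 0.17)]:
        print(school, round(r_squared ** 0.5, 2))
    # Akron 0.22, Ohio State 0.39, all others 0.41 -- modest at best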

    ReplyDelete
  9. Part 2

    Here is what the authors say about the ACT in their concluding remarks: 
     
    "...why, in the competitive college admissions market, admission officers have not already discovered the shortcomings of the ACT composite score and reduced the weight they put on the Reading and Science components. The answer is not clear. Personal conversations suggest that most admission officers are simply unaware of the difference in predictive validity across the tests.”

    “They have trusted ACT Inc. to design a valid exam and never took the time (or had the resources) to analyze the predictive power of its various components. An alternative explanation is that schools have a strong incentive - perhaps due to highly publicized external rankings such as those compiled by U.S. News & World Report, which incorporate students’ entrance exam scores - to admit students with a high ACT composite score, even if this score turns out to be unhelpful."

    Maybe Brown and Bui don’t know about predictive validity either. But it’s their business to know. And to report accurately. As former Post owner Eugene Meyer wrote, “The first mission of a newspaper is to tell the truth as nearly as the truth can be ascertained.”

    In their article, however, Brown and Bui fall woefully short of that charge. So too does Education Secretary Arne Duncan, who is quoted as saying that “we must be honest about our students’ performance.” But naturally, Duncan is anything but honest. He’s become a national education embarrassment.

    One would hope that Emma Brown and Lynh Bui are smart enough to do better education reporting than what they’ve demonstrated thus far. Editors at The Post – and readers – should demand that they do so.

    ReplyDelete
  10. Ah, so let's apply the Somerby test of perfection and clarity, so that the average reader with no expertise in the field can understand:

    "Personal conversations suggest that most admission officers are simply unaware of the difference in predictive validity across the tests.”

    What on earth does that mean? We don't know. It's never explained.

    And the author is relying on "personal conversations"? Is that another way of saying "anecdotal evidence"?

    ReplyDelete
  11. Can you "anonymous" jerks pick a name, any name, even "anonymous 2042," so the rest of us can follow the damned discussion?

    ReplyDelete
    Replies
    1. If you can't follow it, the names aren't the problem. Get a new hobbyhorse.

      And no, *this* Anonymous never commented before 10 minutes ago.

      Delete