the daily howler: American schools get it wrong again!

TUESDAY, OCTOBER 24, 2023

The Times makes a small deletion: Adam Grant is a good, decent person. According to the New York Times, he's "an organizational psychologist at the Wharton School of the University of Pennsylvania."

Also, he's the author of “Hidden Potential: The Science of Achieving Greater Things,” from which his guest essay in today's Times was adapted.

Grant's essay concerned American public schools. It carried a familiar headline:

What Most American Schools Do Wrong

You can't go wrong with a headline like that! Has any essay bearing some such title ever been rejected for publication in newspapers like the New York Times?

People, we're just asking!

At any rate, Grant has found the latest cure for our public school woes. His essay starts in a familiar fashion:

GRANT (10/24/23): Which country has the best education system? Since 2000, every three years, 15-year-olds in dozens of countries have taken the Program for International Student Assessment [the PISA]—a standardized test of math, reading and science skills. On the inaugural test, which focused on reading, the top country came as a big surprise: tiny Finland. Finnish students claimed victory again in 2003 (when the focus was on math) and 2006 (when it was on science), all while spending about the same time on homework per week as the typical teenager in Shanghai does in a single day.
Just over a decade later, Europe had a new champion. Here, too, it wasn’t one of the usual suspects—not a big, wealthy country like Germany or Britain but the small underdog nation of Estonia. Since that time, experts have been searching for the secrets behind these countries’ educational excellence. They recently found one right here in the United States.

Grant starts in standard fashion, conflating "highest scoring" with "best." This is a very dumb thing to do, but it's done all the time.

Judged by test scores on the PISA, "tiny Finland" started out as the best education system. In the most recent testing, Finland was nosed out by Estonia, a "small underdog nation."

Just for the record, is Finland really tiny and is Estonia small? Just for the record, here are the current populations of the nations we'll be discussing:

Finland: 5.6 million
Estonia: 1.4 million
United States: 333 million

Who's Grant calling tiny? Finland and Estonia are both small, but Estonia is known to be smaller.

With respect to Estonia's "underdog" status, it's almost surely easier to run public schools in small, single-culture nations like Finland and Estonia. As a general matter, it's more challenging in giant nations with a wide array of demographic groups, including several that have been treated very poorly down through the annals of time.

At any rate, the stage has now been set! A tiny nation, then later a small underdog nation, have shown the way on the PISA. As he continues, Grant reveals one of the alleged "secrets behind these countries’ educational excellence:"

GRANT (continuing directly): In North Carolina, economists examined data on several million elementary school students. They discovered a common pattern across about 7,000 classrooms that achieved significant gains in math and reading performance.
Those students didn’t have better teachers. They just happened to have the same teacher at least twice in different grades. A separate team of economists replicated the study with nearly a million elementary and middle schoolers in Indiana—and found the same results.

Intriguing! According to Grant, kids "achieve significant gains in math and reading performance" if they have the same teacher at least twice in different grades—in grades 4 and 5, let's say. According to Grant, some such practice is one of the secrets to that small nation's success.

For what it's worth, it's perfectly plausible that some such practice might produce good results. We once taught a group of Baltimore City kids in grade 5 and then in grade 6, and there's no doubt about it—that additional degree of familiarity can get the ball rolling very quickly in that second year.

On the other hand, people love this type of public school story so much that thumbs sometimes land on the scales. We looked at the first study to which Grant referred, and we found its authors instantly saying this:

Abstract
We provide new empirical evidence that increased student-teacher familiarity improves academic achievement in elementary school. Drawing on rich statewide administrative data, we observe small but significant test score gains for students assigned to the same teacher for a second time in a higher grade...

Once again, there's that little word "small!" As it turns out, it almost looks like Grant, and/or his editors at the Times, chose to make a small deletion!

The authors specifically said that the score gains they observed were "small." With the acquiescence of the Times, it looks like Grant chose to leave that large bit of buzzkill out.

This is very familiar practice. Anthropologically, this seems to be who we are.

We provide one additional point. We decided to look at the most recent PISA results—the results from 2018. The National Center for Education Statistics offers this overview:

PISA 2018 Reading Literacy Results
Reading literacy was the major domain in PISA 2018, as it was in 2000 and 2009. For 2018, the PISA reading literacy framework was updated to reflect the evolution and growing influence of technology. Reading involves not only the printed page but also digital formats. Increasingly, it requires readers to distinguish between fact and opinion, synthesize and interpret texts from multiple sources, and deal with conflicting information across source materials.
[...]
The U.S. average score (505) was higher than the OECD average score (487).
Compared to the 35 other OECD members, the U.S. average in reading literacy was lower than the average in 4 education systems, higher than in 21, and not measurably different than in 10.

It's true that our hopeless American kids were outscored by the kids from a certain "small underdog nation"—a wholly admirable though tiny nation which is about the size of an unlicked postage stamp.

On the other hand, our hapless kids performed on the same level as the kids of other large, demographically complex nations such as the U.K., Germany, and France.

In fact, they outperformed the kids from such recognizable nations as Japan, Australia, Denmark, Norway, Germany and France. They scored one point below New Zealand, one point above the U.K.

Tiny Finland and small Estonia did outperform our U.S. kids on the PISA reading test. That said, only two other OECD nations outscored our nation's loser kids in a way which was judged statistically significant.

Given the challenges our public schools confront, we're often amazed by how well our dullards do on these international tests. We're often puzzled to see that other nations don't outperform our kids.

That said, the Times will always be ready to pick and choose gloomy themes and results. It's a journalistic tradition.

Final point:

As has long been noted, and as you can see at the link we've provided, American kids show up better in reading and science on the PISA, significantly less well in math. There are ongoing disputes about why that is, but no one actually cares about such things, plainly including the New York Times.

Back to the practice of trashing the schools. When researchers said they found small gains, the Times made a small deletion!

16 comments:

Anonymous, October 24, 2023 at 4:19 PM
I don't really care about the practice of trashing the schools, but (sorry to nitpick) are you sure Estonia is a single-culture nation?

Of course they (infamously) do suppress the culture of their large Russian (and even larger: Russophone) minority, but it's still there nevertheless.

Anonymous, October 24, 2023 at 4:24 PM
Somerby has his own narrative. It goes like this: (1) the NY Times loves to trash American education and will print anything negative about our schools; (2) our schools should do worse than European countries because of our diversity and troubled racial history; (3) whatever cause is singled out is not going to explain much about why small countries such as Finland and Estonia do especially well; (4) these small countries don't have problems like we do in the USA.

These preconceived views explain a sentence like this one:

"Back to the practice of trashing the schools. When researchers said they found small gains, the Times made a small deletion!"

How is it trashing the schools to call a small gain large? It isn't, but Somerby insists that anyone at the Times writing about education will not respect the size of differences generated by proposed causes (such as repeating with the same teacher). Somerby himself has never mentioned "effect size" in all the time he has been writing here. But he is happy to apply his own negativity to whatever study is claiming an effect, as if adjectives were any kind of measure of anything (tiny vs small). No respectable statistician would use the adjective small to assess the magnitude of an effect. There are numbers for that.

We are left with some confusion about Somerby's point. Would it be good for the US to keep kids with the same teachers? According to the studies described, the answer is that it would help a little. No examination of which kids are helped though -- Somerby doesn't actually care about the findings. He cares about knocking the researchers and reports. Is the best way to measure the size of Estonia to use its land mass or its population? Somerby chooses population without looking at a map. How close are the cities in Finland, how big are the schools, are they urban or rural? How does that compare to Estonia? Crickets from Somerby, who has done his work when he has trashed the reporter.

There are several valid criticisms of the PISA test. One important one is that the US schools do not emphasize the aspects of math and science that are tested on the PISA test. There is less congruence between the test questions and the curriculum of American schools. That matters a whole lot more than the things Somerby focuses on. Another question is about which kids take the test in various countries. All of them? A few selected districts in big cities? Does Somerby know?

And then Somerby says:

"We're often puzzled to see that other nations don't outperform our U.S. kids. That said, the Times will always be ready to pick and choose gloomy themes and results."

If Somerby is puzzled to see us outperform other developed nations, isn't he himself being somewhat gloomy? Why would he assume worse performance from our education systems? Why would he think so poorly of our students? He doesn't say, but perhaps it is because we are not tiny?

I dislike Somerby's attitude toward schools and his students. His agenda to always knock progress and look for reasons to undercut performance claims is annoying, even offensive to those who work in our schools. The people who Somerby used to call "those ratty teachers with their infernal unions" are dedicated to helping kids improve. Our universities seek ways of better teaching math and reading and science, as well as other topics that never seem to be tested, especially internationally. Why assume they are doing a bad job, as Somerby himself does, but also accuses the NY Times of doing? And Somerby never met an education writer he likes. It is his way of stroking his own ego these days, even when he not only maligns students and school districts (e.g., MS) incompetently and incorrectly, refuses to retract his wrong statements (after Drum and others conceded their mistakes) but blindly and stubbornly continues repeating mistaken criticisms long after they have been debunked.

He is the last person anyone should be listening to for a discussion of American school progress.

Anonymous, October 24, 2023 at 4:44 PM
You don't typically get large effect sizes in psychological measurement, because of the nature of what is being measured.

Effect size is a statistical way of figuring out how large the actual effect is, independent of the sample size. With very large samples, any difference is magnified so that something trivial that affects the data can appear to be important even when it has no real impact on the phenomenon being studied. A small effect would be one that survives the hypothesis test for noise versus something causal, but the effect size for that significant result tells how much the manipulation affects or explains the variability in the measurement (test scores in this case). A small effect can be important. The word small does not mean unimportant or unreliable.
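
The point about sample size can be sketched with a few lines of arithmetic. This is only an illustration under stated assumptions: the gap of 0.02 standard deviations and the group sizes are made-up numbers, not figures from any study mentioned here, and the function names are hypothetical. It uses a plain two-sample z test with equal group sizes and a standard deviation of 1 in each group.

```python
import math

def two_sample_z(d, n_per_group):
    """z statistic for a gap of d standard deviations between two
    equal-sized groups (each with standard deviation 1)."""
    # The standard error of the difference in means is sqrt(2 / n).
    return d * math.sqrt(n_per_group / 2)

def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

d = 0.02  # hypothetical, very small gap between the two groups
for n in (100, 10_000, 1_000_000):
    z = two_sample_z(d, n)
    print(f"n per group = {n:>9,}  effect size = {d}  z = {z:6.2f}  p = {two_sided_p(z):.3g}")
```

The effect size is the same in every row. Only the p-value changes as the groups grow, which is why a result can pass the hypothesis test and still describe a very small difference.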

There are so many factors about children that can affect their test performance on a given day. Some may be sick, some didn't get much sleep the night before, some had no breakfast, some are worried about an event later on, some are worried about the test before, some were absent on the day a topic was taught and had no exposure to it, some don't care about the test or are upset about a fight with a friend, some dislike their teacher and resent that they have to take the test at all, some arrived at school late and feel rushed, some broke their pencil. All of these contributors to a student's performance have a bigger impact on how well each child does, but they are random factors and they contribute to the noise in the data. To overcome that noise, a systematic effect, such as having the same teacher twice, would have to be fairly robust but it must compete with all the other things happening with each child. Psychological factors are complex and there are a lot of them going on all at the same time. It is difficult to control for them under experimental circumstances, but in these test results there is nothing like that being controlled, only the grade level at which the kids are tested.

So, when Somerby dismisses something as being a small effect, ignoring that it is strong enough to be identified as an effect at all, he is displaying major ignorance and dismissing something that may be a way to help kids do better. There are half-educated people like Somerby who think they are smart when they are negative in the wrong circumstances. Drum does this sometimes, but Somerby has no idea what he is saying when he says the NY Times is cheating by omitting the word small to describe an effect found in a study. When nearly all psychological effects are small, the adjective is not descriptive and leaving it in may confuse readers. Statisticians don't pay any attention to the word at all -- they go for the number and look at the reported effect size. And that number is meaningful because they know what a meaningful effect should be under the circumstances of the study. Somerby is a buffoon.

Anonymous, October 24, 2023 at 4:57 PM
When you read Grant's actual essay, he says that looping (keeping a child with the same teacher) helps the most with the lower performing students. Somerby does not mention this. He pretends the study showed no strong effects.

In the actual essay, there are other practices used by Estonia and Finland to help every child achieve their potential, rather than focusing attention only on the high achievers. They address deficits early with specialized attention (just as MS started doing). Grant suggests that American schools should do this, not to improve test scores, but to ensure that every child has a chance to succeed. Somerby doesn't tell his readers about that either.

He is too busy pretending that Grant was saying something negative and wrong. I found Grant's essay positive and inspiring, urging Americans toward less competition and more effort to help all children do better. I see no harm in that at all.

I didn't see a link to the looping studies in Somerby's essay or in Grant's. I suspect Somerby had no way to figure out whether that effect size was large or small. Here is a rule of thumb used in psychology (which is the field that tests human cognition and performance):

"Cohen suggested that d = 0.2 be considered a 'small' effect size, 0.5 represents a 'medium' effect size and 0.8 a 'large' effect size. This means that if the difference between two groups' means is less than 0.2 standard deviations, the difference is negligible, even if it is statistically significant."

Cohen's d is a statistical measure with a formula for calculating it. You can find it in any statistics textbook. The values that correspond to small, medium and large tend to vary across fields of research.

Anonymous, October 24, 2023 at 5:02 PM
Correction, here is the link:

https://www.sciencedirect.com/science/article/abs/pii/S0272775717306635

The effect size (.12 std deviations of improvement) suggests that the effect is small, as reported.
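
For readers who want to see the arithmetic, here is a minimal sketch of Cohen's d and of the rule-of-thumb cutoffs quoted above. It assumes the pooled-standard-deviation form of the formula found in most textbooks. The function names and the toy scores are hypothetical, and treating the study's reported 0.12 standard deviations as if it were a Cohen's d is a simplification made only for illustration.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

def cohen_label(d):
    """Label an effect size using the 0.2 / 0.5 / 0.8 cutoffs quoted above."""
    size = abs(d)
    if size < 0.2:
        return "below the 'small' cutoff"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

# Toy scores, invented for the example.
print(cohens_d([2, 4, 6], [1, 3, 5]))
# The figure reported for the looping study, read here as a d value.
print(cohen_label(0.12))
```

As noted above, the values that count as small, medium and large vary across fields, so the labels here are only the generic rule of thumb.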

Anonymous, October 24, 2023 at 7:14 PM
I love small countries. I am Corby.

Anonymous, October 24, 2023 at 7:40 PM
Over the last 50 years, broadly speaking, the education level of Americans has improved (although gaps still persist among the oppressed cohorts), yet wealth inequality has only increased.

Anonymous, October 24, 2023 at 8:21 PM
Be stupid, they don't deserve smart citizens if they just rob us blind anyway