### A wide range of scores on those Naep reading tests!

MONDAY, JUNE 5, 2023

For those who are playing at home: We'll offer some basic information concerning the wide range of scores fourth graders achieve on the National Assessment of Educational Progress (Naep).

Despite an impression which is often conveyed, fourth graders are not all alike! Here is a set of "percentile" scores from last year's Naep reading test:

U.S. public schools
90th percentile: 264.92
75th percentile: 245.23
50th percentile: 220.14
25th percentile: 189.89
10th percentile: 160.47

By way of a quick explainer, here's what those numbers mean:

Ten percent of last year's fourth graders scored at or above 264.92—the 90th percentile score. At the other end of a long, dusty road, ten percent of last year's fourth graders scored at or below 160.47.
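For readers who want to see the arithmetic behind those cut scores, here's a minimal sketch. The scores below are made up for illustration; they are not actual Naep data, and the nearest-rank method shown is just one common way of computing a percentile:

```python
# A toy illustration of reading percentile cut scores from a
# distribution. The scores are invented, not real Naep results.
def percentile(scores, p):
    """Return the score at the p-th percentile (nearest-rank method)."""
    ordered = sorted(scores)
    rank = max(1, round(p / 100 * len(ordered)))  # nearest-rank index
    return ordered[rank - 1]

scores = [160, 175, 190, 205, 220, 221, 235, 245, 250, 265]
print(percentile(scores, 90))  # only 10% of scores are at or above this
print(percentile(scores, 50))  # the middle of the distribution
print(percentile(scores, 10))  # 10% of scores are at or below this
```

The point of the exercise: a percentile score isn't an average. It's simply the score at a given rank in the ordered distribution.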

A gigantic "achievement gap" is suggested by the difference between those two scores. Having said that, a giant achievement gap is also suggested by the difference between the 25th and 75th percentile scores.

We often think that the nation's fourth graders are pretty much all alike. This impression is often conveyed by low-information journalists and public officials, who may (for example) suggest that six weeks of summer school will inevitably bring Failing Students X, Y and Z right up to "grade level" for the coming year.

Fourth graders are not all alike. On their face, the gaps don't seem to be quite as wide in Grade 4 math:

U.S. public schools
90th percentile: 276.89
75th percentile: 258.22
50th percentile: 236.49
25th percentile: 212.66
10th percentile: 190.16

Those gaps don't seem to be quite as wide. Judged by normal rules of thumb, those gaps are still enormous.

Now this:

If a state makes ten percent of its third graders repeat third grade, that lowest-scoring ten percent of kids is eliminated from the state's overall average Grade 4 score in the following year. Those students enter the testing pool the year after that with five years of graded instruction under their belts (Grades 1, 2, 3, 3 again and 4) instead of the usual four.

A statistical advantage is thereby gained. Also, it starts becoming hard to make valid state-to-state comparisons.
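The size of that advantage can be illustrated with a toy simulation. Nothing here models any real state; the cohort is a made-up, evenly spaced set of scores, used only to show the direction of the effect when the bottom decile is held out of the tested pool:

```python
# A toy illustration (not a model of any real state) of how removing
# the lowest-scoring tenth of a cohort shifts the tested average.
from statistics import mean

cohort = list(range(100, 300, 2))         # 100 invented scores, 100..298
cutoff = sorted(cohort)[len(cohort) // 10]  # score at the 10th percentile
tested = [s for s in cohort if s >= cutoff]  # bottom decile held back

print(mean(cohort))  # average with everyone tested
print(mean(tested))  # average once the bottom decile is removed
```

In this contrived example the tested average rises by ten points even though no one learned anything. Real distributions aren't this tidy, but the direction of the skew is the same.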

For the record, the Naep makes mountains of such data available to the American public. These data are extremely instructive, but of one thing you can feel certain:

Journalists never consult them. Instead, they churn Storyline.

The scores in Mississippi: Why not? Here's one corresponding set of scores from Mississippi last year:

Mississippi public schools
90th percentile: 260.60
75th percentile: 243.10
50th percentile: 221.33
25th percentile: 193.84
10th percentile: 166.98

Overall, Mississippi outperformed the nation on this test, by a single point. That's before statistical adjustment for low income and other demographic factors.

Warning! Nine percent of the fourth graders who took that test in Mississippi had the presumptive advantage of that additional year of instruction.

To what extent did that skew statewide scores? We know of no sound way to tell.

1. Oh, good. More naep numbers. So important for the kids.

1. It’s important to know if the kids are learning.

2. No it's not.

2. Somerby today makes a point about how little 4th grade reading scores tell us, yet weirdly also says the data are extremely instructive, and either way criticizes journalists for never consulting the data while also criticizing them for using the data.

I see.

3. Citi Bike Karen lied and is factually guilty of attempting to bully a kid off a bike.

4. "Also, it starts becoming hard to make valid state-to-state comparisons."

But is that the purpose of the NAEP test? There are so many differences between the way states administer and teach, differences in curriculum, differences in the characteristics of the kids (extent of poverty, presence of immigrants, emphasis on education, funding of schools) that state-to-state comparisons are specious.

The AP article was about improvement in scores within one state over a decade, which was related to a dramatic change in the training of teachers, funding of schools, expertise of reading specialists and selection of a new program used to teach reading, mandated by state law. Such a change in how reading is taught should produce improvement, and unsurprisingly it did. Somerby, for some undisclosed reason, objects to that report and considers the claimed improvement (measured by NAEP) to be overblown. He is grasping at disproven alternative explanations for the changes because he does not wish to admit that improved teaching resulted in better reading scores.

New MS is doing better than old MS -- that is the NAEP message, not some comparison to other states, which as Somerby himself notes, is problematic. But we have strayed a long way from the original AP article now. Surely Somerby is not suggesting that the retention strategy be abandoned in order to produce more accurate NAEP scores for comparison with other states? That is indeed focusing on the wrong thing, as mh has pointed out in the other thread. NAEP is mostly of concern to national education experts, not to individual students and not to local schools, who largely use other assessment tools to help students. Somerby knows this, or he should know it. Why is he working overtime to deride the improvement in MS? How will that help anyone, especially the kids?

1. And, oddly, Somerby introduced his discussion of Mississippi in the context of (unrelated) school cheating scandals, as if to imply there was something similarly fishy about Mississippi’s naep scores. I find that odd.

5. “Nine percent of the fourth graders who took that test in Mississippi had the presumptive advantage of that additional year of instruction.”

That was the percentage held back the first year, which was 9 years ago. Is this percentage still at 9%?

1. Probably not.

2. I calculate a 9% chance that the current percent held back is still 9%.

One of the things you learn in higher levels of study regarding statistics is the irony of statistics.

3. Statistics is how you measure variability, in any field.

4. In a sense, almost any analysis of any type is measuring variability.

There aren’t many things in the universe without the trait of variability.

One could do a statistical analysis of *how often* these notions occur.

5. Elementary particles don't change.

6. Not as clever as you think, because you’re wrong.

7. They can decay, but that’s really just a collision.

All electrons are the same. All photons are the same. And they’re the same as they were nearly fourteen billion years ago.

8. Electrons on earth today are the same as electrons everywhere in the universe, ever since the big bang.

6. Why does Somerby think anyone is unaware of human variability, among children or adults? It is the overriding fact of the study of human behavior -- well known since Galton, who invented statistical methods of dealing with variability in order to measure and study human diversity.

People vary -- duh!

1. Somerby routinely dispenses with anything that might counter his manufacturing of ignorance.

7. I would like to see some science questions which are easier for blacks and hispanics, harder for whites and asians.

1. I don’t think there are any science questions easier for black and hispanic students than for white and asian students.

2. Ask Neil Degrasse Tyson whether he agrees with that, or Ronald McNair.

Why would there be questions only black kids can answer on a test that screens them out? The main reason they are at a disadvantage is that they have not had access to the enriched science curriculum offered at certain middle schools but not at the majority-minority schools attended by black and Hispanic students.

3. So we can’t increase black and hispanic acceptance rates by putting questions in the test that are easier for them.

4. The better approach is to use a test of material that all the kids have had a previous chance to learn. In the absence of that, it is better to select kids using some other measure, such as aptitude (instead of crystallized knowledge), motivation, interest, and ability to learn in a new situation that NO kids from any group have previously encountered, something not in the curriculum and not dependent on prior learning opportunities. I would favor a group activity that involves working together on an entirely novel problem or project, since science is a collective activity that requires both problem solving skills and ability to cooperate with others. It would also show leadership ability. Prior practice at such an activity could be provided to bring all kids up to speed with what is wanted in the task.

5. collaborative, that's the word I was looking for

6. CA has done away with high stakes testing, yet continues to be the 5th largest economy in the world (the US being 1st, in large part due to CA).

Also the weather is unbeatable, and the laid back disposition of its citizens make it a relatively pleasant place to live.

I don’t recommend individuals move to CA, to avoid overcrowding, but if you’re a business, CA is hard to beat with a highly skilled workforce and a fun lifestyle.

7. The only thing easier for “blacks and Hispanics” is experiencing oppression.

8. Asians are oppressed, too.

9. Not to the same degree.

10. Asians outscore whites.

8. Lots of Grand Jury Action this week, no wonder Bob is planning to lay low.

9. Want kids to read? Teach them phonics.

10. The Bible is unsuitable for elementary and middle school kids.

https://apnews.com/article/book-ban-school-library-bible-fc025c8ccf30e955aaf0b0ee1899608a

1. The Bible is unsuitable for anyone any age, as has been the case for centuries.

Aside from endorsing slavery, the Bible details horrible violence and perversions. Worse yet, there are some among us who fail to realize the Bible is a mix of legend and myth; it’s fiction. If read and taught seriously, especially to those in their formative years, it has the potential to cause trauma and warp youthful minds that are still developing.

11. I would like to know whether the "average" being discussed is a mean or a median. Holding kids back probably has no impact on the median, because the kids who were held back will likely still be below average a year later.

1. Try this with a set of 11 numbers with a median of 6:

1 2 3 4 5 ** median 6 ** 7 8 9 10 11

If you add 2 new scores to that set of numbers, both of them 3, look what happens to the median:

1 2 3 3 3 4 ** median 5 ** 6 7 8 9 10 11

So, yes, when you add numbers to the lower end of the distribution, the median can and does change, becoming lower than it was before.

What if you keep the set of 11 numbers, as before, but instead replace a couple of the low numbers with the number 3? Then the median does not change, as long as the numbers being replaced are also below the median.

With something like the NAEP, if the same pool of students were tested, replacing low numbers with other low numbers would not change the median. But each year's median is determined by whichever score falls exactly in the middle of that year's distribution, with half of the scores higher and half lower (or the average of the two middle scores if there is an even number of scores). Because all of the scores will be different with a new group of kids being tested, who knows what the median will be. It depends on the kids in the new group. It could be something like:

1 3 3 3 6 ** median 9 ** 10 11 11 12 15

It doesn't have to consist of the same scores as in the previous NAEP group tested before.
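The worked examples above can be checked with Python's `statistics` module, using the same toy numbers:

```python
# Reproducing the commenter's worked examples with the statistics
# module; the values are the same toy numbers used above.
from statistics import median

base = list(range(1, 12))              # 1..11, median 6
print(median(base))                    # 6

added = sorted(base + [3, 3])          # add two new low scores of 3
print(median(added))                   # median drops to 5

replaced = sorted([3, 3] + base[2:])   # replace 1 and 2 with two 3s
print(median(replaced))                # median stays at 6
```

Adding scores at the low end pulls the median down; replacing low scores with other below-median scores leaves it alone, exactly as described.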

Note that NAEP is not given every year, so a child who is held back is only going to be taking NAEP one time, in 4th grade, and other kids held back in the preceding and following years will not take it. No child is taking the NAEP exam twice, so there is no practice effect on NAEP, even if there is one on their 3rd grade reading test (unless they develop different forms of it to account for retesting).

12. a most riveting blog post!

this reminds me of a post by the great Fanny Mann-Fannimann on her blog.

test scores are low.

1. Men gave her fanny high scores.