I happened to bump into a well-reasoned rebuttal by David Burge to Paul Krugman’s op-ed warning of potentially catastrophic consequences of limiting the collective-bargaining rights of public education employees. (I’m particularly a fan of Burge’s turn of phrase summarizing the piece: “the point being, I suppose, is that unionized teachers stand as a thin chalk-stained line keeping Wisconsin from descending into the dystopian non-union educational hellscape of Texas.”)
Burge then goes on to turn the test-score data for Texas and Wisconsin into a fairly clear illustration of the dangers Simpson’s Paradox poses for statistics gathered from groups of unequal composition (although he doesn’t explicitly make the connection). Basically, white students score higher in TX than in WI; African American students score higher in TX than in WI; Hispanic students score higher in TX than in WI; yet the average test score in WI is nonetheless higher than in TX, in large part because the student population in WI is more uniformly white.
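To see how that can happen, here is a minimal sketch with entirely made-up numbers. The two states, the group labels, the population shares, and the scores are all hypothetical placeholders, not the actual TX or WI figures. State A beats State B within every group, yet loses on the overall average because its population is weighted differently.

```python
# Hypothetical data only: (share of student population, average score) per group.
# None of these figures come from real Texas or Wisconsin results.
state_a = {"group_1": (0.50, 290), "group_2": (0.15, 270), "group_3": (0.35, 275)}
state_b = {"group_1": (0.85, 288), "group_2": (0.10, 262), "group_3": (0.05, 268)}


def overall_average(state):
    """Population-weighted average score across all groups."""
    return sum(share * score for share, score in state.values())


# State A scores higher than State B within every single group...
a_wins_every_group = all(state_a[g][1] > state_b[g][1] for g in state_a)

# ...but State B still comes out ahead on the overall average.
overall_a = overall_average(state_a)
overall_b = overall_average(state_b)

print(a_wins_every_group, round(overall_a, 2), round(overall_b, 2))
```

The reversal comes entirely from the weights: State B has most of its students in the highest-scoring group, so its overall average sits closer to that group's score.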
That also launched me onto a related hunt which raises yet another dilemma: the discrepancy between event dropout rates and cohort graduation rates.
To briefly summarize each:
The event dropout rate is arrived at by measuring the number of students in grades 9-12 in a given year, and seeing what fraction of them drop out during that year.
The cohort graduation rate is arrived at by tracking a group of incoming 9th-graders and seeing how many of them complete high school within the next 4 years.
With that covered, here’s the dilemma. Typical numbers for “event dropout rate” are in the 4-5% range (it varies by state, ethnicity, and a whole host of other factors, but that’s about average). Typical numbers for “graduation rate” hover around 70%. If about 5% of students drop out in a given year, then about 95% stay enrolled per year, so the fraction who should graduate is (0.95^4) = ~81%. Which in turn raises the question of what happened to the remaining 10% or so of students – the ones who didn’t drop out (we think), and didn’t graduate (again, we think).
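The arithmetic is quick to check. This is a rough sketch that assumes the same 5% dropout rate in every grade and uses the approximate 70% graduation figure quoted above; the variable names are my own.

```python
annual_dropout = 0.05       # assumed uniform event dropout rate
years = 4                   # grades 9 through 12
observed_graduation = 0.70  # approximate cohort graduation rate

# Fraction of an entering class still enrolled after four years of 5% attrition.
expected_graduation = (1 - annual_dropout) ** years

# Students who neither dropped out (we think) nor graduated on time.
unexplained_gap = expected_graduation - observed_graduation

# Working backwards: the uniform annual dropout rate that would be needed
# to produce a 70% graduation rate on its own.
implied_dropout = 1 - observed_graduation ** (1 / years)

print(f"expected graduation: {expected_graduation:.1%}")
print(f"unexplained gap:     {unexplained_gap:.1%}")
print(f"implied dropout:     {implied_dropout:.1%}")
```

Under these assumptions the implied annual dropout rate comes out at roughly 8.5%, well above the 4-5% actually reported.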
The first possible explanation would be mathematical. If the event dropout rate is not uniform over the four grades, that could explain it. Generally, a higher early dropout rate makes the graduation rate we estimate from the average come out too low, while a higher late dropout rate makes it come out too high.*
The second possible explanation could be temporal. If dropout rates had plummeted just before the study began (or went through some odd cyclic pattern and happened to be in a low year in 2006), this could explain it as well. But to make a significant difference, event dropout rates would have had to be cut roughly in half from one year to the next; such an educational miracle would have attracted significantly more attention (and, in fact, you’re free to check older dropout rate data as well; it stays in the same general range for the immediately preceding years).
Digging around on the internet, I’ve found dozens of sources acknowledging that cohort graduation rates are much lower than event dropout rates would suggest. I found none explaining why (maybe it’s just obvious to everyone in the field), but my best guess is that it has to do with how dropout is defined. A student who goes back to repeat 10th grade can hardly be said to have “dropped out,” but at the same time they will not be counted as a graduate in the cohort graduation rate (which counts only those who graduate within 4 years).
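As a back-of-the-envelope check on that guess, here is a deliberately crude sketch. It assumes that each year a student either drops out, is held back, or moves on, that anyone held back even once misses the 4-year window, and that both rates are the same in every grade. The function names and the model itself are my own simplification, not anything taken from the published statistics.

```python
def on_time_graduation(dropout, retained, years=4):
    """Fraction graduating in exactly `years`, if each year a student
    drops out, is held back, or is promoted (hypothetical toy model)."""
    return (1 - dropout - retained) ** years


def implied_retention(dropout, target_graduation, years=4):
    """Annual hold-back rate needed to hit a target on-time graduation rate."""
    return 1 - dropout - target_graduation ** (1 / years)


# With 5% dropping out per year, how many would need to repeat a grade
# each year to pull on-time graduation down to about 70%?
retention = implied_retention(0.05, 0.70)
print(f"implied annual retention: {retention:.1%}")
```

In this toy model, a hold-back rate of around 3.5% per year is enough to close the gap, which at least makes the explanation plausible. Whether real retention rates are in that range is a separate question I haven't checked.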
It may not be a neat mathematical “paradox” like the relative test scores in WI and TX, but it provides a nice illustration of the risks in recklessly mixing apparently equivalent statistics which are derived from distinct sources.
To make things even more confusing, there are literally dozens of different definitions of what “drop out” means, as well as a third common rate cited when considering graduation rates.
*The math is straightforward but tedious. Simply set up a system of four variables (a, b, c, d) representing the event dropout rate for grades 9, 10, 11, and 12 respectively. Note that the population P of students in each grade is a function of the dropout rate for the previous grade – P(10) = P(9)*(1-a), P(11) = P(10)*(1-b), P(12) = P(11)*(1-c). The overall event dropout rate is the population-weighted average of a, b, c, and d, so the earlier (larger) grades carry more weight. It is then straightforward to show that if a > b > c > d, using the “average” event dropout rate will overestimate total dropouts, and therefore underestimate graduation.
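For anyone who would rather not do the algebra, the setup in the footnote can be sketched numerically. The per-grade rates below are made-up examples chosen only to show the direction of the effect, and the function names are mine.

```python
def grade_populations(rates, start=1.0):
    """Population entering each grade: P(10) = P(9)*(1-a), and so on."""
    pops = [start]
    for r in rates[:-1]:
        pops.append(pops[-1] * (1 - r))
    return pops


def event_dropout_rate(rates):
    """Population-weighted average dropout rate across all grades."""
    pops = grade_populations(rates)
    dropouts = sum(p * r for p, r in zip(pops, rates))
    return dropouts / sum(pops)


def true_graduation(rates):
    """Fraction of the entering class that survives every grade."""
    result = 1.0
    for r in rates:
        result *= 1 - r
    return result


def estimated_graduation(rates):
    """Graduation rate you would infer from the single average event rate."""
    return (1 - event_dropout_rate(rates)) ** len(rates)


# Hypothetical per-grade dropout rates (a, b, c, d) for grades 9-12.
front_loaded = (0.08, 0.06, 0.04, 0.02)  # a > b > c > d
back_loaded = (0.02, 0.04, 0.06, 0.08)   # a < b < c < d

for name, rates in [("front-loaded", front_loaded), ("back-loaded", back_loaded)]:
    print(name, round(estimated_graduation(rates), 4), round(true_graduation(rates), 4))
```

With these example rates, the front-loaded case gives an estimate below the true graduation rate and the back-loaded case gives one above it, matching the direction described in the post. Either way the error is well under one percentage point, nowhere near enough to explain a gap of 10 points.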