Posted by: maroonmaurader | March 3, 2011

Graduation Rates and Statistics

I happened to bump into a well-reasoned rebuttal by David Burge to Paul Krugman’s op-ed warning of potentially catastrophic consequences of limiting the collective-bargaining rights of public education employees. (I’m particularly a fan of Burge’s turn of phrase summarizing the piece: “the point being, I suppose, is that unionized teachers stand as a thin chalk-stained line keeping Wisconsin from descending into the dystopian non-union educational hellscape of Texas.”)

Burge then goes on to turn the test-score data for Texas and Wisconsin into a fairly clear illustration of the danger Simpson’s Paradox poses for statistics gathered from groups with different compositions (although he doesn’t explicitly make the connection). Basically, white students score higher in TX than in WI; African American students score higher in TX than in WI; Hispanic students score higher in TX than in WI; yet the average test score in WI is nonetheless higher than in TX, in large part because the student population in WI is more uniformly white.
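To see how that reversal can happen, here is a minimal sketch with two made-up states and two made-up groups. Every number below is invented purely for illustration (none of it is Burge’s data or real test-score data), and the names `state_a`, `state_b`, and `overall_mean` are hypothetical.

```python
# Hypothetical numbers, invented for illustration only; NOT real test-score data.
# Each entry maps a group to (share of the state's students, mean score).
state_a = {"group_1": (0.90, 290.0), "group_2": (0.10, 250.0)}
state_b = {"group_1": (0.50, 295.0), "group_2": (0.50, 260.0)}


def overall_mean(state):
    """Population-weighted mean score across all groups in a state."""
    return sum(share * score for share, score in state.values())


# State B scores higher than State A within every single group...
for group in state_a:
    assert state_b[group][1] > state_a[group][1]

# ...yet State A comes out ahead overall, purely because of its group mix.
print("State A overall:", round(overall_mean(state_a), 1))
print("State B overall:", round(overall_mean(state_b), 1))
```

The within-group comparison and the overall comparison point in opposite directions because the overall mean weights each group by its share of the population, and the two states have very different shares.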

That also launched me onto a related hunt which raises yet another dilemma: the discrepancy between event dropout rates and cohort graduation rates.

To briefly summarize each:

The event dropout rate is arrived at by measuring the number of students in grades 9-12 in a given year, and seeing what fraction of them drop out during that year.

The cohort graduation rate is arrived at by tracking a group of incoming 9th-graders and seeing how many of them complete high school within the next 4 years.

With that covered, here’s the dilemma. Typical numbers for the event dropout rate are in the 4-5% range (it varies by state, ethnicity, and a whole host of other factors, but that’s about average). Typical numbers for the cohort graduation rate hover around 70%. If about 5% of students drop out in a given year, then about 95% stay enrolled each year, so the fraction who should graduate after four years is 0.95^4, or roughly 81%. Which in turn raises the question of what happened to the other 10% or so of students: the ones who didn’t drop out (we think), and didn’t graduate (again, we think).
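The arithmetic above is just four years of compounding. As a quick sketch (the function name `expected_graduation_rate` is mine, and it assumes the same dropout rate applies in every grade and that nobody repeats a grade or transfers):

```python
def expected_graduation_rate(event_dropout_rate, years=4):
    """Fraction of a cohort still enrolled after `years`.

    Assumes a uniform dropout rate every year and that every student
    either drops out or advances (no repeats, no transfers).
    """
    return (1.0 - event_dropout_rate) ** years


# The 4-5% range quoted above for typical event dropout rates.
for rate in (0.04, 0.05):
    expected = expected_graduation_rate(rate)
    print(f"{rate:.0%} per year -> {expected:.1%} expected to graduate")
```

Either end of the 4-5% range lands above 80%, well clear of the roughly 70% that cohort studies report.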

The first possible explanation would be mathematical. If our event dropout rate was not uniform over the four grades, that could explain it. Generally, a dropout rate that is higher in the early grades makes our expected graduation rate come out too low, while a dropout rate that is higher in the later grades makes it come out too high.*

The second possible explanation could be temporal. If dropout rates had plummeted just before the study began (or went through some odd cyclic pattern and happened to be in a low year in 2006), this could explain it as well. But to make a significant difference, event dropout rates would have literally had to be cut in half from one year to the next; such an educational miracle would have attracted significantly more attention (and, in fact, you’re free to check older dropout rate data as well; it stays in the same general range for immediately preceding years).

Digging around on the internet, I’ve found literally dozens of sources acknowledging that cohort graduation rates are much lower than would be suggested by event dropout. I found none explaining why (maybe it’s just obvious to everyone in the field), but my best guess is that it has to do with how dropout is defined. A student who goes back to repeat 10th grade can hardly be said to have “dropped out,” but at the same time they will not be counted as a graduate in the cohort graduation rate (which is determined by those who graduate within 4 years).
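To get a feel for how much grade repetition it would take, here is a deliberately crude sketch of that guess. It assumes that each year a student either drops out, is held back, or advances, and that anyone held back even once misses the four-year window. The 4% retention rate is a number I picked to show the effect, not a measured one, and `on_time_graduation_rate` is a hypothetical helper.

```python
def on_time_graduation_rate(dropout_rate, retention_rate, years=4):
    """Fraction of a cohort graduating on time under a crude model.

    Each year a student drops out, is held back, or advances; only
    students who advance every single year graduate on time.
    """
    advance = 1.0 - dropout_rate - retention_rate
    return advance ** years


# No one held back: the naive estimate from the event dropout rate alone.
print(f"{on_time_graduation_rate(0.05, 0.00):.1%}")

# ASSUMED 4% of students held back each year (illustrative, not measured).
print(f"{on_time_graduation_rate(0.05, 0.04):.1%}")
```

Under those assumptions, a fairly modest yearly retention rate is enough to pull the on-time figure down to the neighborhood of 70% without a single extra student being counted as a dropout.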

It may not be a neat mathematical “paradox” like the relative test scores in WI and TX, but it provides a nice illustration of the risks in recklessly mixing apparently equivalent statistics which are derived from distinct sources.

To make things even more confusing, there are literally dozens of different definitions of what “drop out” means, as well as a third common rate cited when considering graduation rates.

*The math is straightforward but tedious. Simply set up a system of four variables (a, b, c, d) representing the event dropout rate for grades 9 through 12 respectively. Note that the population P of students in each grade is a function of the dropout rate for the previous grade: P(10) = P(9)*(1-a), P(11) = P(10)*(1-b), and so forth. The overall event dropout rate is the population-weighted average of a, b, c, and d. It is then straightforward to show that if a > b > c > d, compounding the “average” event dropout rate over four years will overestimate total dropouts.
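For anyone who would rather check the footnote numerically than algebraically, here is a small sketch of the setup it describes. The per-grade rates are made-up examples, and `compare_estimates` is a name I invented; it simply follows the P(10) = P(9)*(1-a) bookkeeping and then compounds the population-weighted average rate.

```python
def compare_estimates(grade_rates):
    """Return (true_survival, estimated_survival, pooled_rate).

    grade_rates: per-grade event dropout rates (a, b, c, d) for grades 9-12.
    true_survival: fraction of entering 9th-graders who never drop out.
    estimated_survival: what you get by compounding the pooled rate.
    """
    populations = []
    remaining = 1.0  # P(9), normalized to 1
    for rate in grade_rates:
        populations.append(remaining)
        remaining *= 1.0 - rate  # P(next grade) = P(this grade) * (1 - rate)
    true_survival = remaining

    # Overall event dropout rate: population-weighted average of the rates.
    dropouts = sum(pop * rate for pop, rate in zip(populations, grade_rates))
    pooled_rate = dropouts / sum(populations)

    estimated_survival = (1.0 - pooled_rate) ** len(grade_rates)
    return true_survival, estimated_survival, pooled_rate


# Made-up example rates: front-loaded, back-loaded, and uniform.
for label, rates in [
    ("front-loaded", (0.08, 0.06, 0.04, 0.02)),
    ("back-loaded", (0.02, 0.04, 0.06, 0.08)),
    ("uniform", (0.05, 0.05, 0.05, 0.05)),
]:
    true_s, est_s, pooled = compare_estimates(rates)
    print(f"{label}: pooled={pooled:.4f} true={true_s:.4f} estimated={est_s:.4f}")
```

With these example rates, the front-loaded case gives an estimate below the true survival (too many dropouts predicted), the back-loaded case gives one above it, and the uniform case matches. Note that the gaps are well under a percentage point, far too small to account for a ten-point discrepancy.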



  1. Question: Following a cohort group of students from 9th to 12th grade… Does that mean (1) these 100 students began h.s. in Fall ’07, how many are graduating in June ’11? or (2) 1,000 freshmen began in this district in Fall ’07, there are now 800 seniors graduating the district in June ’11.

    What I’d like to know is: How does this data reflect students who have moved out of the district (out of the school) or who have died or who have tested out (for h.s. equivalency diploma) at 16? I’m figuring students who are incarcerated or medically indisposed are counted as “dropouts.”

    I apologize if I’ve misunderstood the math.

    • As far as I can tell, every federal agency, state, district, and academic researcher has their own unique measuring stick, many of which go by the same names but are not quite identical.

      That said, here’s a sample formula for a cohort graduation rate (specifically, from a letter the Pennsylvania Department of Education put together to explain their transition to cohort graduation):

      “4-Year Cohort Graduation Rate:
      (Number of on-time graduates in 2010) / [(Number of first-time entering 9th grade students in 2006) + (Number of transfers to the class of 2010) – (# of transfers out of the class of 2010)] x 100”

      That is, the total pool of “potential” graduates doesn’t include students who transferred out at some point, but does include students who transferred in.

      Most of the approaches I’ve bumped into for getting a cohort graduation stat (1) do count students who graduate early, (2) do not count students who take more than four years to graduate, and (3) do not count students who get a GED or any other sort of accreditation than a HS diploma.
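The quoted Pennsylvania formula translates directly into a few lines of code. This is only a sketch of that one formula as quoted above; the function and parameter names are mine, and the district numbers in the example are hypothetical (loosely borrowing the 1,000 freshmen and 800 graduates from the question).

```python
def cohort_graduation_rate(on_time_graduates, first_time_ninth_graders,
                           transfers_in, transfers_out):
    """4-year cohort graduation rate, in percent, per the quoted formula.

    The denominator is the adjusted cohort: first-time 9th-graders plus
    transfers in, minus transfers out.
    """
    adjusted_cohort = first_time_ninth_graders + transfers_in - transfers_out
    if adjusted_cohort <= 0:
        raise ValueError("adjusted cohort must be positive")
    return 100.0 * on_time_graduates / adjusted_cohort


# Hypothetical district: 1,000 entering freshmen, 800 on-time graduates.
# Ignoring transfers entirely:
print(cohort_graduation_rate(800, 1000, 0, 0))

# Same district, but with 50 transfers in and 150 transfers out (made up):
print(round(cohort_graduation_rate(800, 1000, 50, 150), 1))
```

The second call shows why the transfer adjustment matters: students who moved away shrink the denominator instead of being counted as non-graduates, so the rate goes up.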

  2. This is a truly great piece of statistical analysis. If only my probability course in college had been this well stated.

    Nice work

    • I would also like to voice my concurrence with Nick and Miles. I enjoyed this post immensely. We need more “fun with statistics” here at the Lure.

  3. Nick’s is higher praise because he actually knows what he’s talking about, but I second it: yours, and IowaHawk’s, were both superb.

  4. Point of clarification: when you estimate total dropouts by using the average event dropout do you figure out P(12) by multiplying P(11) by (1 – [average dropout]), P(11)=P(10)*(1-[average dropout]) etc.? In other words, does the number by which you multiply each population remain constant when you’re using the average event dropout? I am ALMOST CERTAIN it does, but I do not trust myself on anything related to numbers, so I thought I’d double check.

    • I think you’re on target there, although there’s no real point to guessing at P(12) and such when using the average event dropout rate – you only need to know individual class sizes if you’re trying to use individual class’ dropout rates.

  5. Incidentally, Burge put up a follow-up post to his earlier piece:

    I don’t know that it really has that much new to add, but it goes into a little more depth on some points that people challenged him on.
