Tuesday, March 15, 2011

A Moment of Silence

Let us all take a moment to quietly mourn the recent appearance of an actual regression in a New York Times article.

I was shocked at first, but then I realized this poor little testscore regression is playing the part of the villain, which just made me sad. It is a sorry state of affairs when regressions only show up for the express purpose of looking unclear, hopelessly complicated, disconnected from reality. This regression is a mere punching bag in an anecdote-driven story.

Journalists can be mean, testscore regression. I would be friends with you. (Well, maybe. That would depend on a whole bunch of things that weren't mentioned in this article...and virtually nothing that was).

Q-tip: Trevor


Added: Contrary to what the journalist might seem to think, "complicated" does not even begin to suggest "wrong." (The true objective is expected to be complicated!) In any case, in light of his complaints and my previous post, Rules versus objectives, the following occurs to me: It may actually be optimal if this regression is unintelligible to the teacher, who then does not know what particular rules she is being rewarded for following. Even if any set of rules tends to incentivize suboptimal behavior given full information about those rules, incomplete knowledge of the rules may mean the teacher doesn't know enough to deviate significantly from pursuing the true objective. When there are many possible sets of rules that all get at the true objective from different angles, the expectation over them will be something in between. It need not be the case, but this "in between" may actually be closer to the true objective than any feasible set of rules!

That is, if you can't figure out how to game the system, you may as well play nice.
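A toy version of the "in between" point, as a rough sketch rather than anything from the article: put the true objective at 0 on a line, make up four rule points scattered around it, and assume the teacher pays a squared-distance penalty against whichever rule is actually in force. All of the numbers and names below are hypothetical. Under that assumed penalty, a teacher who considers every rule equally likely does best by targeting the average rule point.

```python
from statistics import mean

# Hypothetical setup (not from the article): the true objective sits at 0,
# and four feasible rule sets each reward hitting a different "rule point".
TRUE_OBJECTIVE = 0.0
RULE_POINTS = [-0.8, -0.3, 0.5, 0.9]


def expected_loss(target, rule_points):
    """Average squared distance to the rule points, each assumed equally likely."""
    return mean((target - r) ** 2 for r in rule_points)


# With squared-distance loss, the target that minimizes expected loss
# is the mean of the possible rule points.
best_target = mean(RULE_POINTS)

# How far the uncertain teacher ends up from the true objective...
distance_uncertain = abs(best_target - TRUE_OBJECTIVE)

# ...versus the best case for a teacher who knows the rule and games it.
distance_best_known_rule = min(abs(r - TRUE_OBJECTIVE) for r in RULE_POINTS)

print("target under uncertainty:", best_target)
print("distance from true objective:", distance_uncertain)
print("closest single rule point:", distance_best_known_rule)
```

With these made-up numbers the uncertain teacher lands closer to the true objective than any single rule point does. Shift the rule points so they all sit on one side of the objective and that stops being true, which is the "it need not be the case" caveat.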


  1. Exogenous Combustion, March 15, 2011 at 10:47 AM

    "It may actually be optimal if this regression is unintelligible to the teacher, who then does not know what particular rules she is being rewarded for following."

    I love this point.

    Building on this point, it would be interesting to see whether teachers learn about this black box over many quarters or years of teaching. Ignoring any issue with population composition, we would then see older teachers gaming the system more as they figure out, through "trial and error" (or coincidence), what pays off and what doesn't.

    Then again, I might say "how can we expect them to be good Bayesians if they can't even be average frequentists?" But I know better.

  2. EC, could we see something like this:

    Imagine a 1-dimensional objective, a point on an interval. Each rule regime incentivizes teachers to target some particular "rule point" on that interval, perhaps to the left or right of the true objective. There is some distribution over possible rule regimes, let's say centered around the true objective. Also, suppose there's a designated "slack point," i.e. how teachers like to behave if they just don't care about the rules or objectives at all. (e.g. you might see a tenured teacher targeting the slack point)

    Draw two rule regimes out of a hat. Impose the first one for a number of years.

    Prior to tenure, a teacher improves as measured by rule regime 1. Call this pre-tenure teacher who does well under regime 1 a type-A teacher. Then tenure hits. The teacher's measured performance drops, consistent with two popular stories: a type-B teacher isn't being incentivized to do a good job and so slacks off (targets the slack point), while a type-C teacher actually is a benevolent soul who understands the true objective well and, with tenure in hand, is finally free to be a good teacher rather than chasing some misguided set of rules. Can we distinguish between these stories?

    Well, suppose we switch rule regimes. Assuming the distribution is centered around the true objective, we expect the new rule point to be closer to the true objective than it is to the old rule point or the slack point. Someone who is targeting the true objective, then, is likely to do better than someone who is targeting the old rule point or the slack point. In expectation the change should decrease type-A and type-B teachers' measured performance while keeping type-C teachers' measured performance the same. In addition, the difference in the decreases for A versus B gives some measure of how much the old rules actually bought us in terms of teacher performance, versus just letting them slack. (Slack is an overcharged word here; the introduction of these rules may have been good or bad.)

    NB: we aren't even talking about an actual physical experiment here. The post-change behavior is completely unnecessary. You could just generate a bunch of different measures and see how teachers would do under each.
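The "generate a bunch of different measures" idea can be sketched as a small simulation. This is only an illustration under assumptions of my own choosing: the positions of the true objective, old rule point, and slack point are made up, rule points are drawn from a normal distribution centered on the true objective, and measured performance is the negative squared distance between a teacher's target and the rule point in force.

```python
import random
from statistics import mean

# Hypothetical positions on the 1-dimensional interval (all assumed).
TRUE_OBJECTIVE = 0.0   # what type-C teachers target
OLD_RULE_POINT = 0.7   # regime 1's rule point, what type-A teachers target
SLACK_POINT = -1.5     # what type-B teachers target

TARGETS = {"A": OLD_RULE_POINT, "B": SLACK_POINT, "C": TRUE_OBJECTIVE}


def measured_performance(target, rule_point):
    """Assumed scoring: negative squared distance from the regime's rule point."""
    return -(target - rule_point) ** 2


def average_performance(n_regimes=10_000, spread=1.0, seed=0):
    """Score each teacher type under many rule regimes drawn around the true objective."""
    rng = random.Random(seed)
    rule_points = [rng.gauss(TRUE_OBJECTIVE, spread) for _ in range(n_regimes)]
    return {
        teacher_type: mean(measured_performance(target, r) for r in rule_points)
        for teacher_type, target in TARGETS.items()
    }


averages = average_performance()
for teacher_type, score in sorted(averages.items()):
    print(teacher_type, round(score, 3))
```

Averaged over the simulated regimes, the type-C target should score best, with type A and type B falling behind by amounts that depend on how far the old rule point and the slack point sit from the true objective. The gap between A and B is the rough analogue of what the old rules "bought" relative to slacking.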