February 19, 2010

Tests in the modern college -- advances from 3000 BC

While grading a pile of exams that consisted of short answers and short essays, I thought, "Is this really the most cost-effective way to do tests?" It seems so backward for a TA to skim over ink-on-paper responses in a blue book, making a subjective judgment call on every answer, writing sub-scores in the margin, and then tallying those all up mentally to arrive at a final score. I imagine that ancient Egyptian priests could have given and scored tests this way. Have we not developed anything more innovative within the past 5000 years?

The underlying problem is that test-makers and test-takers are not buyers and sellers in a market for educational services. College students don't pay anything out-of-pocket. It's like health care, where nearly all of the bill is paid by third parties -- the government, private foundations, their parents, etc. Students therefore have zero incentive to pay attention to what they're getting and to ask whether it's worth what they're plunking down for it. Rather, our system encourages them to be slackers -- i mean whatever, i'm not paying for this crap anyway. Most people don't let their cars turn to shit because they've paid for the car itself and keep making out-of-pocket payments for gas, maintenance, and so on. Stripping away the insulation that students have from the price of education would be a good start.

But it's not just the students who are protected from market forces; the teachers, too, don't have any incentive to figure out what the most cost-effective way to teach is. Tenure isn't the source of the problem, since doctors aren't tenured at hospitals but are similarly insulated from the costs of their services. Rather, because the students don't really care whether they're getting the most bang for their buck, they provide very low-quality feedback about how well the teacher is serving his customers -- some under-the-breath griping, or perhaps a negative evaluation. But because they aren't paying, they can't use the ultimate discipline of withdrawing their monetary support from a terrible teacher, spending it instead on a superior teacher who would drive the first one out of business.

It's even worse, of course, because a lot of the decisions that the professor makes have huge externalities -- namely, upon the TA. Should I write an exam that's short answer, or multiple choice on a scantron? Well, short answer gives a richer depth, so we'll go with that. The costs and benefits of either approach fall not on the professor but on the grader, so we should expect the professor not to weigh the two on their merits, but rather to use the choice to express their worldview about How Tests Should Be Written. It's like voting -- if your vote fucks things up, you won't pay much of the cost, so why not use your vote to tell the world something about your unique lifestyle?

My basic take from having graded lots of papers and exams over the years is that you can boil most short answer questions down into multiple choice questions and use a scantron. It takes slightly longer to think up a multiple choice question than to write "Write a short answer about X," but the savings in time and effort when scoring the responses are immense. How costly is it to feed the damn things into the machine? Do you lose a fuller richness of the student's thoughts? In most cases, there is no richness there at all -- if you've ever graded stuff for a large intro class, you know what I mean. In the minority of cases, you do lose something, but we always have to ask whether that loss is outweighed by the huge gain in simplifying the grading.

To answer that, we should look to some kind of test-maker that operates under competitive market pressures, so that it has taken into account how cost-effective different approaches are, and where the test-takers pay most of the price themselves (or come a lot closer to this than in college). There are roughly two industries to look at: private tutoring providers and standardized test sellers. I worked for two and a half years as a private tutor, both at a center and as a contractor, and most of the tests given in these settings (when any are given at all) are short answer. This only works because the student-to-teacher ratio is so low that the teacher has quite a bit of free time to read over handwritten responses.

But what about when tens of thousands of students want to take a test? Well, you could hire tens of thousands of scorers, and it would be just like the tutoring center, but that would cost a boatload of dough. Every one of these tests instead uses multiple choice, whether it's an aptitude test like the SAT, ACT, or GRE; something closer to an IQ test, like the Miller Analogies Test; or even a test of knowledge in specific subjects, like the SAT II, AP, and GRE subject tests, not to mention the many tests for entry into a profession (law, law enforcement, military, etc.).

We really should think of tests as something that the test-makers sell, not to test-takers, but to evaluators of the test-takers -- potential employers, college admissions boards, etc. It's like a rating agency -- "We think this student is totally qualified to be a fireman," or "We think this student is dumb as rocks." If these ratings were not fairly reliable and predictive of real-world outcomes -- success in fighting fires, likelihood of flunking out during freshman year, or whatever -- then the evaluators would not "buy" them. They would require the test-takers to go take some other test that was more reliable and predictive.

There is obvious score inflation on all of these tests, but they still work pretty well. Because they've survived the competitive market pressures that their makers are under, that tells us the mostly multiple choice test is the best one for the job. Sure, some parts still require handwritten responses, namely where the question can't be split into several multiple choice ones because the test is looking for a sustained chain of reasoning on a single topic. So the AP Calculus test has a handwritten section with word problems, and the GRE has two short analytic writing essays. But even the SAT II subject tests are generally all multiple choice -- including the math tests.

Sure, you're losing some of the richness of the student's knowledge by only allowing them to fill in this bubble or that one, but evidently you don't lose that much, and it's more than made up for by how simple it makes the scoring process. If this really did screw up the picture of who knew how much about what, then the SAT II math tests would place the 2nd grade drop-out on top, the numbers prodigy at the bottom, and so on. Mistakes may be made, but they must be trivial.

So why don't most college courses use multiple choice scantrons for big tests, maybe supplemented with an extended-response part? I remember taking those all the time in middle and high school -- where the teachers graded their own tests and therefore paid the costs and enjoyed the benefits of their decision about test format. How did I wind up back in the stone age of testing? It's simply because even though there's little market discipline in secondary school, the key actors are even more insulated from it in higher ed.

Finally, note how dearly professors cling to standardized multiple choice scantron tests when they're looking through graduate applications. How dare they look at some summary statistic like GRE scores and letter grades (or the still more compressed GPA) -- I mean, they're losing the full richness of the student's undergrad experience. They should have to watch a four-year-long videotape that captures the entire picture of the student's academic experience, both inside and outside of formal classroom instruction. If that's infeasible, they should at least have to review every homework assignment and exam the student took, in order to fully appreciate the vibrant detail that gets lost in crude letter grades and test scores.

Rather than indulge in feel-good talk about How Applications Should Be Made, they skip right to whatever's reasonably cost-effective. These are the same people who bloviate about how making scantron tests the default practice will lead us down a slippery slope toward a new Dark Age. So what needs to be done is to make them face the consequences of their decisions, which necessarily encourages prudent thinking. Have students pay more out-of-pocket, even if it's still borrowed from a student loan group. And have the test-maker and test-grader be the same person, even if that's not the professor but the TA -- just not different individuals.


  1. university student, 2/19/10, 1:16 PM

    In my American public university's non-intro computer science courses, most exams are written and graded by the professors. Most homework and projects are graded by TAs, however, and problems arise when the TA accepts answers that the professor would not.

    Bad/unliked professors will have a difficult time getting graduate students to work with them, and won't have many students enroll in their elective courses.

  2. You sound like H.G. Wells: 'for all its problems, the multiple choice test is the best...'
    From his 'Experiment in Autobiography'. From old memory.


  3. "Stripping away the insulation that students have from the price of education would be a good start."

    Related anecdote:

    My husband and his siblings were each given the same amount for college to spend as they wished. My husband spent all of his money going to a very selective out-of-state private college with a top-rated math program. His sibs "saved" money by going to State U and had $$$ left over (to squander) when they graduated. My husband was recruited by a large multinational corp straight out of college to do mathematical modeling of their processes and has easily made twice as much money as his sibs combined. You get what you pay for. So even when young people get to make the decision and control the $$$, the smart ones make better choices.

