Fractal Exams
posted by Dave Hoffman
[Yes, it's the time of year when our readers are going to be bombarded with a stream of grading-related posts. Expect more in the next few days.]
Wikipedia defines a fractal as:
A fractal is generally “a rough or fragmented geometric shape that can be split into parts, each of which is (at least approximately) a reduced-size copy of the whole,”[1] a property called self-similarity. The term was coined by Benoît Mandelbrot in 1975 and was derived from the Latin fractus meaning “broken” or “fractured.” A mathematical fractal is based on an equation that undergoes iteration, a form of feedback based on recursion.
It strikes me that most exams look like fractals. If you grade any unit of any exam, the impression you get (translated into a grade) will be fairly representative of any other part, and of course the whole. Obviously, as the sample size gets tiny, correlation to the final grade will decrease. But let’s say you could grade a randomly selected 30% of a given exam. Wouldn’t that be a very fair proxy for the rest (fair defined: a correlation of .8 or higher)? Newcomers to faculties often hear this described as the first-paragraph heuristic: you can get a robust sense of an exam’s final grade just by reading one paragraph. Tempting.
Has anyone done any empirical work on this, either using their own students’ exams or a freestanding dataset?
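For anyone who wants to try this on their own gradebook, here is a minimal Python sketch of the thought experiment. It assumes each exam has already been graded in separate units (questions or issues), picks a random 30% of those units, and correlates each student's subtotal on the sampled units with their subtotal on the rest. The function names, the `fraction` and `seed` parameters, and the scores are all invented for illustration; none of it comes from real exam data.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)

def sample_vs_rest(rows, fraction=0.3, seed=0):
    """rows[i][u] is student i's score on graded unit u.

    Returns the correlation between each student's subtotal on a random
    subset of units and their subtotal on the remaining units.
    """
    n_units = len(rows[0])
    k = max(1, round(n_units * fraction))
    if k >= n_units:
        raise ValueError("the sample must leave at least one unit out")
    rng = random.Random(seed)  # fixed seed so the same units are picked each run
    sampled = set(rng.sample(range(n_units), k))
    sample_totals = [sum(row[u] for u in sampled) for row in rows]
    rest_totals = [sum(row[u] for u in range(n_units) if u not in sampled)
                   for row in rows]
    return pearson(sample_totals, rest_totals)

# Made-up scores: six hypothetical students, ten graded units each.
rows = [
    [3, 4, 2, 5, 4, 3, 4, 5, 3, 4],
    [2, 2, 3, 3, 2, 1, 3, 2, 2, 3],
    [5, 4, 4, 5, 5, 4, 5, 4, 5, 5],
    [1, 2, 1, 3, 2, 2, 1, 2, 3, 1],
    [4, 3, 5, 4, 3, 4, 4, 3, 4, 2],
    [3, 3, 2, 2, 4, 3, 2, 3, 3, 4],
]
print(round(sample_vs_rest(rows), 2))
```

Running it with several different seeds shows how much the answer depends on which units happen to be drawn, which is part of the question.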
(Image Source: Flames on a Mandelbrot set, Wikicommons)
December 17, 2008 at 6:58 pm
Posted in: Law School
Responses (15)
Jeff Lipshaw - December 17, 2008 at 7:36 pm
One of my exams this term had 63 takers. 60 points were MC; 120 points were essay.
The correlation between essay scores and MC scores was .36.
The six essay questions had the following numbers of points allocated, with the associated correlation to the total essay portion:
10 .36
10 .42
40 .81
15 .62
25 .55
20 .53
Hope this helps!
Jeff Lipshaw - December 17, 2008 at 7:39 pm
By the way, I total up my exam scores on an Excel spreadsheet.
By typing =CORREL(array1, array2) into a cell you can correlate anything you want. I had already done the MC/essay correlation, but I did the six question correlations to the total essay in about five minutes.
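For readers who keep scores outside Excel, the same number can be computed in a few lines of Python. This is a minimal sketch of the Pearson correlation coefficient, which is what CORREL reports. The function name `pearson` and the scores below are made up for illustration and are not the exam data from this thread.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)

# Made-up scores for five hypothetical students, not data from any real exam.
mc_scores = [42, 51, 38, 55, 47]
essay_scores = [80, 95, 88, 102, 76]
print(round(pearson(mc_scores, essay_scores), 2))
```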
TRE - December 17, 2008 at 8:24 pm
What if it turned out that your impression of their handwriting or a single sentence correlated .8 with grading the whole thing?
The answer is no, it wouldn’t be fair to grade only a randomly selected 30% of the test. Would you think it was fair if your academic output was judged on the basis of say half a page of your work?
dave hoffman - December 17, 2008 at 8:28 pm
TRE: Fair, assuredly not. Also, pretty irritating for students, who’d rightly think that they weren’t getting the service they’d bargained for. It’s a thought experiment.
Jeff: Interesting data. Others, keep it coming.
DirtyLittleSecret - December 17, 2008 at 9:30 pm
Fascinating observation, which illuminates why exam-intensive evaluation of students is problematic… and why alternatives may be untenable, too.
I have had this experience with my exams, and I’d say it’s writing style, memory of case names, and organization that are the key heuristics. Good writing usually indicates command of the material.
I worry, though, that a) these are scarcely all the skills a lawyer needs and b) someone who’s bad at any of them must feel a repeated sense of falling short if they end up with bad grades again and again.
Has anyone studied the degree to which b) happens? If it is common, is there any alternative for students like that–for example, making more advocacy or skills courses where they can explore other skill sets than classic exam-writing?
Ethan Leib - December 17, 2008 at 11:32 pm
I use a combo of issue-spotting essays, MC, and take-home writing. I find only strong correlations at the very bottom of the heap — and quite often at the very top. But the A- to B- students jump around on these three different testing methods a lot, making me feel better about having the grade be based on a combo of the different components.
Jason Wojciechowski - December 18, 2008 at 12:31 am
Wouldn’t a very strong correlation indicate that the exam is not testing a wide enough variety of skills and knowledge? Expanding on this:
Prof. Leib’s (I’m going with last names from now on after the last time I participated in a thread here!) results sound like the ones you’d want: very good students are the ones who are good at everything; very bad ones are bad at everything. Everyone else has strengths in some areas and weaknesses in others.
A strong correlation on one part of an exam to the whole, when that exam evaluates a variety of skills and knowledge, would indicate that almost everyone has uniform abilities across all areas. While either of these models of student abilities is possible, the former seems more in line with the way we understand human capabilities.
Bruce Boyden - December 18, 2008 at 12:48 am
If exams are fractals, that means that if you delve deep enough into them, they are infinitely long. That sounds about right.
Todd Brown - December 18, 2008 at 12:55 am
Regardless of whether it is fair, this sort of off-the-cuff assessment is made regularly in other contexts. Some hiring personnel only review the first page or two of a writing sample. Lawyers and professors make snap judgments about subordinates and students based on limited exposure, even when that exposure may have been tainted by the lawyers’ or professors’ own shortcomings and the resulting assumptions are based on nothing but speculation. I may have blown off this subordinate’s requests for clarification and guidance, repeatedly treated them like they were not worth my time, and been wholly dismissive of anything they had to say; but …but I digress…
My own limited experience with grading (both in my current position and teaching high school students) is that some students show sufficient variation from one question to the next that this should remain a forbidden temptation. Like the people who take them, exams are not always so readily reduced to a mere fraction from which we can consistently extrapolate a true reflection of the whole.
A Voice of Sanity - December 18, 2008 at 3:53 am
See Adaptive Testing (LINK) for something very similar.
Brian Kalt - December 18, 2008 at 7:46 am
Jeff,
I run similar numbers, but instead of calculating the correlation of each question to the total score, I correlate it to the total score minus that question. Instead of correlations of 89%, 86%, and 73% (when correlated with the total), I got 69%, 67%, and 49%. I suspect that with six questions, the difference would be smaller for you, but I think it is important to do, so you don’t have the number inflated by the question score correlating with itself.
Along the lines of what Jason said, those correlations are right about where I want them. Too low and the reliability of the questions is suspect; too high and they don’t add much distinction.
As for the underlying question, I can easily avoid the temptation to “get fractal,” because even if the correlation between the random sample and the rest were 90%, relying on the sample would still mean that plenty of students would be getting grades that are unacceptably inaccurate. Those one or two thirds-of-a-grade increments at the margins matter a lot. And a handful are usually off by a lot more than that.
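The correction described above, correlating each question with the total minus that question, can be sketched in Python as follows. This assumes a gradebook laid out as one row per student and one column per question. The function names and the scores are hypothetical and are not the numbers quoted in this thread.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)

def item_correlations(rows):
    """rows[i][q] is student i's score on question q.

    For each question, returns a pair:
    (correlation with the total, correlation with the total minus that question).
    """
    totals = [sum(row) for row in rows]
    results = []
    for q in range(len(rows[0])):
        item = [row[q] for row in rows]
        # Remove the question's own points so it cannot correlate with itself.
        rest = [total - score for total, score in zip(totals, item)]
        results.append((pearson(item, totals), pearson(item, rest)))
    return results

# Made-up scores: five hypothetical students, three questions.
rows = [
    [8, 30, 12],
    [6, 22, 9],
    [9, 35, 10],
    [4, 25, 14],
    [7, 18, 8],
]
for q, (with_total, with_rest) in enumerate(item_correlations(rows), start=1):
    print(f"Q{q}: with total {with_total:.2f}, with total minus question {with_rest:.2f}")
```

The gap between the two numbers tends to be widest for a question that carries a large share of the points, since that question makes up much of the total it is being compared against.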
TRE - December 18, 2008 at 10:14 am
Why not just use all MC exams? They could be graded by computer. Surely that would be a temptation worth taking.
krs - December 18, 2008 at 11:07 am
I don’t know if Prof. Solove ran statistical analyses on his grades after he finished, but he also seems to be very thoughtful about this subject.
http://www.concurringopinions.com/archives/2006/12/a_guide_to_grad.html
Jeff Lipshaw - December 18, 2008 at 12:32 pm
Brian, good point. When I re-ran the correlations using your method, the results changed as follows:
10 .36 to .25
10 .42 to .30
40 .81 to .43
15 .62 to .47
25 .55 to .26
20 .53 to .27
I should also note that I grade each question separately. While it’s not perfect, I’m trying to give each student a fresh start on each question, without being biased by the previous answer.
Jeff Lipshaw - December 18, 2008 at 1:04 pm
FWIW, when I add in the multiple choice score to the previous calculation the numbers look like this (i.e., correlation of each essay question to the total exam – essay plus MC – minus the question itself):
10 .36 to .25 to .18
10 .42 to .30 to .31
40 .81 to .43 to .47
15 .62 to .47 to .40
25 .55 to .26 to .26
20 .53 to .27 to .35
(If you haven’t guessed, I am in a major procrastination mode here.)