Grading Postmortem

I just finished the follow-up to grading a programming/software engineering assignment with a mostly automated toolkit. The goal was to have “next class” turnaround. It wasn’t quite same day, but it was definitely within 20 hours for the bulk of people. Some of the problem was that Blackboard is terrible. Just terrible. It refused to upload marks and feedback for people who had had multiple submissions and then sent me into a hellscape to try to enter them manually. So there were some upload errors (2 people didn’t have any feedback and 1 had the wrong feedback due to a cut-and-paste fail). Out of 49 submissions, I had 17 people report a problem or request a regrade. Of those, 5 resulted in a change of mark, for a total of 19 marks added (the total possible was 10 per assignment, so 490 overall; 170 were originally given, thus 10% of “rightful” marks went missing and needed a manual update; 3 of the missing marks were due to a rogue extra submission after the deadline that was the wrong one to grade, so 16 marks, about 8.5% of the rightful total, went missing due to grader bugs).

Now, the people with wrong marks generally got a “0”, often when it was obvious that they shouldn’t have. This was because their program would either crash in a new way or return a really unexpected result. In the latter case, since we try to parse the program output, the parser would throw an unhandled exception on that odd output. In both scenarios, the unexpected behaviour would crash the grader before it wrote any feedback. Missing feedback was inferred to be an “upload” problem, so the students got a 0 and an unhelpful error message.
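
A minimal sketch, in Python, of the isolation that would have helped (this is not our actual toolkit; grade_all, run_tests, and write_feedback are placeholder names): grade each submission inside its own try/except and always write something, so a grader crash can never masquerade as a legitimate 0.

    import traceback

    def grade_all(submissions, run_tests, write_feedback):
        for sub in submissions:
            try:
                score, feedback = run_tests(sub)
            except Exception:
                # A grader crash is not a student failure: record it
                # explicitly so it can't be mistaken for a real 0.
                score = None
                feedback = ("GRADER ERROR - needs manual review:\n"
                            + traceback.format_exc())
            write_feedback(sub, score, feedback)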

These were stupidly hard bugs to track down! But they point to a couple of holes in our robustness and test isolation approach (we’re generally pretty good on that). In general, I’d like to review the 0s before uploading to confirm them, but the tight time frame made that impossible. It was a tradeoff between the real anxiety, pain, and confusion some students would feel at getting an erroneous 0, and delaying feedback. It would have been great if I could have turned around the corrections more quickly, but I have only so much time and energy. All students who filed an issue got a resolution by the subsequent Monday evening at the latest. So, two full days with correct feedback before the next assignment. Obviously, quicker is always better, but this isn’t unreasonable.

At least two people were misled by feedback which basically said “You are missing this file” when it should have said “You are missing at least one of this file or that directory.” Oops! That was more work for me than anything else.

In the lab that same day, the students did an over-the-shoulder code review of each other’s first assignment. I wish I had gathered stats on problems found. I told everyone who wanted to file an issue to send me an email only after their code review had discovered no problems and they had some simple test cases passing. In many of those cases, there were very obvious problems that a simple sanity test would have revealed, and oddities in the code which leapt out (to me).

I feel this justifies my decision not to return granular feedback or explicit tests. The program is very small and they have an oracle to test against (they are reverse engineering a small unix utility). The points awarded are few: 2 come from basically not messing up the submission format, and 1 comes from following the spec requirement to use tabs as the output separator.
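
For illustration, here is a sketch of the kind of oracle check students could run themselves; ./mytool and ./oracle are hypothetical names, not the actual assignment binaries.

    import subprocess

    def run(cmd, input_text):
        # Run a command-line tool on the given input and capture stdout.
        return subprocess.run(cmd, input=input_text,
                              capture_output=True, text=True).stdout

    def check(input_text):
        mine = run(["./mytool"], input_text)    # hypothetical student binary
        theirs = run(["./oracle"], input_text)  # hypothetical oracle binary
        # Matching the oracle exactly also catches the tab-vs-space
        # separator mistake mentioned above.
        return mine == theirs

    print(check("some sample input\n"))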

But the goal of these assignments is to get people thinking about software engineering, not programming per se. They need to reflect on their testing and release processes and try to improve them. I had several students ask for detailed feedback so they would lose fewer marks on the next assignment, and that’s precisely what I don’t want to do. The learning I’m trying to invoke isn’t “getting this program to pass my tests” but “becoming better at software engineering, especially testing and problem solving and spec reading and…”.

It’s difficult, of course, for students to care about the real goals instead of the proxy rewards. That’s just being a person! All I can do is try to set up the proxy rewards and the rest of my teaching so as to promote the real goal as much as possible.

Giving students low marks drives a lot of anxiety and upset on my part. I hate it. I hate it because of their obvious suffering. I hate it because it can provoke angry reactions against me. I hate it because I love seeing people succeed.

But it seems necessary to achieve real learning. At least, I don’t see other ways that are as broadly effective.


2 thoughts on “Grading Postmortem”

  1. Sometimes I ask for the details of testing just because I want to fix the bugs in my code (bugs make me crazy), not for the grades, especially when I passed 4 of the 5 simple tests and it’s so hard to find out why I failed one.
    I know the self-testing is meant for this, but sometimes people need to walk out of the cage that shuts the door to truth, and when you’ve spent so much time considering your own code you may lose the correct observation.

    • I get that!

      But, for example, we did a code review. That is supposed to help get out of the cage. We’ll have another exercise on Thursday to also help.

      And in real situations, you have to figure out how to deal with that crazymakingness.

      I thought about letting people “buy” the tests, e.g., by forgoing the associated points in the next round. That would let people relieve an itch if they need to. But that gets rather complicated to administer and has consequences that are difficult to predict or control. (There’s a big incentive for collusion there.)
