The IEEE & ACM 2009 Software Engineering Curriculum Recommendations suggest the following outcomes for an MSc program in software engineering:
A student who has mastered the [Core Body of Knowledge (CBOK)] will be able to develop a modest-sized software system of a few thousand lines of code from scratch, be able to modify a pre-existing large-scale software system exceeding 1,000,000 lines of code, and be able to integrate third-party components that are themselves thousands of lines of code. Development and modification include analysis, design, and verification, and should yield high-quality artefacts, including the final software product.
I’ve been very excited by this, though I currently only teach 1 MSc class and have some influence on 1 other (out of 4 specifically devoted to Software Engineering). I’ve made progress in my class toward getting them able to develop a modest-sized software system from scratch (though on the order of 100s, not 1000s of SLOC) and a tiny bit of integrating third-party components. But if we break this out:
- Develop a 1k-5k LOC (let’s say) program from 0 lines of code.
- Modify an existing system with >1 million LOC.
- Integrate third-party components of 1k-10k LOC (let’s say).
I’m struggling with the 1 million LOC. I mean, why 1 million specifically? I’d wager that relatively few developers have to cope with such a system, and even those who do deal with much smaller fragments thereof. (Necessarily!) If the million-line system has a clean plug-in architecture with a small API, then it’s not clear how writing a plug-in for it differs from writing one for a system with 100k LOC or, for that matter, 10k LOC. Obviously, there are management differences, but as long as the whole system isn’t easily readable in the time available, I think the differences between those systems are immaterial. That is, being able to modify a 10k LOC program roughly predicts being able to modify a 100k LOC program or a 1 million LOC program, pace tooling and infrastructure.
It seems to me that the heart of each of these is:
- Building a non-trivial system from scratch.
- Modifying a large system in a significant way.
- Assessing and integrating (new) third-party components.
Even with this, there’s so much variance in each category that it’s not clear that what we can do in a course setting is going to transfer cleanly. That is, there’s a big difference between being able to create some 1-5k LOC program and being able to build a specific one (or a specific category of one). Someone who has built a bunch of command line utilities might struggle with building an ecommerce website even if the target LOCs are similar.
At the moment, for my class, I have them build a reverse engineered version of wc (wordcount…the GNU version). This is the first time many of them had to actually, you know, write a full fledged program. The average SLOC for their final version (a full fledged GNU --max-line-length) was 263 (min=69 and max=1084). I’ve not yet correlated this with correctness, so it’s a bit hard to say what’s going on at the extremes. I think a reasonable clone should be comfortably doable in ≈200 SLOC.
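For a sense of scale, here is a minimal sketch of the core counting logic such a clone needs. It is not the assignment’s reference solution or any student’s code; the function name count and the dictionary keys are hypothetical, and the maximum line length is deliberately simplified to a byte count (GNU wc reports display width and expands tabs, which this ignores).

```python
def count(data: bytes) -> dict:
    """Return wc-style counts for a chunk of input (simplified sketch)."""
    return {
        # wc counts newline characters, so a final line without "\n" is not counted.
        "lines": data.count(b"\n"),
        # Words are taken here as runs of bytes separated by ASCII whitespace.
        "words": len(data.split()),
        "bytes": len(data),
        # Assumption: longest line measured in bytes, not display columns.
        "max_line_length": max(len(line) for line in data.split(b"\n")),
    }


if __name__ == "__main__":
    print(count(b"hello world\nfoo\n"))
```

The counting itself is a handful of lines; most of the remaining SLOC in a faithful clone would presumably go to option handling, multiple files, totals, and output formatting.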
They also had to rework a version of their code to use the standard library module argparse. They also had to write various sorts of tests and examine the performance. I think this approximates 1 and 3, though 1 better than 3. Given the wide range of skill levels in the class, I don’t know if we could hope to do a “longer” project. I’m also not sure what the gain would be.
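As a rough illustration of what the argparse rework involves, the sketch below declares a subset of GNU wc’s options and falls back to lines, words, and bytes when none is given. The helper names build_parser and selected_counts are hypothetical, and the option subset is an assumption rather than the assignment’s actual specification.

```python
import argparse

# Order in which the selected counts are reported in this sketch.
COUNT_ORDER = ("lines", "words", "bytes", "max_line_length")


def build_parser() -> argparse.ArgumentParser:
    """Declare a subset of wc's command line (assumed, not the full GNU interface)."""
    parser = argparse.ArgumentParser(prog="wc", description="Print counts for each FILE.")
    parser.add_argument("-c", "--bytes", action="store_true", help="print the byte counts")
    parser.add_argument("-l", "--lines", action="store_true", help="print the newline counts")
    parser.add_argument("-w", "--words", action="store_true", help="print the word counts")
    parser.add_argument("-L", "--max-line-length", action="store_true",
                        help="print the maximum line length")
    parser.add_argument("files", nargs="*", metavar="FILE")
    return parser


def selected_counts(args: argparse.Namespace) -> list:
    """Return the counts to print; with no flags, default to lines, words, bytes."""
    chosen = [name for name in COUNT_ORDER if getattr(args, name)]
    return chosen or ["lines", "words", "bytes"]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(selected_counts(args), args.files)
```

Keeping the parser in its own function makes it easy to test with parse_args on an explicit list, which fits the testing part of the exercise.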
But 2 is missing. I don’t get them to engage with a significant body of existing code. That’s what I’m searching for now. My leading candidates are various static site generation systems.