Language Productivity in Terms of Function Points

One of the perennial tough nuts for software engineering research is understanding the effect of programming language on productivity. Just understanding productivity is challenging! A fairly standard metric is debugged, logical lines of source code (LOC), with the average productivity for a developer being in the teens (10-38 for large projects, maybe 140 for small). This rate is held to be language independent, so a core hypothesis for improving productivity is to increase the “power” per line of code. This is an argument for ever higher-level languages and rich libraries.

Richard Kenneth Eng has a blog post which claims that Smalltalk is a highly productive language in terms of the number of hours it takes to produce a function point. Function points are an alternative to LOC for measuring software. When counting lines, we already know to ignore comments and lines that exist only for “mere” formatting nonsense, because they don’t contribute to the actual functioning of the program. Function point analysis pushes that intuition further: rather than focusing on how many lines are needed to express a bit of functionality, it focuses on the functionality itself. In principle, this makes cross-language productivity comparisons more accurate (i.e., the more productive the language, the more function points per hour, regardless of the lines of code produced).
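To make “the functionality itself” a bit more concrete, here is a minimal sketch of counting unadjusted function points. It assumes the standard IFPUG weights for average-complexity components (external inputs, outputs, inquiries, internal logical files, external interface files); a real count also rates each component as low, average, or high and applies a value adjustment factor, both omitted here. The example app and its component counts are hypothetical.

```python
# Average-complexity weights from the IFPUG counting rules.
# (Low/high complexity weights and the value adjustment factor are omitted.)
AVERAGE_WEIGHTS = {
    "external_inputs": 4,
    "external_outputs": 5,
    "external_inquiries": 4,
    "internal_logical_files": 10,
    "external_interface_files": 7,
}


def unadjusted_function_points(counts: dict[str, int]) -> int:
    """Sum each component count times its average-complexity weight."""
    unknown = set(counts) - set(AVERAGE_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown component types: {sorted(unknown)}")
    return sum(AVERAGE_WEIGHTS[kind] * n for kind, n in counts.items())


# A hypothetical small app. The count depends only on what the app does,
# not on the language or the number of lines used to implement it.
example_app = {
    "external_inputs": 3,         # e.g., add/edit/delete forms
    "external_outputs": 2,        # e.g., two reports
    "external_inquiries": 1,      # e.g., a search screen
    "internal_logical_files": 2,  # e.g., two maintained tables
}
print(unadjusted_function_points(example_app))  # 46
```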

Smalltalk comes out on top (in his selection) in terms of the number of hours to produce 1000 function points (from 6,879 for Smalltalk up to 26,273 for C, the least productive).
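Hours per 1000 function points is easy to flip into a rate. A quick sketch using only the two endpoints quoted above:

```python
# Hours to produce 1000 function points, as quoted from the report via the blog post.
HOURS_PER_1000_FP = {"Smalltalk": 6_879, "C": 26_273}


def fp_per_100_hours(language: str) -> float:
    """Function points delivered per 100 hours of effort."""
    return 1000 / HOURS_PER_1000_FP[language] * 100


for lang in HOURS_PER_1000_FP:
    print(f"{lang}: {fp_per_100_hours(lang):.1f} FP per 100 hours")

ratio = HOURS_PER_1000_FP["C"] / HOURS_PER_1000_FP["Smalltalk"]
print(f"C takes {ratio:.2f}x the hours of Smalltalk")  # 3.82x
```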

The cited report is pretty interesting (the blog post keys in on table 16) but is big and complex. It’ll take a while to digest. One interesting bit of analysis (table 17) breaks the effort out into code and non-code work. Roughly, as productivity goes up, the percentage of time shifts between activities, e.g., for C it’s 11.42% non-code to 88.58% code, whereas for Smalltalk it’s 43.61% non-code to 56.39% code. This suggests that more effort is going into design, looking stuff up, or requirements analysis (or maybe hanging out!).
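One way to read table 17 is to convert the percentages back into hours. A rough sketch, combining the hours per 1000 function points quoted above with the code/non-code shares, and assuming the C non-code share is 11.42% (the complement of its 88.58% code share):

```python
# Hours per 1000 function points and non-code share (%), per the figures quoted above.
# Assumption: C's non-code share is 11.42%, i.e., 100% minus its 88.58% code share.
PROFILES = {
    "Smalltalk": (6_879, 43.61),
    "C": (26_273, 11.42),
}


def split_hours(total_hours: float, non_code_pct: float) -> tuple[float, float]:
    """Return (non-code hours, code hours) for a given total effort."""
    non_code = total_hours * non_code_pct / 100
    return non_code, total_hours - non_code


for lang, (total, pct) in PROFILES.items():
    non_code, code = split_hours(total, pct)
    print(f"{lang}: {non_code:,.0f} non-code h, {code:,.0f} code h")
```

If these figures hold, non-code effort comes out at roughly 3,000 hours for both languages; what shrinks is the coding effort, which is why the non-code share rises.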

As can easily be seen for very low-level languages the problems of LOC metrics are minor. But as language levels increase, a higher percentage of effort goes to non-code work while coding effort progressively gets smaller. Thus LOC metrics are invalid and hazardous for high-level languages.

It might be thought that omitting non-code effort and only showing coding may preserve the usefulness of LOC metrics, but this is not the case. Productivity is still producing deliverables for the lowest number of work hours or the lowest amount of effort.

Producing a feature in 500 lines of Objective-C at a rate of 500 LOC per month has better economic productivity than producing the same feature in 1000 lines of Java at a rate of 600 LOC per month.

Objective-C took 1 month or 149 work hours for the feature. Java took 1.66 months or 247 hours. Even though coding speed favors Java by a rate of 600 LOC per month to 500 LOC per month for Objective-C, economic productivity clearly belongs to Objective-C because of the reduced work effort.
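The arithmetic in that example is simple enough to check directly. A small sketch, taking the 149 work hours per month from the quoted figures:

```python
WORK_HOURS_PER_MONTH = 149  # from the quoted example: 1 month = 149 work hours


def effort(loc: int, loc_per_month: float) -> tuple[float, float]:
    """Months and work hours needed to write `loc` lines at a given coding rate."""
    months = loc / loc_per_month
    return months, months * WORK_HOURS_PER_MONTH


objc_months, objc_hours = effort(500, 500)
java_months, java_hours = effort(1000, 600)
print(f"Objective-C: {objc_months:.2f} months, {objc_hours:.0f} h")  # 1.00 months, 149 h
print(f"Java:        {java_months:.2f} months, {java_hours:.0f} h")  # 1.67 months, 248 h

# Same feature delivered, so economic productivity comes down to the effort ratio.
print(f"Objective-C needs {1 - objc_hours / java_hours:.0%} fewer hours")  # 40%
```

The quoted text truncates 1.667 months to 1.66, which is where its 247 hours comes from; unrounded, it is about 248.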

I don’t see the methodology for this work (and they use “mathematical proof” in a weird way). This makes me a bit sad because it really means that citing these numbers is dubious.

We already discuss lines of code as a complexity metric in our class (using chapter 8 of Making Software). It would be interesting to try to introduce function point analysis at least conceptually.


A Panoply of SQL Injection Horrors

I hope that this year we’ll be able to migrate our Data on the Web course to Python and to focus a bit on manipulating data and formats we design.

Which means we can talk about APIs and the crappiness of string hacking for anything. Thus, SQL Injection!

The Code Curmudgeon maintains a SQL Injection Hall-of-Shame which is fascinating and depressing reading. (The page includes helpful links including the invaluable SQL Injection Prevention Cheat Sheet.)

On the one hand, the lesson seems to write itself. On the other, it’s really important to teach this stuff!
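The classic demonstration is short. A minimal sketch using Python’s built-in sqlite3 module and an in-memory database (the table, its contents, and the payload are made up for illustration): building the query by string concatenation lets the input rewrite the SQL, while a ? placeholder passes it as plain data.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice-secret"), ("bob", "bob-secret")],
    )
    return conn


def lookup_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # BAD: the input becomes part of the SQL text itself.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def lookup_safe(conn: sqlite3.Connection, name: str) -> list:
    # GOOD: the driver binds the input as a value, never as SQL.
    return conn.execute("SELECT secret FROM users WHERE name = ?", (name,)).fetchall()


conn = make_db()
payload = "nobody' OR '1'='1"
print(len(lookup_unsafe(conn, payload)))  # 2: every secret leaks
print(len(lookup_safe(conn, payload)))    # 0: no user has that literal name
```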

(I’ll throw the XSS Prevention Cheat Sheet on here too.)

A Reason to Learn/Study Obscure Programming Languages

Kasper Peulen looked at some newish programming languages (think Swift) and made a nice top 10 list of cool features he’d like to see get wider adoption, such as destructuring or conditions as expressions rather than statements.
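Neither feature is exotic. As a quick illustration, Python (used for the sketches here) already has versions of both: sequence destructuring, and a conditional expression that yields a value rather than being a statement. The function names below are hypothetical.

```python
def sign_label(n: int) -> str:
    # The condition is an expression that yields a value, not a statement.
    return "negative" if n < 0 else ("zero" if n == 0 else "positive")


def swap(pair: tuple[int, int]) -> tuple[int, int]:
    a, b = pair  # destructure the tuple into two names
    return b, a


first, *rest = [3, 1, 4, 1, 5]  # destructuring with a starred "rest" binding
print(first, rest)     # 3 [1, 4, 1, 5]
print(swap((1, 2)))    # (2, 1)
print(sign_label(-4))  # negative
```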

In an update (prompted by pushback, I’d guess), he acknowledged that these were not novel:

Update: All the examples above are from Reason, Swift, Kotlin and Dart. However, many of the ideas above can already be found in much older languages such as Lisp (1958), Smalltalk (1972), Objective-C (1984), Haskell (1990), OCaml (1996) and many more. So while the examples are from “modern” languages, the ideas in this article are actually very “old”. (*)

Hence, the title of my post. You can learn a lot about the probable future of hot new languages by looking at old obscure ones.

Though perhaps you can flip it around and say “Don’t bother with the obscure. If a feature is good it will eventually show up in a hot language.”

Programming languages embody ideas about software engineering in a deep way. A language’s whole ecosystem does, but built-in language features tend to be the most ubiquitous.

Worse is Better and Back Again

Richard Gabriel, 1991

I and just about every designer of Common Lisp and CLOS has had extreme exposure to the MIT/Stanford style of design. The essence of this style can be captured by the phrase the right thing. To such a designer it is important to get all of the following characteristics right:

  • Simplicity — the design must be simple, both in implementation and interface. It is more important for the interface to be simple than the implementation.
  • Correctness — the design must be correct in all observable aspects. Incorrectness is simply not allowed.
  • Consistency — the design must not be inconsistent. A design is allowed to be slightly less simple and less complete to avoid inconsistency. Consistency is as important as correctness.
  • Completeness — the design must cover as many important situations as is practical. All reasonably expected cases must be covered. Simplicity is not allowed to overly reduce completeness.

I believe most people would agree that these are good characteristics. I will call the use of this philosophy of design the MIT approach. Common Lisp (with CLOS) and Scheme represent the MIT approach to design and implementation.

The worse-is-better philosophy is only slightly different:

  • Simplicity — the design must be simple, both in implementation and interface. It is more important for the implementation to be simple than the interface. Simplicity is the most important consideration in a design.
  • Correctness — the design must be correct in all observable aspects. It is slightly better to be simple than correct.
  • Consistency — the design must not be overly inconsistent. Consistency can be sacrificed for simplicity in some cases, but it is better to drop those parts of the design that deal with less common circumstances than to introduce either implementational complexity or inconsistency.
  • Completeness — the design must cover as many important situations as is practical. All reasonably expected cases should be covered. Completeness can be sacrificed in favor of any other quality. In fact, completeness must be sacrificed whenever implementation simplicity is jeopardized. Consistency can be sacrificed to achieve completeness if simplicity is retained; especially worthless is consistency of interface.

Early Unix and C are examples of the use of this school of design, and I will call the use of this design strategy the New Jersey approach. I have intentionally caricatured the worse-is-better philosophy to convince you that it is obviously a bad philosophy and that the New Jersey approach is a bad approach.

However, I believe that worse-is-better, even in its strawman form, has better survival characteristics than the-right-thing, and that the New Jersey approach when used for software is a better approach than the MIT approach.

Olin Shivers, 1998

* Preamble: 100% and 80% solutions
There’s a problem with tool design in the free software and academic
community. The tool designers are usually people who are building tools for
some larger goal. For example, let’s take the case of someone who wants to do
web hacking in Scheme. His Scheme system doesn’t have a sockets interface, so
he sits down and hacks one up for his particular Scheme implementation. Now,
socket API’s are not what this programmer is interested in; he wants to get on
with things and hack the exciting stuff — his real interest is Web services.
So he does a quick 80% job, which is adequate to get him up and running, and
then he’s on to his original goal.

Unfortunately, his quickly-built socket interface isn’t general. It just
covers the bits this particular hacker needed for his applications. So the
next guy that comes along and needs a socket interface can’t use this one.
Not only does it lack coverage, but the deep structure wasn’t thought out well
enough to allow for quality extension. So *he* does his *own* 80%
implementation. Five hackers later, five different, incompatible, ungeneral
implementations had been built. No one can use each other’s code.

The alternate way systems like this end up going over a cliff is that the
initial 80% system gets patched over and over again by subsequent hackers, and
what results is 80% bandaids and 20% structured code. When systems evolve
organically, it’s unsurprising and unavoidable that what one ends up with is a
horrible design — consider the DOS -> Win95 path.

As an alternative to five hackers doing five 80% solutions of the same
problem, we would be better off if each programmer picked a different task,
and really thought it through — a 100% solution. Then each time a programmer
solved a problem, no one else would have to redo the effort. Of course, it’s
true that 100% solutions are significantly harder to design and build than 80%
solutions. But they have one tremendous labor-savings advantage: you don’t
have to constantly reinvent the wheel. The up-front investment buys you
forward progress; you aren’t trapped endlessly reinventing the same awkward wheel.

But here’s what I’d really like: instead of tweaking regexps, you go do your
own 100% design or two. Because I’d like to use them. If everyone does just
one, then that’s all anyone has to do.

Kevlin Henney, 2017

A common problem in component frameworks, class libraries, foundation services, and other infrastructure code is that many are designed to be general purpose without reference to concrete applications. This leads to a dizzying array of options and possibilities that are often unused or misused — or just not useful.

Generally, developers work on specific systems; specifically, the quest for unbounded generality rarely serves them well (if at all). The best route to generality is through understanding known, specific examples, focusing on their essence to find an essential common solution. Simplicity through experience rather than generality through guesswork.

Speculative generality accumulates baggage that becomes difficult or impossible to shift, thereby adding to the accidental complexity those in development must face in future.

Although many architects value generality, it should not be unconditional. People do not on the whole pay for — or need — generality: they tend to have a specific situation, and it is a solution to that specific situation that has value.

We can find generality and flexibility in trying to deliver specific solutions, but if we weigh anchor and forget the specifics too soon, we end up adrift in a sea of nebulous possibilities, a world of tricky configuration options, overloaded and overburdened parameter lists, long-winded interfaces, and not-quite-right abstractions. In pursuit of arbitrary flexibility, you can often lose valuable properties — whether intended or accidental — of alternative, simpler designs.

Ok, the last one is a bit more…specific…than the first two. But it’s fun to read it in juxtaposition with them. One way to try to bridge the difference between Henney and Shivers is to note that Shivers is saying we need more 100% designs, while Henney is saying we need a lot of specific experience to get to a good 100% design. But then the difference becomes starker…Shivers doesn’t want people to hack up a bunch of 80% solutions, while Henney, roughly, thinks we have to have them before we have any hope of a right 100% one.

My heart is with Shivers, but my head is with Henney.

I think I have some readings and an exam question for next year’s class.

Unit Tests’ Effects on “Testability”

If your code base looks like this attic, I pity you.

Unit tests seem to be generally regarded as critical for quality code bases. Prima facie, the key effect we might expect is correctness. After all, that’s what most unit tests aim to test!

Unit testing may not be sufficient for correctness (no test approach is, in the extreme) but it does seem that having lots of tests should promote correctness.

However, this primary, most direct outcome is not the only outcome we might expect:

  1. Proponents of test-driven development argue that having good unit tests promotes refactoring and other good practices: you can make changes “with confidence”, since your tests protect you from unintended effects.
  2. Some units are harder to test than others, i.e., they are less testable. Intuitively, long functions or methods, complex ones with lots of code paths, and complex signatures all make a given unit hard to test! So we might expect that writing lots of tests tends to promote testable code. We might expect synergy with 1.

It all sounds plausible (but defeatable). But what does reality say?

We are living in a golden age for empirical study of software engineering in many ways. There’s so much stuff freely accessible on the web (code of all sorts, with revision history, and a vast amount of side matter…issue and mailing lists, documentation, etc). It’s a lot easier to get a survey or experiment going.

That’s what Erik Dietrich did in a very nice blog post. He looked at 100 projects off of GitHub, characterized them, and binned them by the percentage of methods that were test methods. If 50% of your methods are test methods, it’s a pretty good bet that your code is heavily tested.

Right off the bat, we have a striking result:

Of the 100 codebases, 70 had unit tests, while 30 did not.

(I’m really loving the WordPress iPhone app EXCEPT for the fact that I can’t strip formatting when pasting text and can’t keep that formatting from contaminating subsequent text. That sucks, WP, especially FOR A FREAKING BLOGGING APP!!!

Update: It seems that the formatting nonsense is only in the app but doesn’t come through in the actual post. Yay!)

This could be an artifact of his detector or maybe the tests are elsewhere. Still!

Overall, only 5 of his 10 very natural hypotheses were correct. For example, testing anticorrelated with method length and complexity.

For cyclomatic complexity…this may not be surprising. You generally need more tests to hit all the code paths. Also, as supported by “Beyond Lines of Code: Do We Need More Complexity Metrics?” (from the awesome Making Software, which needs a second edition!!), complexity metrics, including cyclomatic complexity, tend to correlate closely with lines of code. So larger methods and more complex methods are going to march together (and probably deeper nesting too).
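Why cyclomatic complexity and test count march together: McCabe’s number is one plus the number of decision points, and it equals the number of linearly independent paths through the code, which is how many test cases basis-path testing calls for. Below is a simplified sketch for Python source using the standard ast module. It counts only if/elif, for/while loops, conditional expressions, except clauses, and extra and/or operands, so it approximates the metric rather than reproducing any particular tool.

```python
import ast

# Node types that each add one decision point (a simplified McCabe count;
# comprehension filters, match cases, etc. are ignored).
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # `a or b or c` can short-circuit at two points.
            decisions += len(node.values) - 1
    return decisions + 1


SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0 or n == 1:
        return "small"
    return "large"
'''
print(cyclomatic_complexity(SAMPLE))  # 4
```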

In any case, this is a very nice start.

Grumpy about Textbooks

I definitely need to do more research but I don’t feel that there is a really solid textbook on software engineering. I use Steve McConnell’s Code Complete (second edition) and Making Software for readings.

These are both pretty good. Code Complete is a bible for many people (not for me!) but regardless it’s definitely on a “you should read this if you are a software engineer” list. It has a few problems though:

  1. It’s not written with courses in mind, as far as I can tell. It introduces a lot of stuff, sometimes in a helpful order, but other times not. The “learning objectives” are not clear at all.
  2. It’s not super well written. You get a lot of interesting lists (e.g., of program qualities) but they are often not coherent, have some redundancies, and are perfunctorily designed. These often feel revelatory on a first read, but if you try to work with them you get a bit grumpy. For example, we have 4 kinds of tests: unit, component, integration, and system. Unit and component tests test bits of the same size: a unit. The difference is whether the unit is maintained by one team (thus a unit test) or more than one team (a component test). This is bonkers. It’s esp. bonkers to compare with integration or system tests. It could be part of an interesting axis (who’s the owner vs. who’s writing the tests). But there are much better frameworks out there.
  3. It’s a bit dated. The second edition came out in 2004 and is thus 12 years old. This doesn’t invalidate it per se, but given that the book itself prominently discusses the need for lifelong learning because the fundamentals of software engineering keep changing, it’s a problem. I’d prefer something other than Basic as the “other” example language.
  4. It pretends to focus on code construction, but has just enough architecture, etc. to be almost a reasonably complete text. But the scattershot approach is a bit disorienting.

If you read it cover to cover, absorbed it all with an appropriately skeptical eye, and organised it sensibly, then you’d be in great shape.

My pal Mark suggested reorienting around The Pragmatic Programmer, which is another classic and definitely on the must-read list. But a lot of my concerns apply to it too. (That there’s a basic divide between those pushing Code Complete and those pushing The Pragmatic Programmer is interesting. The line I’ve seen is that Code Complete aspires to be encyclopaedic, while The Pragmatic Programmer is more opinionated and thus more effective. Roughly. They both feel scattered to me.)

I could try both (not this year!). I could go with The Pragmatic Programmer because it’s smaller, so students could plausibly read the whole thing.

But neither feels satisfactory as a textbook. The systematicity and pedagogical logic just don’t seem to be there, so I’m left imposing some order on them.

APIs on the Web Platform

A blog post about Microsoft Edge (their new browser) contains an extraordinary tidbit. They are talking about compatibility with other browsers, and one metric is “shared APIs”. Then they have this nifty little table:

Shared APIs with:       Google Chrome 48           Apple Safari 9
Internet Explorer 11    4076 shared APIs           3769 shared APIs
EdgeHTML 13             4724 shared APIs (+16%)    4157 shared APIs (+10%)
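“Shared APIs” presumably means the size of the intersection between two browsers’ API sets. A toy sketch of that reading (the API inventories are tiny placeholders, not the real lists), plus a check of the relative improvements in the table:

```python
def shared_apis(a: set[str], b: set[str]) -> int:
    """Number of APIs present in both browsers (set intersection size)."""
    return len(a & b)


def pct_increase(old: int, new: int) -> float:
    """Relative increase from old to new, in percent."""
    return 100 * (new - old) / old


# Toy API inventories (hypothetical, for illustration only).
browser_x = {"fetch", "localStorage", "WebSocket", "Notification"}
browser_y = {"fetch", "localStorage", "WebSocket"}
print(shared_apis(browser_x, browser_y))  # 3

# Shared-API counts from the table: Internet Explorer 11 vs EdgeHTML 13.
print(f"vs Chrome 48: +{pct_increase(4076, 4724):.1f}%")  # +15.9%
print(f"vs Safari 9:  +{pct_increase(3769, 4157):.1f}%")  # +10.3%
```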

Ok, clear improvement, but what’s staggering is the sheer number of APIs to share!!!

Is there even a publicly available list of these APIs?! And 4724 should be regarded as a lower bound on the number of APIs (even standard APIs)! One of the comments complains about the lack of RSS support! So even very common APIs haven’t made it in yet.

The web platform is extraordinarily complex.

I am practicing British understatement.