Language Productivity in Terms of Function Points

One of the perennial tough nuts for software engineering research is understanding the effect of programming language on productivity. Just understanding productivity is challenging! A fairly standard metric is debugged, logical lines of source code (LOC), with average developer productivity being in the low tens of lines (10-38 for large projects, maybe 140 for small ones). This is held to be language independent, thus a core hypothesis for improving productivity is to increase the “power” per line of code. This is an argument for higher- and higher-level languages and rich libraries.

Richard Kenneth Eng has a blog post which claims that Smalltalk is a highly productive language in terms of the number of hours it takes to produce a function point. Function points are an alternative metric to LOC for measuring software. Just as, when counting lines, we know to ignore comments or “mere” formatting because they don’t contribute to the actual functioning of the program, function point analysis says not to focus on how many lines are needed to express a bit of functionality but on the functionality itself. In principle, this would make cross-language productivity comparisons more accurate (i.e., the more productive the language, the more function points per hour, regardless of the lines of code produced).

Smalltalk comes out tops (in his selection) in terms of the number of hours to produce 1000 function points (6,879 for Smalltalk up to 26,273 for C…the least productive).

The cited report is pretty interesting (the blog post keys in on table 16) but is big and complex. It’ll take a while to digest. One interesting bit of analysis (table 17) is breaking out the effort into code and non-code work. Roughly, as productivity goes up, the percentage of time shifts between activities, e.g., for C it’s 11.42% non-code to 88.58% code, whereas for Smalltalk it’s 43.61% non-code to 56.39% code. This suggests that more effort is going into design, looking stuff up, or requirements analysis (or maybe hanging out!).

As can easily be seen for very low-level languages the problems of LOC metrics are minor. But as language levels increase, a higher percentage of effort goes to non-code work while coding effort progressively gets smaller. Thus LOC metrics are invalid and hazardous for high-level languages.

It might be thought that omitting non-code effort and only showing coding may preserve the usefulness of LOC metrics, but this is not the case. Productivity is still producing a deliverable for the lowest number of work hours or the lowest amount of effort.

Producing a feature in 500 lines of Objective-C at a rate of 500 LOC per month has better economic productivity than producing the same feature in 1000 lines of Java at a rate of 600 LOC per month.

Objective-C took 1 month or 149 work hours for the feature. Java took 1.66 months or 247 hours. Even though coding speed favors Java by a rate of 600 LOC per month to 500 LOC per month for Objective-C, economic productivity clearly belongs to Objective-C because of the reduced work effort.
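
A quick check of the quoted arithmetic in Python, taking 149 work hours per month as the report does:

    # Sanity-check of the quoted Objective-C vs Java example,
    # using the report's figure of 149 work hours per calendar month.
    HOURS_PER_MONTH = 149

    objc_months = 500 / 500      # 500 LOC at 500 LOC/month -> 1.0 month
    java_months = 1000 / 600     # 1000 LOC at 600 LOC/month -> ~1.67 months

    print(objc_months * HOURS_PER_MONTH)  # 149.0 hours
    print(java_months * HOURS_PER_MONTH)  # ~248 hours (247 if, like the report,
                                          # you round the months to 1.66 first)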

I don’t see the methodology for this work (and they use “mathematical proof” in a weird way). This makes me a bit sad because it really means that citing these numbers is dubious.

We already discuss lines of code as a complexity metric in our class (using chapter 8 of Making Software). It would be interesting to try to introduce function point analysis at least conceptually.


A Panoply of SQL Injection Horrors

I hope that this year we’ll be able to migrate our Data on the Web course to Python and to focus a bit on manipulating data and formats we design.

Which means we can talk about APIs and the crappiness of string hacking for anything. Thus, SQL Injection!

The Code Curmudgeon maintains a SQL Injection Hall-of-Shame which is fascinating and depressing reading. (The page includes helpful links including the invaluable SQL Injection Prevention Cheat Sheet.)
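
For the lesson itself, the contrast fits in a dozen lines. Here’s a minimal Python/sqlite3 sketch (the table and data are invented for illustration): the first version builds the query by string hacking, the second uses the driver’s placeholders.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'secret')")

    def login_by_string_hacking(name, password):
        # Attacker-controlled input is pasted straight into the SQL text.
        query = ("SELECT * FROM users WHERE name = '%s' AND password = '%s'"
                 % (name, password))
        return conn.execute(query).fetchall()

    def login_parameterized(name, password):
        # Placeholders keep data and SQL separate; the driver handles quoting.
        query = "SELECT * FROM users WHERE name = ? AND password = ?"
        return conn.execute(query, (name, password)).fetchall()

    evil = "' OR '1'='1"
    print(login_by_string_hacking("alice", evil))  # logs in without the password
    print(login_parameterized("alice", evil))      # [] -- treated as a literal string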

On the one hand, the lesson seems to write itself. On the other, it’s really important to teach this stuff!

(I’ll throw the XSS Prevention Cheat Sheet on here too.)

Constant Time Code

Timing attacks on crypto are on the rise. (They are one class of side channel attacks. In general, side channel attacks are very tricky.)

Most software has multiple execution paths and the time (or other resources) it takes to follow different paths can vary considerably. Indeed, one key aspect of efficient programs is handling potential “fast paths” in an actually fast way. But even if you aren’t breaking out some paths as optimisations, normal program structuring leaves you with programs whose performance is sensitive in specific ways to input (of otherwise comparable size). This can allow attackers to infer things, including sensitive information.

One way to combat this is to make all execution paths take (essentially) the same amount of time. For example, suppose I have a short-circuiting Boolean operation “shortOp OR longOp”. Since my OR will only execute “longOp” if “shortOp” fails, I have two rather different execution paths. If I replace it with a non-short-circuiting “OR”, i.e. one that always evaluates all its operands, then I’ve made this test constant time… well, as long as my functions are constant time and there are no surprises from the compiler when optimising. To a first approximation, there are always surprises. At the very least, I need to check the output of my compiler.
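
To make that concrete, here’s a toy Python sketch of the accumulate-instead-of-short-circuit idea applied to comparing secrets. It’s only an illustration: a high-level runtime gives no hard timing guarantees, which is why real code reaches for the standard library’s hmac.compare_digest (and still checks what actually runs).

    import hmac

    def check_secret_leaky(supplied: bytes, secret: bytes) -> bool:
        # Early exit: time depends on how many leading bytes match, which is
        # exactly the input-dependent behaviour a timing attacker can probe.
        if len(supplied) != len(secret):
            return False
        for a, b in zip(supplied, secret):
            if a != b:
                return False
        return True

    def check_secret_flat(supplied: bytes, secret: bytes) -> bool:
        # Accumulate mismatches with a non-short-circuiting bitwise OR, so every
        # byte gets looked at no matter where the first difference is.
        if len(supplied) != len(secret):
            return False
        diff = 0
        for a, b in zip(supplied, secret):
            diff |= a ^ b
        return diff == 0

    # The standard library ships a constant-time comparison helper:
    print(hmac.compare_digest(b"hunter2", b"hunter2"))  # True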

The paper “FaCT: A Flexible, Constant-Time Programming Language” presents a new programming language and tool chain designed to support the implementation of constant-time, and thus timing-attack-resistant, functions.

It’s worth reading for part II alone which is a tour of vulnerabilities of normal C code and some standard mitigating tricks. Their solution seems interesting:

FaCT is designed to: (1) allow developers to easily write idiomatic code that runs in constant time, (2) be flexible enough to express real-world crypto code, (3) interoperate with C code, (4) produce fast assembly code, and (5) be verified to be resilient against timing attacks.

The DSL looks pretty neat and the use of verifiers and solvers as key points is fun. There’s no empirical evaluation so whether it helps is still open.

It’d be interesting to try to embed this DSL in a language like Rust. I’d think you’d have to do it at the language level and not using the macro facilities but I’m not sure. You definitely need to perform verification checks late in the compilation process and that might not be easily accessible from the language level.

Addendum: While cleaning up tabs, I found an interesting blog post on writing a “branchless” UTF-8 decoder. The goal there was performance by helping pipelining. This would also avoid speculative execution for the decoder, I’m guessing.

What’s up with NoSQL?

A blog post (by a database vendor) caught my eye entitled Why SQL is beating NoSQL, and what this means for the future of data. For years I’ve co-taught a course roughly on the prior generation of anti-SQL database tech (primarily XML, all under the rubric of “semi-structured data”). For a similar time frame, I’ve researched and advocated for a sort of “anti-SQL” information tech: the RDF stack and ontologies. And I remember the MySQL arguments against ACIDity, which were a kind of anti-SQL move. Most of the time, the arguments are more against either the relational calculus (too inexpressive!) or the tech infrastructure (doesn’t scale!). Modern NoSQL was more focused on the latter, so it’d be no surprise if, as the infrastructure catches up on scaling, SQL made a comeback. If your objection to relational is performance, that’s not a good long-term bet. Infrastructure catches up, and crappy piles of code doing all sorts of little query, maintenance, and analytical jobs suck. Big piles of SQL suck too, but they suck so much less:

Developers soon found that not having SQL was actually quite limiting. Each NoSQL database offered its own unique query language, which meant: more languages to learn (and to teach to your coworkers); increased difficulty in connecting these databases to applications, leading to tons of brittle glue code; a lack of a third party ecosystem, requiring companies to develop their own operational and visualization tools.

The post gives a reasonable (if polemical) history and observes that the infrastructure is catching up:

First came the SQL interfaces on top of Hadoop (and later, Spark), leading the industry to “back-cronym” NoSQL to “Not Only SQL” (yeah, nice try).

Then came the rise of NewSQL: new scalable databases that fully embraced SQL. H-Store (published 2008) from MIT and Brown researchers was one of the first scale-out OLTP databases. Google again led the way for a geo-replicated SQL-interfaced database with their first Spanner paper (published 2012) (whose authors include the original MapReduce authors), followed by other pioneers like CockroachDB (2014).

This is all fine…buuuuuut…it doesn’t actually tell you about market or mindshare. Boo! We want numbers!

And…the article presumes that NoSQL had been winning and that SQL has now made a comeback! That seems unlikely, especially over the given time frame.

A bit of searching shows a different picture.

First, there’s this hilarious InfoWorld article from 2016 entitled “NoSQL chips away at Oracle, IBM, and Microsoft dominance”.

Back in 2014, Network World’s Brandon Butler declared that NoSQL was “giving SQL database vendors and users a scare,” and a year later InfoWorld’s Andy Oliver quipped that “the once red-hot database technology is losing its luster, as NoSQL reaches mass adoption,” becoming boringly mainstream.

the SQL incumbents must be a little nervous. A new Gartner report suggests that NoSQL continues to kick the shins of its legacy RDBMS competition.

The bad news, however, is that their dominance is slipping,

Oooo! Scaaaary!!!! But wait:

Yet relational database vendors continue to print money; the NoSQL contenders, many of which are open source — not so much.

A new Gartner report suggests that NoSQL continues to kick the shins of its legacy RDBMS competition. As Gartner analyst Merv Adrian notes, “Over the past five years, the megavendors have collectively lost share,” dropping 2 percentage points to a still-hegemonic 89 percent market share.

 

All this big data infrastructure, in short, is only 3 percent of the overall paid DBMS market.

(This is a friendly reminder that prima facie you should treat Gartner reports as garbage. If the garbage is helpful because it pimps your stuff, fine. That’s advertising. Otherwise, ignore.)

But take revenue out of the equation, and cracks in the DBMS market share numbers start to appear. According to DB-Engines — which measures database popularity across a range of factors (including job listings and search interest) but excludes revenue numbers — Oracle, Microsoft, and IBM are joined at the top by some noisy neighbors:

Ok! This is a possible story. Commodification and expanding the un- and underpaid database market might be a thing (maybe). But let’s look at the DB-Engines trend line:

A ranking based on a bunch of indirect metrics.

Yeah, see that big cluster of lines up top? Oracle, MySQL, and SQL server. See that orange line that rises up a tiny bit at the end? PostgreSQL. This is a reminder not to trust InfoWorld.

Now their ranking definition is interesting: It’s basically “web presence”, total search results, mentions in job adverts, questions about the systems, etc. Established base is going to skew these numbers toward old tech and hype toward new tech. Maybe these balance out?

A StackOverflow survey doesn’t have any trendlines, but does say something about recent DB popularity:

So, NoSQL as a (paid or unpaid) market force seems to be a bit of a bust at least relative to the hype.

Evaluating tech (and tech trends) is hard!

A Couple of Weeks with a Vivosmart 3

My Fitbit One gave up the ghost (again) and they don’t make them anymore. (Vicki has donated her unused one so I’m looking forward to that!) I have a Withings O2 whatever, but I didn’t really feel it. I went for several months without an activity tracker and…my activity went down. Since I’m fighting some sort of fatigue thing and exercise is the prescription, that’s bad.

The Vivosmart 3 claims to estimate heart rate variability, which is the physiological marker for stress. (Robotic heartbeat happens when you’re stressed, at all heart rates, and is thought to be what damages the heart.) I’d read a bit about it for a 3rd year project I’d supervised and was skeptical it would work. But why not experiment? I wanted to try continuous heart rate monitoring anyway.

Here are some first impressions.

  • I can sorta live with it on my wrist though I’d prefer nothing on my wrist.
  • The display triggers are flaky. Twist and lift is meh but so is double tap (you have to pound it). There’s no way afaict to have only double tap or easily sleep the display. (Reviews warned me.)
  • If you pick the watchface with heart rate you don’t see battery life. The phone app doesn’t show battery life. You don’t get an email warning (the way Fitbit gives you one). To find battery life is a double tap, press and hold, at least three swipes, a tap, a couple more swipes. I mean, fuck you Garmin. This is some grade A extra large bullshit. Forum responses which say that it’s technically impossible to display battery life in the app are filthy, trumpian lies.
  • It seems “generous” on steps, calories, etc. even more so than many. Conversely, it’s stingy about floors climbed (as a review warned me) and if you’re going to show descents…try to avoid flaking when I climb and descend an equal number of floors. (I mean, I climbed 6 and descended 6 and it said maybe that I climbed 8 and descended…3. I’ve never seen a climb estimator that flaky before.)
  • Activity detection is meh. I’m still not sure whether it includes the “trigger time” (i.e. the period of elevated behavior that indicates a defined activity). It’s real quick to stop those so if I interrupt a walk to buy something at a store I get…two walks. Not sure that’s helpful.
  • Sleep detection was good until last night when it failed. UPDATE: it figured it out 15 hours later. Some server hiccup I guess.
  • They don’t let you add, merge, or otherwise manipulate activities. Which is super dumb!
  • They don’t let you add, merge, or otherwise manipulate sleep (except to trim the ends of the one sleep period they allow). Hello, naps? Or just adding my sleep. All these events do is trigger certain kinds of analyses of a period of data. Let me trigger that for whenever.
  • They have some sort of heart rate zone stuff but unless you’re happy to pick a zone for your activity and stick with it, the reminders are annoying. There’s no *analytical* presentation of zone data (i.e. For this power walk how long was I in zone 2 and zone 3). UPDATE: if you swipe to the heart rate screen during an activity it will also show you the zone. That’s something! Why it doesn’t show me zone info post facto (e.g. you spend x minutes in zone 1, y in 2, etc.) eludes me.
  • There’s no heart rate recovery analysis. Which is super dumb. That’s trivial to do in software and a PITA to do manually (see the sketch after this list). It’s an important indicator!
  • I never ever want to see “pace”. Just show me MPH, ok?
  • The app is demon spawn. Pretty enough but space wasting, a twisty maze of screens and menus and weird things placed weirdly. Setting a silent alarm is an adventure. Finding the heart rate zone stuff is a nightmare. I have no idea how to add distance in miles to the app (it’s on the device! Maybe it’s like battery life?!)
  • Continuous heart rate monitoring is awesome. It seems accurate. I have just started experiencing some flakiness.
  • The stress stuff is probably bogus.
  • Either accelerometers can’t measure treadmill steps correctly or my treadmill isn’t well calibrated. (I believe the latter, which conceptually bugs me. How hard is it for a treadmill to get distance right?!? And yet it gives me a pretty tough workout for me.) This borks any VO2max calculation, I’m pretty sure. Oh well. UPDATE: So, on the one hand, there’s definitely a mismatch between the treadmill and the tracker, to the treadmill’s detriment. I mean the 4mph doesn’t feel like what the tracker wants 4mph to feel like, and the tracker wants something closer to the GPS. Even 5mph on the treadmill is pretty slow. OTOH, 6.5-8 require actual running. Soooo…. The amusing side effect is that my computed VO2Max has gone down as I’ve clearly gotten fitter. It took a nose dive today (my fitness age went from 60 to 66 ;)) in spite of my handling a tougher workout well (i.e. same treadmill distance in 15 rather than 20 minutes). Now maybe it’s just getting better data over time? Could be, but it seems worthless. I can do a specific effort to measure it but I don’t want to do it on that treadmill if it’s going to be this off. It’s amazing to me that they offer this and not simpler measures like heart rate recovery. Track that over time!!
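
As promised above, here’s how little software heart rate recovery needs. This is a hypothetical sketch: the function name and the per-second sample layout are made up, since I don’t know what Garmin actually exposes.

    # Heart rate recovery: HR at the end of an activity minus HR one minute later.
    def hr_recovery(samples, activity_end_index, window_seconds=60):
        """samples: heart-rate readings in bpm, one per second."""
        end_hr = samples[activity_end_index]
        later_hr = samples[activity_end_index + window_seconds]
        return end_hr - later_hr

    # e.g. 162 bpm at the end of a walk, 131 bpm a minute later -> recovery of 31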

Just to give you a feel, here’s what the app looks like when you open it:

They really really want you to scroll, swipe, and click. A lot.

UPDATE: The “collapse view” is actually reasonable:

But only if you don’t have any activities! Activities remain non-compact and are on top, which is why I thought collapse didn’t do anything.

A Reason to Learn/Study Obscure Programming Languages

Kasper Peulen looked at some newish programming languages (think Swift) and made a nice top 10 list of cool features he’d like to see get wider adoption, such as destructuring or conditionals as expressions rather than statements.
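
Neither feature needs a new language to play with. Here are rough Python approximations of the two I just named (the data is made up for illustration):

    # Destructuring: bind names to the pieces of a structure in one go.
    point = (3, 4)
    x, y = point

    first, *rest = [10, 20, 30, 40]   # first = 10, rest = [20, 30, 40]

    # Conditionals as expressions: the branch yields a value, rather than a
    # statement mutating some variable from inside an if/else.
    label = "origin" if (x, y) == (0, 0) else "elsewhere"
    print(label, first, rest)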

In an update (produced by pushback, I’d guess) he acknowledged that these were not novel:

Update: All the examples above are from Reason, Swift, Kotlin and Dart. However, many of the ideas above can already be found in much older languages such as Lisp (1958), Smalltalk (1972), Objective-C (1984), Haskell (1990), OCaml (1996) and many more. So while the examples are from “modern” languages, the ideas in this article are actually very “old”. (*)

Hence, the title of my post. You can learn a lot about the probable future of hot new languages by looking at old obscure ones.

Though perhaps you can flip it around and say “Don’t bother with the obscure. If a feature is good it will eventually show up in a hot language.”

Programming languages embody in a deep way ideas about software engineering. All of a language’s ecosystem does, but built in language features tend to be the most ubiquitous.

Worse is Better and Back Again

Richard Gabriel, 1991

I and just about every designer of Common Lisp and CLOS has had extreme exposure to the MIT/Stanford style of design. The essence of this style can be captured by the phrase the right thing. To such a designer it is important to get all of the following characteristics right:

  • Simplicity — the design must be simple, both in implementation and interface. It is more important for the interface to be simple than the implementation.
  • Correctness — the design must be correct in all observable aspects. Incorrectness is simply not allowed.
  • Consistency — the design must not be inconsistent. A design is allowed to be slightly less simple and less complete to avoid inconsistency. Consistency is as important as correctness.
  • Completeness — the design must cover as many important situations as is practical. All reasonably expected cases must be covered. Simplicity is not allowed to overly reduce completeness.

I believe most people would agree that these are good characteristics. I will call the use of this philosophy of design the MIT approach. Common Lisp (with CLOS) and Scheme represent the MIT approach to design and implementation.

The worse-is-better philosophy is only slightly different:

  • Simplicity — the design must be simple, both in implementation and interface. It is more important for the implementation to be simple than the interface. Simplicity is the most important consideration in a design.
  • Correctness — the design must be correct in all observable aspects. It is slightly better to be simple than correct.
  • Consistency — the design must not be overly inconsistent. Consistency can be sacrificed for simplicity in some cases, but it is better to drop those parts of the design that deal with less common circumstances than to introduce either implementational complexity or inconsistency.
  • Completeness — the design must cover as many important situations as is practical. All reasonably expected cases should be covered. Completeness can be sacrificed in favor of any other quality. In fact, completeness must be sacrificed whenever implementation simplicity is jeopardized. Consistency can be sacrificed to achieve completeness if simplicity is retained; especially worthless is consistency of interface.

Early Unix and C are examples of the use of this school of design, and I will call the use of this design strategy the New Jersey approach. I have intentionally caricatured the worse-is-better philosophy to convince you that it is obviously a bad philosophy and that the New Jersey approach is a bad approach.

However, I believe that worse-is-better, even in its strawman form, has better survival characteristics than the-right-thing, and that the New Jersey approach when used for software is a better approach than the MIT approach.

Olin Shivers, 1998

* Preamble: 100% and 80% solutions
———————————-
There’s a problem with tool design in the free software and academic
community. The tool designers are usually people who are building tools for
some larger goal. For example, let’s take the case of someone who wants to do
web hacking in Scheme. His Scheme system doesn’t have a sockets interface, so
he sits down and hacks one up for his particular Scheme implementation. Now,
socket API’s are not what this programmer is interested in; he wants to get on
with things and hack the exciting stuff — his real interest is Web services.
So he does a quick 80% job, which is adequate to get him up and running, and
then he’s on to his original goal.

Unfortunately, his quickly-built socket interface isn’t general. It just
covers the bits this particular hacker needed for his applications. So the
next guy that comes along and needs a socket interface can’t use this one.
Not only does it lack coverage, but the deep structure wasn’t thought out well
enough to allow for quality extension. So *he* does his *own* 80%
implementation. Five hackers later, five different, incompatible, ungeneral
implementations had been built. No one can use each other’s code.

The alternate way systems like this end up going over a cliff is that the
initial 80% system gets patched over and over again by subsequent hackers, and
what results is 80% bandaids and 20% structured code. When systems evolve
organically, it’s unsurprising and unavoidable that what one ends up with is a
horrible design — consider the DOS -> Win95 path.

As an alternative to five hackers doing five 80% solutions of the same
problem, we would be better off if each programmer picked a different task,
and really thought it through — a 100% solution. Then each time a programmer
solved a problem, no one else would have to redo the effort. Of course, it’s
true that 100% solutions are significantly harder to design and build than 80%
solutions. But they have one tremendous labor-savings advantage: you don’t
have to constantly reinvent the wheel. The up-front investment buys you
forward progress; you aren’t trapped endlessly reinventing the same awkward
wheel.

But here’s what I’d really like: instead of tweaking regexps, you go do your
own 100% design or two. Because I’d like to use them. If everyone does just
one, then that’s all anyone has to do.

Kevlin Henney, 2017:

A common problem in component frameworks, class libraries, foundation services, and other infrastructure code is that many are designed to be general purpose without reference to concrete applications. This leads to a dizzying array of options and possibilities that are often unused or misused — or just not useful.

Generally, developers work on specific systems; specifically, the quest for unbounded generality rarely serves them well (if at all). The best route to generality is through understanding known, specific examples, focusing on their essence to find an essential common solution. Simplicity through experience rather than generality through guesswork.

Speculative generality accumulates baggage that becomes difficult or impossible to shift, thereby adding to the accidental complexity those in development must face in future.

Although many architects value generality, it should not be unconditional. People do not on the whole pay for — or need — generality: they tend to have a specific situation, and it is a solution to that specific situation that has value.

We can find generality and flexibility in trying to deliver specific solutions, but if we weigh anchor and forget the specifics too soon, we end up adrift in a sea of nebulous possibilities, a world of tricky configuration options, overloaded and overburdened parameter lists, long-winded interfaces, and not-quite-right abstractions. In pursuit of arbitrary flexibility, you can often lose valuable properties — whether intended or accidental — of alternative, simpler designs.

Ok, the last one is a bit more…specific…than the first two. But it’s fun to read it in juxtaposition with the first two. One way to try to bridge the difference between Henney and Shivers is to note that Shivers is saying we need more 100% designs while Henney is saying we need a lot of specific experience to get to a good 100% design. But then the differences become stronger…Shivers doesn’t want people to hack up a bunch of 80% solutions, while Henney, roughly, thinks we have to have them before we have a hope of a right 100% one.

My heart is with Shivers, but my head is with Henney.

I think I have some readings and an exam question for next year’s class.