The GPL Won

It seems like GPL and related licenses dominate open source projects, by a lot:

At the beginning of his talk, DiBona said that according to Google’s net crawlers, the web now contains over 31 million open source projects, spanning 2 billion lines of code. Forty-eight per cent of these projects are under the GPL, 23 per cent use the LGPL, 14 per cent use the BSD license, 6 per cent use Apache, and 5 per cent use the MIT license. All other licenses are used with under 5 per cent of the projects.

So, GPL variants govern 71% of 31 million projects. Daaaamn. That’s a lot. A lot more than the rest, which are less restrictive.
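As a quick sanity check on the arithmetic, here is a minimal sketch that recomputes the headline numbers from the figures quoted above. The variable and function names are my own, and it assumes "GPL variants" means just GPL plus LGPL, which the talk summary does not state.

```python
# Figures as quoted from the summary of DiBona's talk (percent of projects).
LICENSE_SHARE = {"GPL": 48, "LGPL": 23, "BSD": 14, "Apache": 6, "MIT": 5}
TOTAL_PROJECTS = 31_000_000
TOTAL_LINES = 2_000_000_000

# Assumption: the "GPL variants" are GPL and LGPL only.
GPL_FAMILY = ("GPL", "LGPL")


def family_share(shares, family):
    """Sum the percentage shares of the licenses named in `family`."""
    return sum(shares[name] for name in family)


gpl_share = family_share(LICENSE_SHARE, GPL_FAMILY)
gpl_projects = TOTAL_PROJECTS * gpl_share // 100
named_share = sum(LICENSE_SHARE.values())
avg_lines = TOTAL_LINES / TOTAL_PROJECTS

print(f"GPL family share: {gpl_share}%")
print(f"GPL family projects: {gpl_projects:,}")
print(f"Share left for all other licenses: {100 - named_share}%")
print(f"Average lines per project: {avg_lines:.1f}")
```

The last figure is the odd one: the quoted totals imply a very small average project, so the counting method matters a lot.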

I confess to being a bit surprised given the hostility (or exasperation) one often encounters from, e.g., business folks when dealing with the GPL. Of course, it has two strong factors in its favor: it’s viral (so derived projects must use it) and it has a lot of advocacy, both dedicated (think Stallman) and more incidental (think Linux in general).

Ooo, I really have an itch to find out whether virality is a big factor….


On Facebook, Dan Brickley (thanks Dan!) points out that 1) this survey is from 2011 and 2) more recent surveys point to a shift away from the GPL to more permissive licenses, to wit, MIT and Apache:

Indeed, if we contrast each license’s share of the repositories surveyed by Black Duck [January 2017] versus January 2010, the shift is quite apparent….

In Black Duck’s sample, the most popular variant of the GPL – version 2 – is less than half as popular as it was (46% to 19%). Over the same span, the permissive MIT has gone from 8% share to 29%, while its permissive cousin the Apache License 2.0 jumped from 5% to 15%. What this means is that over the course of a seven year period, the GPLv2 has gone from being roughly equal in popularity to the next nine licenses combined to 10% out of first place.

All of which suggests that if we generally meant copyleft when we were talking about open source in 2007, we typically mean permissive when we discuss it today.

Read the whole thing, esp. the bit about the rise of unlicensed projects on Github.
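The Black Duck numbers in the quote can be checked the same way. This is only a sketch over the three licenses the quote gives figures for; the names are mine, and note that the "10%" gap in the quote is a difference in percentage points.

```python
# Share of surveyed repositories (percent), as quoted: (January 2010, January 2017).
BLACK_DUCK_SHARE = {
    "GPLv2": (46, 19),
    "MIT": (8, 29),
    "Apache 2.0": (5, 15),
}


def point_change(before, after):
    """Change in share, in percentage points."""
    return after - before


for name, (before, after) in BLACK_DUCK_SHARE.items():
    print(f"{name}: {before}% -> {after}% ({point_change(before, after):+d} points)")

# "Less than half as popular": the 2017 share divided by the 2010 share.
gplv2_ratio = BLACK_DUCK_SHARE["GPLv2"][1] / BLACK_DUCK_SHARE["GPLv2"][0]

# How far GPLv2 trails MIT in the 2017 figures, in percentage points.
gap_to_first = BLACK_DUCK_SHARE["MIT"][1] - BLACK_DUCK_SHARE["GPLv2"][1]

print(f"GPLv2 2017/2010 ratio: {gplv2_ratio:.2f}")
print(f"GPLv2 trails MIT by {gap_to_first} points")
```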

Now, methodologically, their survey is smaller:

This open source licensing data reflects analysis of over two million open source projects from over 9,000 global forges and repositories.

So, it might be the case that the Google population wouldn’t show this shift. But, ex ante, a focused crawl is more likely (perhaps) to be dominated by “high quality” repositories, and thus may better reflect best or active practice.

This all still cries out for some causal investigation.


2 thoughts on “The GPL Won”

  1. 31 million projects spanning 2 billion lines of code… that makes the average open source project consist of around 65 lines of code. I’d have to question the methodology they used to come up with those numbers. Are they counting every version of Linux as a separate project but not separate lines of code? Did they count up every project or fork on Github, or Sourceforge or whatever the equivalents were in 2011 when that article was published, and did those sites have GPL as a default? What licenses do the important projects use? Linux is GPL, but WebKit is BSD/LGPL, MySQL is dual GPL/commercial, Hadoop is Apache, Android is Apache aside from the kernel, Python and Java and Mozilla have their own licenses…

    The hostility from the business community is easy to understand: IIRC Stallman in particular doesn’t believe in the traditional ideals of intellectual property when it comes to software, and thinks programmers in particular should prioritize the freedom of the software they write over concerns like making a living. How do GPL projects get paid for? In the case of Linux it’s by companies that make either the proprietary hardware it runs or ran on top of (IBM, Intel, HP, Cisco, SGI, etc.) or the proprietary software that runs on top of it (Google, Facebook, etc.). Some of the business concerns are misguided but some of them are real.

    • I have no idea of the methodology. However, I’m not horribly surprised. A good chunk of the long tail is going to be *some* kind of garbage. Per usual, and per your comment, distinctness is going to be a challenge (what makes a project a distinct project).

      But there are other reasons that the project count might outrun the code: the code repo was missing or down or just hard to access.

      I’m not (here) dissing the hostility of the business community to GPLing, just pointing out that that fact is potentially a pushback on its spread.

      And yes, these don’t weight the numbers in any way. (E.g., by users, deployment, value, etc.)

      Still, pretty striking.
