It seems like GPL and related licenses dominate open source projects, by a lot:
At the beginning of his talk, DiBona said that according to Google’s net crawlers, the web now contains over 31 million open source projects, spanning 2 billion lines of code. Forty-eight per cent of these projects are under the GPL, 23 per cent use the LGPL, 14 per cent use the BSD license, 6 per cent use Apache, and 5 per cent use the MIT license. All other licenses are used with under 5 per cent of the projects.
So, GPL variants govern 71% of 31 million projects. Daaaamn. That’s a lot. A lot more than the rest, which are less restrictive.
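For the record, here is the back-of-the-envelope arithmetic behind that 71%, as a quick sketch. The shares are the ones quoted from DiBona's talk; the split into "GPL variants" versus everything else is my own labeling, and the project count assumes the percentages apply to the full 31 million.

```python
# Shares quoted from DiBona's talk, in per cent of roughly 31 million projects.
TOTAL_PROJECTS = 31_000_000
shares = {"GPL": 48, "LGPL": 23, "BSD": 14, "Apache": 6, "MIT": 5}

# Assumption: this grouping is illustrative labeling, not something from the talk.
gpl_variants = {"GPL", "LGPL"}

copyleft_share = sum(pct for name, pct in shares.items() if name in gpl_variants)
permissive_share = sum(pct for name, pct in shares.items() if name not in gpl_variants)
other_share = 100 - copyleft_share - permissive_share

# Integer arithmetic keeps the project estimate exact.
copyleft_projects = TOTAL_PROJECTS * copyleft_share // 100

print(f"GPL variants: {copyleft_share}% (~{copyleft_projects:,} projects)")
print(f"BSD/Apache/MIT: {permissive_share}%")
print(f"Everything else: {other_share}%")
```

So the GPL family would account for something like 22 million projects, if the crawl numbers are taken at face value.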
I confess to being a bit surprised, given the hostility (or exasperation) one often encounters from, e.g., business folks when dealing with the GPL. Of course, the GPL has two strong factors in its favor: it’s viral (so derived projects must use it) and it has a lot of advocacy, both dedicated (think Stallman) and more incidental (think Linux in general).
Ooo, I really have an itch to find out whether virality is a big factor….
Indeed, if we contrast each license’s share of the repositories surveyed by Black Duck [January 2017] versus January 2010, the shift is quite apparent….
In Black Duck’s sample, the most popular variant of the GPL – version 2 – is less than half as popular as it was (46% to 19%). Over the same span, the permissive MIT has gone from 8% share to 29%, while its permissive cousin the Apache License 2.0 jumped from 5% to 15%. What this means is that over the course of a seven year period, the GPLv2 has gone from being roughly equal in popularity to the next nine licenses combined to 10% out of first place.
All of which suggests that if we generally meant copyleft when we were talking about open source in 2007, we typically mean permissive when we discuss it today.
Read the whole thing, esp. the bit about the rise of unlicensed projects on GitHub.
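To make the size of that shift concrete, here is a small sketch using only the three pairs of figures quoted above (GPLv2, MIT, Apache License 2.0). The dictionary keys are just my shorthand labels, and the other licenses in Black Duck's sample are left out because the quote doesn't give their numbers.

```python
# Shares (per cent) from the quoted Black Duck comparison, 2010 vs 2017.
share_2010 = {"GPLv2": 46, "MIT": 8, "Apache-2.0": 5}
share_2017 = {"GPLv2": 19, "MIT": 29, "Apache-2.0": 15}

for name in share_2010:
    before, after = share_2010[name], share_2017[name]
    change = after - before   # change in percentage points
    ratio = after / before    # relative change (1.0 means no change)
    print(f"{name}: {before}% -> {after}% ({change:+d} points, x{ratio:.2f})")

# Which of these three leads in 2017, and how far behind is GPLv2?
leader_2017 = max(share_2017, key=share_2017.get)
gap = share_2017[leader_2017] - share_2017["GPLv2"]
print(f"2017 leader: {leader_2017}, GPLv2 trails by {gap} points")
```

The "less than half as popular" and "10% out of first place" claims both check out against the quoted figures, though "10%" there really means 10 percentage points.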
Now, methodologically, their survey is smaller:
This open source licensing data reflects analysis of over two million open source projects from over 9,000 global forges and repositories.
So the Google population might not show this shift. But, ex ante, a focused crawl is perhaps more likely to be dominated by “high quality” repositories, and thus may better reflect best or active practice.
This all still cries out for some causal investigation.