Jacques Mattheij

Technology, Coding and Business

Review What You Fork on GitHub

Open source is, in my opinion, the way of the future. Sites like GitHub and competitors make it extremely easy to collaborate with people from all over the world on projects, and with a click of the mouse you can take an existing project and ‘fork’ it to adapt it whichever way you want.

This is a very quick way to get started on something new. After all, most software projects have a lot in common with other projects, and if you are aiming to produce another open source project anyway, you might as well build on the foundation of already written and debugged code produced by the efforts of others. This give-and-take is what the power of open source is all about.

But in spite of the obvious power of these tools, there is also a giant drawback and I think you need to be fully aware of all the possible consequences of what can happen when you make that click in a careless way.

When you do a “fork”, each and every one of the files that you copy is copyrighted, and each and every one of those files may have been tainted at some point in the past. You, the forker, are now distributing that code, and if some of it - or all of it - is found to be infringing on copyrights or patents, it is quite possible that you will be held liable. Understand fully that GitHub is not in the business of guaranteeing that the code that you find there is passed on with a clean bill of health attached. The fact that code is on GitHub says nothing about its provenance or suitability, nor about the rights of the person you forked it from.

In order to make sure that you are in the clear you will have to review all of the code that you are to become responsible for, and you should do so at or before the moment you fork the code. “I didn’t know” is not a defense, and even if you didn’t know, you will most likely be barred from using the fruits of your labor because you are now the proud owner of a derivative work. So, even if you’ve never even touched the code, and never so much as downloaded it, you could very well be on the hook for a copyright violation.

Reviewing code is time consuming. Typically, at a minimum, you will have to look at each file, analyze the code in it, and trace who wrote it (first commit). Sites such as GitHub have some tools that help with this (such as code search). There is a pretty good write-up about what this sort of process entails:

http://wiki.osgeo.org/wiki/Code_Provenance_Review
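The “trace who wrote it (first commit)” step can be partly automated. What follows is a minimal sketch, assuming a local clone and the standard git command line; the names FirstCommit, parse_first_commit and first_commit_for are made up for this illustration. Keep in mind that it only tells you who first committed a file in this repository, not where the code originally came from, so it is a starting point for a review and not a replacement for one.

```python
import subprocess
from typing import NamedTuple, Optional


class FirstCommit(NamedTuple):
    """Hypothetical record describing the commit that first added a file."""
    sha: str
    author: str
    email: str
    date: str


def parse_first_commit(log_output: str) -> Optional[FirstCommit]:
    """Parse tab-separated `git log` output (sha, author, email, date).

    git log prints newest commits first, so the last non-empty line
    is the oldest matching commit.
    """
    lines = [line for line in log_output.splitlines() if line.strip()]
    if not lines:
        return None
    sha, author, email, date = lines[-1].split("\t", 3)
    return FirstCommit(sha, author, email, date)


def first_commit_for(path: str, repo: str = ".") -> Optional[FirstCommit]:
    """Ask git which commit first added `path` in the clone at `repo`.

    --follow tracks the file across renames; --diff-filter=A keeps only
    commits in which the file was added.
    """
    result = subprocess.run(
        ["git", "-C", repo, "log", "--follow", "--diff-filter=A",
         "--format=%H%x09%an%x09%ae%x09%aI", "--", path],
        check=True, capture_output=True, text=True,
    )
    return parse_first_commit(result.stdout)
```

Calling first_commit_for on each path reported by `git ls-files` gives you a list of authors and dates to check against the project's stated copyright holders.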

Prepare for some hard work here; this is not a simple job. Hoping for the best, or waiting until you receive a letter from some fancy lawyer, does not cut it: when you click that ‘fork’ button you are taking on some significant liability.

What to do if you find code of questionable provenance?

If and when you do come upon infringing code, make sure that you alert the parent and siblings of the code that you forked, and also make sure that everybody is aware of the risks.

The second thing to do would be to consider the cost of re-writing the code that infringes, using a “clean room” approach where the remainder of the project (assuming a non-substantial infringement) is used to specify what the code should do. Then someone else, who has no idea what the original code looked like, re-writes it from scratch.

That way you avoid contamination of the new code with information from the old code.

If the infringing section is larger (larger than, say, 10% or so of the total project), then I would recommend staying away from it completely and looking for a more suitable project to use as the base. Having such a large portion of the code of “questionable parentage” means that you will be spending a significant amount of time on cleaning it up and making sure that at least that section is without encumbrances. It would also cast a strong shadow of doubt over the remaining code, unless the boundaries in the code were very clearly defined (for instance, in the case of a library module).
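The 10% rule of thumb can be written down as simple arithmetic. This is only a sketch: the text does not say how to measure “10% of the total project”, so counting lines per file is an assumption made here, and the function names and the returned wording are invented for the illustration. The threshold is a rough personal guideline, not a legal standard.

```python
from typing import Dict, Iterable

# "Say, 10% or so": a rule of thumb, not a legal threshold.
REWRITE_THRESHOLD = 0.10


def questionable_share(line_counts: Dict[str, int],
                       flagged: Iterable[str]) -> float:
    """Fraction of all lines that live in files flagged as questionable.

    Assumption: project size is measured in lines per file.
    """
    total = sum(line_counts.values())
    if total == 0:
        return 0.0
    flagged_lines = sum(line_counts.get(path, 0) for path in set(flagged))
    return flagged_lines / total


def suggest_action(share: float,
                   threshold: float = REWRITE_THRESHOLD) -> str:
    """Map the flagged share onto the two options discussed in the text."""
    if share == 0:
        return "no flagged code"
    if share > threshold:
        return "look for another base project"
    return "consider a clean-room rewrite"
```

For example, 50 flagged lines in a 1,000 line project is a share of 0.05, which falls on the “clean room” side of the line; 900 flagged lines does not.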

Mixing licenses in code is fine, but do be aware that some licenses contain conflicting terms that may need to be resolved. Also, be aware that GitHub (and all other competitors on US soil) is subject to the DMCA, and that they can - and should - take down your code at a moment’s notice when given a DMCA takedown request. You can file a defense (a counter-notice) and they will put your code back up, but with the express note that you are completely on the hook for any damage caused by this: by filing a defense you are indemnifying them.

GitHub will publish all the DMCA take-down requests they receive here:

https://github.com/github/dmca

So at least they are completely transparent in this. If you do not wish to be subject to the DMCA, then you will have to move your code to a repository outside US jurisdiction after you fork the project, but that is somewhat against the spirit behind services like this.

To date, corporations have been fairly slow in noticing that their code may have ended up on GitHub. Yet companies like Sony have been fairly active recently in sending take-down requests, and I do not doubt that others will follow. What easier way to shut down an open source competitor than to cast doubt on the originality of their product? It’s a sure bet that if you go head-to-head with some big corporation and start to cut into their profits, your code will be scrutinized. Make sure you are not caught through carelessness.

In closing, as always, trust but verify. If a project is advertised as GPL, or some other open source license, it probably is, but you will have to verify that for yourself. In the end you, and nobody else, are responsible for the code that you distribute, and ignorance of what you distribute is not a defense.