The purpose of self-learning artificial intelligence is to make computers learn from their own actions and those of others. In the case of Microsoft’s much-hyped GitHub Copilot technology, the line from regurgitation to learning has yet to be crossed.
As it turns out, GitHub Copilot isn't so much learning from others as it is stealing from them. In fact, it's using previously written code as a cheat sheet.
GitHub Copilot is a bit of a thief
On Friday, software engineer Kyle Peacock discovered that GitHub's new automatic code generator was pulling code from a very well-known coder. If the AI program attempts to generate an About Me page, it'll consistently reference coder David Celis.
Because the program is trained on millions of lines of code published to GitHub, it's no surprise that it would throw up familiar code. However, when something as basic as an About Me page references a specific individual, the legal implications of the program come into question.
At the moment, Celis says he's “not surprised” his repositories are directly referenced. He also doesn't mind. Microsoft, for its part, has stated that its current training model is “considered fair use across the machine learning community”. However, others aren't so certain.
Where’s the legal boundary?
A report by The Verge explains that Microsoft may not be as in the clear as it thinks. According to the report, no legal precedent establishes that the training method behind GitHub Copilot is lawful.
The Verge cites the 2015 Google Books ruling as an example, the outcome of a decade-long case that found Google Books to be legal. Google's use of millions of copyrighted books, which users can freely search for snippets or quotes, was deemed a transformative use of the original content.
However, the case did not guarantee that creating new content learned from old content would also come under fair use. For example, Google Books could learn from every book written by Neil Gaiman. However, if it then generated a new book in the style of Gaiman, that book would be on shaky legal ground.
Is it fair use?
GitHub Copilot hasn't yet gone through a court case to decide whether its training is fair use. However, The Verge approached Professor Mark Lemley, who expressed the opinion that nabbed snippets of code are derivative in nature. He said:
“I fall in the camp that believes Copilot’s generated code is absolutely derivative work. My hope is that people would be happy to have their code used for training. Not for it to show up verbatim in someone else’s work necessarily, but we’re all better off if we have better-trained AIs.”