Microsoft's GitHub Copilot automatic coding is legally shaky

The purpose of self-learning artificial intelligence is to make computers learn from their own actions and those of others. In the case of Microsoft’s much-hyped GitHub Copilot technology, the line from regurgitation to learning has yet to be crossed.

As it turns out, GitHub Copilot isn't so much learning from others as it is stealing. In fact, it's using previously written code as a cheat sheet.

GitHub Copilot is a bit of a thief

On Friday, software engineer Kyle Peacock discovered that GitHub's new automatic code generator was pulling code from a very well-known coder. When the AI program is asked to generate an About Me page, it consistently references coder David Celis.

As the program is based on millions of lines of code published to GitHub, it's no surprise that it would throw up familiar code. However, because something as basic as an About Me page references a specific individual, the legal implications of the program are being questioned.

At the moment, Celis says he's “not surprised” his repositories are directly referenced. He also doesn't mind. Additionally, Microsoft has argued that its current training model is “considered fair use across the machine learning community”. However, others aren't so certain.

Where’s the legal boundary?

A report by The Verge explains that Microsoft may not be as in the clear as it thinks. The report states that no legal precedent protects the training method Microsoft uses for GitHub Copilot.

The Verge references the 2015 Google Books case as an example, a decade-long case that ended with Google Books being ruled legal. Google's use of millions of copyrighted books, which users can freely search for snippets or quotes, was deemed a transformative use of the original content.

However, the case did not guarantee that creating new content learned from old content would also come under fair use. For example, Google Books could learn every book written by Neil Gaiman. If it then generated a new book in the style of Gaiman, though, that book would be on legally unstable ground.

Is it fair use?

GitHub Copilot hasn't yet gone through a court case to decide whether its training is fair use. However, The Verge approached Professor Mark Lemley, who expressed the opinion that nabbed snippets of code are derivative in nature. He said:

“I fall in the camp that believes Copilot’s generated code is absolutely derivative work. My hope is that people would be happy to have their code used for training. Not for it to show up verbatim in someone else’s work necessarily, but we’re all better off if we have better-trained AIs.”
