Devin demolishes benchmarks as the first Software Engineer AI

devin AI logo on black background

Devin, billed as the first software engineer AI, has been at the centre of tech news around the globe thanks to its groundbreaking performance on the SWE-bench coding benchmark. Its capabilities have left developers and engineers both excited and worried.

The Bay Area startup Cognition unveiled Devin, calling it the world's first AI software engineer.

Devin recently made headlines when it passed multiple advanced-level practical job interviews for AI engineering roles. The platform combines long-term reasoning with a feedback loop, which lets it apply trial and error to its own output and make more precise decisions on complex projects. It can take a simple command and turn it into a functioning website or software program.

Testing on SWE-bench showed that Devin correctly resolves 13.86% of issues end-to-end. That is a large jump from the 4.80% achieved by Claude 2, a figure once described as 'state-of-the-art'. According to the company, “Even when given the exact files to edit, the best previous models can only resolve 4.80% of issues.”
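To put the two percentages in perspective, here is a minimal Python sketch of the arithmetic. It uses only the two figures quoted above; the variable names are our own, and the comparison is a rough one, since it says nothing about how each model was tested.

```python
# Resolution rates quoted in the article (percent of SWE-bench issues resolved).
devin_rate = 13.86
previous_best_rate = 4.80

# Absolute gap in percentage points, and Devin's rate as a multiple of the old one.
gap_points = devin_rate - previous_best_rate
improvement_factor = devin_rate / previous_best_rate

print(f"Gap: {gap_points:.2f} percentage points")
print(f"Improvement: {improvement_factor:.1f}x the previous rate")
```

In other words, Devin's reported rate is a little under three times the earlier figure, a gap of roughly nine percentage points.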

devin AI benchmarks

The company highlighted on its website that the AI is fully capable of:

  • Learning how to use unfamiliar technologies
  • Running ControlNet on Modal to produce images with concealed messages
  • Building and deploying apps end-to-end
  • Creating an interactive website simulating the Game of Life
  • Autonomously finding and fixing bugs in codebases
  • Training and fine-tuning its own AI models
  • Addressing bugs and feature requests in open-source repositories
  • Contributing to mature production repositories
  • Handling real jobs on platforms like Upwork, such as writing and debugging code for computer vision models

Unsurprisingly, such a powerful AI creates a sense of nervousness and unease regarding the future of humans in software development. However, the company continues to insist that the AI is designed to assist and enhance human expertise, not replace it. They emphasise a mutually beneficial relationship between AI and human ingenuity.

Cognition has raised a $21 million Series A round led by Founders Fund, meaning development of Devin will continue and it might start rolling out to users soon. New developments will be fascinating to follow, and we'll be sure to cover any updates.

Meanwhile, you can take a look at how Google developed a new AI that can learn and master every video game and a Microsoft AI engineer’s warning to the FTC over Copilot safety concerns.
