New AI coding research shows that top models still struggle with real-world software tasks. On OpenAI’s SWE-Lancer benchmark, even the best-performing model, Claude 3.5 Sonnet, solved only 26.2% of the coding tasks and 44.9% of the management tasks, earning about $400,000 of a possible $1 million in task payouts. The results indicate that current models still lag behind human engineers in practical scenarios.
AI Coding: New Research Shows Even the Best Models Struggle With Real-World Software Engineering
