Tuesday, June 17, 2025

[Thread] A new US paper shows the best frontier LLMs achieve 0% on hard real-life programming contest problems, domains where expert humans still excel (Rohan Paul/@rohanpaul_ai)

Rohan Paul / @rohanpaul_ai:
This is really BAD news for LLMs' coding skills. ☹️ The best frontier LLMs achieve 0% on hard real-life programming contest problems, domains where expert humans still excel. LiveCodeBench Pro, a benchmark composed of problems from Codeforces, ICPC, and IOI ("International ...




MediaTek says it has started to use Intel Foundry's advanced chip packaging in addition to TSMC's, as the mobile chip designer bets on AI demand for growth (Cheng Ting-Fang/Nikkei Asia)
