Hacker News | lieret's submissions
1.Show HN: All the LM solutions on SWE-bench are bloated compared to humans (twitter.com/klieret)
1 point by lieret 44 days ago | past
2.Show HN: New eval from SWE-bench team evaluates LMs based on goals not tickets (codeclash.ai)
5 points by lieret 5 months ago | past | 1 comment
3.Show HN: Randomly switching between LMs at every step boosts SWE-bench score (swebench.com)
5 points by lieret 8 months ago | past | 1 comment
4.GPT-5 on SWE-bench: Cost and performance deep-dive (mini-swe-agent.com)
4 points by lieret 8 months ago | past | 3 comments
5.Show HN: New SWE-bench leaderboard compares LMs without fancy agent scaffolds (swebench.com)
2 points by lieret 8 months ago | past
6.Show HN: Mini-swe-agent achieves 65% on SWE-bench in 100 lines of Python (github.com/swe-agent)
7 points by lieret 8 months ago | past | 4 comments
