ricardobeat 5 hours ago [-]
It’s interesting how little press Minimax M3 gets, given it outperforms Deepseek V4 Pro, previously the SOTA for open models. Meanwhile GLM has been in the news daily.
besterman23 10 hours ago [-]
I wonder if multiple attempts at the opossum would produce better results.
If we didn’t have the previous example I would interpret this as pretty solid evidence that labs were training on the Pelican “benchmark”.
I just can’t imagine a model dropping so significantly from one version to the next on such a silly task.
ChrisArchitect 6 hours ago [-]
Related:
GLM-5.2 is the new leading open weights model on Artificial Analysis
https://news.ycombinator.com/item?id=48567759