
o3 and o4-mini - they’re great, but easy to over-hype
A critical analysis of o3 and o4-mini, the two most powerful new models behind ChatGPT. It covers not just the system cards, benchmarks, and my own tests, but also some you may not have seen before. Yes, they can whip up an amazing front end in a few seconds, but you always have to ask what is in their training data. Either way, they prove the gains from RL are just beginning…
https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - o3 and o4-mini
https://simple-bench.com/
Plus, Teams and Pro, plus token count: https://x.com/btibor91/status/1912568994512662679
System Card: https://openai.com/index/o3-o4-mini-system-card/
Release Notes: https://openai.com/index/introducing-o3-and-o4-mini/
https://deepmind.google/technologies/gemini/pro/
https://x.com/DeryaTR_/status/1912558350794961168
https://x.com/polynoamial/status/1912564068168450396
API Pricing: https://openai.com/api/pricing/
https://aider.chat/docs/leaderboards/
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
AI Explained Official Podcast
Covering the biggest news of the century - the arrival of smarter-than-human AI. From the author of Simple Bench, which reveals the remaining gap between LLM and human reasoning. Hype-free, and the British accent is a freebie bonus.
- No. of episodes: 23
- Latest episode: 2025-04-16
- Education, News, Self-Improvement, Tech News