Can I stop you right here? Whisper is a few years old and it wasn't the best model for a long time. There are like 10 models that are smaller and faster and outperform both of them.
> There are like 10 models that are smaller and faster and outperform both of them.
As someone who is currently relying on Whisper for some things, what models are those exactly? I still haven't found anything as accurate as Whisper (large). Are those models just faster, or also as accurate or more accurate?
Yes for Parakeet, though for Canary I'm only comparing benchmark results. Whisper also hallucinates severely on silence and noise, and WhisperX helps a lot: it adds voice activity detection (VAD), i.e. a model that detects when someone is speaking, to filter the input before running Whisper. https://github.com/m-bain/whisperX
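To make the VAD idea concrete, here is a toy sketch of pre-filtering audio to speech-only segments. Hedge: WhisperX's actual VAD is a neural model (pyannote-based), not an energy gate; the frame size, threshold, and function names below are all illustrative, not anything from the WhisperX API.

```python
# Toy VAD-style pre-filter: keep only frames whose energy exceeds a
# threshold, so silence/noise never reaches the ASR model. A simplified
# stand-in for the neural VAD that WhisperX runs before Whisper.

def frame_energies(samples, frame_len):
    """Mean squared amplitude per fixed-size frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def speech_frames(samples, frame_len=160, threshold=0.01):
    """Return (start, end) sample ranges whose energy exceeds threshold."""
    segments = []
    for idx, e in enumerate(frame_energies(samples, frame_len)):
        if e > threshold:
            start = idx * frame_len
            segments.append((start, start + frame_len))
    return segments

# Synthetic signal: silence, then a loud burst, then silence again.
audio = [0.0] * 320 + [0.5, -0.5] * 160 + [0.0] * 320
segs = speech_frames(audio)
print(segs)  # -> [(320, 480), (480, 640)]: only the loud middle frames
```

A real pipeline would also merge adjacent segments and add a little padding around each one, so that word onsets clipped at segment boundaries aren't lost.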
There are many streaming ASR models based on CTC or RNN-T. Look, for example, at sherpa (https://github.com/k2-fsa/sherpa-onnx), which can run streaming ASR, VAD, diarization, and more.
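For readers unfamiliar with CTC: the decoding rule at its core is tiny, which is part of why CTC models stream well. Below is a minimal sketch of CTC greedy decoding (collapse repeats, then drop blanks); toolkits like sherpa-onnx implement this plus beam search and RNN-T decoding efficiently in C++, so the symbol names here are just for illustration.

```python
BLANK = "_"  # CTC blank symbol (typically index 0 in real models)

def ctc_greedy_decode(frame_labels):
    """Apply the CTC collapse rule to per-frame best labels.

    frame_labels: the argmax label for each audio frame. Repeated labels
    are merged, then blanks are removed; a blank between two identical
    labels is what allows genuine double letters ("ll" in "hello").
    """
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return "".join(out)

# Per-frame argmax for the word "cat": repeats reflect overlapping frames.
frames = ["_", "c", "c", "_", "a", "a", "a", "t", "_", "_"]
print(ctc_greedy_decode(frames))  # -> "cat"
```

Because each step depends only on the current frame and the previous label, the same rule can be applied incrementally to audio chunks as they arrive, which is what makes CTC a natural fit for streaming.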
Well, I don't doubt that they got those results, but it is clear now that it was not an official collaboration. That was heavily implied in a statement by the IMO's president a few days ago (the one where they said they'd prefer AI companies wait a week before announcing, so that the focus stays on the human competitors first).
Google had an official collaboration with the IMO, and we can be sure they got those results under the imposed constraints (last year they allocated ~48h for silver, IIRC) and with official grading by the IMO graders.
So from 48 hours for silver down to 4.5 hours for gold in one year? And all reasoning generated is clear and easy to follow? That's one hell of an improvement.
> 6/N In our evaluation, the model solved 5 of the 6 problems on the 2025 IMO. For each problem, three former IMO medalists independently graded the model’s submitted proof, with scores finalized after unanimous consensus. The model earned 35/42 points in total, enough for gold!
That means Google DeepMind has the first OFFICIAL IMO gold.
> We've now been given permission to share our results and are pleased to have been part of the inaugural cohort to have our model results officially graded and certified by IMO coordinators and experts, receiving the first official gold-level performance grading for an AI system!
Childish. And of course they must have known there was an official LLM cohort taking the real test; they probably even knew that Gemini got a gold medal, and may even have known that Google planned a press release for today.
I think maybe all Altman companies have used tactics like this.
> We were trying to get a big client for weeks, and they said no and went with a competitor. The competitor already had a term sheet from the company we were trying to sign up. It was real serious.
> We were devastated, but we decided to fly down and sit in their lobby until they would meet with us. So they finally let us talk to them after most of the day.
> We then had a few more meetings, and the company wanted to come visit our offices so they could make sure we were a 'real' company. At that time, we were only 5 guys. So we hired a bunch of our college friends to 'work' for us for the day so we could look larger than we actually were. It worked, and we got the contract.
> I think the reason why PG respects Sam so much is he is charismatic, resourceful, and just overall seems like a genuine person.
>> I think the reason why PG respects Sam so much is he is charismatic, resourceful, and just overall seems like a genuine person.
Does he? Wasn't sama ousted from YC in some muddy way after he tried to co-opt it into an OpenAI investment arm? It was funny to find the YC Open Research project landing page on YC's website now defunct, pointing to how he misrepresented it as a YC project when it was really his own.
maybe he fears him, but I doubt pg respects him, unless he respects evil, lol
So, a more charismatic version of Zuck is Zucking, what a surprise. Company culture starts at its origin. Despite Google's corruption, its origin is in academia and it shows even now.
> brought back competitive open source audio transcription
Bear in mind that there are a lot of very strong _open_ STT models that Mistral's press release didn't bother to compare to, giving the impression that they are the best new open thing since Whisper. Here is an open benchmark: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard . The strongest model Mistral compared to is Scribe, ranked 10th there.
The best model there has 2.5B parameters. I can believe that a model 10x bigger is somewhat better.
One element of comparison is OpenAI Whisper v3, which achieves 7.44 WER on the ASR leaderboard and shows up as ~8.3 WER on FLEURS in the Voxtral announcement[0]. If FLEURS runs +1 WER on average compared to the ASR leaderboard, it would imply that Voxtral does have a lead on the leaderboard.
There are larger models in there, an 8B and a 6B. By this logic they should rank above the 2.5B model, yet we don't see that. That's why we have open standard benchmarks: to measure this directly, not to hypothesize from model sizes or do cross-dataset arithmetic.
Also note that Voxtral's capacity is not necessarily all devoted to speech, since it "Retains the text understanding capabilities of its language model backbone".
Yet your Google Research colleagues still earned way more than they would have in academia, even without the promo.
Plus, there were quite a few places where a good publication stream did earn a promotion, without any company/business impact. FAIR, Google Brain, DM. Just not Google Research.
DeepMind didn't have any product impact for God knows how many years, but I bet they did have promos happening:)
You don't understand the Silicon Valley grind mindset :) I personally agree with you - I am happy working on interesting stuff, getting a good salary, and don't need a promo. Most times I switched jobs it was a temporary lowering of my total comp and often the level. But most Googlers are obsessed with levels/promotion, talk about it, and the frustration is real. They are hyper ambitious and see level as their validation.
And if you join as a fresh PhD grad (RS or SWE), the L4 salary is OK, but not amazing compared to the cost of living there. From L6 on it starts to be really, really good.
> I am happy working on interesting stuff, getting a good salary, and don't need a promo
People who don't contribute to the bottom line are the first to get a PIP or to be laid off. Effectively, the better performers are subsidizing their salaries, until the company sooner or later decides to cut the dead wood.
I think all FAIR researchers and engineers are aware of the NC-CA license limitations.
They still use it because releasing the code for an already approved paper under NC-CA is super easy (essentially self-approving by clicking a few buttons), whereas the MIT license requires following a slow open-sourcing process (approvals from a sponsoring director or two, committing to support the code for at least a year, etc.). Releasing under MIT can easily take a few weeks, with each stage requiring you to find someone responsible and chase them across time zones.
The best practice seems to be releasing under NC-CA and re-licensing later under MIT. Maybe that will happen here, too.
> Can I stop you right here? Whisper is a few years old and it wasn't the best model for a long time. There are like 10 models that are smaller and faster and outperform both of them.
And these models existed before Voxtral.