Can I stop you right here? Whisper is a few years old and it wasn't the best model for a long time. There are like 10 models that are smaller and faster and outperform both of them.
> There are like 10 models that are smaller and faster and outperform both of them.
As someone who is currently relying on Whisper for some things, what models are those exactly? I still haven't found anything as accurate as Whisper (large). Are those models just faster, or also as accurate or more accurate?
Yes for Parakeet, though for Canary I'm only comparing benchmark results. Whisper also hallucinates severely on silence and noise, and WhisperX helps a lot: it adds voice activity detection (VAD), i.e. a model that detects when someone is speaking, to filter the input before running Whisper. https://github.com/m-bain/whisperX
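To make the VAD idea concrete, here is a toy sketch of pre-filtering audio to speech-only segments. Hedge: WhisperX's actual VAD is a neural model (pyannote-based), not an energy gate; the frame size, threshold, and function names below are all illustrative, not anything from the WhisperX API.

```python
# Toy VAD-style pre-filter: keep only frames whose energy exceeds a
# threshold, so silence/noise never reaches the ASR model. A simplified
# stand-in for the neural VAD that WhisperX runs before Whisper.

def frame_energies(samples, frame_len):
    """Mean squared amplitude per fixed-size frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def speech_frames(samples, frame_len=160, threshold=0.01):
    """Return (start, end) sample ranges whose energy exceeds threshold."""
    segments = []
    for idx, e in enumerate(frame_energies(samples, frame_len)):
        if e > threshold:
            start = idx * frame_len
            segments.append((start, start + frame_len))
    return segments

# Synthetic signal: silence, then a loud burst, then silence again.
audio = [0.0] * 320 + [0.5, -0.5] * 160 + [0.0] * 320
segs = speech_frames(audio)
print(segs)  # -> [(320, 480), (480, 640)]: only the loud middle frames
```

A real pipeline would also merge adjacent segments and add a little padding around each one, so that word onsets clipped at segment boundaries aren't lost.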
There are many streaming ASR models based on CTC or RNN-T. Look, for example, at sherpa (https://github.com/k2-fsa/sherpa-onnx), which can run streaming ASR, VAD, diarization, and more.
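For readers unfamiliar with CTC: the decoding rule at its core is tiny, which is part of why CTC models stream well. Below is a minimal sketch of CTC greedy decoding (collapse repeats, then drop blanks); toolkits like sherpa-onnx implement this plus beam search and RNN-T decoding efficiently in C++, so the symbol names here are just for illustration.

```python
BLANK = "_"  # CTC blank symbol (typically index 0 in real models)

def ctc_greedy_decode(frame_labels):
    """Apply the CTC collapse rule to per-frame best labels.

    frame_labels: the argmax label for each audio frame. Repeated labels
    are merged, then blanks are removed; a blank between two identical
    labels is what allows genuine double letters ("ll" in "hello").
    """
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return "".join(out)

# Per-frame argmax for the word "cat": repeats reflect overlapping frames.
frames = ["_", "c", "c", "_", "a", "a", "a", "t", "_", "_"]
print(ctc_greedy_decode(frames))  # -> "cat"
```

Because each step depends only on the current frame and the previous label, the same rule can be applied incrementally to audio chunks as they arrive, which is what makes CTC a natural fit for streaming.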
Well, I don't doubt that they got those results, but it is clear now that it was not an official collaboration. That was heavily implied in a statement by the IMO's president a few days ago (the one where they said they'd prefer AI companies wait a week before announcing, so that the focus stays on the human competitors first).
Google had an official collaboration with the IMO, and we can be sure they got those results under the imposed constraints (last year they allocated ~48h for silver, IIRC) and with official grading by the IMO graders.
So from 48 hours for silver down to 4.5 hours for gold in one year? And all reasoning generated is clear and easy to follow? That's one hell of an improvement.
> 6/N In our evaluation, the model solved 5 of the 6 problems on the 2025 IMO. For each problem, three former IMO medalists independently graded the model’s submitted proof, with scores finalized after unanimous consensus. The model earned 35/42 points in total, enough for gold!
That means Google DeepMind has the first OFFICIAL IMO gold.
> We've now been given permission to share our results and are pleased to have been part of the inaugural cohort to have our model results officially graded and certified by IMO coordinators and experts, receiving the first official gold-level performance grading for an AI system!
Childish. And of course they must have known there was an official LLM cohort taking the real test; they probably even knew that Gemini got a gold medal, and may even have known that Google planned a press release for today.
I think maybe all Altman companies have used tactics like this.
> We were trying to get a big client for weeks, and they said no and went with a competitor. The competitor already had a term sheet from the company we were trying to sign up. It was real serious.
> We were devastated, but we decided to fly down and sit in their lobby until they would meet with us. So they finally let us talk to them after most of the day.
> We then had a few more meetings, and the company wanted to come visit our offices so they could make sure we were a 'real' company. At that time, we were only 5 guys. So we hired a bunch of our college friends to 'work' for us for the day so we could look larger than we actually were. It worked, and we got the contract.
> I think the reason why PG respects Sam so much is he is charismatic, resourceful, and just overall seems like a genuine person.
>> I think the reason why PG respects Sam so much is he is charismatic, resourceful, and just overall seems like a genuine person.
Does he? Wasn't sama ousted from YC in some muddy way after he tried to co-opt it into an OpenAI investment arm? It was funny to find the YC Open Research project landing page on YC's website now defunct, pointing to how he misrepresented it as a YC project when it was really his own.
maybe he fears him, but I doubt pg respects him, unless he respects evil, lol
So, a more charismatic version of Zuck is Zucking, what a surprise. Company culture starts at its origin. Despite Google's corruption, its origin is in academia and it shows even now.
> brought back competitive open source audio transcription
Bear in mind that there are a lot of very strong _open_ STT models that Mistral's press release didn't bother to compare to, giving the impression that they are the best new open thing since Whisper. Here is an open benchmark: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard . The strongest model Mistral compared to is Scribe, ranked 10th there.
The best model there has 2.5B parameters. I can believe that a model 10x bigger is somewhat better.
One element of comparison is OpenAI Whisper v3, which achieves 7.44 WER on the ASR leaderboard and shows up as ~8.3 WER on FLEURS in the Voxtral announcement[0]. If FLEURS runs +1 WER on average compared to the ASR leaderboard, it would imply that Voxtral does have a lead on the leaderboard.
There are larger models in there, an 8B and a 6B. By this logic they should rank above the 2.5B model, yet we don't see that. That's why we have open standard benchmarks: to measure this directly, not to hypothesize from model sizes or do cross-dataset arithmetic.
Also note that Voxtral's capacity is not necessarily all devoted to speech, since it "Retains the text understanding capabilities of its language model backbone".
Yet your Google Research colleagues still earned way more than they would have in academia, even without the promo.
Plus, there were quite a few places where a good publication stream did earn a promotion, without any company/business impact. FAIR, Google Brain, DM. Just not Google Research.
DeepMind didn't have any product impact for God knows how many years, but I bet they did have promos happening:)
You don't understand the Silicon Valley grind mindset :) I personally agree with you - I am happy working on interesting stuff, getting a good salary, and don't need a promo. Most times I switched jobs it was a temporary lowering of my total comp and often the level. But most Googlers are obsessed with levels/promotion, talk about it, and the frustration is real. They are hyper ambitious and see level as their validation.
And if you join as a fresh PhD grad (RS or SWE), the L4 salary is OK, but not amazing compared to the cost of living there. From L6 on it starts to be really, really good.
> I am happy working on interesting stuff, getting a good salary, and don't need a promo
People who don't contribute to the bottom line are the first to get a PIP or to be laid off. Effectively, the better performers are subsidizing their salaries, until the company sooner or later decides to cut the dead wood.
I think all FAIR researchers and engineers are aware of the NC-CA license limitations.
They still use it because releasing the code for an already approved paper under NC-CA is super easy (essentially self-approving by clicking a few buttons), whereas the MIT license requires following a slow open-sourcing process (approvals from a sponsoring director or two, committing to support the code for at least a year, etc.). Releasing under MIT can easily take a few weeks, with each stage requiring you to find someone responsible and chase them across time zones.
The best practice seems to be releasing under NC-CA and re-licensing later under MIT. Maybe that will happen here, too.
> Can I stop you right here? Whisper is a few years old and it wasn't the best model for a long time. There are like 10 models that are smaller and faster and outperform both of them.
And these models existed before Voxtral.