Their disclosed run rate was $14bn around the time of those filings IIRC, and they started showing meaningful revenue around the start of 2025, so if you just linearly extrapolate that would give you roughly $7bn of actual revenue over that period. The more the growth is weighted towards the last few months, the lower that number goes
So I don't think those numbers are really in tension at all
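Back-of-the-envelope, the extrapolation is just this (a rough sketch; the linear ramp and the one-year window are assumptions on my part, not disclosed figures):

```python
# Rough sketch: if the run rate ramped roughly linearly from ~0 at the
# start of 2025 to the disclosed ~$14bn annualized rate, cumulative
# revenue over the period is the average run rate times the time elapsed.
start_run_rate = 0.0   # $bn/year, assumed starting point
end_run_rate = 14.0    # $bn/year, the disclosed run rate
period_years = 1.0     # assumed length of the ramp

revenue = (start_run_rate + end_run_rate) / 2 * period_years
print(revenue)  # ~7.0, i.e. the "roughly $7bn of actual revenue" above
```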
This makes no sense. You can describe the brain reductively enough and make it sound like it can't have an original insight either. Transformers are expressive enough function approximators in theory; there's no reason why a future one couldn't have novel insights.
This is such a weird misconception I keep seeing - the fact that the loss function during training is minimising CE/maximising the probability of the correct token doesn't mean it can't do "real" thinking. If circuitry doing "real" thinking is the best solution, then obviously that's what SGD will find
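For concreteness, the training objective really is just next-token cross-entropy; a minimal sketch (illustrative shapes only, not any lab's actual training code):

```python
import torch
import torch.nn.functional as F

# Toy next-token objective. The logits could have been produced by any
# internal circuitry whatsoever; the loss only scores the output
# distribution against the actual next token at each position.
vocab_size, seq_len = 50_000, 8
logits = torch.randn(seq_len, vocab_size)            # model outputs
targets = torch.randint(0, vocab_size, (seq_len,))   # "correct" next tokens

loss = F.cross_entropy(logits, targets)  # what SGD minimises; it is agnostic
                                         # about how the logits were computed
```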
I think this is valid criticism, but it's also unclear how much this is an "inherent" shortcoming vs the kind of thing that's pretty reasonable given we're really seeing the first generation of this new model paradigm.
Like, I'm as sceptical of just assuming "line goes up" extrapolation of performance as anyone, but assuming that current flaws are going to continue being flaws seems equally wrong-headed/overconfident. The past 5 years or so have been a constant string of these predictions being wrong (remember when people thought artists would be safe cos clearly AI just can't do hands?). Now that everyone's woken up to this RL approach, we're probably going to see very quickly over the next couple of years how much these issues hold up
(Really like the problem though, seems like a great test)
Yeah, that's a great point. While this is evidence that the sort of behavior LeCun predicted is currently displayed by some reasoning models, it would be going too far to say that it's evidence it will always be displayed. In fact, one could even have a more optimistic take - if models that do this can get 90+% on AIME and so on, imagine what a model that had ironed out these kinks could do with the same amount of thinking tokens. I feel like we'll just have to wait and see whether that pans out.
And people with short-term memory loss nevertheless have theory of mind just fine. Nothing about LLMs dropping context over big enough windows implies they don't have theory of mind, it just shows they have limitations - just like humans, even with "normal" memory, will lose track over a huge context window.
Like there are plenty of shortcomings of LLMs but it feels like people are comparing them to some platonic ideal human when writing them off
> Nothing about LLMs dropping context over big enough windows implies they don't have theory of mind
ToM is a large topic, but when most people talk about an entity X, they hold a state in memory about that entity, almost like an Object in a programming language. That Object has attributes, conditions, etc. that exist beyond the context window of the observer.
If you have a friend Steve, who is a doctor, and you don't see him for 5 years, you can predict he will still be working at the hospital, because you have an understanding of what Steve is.
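Roughly the analogy being made, in code (a toy sketch, obviously not how brains actually store anything):

```python
from dataclasses import dataclass

# A persistent "Steve" object whose attributes exist independently of any
# particular conversation or context window.
@dataclass
class Person:
    name: str
    profession: str
    workplace: str

steve = Person(name="Steve", profession="doctor", workplace="the hospital")

# Five years later, absent new information, the prediction still holds:
print(f"{steve.name} is probably still working at {steve.workplace}.")
```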
For an LLM you can define a concept of Steve and his profession, and it will adequately mimic replies about him. But in 5 years that LLM would not be able to talk about Steve. It would recreate a different conversation, possibly even a convincing simulacrum of remembering Steve. But internally there is no Steve; nowhere in the nodes of the LLM does Steve exist, or has he ever existed.
That inability to have a world model means that an LLM can replicate the results of a theory of mind but not possess one.
Humans lose track of information, but we have a state to keep track of elements that are ontologically distinct. LLMs do not, and treat them as equal.
For a human, the sentence "Alice and Bob went to the market, when will they be back?" is different from "Bob and Alice went to the market, when will they be back?"
Because Alice and Bob are real humans, you can imagine them; you might even have met them. But to an LLM those are the same sentence. Even outside of the Red Room / Mary's Room argument, there are simply enough gaps in the way an LLM is constructed for it not to qualify as a valid owner of a ToM.
ToM is about being able to model the internal beliefs/desires etc of another person as being entirely distinct from yours. You're basically bringing up a particular implementation of long-term memory as a necessary component of it, which I've never once seen? If someone has severe memory issues, they could forget who Steve is every few minutes, but still be able to look at Steve doing something and model what Steve must want and believe given his actions
I don't think we have any strong evidence on whether LLMs have world-models one way or another - it feels like a bit of a fuzzy concept and I'm not sure what experiments you'd try here.
I disagree with your last point, I think those are functionally the same sentence
> ToM is about being able to model the internal beliefs/desires etc of another person as being entirely distinct from yours.
In that sentence you are implying that you have the "ability to model ... another". An LLM cannot do that; it can't have an internal model that is consistent beyond its conversational scope. It's not meant to. It's a statistics guesser, it's probabilistic, it holds no model, and it's anthropomorphised by our brains because the output is incredibly realistic, not because it actually has that ability
The ability to mimic the replies of someone with that ability is the same as Mary being able to describe all the qualities of red. She still cannot see red, despite her ability to answer any question about its characteristics.
> I don't think we have any strong evidence on whether LLMs have world-models one way or another
They simply cannot, by their architecture. It's a statistical language sampler; anything beyond the scope of that fails. Local coherence is why they pick the right next token, not because they can actually model anything.
> I think those are functionally the same sentence
Functionally and literally are not the same thing though. It's why we can run studies on why some people might say Bob and Alice (putting the man first) or Alice and Bob (alphabetical ordering), and on which societal factors and biases affect the order we put them in.
You could not run that study on an LLM, because you will find that, statistically speaking, the ordering will be almost identical to the training data. Whether the training data overwhelmingly puts male names first or orders lists alphabetically, you will see that reproduced in the output of the LLM, because Bob and Alice are not people, they are statistically probable letters in order.
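(Concretely, you can read that ordering preference straight off a model's token probabilities; a rough sketch, using gpt2 purely as a stand-in, with the expectation being that it simply mirrors whichever ordering dominated the training data:)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# gpt2 here is just a stand-in for "some causal LM"
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def total_logprob(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)   # loss = mean NLL over the shifted tokens
    return -out.loss.item() * (ids.shape[1] - 1)

print(total_logprob("Bob and Alice went to the market."))
print(total_logprob("Alice and Bob went to the market."))
# Whichever ordering scores higher reflects the training corpus, not any
# internal notion of Bob or Alice as people.
```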
LLMs seem to trigger borderline mysticism in people who are otherwise insanely smart, but the kind of "we can't know its internal mind" talk sounds like reading tea leaves, or horoscopes by people with enough PhDs to have their number retired at their university like Michael Jordan.
Do you work in ML research on LLMs? I do, and I don't understand why people are so unbelievably confident they understand how AI and human brains work such that they can definitively tell which functions of the brain LLMs can also perform. Like, you seem to know more than leading neuroscientists, ML researchers, and philosophers, so maybe you should consider a career change. You should maybe also look into the field of mechanistic interpretability, where lots of research has been done on the internal representations these models form - it turns out that, to predict text really really well, building an internal model of the underlying distribution works really well
If you can rigorously state what "having a world model" consists of and what - exactly - about a transformer architecture precludes it from having one I'd be all ears. As would the academic community, it'd be a groundbreaking paper.
This pretty much seems to boil down to "brain science is really hard, so as long as you don't have all the answers, 'AI is maybe halfway there' is a valid hypothesis". As more is understood about the brain and about the limitations of the LLM architecture, the distance only grows. It's like the God of the gaps, where god is the answer for anything science can't yet explain, ever shrinking - except here the gap is filled with the LLM's supposed capabilities beyond striking statistical accuracy and local coherence.
You don't need to be unbelievably confident or understand exactly how AI and human brains work to make certain assessments. I have a limited understanding of biology; I can however make an assessment of who is healthier between a 20-year-old who is active and has a healthy diet and someone in their late 90s with a sedentary lifestyle and a poor diet. This is an assessment we can make despite the massive gaps we have in our understanding of aging, diet, activity, and the overall health impact of individual actions.
Similarly, despite my limited understanding of space flight, I know Apollo 13 cannot cook an egg or recite French poetry. Despite the unfathomably cool science inside the spacecraft, it cannot, by design, do those things.
> the field of mechanistic interpretability
The field is cool, but it cannot prove its own assumptions yet. The field is trying to show that you can reverse engineer a model into something humanly understandable. Its assumptions, such as mapping specific weights or neurons to features, have failed to be reproduced multiple times, with the effects of weights turning out to be far more distributed and complicated than initially thought. This is especially true for things that are equally mystified, such as the emergent abilities of LLMs. The ability to mimic nuanced language being unlocked after a critical mass of parameters does not create a rule by which increased parameterisation will linearly or exponentially increase the abilities of an LLM.
> it turns out, to predict text really really well, building an internal model of the underlying distribution works really well
Yeah, an internal model works well because most words are related to their neighbours; that's the kind of local coherence the model excels at. But to build a world model, the kind a human mind interacts with, you need a few features that remain elusive (some might argue impossible to achieve) for a transformer architecture.
Think of games like chess: an LLM is capable of producing responses that sound like game moves, but the second the game falls outside its context window the moves become incoherent (while still sounding plausible).
You can fix this with architectures that don't have a transformer model underlying them, or by having multiple agents performing different tasks inside your architecture, or by "cheating" and keeping state outside the LLM's responses to track context beyond reasonable windows. Those are "solutions", but they all just kinda prove the transformer lacks that ability.
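The "cheating" version is basically this kind of loop (a toy sketch; call_llm is a placeholder for whatever completion API you'd actually use):

```python
# External-state "cheat": the game lives outside the model, and only a
# compact summary is fed back in each turn, so nothing depends on the
# model's own context window.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("placeholder for an actual completion call")

moves: list[str] = []  # external state: the full move history

def next_move(legal_moves: list[str]) -> str:
    prompt = (
        "You are playing chess. Moves so far: " + " ".join(moves) + "\n"
        "Legal moves: " + ", ".join(legal_moves) + "\n"
        "Reply with exactly one legal move."
    )
    move = call_llm(prompt).strip()
    if move not in legal_moves:   # the external state also lets us validate
        move = legal_moves[0]     # crude fallback; a real loop would retry
    moves.append(move)
    return move
```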
Other tests about causality, reacting to novel data (robustness), multi-step processes, and counterfactual reasoning are all the kinds of tasks transformers still (and probably always will) have trouble with.
For a tech that is so "transparent" in its mistakes and so "simple" in its design (replacing the convolutions with an attention transformer - it's genius), I still think it's talked about in borderline mystical tones, invoking philosophy, theology, and a hope for AGI that the tech itself does not lend itself to, beyond the fast growth and surprisingly good results with little prompt engineering.
With computer use, you can get Claude to read and write files and have some persistence outside of the static LLM model. If it writes a file Steve.txt that it can pull up later, does it now have ToM?
If you have virtually no pricing power and have to drop your $200/mo price to $15/mo, that's a big deal if your $300bn valuation is implying that won't happen - which is what OP's point is about
Idk what you mean by saying this doesn't preclude a monopoly - having your pricing power eroded by competition is kinda the defining feature of a market that isn't monopolistic
Not at all. Monopolies don't imply an anti-rigid price curve. In fact, monopolies almost never have that.
A monopoly means a company has enough leverage to corner and disproportionately own the market. This is entirely possible (and usually the case) even with significant pricing pressure.
I'm very pro some systematic auditing/clearing out of sclerotic waste, but I don't see how anyone can look at the way this is being handled and not be incredibly worried
I think it's the second-order stuff here. Even assuming Musk were to do a fantastic job at just clearing out inefficiency in a smart way (which seems unlikely given the actions he's taken/leaks around cutting funding based on keyword matching etc.), the higher-order point that someone can just buy their way into the President's inner circle and have complete free rein to seize government operations and make changes with 0 transparency/accountability seems like it does just stupid amounts of harm to the integrity of the system
> make changes with 0 transparency/accountability seems like it does just stupid amounts of harm to the integrity of the system
pray tell who was accountable for the grant issuance in the first place? was congress approving every disbursal? could the citizenry vote up/down on every R01 or SBIR that went past the NIH desk?
Hey man, if you wanna make a point just make a point - no need to try the whole snarky rhetorical thing
Ofc not every decision is fully democratic, but the people making them are beholden to rules and systems which are - or at the least have a clear chain of command back to individuals over whom Congress has direct authority. No one ever said you needed 100% democratic oversight of every action, as long as those actions obey the system that was democratically established
The problem is doing it in an extra-legal way, where the Executive Office is giving a crony power his branch doesn't/shouldn't be able to bestow, where people telling this crony no when he tries things he shouldn't be able to do all seem to get put on leave etc
the executive has broad leeway to spend as it sees fit. i 100% guarantee you that disbursal of funds to grant recipients involves calling on extralegal outside-the-government "experts" making advisory recommendations without direct consultation of congress or the voter.
point is, live by the sword, die by the sword. it's hypocritical to whine about cutting funding by the exact same mechanism that is used to give it out because you dont like the political party of the cutter.
and you can't say "keep politics out of science". because when you're pulling from the public purse, it is inherently political.
there are ways to fund science that are apolitical. HHMI, ACS, ADA, AHA, etc.
Executive branch has leeway to decide on what to fund within the parameters set for the program by Congress. It can evaluate grants and set processes but not completely change the acceptance criteria or scope, which is under the jurisdiction of Congress - USAID is jointly under the purview of the executive and legislative branches. This isn't a "team" thing - Congress sets the scope of what USAID should be doing, and anyone changing that - or dismantling the program altogether - without their authority is overreaching
And again, my main issue here is that under any reasonable interpretation, Musk would qualify as a principal officer, which, as the Appointments Clause of the Constitution clearly lays out, requires Senate approval. It is beyond ridiculous that the head of a new "Department" who seems to have unilateral power over other departments now is not subject to any kind of oversight or accountability to the other branches of government - this is exactly the kind of shit the checks and balances were designed for
I mean those same conditions already just lead the human to cutting corners and making stuff up themselves. You're describing the problem where bad incentives/conditions lead to sloppy work, that happens with or without AI
Catching errors/validating work is obviously a different process when they're coming from an AI vs a human, but I don't see how it's fundamentally that different here. If the outputs are heavily cited, that might go some way towards making it easier to catch and correct slip-ups
Making it easier and cheaper to cut corners and make stuff up will result in more cut corners and more made up stuff. That's not good.
Same problem I have with code models, honestly. We already have way too much boilerplate and bad code; machines to generate more boilerplate and bad code aren't going to help.
Yep, I agree with this to some extent, but I think the difference in the future is all that stress will be bypassed and people will reach for the AI from the start.
Previously there was a lot of stress/pressure, which might or might not have led to sloppy work (some consultants are of a high quality). With this, there will be no stress, which will (always?) lead to sloppy work. Perhaps there's an argument for the high-quality consultants using the tools to produce accurate and high-quality work. There will obviously be a sliding scale here. Time will tell.
I'd wager the end result will be sloppy work, at scale :-)
They're also still deep in their loss-making phase, the whole "incumbent squashing upstarts" stance is a lot easier to pull off when you're settled and printing money
> Is the key idea here that current AI development has figured out enough to brute force a path towards AGI?
My sense anecdotally from within the space is yes, people are feeling like we most likely have a "straight shot" to AGI now. Progress has been insane over the last few years, but there's been this lurking worry around signs that the pre-training scaling paradigm has diminishing returns.
What recent outputs like o1, o3, DeepSeek-R1 are showing is that that's fine, we now have a new paradigm around test-time compute. For various reasons people think this is going to be more scalable and not run into the kind of data issues you'd get with a pre-training paradigm.
You can definitely debate whether that's true or not, but this is the first time I've really been seeing people think we've cracked "it", and the rest is scaling, better training, etc.
> My sense anecdotally from within the space is yes people are feeling like we most likely have a "straight shot" to AGI now
My problem with this is that people making this statement are unlikely to be objective. Major players are in fundraising mode, and safety folks are also incentivised to be subjective in their evaluation.
Yesterday I repeatedly used OpenAI’s API to summarise a document. The first result looked impressive. However, comparing repeated results revealed that it was missing major points each time, in a way a human certainly would not. On the surface the summary looked good, but careful evaluation indicated a lack of understanding or reasoning.
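Roughly the kind of check I mean (a minimal sketch; the model name, prompt, and file are placeholders):

```python
from openai import OpenAI  # assumes the official Python client

client = OpenAI()
document = open("report.txt").read()   # placeholder document

summaries = []
for _ in range(5):
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user",
                   "content": "Summarise the key points:\n\n" + document}],
    )
    summaries.append(resp.choices[0].message.content)

# Compare the runs: if each one drops different major points, the surface
# fluency is masking an unstable grasp of the document.
for i, s in enumerate(summaries):
    print(f"--- run {i} ---\n{s}\n")
```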
Don’t get me wrong, I think AI is already transformative, but I am not sure we are close to AGI. I hear a lot about it, but it doesn’t reflect my experience in a company using and building AI.
Yeah obviously motivations are murky and all over the place, no one's free of bias. I'm not taking a strong stance on whether they're right or not or how much of it is motivated reasoning, I just think at least quite a bit is genuine (I'm mainly basing this off researchers I know who have a track record of being very sober and "boring" rather than the flashy Altman types)
To your point, yeah the models still suck in some surprising ways, but again it's that thing of they're the worst they're ever going to be, and I think in particular on the reasoning issue a lot of people are quite excited that RL over CoT is looking really really promising for this.
I agree with your broader point though that I'm not sure how close we are and there's an awful lot of noise right now
“The worst they’re going to be” line is a bit odd. I hear it a lot, but surely it’s true of all tech? So why are we hearing it more now? Perhaps that is a sign of hype?
Yeah, that's a fair point! It's def a more general tech thing, but I think there are a couple of specific reasons why it comes up more here. Firstly, most tech does not improve at the insane rate that AI historically has, so people's perceptions of capabilities become out of date incredibly rapidly here (think about how long people were banging on about "AI can't draw hands!" well after better models came out that could). If you think of the line as a way of saying "don't anchor on what it can do today!", then it feels more appropriate to repeat it for a more rapidly-changing field
Secondly, I think there's a tendency in AI for some ppl to look at failures of models and attribute them to some fundamental limitation of the approach, rather than something that future models will solve. So I think the line also gets used as shorthand for "don't assume this limitation is inherent to the approach". I think in other areas of tech there's less of a tendency to write off entire areas because of present-day limitations, hence the line coming up more often
So you're right that the line is kind of universally applicable in tech, I guess I just think the kinds of bad arguments that warrant it as a rejoinder are more common around AI?
I agree with your take, and actually go a bit further. I think the idea of "diminishing returns" is a bit of a red herring, and it's instead a combination of saturated benchmarks (and testing in general) and expectations of "one LLM to rule them all". This might not be the case.
We've seen with oAI and Anthropic, and it's rumoured with Google, that holding back your "best" model and using it to generate datasets for smaller but almost-as-capable models is one way to go forward. I would say this shows the "big models" are more capable than they would seem, and that they also open up new avenues.
We know that Meta used L2 to filter and improve its training sets for L3. We are also seeing how "long form" content + filtering + RL leads to amazing things (what people call "reasoning" models). Semantics might be a bit ambitious, but this really opens up the path towards: documentation + virtual environments + many rollouts + filtering by SotA models => new dataset for next-gen models.
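That pipeline, stripped to its skeleton, looks something like this (all names here are placeholders, not any lab's actual stack):

```python
# Toy sketch of "big model generates, filter, train the smaller/next model".
def generate_rollouts(big_model, prompt: str, n: int) -> list[str]:
    return [big_model(prompt) for _ in range(n)]

def keep_good(rollouts: list[str], judge) -> list[str]:
    # Filter step: a verifier, unit tests, or a SotA model acting as judge.
    return [r for r in rollouts if judge(r)]

def build_dataset(big_model, judge, prompts: list[str], n: int = 16) -> list[dict]:
    dataset = []
    for p in prompts:
        for r in keep_good(generate_rollouts(big_model, p, n), judge):
            dataset.append({"prompt": p, "completion": r})
    return dataset  # training data for the next-gen / smaller model
```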
That, plus optimisations (early exit from Meta, Titans from Google, distillation from everyone, etc.), really makes me question the "we've hit a wall" rhetoric. I think there are enough tools on the table today to either jump the wall or move around it.