> It does sort of give me the vibe that the pure scaling maximalism really is dying off though
I think the big question is if/when investors will start giving money to those who have been predicting this (with evidence) and trying other avenues.
Really though, why put all your eggs in one basket? That's what I've been confused about for a while. Why fund yet another LLMs-to-AGI startup? The space is saturated with big players and has been for years. Even if LLMs could get there, that doesn't mean something else won't get there faster and for less. It also seems you'd want a backup in order to avoid popping the bubble. Technology S-curves and all that still apply to AI.
Admittedly I'm biased, but so is everyone I know with a strong math and/or science background (I even mentioned it in my thesis more than a few times lol). "Scaling is all you need" just doesn't check out.
I started such an alternative project just before GPT-3 was released. It was really promising (lots of neuroscience-inspired solutions, pretty different from Transformers), but I had to put it on hold because the investors I approached seemed like they would only invest in LLM stuff. Now, a few years later, I'm approaching investors again, only to find they now want to invest in companies USING LLMs to create value and still don't seem interested in new foundational types of models... :/
I guess it makes sense, there is still tons of value to be created just by using the current LLMs for stuff, though maybe the low-hanging fruit is already picked, who knows.
I heard John Carmack talk a lot about his alternative (also neuroscience-inspired) ideas and it sounded just like my project, the main difference being that he's able to self-fund :) I guess funding an "outsider" non-LLM AI project now requires finding someone like Carmack to get on board - I don't think traditional investors are disappointed enough yet to risk money on other types of projects.
> I guess funding an "outsider" non-LLM AI project now requires finding someone like Carmack to get on board
And I think this is a big problem. Especially since these investments tend to be a lot cheaper than the existing ones. Hell, there's stuff I tabled in my PhD, and several models I made, where I'm confident I could have doubled performance with less than a million dollars' worth of compute. My methods could already compete while requiring less compute, so why not give them a chance to scale? I've seen this happen to hundreds of methods. If "scale is all you need", then shouldn't the belief be that any of those methods would also scale?
I think an important part is to recognize just how foundational fundamental research is. We often don't recognize the impacts because by the time we see them, they have passed through other layers. Maybe in the same way that we forget the ground exists and is the biggest contributor to Usain Bolt's speed. Can't run if there's no ground.
But on economic impact, I'll make the bet that a single mathematical work (technically two) has had a greater economic impact than all the technologies of the last two centuries: calculus. I haven't run the calculations (it seems like it'd be hard, and I'd definitely need calculus to do it), but I'd be willing to bet that every year calculus produces a greater economic impact than FAANG, MANGO, or whatever the hot term is these days.
It seems almost silly to say this because it is so obviously influential. But things like this fade into the background the same way we almost never think about the ground beneath our feet.
I have to say this because we're living in a time where people are arguing we shouldn't build roads because cars are the things that get us places. But this is just framing, and poor framing at that. Frankly, part of it is that roads are typically built through public funds and cars through private. It's this way because a road makes a much higher economic impact as a public utility than as a private one. Incidentally, this makes the argument not to build roads self-destructive. It's short-sighted. Just like actual roads, research has to be continuously performed. The reality is more akin to those cartoon scenes where a character is laying down railroad tracks just in front of the speeding train.[0] I guess if you're not Gromit placing down the tracks, it is easy to assume they just exist.
But unlike actual roads, research is relatively cheap. Sure, maybe a million mathematicians don't produce anything economically viable that year, and maybe not for 20, but one will produce something worth trillions. And they do this at a mathematician's salary! You can hire research mathematicians at least 2 to 1 against a junior SWE, and 10 to 1 against a staff SWE. It's just crazy to me that we're arguing we don't have the money for these kinds of things. I mean, just look at the impact of Tim Berners-Lee and his team. That alone offsets all costs for the foreseeable future. Yet somehow his net worth is in the low millions? I think we really should question the notion that wealth is strongly correlated with impact.
I think a somewhat comparable situation is in various online game platforms, now that I think about it. Investors would love to make a game like Fortnite and get the profits that Fortnite makes. So a ton of companies try to make Fortnite. Almost all fail, make no return whatsoever, lose a ton of money, toss the game in the bin, and shut down the servers.
On the other hand, it may have been more logical for many of them to go for a less ambitious but still profitable investment (not always-online, not a game that requires a high player count and social buy-in to stay relevant; maybe a smaller-scale single-player game that doesn't offer recurring revenue). Yet we still see a very crowded space of attempts to emulate the same business model as something like Fortnite. A more historical example was the constant question of whether a given MMO would be the next "WoW killer" all through the 2000s and 2010s.
I think part of why this arises is that there's a bit of a psychological hack for humans in particular: if there's a low-probability but extremely high-reward outcome, we're deeply entranced by it, and investors are the same. Even if the chances are smaller in their minds than they were before, if they can just follow the same path that seems to be working to some extent and then get lucky, they're completely set. They're not really thinking about any broader bubble that could exist; that's on the level of society. They're thinking about the individual, who could be very, very rich, famous, and powerful if their investment works. And in the mind of someone debating which path to go down, I imagine a more nebulous answer of "we probably need to come up with some fundamentally different tools for learning and research a lot of different approaches" is a bit less satisfying and exciting than a pitch that says "if you just give me enough money, the curve will eventually hit the point where you get to be king of the universe, and we go colonize the solar system and carve your face into the moon."
I also have to acknowledge the possibility that they just have access to different information than I do! They might be getting shown much better demos than I do, I suppose.
I'm pretty sure the answer is people buying into the scaling-is-all-you-need argument. Because if you have that framing, then it can be solved through engineering, right? I mean, there's still engineering research, and it doesn't mean there's no reason to do research, but everyone loves the simple and straightforward path, right?
> I think a somewhat comparable situation is in various online game platforms
I think it is common in many industries. The weird thing is that being too risk averse creates more risk. There's a balance that needs to be struck. Maybe another famous example is movies. Studios go on about pirating and how Netflix is winning, but most new movies are rehashes or sequels. Sure, there are a lot of new movies, but few get nearly the same advertising budgets, so people don't even hear about them (and sequels need less advertising since they get a lot of free advertising). You'd think there'd be more pressure to find the next hit that can lead to a few sequels, but instead they tend to be too risk averse. That's the issue with monopolies though... or any industry where the barrier to entry is high...
> psychological hack
While I'm pretty sure this plays a role (along with other things like blind hope), I think the bigger contributors are risk aversion and observation bias. Like you say, it's always easier to argue "look, it worked for them" than "this hasn't been done before, but it could be huge." A big part of the bias is that you get to oversimplify the reasoning for the former argument compared to the latter. The latter gets highly scrutinized while the former overlooks many of the conditions that led to success. You're right that the big picture is missing, especially that a big part of the success came through novelty (not that Fortnite is exactly novel in gameplay...). For some reason the success of novelty is almost never seen as motivation to try new things.
I think that's the part I find most interesting and confusing. It's like an aversion to looking just one layer deeper. We'll put in far more physical and mental energy to justify a shallow thought than would be required to think deeper. I get that we're biased towards being lazy, so I think this is related to us just being bad at foresight and feeling like being wrong is a bad thing (well, it isn't good, but I'm pretty sure being wrong and not correcting is worse than just being wrong).
I wonder about this too, because it seems to have to do with long-term planning. On one hand, that seems to be one of the greatest feats humans have accomplished and sets us apart from other animals (we don't just plan for hibernation), but at the same time we're really bad at it and kind of always have been. I think there is an element of the "marshmallow experiment" here: rewards now or more rewards in the future. People act like there's an obvious answer, but there's also the old saying "a bird in the hand is worth two in the bush." I do think we're currently off balance and hyper-focused on the one in the hand. Frankly, the saying isn't as meaningful if we're talking about berries instead of birds lol.
>I think part of why this arises is that there's a bit of a psychological hack for humans in particular: if there's a low-probability but extremely high-reward outcome, we're deeply entranced by it, and investors are the same.
Venture capital is all about low-probability high-reward events.
Get a normal small business loan if you don't want to go big or go home.
So you agree with us? Should we instead be making the argument that this is an illogical move? Because IME the issue has been that it appears too risky. I'd like to know if I should just lean into that rather than argue it is not as risky as it appears (yet still high reward, albeit still risky).
We see both things: almost all games are 'not Fortnite'. But that doesn't (commercially) invalidate some companies' quest to build the next Fortnite.
Of course, if you limit your attention to these 'wannabe Fortnites', then you only see 'wannabe Fortnites'.
>Really though, why put all your eggs in one basket? That's what I've been confused about for a while.
I mean that's easy lol. People don't like to invest in thin air, which is what you get when you look at non-LLM alternatives to General Intelligence.
This isn't meant as a jab or snide remark or anything like that. There's literally nothing else that will get you GPT-2-level performance, never mind an IMO gold medalist. Invest in what else exactly? People are putting their eggs in one basket because it's the only basket that exists.
>I think the big question is if/when investors will start giving money to those who have been predicting this (with evidence) and trying other avenues.
Because those people have still not been proven right. Does "it's an incremental improvement over the model we released a few months ago, and blows away the model we released 2 years ago" really scream "see, those people were wrong all along!" to you?
> which is what you get when you look at non-LLM alternatives to General Intelligence.
I disagree with this. There are good ideas that are worth pursuing. I'll give you that few, if any, have been shown to work at scale, but I'd say that's a self-fulfilling prophecy. If your bar is that they have to be proven at scale, then your bar is that to get investment you'd have to have enough money to not need investment. How do you compete if you're never given the opportunity to compete? You could be the greatest quarterback in the world, but if no one will let you play in the NFL, how can you prove it?
On the other hand, investing in these alternatives is a lot cheaper, since you can work your way up to scale and see what fails along the way. This is more like letting people try their stuff out in the lower leagues. The problem is there's no ladder to climb after a certain point. If you can't fly, how do you get higher?
> Invest in what else exactly? ... it's the only basket that exists.
I assume you don't work in ML research? I mean, that's okay, but I'd suspect this claim comes from someone not on the inside. Though tbf, there's a lot of ML research that is higher level and not working on alternative architectures. I guess the two most well known are Mamba and Flows. I think those would be known by the general HN crowd. While I think neither will get us to AGI, I think both have advantages that shouldn't be ignored. Hell, even scaling a very naive Normalizing Flow (related to Flow Matching) has been shown to compete with and beat top diffusion models[0,1]. The architectures aren't super novel here, but they do represent the first time an NF was trained above 200M params. That's a laughable number by today's standards. I can even tell you from experience that there's a self-fulfilling filtering for this kind of stuff: having submitted works in this domain, I'm always asked to compare with models >10x my size. Even if I beat them on some datasets, people will still point to the larger model as if that's a fair comparison (as if a benchmark is all that matters and doesn't need to be contextualized).
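For anyone unfamiliar with why flows count as a different kind of basket: they train by exact maximum likelihood through invertible layers, rather than by denoising or next-token prediction. A minimal sketch of one affine coupling block in PyTorch (illustrative only; assumes an even feature dim, and is not the architecture from [0,1]):

    import torch
    import torch.nn as nn

    class AffineCoupling(nn.Module):
        # One invertible RealNVP-style block: transform half the features
        # conditioned on the other half. Stacking these gives a model with
        # an exact, tractable log-likelihood.
        def __init__(self, dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim // 2, hidden), nn.ReLU(),
                nn.Linear(hidden, dim),  # dim//2 log-scales + dim//2 shifts
            )

        def forward(self, x):
            x1, x2 = x.chunk(2, dim=-1)
            s, t = self.net(x1).chunk(2, dim=-1)
            y2 = x2 * torch.exp(s) + t       # invertible given x1
            log_det = s.sum(dim=-1)          # exact change-of-variables term
            return torch.cat([x1, y2], dim=-1), log_det

        def inverse(self, y):
            y1, y2 = y.chunk(2, dim=-1)
            s, t = self.net(y1).chunk(2, dim=-1)
            x2 = (y2 - t) * torch.exp(-s)    # closed-form inverse, no solver
            return torch.cat([y1, x2], dim=-1)

The exact likelihood is what makes the comparison with diffusion models interesting: you get a number you can actually evaluate, not just samples.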
> Because those people have still not been proven right.
You're right. But here's the thing. *NO ONE HAS BEEN PROVEN RIGHT*. That condition will not exist until we get AGI.
> scream "see, those people were wrong all along!" to you?
Let me ask you this. Suppose people are saying "x is wrong, I think we should do y instead" but you don't get funding because x is currently leading. Then a few years later y is proven to be the better way of doing things, everything shifts that way. Do you think the people who said y was right get funding or do you think people who were doing x but then just switched to y after the fact get funding? We have a lot of history to tell us the most common answer...
>I disagree with this. There are good ideas that are worth pursuing. I'll give you that few, if any, have been shown to work at scale, but I'd say that's a self-fulfilling prophecy. If your bar is that they have to be proven at scale, then your bar is that to get investment you'd have to have enough money to not need investment. How do you compete if you're never given the opportunity to compete? You could be the greatest quarterback in the world, but if no one will let you play in the NFL, how can you prove it?
>On the other hand, investing in these alternatives is a lot cheaper, since you can work your way up to scale and see what fails along the way. This is more like letting people try their stuff out in the lower leagues. The problem is there's no ladder to climb after a certain point. If you can't fly, how do you get higher?
I mean this is why I moved the bar down from state of the art.
I'm not saying there are no good ideas. I'm saying none of them have yet shown enough promise to be called another basket in its own right. OpenAI did it first because they really believed in scaling, but anyone (well, not literally, but you get what I mean) could have trained GPT-2. You didn't need some great investment, even then. It's that level of promise I'm saying doesn't exist yet.
>I guess the two most well known are Mamba and Flows.
I mean, Mamba is an LLM? In my opinion, it's the same basket. I'm not saying it has to be a transformer or that you can't look for ways to improve the architecture. It's not like OpenAI or DeepMind aren't pursuing such things. Some of the most promising tweaks/improvements - Byte Latent Transformer, Titans, etc. - are from those top labs.
Flows research is intriguing but it's not another basket in the sense that it's not an alternative to the 'AGI' these people are trying to build.
> Let me ask you this. Suppose people are saying "x is wrong, I think we should do y instead" but you don't get funding because x is currently leading. Then a few years later y is proven to be the better way of doing things, everything shifts that way. Do you think the people who said y was right get funding or do you think people who were doing x but then just switched to y after the fact get funding? We have a lot of history to tell us the most common answer...
The funding will go to players positioned to take advantage. If x was leading for years, then there was merit in doing it, even if a better approach came along. Think about it this way: OpenAI now has 700M weekly active users for ChatGPT and millions of API devs. If this superior y suddenly materialized and they assured you they were pivoting, why wouldn't you invest in them over players starting from zero, even if those players championed y in the first place? They're better positioned to give you a return on your money. Of course, you can just invest in both.
OpenAI didn't get nearly a billion weekly active users off the promise of future technology. They got them with products that exist here and now. Even if there's some wall, this is clearly a road with a lot of merit. The value they've already generated (a whole lot) won't disappear if LLMs don't reach the heights some people are hoping they will.
If you want people to invest in y instead then x has to stall or y has to show enough promise. It didn't take transformers many years to embed themselves everywhere because they showed a great deal of promise right from the beginning.
It shouldn't be surprising if people aren't rushing to put money in y when neither has happened yet.
> I'm saying none of them have yet shown enough promise to be called another basket in its own right.
Can you clarify what this threshold is?
I know that's one sentence, but I think it is the most important one in my reply. It is really what everything else comes down to. There's a lot of room between academic scale and industry scale, and very few things have papers in the middle.
> I mean, Mamba is an LLM
Sure, I'll buy that. LLM doesn't mean transformer. I could have been clearer, but I think it's clear from context: that means literally any architecture is an LLM if it is large and models language. Which I'm fine to work with.
Though with that, I'd still disagree that LLMs will get us to AGI. I think the whole world is agreeing too, as we're moving into multimodal models (sometimes called MLLMs), so I guess let's use that terminology.
To be more precise, let's say "I think there are better architectures out there than ones dominated by Transformer encoders." It's a lot more cumbersome, but I don't want to say transformers or attention can't be used anywhere in the model, or we'll end up playing this same game. Let's just work with "an architecture that is different from what we usually see in existing LLMs". Does that work?
> The funding will go to players positioned to take advantage.
I wouldn't put your argument this way. As I understand it, your argument is about timing. I agree with most of what you said tbh.
To be clear, my argument isn't "don't put all your money in the 'LLM' basket, put it in this other basket"; my argument is "diversify", and "diversification means investing at many levels of research." To clarify that latter part, I really like the NASA TRL scale[0]. It's wrong to draw a hard line between "engineering" and "research"; it's better to see it as a continuum. I agree most money should go to the higher levels, but I'd be remiss if I didn't point out that we're living in a time when a large number of people (including these companies) are arguing that we should not fund TRL 1-3, and if we're being honest, I'm talking about stuff currently in TRL 3-5. It's a good argument to make if you want to maintain dominance, but not if you want to continue progress (which I think is what leads to maintained dominance, as long as that dominance isn't through monopoly or over-centralization). Yes, most of the lower-level stuff fails. But luckily, the lower-level stuff is much cheaper to fund. A mathematician's salary and a chalkboard cost at most half as much as a software dev (and probably closer to an order of magnitude less if we're considering the full cost of hiring either of them).
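For anyone who hasn't seen the scale, here's the ladder as a quick lookup (labels paraphrased from NASA's definitions; the AI examples in the comments are my own loose mapping, not NASA's):

    # NASA Technology Readiness Levels, paraphrased.
    TRL = {
        1: "basic principles observed",            # e.g. a new learning theory
        2: "technology concept formulated",
        3: "experimental proof of concept",        # small-scale paper results
        4: "validated in the lab",
        5: "validated in a relevant environment",  # mid-scale training runs
        6: "demonstrated in a relevant environment",
        7: "prototype demonstrated in operation",
        8: "system complete and qualified",
        9: "proven through successful operation",  # e.g. deployed products
    }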
But I think that returns us to the main point: what is that threshold?
My argument is simply "there should be no threshold; it should be continuous." I'm not arguing for a uniform distribution either; I explicitly said more should go to higher TRLs. I'm arguing that if you want to build a house, you shouldn't ignore the foundation. And the fancier the house, the more you should care about the foundation. Lest you risk it all falling down.
>Can you clarify what this threshold is?
>I know that's one sentence, but I think it is the most important one in my reply. It is really what everything else comes down to. There's a lot of room between academic scale and industry scale, and very few things have papers in the middle.
Something like GPT-2: something that, even before being actually useful or particularly coherent, was interesting enough to spark articles like this:
https://slatestarcodex.com/2019/02/19/gpt-2-as-step-toward-g...
So far, only LLM/LLM-adjacent stuff fulfils this criterion.
To be clear, I'm not saying general R&D must meet this requirement. Not at all. But if you're arguing for diverting millions/billions in funds from an x that is working to a y, then y has to at least clear that bar.
> My argument is simply "there should be no threshold; it should be continuous."
I don't think this is feasible for large investments. I may be wrong, but I also don't think other avenues are going unfunded. They just don't compare in scale because... well, they haven't really done anything to justify such scale yet.
1) There are plenty of things that can achieve similar performance to GPT-2 these days. We mentioned Mamba; they compared against GPT-3 in their first paper[0]. They compare with the open-source replications (the GPT-Neo and GPT-J models), and you'll also see other architectures referenced there, like Hyena and H3. Remember, GPT-3 is pretty much just a scaled-up GPT-2.
2) I think you are underestimating the cost of training some of these things. I know Karpathy said you can now train GPT-2 for like $1k[1], but a single training run is a small portion of the total cost. I'll reference StyleGAN3 here just because the paper documents this well on the very last page[2]. Check out the breakdown, but there are a few things I want to point out specifically. The whole project cost 92 V100-years, and the results in the paper account for only 5 of those: 53 of the 1876 training runs. Your $1k doesn't get you nearly as far as you'd think. If we simplify and price compute at $1k per V100-year, they spent ~$85k before the runs that made it into the paper, and ~$18k before they even committed to the project. Call it ~$110k all in; and if you want realistic numbers, multiply by about 5, because that's roughly what a V100 will actually run you (discounted for scale). Even ~$110k ain't too bad, but it is outside the budget of most small labs (including most of academia). And remember, that's just the cost of the GPUs; it doesn't pay for any of the people running the stuff.
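To make the envelope math explicit (only the V100-year counts come from the paper's breakdown; the dollar rate and the pre-project figure are the simplifying assumptions above):

    # Back-of-envelope: what a project really costs vs. the run you see.
    usd_per_v100_year = 1_000   # assumed simplified rate; realistic is ~5x

    paper_results = 5           # V100-years behind the published figures
    whole_project = 92          # V100-years for the entire project
    pre_project = 18            # assumed V100-years of exploration beforehand

    visible = paper_results * usd_per_v100_year                  # ~$5k
    actual = (whole_project + pre_project) * usd_per_v100_year   # ~$110k

    print(f"what you see:  ${visible:,}")
    print(f"what it costs: ${actual:,} (GPUs only, no salaries)")
    print(f"multiplier:    {actual / visible:.0f}x")

The point isn't the exact dollar figures; it's that whatever the visible training run costs, the project behind it costs a large multiple of that.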
I don't expect you to know any of this stuff if you're not a researcher. Why would you? It's hard enough to keep up with the general AI trends, let alone niche topics lol. It's not an intelligence problem, it's a logistics problem, right? A researcher's day job is being in those weeds; you just get a lot more hours in the space. I mean, I'm pretty out of touch with plenty of domains just because of time constraints.
> I don't think this is feasible for large investments. I may be wrong, but I also don't think other avenues are going unfunded.
So what I'm trying to say is: I think your bar has been met.
And if we actually look at the numbers, no, I do not think these avenues are being funded. But don't take it from me, take it from Fei-Fei Li[3]:
| Not a single university today can train a ChatGPT model
I'm not sure if you're a researcher or not; you haven't answered that question. But I think if you were, you'd be aware of this issue because you'd be living with it. If you were a PhD student, you would see the massive imbalance in GPU resources between those working closely with big tech and those trying to do things on their own. If you were a researcher, you'd also know that even inside those companies there aren't many resources given to people to do these things. You get them on occasion, like the StarFlow and TarFlow I pointed out before, but these tend to be pretty sporadic. Even a big reason we talk about Mamba is how much was spent on it.
But if you aren't a researcher, I'd ask why you have such confidence that these things are being funded and that they cannot be scaled or improved[4]. History is riddled with examples of inferior tech winning mostly due to marketing. I know we get hyped around new tech; hell, that's why I'm a researcher. But isn't that hype a reason to address this fundamental problem? The hype is about the advance of technology, right? I really don't think it is about the advancement of a specific team, so if we have the opportunity for greater and faster advancement, isn't that something we should encourage? Because I don't understand why you're arguing against that. An exciting thing about working at the bleeding edge is seeing all the possibilities. But a disheartening thing is seeing many promising avenues passed over for lack of funding and publicity. Do we want meritocracy to win out, or the dollar?
I guess you'll have to ask yourself: what's driving your excitement?
[4] I'm not saying any of this stuff is flatly de facto better. But there definitely is an attention imbalance, and you have to compare like to like. If one team gets to x in 1000 man-hours and someone else gets there in 100, it may be worth taking a deeper look. That's all.
I acknowledge Mamba, RWKV, Hyena, and the rest, but like I said, they fall under the LLM bucket. All these architectures have 7B+ models trained too; that's not no investment. They're not "winning" over transformers because they're not slam dunks, not because no one is investing in them. They bring improvements in some areas but with detractions that make switching not a straightforward "this is better", which is what you're going to need to divert significant funds from an industry-leading approach that is still working.
What happens when you throw away state information vital for a future query? Humans can just re-attend (re-read the book, re-watch the video, etc.). Transformers are always re-attending, but SSMs, RWKV? Too bad. A lossy state is a big deal when you cannot re-attend.
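The mechanical difference, as a toy sketch (illustrative shapes only, not any particular model's implementation):

    import torch

    # A transformer's KV cache grows with the sequence, so step t can look
    # back at any earlier token directly. A recurrent/SSM-style cell must
    # squeeze the whole past into a fixed-size h; whatever the update
    # discards is unrecoverable.

    def attention_step(q, K_cache, V_cache):
        # q: (d,); K_cache, V_cache: (t, d) -- every past token still exists
        w = torch.softmax(K_cache @ q / q.shape[0] ** 0.5, dim=0)
        return w @ V_cache               # can "re-attend" to token 0 anytime

    def recurrent_step(h, x, A, B):
        # h: (n,) fixed-size state; the past exists only through h
        return A @ h + B @ x             # anything A @ h forgot is gone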
Plus, some of those improvements are only theoretical. Improved inference-time batching and efficient attention (flash, windowed, hybrid, etc.) have allowed transformers to remain performant against some of these alternatives, rendering even the speed advantage moot, or at least not worth switching over for.
It's not enough to simply match transformers.
>Because I don't understand why you're arguing against that.
I'm not arguing anything. You asked why the disproportionate funding. Non-transformer LLMs aren't actually better than transformers and non-LLM options are non-existent.
Fair, they fall under the LLM bucket, but I think most things can. Still, my point is that there's a very narrow exploration of techniques. Call it what you want; that's the problem.
And I'm not arguing there's zero investment, but it is incredibly disproportionate and there's a big push for it to be more disproportionate. It's not about all or none, it is about the distribution of those "investments" (including government grants and academic funding).
With the other architectures, I think you're being too harsh. Don't let perfection get in the way of good enough. We're talking about research; more specifically, about what warrants more research. Where would transformers be today if we had made similar critiques? Hell, we have a real-life example with diffusion models. Sohl-Dickstein's paper came out a year after Goodfellow's GAN paper, and yet it took 5 years for DDPM to come out. This happened because at the time GANs performed better, so the vast majority of effort went there; at least 100x more, if not 1000x. So the gap just widened. The difference between the two models really came down to scale and the parameterization of the diffusion process, which is something mentioned in the Sohl-Dickstein paper (specifically as something that should be further studied). Five years, really, because very few people were looking. Even at the time it was known that the potential of diffusion models was greater than GANs, but the concentration went to what worked better at that moment[0]. You can see a similar thing with ViTs if you go look up Cordonnier's paper. The time gap is smaller, but so is the innovation; ViT barely changes the architecture.
There are lots of problems with SSMs and other architectures. I'm not going to deny that (I already said as much above). The ask is to be given a chance to resolve those problems. An important part of that decision is understanding the theoretical limits of these different technologies. The question is "can these problems be overcome?" It's hard to answer, but so far the answer isn't "no". That's why I brought up diffusion and ViTs above. I could even bring in Normalizing Flows and Flow Matching, which are currently undergoing this change.
> It's not enough to simply match transformers.
I think you're both right and wrong. And I think you agree unless you are changing your previous argument.
Where I think you're right is that the new thing needs to show capabilities the current thing can't. Then you have to provide evidence that its own limitations can be overcome in such a way that overall it is better. I don't say strictly better, because there is no global optimum. I want to make this clear because there will always be limitations or flaws. Perfection doesn't exist.
Where I think you're wrong is a matter of context. If you want the new thing to match or beat SOTA transformer LLMs from day one, I'll refer you back to the self-fulfilling prophecy problem from my earlier comment. You never give anything a chance to become better because it isn't better from the get-go.
I know I've made that argument before, but let me put it a different way. Suppose you want to learn the guitar. Do you give up after you first pick it up and find out that you're terrible at it? No, that would be ridiculous! You keep at it because you know you have the capacity to do more. You continue doing it because you see progress. The logic is the exact same here. It would be idiotic of me to claim that because you can only play Mary Had A Little Lamb that you'll never be able to play a song that people actually want to listen to. That you'll never amount to anything and should just give up playing now.
My argument here is: don't give up. Look how far you've come. Sure, you can only play Mary Had a Little Lamb, but not long ago you couldn't play a single chord. You couldn't even hold the guitar the right way up! Being bad at things is not a reason to give up on them; being bad at things is the first step to being good at them. The reason to give up on something is that it has no potential. Don't confuse lack of success with lack of potential.
> I'm not arguing anything. You asked why the disproportionate funding.
I guess you don't realize it, but you are making an argument. You were trying to answer my question, right? That is an argument. I don't think we're "arguing" in the bitter or upset sense; I'm not upset with you and I hope you aren't upset with me. We're learning from each other, right? And there's no clear answer to my original question either[1]. But I'm making my case for why we should set aside a bit of what we have now so that we get more in the future. It sounds scary, but we know that by sacrificing some of our food as seed, we can grow even more food next year. We can't completely sacrifice the future for the present; there needs to be balance. And research funding is just like crop planning: you have to plan with excess in mind. If you're lucky, you have a very good year. If you're unlucky, at least everyone doesn't starve. Given that we're living in those fruitful, lucky years, I think it is even more important to continue the trend. We have the opportunity to have so many more fruitful years ahead. This is how we avoid the crashes and cycles that tech so frequently goes through. It's all written in history; all you have to do is ask what led to these fruitful times. You cannot ignore that a big part of it was lower-level research.
[0] Some of this also has to do with the publish-or-perish paradigm, but that gets convoluted and is itself related to funding, because we similarly provide far more funding to what works now than to what has higher potential. This is logical, of course, but the complexity of the conversation is that it has to deal with the distribution.
[1] I should clarify: my original question was a bit rhetorical. You'll notice that after asking it, I provided an argument that this was a poor strategy. That's me framing the problem. I mean, I live in this world; I am used to people making the case from the other side.
> Funding multiple startups means _not_ putting your eggs in one basket, doesn't it?
Different basket hierarchy.
Also, yes. They state this, and given that there are plenty of open-source models that are LLMs and get competitive performance, it at least indicates that anyone not doing LLMs is doing so in secret.
If OpenAI isn't using LLMs then doesn't that support my argument?