Hacker News

What do you (or anyone else, feel free to chime in) do with other LLMs that makes them useable for anything that is not strictly tinkering?

Here is my premise: We are past the wonder stage. I want to actually get stuff done efficiently. From what I have tested so far, the only model that allows me to do that halfway reliably is GPT-4.

Am I incompetent or are we really just wishfully thinking in HN spirit that other LLMs are a lot better at being applied to actual tasks that require a certain level of quality, consistency and reliability?



I still wonder what makes GPT-4 so much better than its contemporaries. That's why I find the tons of people trying to explain how GPT-4 works starting from a simple neural network distasteful: plenty of people already understand and do exactly that, yet none of their models comes anywhere close to GPT-4.


> I still wonder what makes GPT-4 so much better than its contemporaries.

OpenAI has had many years to refine their dataset down from the noisy public datasets, and GPT-4 is (supposedly) a mixture of 8 "expert models", each of which is 220B parameters (5x+ larger than Falcon 40B), for a total of roughly 1.7T parameters (3x+ Google's huge 540B PaLM). The hardware and software needed to train networks at that scale are also a deep moat. Relatively speaking, the model architecture ("GPT from scratch") is the easiest piece.
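For anyone unfamiliar with the "mixture of experts" idea mentioned above, here is a toy sketch of what such a layer does. This is a generic illustration of MoE routing, not OpenAI's actual (undisclosed) implementation; all dimensions and names are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class TinyMoE:
    """Toy mixture-of-experts layer: a learned router scores the experts
    for each input, and only the top-k experts actually run."""

    def __init__(self, d_model=8, n_experts=8, top_k=2):
        self.top_k = top_k
        self.router = rng.normal(size=(d_model, n_experts))
        # each "expert" is just a single linear map in this toy version
        self.experts = [rng.normal(size=(d_model, d_model))
                        for _ in range(n_experts)]

    def forward(self, x):
        scores = softmax(x @ self.router)           # routing probabilities
        top = np.argsort(scores)[-self.top_k:]      # indices of top-k experts
        weights = scores[top] / scores[top].sum()   # renormalize over chosen experts
        # only the chosen experts compute, so per-token cost scales with
        # top_k, not with the total number of experts
        return sum(w * (x @ self.experts[i]) for w, i in zip(weights, top))

moe = TinyMoE()
out = moe.forward(rng.normal(size=8))
print(out.shape)  # (8,)
```

The point of the design is that total parameter count (all experts) can be far larger than the compute spent per token (only top-k experts run), which is how a ~1.7T-parameter model can stay affordable to serve.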


From my understanding, GPT-4 is the biggest, or one of the biggest. It was trained on low-quality internet datasets, like the others. What makes it different is post-training on custom data with human supervision; we know they even outsourced some of that labeling work to Africa. Second, they integrated it with external tools, like a Python interpreter and an internet browser. But the first point is the most important. They have also most likely experimented and found some tricks that make it a bit better.
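The "external tools" part is conceptually simple: the model emits a tool request, a harness executes it, and the result is fed back into the context. A minimal sketch of that loop, with an entirely made-up text protocol and a safe stand-in for the Python-interpreter tool (none of this is OpenAI's actual mechanism):

```python
import ast
import operator as op

# stand-in "Python interpreter" tool: evaluates simple arithmetic safely
SAFE_OPS = {ast.Add: op.add, ast.Sub: op.sub,
            ast.Mult: op.mul, ast.Div: op.truediv}

def safe_eval(expr):
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in SAFE_OPS:
            return SAFE_OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

TOOLS = {"python": safe_eval}

def run_turn(model_output):
    """If the model's output requests a tool (here the hypothetical format
    'TOOL:<name>:<arg>'), run it and return the observation to append to
    the context; otherwise pass the output through as the final answer."""
    if model_output.startswith("TOOL:"):
        _, name, arg = model_output.split(":", 2)
        return f"OBSERVATION: {TOOLS[name](arg)}"
    return model_output

print(run_turn("TOOL:python:2*(3+4)"))  # OBSERVATION: 14
```

The real systems use structured function-calling rather than string prefixes, but the control flow (model proposes, harness executes, observation goes back in) is the same shape.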


They pay tons of people to type out conversations that they can feed into it. It's just a lot of people doing a lot of work.


This line of thinking only works if it's impossible to imagine a world where OpenAI isn't the leader. If in two years the non-OpenAI models are better, it will serve us much better to have built these tools so they can work with other models as well.


Since OpenAI is all just APIs with simple interfaces, I don't think that plugging a different, capable model into whatever tool you are building is going to be an issue.
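To make that concrete, the usual way to keep a product model-agnostic is to code against a tiny interface of your own and plug vendors in behind it. A sketch, where the class and method names are illustrative, not any vendor's real SDK:

```python
from abc import ABC, abstractmethod

class ChatModel(ABC):
    """The only surface the rest of the product is allowed to see."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class OpenAIBackend(ChatModel):
    def complete(self, prompt):
        # would call the real API here; stubbed for the sketch
        return f"[openai] {prompt}"

class LocalBackend(ChatModel):
    def complete(self, prompt):
        # e.g. a self-hosted open-weights model behind the same interface
        return f"[local] {prompt}"

def build_app(model: ChatModel):
    # product code depends on ChatModel, never on a specific vendor
    return lambda q: model.complete(f"Answer concisely: {q}")

app = build_app(LocalBackend())
print(app("What is a moat?"))  # [local] Answer concisely: What is a moat?
```

Swapping vendors is then a one-line change at the composition root, which is why the commenters above expect products to outlive any particular model.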


You are correct in this assessment. A majority of individuals and startups playing around with turning LLMs into products aim to be prepared for the arrival of the subsequent generation of models. When that occurs, they'll already have a product or company in place and can simply integrate the new models.

Models are getting commoditized, well executed ideas are not.



