That’s how I’d want it to be honestly. LLMs are tools and I’d hope we’re going to keep the people using them responsible. Just like any other tools we use.
I’ve been involved in building a system that reads structured data from a specialized form of contract used in a specific industry. Prices, clauses, pick-up, delivery, etc. A couple hundred datapoints per contract. We had many discussions about how to present and sell an imperfect system. The thing is, the potential customers are today transcribing the contracts manually, and we quickly realized that people make a ton of mistakes doing that. It became obvious when we were building assertion datasets ourselves. It’s not a perfect system and you have to consider how you use the data (aggregating for price indexing, for instance), but we’re actually doing better than what people achieve when they have to transcribe data for hours a day.
Agreed. I’ve used their platform to train smaller, specialized models. Something I could have done in Colab or some other tool, but their platform allows me to just upload a training set, and as soon as it finishes I have a hosted model available at an endpoint. It obviously has some constraints compared to running the training yourself, but it also opens up the opportunity to way more people.
I find Stripe’s fees excessive too, but I don’t think I’ll ever switch. I’ve been running a small SaaS product on the side of other work for >15 years, and if it has taught me one thing, it’s that I need to reduce the things I have to maintain, reduce manual work, and reduce the things that can go wrong. There’s nothing worse than having to fix a bug in a codebase you haven’t touched for a year, possibly in a feature you haven’t touched in many years. I simply love that Stripe handles not just the payment, but the payment application, the subscription billing, the price settings, and the exports for bookkeeping. I’ve had a few instances where my site was used fraudulently to check stolen credit cards; it was quickly flagged and I could resolve it with Stripe. I’m sure someone can mention alternatives, and I’m sure I could build something that would work myself, but they keep a big part of what it takes to run the business out of my mind, and I’m willing to pay for that.
Fair point, though a lot has changed in 15 years. Much of what you mentioned is now the baseline most payment gateways ship with, and working on code you haven't touched in a while is certainly a lot easier nowadays with agents too. All that said, if you're satisfied with the price and the product, I'm not here to convince you to swap.
To be honest, I haven’t even looked at competitors for some years. I guess one drawback of handing such a big part of the responsibilities to third parties is the lock-in. The benefits of switching would have to be rather big for me to put in the effort.
I very much appreciate this take. I will say, though, that I’ve had experiences myself where using coding agents led me to what I’d consider (in your terminology) a better mapping between information and code. Not because the agent was able to do things better than I could, but because, as my project grew and I got wiser about how to best map the information, it was incredibly fast for me to move the code in the right direction and do refactorings that I otherwise might not have gotten around to.
I use Tidewave as my coding agent and it’s able to execute code in the runtime. I believe it’s using Code.eval_string/3, but you should be able to check the implementation. It’s the project_eval tool.
In my experience it’s a huge leap in terms of the agent being able to test and debug functionality. It’ll often write small code snippets to test that individual functions work as expected.
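Tidewave itself is an Elixir tool (likely built on Code.eval_string/3, as noted above), but the general mechanism is easy to sketch. Here's a minimal, language-agnostic illustration in Python of what a project_eval-style tool does; the function name and return shape are assumptions for illustration, not Tidewave's actual API:

```python
def project_eval(snippet, context=None):
    """Evaluate a code snippet inside the running process and report the result.

    This lets an agent probe live state and test individual functions
    without a separate compile/run cycle.
    """
    env = dict(context or {})
    try:
        try:
            # Try the snippet as an expression first, so the agent gets a value back.
            return {"ok": True, "result": eval(snippet, env)}
        except SyntaxError:
            # Fall back to statements (assignments, defs, etc.), which have no value.
            exec(snippet, env)
            return {"ok": True, "result": None}
    except Exception as exc:
        # Surface errors to the agent instead of crashing the session.
        return {"ok": False, "error": repr(exc)}

# The agent can exercise a single piece of logic in isolation:
print(project_eval("sum(range(5))"))  # {'ok': True, 'result': 10}
```

The key design point is that evaluation happens in the same runtime the application is using, so the agent sees real modules and real state rather than guessing from source alone.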
It depends where I look. Among colleagues and tech-native friends, I see healthy skepticism alongside the excitement about new tech. On the other hand, the investment podcasts I’ve followed for years are nothing but ignorant AI hype, reciting articles about how all the jobs are about to disappear. I guess the people who don’t have firsthand experience aren’t leaving the hype behind yet.
Both groups will operate on a wide spectrum, but if we're already generalizing...
Perhaps there's a matter of competing priorities?
Programmers are usually quite cynical overall, but in this case I see it as a "My CEO is telling me _out loud_ that they want to replace me, so why would I help them speed up that process?"
Investors likely want what they're invested in to appreciate, so I imagine they're likely over-leveraged and are doing what they can to get their bag.
And if you make someone 3x faster at producing a report that 100 people have to read, but it now takes 10% longer to read and understand, you’ve lost overall value.
This is one of my major concerns about people trying to use these tools for 'efficiency'. The only plausible value in somebody writing a huge report and somebody else reading it is information transfer. LLMs are notoriously bad at this. The noise-to-signal ratio is unacceptably high, and you will be worse off reading the summary than if you skimmed the first and last pages. In fact, you will be worse off than if you did nothing at all.
Using AI to output noise and learn nothing at breakneck speeds is worse than simply looking out the window, because you now have a false sense of security about your understanding of the material.
Relatedly, I think people get the sense that 'getting better at prompting' is purely a one-way issue of training the robot to give better outputs. But you are also training yourself to only ask the sorts of questions that it can answer well. Those questions that it will no longer occur to you to ask (not just of the robot, but of yourself) might be the most pertinent ones!
Yep. The other way it can have no net impact is if it saves thousands of hours of report drafting and reading but misses the one salient fact buried in the observations that could actually save the company money. Whilst completely nailing the fluff.
> LLMs are notoriously bad at this. The noise-to-signal ratio is unacceptably high
I could go either way on the future of this, but if you take the argument that we're still early days, this may not hold. They're notoriously bad at this so far.
We could still be in the PC DOS 3.X era in this timeline. Wait until we hit the Windows 3.1, or 95 equivalent. Personally, I have seen shocking improvements in the past 3 months with the latest models.
Personally, I strongly doubt it. Since the nature of LLMs doesn't give them access to semantic content or context, I believe they're inherently unsuited for this task. As far as I can tell, it's a limitation of the technology itself, not of the amount of power behind it.
Either way, being able to generate or compress loads of text very quickly with no understanding of the contents simply is not the bottleneck of information transfer between human beings.
Yeah, definitely more skeptical for communication pipelines.
But for coding, the latest models are able to read my codebase for context, understand my question, and implement a solution with nuance, using existing structures and paradigms. It hasn't missed since January.
One of them even said: "As an embedded engineer, you will appreciate that ...". I had never told it that was my title; it's nowhere in my soul.md or codebase. It just inferred that I, the user, was one, based on the ARM toolchain and code.
It was a bit creepy, tbh. They can definitely infer context to some degree.
> We could still be in the PC DOS 3.X era in this timeline. Wait until we hit the Windows 3.1, or 95 equivalent. Personally, I have seen shocking improvements in the past 3 months with the latest models.
While we're speculating, here's mine: we're in the Windows 7 phase of AI.
IOW, everything from this point on might be better tech, but is going to be worse in practice.
Context size helps some things but generally speaking, it just slows everything down. Instead of huge contexts, what we need is actual reasoning.
I predict that in the next two to five years we're going to see a breakthrough in AI that doesn't involve LLMs but makes them 10x more effective at reasoning and completely eliminates the hallucination problem.
We currently have "high thinking" models that double- and triple-check their own output, and we call that "reasoning", but that's not really what it's doing. It's just passing its own output through itself a few times and hoping that it catches mistakes. It kind of works, but it's very slow and takes a lot more resources.
What we need instead is a reasoning model that can be called upon to perform logic-based tests on LLM output or even better, before the output is generated (if that's even possible—not sure if it is).
My guess is that it'll end up being a "logic-trained" model instead of a "shitloads of raw data"-trained model. Imagine a couple of terabytes of truth statements like "rabbits are mammals" and "mammals have mammary glands." Then, whenever the LLM wants to generate output suggesting someone put rocks on pizza, it fails the internal truth check "rocks are not edible by humans", or even better, "rocks are not suitable as a pizza topping", which had been placed into the training data set as a result of regression testing.
Over time, such a "logic model" would grow and grow—just like a human mind—until it did a pretty good job at reasoning.
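For what it's worth, the truth-check idea above can be sketched as a toy: a store of fact triples that a candidate claim is validated against before output is emitted. Everything here (the facts, the relation names, the exact-match lookup) is made up purely for illustration; a real system would need far more than literal matching:

```python
# Toy sketch of the "logic model" idea: a store of truth statements
# consulted before the LLM commits to an output. All facts and relation
# names are invented for this example.
FACTS = {
    ("rabbit", "is_a", "mammal"),
    ("mammal", "has", "mammary glands"),
    ("rock", "edible_by_humans", False),
}

def passes_truth_check(subject, relation, value):
    """Reject a candidate claim that contradicts a stored fact."""
    for s, r, v in FACTS:
        if s == subject and r == relation and v != value:
            return False  # direct contradiction with the fact store
    return True  # no stored fact contradicts the claim

# A generation step proposing "rocks are edible" would fail the check:
print(passes_truth_check("rock", "edible_by_humans", True))   # False
print(passes_truth_check("rabbit", "is_a", "mammal"))         # True
```

The hard part, of course, is everything this sketch waves away: mapping free-form text onto triples, and deciding what counts as a contradiction rather than merely an absent fact.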
Upvoted, as it matches my own thinking about 99%. Very well said. But personally, I would not predict a breakthrough in this direction in the next 2-5 years, as there is no pathway from current LLM tech to "true reasoning". In my mental model, an LLM operates in "raster space", with "linguistic tokens" as the "rasterization units". For "true reasoning", an AI entity has to operate fluently in "vector space", so to speak. An LLM can somewhat simulate "reasoning" to a limited degree, and even that only with brute force: massive CPU/GPU/RAM resources, enormous amounts of training data, and giant working contexts. And still, that "simulation" is incomplete and unverifiable.
I would argue that the research needed to enable such "vector operation" is nowhere near the stage to come to fruition in the next decade. So, my prediction is, maybe, 20-50 years for this to happen, if not more.
> I would like to see the day when the context size is in gigabytes or tens of billions of tokens, not RAG or whatever, actual context.
Might not make a difference. I believe we are already at the point of negative returns - doubling context from 800k tokens to 1600k tokens loses a larger percentage of context than halving it from 800k tokens to 400k tokens.
There's many things that used to be called AI, but as their shortcomings became known we started dropping them from the AI bucket and referring to them by a more specific name: expert systems, machine learning, etc. Decades later plenty of people never learned this and those things don't pop into mind with "AI" so LLMs were able to take over the term.
Hehe, yeah, there are some terms that are just linguistically unintuitive.
"Skill floor" is another one. People generally interpret it as "must be at least this tall to ride", but it actually describes how effort translates into results. Something with a high skill floor (it makes more sense if you read it as "a high floor of skill") means that with very little input you gain a lot of result. A low skill floor means the thing behaves more linearly: very little input gains only very little result.
Even though it's just the antonym, "skill ceiling" is much more intuitive in that regard.
Are you sure about skill floor? I've only ever heard it used to describe the skill required to get into something, and skill ceiling describes the highest level of mastery. I've never heard your interpretation, and it doesn't make sense to me.
Yes, I am very sure. And it isn't that difficult to understand: it's skill input graphed against effectiveness output. A higher floor just means that with 1 skill, you are guaranteed at least X (say, 20) effectiveness output.
The confusion comes from people using "skill floor" for "learning curve" instead of "effectiveness".
But this is a thing where definitions have shifted over time. Like jealousy. People use "jealousy" when they really mean "envy", but correcting someone on it will usually just get you scorn and ridicule, because like I mentioned, language is fluid.
If the skill floor is high and therefore "effectiveness" is the same for a wide range of skill levels, isn't that the same as having a high barrier to entry? It seems that any activity or game where it takes a lot of skill before you can differentiate yourself from other players would be described that way.
No, a high skill floor is the opposite. It means that anyone can pick up the thing and immediately do decently.
To put it simply, think assault rifle vs. sniper rifle. Anyone can pick up the AR, spray and pray, and do pretty okay. You can't do that with the sniper rifle. So the AR has a high skill floor (high minimum effectiveness) whereas the sniper rifle has a low skill floor (low minimum effectiveness). But the AR also has a low skill ceiling, to the point where you can put in endless amounts of skill and see no improvement in effectiveness. The sniper rifle, being an infinite-range OHKO, can scale to the end given aim skill and map knowledge.
Another example would be Reinhardt in Overwatch. You can tell a noob to "look in that direction and deploy shield" and they will contribute to the team. You can't put a noob on Widowmaker and have them contribute (as) significantly.
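The floor/ceiling framing in the examples above can be made concrete with toy numbers: treat effectiveness as skill clamped into a band, where the floor is the minimum you get for free and the ceiling is where extra skill stops paying off. All values here are invented for illustration:

```python
def effectiveness(skill, floor, ceiling):
    """Clamp raw skill (0.0 to 1.0) into a weapon's effectiveness band."""
    return min(ceiling, max(floor, skill))

def assault_rifle(skill):
    # High floor, low ceiling: decent immediately, but extra skill soon stops paying off.
    return effectiveness(skill, floor=0.5, ceiling=0.7)

def sniper_rifle(skill):
    # Low floor, high ceiling: weak in unskilled hands, scales far with mastery.
    return effectiveness(skill, floor=0.1, ceiling=1.0)

# A total beginner already does okay with the AR but not the sniper:
print(assault_rifle(0.0), sniper_rifle(0.0))  # 0.5 0.1
# At high skill, the sniper pulls ahead because the AR has capped out:
print(assault_rifle(1.0), sniper_rifle(1.0))  # 0.7 1.0
```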
It reminds me of that Apple ad where a guy just rocks up to a meeting completely unprepared and spits out an AI summary to all his coworkers. Great job Apple, thanks for proving Graeber right all along.
> Those questions that it will no longer occur to you to ask (not just of the robot, but of yourself) might be the most pertinent ones!
That is true, but the same goes for Google. You can see why some people want to go back to the "read the book" era, when you didn't have Google to query everything and had to ask the real questions.
One thing AI should eliminate is the "proof of work" reports. Sometimes the long report is not meant to be read, but used as proof somebody has thoroughly thought through various things (captured by, for instance, required sections).
When AI is doing that, it loses all value as a proof of work (just as it does for a school report).
"My AI writes for your AI to read" is low value. But there is probably still some value in "my AI takes these notes and makes them into a concise readable doc".
> Using AI to output noise and learn nothing at breakneck speeds is worse than simply looking out the window, because you now have a false sense of security about your understanding of the material.
i may put this into my email signature with your permission, this is a whip-smart sentence.
and it is true. i used AI to "curate information" for me when i was heads-down deep in learning mode, about sound and music.
there was enough all-important info being omitted that i soon realized i was developing a textbook case of superficial, incomplete knowledge.
i stopped using AI and did it all over again through books and learning by doing. in retrospect, i'm glad to have had that experience because it taught me something about knowledge and learning.
mostly that something boils down to RTFM. a good manual or technical book written by an expert doesn't have a lot of fluff. what exactly are you expecting the AI to do? zip the rar file? it will do something, it might look great, lossless compression it will be not.
P.S. not a prompt skill issue. i was up to date on cutting edge prompting techniques and using multiple frontier models. i was developing an app using local models and audio analysis AI-powered libraries. in other words i was up to my neck immersed in AI.
after i grokked as much as i could, given my limited math knowledge, of the underlying tech from reading the theory, i realized the skill issue invectives don't hold water. if things break exactly in the way they're expected to break as per their design, it's a little too much on the nose. even appealing to your impostor syndrome won't work.
P.P.S. it's interesting how a lot of the slogans of the AI party are weaponizing trauma triggers or appealing to character weaknesses.
"hop on the train, commit fully, or you'll be left behind" > fear of abandonment trigger
"pah, skill issue. my prompts on the other hand...i'm afraid i can't share them as this IP is making me millions of passive income as we speak (i know you won't probe further cause asking a person about their finances is impolite)" > imposter syndrome inducer par excellence, also FOMO -- thinking to yourself "how long can the gold rush last? this person is raking it in!! what am i doing? the miserable sod i am"
1. outlandish claims (Claude writes ALL the code) that no one can seem to reproduce, and indeed everyone non-affiliated is having a very different experience
2. some of the darkest patterns you've seen in marketing are the key tenets of the gospel
3. it's probably a duck.
i've been 100% clear on the grift since October '25. Steve Eisman of "The Big Short" was just hopping onto the hype train back then. i thought...oh. how much analysis does this guru of analysts really do? now Steve sings of AI panic and blood in the streets.
these things really make you think, about what an economy even is. it sure doesn't seem to have a lot to do with supply and demand, products and services, and all those archaisms.
For all the technology we develop, we rarely invest in processes. Once in a blue moon some country decides to revamp its bureaucracy, when it should really be a continuous effort (in the private sector too).
OTOH, what happens continuously is that technology is used to automate bureaucracy, which even allows it to grow in complexity.
See, this is an opportunity. Company provides AI tool, monitors for cases where AI output is being fed as AI input. In such cases, flag the entire process for elimination.
Hey. thanks!
The API doesn't actually use name sets like that, though that was my first approach. I changed it to use lists of profiles from social networks. So when a name is requested, it looks up every profile with that name and counts the number of times each gender is represented. If you use any localization parameters, it will of course only look up profiles associated with the particular country or language.
I quickly realized with the initial approach that my lists would never be sufficient, since most countries allow almost any name to be given, and when combining lists from the whole world, a lot of names would end up as unisex; that's why I went for a probability factor instead. I'm also hoping that by using social profiles, it might one day be able to tell the gender of Superman or Catwoman and the like. People can, after all, call themselves whatever they want on the internet.
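A hypothetical sketch of that counting approach: given a name (and optionally a country), tally the gender across every matching profile and return the majority along with a probability. The profile data, field names, and return shape here are all invented for illustration, not the actual API:

```python
from collections import Counter

# Invented stand-in for the social-network profile lists described above.
PROFILES = [
    {"name": "Alex", "gender": "male", "country": "US"},
    {"name": "Alex", "gender": "female", "country": "US"},
    {"name": "Alex", "gender": "male", "country": "DK"},
]

def gender_probability(name, country=None):
    """Count gender across matching profiles; return the majority with a probability."""
    matches = [p for p in PROFILES
               if p["name"] == name
               and (country is None or p["country"] == country)]
    counts = Counter(p["gender"] for p in matches)
    total = sum(counts.values())
    if total == 0:
        return None  # name not yet represented in the dataset
    top, n = counts.most_common(1)[0]
    return {"gender": top, "probability": n / total, "count": total}

print(gender_probability("Alex"))                # majority "male", 2 of 3 profiles
print(gender_probability("Alex", country="DK"))  # only the DK profile counts
```

One nice property of this design is that the probability degrades gracefully for genuinely unisex names instead of forcing a hard yes/no from a static list.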
I've actually thought about adding a baseline of names from different lists, though, to back up the names that are not yet represented in the dataset. Do you have a link to the names you mentioned? Could be interesting.
Many, but not all, of the people mentioned in the 87 data sets (and counting!) that make up this database have a gender explicitly declared. Locale is the former province of Galicia in the Austro-Hungarian Empire, which is today eastern Poland and southwestern Ukraine. Time period is mostly 19th century and some early 20th century. Ethnicity is strongly biased towards Ashkenazi Jewish, but we also have some data sets that have representation of all the people in the community at that time, such as tax lists or phonebooks or school lists. I can get you data in JSON or XML, let me know.
I also have access to another large given name database that could be useful to you -- but that one is entirely Ashkenazi Jewish from what used to be northeastern Hungary, from roughly 1850 to 1906.