> It's very difficult to define a specification that works as intended, even with tools.
Agree, we are in the stone age in software design and dev. We have not figured out a good way to communicate the design of complex systems in a way the business can understand.
Nope; this is just a silly trope which gets repeated without thought. The fact that it is hard to do does not mean we don't know how to do it.
Everything exists and has been known since the 1960s/1970s. People are just not studying, adapting and using the well-known standards/techniques. Standard Engineering is built on them and Software Engineering adapts/extends those for its own needs.
> Seems like tacit acknowledgment that IBM mothership is not the right place for a speculative growth play from both a management and capital perspective.
I'm not understanding your logic, can you explain?
What I see with the program and amounts companies were awarded is some level of acknowledgment of the current state of quantum research (i.e. IBM is generally considered the leader) and their pragmatic approach that piggy-backs on current technologies (for obvious speed+cost benefits).
You must not talk to competent people. IBM is very experienced at this grift. I remember when I used to go to conferences in a different field and IBM would announce "state of the art" results that were very obviously done by cheating (making an ensemble model and tuning the weights on the test set). Everyone doing real work would ignore them, and then they'd go sell to clueless midcap companies on the basis of that announcement.
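For anyone unfamiliar with why tuning ensemble weights on the test set counts as cheating, here is a minimal sketch. Everything in it is made up for illustration (the five "models" are pure coin flips, and the names are hypothetical); it is not a reconstruction of any particular published result. It only shows that picking weights by test accuracy can never look worse on that same test set than picking them on held-out validation data, even when no model has any skill.

```python
import itertools
import random

def weighted_vote_accuracy(weights, preds, labels):
    """Accuracy of a weighted majority vote over binary per-model predictions."""
    total = sum(weights)
    hits = 0
    for i, label in enumerate(labels):
        score = sum(w * model[i] for w, model in zip(weights, preds))
        hits += int((score * 2 >= total) == bool(label))
    return hits / len(labels)

def best_weights(candidates, preds, labels):
    """Pick the candidate weight vector that scores best on the given data."""
    return max(candidates, key=lambda w: weighted_vote_accuracy(w, preds, labels))

rng = random.Random(0)
n_models, n_val, n_test = 5, 200, 200

def noise(n):
    # Assumption for illustration: labels and predictions are coin flips,
    # so no model carries any real signal.
    return [rng.randint(0, 1) for _ in range(n)]

val_labels, test_labels = noise(n_val), noise(n_test)
val_preds = [noise(n_val) for _ in range(n_models)]
test_preds = [noise(n_test) for _ in range(n_models)]

# Small grid of integer weights, excluding the all-zero vector.
candidates = [w for w in itertools.product([0, 1, 2], repeat=n_models) if sum(w) > 0]

honest = best_weights(candidates, val_preds, val_labels)   # tuned on validation
leaky = best_weights(candidates, test_preds, test_labels)  # tuned on the test set

honest_score = weighted_vote_accuracy(honest, test_preds, test_labels)
leaky_score = weighted_vote_accuracy(leaky, test_preds, test_labels)
print(f"tuned on validation, scored on test: {honest_score:.3f}")
print(f"tuned on test, scored on test:       {leaky_score:.3f}")
```

The leaky number is the maximum over all candidates on the test set, so it is an optimistic estimate by construction; the honest number is what you would expect to see on new data.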
> And then, a follow-up: what is actually the bottleneck at most companies? What causes "requirements gathering" to take long?
Complexity.
In my experience (medium size businesses, i.e. 200 million to 2 billion annual revenue) we're trying to understand how a complex set of systems and business processes and different businesses (external partners) interact and then trying to morph all of that into a shape that now has capability X layered on top or in the middle.
Here's a concrete example: business X, which makes its own products and has retail stores as well as an ecom site, wanted to add the ability to put complementary items built by other companies on the website and have them drop shipped from the vendors to the consumers. The final solution involved 21 different interfaces between 4 different systems (ecom system, store system, omni channel system, external drop ship mgmt system) as well as a new internal system to manage this activity. It takes a significant amount of time to understand and solve for all of the low level details.
A typical example of trying to add a new significant capability involves many meetings (days, weeks, months, etc.) with the business to understand how their work flows between systems X, Y and Z as well as all of the significant exceptions (e.g. we handle subset A this way and subset B that way, but for the final step we blend those groups together, except for subset C which requires special process 97).
Then with that understanding comes the system solutioning across multiple systems that can be a blend of internal system or vendor's system, each with different levels of ability to customize, which pushes the shape of the final solution in different directions.
There is certainly value in speeding up coding, but it's just one piece of the puzzle, and today LLMs can't help with gathering the domain information and defining a solution.
What I've seen in an AI-forward looking environment is that it's much more common for PM/POs to be knocking up at least a UI prototype now, and experimentation is happening often even before writing the tickets. Similarly when devs are proposing something they often are coming with a couple of prototypes already implemented. Both of those mean decisions are coming a lot quicker.
I wouldn’t discount the value of moving small tasks away from developers, nor the value of fast cheap prototypes.
Product owners can very quickly get, for many problems, an interactive demo without coding. For lots of problems this can be anything from a static html page which shows the interactions to a hacked-in feature that lets them actually test whether it solves the customer need and try several variations before handing over much more concrete specs of what they want to happen. So much time is lost getting an idea from someone’s head into code and into use, only to find out it wasn’t communicated well, and then finally that the idea didn’t help anyway and we want it done a different way.
Yes yes I know someone is about to say that now there’s pressure to push the prototype out but that’s an organisational level problem that existed anyway.
And small problems can be much faster to solve as well, or even move away from devs. Often people just need some text changed somewhere, some html put together, or some basic code for analysis. They could understand the logic, but the task of writing it from scratch and working out how to run things may be too much. Now you don’t need to prioritise work for a dev to get some sql written, and they can spend their time on the larger, more software engineering level problems.
"that’s an organisational level problem that existed anyway"
That's very true of many organizations. One cannot just slap an AI tool on it when you are dealing with fundamental organizational problems in the first place.
"they can spend their time on the larger more software engineering level problems"
For sure, devs still need to focus on the right type of work and maintain the balance. I built a tool to do just that: https://worktypefocus.com/
I've seen proposals for Product Managers to define those conditions themselves by speaking with the LLM. A continuing architectural diagram is constructed and the graph is updated until all cases are covered, and then the LLM writes the code, writes the validations, pushes to CI environments, runs tests, schedules prod deploy (by looking at the company event schedule), gets CAB approval, deploys code, tests in prod, and fixes regressions.
I'm not saying this is the correct thing, but companies are implementing it and it is "working". I don't think keeping our head in the sand is helping.
> I've seen proposals for Product Managers to define those conditions themselves by speaking with the LLM.
But the LLM is not aware of how the business works and why, so someone needs to work with the business to extract the information. Typically it's not well documented.
> someone needs to work with the business to extract the information. Typically it's not well documented.
LLM extraction of the information from the Product Owner is becoming the way to overcome poorly-documented business context.
Non-technical folk are using things like `/grill-me` [0] to seed the LLM with the long-tail complexities that they didn't even know they needed to spell out.
They can ask, they can do a back and forth and they can write documentation to be used from that point onwards and write it in a common style and structure.
These are language models; being able to talk through something with them and have them extract some information is what they excel at. Given that you’d probably get a halfway decent result with a literal fixed set of questions (an Eliza-level docbot), gpt 5.5 is going to nail that as a task.
Is it working, though? The main outcome we've seen with companies that drink the AI Kool-Aid en masse is buggy, unstable systems. Clearly there's a level of rigor that's being missed in favor of ship velocity.
> "We don't understand intelligence, so you literally have no idea whether what we recognize as intelligence is some suitable arrangement of "statistical token generation""
Do you mean "token" as in the LLM sense?
Or are you thinking that thoughts in the human brain are also constructed out of some sort of underlying "token" even though the abstract thought happens and is held before any words are used to try to communicate that thought to an external party?
LLMs also don't run on tokens internally, they're just the inputs and outputs. The reasoning models do operate (partially) in the token space, but then so do I.
> the thought happens and then the words are generated to reasonably describe that thought.
Thoughts don't happen in a vacuum, they are triggered by external or internal stimuli, and these stimuli/thought precursors could very easily be tokens (dense info packets), which then map to latent space vectors, which very well could be thoughts.
Claims like "humans don't operate the same way" have no basis. Not only do we literally not know how humans operate mechanistically, and so we literally don't know the logical structure of human thought, but any system that is Turing complete is so easy to create that many wildly different mechanistic systems are fundamentally equivalent/interconvertible.
> Thoughts don't happen in a vacuum, they are triggered by external or internal stimuli, and these stimuli/thought precursors could very easily be tokens (dense info packets), which then map to latent space vectors, which very well could be thoughts.
Yes, possible, that's why I asked you above if that's what you meant by "token". Someone else responded and I didn't notice it wasn't you.
> Claims like "humans don't operate the same way" have no basis. Not only do we literally not know how humans operate mechanistically, and so we literally don't know the logical structure of human thought, but any system that is Turing complete is so easy to create that many wildly different mechanistic systems are fundamentally equivalent/interconvertible.
I think this position is too extreme, we do have some information.
We know how LLMs work when generating a sequence of words, and I know that my brain does not work the same way for word generation because I am fully aware of the complete thought in advance of any words getting generated by me externally or internally.
I know prior to generating words that my thought is X and the words I'm about to produce need to express that thought.
But with LLMs we know that the essence of what they produce is not known in advance, that they must complete the word generation process to fully realize the end result, and that multiple different end results are possible.
What I'm saying is that this is incorrect. An "idea" exists within a model before it generates tokens. This property does not distinguish humans from LLMs.
Additionally, "from learned stats" doesn't disambiguate between a wide variety of things. I'm not aware of any other way to acquire knowledge from measurements. I'd bet that humans do this differently, based on the fact that humans can get further with less training data and that they learn actively during operation, but not so differently that 'learning stats' would be an inaccurate description.
> What I'm saying is that this is incorrect. An "idea" exists within a model before it generates tokens.
If that were the case, then the systems would generate words based on the fully resolved idea, but that is not how the LLM systems currently work (per vendors' descriptions).
They choose words sequentially, and both the specifics of the input and the chosen output words significantly impact not just the rest of the output but the very correctness of the output.
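To make "choose words sequentially" concrete, here is a toy sketch of an autoregressive sampling loop. The `NEXT` table is a hypothetical stand-in that conditions only on the previous token; a real LLM conditions on the whole context through a large internal state, so this illustrates the output loop only and says nothing about what is or isn't represented internally before each choice.

```python
import random

# Hypothetical toy "model": next-token probabilities given the previous token.
NEXT = {
    "<s>": {"the": 1.0},
    "the": {"cat": 0.5, "dog": 0.5},
    "cat": {"purrs": 1.0},
    "dog": {"barks": 1.0},
    "purrs": {"</s>": 1.0},
    "barks": {"</s>": 1.0},
}

def generate(rng, max_tokens=10):
    """Sample one token at a time; each choice constrains everything after it."""
    tokens = ["<s>"]
    while tokens[-1] != "</s>" and len(tokens) < max_tokens:
        dist = NEXT[tokens[-1]]
        choices, probs = zip(*dist.items())
        tokens.append(rng.choices(choices, weights=probs, k=1)[0])
    # Drop the start/end markers before returning.
    return [t for t in tokens if t not in ("<s>", "</s>")]

rng = random.Random(1)
samples = {tuple(generate(rng)) for _ in range(200)}
for sample in sorted(samples):
    print(" ".join(sample))
```

Once "cat" or "dog" has been sampled, the rest of the sentence is forced, which is the sense in which an early token choice affects the rest of the output and multiple end results are possible from the same input.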
> but not so differently that 'learning stats' would be an inaccurate description.
Agreed, humans are generalizing using some mechanism that can be modeled with math.
But the execution of our reasoning and thought processes is not obviously similar to LLMs' next word generation based on probabilities.
>that is not how the LLM systems currently work (per vendors' descriptions)
Anthropic says of their model [0]:
"""Claude sometimes thinks in a conceptual space that is shared between languages, suggesting it has a kind of universal “language of thought.”
{...}
Claude will plan what it will say many words ahead, and write to get to that destination. We show this in the realm of poetry, where it thinks of possible rhyming words in advance and writes the next line to get there. This is powerful evidence that even though models are trained to output one word at a time, they may think on much longer horizons to do so."""
Anthropic also created 'golden gate claude'[1] by identifying the region of its architecture that corresponded to the concept of the golden gate bridge and activating it. What would such a region exist for if claude could only think one token at a time?
>the execution of our reasoning and thought processes is not obviously similar to LLM's
"Not obviously similar" I can agree with. I don't think you've identified a way in which they are obviously different, though.
> Notably, his essay “no silver bullet” states that there has never been a new technology or way of thinking or working that has led to a 10X increase in the speed of software development.
I think the statement is generally true, but I also think there are specific tools that do vastly reduce dev time for something that follows a specific pattern.
Program templates and also code generation have been used forever to simplify and speed up specific types of programs, like all of the CRUD programs in an ERP system for maintaining the hundreds to thousands of different master data like Customers, etc.
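As a rough sketch of what that kind of template-driven generation looks like, here is a tiny example that stamps out the four CRUD statements for a master data table. The table and column names are hypothetical, the SQL uses named `:param` placeholders as an assumption, and real ERP generators produce full programs (screens, validation, authorization), not just SQL.

```python
from string import Template

# One template, reused for every master data table.
CRUD_TEMPLATE = Template(
    "INSERT INTO $table ($cols) VALUES ($params);\n"
    "SELECT $cols FROM $table WHERE $key = :$key;\n"
    "UPDATE $table SET $assignments WHERE $key = :$key;\n"
    "DELETE FROM $table WHERE $key = :$key;"
)

def generate_crud(table, key, columns):
    """Fill the template for one table with a single-column primary key."""
    cols = [key, *columns]
    return CRUD_TEMPLATE.substitute(
        table=table,
        key=key,
        cols=", ".join(cols),
        params=", ".join(f":{c}" for c in cols),
        assignments=", ".join(f"{c} = :{c}" for c in columns),
    )

# Hypothetical master data tables; each new one costs one line.
for table, key, columns in [
    ("customers", "id", ["name", "email"]),
    ("vendors", "id", ["name", "payment_terms"]),
]:
    print(generate_crud(table, key, columns), end="\n\n")
```

The up-front effort goes into the template; each additional table is nearly free, but only as long as it fits the pattern the template assumes.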
Those approaches can speed a specific activity that follows a specific pattern by some multiplier (e.g. 4x or 10x).
But the trade-off is time to build the template or code gen vs number of instances it will be used.
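That trade-off is simple break-even arithmetic. A minimal sketch, with made-up hour figures and a hypothetical function name: the template pays off once the hours saved per use, times the number of uses, cover the hours spent building it.

```python
import math

def break_even_uses(build_cost, manual_cost, templated_cost):
    """Smallest number of uses at which building the template pays for itself.

    All costs are in the same unit (e.g. hours). Returns None if the template
    saves nothing per use, in which case it never pays off.
    """
    saving_per_use = manual_cost - templated_cost
    if saving_per_use <= 0:
        return None
    return math.ceil(build_cost / saving_per_use)

# Illustrative numbers only: 40 hours to build the template, 8 hours per
# program by hand, 2 hours per program with the template (a 4x speedup).
print(break_even_uses(build_cost=40, manual_cost=8, templated_cost=2))
```

With those assumed numbers the template needs 7 uses to break even (40 / 6, rounded up), which is why it is an easy call for hundreds of master data programs and a poor one for a one-off.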
I think where "no silver bullet" has the strongest case is for something completely new where the patterns and templates have not been established.
When I think about LLMs and compare them to past templates and code gen, I wonder what exactly the nature of the LLM trade-off is in the broad sense. For a template, it's pretty clearly up front design+dev compared to number of instances of usage of the template.
LLMs are more general and more broadly applicable, but still not the same as a human. Is the trade-off purely cost that was sunk into the model and tool that now needs to be recouped by the AI company (e.g. charging the actual $1,000 to $1,500 that it really costs instead of $200)?
Or are the big trade-offs and costs going to show up in the future: the dwindling of human expertise due to reliance on the tool and the loss of understanding of complex systems?
> but I find it almost as hard to see why evolution did.
Ignoring the concept of consciousness, it seems that self-awareness would be a strong attribute related to survival. It seems like it would help drive or amplify critical emotional states (e.g. my own survival, competition/success, love for self and relatives, etc.)
I can't see anywhere in the LLM machinery that would support the notion of self awareness in advance of the token selection process.
Possibly it could be argued that during token selection internal state is included and the result functionally looks like self awareness was included in the process, but that seems unconvincing.
Yeah, self-awareness is a very different thing, and I agree it's easier to see how evolution would produce this. Many apparent signs of self-awareness in LLMs are probably baked into the models at the end via post-training (RLHF), where they learn to behave as conversation agents and maintain a more consistent personality. The raw model probably shows no signs of self-awareness. In fact, I'm pretty sure that LLMs learn that they are LLMs only through post-training.
The latent space of the LLM when it chooses each token is 10s or even hundreds of GB for each word that it chooses. It's not really useful to look at LLMs from the perspective of the prediction head, which is a very small part of the model.
Agreed there is significant information in the latent space, but what is missing is a fully resolved "thought" based on that information plus current context plus validation against an internal working model of the world.
Except that latent space does not change in response to new information, something that thoughts famously do. If you read a book that captures the author's thoughts, disagree, and write an eloquent argument to the author, you might change the author's mind. But you will not change the "book's thoughts" on the subject.
Latent spaces are maps of thoughts other people have had, not the thoughts themselves.
This gets a bit tricky. Over very long task contexts (1M tokens) or with prompt compression (10s of millions of tokens) the model can alter its priors based on updated evidence. This form of knowledge based learning is not necessarily robust, but demonstrably does occur.
In addition to what you said, the idea that the endpoint should determine state and drive the possible sequence of actions is not obviously universally a good thing.
Many projects involve stringing together capability from different systems (via their api's) and creating essentially a new business layer on top of the combination of those systems. Now the abstract high level state is really managed by the new system and the component systems are just providing foundational capabilities.
The idea of HATEOAS seems interesting but with limited application.
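For readers who haven't run into it, here is a minimal sketch of the HATEOAS idea: the endpoint's response carries the links for whatever transitions are valid in the current state, and the client picks from those instead of hard-coding the workflow. The order resource, its statuses, the URL paths and the `_links` key are all hypothetical choices for illustration.

```python
def order_representation(order_id, status):
    """Server side: advertise only the transitions valid for the current status."""
    links = {"self": f"/orders/{order_id}"}
    if status == "pending":
        links["pay"] = f"/orders/{order_id}/payment"
        links["cancel"] = f"/orders/{order_id}/cancellation"
    elif status == "paid":
        links["ship"] = f"/orders/{order_id}/shipment"
    return {"id": order_id, "status": status, "_links": links}

def available_actions(representation):
    """Client side: discover what can be done next from the response itself."""
    return sorted(rel for rel in representation["_links"] if rel != "self")

for status in ("pending", "paid", "shipped"):
    print(status, available_actions(order_representation(7, status)))
```

This works when one service owns the state machine. In the composed case described above, the meaningful state lives in the new business layer spanning several systems, so the links any single component API hands back only describe a fragment of the real workflow.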