Benefit of the doubt is a great concept. The other extreme is: extraordinary claims require extraordinary evidence. Why should we restrict ourselves to a binary choice? Can we not think in a more nuanced fashion, in Bayesian terms? In other words, look at all available evidence and assign probabilities?
"We are the next DeepMind" is easy to say ... The DeepMind founders had a stellar predigee in computer games, AI and neuroscience, the Verses founders have a cryptocurrency background.
Verses also released [1] last month. What both the Atari and the Mastermind announcements have in common is the lack of details, including code. Why do they not show their code? How do we know their figures are real? We've just had the OpenAI vs FrontierMath discussion [2, 3]. Presumably, being able to play Pong, a 1972 computer game, is unlikely to be their moat ...
Also interesting: their 2024 MLST presentation [4]. Does that inspire confidence? It was that video that made me revise my prior that Friston has had a breakthrough in ML dramatically downwards ... But do not take my word for it, please make up your own mind.
So I'm not the only one who wonders about the hyperbole emanating from Friston et al.! Some more morsels:
- The CEO is an "International Bestselling Author" [1].
- The company blog states that Friston has "successfully [decoded] the underlying mechanisms of intelligence as it functions in the brain and biological systems" [2].
But they got a $10M investment from G42, an Emirati VC [3]. Note that G42 has also invested in Cerebras and OpenAI [4]. So their PR works.
Friston is a bona fide world-famous neurobiologist (although renowned mostly for his papers being completely undecipherable).
If I were a VC, I'd give him $10M no questions asked, for the small chance he's on to something. I'd expect him to be able to raise a $100M seed. So the fact that they raised only $10M is, for me, evidence against.
edit: he seems to have joined only in 2022, they were 4 years old at the time.
Agreed, Friston's bona fides are impressive. (Aside: his fame in neuroscience comes from having written important fMRI software that everybody cites.)
That's also why I worked with his team and read a lot of his papers for a while.
His principal idea was originally that neurons perform free energy minimisation. This idea makes a lot of sense, once you understand what free energy means. But, to the best of my knowledge, it has not at all been empirically verified for neurons (I'd be delighted to be proven wrong in this belief). So he went the route of generalising the free energy principle: "the free energy principle asserts that any “thing” that attains a nonequilibrium steady state can be construed as performing an elemental sort of Bayesian inference".
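For readers who have not seen the object itself: the variational free energy of a recognition density q(z) over hidden states z, given observations x and a generative model p(x, z), is standardly written as

    F(q) = \mathbb{E}_{q(z)}[\log q(z) - \log p(x, z)]
         = \mathrm{KL}(q(z) \,\|\, p(z \mid x)) - \log p(x),

so minimising F over q amounts to approximate Bayesian inference, and -F is the evidence lower bound (ELBO) familiar from variational autoencoders.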
The terms "can be construed" and "elemental sort of Bayesian inference" do a lot of work here. Updating and generalising one's research hypothesis is legitimate (albeit one could be more explicit about this), but it weakens the claim being made. Anyway, under a charitable interpretation of those terms, I agree that this is true, but at the same time it doesn't say much. Indeed, under the charitable interpretation it basically equates doing free energy minimisation with existence. Friston has lately said that the FEP is not falsifiable.
Take it from the horse's mouth (i.e. a Verses employee): "the free energy principle just applies to stones, it applies to birds, it applies to any kinds of animals" on Machine Learning Street Talk [1].
Here is my current position: from a mathematical principle this general, one cannot derive scalable ML algorithms!
> he seems to have joined only in 2022, they were 4 years old at the time.
The company founders have a cryptocurrency and (later) metaverse background.
Geoff Hinton was denied an academic position at the University of Sussex's CS department, where he had done postdoc work. (That department is now 'famous' for consciousness studies and integrated information theory, https://osf.io/preprints/psyarxiv/zsr78. I bet they are kicking themselves now ...)
> "Academia will one day wake up, and realize that"
Charlie Munger famously said, "Show me the incentive and I'll show you the outcome" ...
> Geoff Hinton was denied an academic position at the University of Sussex's CS department
These kinds of anecdotes are fairly common, I believe. Understanding anyone's academic potential is enormously difficult, and the competition is fierce. Hindsight is 20/20 and whatnot.
Friston's work is (in)famous for being so vague as to be completely untestable. He has been criticised over the extreme vagueness of his ideas many times, and he has never given a good answer. Sometimes some of his followers try to make it testable as neuroscience or useful for AI. Both have failed so far. As far as I can see, leading working neuroscientists don't take this Friston / free-energy stuff seriously. He's even got a parody Twitter account now: https://twitter.com/farlkriston
He now claims to have the best COVID model based on free-energy. Can I please see the code and form my own opinion? Has anybody seen this code?
"The figures in these manuscripts can be reproduced using annotated (MATLAB/Octave) code that is available as part of the free and open source academic software SPM. The routines are called by a demonstration script that can be invoked by typing DEM_COVID or DEM_COVID_X at the MATLAB prompt. At the time of writing, these routines are available in the development version of the next SPM release."
In my view Friston's ideas are hardly vague. Hard to understand sometimes, yes, but when I have put in the effort to understand them I have always been rewarded.
I cannot see the "these routines [being] available in the development version of the next SPM release". The development version is a 111 MB zip file [1]. When I uncompress the file I get a big flat directory with 100s of files. Which of those is the software used in the paper? I have a bad feeling about this. I don't see how the authors are displaying intellectual integrity by not releasing, concurrently with the paper, the software for such an important public health issue.
> ideas are hardly vague. Hard to understand
The core intuition is easy to understand: the brain predicts its observations, including observations about itself (proprioception), and acts in a way that minimises surprise. This can be seen as a form of self-supervised learning in the terminology of contemporary machine learning. Lots of people have said somewhat similar things before at a similar level of vagueness. Nobody disagrees that "somehow" the brain learns about the world by prediction and interaction. The interesting question is to go beyond this vagueness: what exactly is the brain doing? Where exactly is the brain minimising 'free energy'? Can I have a testable prediction please?
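For what it's worth, "surprise" in this literature standardly means the surprisal of an observation o under the organism's model m,

    -\log p(o \mid m),

and the variational free energy upper-bounds it (free energy = surprisal + a nonnegative KL term), which is why minimising free energy is offered as a tractable proxy for minimising surprise.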
If read literally, Friston's core intuition is false: people regularly and deliberately expose themselves to surprise, e.g. gambling, watching sports, speed dating. Now there are various ad-hoc fixes to save free-energy-minimisation, which should make the theory more testable, but Friston then has to state clearly which of the many conflicting ad-hoc fixes are in place, and explain how they manifest themselves in the brain! Friston has been confronted with those problems many times, but he basically ignores them.
Your challenges to the core intuition are predicated on a simplistic and uncharitable interpretation.
Gambling, watching sports, and speed dating all have secondary motivations (earning money, tribal success, and the potential to spread your genes, respectively), and what's more, these are all arenas of controlled and quite specific surprise. You know exactly the type of surprise that you are going to get, and the satisfaction you get from being right or the post-rationalization you perform for being wrong are both useful to the human. Contrast this with the "surprise" of a global pandemic, or massive social unrest. No one knows what's going to happen next, and so you have a large contingent of people who are desperately trying to enact conservatism of the "move things back to normal" flavor. This is a stress response, and the stress is induced by not knowing what kind of surprises lie ahead.
The latter is the kind of surprise that is being minimized in the free-energy framework.
I have explicitly stated that I am using a simplistic interpretation.
I am not seeing that Friston has (A) produced anything even remotely resembling a testable framework for this "kind of surprise that is being minimized in the free-energy framework", or (B) pointed to any plausible mechanisms in the brain that show that this is in fact "the kind of surprise that is being minimized". He just handwaves.
What clear-cut evidence can you give me that humans minimise this "kind of surprise"? What evidence would you accept as falsifying this? Where does Friston make clear that "secondary motivations" don't count? Also, making a super vague, unquantified statement like "large contingent of people who are desperately trying to enact conservatism ..." in defense of Friston / free energy doesn't give me a lot of confidence in the social milieu that this theory comes from. All the more so, since my OPs explicitly criticised Friston for vagueness.
In the zip file, see the files toolbox/DEM/spm_COVID*
What follows is my understanding.
> what exactly is the brain doing?
This is outside my area of expertise, but it is updating brain states (whatever that turns out to mean: neural mass activity, individual neural activity) and parameters, likely candidates being neurotransmitters. The mechanism has been proposed to be message passing among hierarchical regions of the cortex.
> Where exactly is the brain minimising 'free energy'?
It is a global effect, but whenever a state or parameter is updated (again, whatever those are found to be) the free energy decreases. If these turn out to be localized, then that would be the (context-dependent) "where".
> Can I have a testable prediction please?
The one I am most interested in: since generative models are the core of active inference, if active inference is true then we should expect to be able to identify such models and set up conditions under which they update according to the FEP, including actions. This is a difficult task and I suspect it will be shown in a simple biological system like C. elegans first. My own interest is in cyber-physical systems.
> If read literally, the core intuition is also false because people regularly and deliberately expose themselves to surprise, e.g. gambling, watching sports. Now there are various ad-hoc fixes to save free-energy-minimisation, but then which of the many conflicting ad-hoc fixes?
This is the dark-room argument, which as you suggest has been beaten to death. I admit to not understanding what the problem is. If a system has an internal model that keeps it from exploring then it would die (of starvation). Which states are surprising is all about the priors (designed by evolution, presumably) and experience. I think it is also important to be clear that surprise is used in a very technical statistical sense.
Gambling etc. is not the dark-room argument; I've explicitly left out the dark room.
Coincidentally, Friston's treatment [1] of the dark room is not convincing, but it nicely illustrates Friston's tendency to make ad-hoc adjustments; for example, in [1] he talks about "average" surprise, but there are many ways you can average. Which one is it? How, for example, do the 302 neurons of C. elegans average? Saying this is a difficult task is correct given our understanding of neurons in 2020, but the fact that Friston seems to think free energy accommodates all possibilities puts it in "not even wrong" territory. In its current shape, free energy does not make interesting predictions for neuroscience, and none of the progress in AI/ML has come from the free-energy milieu either.
If "surprise is used in a very technical statistical sense" means something concrete, precise, for example minimising KL-divergence of states, the question becomes: show me that this is what the brain does.
Or build an AI that does something that is competitive with other forms of contemporary AI.
Regarding "average": which average, in the sense of averaged over what time window? Any specific choice here needs to be justified as something happening in the brain.
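For concreteness, the candidate the FEP literature usually gestures at is a long-run time average of surprisal,

    \bar{F} = \lim_{T \to \infty} \frac{1}{T} \int_0^T F(t)\, dt,

assuming that is indeed the averaging meant; but then the window and the measure you average against still need an independent, neurally grounded justification.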
Regarding "we will find brain models with observable activity that follows the FEP?": abstractly you are saying that your prediction for theory T is that we will eventually confirm T. This does not exclude anything, I can state this for any theory T whatsoever. (For fun, try to instantiate T with outlandish theories, e.g. with "We will eventually find weapons of mass destruction in Irak", or with plausible theories that have failed so far, e.g. "We will eventuallly see supersymmetry". Does your prediction rule anything out?)
Regarding variational Bayes: that was not invented by the free-energy milieu.
Unfortunately it still works really well with students, who are not yet knowledgeable enough to recognise the Fristonian ideas for what they are--hot air--and who waste their time with active inference before realising that it's currently mostly hot air. I've seen this play out several times by now with my (current and former) students.
There are way too many probably unfixable issues at this point; off the top of my head:
* Lack of proper Linear and !Movable types, and being in a spot where adding them to the language would be a backward incompatible change. This results in complex abstractions, like Pin.
* Fallible destructors without an easy way to return failure. In Rust, destructors (Drop::drop) can fail, but the only way for them to fail is to unwind the stack, which doesn't play very well with the rest of the Rust error-handling story (see the sketch below). C++ gets this a bit better, with destructors being noexcept(true) by default and tools to query whether that's the case.
* Inconsistent handling of out-of-memory errors. Initially, Rust decided that OOM errors would be fatal, and settled on using unwinding for them. Then it was realized that some projects driving Rust's design, like Servo, actually needed to handle them (e.g. imagine downloading an image that's too big to fit in memory bringing your web browser down...), so handling OOM was patched on top by adding methods to some of the collections, like `Vec::try_push`, `Vec::try_reserve`, etc. This essentially means that there are two methods for any operation on a collection, and that it is pretty much impossible to make sure that code that must handle all OOM errors for correctness does so and does not call an unwinding method by default (see the try_reserve sketch below). It's also one of the reasons why collections parametrized by allocators (Vec<T, A>) are taking so long.
* Lack of parametrized modules, probably a decision that's just too hard to fix at this point.
* Lots of lang items for singletons, like global allocator, panic runtime, etc.
* The unsafe keyword being too coarse-grained: unsafe functions implying an unsafe function body block, and the same keyword being used to allow different types of contracts, from dereferencing a raw pointer to calling a target-feature function, ...
* const and mut raw pointers: this distinction adds pretty much zero value, and complicates certain types of code quite a bit, for no reason.
* Too monolithic a standard library: libcore includes float support, which essentially requires you to implement floats for embedded targets where they might make absolutely no sense. No way to prevent programs there from using floats, etc. No way to implement only the parts of the standard library that make sense for a platform, like threading, etc.; instead, targets have to mock the standard library with "unimplemented!()" hacks that error at run-time when you try to use certain APIs.
* The PartialOrd, Ord, PartialEq, and Eq traits are not very mathematically sound. They attempt to achieve what C++'s spaceship operator attempts, but fail quite hard. They assume that there is only one "ordering" that might make sense for a particular type, but that's usually not true. For example, floats have many orderings (the classic partial order, the IEEE 754 total order, etc.); see the ordering sketch below.
* Operator overloading traits like Add or PartialOrd mix half-baked semantics with operator overloading. The standard library ends up, e.g., implementing addition for Strings to mean concatenation. So essentially any program that constrains generics using the Add trait and requires, e.g., commutativity to be free of logic errors kind of fails when passed strings (it will work, but produce garbage); see the String sketch below.
* Iterator::next returns Option<Self::Item>, where the item type cannot borrow from the iterator itself, which means you cannot implement iterators that hand out "proxy" reference types (no lending iterators).
* Many of the standard library algorithms, like sort and sort_unstable, only work on slices; there is no way to use them on LinkedList or other collections, requiring everybody to re-implement them. C++ got this a bit better.
* BinaryHeap is only a max-heap; reusing it as a min-heap is quite weird (every element has to be wrapped in std::cmp::Reverse, as in the sketch below). In general, these std::collections API mistakes are consequences of having bad fundamental traits (like PartialOrd and friends mentioned above) and a lack of more meaningful operation traits.
* The std library is a mixed bag, with some things quite well thought out (Result, Option, the smart pointer types, the Cell types), and others kind of an afterthought (operator overloading traits, collections). Methods are also often named inconsistently across the different parts of the library; it does not feel very cohesive.
* C FFI feels like an afterthought. Some things got bolted on afterwards, like va_args, unions, packed, etc. It's pretty much not clear at this point how to use these correctly, or whether they can be used correctly at all, in C FFI. libc is one of the most used Rust libraries, and it's a huge mess of duct tape, ABI-incompatible depending on where you run your binaries, which results in crashes that are hard to debug, etc. It basically assumes that no operating system Rust will ever support will ever break its platform ABI, an assumption that no widely used OS satisfies. How to pass C callbacks that panic, etc. is also still super weird.
And probably many more that I can't remember right now. These are just a small fraction of the issues I've encountered and that I believe are not fixable in a backward-compatible way. None of them is, in isolation, a big deal. Whether they lead to death by a thousand paper cuts at some point or not, only time will tell. Rust also has many more issues that are hopefully fixable.
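To make the destructor point concrete, here is a minimal sketch; the TempFile type is my own hypothetical example, not anything from std:

    use std::io::Write;

    struct TempFile(std::fs::File);

    impl Drop for TempFile {
        fn drop(&mut self) {
            // drop() returns (), so a failed flush can only be ignored,
            // logged, or turned into a panic (which unwinds); there is
            // no way to hand the caller a Result.
            if let Err(e) = self.0.flush() {
                eprintln!("flush failed in destructor: {e}");
            }
        }
    }

    fn main() -> std::io::Result<()> {
        let f = TempFile(std::fs::File::create("demo.txt")?);
        drop(f); // any flush error here is invisible to the caller
        Ok(())
    }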
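Similarly for OOM: Vec::try_reserve is a real std API, but nothing forces a codebase to use the fallible style consistently, so the two styles below coexist silently (the helper names are my own):

    use std::collections::TryReserveError;

    // Infallible style: allocation failure is fatal (aborts by default).
    fn store(buf: &mut Vec<u8>, bytes: &[u8]) {
        buf.extend_from_slice(bytes);
    }

    // Fallible style: reserve first, so OOM becomes a recoverable Result.
    fn try_store(buf: &mut Vec<u8>, bytes: &[u8]) -> Result<(), TryReserveError> {
        buf.try_reserve(bytes.len())?; // fails gracefully instead of fatally
        buf.extend_from_slice(bytes);  // capacity is reserved; no allocation here
        Ok(())
    }

    fn main() {
        let mut buf = Vec::new();
        store(&mut buf, b"hello ");
        try_store(&mut buf, b"world").expect("allocation failed");
        assert_eq!(buf, b"hello world");
    }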
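The ordering complaints are also easy to demonstrate (f64::total_cmp and std::cmp::Reverse are real std APIs):

    use std::cmp::Reverse;
    use std::collections::BinaryHeap;

    fn main() {
        // Plain sort() needs Ord, which f64 lacks (NaN breaks totality),
        // so you must pick one of the several possible orderings yourself:
        let mut xs = vec![3.0_f64, -0.0, 0.0, 1.0];
        // xs.sort();               // does not compile: f64 is not Ord
        xs.sort_by(f64::total_cmp); // opt in to the IEEE 754 total order
        assert_eq!(format!("{xs:?}"), "[-0.0, 0.0, 1.0, 3.0]");

        // BinaryHeap is a max-heap; a min-heap needs every element wrapped:
        let mut heap = BinaryHeap::new();
        for x in [3, 1, 2] {
            heap.push(Reverse(x));
        }
        assert_eq!(heap.pop(), Some(Reverse(1))); // smallest first
    }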
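And the String case: + means concatenation, which is associative but not commutative, so generic code assuming the familiar arithmetic laws silently produces garbage. (std in fact only implements Add<&str> for String, so a fully generic T: Add<Output = T> bound won't even accept String.) A small sketch:

    fn main() {
        let a = String::from("foo");
        let b = String::from("bar");
        // Concatenation: swapping the operands changes the result.
        assert_eq!(a.clone() + &b, "foobar");
        assert_eq!(b.clone() + &a, "barfoo");
        // The signature is also asymmetric: String + &str, never &str + String.
        // let c = &a + b; // does not compile
    }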
Wow, thanks for this beautiful post! I'm coincidentally about to publish a language similar to Rust which addresses #2, and I'm now seeing perhaps I should improve a lot of these other things!
I've been looking for someone knowledgeable about Rust's benefits and drawbacks, to learn how the language compares and how best to explain it. Would you be willing to chat?
(My email is my username @gmail.com, and I'm also on discord at Verdagon#9572)
The C FFI worries me because if I choose Rust for our project I'll need to interface a huge number of C / C++ libraries with the Rust core. OTOH, Mozilla seems to be doing fine mixing Rust with C++.
It's not that it doesn't work, but rather that it is not guaranteed to work, and the design is basically a pile of duct tape. For example, you can use Rust arrays in a C FFI declaration, but that won't do what you think it does:
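A plausible instance of the array gotcha (my own hypothetical reconstruction, not necessarily the example meant above): in C, an array parameter decays to a pointer, while the literal-looking Rust translation declares pass-by-value:

    // Hypothetical C header being bound:
    //     void fill(uint8_t buf[4]);   // the parameter decays to uint8_t*

    extern "C" {
        // Looks like a faithful translation, but declares 4 bytes passed by
        // value, which is a different ABI; rustc warns via improper_ctypes.
        fn fill(buf: [u8; 4]);
    }

    extern "C" {
        // The declaration that actually matches the C ABI:
        fn fill_ok(buf: *mut u8);
    }

    fn main() {} // declarations only; the C side is not linked here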
Why not?
The prime minister of Singapore is an active C++ programmer, and has shared source code on his Facebook page, asking for bug reports [1]. Code is at [2].
France has been world-leading in verification, e.g. CompCert and Coq come from INRIA, and model-checking was co-invented in France. This stuff is largely language-independent. Yet the big sellers of this kind of stuff (e.g. EDA software from Synopsys, Cadence, and Mentor) are in the US.
That doesn't invalidate my points. SAP is a market leader and comes from Germany. A lot of leading fintech also comes from Germany or the UK. A lot of major antivirus vendors are in the Czech Republic or Romania, etc. Berlin, Barcelona, Paris, and London are all big IT innovation and startup hubs in Europe.
Yet the overall scene is not quite as dynamic as the one on the other side of the Atlantic. Startups never seem to be as well funded or advertised. People take fewer risks here; there's no obvious culture of risking it all to found a startup. The whole ecosystem is not designed around this. Maybe this is changing now, but today the market really isn't flooded with local products but rather with products of Silicon Valley. A lot of investments in European startups still come from SV instead of being local. There are some products that are big in one European country but don't really seem to make it over the border, probably because Europe is not as unified a market as it could be.
I agree with most of your points, and the huge, unified, and rather homogeneous market is a core advantage of the US in certain product categories. Since you mention SAP: clearly, SAP is successful in a space where Europe's heterogeneity should be a problem -- different legal systems, different accounting rules, etc. -- and yet SAP succeeded.
Maybe it was because SAP was founded in 1972, half a century ago, when European decline was not as pronounced as it is today?
- https://www.activeinference.institute/
- https://spatialwebfoundation.org/