> In the world of harness development, I think that's an interesting question to answer!
The challenge isn't about harness development though, and a sufficiently complex harness can solve these tasks rather easily.
And presenting it as if you've made a novel development for solving ARC-AGI-3 leads me to believe you're willing to waste all of our time for your benefit at every step in the future.
> a sufficiently complex harness can solve these tasks rather easily.
I claim this is not so easily done, and earlier iterations of ARC-AGI did not have the constraint in the first place. You want something that generalizes across all puzzles (hopefully even the private ones), and these puzzles are extremely diverse ... and hard; telling the model the controls and some basic guidelines for the game is the only "obvious" thing you can do.
The other point of my reply was efficiency, both in creating and in using the harness. The discussed solution is something that anyone (in fact, likely even an LLM itself) can cook up in a few minutes: it's not much more than a game-control wrapper that lets the agent play around with the game in live Python, plus some general guidelines as laid out in the prompt.
(But I'm always happy to be proven wrong. What harnesses did you have in mind?)
The harness seems extremely benchmark-specific, which gives them a huge advantage over what most models can use. This isn't a qualifying score for that reason.
Um, yes, this is an extremely benchmark-specific harness. It has a ton of knowledge encoded about the tasks at hand. The tweet is dishonest even in the best light.
The hard part of these tests isn't purely reasoning ability ffs.
The S-300 is very good AA, but in practice modern SEAD with a sizeable number of planes can outrange it, and it's not great at protecting itself. We saw this in India-Pakistan and are seeing it again in Iran-USA. You see more of a stalemate in Ukraine-Russia, where they aren't getting outranged.
There are a few of these guys who make posts about technology that doesn't materialize after a few years; they can be ignored. There are plenty of pro-China observers out there who offer grounded analysis of the Chinese military-industrial base without claiming that China has unobtainium technology. /r/LessCredibleDefence has a shortlist of these propagandists.
Yeah it's certainly unimaginable that the civilization that invented gunpowder, cannons, guns, rockets a thousand years ago can make it for cheap now :)
"Hypersonic missile" makes it sound like alien technology. No, it's solid boosters that don't follow the usual ballistic trajectory, steered by a computer from 1970.
The raw materials cost less than half of a standard car.
> Mach 5, high maneuverability, inside the atmosphere.
Out of these, Mach 5 and inside the atmosphere have been doable for several decades. Pretty much all countries that make missiles can make missiles with these two characteristics.
My point, which you seem to either misunderstand or deliberately misrepresent, is that the other one, "maneuverability", is the distinguishing factor for what we call hypersonic missiles. That is what makes them difficult to defend against.
Think of it like calling humans hyper-limbed animals, but limbs being not what really distinguishes humans from, say, chimpanzees.
That's pretty much the entire point of what people are calling hypersonic missiles. All ballistic missiles fly at hypersonic speeds. The advance is being able to do so at low altitude with maneuverability.
You are correct, but I should point out that Russia has described its Kinzhal missiles as hypersonic, when they are really more of a traditional ballistic missile fired horizontally. So very fast (Mach 10), but not as maneuverable as what the U.S. has been calling hypersonic.
Since the original story here does not provide many details, we can't know which side of that fence this falls on (assuming it is real).
Was there any evidence that the Kinzhals fired, for example, toward Kyiv during the current conflict were fired on a depressed trajectory? I remember reading one account that looked like a plain old interception of a ballistic missile. (which is impressive enough to someone who remembers when "Patriot missile" was not exactly synonymous with excellence)
> That's pretty much the entire point of what people are calling hypersonic missiles.
Most missiles endowed with the "hypersonic" moniker are simply theater ballistic missiles used for standard ballistic missile things, which is part of why I asked the question.
> The advance is being able to do so at low altitude with maneuverability.
Hate to burst your bubble but arms dealers and governments are as capable as anyone else of marketing spin.
Every security engineer I know working at Azure is on the verge of self-harm because of the current situation, or is the dumbest IC I've ever met and somebody I think should have never become a security engineer. Sample size ~12.
I am not very close with every one of these engineers, and some no longer work at MSFT, but yes talking to employees in Seattle working on security made me never want to use Azure.
When it is online, I agree with everything aside from the "fast" part, actually. But many companies keep a secondary service for async comms/chat for when Teams is down, and that matters when comparing it to Slack.