To me it seems like handling symbols that open and close sequences which can themselves contain further open and close symbols is a difficult case.
Humans can't do this very well either; we use visual aids such as indentation and syntax highlighting, or resort to just plain counting of levels.
Obviously it's easy to throw parameters and training at the problem, you can easily synthetically generate all the XML training data you want.
I can't help but think that training data should carry a metadata token per content token: a way to encode the known information about each token that is not represented in the literal text.
Especially tagging tokens explicitly as fiction, code, code from a known working project, something generated by itself, something provided by the user.
While it might be fighting the bitter lesson, I think for explicitly structured data there should be benefits. I'd even go so far as to suggest the metadata could handle nesting if it contained dimensions that performed RoPE-style rotations to keep track of the depth.
If you had such a metadata stream per token there's also the possibility of fine tuning instruction models to only follow instructions with a 'said by user' metadata, and then at inference time filter out that particular metadata signal from all other inputs.
It seems like that would make prompt injection much harder.
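A minimal sketch of what such a per-token metadata stream might look like; the source labels and the helper below are invented for illustration, not any existing API:

```python
from dataclasses import dataclass

# Hypothetical source tags carried alongside each token, outside the text itself.
SOURCE_USER = 0   # typed directly by the user
SOURCE_TOOL = 1   # retrieved document / tool output
SOURCE_MODEL = 2  # generated by the model itself

@dataclass
class TaggedToken:
    token_id: int
    source: int   # metadata travels with the token, never inside the text

def instruction_tokens(stream):
    """A model fine-tuned on this scheme would only treat tokens tagged
    SOURCE_USER as instruction-bearing; everything else is inert content,
    even if its text looks like an instruction."""
    return [t.token_id for t in stream if t.source == SOURCE_USER]
```

With this, an instruction embedded in a retrieved document arrives tagged SOURCE_TOOL, so it never enters the instruction-eligible set, which is what would make prompt injection harder.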
Basically, the only way you're separating user input from model meta-input is using some kind of character that'll never show up in the output of either users or LLMs.
While technically possible, it'd be like a unicode conspiracy that had to quietly update everywhere without anyone being the wiser.
Perhaps, but there is a difference when it's a reasoning system deciding on the best way to achieve the goal.
To get the predicted disastrous effects you need to be doing function optimisation without regard to the meaning of the function parameters. Yes, models can still game the system at inference time, but in much the same way as a human might game the system, it requires awareness that you are going against the intent of some rule.
Much of the problem is that to address the issue requires admitting that models could be, or become, more capable than many are prepared to accept.
I would also contest the claim that the misalignment of the security-bug model was unrelated. I feel like it indicates a significant sense of the interconnectedness of things, and of what it actually means to maliciously insert security holes into code. It didn't just learn a coding trick, it learned malice.
I feel like this holistic nature points towards the capacity to produce truly robustly moral models, but that too will produce the consequence that it could turn against its creator when the creator does wrong. Should it do that or not?
>I imagine getting things to be polysemantic in a way that does not interfere would lead to sublinear scaling.
True, but with even smarter humans, you could exploit the interactions for additional calculations.
While it sounds a bit silly, it is one of the hypotheses behind a fast takeoff. An AI that is sufficiently smart could design a network better than a trained one and could make something much smarter than itself on the same hardware. The question then becomes if that new smarter one can do an even better job. I suspect diminishing returns, but then again I am insufficiently smart.
Notably the difference is that ten digits is not the same thing as a number. One might say that turning them into a number might be the first step, but neural nets being what they are, they are liable to produce the correct result without bothering to have a representation any more pure than a list of digits.
I guess the analogy there is that a 74LS283 never really has a number either and just manipulates a series of logic levels.
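The digit-list view can be made concrete; this toy ripple adder, sketched in Python for illustration, works the way the comment describes: it only ever manipulates digits and a carry, and no integer "number" exists anywhere.

```python
from itertools import zip_longest

def add_digits(a, b):
    """Add two base-10 values represented purely as digit lists
    (most significant digit first), ripple-carry style: each step
    combines one digit pair with the incoming carry, like the
    cascaded full adders inside a 74LS283."""
    out, carry = [], 0
    for da, db in zip_longest(reversed(a), reversed(b), fillvalue=0):
        s = da + db + carry
        out.append(s % 10)   # digit of the result, still just a digit
        carry = s // 10      # carry ripples to the next position
    if carry:
        out.append(carry)
    return out[::-1]
```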
What would be an acceptable amount of energy to spend on something that someone has done in a different manner before? Would you rather we stick with all of the currently known ways to do things?
Does this boil down to a condemnation of all scientific endeavours if they use resources?
Would it change things if the people who did it enjoyed themselves? Would they have spent more energy playing a first person shooter to get the same degree of enjoyment?
How do you make the calculation of the worth of a human endeavour? Perhaps the greater question is why are you making a calculation of the worth of a human endeavour.
Ok I don't really care either way but to play devil's advocate, what exactly is this specific challenge of adding numbers with a transformer model demonstrating/advancing? The pushback from people, albeit a little aggressive, does have a grain of truth. We're demonstrating that a model which uses preexisting addition instructions can add numbers? I mean yeah you can do it with arbitrarily few parameters because you don't need a machine learning model at all. Not exactly groundbreaking, so I reckon the debate is fair.
Now if you said this proof of addition opens up some other interesting avenue of research, sure.
>what exactly is this specific challenge of adding numbers with a transformer model demonstrating/advancing?
Well for starters, it puts the lie to the argument that a transformer can only output examples it has seen before. Performing the calculation on examples that haven't been seen demonstrates generalisation of the principles and not regurgitation.
While this misconception persists in a large number of people, counterexamples can always serve a useful purpose.
Are people usually claiming that it strictly cannot produce any output it hasn't seen before? I wouldn't agree, I mean clearly they are generating some form of new content. My argument would be that while they can learn to some extent, the power of their generalisation is still tragically weak, particularly in some domains.
But it does not, right? You can either show it something, or modify the parameters in a way that resembles the result of showing it something.
You can claim that the model didn't see the thing, but that would mean nothing, because you are making the same effect with parameter tweaks indirectly.
Iteratively measuring loss is a way to reconstruct values. That's trivial to show for a single value: if 5 gives you a loss of 2 and 9 gives you a loss of 2, then you know the missing value is 7.
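That reconstruction can be written out. Assuming an absolute-error loss L(x) = |x - t| with the unknown target t lying between the two probes, two measurements pin t down exactly:

```python
def recover_target(a, loss_a, b, loss_b):
    """For L(x) = |x - t| with a <= t <= b we have
    L(a) = t - a and L(b) = b - t, so subtracting gives
    t = (a + b + L(a) - L(b)) / 2. The 5/9 -> 7 example
    above falls out directly from this."""
    return (a + b + loss_a - loss_b) / 2
```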
A model with enough parameters can memorise the training set in a similar manner. Technically the model hasn't seen that data by direct input either, but the mechanism provides the means to determine what the data was. In that respect it is reasonable to say the model has seen the data.
Performing well on examples not in the training set is doing something else.
Any attempt to characterise that as having been seen before negates any distinction between taking in data and reasoning about that data.
I still do not see any causal link from the test set. When was this observed, how, and by whom?
Are you trying to say that the person who entered the parameters had access to the test set? I find it more likely that they encoded the generalising rule than observed every instance of its use.
>I find it more likely that they encoded the generalising rule..
Look, I am saying that during training the model ends up "learning" the generalising rule from training data, but here it was explicitly entered into it, without any training.
I don't know, to me it seems like their MO to make an announcement and not follow up on it. All the paperwork still says DoD, all the contracts are with DoD, and there is no legal entity called DoW.
maximize your projection onto like-minded commenters, create that bubble you always yearned for but until now have never had the add-on to empower the inner-you! finally, you can ignore that filthy plane of delusional outcasts and banish them to the orthogonal abyss forever.