Toonify produces avatars that match the source photo in detail and feature correspondence, so at first glance the algorithm might appear to work at near pixel-level granularity. However, at least in the examples I tried, there is a loss of similarity and recognisability: I don't think my friends would recognise that my avatar is me, and I don't get that feeling about the avatars generated from pictures of my friends either. That makes me wonder whether the algorithm is based on feature recognition (eyes, nose, mouth, eyebrows, etc.), followed by feature-level replacement/distortion, rather than pixel-level transformation.
The celebrity examples might be a different case for recognition, because the faces are already familiar, and because juxtaposing a set of phenotypically diverse celebrities creates a correspondence between the diversity of the input set and that of the output set.
What modifications would the algorithm need to score consistently higher in facial recognition, by people who view the avatar cold, without access to the original photo?
I'm asking rhetorically, of course. Such research might take a long time. Toonify looks like a good foundation for what might be an open-ended project.
Specifically, there's this snippet:
"These StyleGAN face models can produce a huge diversity of faces and it’s actually possible to find basically any face inside the model. It’s actually a straight forward process to search for any image of a face in the model. So given an example image you want to find you can find a “code” (aka latent vector) which, when given as an input to the model, will produce an output which looks almost exactly like the face you’re looking for."
So it seems like they first find a latent code that reproduces an image similar to yours with one model, and then translate that to its more cartoony counterpart.
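The search described in the quoted snippet is often called GAN inversion or projection. As a rough illustration of the idea (not the Toonify authors' actual code), here is a toy sketch in NumPy where a fixed linear map stands in for the generator; the real StyleGAN generator is a deep network, but the search loop, minimising the reconstruction error between the generated output and the target by adjusting the latent vector, is the same in spirit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a generator: a fixed linear map from an 8-dim
# latent vector to 64 "pixels". A real StyleGAN is a deep network,
# but the projection procedure follows the same pattern.
G = rng.normal(size=(64, 8))

def generate(z):
    return G @ z

# Target "face": here, an image the toy model can produce exactly.
z_true = rng.normal(size=8)
target = generate(z_true)

# Projection: gradient descent on the latent code z to minimise
# 0.5 * ||generate(z) - target||^2.
z = np.zeros(8)
lr = 0.01
for _ in range(2000):
    residual = generate(z) - target
    grad = G.T @ residual  # exact gradient for the linear toy model
    z -= lr * grad

recon_error = np.linalg.norm(generate(z) - target)
print(recon_error)
```

In a real pipeline the gradient would come from automatic differentiation through the network, the loss is usually a perceptual loss rather than raw pixel distance, and the recovered code is then fed to the cartoon-fine-tuned generator to produce the avatar.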