I wish a smarter person would research or comment on this theory I have: Training a model to measure the entropy of human generated content vs LLM generated content might be the best approach to detecting LLM generated content.
Consider the "Will Smith eating spaghetti test": if you compare that video against footage of Will Smith actually eating spaghetti, I naively expect the main difference would be entropy (not similarity). When we say something looks "real", I think we're just talking about our expectation of entropy for that scene. A model could detect that it's a person eating spaghetti and compare the measured entropy to the entropy it expects for such a scene based on its training. In other words, train a model with explicit entropy measurements alongside the actual training data.
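To make the idea concrete, here's a toy sketch of an entropy-based signal (a hypothetical illustration only: it uses bag-of-words Shannon entropy and an arbitrary threshold, whereas a real detector would measure token-level surprisal under a trained language model, as described above):

```python
import math
from collections import Counter

def shannon_entropy(text: str) -> float:
    """Shannon entropy (bits per word) of the word distribution in `text`.

    This is a crude proxy: a serious detector would compare per-token
    surprisal under a language model against the entropy it "expects"
    for that kind of content.
    """
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_generated(text: str, threshold: float = 3.0) -> bool:
    """Hypothetical classifier: flag unusually low-entropy text.

    The threshold here is made up; in the proposal above it would be
    learned from paired human/LLM training data per scene or domain.
    """
    return shannon_entropy(text) < threshold
```

For example, a maximally repetitive string like `"a a a a"` has entropy 0.0 bits, while four distinct words give 2.0 bits; the whole question is whether real human and LLM output separate cleanly on a measure like this at realistic scales.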
That's basically how "AI detectors" work: they're just ML models trained to classify content as human- or LLM-generated. As we all (hopefully) know, despite provider claims, they don't really work well.
In a non-adversarial context (where the author isn't disclosing AI use, but also isn't actively trying to hide it), AI image detection is giving me great results.
I think the problems (currently) are more with text, or with post-processing of other media to hide its AI origin.
Something like that would probably work for six months. This is going to be like CAPTCHAs. Schools have been trying to do this for essays for years. They're failing. The machines will win.
The idea is interesting, but it's still operating within the content-analysis paradigm. As soon as entropy-based detectors become popular, the next generation of LLMs will be fine-tuned specifically to generate higher-entropy text to evade them.
It's a cat-and-mouse game where the generator will always be one step ahead. It's far more robust to analyze signals that are hard to fake at scale: domain age, anomalous publication frequency, and unnatural link structures.
I doubt AI slop is the solution to AI slop; it's far too error-prone. The problem is that we already had a slop-driven advertising/attention economy, AI just made it more visible.
Any AI model can easily increase entropy by adding bits of information, and we'd end up in a weird AI info war with ordinary people as the victims. As consumers of information, we'd be dealing with unknown spaghetti. Generating false info is just too easy for a model.