Hacker News | new | past | comments | ask | show | jobs | submit | ml_more's comments

We ran a test of GPT-5 yesterday. We asked it to generate a synopsis of a scientific topic and cite sources. We then checked those sources. GPT-5 still hallucinated 65% of the citations. It did things like:

- make up the paper title
- make up the authors for a real paper title
- pair a real title with an unrelated real journal

If it can't even reference real papers, it certainly can't be trusted to match up claims of fact with real sources.

Current AI tools generate citations that LOOK real but ARE fake. This might not be solvable inside the LLM. If anyone could do it, it'd be OpenAI. (OK, maybe I'm giving them too much credit, but they have a crap-ton of money and seem to show a real interest in making their AI better.)

If it can't be done in the LLM, we basically can't ever trust LLMs. I suppose there's a pretty big loophole here: doing it outside the LLM but INSIDE the LLM product would be good enough.

The first AI tool to incorporate that (internal citation and claim checking) will win, because if the AI can check itself and prevent hallucinated garbage from ever reaching the user, we can start to trust it, and then it can do everything we've been promised. Until that day comes, we can't trust them for anything.


Google already did this; give the free Gemini Deep Research a spin. It's not perfect, but I have a feeling you'll be surprised if this is your honest impression.


I can't decide who should be more mortified: the publisher or the author. The good news is that publishers will soon have access to automated citation checking.

https://groundedai.company/veracity/publishing/


The problem is that LLMs are just convincing enough that people DO trust them, which is a problem, since AI slop is creeping into everything.

What can be done to solve it (while not perfect) is pretty powerful: you can force-feed them the facts (RAG) and then verify the result. That's way better than trusting LLMs while doing neither of those things (which is what a lot of people do today anyway). See the five recent cases of lawyers getting in trouble after ChatGPT hallucinated citations of case law.

LLMs write better than most college students, so if you do those two things (RAG + check), you can get college-graduate-level writing with accurate facts... and that unlocks a bit of value out in the world.
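The "check" half doesn't have to be fancy. A minimal sketch, using toy data and hypothetical filenames: compare the titles the LLM cited against the titles of the documents the RAG step actually retrieved, and flag anything that doesn't match for human review.

```shell
# Toy data: what the LLM cited vs. what RAG actually retrieved.
printf '%s\n' 'A Real Paper (2021)' 'A Hallucinated Paper (2022)' > citations.txt
printf '%s\n' 'A Real Paper (2021)' 'Another Retrieved Doc' > sources.txt

# Flag any cited title that does not exactly match a retrieved source.
while IFS= read -r title; do
  grep -Fxq "$title" sources.txt || echo "UNVERIFIED: $title"
done < citations.txt
```

Real checkers would normalize titles and match fuzzily, but even this exact-match version catches a citation invented from whole cloth.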

Don't take my word for it: look at the proposed valuations of AI companies. Clearly investors think there's something there. The good news is that it hasn't been solved yet, so if someone wants to solve it, there might be money on the table.


> and that unlocks a bit of value out in the world.

> Don't take my word for it: look at the proposed valuations of AI companies. Clearly investors think there's something there.

Investors back whatever they think will make them money. They couldn't give less of a crap if something is valuable to the world, works well, or is in any way positive to others. All they care about is whether they can profit from it, and they'll chase every idea in that pursuit.

Source: all of modern history.

https://www.sydney.edu.au/news-opinion/news/2024/05/02/how-c...

https://www.decof.com/documents/dangerous-products.pdf


> Investors back whatever they think will make them money.

A not-flagrantly-illegal example of this might be casinos, where IMO it is basically impossible to argue that the fleeting entertainment they offer offsets the financial ruin inflicted on certain vulnerable patrons.

> All they care is if they can profit from it

Notably that isn't the same as the business itself being profitable: Some investors may be hoping they can dump their stake at a higher price onto a Greater Fool [0] and exit before the collapse.

[0] https://en.wikipedia.org/wiki/Greater_fool_theory


> They couldn’t give less of a crap if something is valuable to the world

"The world" is an abstraction: concretely, every bit of value that is generated within that abstraction accrues to someone in particular -- investors in AI projects, for example.


How do you check it?

Take the example of case law. Would you need to formalize the entirety of case law? Would the AI then need to produce a formal proof of its argument, so that you can ascertain that its citations are valid? How do you know that the formal proof corresponds to whatever longform writing you ask the AI to generate? Is this really something that LLMs are suited for? That the law is suited for?


Sure, using RAG is great, but it limits the LLM to functioning as a natural-language search engine. That's a pretty useful thing in its own right, and will revolutionize a lot of activities, but it still falls far short of the expectations people have for generative AI.


> Clearly investors think there's something there

Of course. Enterprise companies take a long time to evaluate new technologies, so there is plenty of money to be made selling them tools over the next few years, as well as selling tools to those who are making tools.

But from my experience rolling out these technologies, only a handful of these companies will exist in 5-10 years, because LLMs are "garbage in, garbage out" and we've never figured out how to keep the "garbage in" to a minimum.


I'd buy a mini PC on Craigslist for $100-200. You can usually get one with 16 GB of RAM, an i5, and a 500 GB SSD for that. Install a Debian-based Linux on it (Ubuntu is very common, and common is good for training). Be sure to get one with the power adapter if you go the eBay route.

Configure it fresh from install using Ansible. Prefer Ansible built-ins when possible; if you're calling shell scripts you're probably doing it wrong. Create named user accounts and create keys for each user. Grant your named user NOPASSWD sudo and disable password login. Named users are preferable for security-audit purposes. ("Shit, who logged in at 2 PM on Tuesday and deleted critical resources?" "Dunno, it was the 'ubuntu' user." "Great, I wish we logged in as ourselves so we at least had some way of knowing whose account got compromised.") Learn how to troubleshoot SSH login problems (ssh -v). Did you know that you can't log in if the permissions on the authorized_keys file are wrong?
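That named-user setup can be sketched as a small Ansible play. This is a minimal sketch, assuming a hypothetical user "alice" and a local public-key file; adapt names and paths to taste.

```yaml
- hosts: all
  become: true
  tasks:
    - name: Create a named admin user
      ansible.builtin.user:
        name: alice
        groups: sudo
        append: true
        shell: /bin/bash

    - name: Install alice's SSH public key
      ansible.posix.authorized_key:
        user: alice
        key: "{{ lookup('file', 'files/alice.pub') }}"

    - name: Grant passwordless sudo (validated before install)
      ansible.builtin.copy:
        dest: /etc/sudoers.d/alice
        content: "alice ALL=(ALL) NOPASSWD:ALL\n"
        mode: "0440"
        validate: /usr/sbin/visudo -cf %s

    - name: Disable SSH password login
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?PasswordAuthentication'
        line: PasswordAuthentication no
      notify: restart sshd

  handlers:
    - name: restart sshd
      ansible.builtin.service:
        name: ssh
        state: restarted
```

Note the visudo validation on the sudoers drop-in: a syntax error in sudoers can lock you out of sudo entirely.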

Enable and configure unattended-upgrades. Learn how UFW works and how to show iptables rules. Learn how to check which services are running, which are listening, and which TCP connections are active. Learn how to write a systemd service unit (it's like 5 lines, super easy). Learn how to tail and grep logs to diagnose problems. Learn how to use find and grep.
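A minimal unit really is about five lines. A sketch, assuming a hypothetical binary at /usr/local/bin/myapp, saved as /etc/systemd/system/myapp.service:

```ini
[Unit]
Description=My app

[Service]
ExecStart=/usr/local/bin/myapp
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Then `sudo systemctl enable --now myapp` to start it and have it come back on boot, and `journalctl -u myapp` to read its logs.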

When using SSL you'll probably use Let's Encrypt, and you'll probably want to renew automagically with certbot and DNS verification.

Create an ELK stack and ship some logs to it (maybe Loki too). Hold off on k8s for now; it is advanced, and you're wasting your time and your most precious resource (your ability to work through frustration) if you beat your head against it too early. When you do learn k8s, assume the cluster is already set up and learn to deploy a single service (there's minikube and k3s for this sort of thing). Anyone running k8s already has a lot of k8s skills (or they shouldn't be using k8s to begin with); contribute in other ways. If they don't have a lot of k8s skills and they're using it anyway, 1) they don't need a junior, they need a senior, and 2) they need to get onto something simple and easy like ECS. And if they already screwed up that badly, they probably need to move to Heroku. (But I digress.)
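For the "deploy a single service" exercise on minikube or k3s, a minimal Deployment manifest might look like this (the name and image are just placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 1
  selector:
    matchLabels: { app: hello }
  template:
    metadata:
      labels: { app: hello }
    spec:
      containers:
        - name: hello
          image: nginx:alpine
          ports:
            - containerPort: 80
```

Apply it with `kubectl apply -f hello.yaml`, watch it with `kubectl get pods`, and expose it with `kubectl expose deployment hello --port 80`.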

Run through some command-line practice: learn awk, sed, and the general way commands are structured (command, options, arguments). Learn to chain commands. Learn vi (did you know you can call arbitrary shell commands from it?). Learn how to replace lines in config files with sed, how to concatenate content to the end of a file, and how to do that for a file owned by root (hint: look into tee).
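The sed and tee tricks in one place, on a throwaway sample file (the filename and settings are just examples):

```shell
# Make a sample config to practice on.
printf 'Port 22\nPermitRootLogin yes\n' > sshd_config.sample

# Replace a whole line in place with sed.
sed -i 's/^PermitRootLogin .*/PermitRootLogin no/' sshd_config.sample

# Append to a file. For a root-owned file, pipe through "sudo tee -a"
# instead of using >> (the redirect runs as your user and would fail):
#   echo 'net.ipv4.ip_forward=1' | sudo tee -a /etc/sysctl.conf
echo 'MaxAuthTries 3' | tee -a sshd_config.sample >/dev/null
```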

Learn how to deploy code with GitHub Actions.
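A sketch of a minimal deploy workflow at .github/workflows/deploy.yml; the deploy user, host, and paths are hypothetical, and SSH-based deploys are just one of many approaches:

```yaml
name: deploy
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Copy code and restart the service over SSH
        run: |
          scp -r ./app deploy@example.com:/srv/app
          ssh deploy@example.com 'sudo systemctl restart app'
```

In a real setup you'd load the SSH key from a repository secret rather than relying on ambient credentials.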

Learn Terraform. Take the free online AWS Solutions Architect training courses and begin taking practice exams. The other clouds are clones: Azure with a preference for Active Directory, Google with a preference for unnecessary complexity and pedantic bullshit.
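For Terraform, a minimal sketch of a single EC2 instance; the region is arbitrary and the AMI ID is a placeholder you'd look up per region:

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

# One instance; plan before apply, and destroy when done practicing.
resource "aws_instance" "web" {
  ami           = "ami-..."
  instance_type = "t3.micro"
}
```

The `terraform init` / `plan` / `apply` / `destroy` loop on something this small teaches most of the day-to-day workflow.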

Skip configuring a mail server; if you're doing that you're probably doing it wrong. (Hint: you'll be hitting a mail-sending service. Several of the small ones have a free tier, which is great for learning.) You'll need some more info here, but you can search it up.

When learning bash scripting, focus on readability and maintainability. The Google style guide is a great reference: https://google.github.io/styleguide/shellguide.html In bash there are five ways to do anything; the best way is the most readable way. The person who thanks you may very well be yourself.
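A small sketch in that style: fail fast, put logic in functions, quote variables. The function names are my own, not from the style guide.

```shell
# Fail on errors, unset variables, and broken pipes.
set -euo pipefail

# Timestamped logging to stderr, leaving stdout for real output.
log() {
  echo "$(date -u +%FT%TZ) $*" >&2
}

# Archive a directory into <dir>.tar.gz.
archive_dir() {
  local target="${1:?usage: archive_dir <dir>}"
  log "archiving ${target}"
  tar -czf "${target}.tar.gz" "${target}"
}
```

Compare that to a one-liner doing the same thing: the one-liner is shorter, but the function version is the one a teammate can read, reuse, and extend.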

While we're on the topic, Google's SRE book is free online and epic. Learn the section on postmortems backwards and forwards. If you got a job and brought only the Google postmortem template and culture with you, you could improve nearly every company in the world. Same with improving their code testing and deployment. Same with doing simple cloud-security and cost-accountability tasks (like reviewing network-perimeter and cost-control suggestions in Trusted Advisor, finding unpatched servers and patching them, resolving Dependabot vulnerability notices, etc.).


We do that for ourselves, since some customers require it. Happy to help on a contract basis. https://groundedai.company/services/

We haven't put a lot of marketing into the service; maybe we should.


Find a community where people are pretty chill and ask. You'd be surprised how often simply asking for what you want works. (Perhaps that is lesson #1.)

Post some contact info in your bio, maybe someone here is looking for a mentee.


It is ironic that the guy who wanted to build a free speech platform cannot tolerate criticism.


Elizabeth? Is that you? I thought you were on your way to prison.


The film Gattaca explores the endgame of this strategy quite nicely.


I was tested recently, and my doctor said I had dangerously low vitamin D, so low they were going to prescribe high-dose supplements. I eat very healthily, mostly salads and a bit of dairy and fish here and there. I bet a lot of people didn't get enough sunlight during COVID. Also, people with darker skin need more sun exposure to make enough. (Among Black people in snowy climates, something like 70% have a vitamin D deficiency.)

If you take over-the-counter supplements and follow the instructions, you're unlikely to overdose.

