Hacker News

Another thing I see a lot is scammers scraping my blog's RSS feed and re-publishing my articles on their site, then filling it with adware. Sometimes the page they show to googlebot is completely unrelated to the page you get when clicking through.


I thought Google heavily penalized your site if the human version was different from the googlebot version


The fact that Pinterest links still appear in every search suggests this isn't true, or isn't very true.


How do they detect that? Presumably human review, which can't possibly cover every malicious page on the internet. I assume that if you report the site, they queue it up to be reviewed by a human, unless their solution is just to run versions of googlebot that are harder to detect. That's possible, but if someone is already going out of their way to trick googlebot, I don't know how well it would work in practice.

As a starting point, your not-googlebot needs to spider sites differently from googlebot (so it can't be detected by traffic analysis), imitate average user hardware well (GPU acceleration but more realistically slow networks, slower CPUs, etc.), use network addresses that aren't obviously Google's, and imitate user behavior (plausible input events, scrolling, etc.). This is within Google's capabilities, but it's definitely an undertaking, and SEO types could eventually identify their strategies.
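The "spider sites differently" point matters because a fixed request cadence is itself a fingerprint. A minimal sketch of one countermeasure, jittered inter-request delays (the distribution parameters here are illustrative assumptions, not anything Google is known to use):

```python
import random

def human_like_delays(n_requests, seed=None):
    """Generate inter-request delays (in seconds) that avoid the fixed
    cadence a bulk crawler would betray: log-normal 'think times' plus
    occasional long pauses, as if the user walked away."""
    rng = random.Random(seed)
    delays = []
    for _ in range(n_requests):
        # Log-normal gives a realistic right-skewed distribution
        # (median here is e^1.0, roughly 2.7 seconds).
        delay = rng.lognormvariate(1.0, 0.8)
        # Hypothetical 5% chance of a long idle period.
        if rng.random() < 0.05:
            delay += rng.uniform(30.0, 120.0)
        delays.append(delay)
    return delays
```

This only addresses timing; a serious effort would also have to randomize crawl order, page depth, and session length per the traffic-analysis point above.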


>How do they detect that?

Easy: their crawler uses a Googlebot user agent. Then they sample some number of links with a human-like user agent, diff the output, and plug the diff into some algorithm to produce a score.
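The fetch-twice-and-diff idea sketched above could look something like this. Everything here is an illustrative assumption (the user-agent strings, the similarity metric, the 0.5 threshold), not Google's actual pipeline:

```python
import difflib
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")

def fetch(url, user_agent):
    """Fetch a page body while presenting the given User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def cloaking_score(bot_view, human_view):
    """Dissimilarity in [0, 1]: 0 means identical pages, values near 1
    mean the bot and the human were served very different content."""
    similarity = difflib.SequenceMatcher(None, bot_view, human_view).ratio()
    return 1.0 - similarity

def looks_cloaked(url, threshold=0.5):
    """Fetch the same URL as Googlebot and as a browser, then score the diff.
    The threshold is an arbitrary placeholder for 'some algorithm'."""
    return cloaking_score(fetch(url, GOOGLEBOT_UA),
                          fetch(url, BROWSER_UA)) > threshold
```

In practice a naive character diff would flash false positives on ads and timestamps, so a real system would presumably compare extracted text or page structure instead, and, as noted upthread, UA-sniffing cloakers defeat this entirely unless the checker also hides its IP range.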


Have you had any luck getting them to take that content down?


It's not really an issue unless they outrank you or rank near you.

In my experience, Google's dupe-content detection is pretty good and the penalty is harsh. I once ran a website that tried to curate and touch up old Usenet material, and it never could rank all that well because of dupe content (another website had monospaced dumps of the same Usenet material).



