
You implicitly give a license to crawlers if you don’t take action to block them, either via robots.txt or at your server. If you do either of these, Google will respect the site’s decision, and you could probably take them to court if they tried to evade blocks aimed at Googlebot (but since Google always respects robots.txt and never crawls from a different ASN or a different user agent, even for Safe Browsing crawls, they’re fine).

So if Instagram wants to block Google from downloading their videos, they can add this to robots.txt:

  User-agent: Googlebot
  Disallow: /video/
(Or however their URL scheme works)
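
If you want to sanity-check how a rule like that gets interpreted, here is a minimal sketch using Python's standard-library urllib.robotparser. The robots.txt contents, the example.com URLs, and the is_allowed helper are all made up for illustration; the /video/ path is just the placeholder from above, not Instagram's real URL scheme, and Google's own parser may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; /video/ is the placeholder path from the comment,
# not a real site's URL scheme.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /video/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "https://example.com/video/123"),
        ("Googlebot", "https://example.com/about"),
        ("OtherBot", "https://example.com/video/123"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Note that the rule only applies to the named user agent. A crawler identifying itself as anything else is not covered unless there is also a "User-agent: *" group.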


Pretty sure crawling and scraping is legal even if there's a robots.txt.


As long as it’s public. If you need to bypass auth or similar, that’ll get you in trouble.

https://www.eff.org/deeplinks/2019/09/victory-ruling-hiq-v-l...



