
Google's CAPTCHA is a cancer of the internet. We're all training their AI without any remuneration. I genuinely hope that some government figures out how to sue them. I've sat down for minutes, repeating CAPTCHAs over and over again, just to log in to an account or download something.


It is not without remuneration. We use CAPTCHA to access a service. Imagine the "worst case scenario": you don't use Google services but a paid service has Google's CAPTCHA. It may seem like you are paying for training Google's AIs and you get nothing in return... but:

- Bots cause trouble to the service

- The service has to find some mitigation

- Mitigations cost money, and that cost will be passed down to the customer in some way

- Google offers a free solution, maybe not the best, but it will not cost the service anything (or close)

- So you are training Google's AI in exchange for cheaper service

Of course, you may disagree with it, you may think that a paid service shouldn't use Google's CAPTCHA, that you already pay too much, etc... But you are free to go elsewhere, and the rest is just market considerations, not something governments generally mess with, at least not governments that support free markets.

You could bring lawsuits based on specific terms and conditions, but I guess large companies, Google first, have lawyers who know their stuff and keep the company covered.


> It is not without remuneration. We use CAPTCHA to access a service.

I don't have a sufficient capacity for sarcasm to faithfully reproduce the emotions I feel when reading this.


Let Google's AI solve Google's CAPTCHAs for you!

"Buster: Captcha Solver for Humans" is one of my top 3 browser extensions.


Buster is a godsend and has probably saved me from having to abandon half the internet sites I visit.

Now if only someone could come up with similarly useful extensions that got rid of the other two forms of cancer that are ruining the internet:

1: Cookie consent dialogues*

and

2: Those modal overlays that suddenly cover up what you're reading and ask you to subscribe to some poxy newsletter or mailing list.

*not "I Don't Care About Cookies". It slowed my browser to a crawl when I installed it.


Unfortunately, the developers of those particular things (much like ad developers) obfuscate things like div names to make the task less trivial.


So we have "Buster: Captcha Solver for Humans", "uBlock Origin" and what is the last one?


SponsorBlock?


I had no idea this extension existed, thanks a lot!


That is some ingenious bait for HN readers! Of course everyone is going to passionately defend their favourite.

(Mine is, of course, tree style tabs - I could never manage without it.)


Go on then. I'll play. My 'must have' half dozen which get installed first, on every browser on every device I use:

* Bitwarden

* HTTPS everywhere

* Privacy Badger/Possum/Other Furry Animal

* uBlock Origin

* uMatrix

* Windscribe


> HTTPS everywhere

Not anymore, firefox has it by default now, just gotta turn a setting on.

> uMatrix

Sadly this is dead, I guess I'll switch to a fork once I have time to figure out who does what.


> Not anymore, firefox has it by default now, just gotta turn a setting on.

I don't use Firefox as my main browser. I use Yandex. I do sometimes wonder if HTTPS everywhere is becoming a bit redundant anyway. Most sites seem to have moved to HTTPS these days.

> Sadly this [uMatrix] is dead

Yes, that's a shame. It still works though. Hopefully whoever takes on the task of keeping it alive via a fork will make it a bit more mobile friendly. [The same could be said about uBlock Origin]


For me, it's Dark Reader. I can't browse the web without it.


NoScript would be up there for me, or else Bitwarden.


"I don't care about cookies"?


uMatrix


TabMixPlus of course.


Lastpass for me


Imagus, Bypass Paywalls, and Tampermonkey.


No, the owner of the site is getting a free service from Google to try and prevent bots from using their site. If you don't like CAPTCHAs then your problem is with the site owner, not google.


Google's CAPTCHA is specifically engineered to gaslight you though. Often it will ask you to identify fire hydrants or bicycles, and despite selecting all of them (and it being easy to recognise them as they are fairly unique objects), it will still give you an error, and give you a new set of images to try again. If it is feeling really malevolent it will even give you a third round. Hell, regularly on the second and especially the third try it will give you excruciatingly slowly crossfading images, and the last image routinely seems to need 2-4 clicks to finally make the object it wants you to identify go away.

Hcaptcha (the alternative Cloudflare switched to) is an absolute breeze by comparison, and it even allows you to pre-generate a bunch of tokens via an extension (PrivacyPass).


Ok, apart from the fact that Hacker News just made me fill in a captcha to register this one-time account (nevertheless, kudos HN, you are still better than the rest):

I do not understand the Hcaptcha hype here. Since CloudFlare switched to Hcaptcha, as a VPN user, every time I run into a site that has CloudFlare / Hcaptcha, I can forget about passing that test. No matter how many times I try, CloudFlare still shows it again. It is so bad that now, every time I see Hcaptcha shown, I just close the site without even trying. PrivacyPass :) - the biggest joke I've seen on privacy - those who endorse it are either naive or have some other agenda.

These "free" zero-cost captchas for site owners are now in all the places where a captcha is not really needed. If CloudFlare's super-duper DoS solution relies on captchas, or if Google's super-duper search relies on captchas to return results, that demonstrates how bad these companies really are at the main thing they do, and that they only want more data.

Even pirate streaming sites now ask for captchas. It is so easy to integrate them, why not.

Captchas, followed by 2-factor authentication via phone SMS that we are slowly being forced to accept in all the main services of life, combined with laws to register phone SIM cards so they really know who we are, killed the web we knew - people are being identified with high accuracy.

We all rant about this here. It comes up in waves on HN all the time, but what can you do?

At least, if you own or manage a site or blog do not use: CloudFlare, Google, etc - no CDNs, no shared fonts.


There might be something really particular about your setup that trips up Hcaptcha.

I run Firefox with a stringent profile config, uBlock Origin in medium mode (3rd party default deny), I don’t care about cookies, Cookiebro, LocalCDN, Canvasblocker, Smart Referrer, on PIA VPN, and I never have issues with Hcaptcha.

As far as Privacy Pass goes: their code is on GitHub and you can verify the checksums. Doesn’t give me a 100% guarantee but it’s good enough for me.


I hate CAPTCHAs as much as the next person, just pointing out that it is a service that Google provides and no site is required to implement it. So if you don't like CAPTCHAs then your beef is with site owners.


The only good thing about hcaptcha - what's better than one page of images? - is it doesn't seem to care what you click, but then neither does Google if you're using Chrome.


Anything you do on google is training their AI for free. Do you think they're offering you free email out of the kindness of their hearts?


Don't they sell ads in the web UI like many other free email providers do?


And they sell the business accounts; because so many people are used to gmail for their personal accounts, quite a few companies are paying for that.


I use gmail but I never see ads because I always use a non-Google mail client. (Thunderbird on Windows and Linux), SimpleEmail on Android.


You're an edge case and a rounding error. Enough people use the web UI and the Gmail app that they likely don't care about other users.


> We're all training their AI without any remuneration.

This. This is not said enough. Everything you do with a Google service is used to train their AI. I don't want to train their AI.


It’s almost like a bad sci fi flick: - how much data do we need to train an AGI? - a googol. - let that be the company name.


I absolutely don't aim to defend Google's captcha here, but the other day I was setting up an Outlook account for my son, and I ran into Arkose Labs bot detection, and I actually got heatedly angry, which is extremely rare for me. I wanted to punch anyone related to that abomination.

Upon researching it seems to be used by the Epic launcher and Roblox (among others), which might explain why I've never encountered it before.

Someone else's screengrab, which looks larger than what I was presented on my laptop: https://imgur.com/a/jF1HxbN

So they:

* are very small (or I'm old)

* use faux 3d walls which further complicates the image

* have to be solved 10 in a row correctly

* have an unspecified time limit (which in my untempered rage felt like maybe 3 seconds per image tops, no promises)

* don't tell you you've failed by answer or time until you're through all 10.

I as a full grown human with ~25-30 years on the internet, as well as video games and puzzles for fun, could not get through it in less than 5 (*10) tries. I accept I might be occasionally slow, but this should not be an issue.

TL;DR: Can someone at Arkose Labs please just do an rm -rf /

Edit: Apparently they have other types as well: https://www.reddit.com/r/CrappyDesign/comments/gkpz0f/how_to...


Adding insult to injury, it seems pretty easy to write a quick image filter + path finding algo to solve these... as apparently all the walls have solid borders, while none of the walkable paths have them. So a targeted bot should have a much easier time solving these than a human.
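
To make the "image filter + path finding" idea concrete, here is a minimal sketch of the path finding half only. It assumes the image filter has already turned the picture into a text grid where `#` marks a wall and `.` marks a walkable cell; that grid format, the `shortest_path` name and the sample maze are all made up for illustration and say nothing about how the real challenge is encoded.

```python
from collections import deque


def shortest_path(grid, start, goal):
    """Breadth-first search on a grid of '#' (wall) and '.' (walkable) cells.

    Returns the list of (row, col) cells from start to goal, or None
    if the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    came_from = {start: None}  # maps each visited cell to its predecessor

    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the predecessor chain back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]

        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))

    return None


# Hypothetical maze, as if produced by thresholding the wall borders.
MAZE = [
    "..#.",
    ".##.",
    "....",
]

path = shortest_path(MAZE, (0, 0), (0, 3))
```

Breadth-first search is enough here because every step costs the same; the hard part for a real bot would be the image filter, which this sketch skips.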


Absolutely, they've produced something fairly consistent making it easy for bots, yet by design made it harder for humans to see (small, image noise), and solve (10 consecutive, short time limit, no user feedback).

I recognize that by looking at just the screengrab it's an extremely simple concept to solve, it's just that at every implementation turn they made the worst choices, and it just infuriates me.



Wow what the hell is that. Thank god I've never had to see that. It's better to load a broken image and ask to enter the numbers (that has happened to me though).


Oh, that actually reminds me of an entire other dimension of the whole thing. I blame rage induced fugue state.

What I described was just the procedure of 1 "level" of captcha. I had to complete either 2 or 3, the delineation is kind of blurred at this point.

The one I had before the above was audio based, but it failed to load a bunch of times, and failed my answers a few times as well, inexplicably.

It read out not 4-5, but 10-12 numbers, which honestly was manageable, but there was so little spacing between the numbers that anyone who has to look at their keyboard to type would have to re-listen a few times to keep up. This one would also be entirely solvable by a bot, but problematic for a significant portion of humans.

I just don't understand how they make money, nor why Microsoft specifically would pay them for their services. The LEAST outrageous explanation I can find is that they're bribing someone in Microsoft's COTS purchasing.


As other comments said, I'd just drop it and go somewhere else. If the website believes they're Fort Knox, let them have the same amount of traffic.


They provide a service to the website owner in exchange


But Google CAPTCHA has been asking the same questions now pretty much since its inception. Are we really still training it? Or is it just running on auto-pilot at this point? I'm guessing it's likely the latter.


I'm old enough to remember when reCAPTCHA was first introduced to help with deciphering text from OCR'd books. At that time, it didn't feel so bad answering those, as we were using our intelligence to genuinely help preserve our cultural history.

It then switched to being numbers on buildings and street signs, and it immediately felt worse - we were now doing a job for Google, and an annoying one. Mechanical Turk from Amazon was invented to do this kind of chore.

It's now creating training datasets for whatever else Google wants them to - it appears to mostly be for self-driving cars now, to identify landmarks and road signs.

It's definitely not on autopilot, and it's definitely a real problem.


There's no difference between helping google solve book scanning vs other datasets for whatever else they decide.

The fact that a captcha exists on a website reflects the website owner's intention to cause friction for their users. It has nothing to do with Google's use of reCAPTCHA directly. The only vote you have is to not use said website - sometimes easier said than done, but that's the only option you have.


>we were now doing a job for Google

And in return you get an internet that is not filled with spam and bots. Or at least less filled with bots.


Bots that we've trained. And there's also an xkcd for this: https://xkcd.com/2228/.


Every time I get a captcha with traffic lights I imagine a Google self driving car stopped at an intersection waiting for me to complete the captcha so it can figure out what it sees and move along :)


Naturally, there's an xkcd for this: https://xkcd.com/1897/


I used to have fun filling in those reCAPTCHAs with incorrect answers. They would show you two words, and it was always obvious which word was the actual CAPTCHA test and which was the OCR input, because the former would be warped into a funny shape while the latter was always a rectangular block of normal text. So I'd type in the correct answer for the warped word, and something like "fuckface" or "cocksucker" for the regular text, and it would be accepted.

I like to think that somewhere out there on Google Books, my efforts have resulted in an innocuous word being replaced with something offensive.
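
The two-word scheme described above can be sketched roughly like this. It is only an illustration of the behaviour the comment reports (the warped control word is checked, the scanned word is accepted as-is); the function name and return shape are hypothetical and not taken from reCAPTCHA itself.

```python
def check_two_word_answer(control_word, typed_control, typed_unknown):
    """Grade only the control word; keep the other word as an unverified vote.

    Returns (passed, recorded_vote). The vote is None when the control
    word was wrong, since the whole submission is then discarded.
    """
    passed = typed_control.strip().lower() == control_word.strip().lower()
    recorded_vote = typed_unknown.strip().lower() if passed else None
    return passed, recorded_vote


# The control word is right, so whatever was typed for the scanned word
# is accepted as a vote, nonsense included.
result = check_two_word_answer("morning", "Morning ", "nonsense")
```

This is why typing junk for the scanned word still let you through: nothing in this step can tell a good transcription from a bad one.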


Before I discovered the Buster plugin, I used to try to be as unhelpful as possible in my forced Google AI training sessions.

I'd use the audio option in the reCraptcha and then see how little of what I heard I could get away with actually entering into the form. Often it sufficed to just type one word out of the entire audio clip or even enter a word which sounded similar [eg. audio clip says "tranced", I write "transit"]. The most satisfying ones of all were when, after listening to a complete sentence containing either word, I could pass the reCraptcha by simply typing "the" or "an" into the form.

One of the things that I find slightly disappointing about Buster is that it types in the entire sentence, when solving the reCraptcha. I hate to think Google are under the impression I've suddenly started trying harder!


Probably not, as statistically your answers are filtered out as noise. Remember, hundreds of people were getting the same images.
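
A minimal sketch of how that kind of noise filtering could work, assuming many people are shown the same scanned word and an answer is only accepted once enough of them agree. The `consensus` function and both thresholds are invented for illustration and are not Google's actual pipeline.

```python
from collections import Counter


def consensus(answers, min_votes=3, min_share=0.6):
    """Return the agreed transcription, or None if there is no clear winner.

    min_votes and min_share are hypothetical thresholds: the word needs at
    least min_votes non-empty answers, and the top answer must make up at
    least min_share of them.
    """
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if len(normalized) < min_votes:
        return None
    word, count = Counter(normalized).most_common(1)[0]
    return word if count / len(normalized) >= min_share else None


# One prankster among five solvers does not change the outcome.
agreed = consensus(["upon", "Upon", "upon", "fuckface", "upon"])
```

With a rule like this, a single deliberately wrong answer only matters if most of the other solvers type the same wrong thing.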


A literal oldie but goodie: https://wraabe.wordpress.com/2009/03/07/an-ocr-cliche-into-h...

Granted this is mostly a kerning issue.

Without wider textual context the correct answer is hard to determine.


What a sad way to indulge the urge for petty vandalism. At least graffiti carries some risk and sex appeal.


>>I'm old enough to remember when reCAPTCHA was first introduced to help with deciphering text from OCR'd books

Am I the only one who always, on purpose, put in the wrong answer for the clearly scanned word? For me it was kind of a rebellion against being used this way, but it was always super easy to tell which word was generated and which one was scanned - and the algorithm only required the generated word to be correct, so I always put in absolute nonsense for the scanned word to break their OCR detection.


They have not been asking the same questions since the beginning. At first, reCAPTCHA was used to make old books digitally accessible, which meant that you solved those potato quality scans of old books. When bots became advanced enough, reCAPTCHA was used to improve Google Maps by reading house numbers on poor quality cropped street view images.

Then the zeroCAPTCHA became a thing, which turned out to be even more annoying than the others if you cared even slightly about your privacy (meaning you at least installed an adblocker) and faced those multi-stage challenges. Looking at those challenges, I think it is pretty obvious what CAPTCHA is used for nowadays: training a recognition model for a self driving vehicle.


I've been picking out pictures of fire hydrants and stoplights for five years now with no end in sight.

Sometimes literally. They'll show me series of fire hydrants that don't end after several minutes, and I just give up.


Yup, exactly. My thoughts on CAPTCHAs expressed well by https://xkcd.com/1897/ https://xkcd.com/2228/.


It is really, really weird of you to compare it to a cancer of the internet.

CAPTCHA was not invented to make the internet worse, and solving one is not a real hurdle to you.

I'm not sure why you 'sat down for minutes' to solve any of them, you might just be really bad at it compared to a lot of other people.

Nonetheless, this helps YOU: it makes YOUR experience better by making it harder for bots.

Have you ever seen a community killed by spam bots? You are probably quite happy that the amount of spam in comments etc. is as low as it is.


It would be interesting to see why someone would downvote it.

Since when are captchas a bigger issue than spam bots and fraudulent users?

And yes, I also think that those tasks are actually helping our society. Adding models for self driving cars, for security/emergency braking systems etc. So what's the issue?


I did not downvote you, but I assume it's because you're wrong. Depending on your browser settings, addons, etc. CAPTCHAs may be nearly impossible to solve. That is to say, you ARE solving them correctly, but Google thinks otherwise, and punishes you with ever more pictures that fade in and out ever more slowly. So the fact that they're easy to solve or work well for you means nothing. Other people are having real issues with it, and not because they are unable to solve them.


They could easily make that point by commenting right?


> It would be interesting to see why someone would downvote it.

If I had to guess:

> "solving one is not a real hurdle to you"

My experience is that it can be a real hurdle.

Sometimes reCaptcha forces me to complete 2 or 3 captchas (or more), some of them with an annoying artificial delay when loading new images. It seems to be worse when using a VPN or a browser like Firefox (or some privacy extension) that blocks Google tracking.

I understand why people use reCaptcha and that it works well for some users, but it can be a pain in the butt if you don't use Chrome, aren't logged in to your Google account, or if other devices in the network did something that Google doesn't like. I sometimes get captchas on my phone when using mobile data (on a major provider here in the UK) and Google search because of "unusual traffic from your computer network"...



