Micaiah_Chang's comments | Hacker News

Yes, the point of the GP comment is exactly this, if Bentham becomes an agent that goes for C, he also explicitly discourages the mugger from being an agent that would cut off their fingers for a couple of bucks.

Notice that what Bentham is altering is his strategy, not his utility function. If he could spend 10 dollars to treat the gangrene and save the fingers, he would do it. It's not clear that many other moral systems would be as insistent on this as utilitarianism, because practitioners of other moralities curiously form epicycles defending why the status quo is fine anyway (how dare you imply I'm worse at morality).

Edit: Slight wording change for clarity


> if Bentham becomes an agent that goes for C, he also explicitly discourages the mugger

How is this different from saying that if Bentham decides not to adhere to utilitarianism, he is no longer vulnerable to such a mugging? If Bentham always responds C, even when actually confronted with such a scenario (the mugger was not deterred by Bentham's claim), then Bentham is not a utilitarian.

In other words, the GP is saying: "if Bentham doesn't always maximize the good, he is no longer subject to an agent who can abuse people who always maximize the good." But that is exactly the point -- that utilitarianism is uniquely vulnerable in this manner.


My wording was poor, because it sounds like I'm saying that Bentham is adopting the policy ad hoc. A better way to state it is that Bentham starts out as an agent that does not give in to brinksmanship-type games, because a world where brinksmanship-type games exist is substantially worse than one where they don't (net-negative situations will end up happening, it takes effort to set up brinksmanship, and good actions do not benefit more from brinksmanship). It's different because by adopting C, Bentham prevents the mugger from mugging, which is a better world than one where the mugger goes on mugging. I don't see any contradiction with utilitarianism here.

If we're in a world where the thought experiment's premise doesn't hold and the "mugging" is net positive, then calling it a mugging is disingenuous; that's just allocating resources more optimally, and the exchange is more like "hi bentham i have a cool plan for 10 dollars let me tell you what it is" "okay i have heard your plan and i think it's a good idea here's 10 bucks"

Except that you are using the word "mugging" and implying violence so that people view the interaction as more absurd than it actually is.


> It's different because by adopting C, Bentham prevents the mugger from mugging, which is a better world than one where the mugger goes on mugging.

This assumption is wrong. You are assuming that the mugger is also a utilitarian, and so will do the cost-benefit analysis and decide not to mug. But that is not necessarily true.

If the mugger mugs anyway, despite mugging being "suboptimal," Bentham ends up in a situation where he has exactly the same choice: either lose $10, or have the mugger cut off their own finger. If Bentham is to follow (act-)utilitarianism precisely, he must pay the mugger $10. (Act-)utilitarianism says that the only thing that matters is the utility of the outcome of your action. It does not matter that Bentham previously committed to not paying the mugger; the fact is, after the mugger "threatens" Bentham, if Bentham does not pay the mugger, total utility is less than if he does pay. So Bentham must break his promise, despite "committing" not to. (Assuming this is some one-off instance and not some kind of iterated game; iteration makes things more complicated.)
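
To make that accounting concrete, here is a minimal sketch of the one-shot choice in Python. The utility numbers are purely illustrative assumptions of mine; the thought experiment itself doesn't assign any.

    # One-shot, act-utilitarian accounting after the threat has been made.
    # The utility numbers below are illustrative assumptions only.
    U_PAY = -10        # Bentham is out $10
    U_FINGER = -1000   # the mugger severs a finger

    def act_utilitarian_choice():
        # Only the utility of the outcomes matters; prior commitments don't.
        return "pay" if U_PAY > U_FINGER else "refuse"

    print(act_utilitarian_choice())  # -> "pay", whatever Bentham promised earlier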

(In fact, this specific objection -- that utilitarianism requires people to "give up" their commitments -- is at the foundation of another critique of utilitarianism by Williams: https://123philosophy.files.wordpress.com/2018/12/bernard-wi...)

If everyone were a utilitarian, then there would be far fewer objections to utilitarianism. (E.g. instead asking people in wealthy countries to donate 90% of their income to charity, we could probably get away with ~5-10%.) Bentham's mugging is a specific objection to utilitarianism that shows how utilitarians are vulnerable to manipulation by people who do not subscribe to utilitarianism.

Also, to be precise, Bentham's mugging does not show a contradiction. It's showing an unintuitive consequence of utilitarianism. That's not the same thing as a contradiction. (If you want to see a contradiction, Stocker has a different critique: https://www.jstor.org/stable/2025782.)


Except that eventually the mugger will run out of fingers (and/or reattaching them will eventually start not working out), so the mugger will be forced to stop mugging. Well, ok, they could start cutting off toes, or threaten some other form of bodily mutilation.

But regardless, giving in to the mugger enables the mugger to continue mugging indefinitely. Not giving in -- assuming the mugger goes through with whatever self-mutilation they've threatened -- will eventually cause the mugger to stop mugging. This would be a net positive, better than allowing the mugger to mug indefinitely.
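
A rough sketch of that comparison, with made-up numbers (nothing here is from the story; it's just to show the shape of the argument): the cost of never giving in is bounded by the mugger's ten fingers, while the cost of always giving in keeps growing with each encounter.

    # Illustrative comparison of two policies over repeated muggings.
    # All numbers are assumptions made for the sake of the argument above.
    U_PAY = -10        # utility lost each time Bentham pays up
    U_FINGER = -1000   # utility lost each time the mugger self-mutilates
    FINGERS = 10       # the mugger eventually runs out of fingers

    def total_utility(policy, encounters):
        if policy == "always_pay":
            return encounters * U_PAY                  # mugging never stops
        return min(encounters, FINGERS) * U_FINGER     # mugger quits when out of fingers

    for n in (100, 1000, 10000):
        print(n, total_utility("always_pay", n), total_utility("never_pay", n))
    # Refusing has a bounded cost; paying keeps getting worse, so over a long
    # enough horizon the refusing policy comes out ahead.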

On top of that, I was disappointed that, in the story, Bentham does actually bring up the idea that capitulating could encourage copycats to run similar schemes, but this rationale for not cooperating is hand-waved away. This is pretty standard "don't negotiate with terrorists" stuff. Giving in just tells the mugger -- and other potential muggers -- that this strategy works. Surely it's more utilitarian to stamp this out at the source, even if it costs the original mugger some fingers.

(But I guess this is in part the point of Bentham being an act utilitarian in the first encounter, as he wouldn't consider the larger implications of his actions, just on the specific, immediate result of the action in front of him.)


> practitioners of other moralities curiously form epicycles defending why the status quo is fine anyway

This is exactly what the Bentham in the story is doing!


Do we want to talk about a hypothetical world where deontology was the underlying moral principle? Where, for example, a large agency in charge of approving vaccines decided to delay approval of a life-saving vaccine because, even though it received the data on November 20th, it had scheduled the meeting for December 10-12th, dammit, and that's when it would be done? Where approval was potentially delayed several months because, instead of using challenge trials to directly assess the safety of a vaccine by exposing willing volunteers to both the supposed cure and the disease, the agency gave the cure to a couple of tens of thousands of people and just waited until enough of them got sick and died of a disease "that would have got them anyway" to gather enough statistics on safety? Which is definitely good, you see, because no one got directly harmed by said agency, even if many more people in the country were dying of this theoretical disease. [0]

Or, even better, what if distribution of this life-saving cure were done based on the deontological concept of fairness? Surely this wouldn't result in limited and highly in-demand vaccines being literally thrown away[1] in the name of equity, or in vaccine manufacturers needing to seek approval for something as simple as increasing the number of doses in a vial. [2]

You know, just all theoretically, since it would be a terrible shame if any of these things happened in the real world. And this is just one specific scenario; I'm sure I could make up various [3] other [4] ways [5] in which not carefully evaluating the consequences of moral actions would turn out poorly, but hey!

I'm sure glad that utilitarianism isn't being entertained more on the margin, since we already live in the best of all possible moral universes.

(Footnote: I'm not going to justify these citations within this post, because it's pithier this way. I recognize this is not fully honest and transparent, but I'd be happy to defend the inclusion of any of these if necessary.)

[0] https://www.cdc.gov/mmwr/volumes/70/wr/mm7014e1.htm

[1] https://worksinprogress.co/issue/the-story-of-vaccinateca ctrl f "On being legally forbidden to administer lifesaving healthcare"

[2] https://www.businessinsider.com/moderna-asks-fda-approve-mor...

[3] https://news.climate.columbia.edu/2010/07/01/the-playpump-wh...

[4] https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2641547

[5] https://papers.ssrn.com/sol3/papers.cfm?abstract_id=983649


I've been studying the language for a while, and recently made the switch to Japanese-Japanese dictionaries, after using EDICT for a long time.

This has highlighted some reservations I have about it.

The most readily available example (not the best) is 適当, which can be used interchangeably to mean "adequate" and, sort of sarcastically, "half-assed". The definition ends up being a mostly undifferentiated bag of words, with no particular regard for nuance or typical use cases.

Contrast this with the goo dictionary, which has a slightly better structure (http://dictionary.goo.ne.jp/leaf/jn2/151064/m0u/%E9%81%A9%E5...) and the 類語 dictionary, that gives synonyms and situations where you would use one over another (http://dictionary.goo.ne.jp/leaf/thsrs/2512/m0u/).

I understand that there's probably no way to deal with this in a scalable way that would be as easy to turn into flashcards, but it's kinda sad to see the gap between the two solutions.

Apologies for not having a more constructive suggestion.
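
Edit: for what it's worth, here's roughly the shape I imagine sense-level data would need to have to stay flashcard-able without flattening the nuance. The 適当 entry is hand-written by me for illustration; it's not pulled from goo or the 類語 dictionary.

    # Sketch: sense-level dictionary data turned into one flashcard per sense,
    # so the "adequate" vs "half-assed" nuance isn't collapsed into one gloss.
    # The entry is an invented example, not real dictionary data.
    entry = {
        "headword": "適当",
        "reading": "てきとう",
        "senses": [
            {"gloss": "suitable; appropriate",
             "note": "neutral/positive: 適当な例 'a suitable example'"},
            {"gloss": "half-assed; perfunctory",
             "note": "colloquial, often sarcastic: 適当にやる 'to do it sloppily'"},
        ],
    }

    def to_flashcards(entry):
        # One (front, back) pair per sense, keeping the usage note.
        for i, sense in enumerate(entry["senses"], 1):
            front = f'{entry["headword"]}【{entry["reading"]}】 sense {i}'
            back = f'{sense["gloss"]} ({sense["note"]})'
            yield front, back

    for front, back in to_flashcards(entry):
        print(front, "->", back)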


I've been relying on Japanese-Japanese dictionaries for a while, and they work best IMHO once you're past a certain level. Also, most good non-jp-jp resources are in English, and, after all, English is not my native language either, so I might as well deal with everything in Japanese. I just wish I could find an app that does everything in Japanese, but I've never found one. I've tried a few, but there are so many that it's hard to find something decent. The result is that I haven't actually actively studied Japanese for a while, and I rely on "passive" learning from talking, reading and watching TV (I live in Japan, that helps). I'm actually sufficiently annoyed that I'm not retaining as much as I would like that I'm considering writing a web app to handle my own learning.

Edit: by the way, I once stumbled upon an online dictionary that also showed the "standard" intonation for words (something that I've very rarely seen mentioned), but I can't find it again :(


Yeah, it's definitely a weakness in Nihongo. I'd love to have a really well-integrated Japanese-Japanese dictionary, but I've never been able to find a J-J open-source database like JMDict/EDict. My solution for now is that you can tap on the menu button in the top-right of any entry, and open it in the built-in iPhone dictionary, which includes a J-J dictionary. But that's not a very satisfying answer.


Maybe being shocked means that the person talking about the subject is misrepresenting it, because they themselves don't understand the arguments and are inadvertently projecting.

For example, Ray Kurzweil would disagree about the dangers of AI (he believes in the "natural exponential arc" of technological progress more than in the idea of recursively self-improving singletons), yet because he's weird and easy to make fun of he's painted with the same brush as Elon saying "AI AM THE DEMONS".

If you want to laugh at people with crazy beliefs, then go ahead; but if not, the best popular account of why Elon Musk believes that superintelligent AI is a problem is Nick Bostrom's Superintelligence: http://smile.amazon.com/Superintelligence-Dangers-Strategies...

(Note I haven't read it, although I am familiar with the arguments and some acquaintances tend to rate it highly)


But then that's precisely the point: Bostrom is a philosopher. He's not an engineer, who builds things for a living, or a researcher, whose breadth at least is somewhat constrained by the necessity to have some kind of consistent relationship to reality. Bostrom's job is basically to sit and be imaginative all day; to a good first approximation he is a well-compensated and respected niche science fiction author with a somewhat unconventional approach to world-building.

Now, don't get me wrong -- I like sf as well as the next nerd who grew up on that and microcomputers. But it shouldn't be mistaken for a roadmap of the future.


I'm not sure it should be mistaken for philosophy either.

Bostrom doesn't understand the research, he doesn't understand the current or likely future state of the technology, and he doesn't really seem to understand computers.

What's left is medieval magical thinking - if we keep doing these spells, we might summon a bad demon.

As a realistic risk assessment, it's comically irrelevant. There are certainly plenty of risks around technology, and even around AI. But all Bostrom has done is suggest We Should Be Very Worried because It Might Go Horribly Wrong.

Also, paperclips.

This isn't very interesting as a thoughtful assessment of the future of AI - although I suppose if you're peddling a medieval world view, you may as well include a few visions of the apocalypse.

I think it's fascinating on a meta-level as an example of the kinds of stories people tell themselves about technology. Arguably - and unintentionally - it says a lot more about how we feel about technology today than about what's going to happen fifty or a hundred years from now.

The giveaway is the framing. In Bostrom's world you have a mad machine blindly consuming everything and everyone for trivial ends.

That certainly sounds like something familiar - but it's not AI.


> Bostrom is a philosopher

That is my main concern about people writing about the future in general. You start with a definition of a "Super Intelligent Agent" and draw conclusions based on that definition. No consideration is (or can be) placed on what limitations AI will have in reality. All they consider is that it must be effectively omnipotent, omnipresent and omniscient, or it wouldn't be a super intelligence, and thus not fall into the topic of discussion.

The main limitation right now is (and imo will continue to be) that you need a ton of training examples generated by some preexisting intelligence.


I agree with a lesser form of your first argument (the timescale on which the AI self-improves is probably going to be long enough for humans to deal with it), but I'm confused at how an AI given "merely" lots of compute capacity can be countered by "a number of ideas for this myself".

Security against human-level threats is already very poor, and there we already have a relatively good threat model. If you suppose that an AI could have radically different incentives and attack vectors than a human, it seems implausible that you could be secure, even in practice. I suppose you could say that these protections would be implemented in time, but it's not at all clear to me that a humanity which has trouble coordinating to stop global warming or on nuclear disarmament would recognize this in time.

On the other hand, I'm slightly puzzled by why you think there's a huge unjustified leap between lack of value alignment and threat to the human race. Does most of your objection lie in 1) the lack of threat any given superintelligent AI would pose, because they're not going to be that much smarter than humans or 2) the lack of worry that they'll do anything too harmful to humans, because they'll do something relatively harmless, like trade with us or go into outer space?

For 1, I buy that it'd be a lot smarter than humans, because even if it initially starts out as humanlike, it can copy itself and be productive for longer periods of time (imagine what you could do if you didn't have to sleep, or could eat more to avoid sleeping). And we know that any "superintelligent" machine can be at least as efficient as the smartest humans alive. I would still not want to be in a war against a nation full of von Neumanns on weapons development, Muhammads on foreign policy and Napoleons on Military strategy.

For 2... I would need to hear specifics on how their morality would be close enough to ours to be harmless. But judging by your posts cross thread this doesn't seem to be the main point.

By the way, I must thank you for your measured tone and willingness to engage on this issue. You seem to be familiar with some of the arguments, and perhaps just give different weights to them than I do. I've seen many gut level dismissals and I'm very happy to see that you're laying out your reasoning process.


Have you ever read The Checklist Manifesto[0]? I may be reading too much into this post, but the lessons you learned from this interview process have frighteningly close parallels to the lessons in the book. I doubt the book had any influence on your interview process, seeing as it was published after the interviews were formalized, but it seems like the book might still offer some new lessons.

For example, a good portion of doctors absolutely hated using checklists. Yet, when pressed, readily admitted that it prevents simple mistakes and that they would prefer to have them rather than not to. Another is that entries that address more human concerns, e.g. "Have everyone introduce themselves", have a place on good checklists.

[0] http://www.amazon.com/Checklist-Manifesto-How-Things-Right/d...


Extremely true. This is about systematizing the right things about interviewing, and making them solid. Checklist Manifesto is all about that, a great simple way to create systems that work.

At a high level, this is all about Deming ( http://en.wikipedia.org/wiki/W._Edwards_Deming ) and TQM concepts -- if you want to achieve a high-quality output, measure the things that matter, and understand the variation present in the system. Once you have a stable system with good data achieved by good methods, you may then begin improving it. Attempting to improve a complex system without knowledge results in unpredictable changes—we call that tampering. Simple but beautiful.
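
As a toy illustration of the "understand the variation" step (the interviewer scores below are invented, not anything from the post): estimate rough control limits first, and only treat points outside them as signals worth reacting to individually.

    # Minimal SPC-style sketch with made-up interview scores on a 1-5 scale.
    from statistics import mean, stdev

    scores = [3.2, 3.8, 3.5, 3.9, 3.4, 3.6, 3.1, 3.7, 3.5, 3.6]

    m, s = mean(scores), stdev(scores)
    lower, upper = m - 3 * s, m + 3 * s   # rough control limits

    for x in scores:
        kind = "special cause?" if (x < lower or x > upper) else "common cause"
        print(x, kind)
    # Reacting to individual points inside the limits is the "tampering"
    # mentioned above; it's the system, not the point, that needs changing.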

So, in essence, this is an extremely natural and correct application of quality management principles to the hiring process. Stellar.



You seem to be implying that intentional slowing is MIRI's official stance, without showing any support for that. Given that the original comment was responding to the specific accusation that "no AI experts say that this is a problem", I think you are reading too much into this. I agree that it is slightly disingenuous for lukeprog to have posted that list, but I feel your disagreement is far too uncharitable and motivated to be productive.

Disclaimer: I have read a bunch of the LessWrong "canon" and believe many of their points, sans perhaps the timescale on which recursive self improvement can happen. I think most of my acceptance comes from the relatively poor quality of their critics, who seem to attribute many strawmanish positions to them or who seem more concerned with namecalling-via-cult.

I cannot help but think that if a better criticism exists, why hasn't anyone said it yet?


> I cannot help but think that if a better criticism exists, why hasn't anyone said it yet?

The whole lesswrong/MIRI thing is built around the strawman of unfriendly AI, as though it were already real. What you are asking is, why aren't there better arguments against strawmen and radical pontification?

It's like if I said, "There is a chance that mean aliens will come soon, therefore we need to start building defense systems." Ok, show me any proof that there are aliens coming or even an avenue for aliens to come here and be unfriendly.

Sure, it's a possibility that there are mean aliens who are going to attack us, but there is absolutely nothing to think that is a thing that is going to happen soon.

Granted, this is not a perfect analogy but I think it makes my point well. I am uncharitable because it's charlatanism and seems to be gaining traction - in the same vein as antivaxxers.


Even antivaxxers have people more patient and more understanding of their opponents actually debunking their object level thinking. They say that vaccines cause autism? We say that the original study was wrong! Why do you get to be even less charitable than "pro"-vaxxers?

Consider that the general position seems to be "we don't know when SMI would happen, but if we use the metric that our critics say is more reliable than ours, surveys of as many AI experts as we can find seem to say that human-level AI is possible in about 30-50 years at 50% probability." (Numbers are quoted from memory and most likely somewhat off.)

Yet here you are saying that MIRI/LW claims "it's a thing that's going to happen soon", that there is "absolutely nothing to think that is a thing". I can't help but think most of the "charlatanism" you see is manufactured in your own head and not a product of reading and understanding your opponent's position.

Yes, this is an unabashed ad hominem, but I wish people would attack arguments that exist rather than arguments that are easy to knock down. It's upsetting to me that, when I say that your arguments are lacking, your response is "So what? The Enemy is Evil and Stupid and I do not need to understand them."


I think you're building a bit of a strawman here yourself. MIRI's main argument is pretty simple: building a beneficial AGI is a difficult problem with more constraints than just building an AGI, so we should start working on that problem now, rather than waiting until AGI is almost here.


This seems as if it is conflating the worry that nuclear bombs would ignite the atmosphere with supersonic travel.

Who predicted this?


I'm afraid of making a middlebrow dismissal, but I'm going to post it anyway, in the hope that someone just skimming won't be misled.

The question is what Michael Jordan thinks of the "concept of the singularity", and then he dismisses it out of hand.

Crucially, he does this after confessing that no one in his social circle has talked about this issue with him, and without saying anything about what form of Singularity he is dismissing.

I mention this, because oftentimes I see people appealing to authority, quoting them on the issue and the authority in question is not even talking about the same issue!

I worry that my credence in all this superintelligence stuff only stems from familiarity with the arguments and the complete inability of people to engage with the actual argument. Some of the 'rebuttals' in this comments section have answers in Sam's article for crying out loud!


Er, sorry for giving the impression that it'd be a supervillain. My intention was to indicate that it'd be a weird intelligence, and that by default weird intelligences don't do what humans want. There are some other examples I could have given to clarify (e.g. telling it to "make everyone happy" could just result in it giving everyone heroin forever; telling it to preserve people's smiles could result in it fixing everyone's face into a paralyzed smile). The reason it does those things isn't because it's evil, but because it's the quickest and simplest way of doing it; it doesn't have the full values that a human has.

But for the "off" switch question specifically, a superintelligence could also have "persuasion" and "salesmanship" as an ability. It could start saying things like "wait no, that's actually Russia that's creating that massive botnet, you should do something about them", or "you know that cancer cure you've been looking for for your child? I may be a cat picture AI but if I had access to the internet I would be able to find a solution in a month instead of a year and save her".

At least from my naive perspective, once it has access to the internet it gains the ability to become highly decentralized, in which case the "off" switch becomes much more difficult to hit.


So like it's clear to me why you wouldn't want to take a system based on AI-like technology and have it control air traffic or missile response.

But it doesn't take a deep appreciation for the dangers of artificial intelligence to see that. You can just understand the concept of a software bug to know why you want humans in the observe/decide/act loop of critical systems.

So there must be more to it than that, right? It can't just be "be careful about AI, you don't want it controlling all the airplanes at once".


The "more to it" is "if the AI is much faster at thinking than humans, then even humans in the observe/decide/act are not secure". AI systems having bugs also imply that protections placed on AI systems would also have bugs.

The fear is that maybe there's no such thing as a "superintelligence proof" system, when the human component is no longer secure.

Note that I don't completely buy into the threat of superintelligence either, but on a different issue. I do believe that it is a problem worthy of consideration, but I think recursive self-improvement is more likely to be on manageable time scales, or at least on time scales slow enough that we can begin substantially ramping up worries about it before it's likely.

Edit: Ah! I see your point about circularity now.

Most of the vectors of attack I've been naming are the more obvious ones. But the fear is that, for a superintelligent being, perhaps anything is a vector. Perhaps it can manufacture nanobots independent of a biolab (do we somehow have universal surveillance of every possible place that has proteins?), perhaps it uses mundane household tools to MacGyver up a robot army (do we ban all household tools?). Yes, in some sense it's an argument from ignorance, but I find it implausible that every attack vector has been covered.

Also, there are two separate points I want to make, first of all, there's going to be a difference between 'secure enough to defend against human attacks' and 'secure enough to defend against superintelligent attacks'. You are right in that the former is important, but it's not so clear to me that the latter is achievable, or that it wouldn't be cheaper to investigate AI safety rather than upgrade everything from human secure to super AI secure.


First: what do you mean 'upgrade everything from human secure'? I think if we've learnt anything recently it's that basically nothing is currently even human secure, let alone superintelligent AI secure.

Second: most doomsday scenarios around superintelligent AI are, I suspect, promulgated by software guys (or philosophers, who are more mindware guys). It assumes the hardware layer is easy for the AI to interface with. Manufacturing nanites, bioengineering pathogens, or whatever other WMD you want to imagine the AI deciding to create, would require raw materials, capital infrastructure, energy. These are not things software can just magic up, they have to come from somewhere. They are constrained by the laws of physics. It's not like half an hour after you create superintelligent AI, suddenly you're up to your neck in gray goo.

Third: any superintelligent AI, the moment it begins to reflect upon itself and attempt to investigate how it itself works, is going to cause itself to buffer overrun or smash its own stack and crash. This is the main reason why we should continue to build critical software using memory unsafe languages like C.


By 'upgrade everything from human secure' I meant that some targets aren't necessarily appealing to human attackers but would be to an AI. For example, for the vast majority of people, it's not worthwhile to hack medical devices or refrigerators; there's just no money or advantage in it. But for an AI that is throttled by computational speed or wishes people harm, they would be appealing targets. There just isn't any incentive for those things to be secured at all unless everyone takes this threat seriously.

I don't understand how you arrived at point 3. Are you claiming that somehow memory safety is impossible, even for human level actors? Or that the AI somehow can't reason about memory safety? Or that it's impossible to have self reflection in C? All of these seem like supremely uncharitable interpretations. Help me out here.

Even ignoring that, there's nothing preventing the AI from creating another AI with the same/similar goals and abdicating to its decisions.


My point 3 was, somewhat snarkily, that AI will be built by humans on a foundation of crappy software, riddled with bugs, and that therefore it would very likely wind up crashing itself.

I am not a techno-optimist.


Didn't you see Transcendence? The AI is going to invent all sorts of zero days and exploit those critical systems to wrest control from the humans. And then come the nanites.


What if the AI was integral to the design and manufacturing processes of all the airplanes, which is a much more likely path?

Then you can see how it gains 'control', in the senses that control matters anyway, without us necessarily even realizing it, or objecting if we do.

