Why blocks make Ruby methods slower (omniref.com)
205 points by chippy on Feb 27, 2015 | hide | past | favorite | 48 comments


Wow, that is super interesting!

I think they buried the lede though. (Edit: not complaining! I like tech taught via a discovery narrative like they use here.) Given this:

    def a(&block)
      yield
    end

    def b
      yield
    end
Running `a {1+1}` is 4x slower than running `b {1+1}`. I didn't even know you could yield in a method without an explicit &block parameter. I guess I'll change my ways then, although I like having the block declared in the method signature so I know at a glance I can pass one. Too bad!

So I understand from the article that the slowness is because of extra memory allocation etc., but I still don't understand why. I mean, if both methods behave the same, what is the difference? Is it because declaring &block means my code in the method body can now "do stuff" with `block`, whereas in the implicit case I don't have a reference so I can't do any funny business but just yield? That would make sense.
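A quick way to see the gap for yourself is a stdlib Benchmark run over both forms (a sketch; method names are made up, and the exact ratio will vary by machine and ruby version):

```ruby
require "benchmark"

# Explicit block parameter: forces reification of a Proc on each call
def with_explicit(&block)
  yield
end

# Implicit block: yield goes through the optimized blockless path
def with_implicit
  yield
end

n = 1_000_000
Benchmark.bm(15) do |bm|
  bm.report("explicit &block") { n.times { with_explicit { 1 + 1 } } }
  bm.report("implicit yield")  { n.times { with_implicit { 1 + 1 } } }
end
```

Both methods behave identically from the caller's point of view; only the timings differ.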


> Is it because declaring &block means my code in the method body can now "do stuff" with `block`, whereas in the implicit case I don't have a reference so I can't do any funny business but just yield? That would make sense.

That's exactly it. With `&block`, you get a reified Proc on which you can do all kinds of stuff[0] and which you can move around. With a `yield` and an implicit block, you can't do any of that, you can just call the block. So the interpreter can bypass all the Proc reification as it knows the developer won't be able to use it; it can keep just a C-level structure instead of having to set up a Ruby-level object.
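Concretely (a sketch with made-up method names), the reified form exposes Proc's introspection API, while the implicit form can only invoke the block:

```ruby
# &block reifies a Proc: you can inspect it, store it, pass it along
def reified(&block)
  [block.arity, block.lambda?, block.call(2, 3)]
end

# Implicit block: no object to inspect or store, yield is all you get
def implicit_only
  yield 2, 3
end

reified { |a, b| a + b }       # => [2, false, 5]
implicit_only { |a, b| a * b } # => 6
```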

There's a not-completely-dissimilar situation with `arguments` in javascript: if you're using it, the runtime has to leave a number of optimisations off because you can do much, much more with it than with just formal arguments.

And keep in mind MRI/YARV is a relatively straightforward bytecode interpreter. It'd be interesting to see how Rubinius or JRuby fare: they should be able to compile away the reification if it's not necessary (though there's a trap: they may also have plain cheaper procs which they don't need to optimise away).

[0] http://ruby-doc.org//core-2.2.0/Proc.html has plenty of fun methods, which you can only access via a reified proc


You can create that Proc from the implicit block on demand by calling `Proc.new` without a block.


At which point you eat the reification cost.

It's real easy to test too: just bench

  def a flag
    Proc.new if flag
    nil
  end
with the flag set to `true` or `false`. Note that the proc is never actually called or returned to a possible caller; the only difference is that one reifies the proc and the other does not.

You'll get roughly the same 5x difference (with `flag=false` being the fastest) as TFA notes.


Yes. But only when you actually need it (for example to store/pass it).


Note that this was actually the original way to do so. The &block argument format was added later.


Even though it's slower, I avoid the latter form as a matter of style. I think it's super important that if a method yields, it's shown in the method signature. Otherwise, your code won't be self-documenting, and it's going to cause weird and hard-to-debug bugs later on.


> Even though it's slower, I avoid the latter form as a matter of style. I think it's super important that if a method yields, it's shown in the method signature.

Using &block in the signature does not indicate that the method yields, it indicates that the method reifies a Proc. To me, it's a sign that the method does something which makes it likely that the passed block will be called outside of the usual lifetime of the called method.

Now, it may be an issue that Ruby has no self-documenting signature element which indicates that a method expects a block for the purpose of yielding without reification, but overloading the signature element which is for reification doesn't really address that (and, destroys one element of self-documentation in order to create another one.)


So, that the function reifies the block is not part of its interface, though; it's part of its implementation. If anything it's a problem that this implementation detail leaks out through the interface.

Furthermore, if you pass it a block with & on the call-side, it's not really reifying a block anyways (even if there is still a performance penalty, which I've never checked), so it's not even true that's what it means in general.

The truth is all functions take a block in ruby. Only some of them do anything with it. Given that, indicating that you take a block with the &block argument seems entirely reasonable to me, if you don't mind the performance hit.


I strongly agree.

Code is read from top to bottom, and finding out that a method uses a block only after reading through most of it is confusing, whereas `&block` in the method signature clarifies up front that this method will do something with a block.

The performance hit, as bad as it may be, is not something that should be a concern in many programs. After all, you are using Ruby...


Make it faster by replacing the block parameter with a comment:

    - def a(&block)
    + def a #&block
This works when a method uses `yield` and doesn't do more with the block parameter.

Ideally Ruby would have a way to write a method signature to indicate a block is expected and/or a yield happens, all without triggering the transformation of the block into a first class object.
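A sketch of the comment trick in full (hypothetical method name): the comment documents the block in the signature, while `yield` and `block_given?` keep the fast path:

```ruby
# The trailing comment documents the block without reifying it
def each_doubled(items) #&block
  return items.map { |i| i * 2 } unless block_given?
  items.each { |i| yield i * 2 }
end

each_doubled([1, 2])                 # => [2, 4]
each_doubled([1, 2]) { |x| puts x }  # yields 2, then 4
```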


yielding in a method without an explicit block parameter is actually my favourite design decision in all of ruby. it basically means every method gets a "free" optional block passable to it at call time, which in turn means that the language highly encourages (syntactically) code designs where a method implements some pattern and then yields to the calling code to fill in the details. compare ruby's "File#open" with common lisp's "with-open-file" - it was just the natural thing to do in ruby, versus having to implement something specifically for the open-file case in lisp.
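The open/yield/close pattern described here can be sketched like so (names are made up; the resource is a stand-in for anything needing cleanup, in the spirit of File.open's block form):

```ruby
# Acquire, yield to the caller's block, clean up in ensure
def with_resource(name)
  resource = "#{name} (open)"   # stand-in for acquiring a real resource
  yield resource
ensure
  # cleanup runs here even if the block raises, like File.open closing the file
end

with_resource("db") { |r| r.upcase }  # => "DB (OPEN)"
```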


I'm a guy who hasn't written that much ruby; I'm mostly writing python these days, but I understand a lot of the concepts, have written a bit of ruby, and wish that some python things could be more like ruby.

Also I'm a guy who took a Smalltalk class taught by a passionate teacher back when I was getting my degree.

Smalltalk is one of the main parents of ruby. It's less practical than ruby in many ways but it's more pure/simpler.

To me ruby's blocks/yield is one of the ugliest parts of ruby. In Smalltalk just about everything is objects and messages. In Smalltalk, blocks are just special syntax for BlockClosure objects that you evaluate by sending them the 'value' message. They don't have to be passed at the end, and you can pass multiple blocks (if you want ideas of why you might want to do this, see some of the c# linq overloads: https://msdn.microsoft.com/en-us/library/system.collections.... or some thread/future-like thing with a run action and an exception callback), although I admit many of them can be done with an extra map function.

Ruby blocks aren't objects, yield is not a method, and it's harder to assign/store closures to variables and pass them around.
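Since only one implicit block can be attached per call, passing several behaviors at once (like the LINQ overloads mentioned above) means reaching for reified lambdas as ordinary arguments. A sketch with made-up names:

```ruby
# Two lambdas handle the outcome; the implicit block does the work
def attempt(value, on_success:, on_error:)
  result = yield value
  on_success.call(result)
rescue => e
  on_error.call(e)
end

attempt(5,
  on_success: ->(r) { "ok: #{r}" },
  on_error:   ->(e) { "failed: #{e.message}" }) { |v| v * 2 }
# => "ok: 10"
```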

I may be wrong but I was under the impression that blocks are the way they are because they are a speed hack, the original ruby being a simple interpreter and blocks allowed the common pattern of defining and passing a closure to the method calling the closure without overhead of allocating/initializing/gc a block closure object.

Another objection: ruby isn't a language, a la javascript, where the number of arguments in the function declaration bears no resemblance to the number of arguments you can pass, so I don't see why blocks should be special. I don't want to have to glance at the body of a function to see whether it expects a block or not.

Also if you want to overload on block_given? you can use a named/default argument at the end defaulting to nil.
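That suggestion might look like this (a sketch; names are hypothetical) — a trailing default argument replaces the block_given? branch:

```ruby
# A callback keyword argument defaulting to nil instead of an implicit block
def each_item(items, callback: nil)
  return items unless callback
  items.each { |i| callback.call(i) }
  items
end

each_item([1, 2, 3])                        # => [1, 2, 3]
each_item([1, 2, 3], callback: ->(i) { i }) # invokes the lambda per item
```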


> it was just the natural thing to do in ruby, versus having to implement something specifically for the open-file case in lisp.

Uh what?

1. File#open is a special implementation, the canonical opening of a file is File#new

2. with-open-file is just a wrapper macro around unwind-protect/open/close


i meant that it's a natural thing to do in ruby to implement patterns like open/yield/close or decorate/sort/undecorate as methods that yield to a block. with-open-file was probably a bad example, but from experience it does change the feel of the language when you have to explicitly specify a closure argument versus when you just expect that your method should allow for a block to be attached to the method call. i'm not saying that these things aren't possible in other languages, just that ruby's "free" block makes them a culturally default thing to do.


> I mean, if both methods behave the same, what is the difference?

Not much. They just optimized the yield case so it basically skips the red tape.

"As it turns out, MRI has optimized this common case, providing an entirely separate pathway, from parsing through execution, for blockless yields (it's actually a primitive in YARV, the meta-language to which Ruby is compiled at execution time). Because of this special pathway, calls to yield end up being handled by a dedicated C function in the interpreter (vm_invoke_block in vm_insnhelper.c, if you're interested), and are much faster than their explicitly blocked brethren"


Downward funargs are cheap. First class functions are more expensive. Naming a value permits first-class operations; unnamed values are somewhat harder to manipulate.

But in this specific case, the code could go through data flow analysis and the value passed as the block could be determined never to escape the function, so that callers could save creating a proc. But late-bound callers may still pass an explicit proc, so the body of the function would need to work with both forms. It's not a trivial optimization.


If I'm reading this right, I think it is because they implemented it twice and once they did it in a less optimized way. Hopefully this will lead to a speedup.


Perhaps it's more correct to say that one syntax permits an optimized path because the work of creating a full ruby-level Proc object can be omitted. So you'd only be able to optimize the explicit block parameter case further if MRI were able to determine additional cases where this optimization could safely be applied.

As another poster points out, JRuby seems to have a much lower relative overhead (~1.13x, IIRC), so there may or may not be optimization concepts there that can eventually be applied to MRI.


This isn't just a syntactic difference. A Proc can outlive the method that defined it, so the interpreter must lift local variables from the stack to the heap. A block, however, can never outlive the method that defined it, so local variables can remain on the stack. In theory, the interpreter could perform an escape analysis and implicitly convert `block.call` to `yield` if no `block` reference survives local scope. Presumably MRI 2.3 does this analysis.

Python has a similar performance impact for closures. Defining a lambda forces local variables onto the heap, even if the lambda is never actually created at runtime, e.g. when it's defined inside a rarely true condition.
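The escaping case can be illustrated with a small sketch (names made up): the Proc outlives the method that defined it, so its captured local can't stay on the stack.

```ruby
def make_counter
  count = 0
  # The Proc escapes make_counter, so `count` must be lifted to the heap
  proc { count += 1 }
end

counter = make_counter
counter.call  # => 1
counter.call  # => 2  (state survives between calls, long after make_counter returned)
```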


It is highly unlikely that this will make a non-trivial performance difference in any real code. Please don't go micro-optimizing your code for this, it's unlikely to be worth any loss in readability or maintainability. And it's platform dependent, may behave entirely differently in JRuby, or in future versions of MRI.


I love those kinds of clear explanations! It's probably not life changing, but at least you get the thorough explanation, not just a statement telling you that x is faster than y. Props to Omniref and the reddit guys!


This title really should read "493% slower" rather than just "slower."


It was submitted as such, but must have been altered by a moderator.


Thank you, this was great! Both my work and side projects all run on Ruby (except the ones that run on js), and this has been extremely helpful. I actually wrote def(whatever, &block) ... all the time because I was told that it would be clearer that a block was needed, but if just dropping the &block means I can get a 4x speed upgrade... well, time for a bit of refactoring in various critical loop sections.


> if just dropping the &block means I can get a 4x speed upgrade

Well, up to 4x. If you're doing anything interesting or complex, or, god forbid, making a database call, then this might turn into an instance of that famous anecdote where Bill Gates wouldn't even bend over to pick up a thousand dollars laying on the ground.


This can't be emphasized enough. Do almost anything worth doing in that function or in the block itself, and the 400% improvement on the 1% of that function's execution time becomes almost certainly meaningless. These kinds of performance improvements have very steep diminishing returns.


I like posts like this from OmniRef exploring the RubyVM internals.

Is there any way to search through all of these types of source code annotations across the site?

I wanted to learn about method dispatch in Ruby; there was a post earlier on HN about this using OmniRef, but I don't know how to retrieve it from the site directly.


We've made a page with all of the tutorials we've done so far: https://www.omniref.com/tutorials

I think this is the method dispatch one you're wanting:

https://www.omniref.com/ruby/2.2.0/files/method.h?#annotatio...


This doesn't really answer your question, but you might be interested in this book:

http://patshaughnessy.net/ruby-under-a-microscope

http://www.amazon.com/Ruby-Under-Microscope-Illustrated-Inte...


Is this 400-500% true for more expensive operations inside of the block? It seems like he's just comparing the cost of procs to the cost of 1+1. I don't think the generalization has been established here.


You're right, he's comparing the cost for that very simple case. Do anything remotely complex and it quickly becomes just noise.


As with all things, it depends on context. If you're trying to write a graphics or audio system in Ruby, this kind of thing can really matter. If you're writing a Rails app, it's rounding error on the time waiting for I/O.

Profiling is essential when figuring out how to improve your code. What we're doing here is explaining one weird benchmark result.


Except that's not what's being said:

"Why blocks make Ruby methods 439% slower" implies something that isn't happening here, at least to me.


Despite me having very little interest in Ruby this was pretty interesting to read, I would love to see some similar walkthroughs (?) with the Python interpreter source code.


There was a post on a Reddit discussion of this that was quite interesting: http://www.reddit.com/r/ruby/comments/2x9axs/why_blocks_make...

Apparently, this is due to overhead from creating a Proc object. The performance is better with blocks in JRuby, and the same will apply to MRI 2.3 when that gets released.


So, I know very little about the practical implementation of a non-trivial interpreter (read: next to nothing), but... it's difficult for me to understand why this would be the case. What is the reasoning behind the interpreter's lack of semantic awareness? Can't it, in the most simple case, see that the block is not being modified or accessed in any way and optimize the reification out?


It seems like it's obvious that it should be able to remove the allocation of the block, however as with everything in Ruby there are edge cases.

What happens if you're halfway through executing this method, you've not allocated the block, and somebody wants to start a debugger? Then you have to go back and somehow recreate the Proc as if it was always there. Same thing if someone uses ObjectSpace to look for all live Procs.
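The ObjectSpace point can be seen directly (a sketch; variable names made up): every live Proc is enumerable, so an optimizer can't silently skip allocating one that might be observed later.

```ruby
# ObjectSpace can find every live Proc in MRI, including this one
keep = proc { :alive }
live_procs = ObjectSpace.each_object(Proc).to_a
live_procs.include?(keep)  # => true
```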


Huh. Is it standard in interpreted language land to avoid optimizations for the sake of the debugger use case? Seems like that would rule out a ton of potential optimizations...


> Is it standard in interpreted language land to avoid optimizations for the sake of the debugger use case?

Yes. Otherwise, you'd have to have separate "being debugged" and "not being debugged" runtimes that keep track of different information and do different things, and it's not unlikely that you'd end up with programs doing different things as a result. You can actually access debugger information from inside the program, and it's not that uncommon to do that.


It doesn't rule them out, you just need a more sophisticated implementation, using things like JIT compilation.


Interestingly, there's a comment there that indicates that the cost is much lower on JRuby. Be interesting to see if this is true, and what the cost is like on Rubinius? Possible low-hanging fruit for MRI to improve?


I wonder if it costs less in JRuby or if JRuby's version of yield is less optimized.


It's probably that jruby is just much more likely to inline it either way, at which point whether it was reified or not doesn't matter, unless you call something other than .call on the Proc object.


If someone cares about speed, why would that someone be using Ruby? It feels like optimizing a Ruby program is making "really fast slow code".


The performance profile of an application evolves over time. It can vary not just based on code changes you've made, but also based on how many users you have or how much data they have. This is why libraries should just strive to be fast rather than get hung up on hot paths because hot paths are invariably going to be different for each library user.

Taking that a step further, you've started in Ruby, it was fast enough, now you have a performance issue. Your options are: ignore it; make Ruby faster; use a faster idiom; somehow push this off to a C or Java extension; or rewrite in a new language. Rewrites have a spectacular failure rate. Ignoring it is bad for business. Ruby is already implemented in C and Java, so if you're going to go down the extension path to make a core language feature faster, you may as well just make Ruby faster -- it's virtually the same effort. And using faster idioms is just the lowest cost option.


Perhaps because it's a nice language?

Languages are not the same as implementations, and languages are not fast or slow.


Languages are not fast or slow, but languages do have traits that make them more or less well suited to fast or slow implementations. Ruby's semantics do not make it well suited for a fast implementation.

That said, the question just becomes "why would you use MRI if you care about performance?"

