In the late 1990s and early 200xs, there were a lot of claims that there was no ...

steveklabnik · on Dec 11, 2023

I think this is really complicated. I do believe that it is important to separate language from implementation, and it is also true that different implementations can have different performance profiles. It is also true that the semantics of the language can effectively require specific implementation details that can affect performance either way. So there's always gonna be bounds to any particular languages' ability to be faster.

That is why the "as fast as C with the sufficiently smart compiler" never truly came to pass in a general sense, even if many languages that were slow to start have gotten way faster with better implementations.

jerf · on Dec 12, 2023

It certainly is complicated.

And I'd love to peer in to some alternate universe that has an optimal Python interpreter (can't say "the" optimal because it's really a complicated frontier rather than a single point) that runs on our hardware and see what it looks like. Maybe even on multiple points on that frontier.

What really is the limit? I struggle to imagine what a C-speed Python interpreter could even look like, but is there some conceivable program that runs it at, say, half the speed? What even is the limit? What techniques would such a program use that would surprise us and be new and perhaps useful other places?

Or are actually pretty close to that frontier now?

To be honest, given the way optimizations tend to work, the answer probably is that we are relatively close today. They tend to have diminishing returns.

But I don't know. Is there some execution model nobody's thought of, or that has been thought of but simply hasn't had the effort invested to make it pay off, that would make huge gains? I can't prove there isn't. (In fact it's not far off junior-level computer science to prove that you can't prove it.)

steveklabnik · on Dec 12, 2023

Yep, that'd be a lot of fun. I think back to the stuff that used to go on to optimize lua: http://lua-users.org/lists/lua-l/2011-02/msg00742.html

I also suspect that we are relatively close, and there are diminishing returns. I suspect that you would have to start the language design with this goal in mind, and then balance out performance concerns with certain features.

lifthrasiir · on Dec 12, 2023

> In the late 1990s and early 200xs, there were a lot of claims that there was no such thing as a "slow language", that all languages can be run as quickly as C if you just built a sufficiently smart compiler and/or runtime.

I think "slow language" never meant that way. It was more like a counterpoint to the claim that there are inherent classes of languages in terms of performance, so that some language is (say) 100x or 1000x slower than others in any circumstances. This is not true, even for Python. Most languages with enough optimization works can be made performant enough that it's no slower than 10x C. But once you've got to that point, it can take disproportionally more works to optimize further depending on specific designs.

> If I were designing a language to be slow, but not like stupidly slow just to qualify as an esolang, but where the slowness still contributed to things I could call "features" with a straight face, it would be hard to beat Python.

Ironically, Python's relative slowness came from its uniform design which is generally a good thing. This is distinct from a TCL-style "everything is a string" you've said, because the uniform design had a good intent by its own.

If you have used Python long enough you may know that Python originally had two types of classes---there was a transition period where you had to write `class Foo(object):` to get the newer version. Python wanted to remove a blur between builtin objects and user objects and eventually did so. But no one at that time knew that the blur is actually a good thing for optimization. Python tried to be a good language and is suffering today as a result.

jerf · on Dec 12, 2023

"If you have used Python long enough you may know that Python originally had two types of classes"

Yes, because that's also the era where this claim was flying around.

I'd say that each individual person may have their own read on what the claim meant, but certainly the way it was deployed at anyone who vaguely complained that Python was kind of slow shows that plenty of people in practice read it as I've described... that if we just wait long enough and put enough work into it, there would be no performance difference between Python and C. In 2023 we can look back with the perspective that this seems to be obviously false, so they couldn't possible have meant that, but they didn't have that perspective, and so yes they can have meant that. "Sufficiently smart compiler" was just starting to be a derogatory term. I also remember and lightly participated in the c2.com discussions on that, which may also contribute to my point that, yes, there definitely were people who truly believed "sufficiently smart compilers" could exist and were just a matter of time.

As for proportions, it's impossible to tell. Internet discussions (yea verily including this very one) in general are difficult to ascertain that from because almost by definition only outliers are participating in the discussion at all. Obviously by bulk of programmers, most programmers had simply never considered the question at all.

lifthrasiir · on Dec 13, 2023

Yeah, I agree everyone may have different anecdotes, for my case though I heard more of the "Python as glue code" arguments and never heard that Python proper can be as fast as C. I've used Python since 2.3, so maybe that argument was more prevalent before?

ngcc_hk · on Dec 14, 2023

Same here. Never heard of python can be fast. It is not what it is originally for.

igouy · on Dec 12, 2023

> Most languages with enough optimization works can be made performant enough that it's no slower than 10x C.

Maybe if we take the slowest C program, CPython no slower than 50x ?

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

lifthrasiir · on Dec 13, 2023

CPython had many other concerns besides from performance, and PyPy is much closer to my quote I believe.

igouy · on Dec 13, 2023

fwiw "How many lines of C it takes to execute a + b in Python" was explicitly about CPython.

fwiw PyPy doesn't seem to have been released in "the late 1990s".

slaymaker1907 · on Dec 11, 2023

You could add in continuation support as well as unbounded stack size to make it even more difficult to implement efficiently. There are tricks these days for implementing continuations somewhat efficiently (and by somewhat, I mean that they're only 3-5x slower than explicit continuation passing with lambdas), but these tricks largely don't work in WASM without doing something extreme like completely ignoring the WASM stack and storing return addresses/current continuation on the WASM heap.

Unbounded stack size is similarly difficult for WASM because like before, you have to be very careful about using the WASM stack.

Even with C++, you basically need to drop down to intrinsics or assembly to make full use of SIMD.

andai · on Dec 11, 2023

Can someone explain what exactly it is about Python's design that makes it slow?

What changes would have to be made to speed it up? Obviously changing its core design now would break things, but my question is, can we can imagine an alternate universe Python that's as close as possible to our Python, except really fast? What would be different?

AlotOfReading · on Dec 11, 2023

It's the whole design. Every object creates allocation/gc overhead, bytecode dispatch is a major bottleneck, attribute lookup is expensive, objects are expensive, namespaces are expensive, etc.

You can change things internally (e.g. optimizing opcode parsing), fixed object layouts, restricting mutability, converting everything to predictable array accesses, but you'll likely just end up with something like Lua or Wren rather than Python, and people like Python specifically because of the ecosystem that's built up around that dynamicism over the years.

whizzter · on Dec 13, 2023

Iirc the slowness of CPython in all the above mentioned are artifacts of the implementation rather than the language itself.

The huge issue is that a big selling point for Python was the easy C-api integration providing lots of useful functionality via libraries now works as a chain that limits how many changes can be made (see any GIL-removal discussion).

The most sane way forward would be to mandate a conversion to a future-proof C-api (PyPy has already designed an initial one iirc that's tested and also has CPython support) that packages would convert to over time.

CPython will probably never go away due to many private users of the old api, but beginning the work towards implementation independancy in the package ecosystem at large could allow _language compatible_ runtimes with V8/JSCore/LuaJIT-like performance for most new projects.

It all depends on the entire community though and that in turn depends on the goodwill of the CPython team to support this.

lifthrasiir · on Dec 14, 2023

I should clarify that LuaJIT is not universally used either, because LuaJIT doesn't support anything after Lua 5.1 among other reasons. People often claim that Lua is a relatively speedy language due to the existence of LuaJIT but it's not entirely correct.

whizzter · on Dec 16, 2023

It DIDN'T because Mike Pall(LuaJIT author) was on a hiatus from develoment. He's resumed work since and many 5.2+ features has been incorporated now even if the biggest language-turd (1: _ENV) is explicitly incompatible.

The problem is that _ENV explicitly exposes the lexical environment as regular objects prohibiting optimizations, even JavaScript has _removed_ a similar feature (2: the with statement) when running in "strict" mode to simplify optimizations.

LuaJIT _could_ implement the _ENV blocks but it'd seep into large parts of the codebase as ugly special cases that'd slow down all code in related contexts (thus possibly breaking much performance for code in seemingly unrelated places to where _ENV exists).

To compare from an implementation optimization perspective, exposing _ENV is actually __worse__ than what CPython has with the GIL for example.

Luckily "with"-statements in JS was seldomly used so implementers ignored it's existence as long as it's not used(but still have to consider it in implementations, thus adding more workload), but it's an wart that will kill many optimizations if used.

For most practical purposes most people are fine without "with" or _ENV and the languages are fast enough.

https://luajit.org/extensions.html

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

pjmlp · on Dec 12, 2023

Smalltalk, Common Lisp and SELF are just as dynamic if not more, with a JIT.

ngcc_hk · on Dec 14, 2023

That is a question in my mind throughout reading this discussion.

How about objective c vs swift from a dynamic at least type vs static one. Can swift be glue.

steveklabnik · on Dec 11, 2023

I am going to answer this question in a roundabout way: first showing examples of different code gen in Rust, because it is more straightforward, but then I will reach for an example with Ruby, because I know it better than Python, but I believe it is similar enough that you will get the gist.

If I write a function like this in Rust:

  pub fn add(x: i32, y: i32) -> i32 {
      x + y 
  }

this will compile to this assembly (on x86_64):

  add:
    leal (%rdi,%rsi), %eax
    retq

two instructions. This is because in Rust, free functions exist, have a name, and they are called by name. There's an additional twist here though too, let's check it out in debug mode, with optimizations off:

  add:
    subq $24, %rsp
    movl %edi, 16(%rsp)
    movl %esi, 20(%rsp)
    addl %esi, %edi
    movl %edi, 12(%rsp)
    seto %al
    testb $1, %al
    jne .LBB0_2
    movl 12(%rsp), %eax
    addq $24, %rsp
    retq

  .LBB0_2:
    leaq str.0(%rip), %rdi
    leaq .L__unnamed_1(%rip), %rdx
    movq core::panicking::panic@GOTPCREL(%rip), %rax
    movl $28, %esi
    callq *%rax
    ud2

There's a few things going on here, but the core of it is that in Rust, in debug mode, overflow of addition is checked, but in release mode, wrapping is okay, and so the compiler can eliminate the error path. This is an example of language semantics dictating particular implementation: if I require overflow checks, I am going to get more code, because I have to perform the check. If I do not require the checks, I get less code, because I do not perform the checks. (Where this gets more interesting is in larger examples where the checks get elided because the compiler can prove they aren't necessary, but this is already a tangent of a tangent.)

In Ruby, there are no free functions. If I write a similar add function:

  def add(x, y)
    x + y
  end

This function is not a free function: it is a new private method on the Object class. When you invoke a function in Ruby, it's not like Rust, where you simply find the function with the name you're invoking, and then call it. You instead perform "method lookup," which has some details I will elide, but for the purposes of this discussion, the idea is that you first look at the receiver to see if it has the add method defined, and then if it does not, you look at the receivers' parent class, and if it's not there, you keep going until you hit the top of the hierarchy. Once the method definition is found, you then invoke it.

Now, it's not as if Rust doesn't also have method lookup (though the algorithm is entirely different), but Rust's design means that method lookup (in the vast majority of cases) is a compile-time thing: the lookup happens while you're building the software, and then at runtime, it simply calls the function that you found.

So why can't Ruby run method lookup at compile time? Well, for one, I left out an important second step: Ruby provides a method called method_missing, as a metaprogramming tool. What this means is, if we look the whole way up the object hierarchy and do not find a method named add, we will then re-traverse the entire ancestor tree again, instead invoking each class's method_missing method on the way. method missing takes the name of the method that was trying to be called, the arguments to it, and any block passed to it, and you can then do stuff to figure out if you want to handle this. This means that, even if no add function is defined, it still may be possible for the call to succeed, thanks to a method_missing handler.

Okay well why can't we do that at compile time? Well, Ruby also lets you redefine functions at runtime at basically any time. The define_method method can be called and generate a method on anything, anywhere you want, for whatever reason. You could do this based on user input, even! And yes, that would be a terrible idea, and you probably shouldn't do it, but the implementation of the language requires at least some sort of runtime computation to pull this off in the general case.

Now, I also want to point out that in my understanding, there's caching on method lookup, so that can help reduce the cost in many scenarios. But the point stands that the language has features that Rust does not, and those features mean that certain things must be more expensive than languages that do not have those features.

> can we can imagine an alternate universe Python that's as close as possible to our Python, except really fast? What would be different?

We could, but you lose compatibility with most Python code, and so you're effectively creating a new language. People do try this though, Mojo being an example of this very recently. I am excited to see how it goes.

andai · on Dec 11, 2023

Thanks. That's fascinating about Ruby, I'll have to look into that.

I'm not an expert on Python but I don't see how Python is significantly more dynamic than e.g. JavaScript. I think PyPy and JS performance is comparable (or at least within the same order of magnitude), so I think it largely comes down to implementation, i.e. prioritizing performance.

I think if it had been Python (or Ruby for that matter) in the browser instead of JS, it would run about as fast as JS does today.

lmm · on Dec 11, 2023

> I'm not an expert on Python but I don't see how Python is significantly more dynamic than e.g. JavaScript.

JS has much less in the way of magic methods that can affect "normal" object behaviour, and it doesn't have metaclasses in the way that Python does at all. Most of this customization goes unused most of the time, but the runtime still has to handle it in case it's being used this time.

pjmlp · on Dec 12, 2023

You should have added Ruby JITs into the explanation, with the related generated Assembly code. :)

steveklabnik · on Dec 12, 2023

Yeah that would be fun but I don't have a Ruby environment installed and the comment is already getting very long; if I ever turn this into a blog post someday maybe I will add that :)

igouy · on Dec 12, 2023

You just happened to post a day after "Ruby 3.3.0-Rc1 Released" "many performance improvements especially YJIT"

https://news.ycombinator.com/item?id=38599293

esafak · on Dec 11, 2023

"Tech debt doesn't matter!"

"We're going to redesign it the right way after we get this version out the door!"

wongarsu · on Dec 11, 2023

Arguably those redesigns might have happened if the Python 2->3 transition hadn't received a decade of extremely vocal pushback

fullspectrumdev · on Dec 11, 2023

What’s funny is the breaking change that caused us the most headaches with 2->3 was the changes to fucking print.

I’m still finding broken print as a statement instead a function issues in codebases, somehow.

pwdisswordfishc · on Dec 12, 2023

It’s the easiest one to fix, though. 2to3 does it easily. If it were the only change, we would not be having this conversation. It might have even not deserved a major version bump (given that even minor releases of Python don't conserve backwards compatibility).