What's more, you can get a significant speedup in your Python scripts by replacing the built-in CPython malloc calls with a static "allocate a big chunk of memory at the beginning and manage it myself" implementation, falling back to malloc as needed if usage grows beyond that. A college class in performance engineering I TA'd did this as an assignment; the results even a beginner could achieve were compelling, and the top of the class produced quite remarkable results indeed.
This is most effective for reducing startup time of short lived scripts, where the runtime is dominated by many thousands of trivial mallocs right at startup. But in general if you can establish a bound on memory, it will be faster to allocate it in one shot.
I'd love to see an example of this if you happen to have one available. I hit a startup-time issue in a previous life, and I wish I had spent the time looking into it back then.
A template repo can be found here: https://github.com/JacksonKearl/cpython. It does not implement an ideal malloc, just a baseline one, and I am not sure if it is still being used as an assignment.
The repo states that even this dummy implementation:
> has a 60% faster startup as compared to base CPython, and in some test cases has marginally better runtime performance as well.