dparoski's comments

dparoski · on July 31, 2014

First, I want to thank you, Paul, for writing this critique. It's great to read well thought-out feedback about the draft spec that was announced yesterday.

Before getting into technical nitty gritty, I wanted to clarify that the spec in its current state is a draft offered to the PHP community as a starting point for specifying the PHP language. It now belongs in the php-src repository and can be updated through the standard commit processes for that repository as the community sees fit. Some decisions about specificity were made for the initial draft, but these decisions are by no means final and the hope is that the community will settle on what's right for the PHP ecosystem overall.

Regarding RAII, I'd argue that __destruct methods in PHP are a bit different from stack-allocated variables in C++. Stack-allocated variables in C++ have a strictly defined lifetime based on the scope of the variable. In PHP, on the other hand, objects are heap allocated, and when they are destroyed is not as well defined. For example, you can have a cycle of two or more objects pointing to each other that are unreachable (i.e. cyclical garbage), and in such cases any __destruct methods for these objects are not immediately invoked when the objects become unreachable. I considered requiring refcounting-based automatic memory management for the initial draft of the memory model, but describing cyclical garbage in detail felt really implementation specific, so at a gut level it seemed better to not require RC-based automatic memory management and see how people reacted.
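To make the cyclical-garbage case concrete, here is a minimal sketch in Python rather than PHP. It assumes CPython, which also pairs reference counting with a separate cycle collector, so the behavior shown is CPython's and not something the draft spec requires. The Resource class, its partner field, and run_demo are hypothetical names, and __del__ only stands in for __destruct.

```python
import gc

log = []  # names of finalized objects, in the order their finalizers ran


class Resource:
    """Hypothetical resource; __del__ plays the role of PHP's __destruct."""

    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        log.append(self.name)


def run_demo():
    gc.collect()   # clear out any unrelated garbage first
    log.clear()
    gc.disable()   # keep the cycle collector from running on its own
    try:
        # Case 1: no cycle. The refcount hits zero at `del`,
        # so the finalizer runs right away (CPython behavior).
        lone = Resource("lone")
        del lone
        after_lone = list(log)

        # Case 2: two objects pointing at each other.
        left = Resource("left")
        right = Resource("right")
        left.partner, right.partner = right, left
        del left, right
        # Both are unreachable now, but each still holds a reference
        # to the other, so neither finalizer has run yet.
        after_cycle = list(log)

        gc.collect()  # the cycle collector finds and finalizes them
        after_collect = sorted(log)  # sorted: order within the cycle is unspecified
    finally:
        gc.enable()
    return after_lone, after_cycle, after_collect
```

In this sketch the lone object is finalized at the point of `del`, while the two objects in the cycle are only finalized once the collector runs, and the order between them is not something to rely on.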

Based on my personal programming tastes, I'd argue that try/finally is a cleaner, more robust way to ensure certain cleanup happens when a scope is exited than relying on __destruct. However, I understand that there are some PHP programs out there that rely on __destruct being invoked eagerly in non-cyclical-garbage cases, and that such code will probably exist in the wild for quite a while regardless of whether try/finally is "superior" or not. I'm curious to see how this issue settles over time.

For the record, HHVM uses RC-based automatic memory management and will eagerly call __destruct on objects that become unreachable that are not part of cyclical garbage. The initial choice to be a bit more liberal about reclamation was not essential to make sure that HHVM was compliant with the spec.

wvenable · on July 31, 2014

RAII simply requires that resource allocation is tied to object lifetime. It requires that destructors are called deterministically and immediately once all references to an object no longer exist. (In the case of cycles, a reference remains until otherwise broken -- a non-issue for this definition).

I personally prefer RAII to finally for object-related cleanup because finally is fallible. If you have a File object instance whose destructor calls fclose() automatically then I don't have to remember to call a Close() method inside a finally block. It happens automatically. But if you don't have RAII then you must remember to put in try blocks and finally clauses everywhere you create an instance. Multiply that by every database connection, network connection, and file and that is a lot of work and potential to miss something. And finally doesn't work at all if your object lifetimes aren't tied to scope.

Finally is inferior to RAII in almost all cases except where you aren't using nicely defined objects. If you use an fopen() call directly, finally is your only recourse to fclose() it properly. The only criticism of RAII is that it does limit concurrency and garbage collection options.

dparoski · on Aug 1, 2014

Hmm, IMHO it feels like you're bending the definition of "lifetime" a bit in a manner that already presumes RC-based automatic memory management, and once one accepts that assumption then naturally one concludes that the lifetime ends when refcounting says it ends. To me, the lifetime of a heap allocated thing effectively ends when it is no longer reachable via any existing variables (though perhaps this definition is biased towards a tracing-GC view of the world).

I agree with your points about "finally". It can definitely be a bit clunky, and this is why some languages have introduced scoped cleanup constructs such as C#'s "using" statement, Python's "with" statement, and D's "scope" statement. I guess what I was getting at with my original comment is that I feel scoped cleanup constructs are a better way to go vs. relying on heap allocated things being reclaimed at a certain time, given that the lifetime of a heap allocated thing can depend on non-local state outside of the current function/method.
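As a rough illustration of the difference between a hand-written "finally" and a scoped cleanup construct, here is a minimal sketch using Python's "with" statement. Connection, connection(), with_finally, with_scope, and the events list are hypothetical names made up for this example; only contextlib.contextmanager is a real library call.

```python
from contextlib import contextmanager

events = []  # records what happened, in order


class Connection:
    """Hypothetical resource that must be closed explicitly."""

    def __init__(self):
        events.append("open")

    def close(self):
        events.append("close")


def with_finally():
    # The caller has to remember the try/finally at every use site.
    conn = Connection()
    try:
        raise RuntimeError("query failed")
    finally:
        conn.close()


@contextmanager
def connection():
    # The cleanup is written once, next to the resource itself.
    conn = Connection()
    try:
        yield conn
    finally:
        conn.close()


def with_scope():
    # Cleanup runs when the block exits, even though it exits by exception.
    with connection():
        raise RuntimeError("query failed")
```

Both functions close the connection before the exception propagates. The difference is where the cleanup code lives: at each call site in the first, and once in connection() in the second. Neither depends on when the heap object is actually reclaimed.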

wvenable · on Aug 1, 2014

I completely agree with your definition. All that RAII needs is that the destructor be called the instant that the object is no longer reachable. With reference counting, that is the case (the object is freed when the refcount reaches zero). But with many other forms of GC (including tracing) that isn't the case -- a process eventually cleans up unreachable objects and the order of that cleanup is indeterminate.

Even scoped cleanup constructs are problematic. I often forget to use them when needed in C# and it's often hard to tell if an object needs it. Furthermore, implementers of IDisposable in C# have to write a lot of boilerplate code[1] to do disposal correctly and cascade disposing to every contained object (again determining if it's necessary). It's also not change friendly -- a class might not need to be disposed today but if changed later all the existing instantiating code won't have guards.

I'm not sure why you think the lifetime of heap allocated objects under RAII is an issue -- they'll just clean themselves up when they're not needed. It's much less worry and code. What issue do you think exists?

As for scoped cleanup constructs, they are just a hack for languages that can't do RAII -- they have no other benefit.

[1] http://msdn.microsoft.com/en-us/library/b1yfkh5e%28v=vs.110%...

smsm42 · on Aug 1, 2014

Well, for PHP's fopen() specifically, fclose() would happen automatically as soon as the variable containing the result of fopen() goes out of scope. Unless you shared it with some other code, "properly" would happen automatically here.

dparoski · on May 5, 2014

To my knowledge a JIT compiler is not being added to PHP 5.7. Can you clarify your comment or provide a link to the source where you heard this?

captain_mars · on May 6, 2014

I'm not the person you replied to, but I got the same impression.

The linked email in turn links to the PHP NG page (https://wiki.php.net/phpng) on which the version number mentioned is 5.7.0-dev

dparoski · on March 20, 2014

Engineer working on Hack here.

Historically, XHP was developed before Hack. It's available as an add-on for PHP but never became a formal part of the PHP language. When Hack was being designed it needed to work well with XHP (as XHP is used a fair amount in Facebook's code base), so over time it made more sense to start thinking of XHP as being one of the language features that Hack offers that is not available in stock PHP.

dparoski · on March 20, 2014

Engineer working on Hack here.

Hack runs on the HHVM engine. HHVM is fairly stable and is used by Facebook to serve over a billion web requests per day. In the past year or so developers outside of Facebook have started using HHVM to run other PHP codebases, and at present 20 of the top PHP frameworks are able to run correctly on HHVM (hhvm.com/frameworks), with 9 of the frameworks' test suites fully passing.

If you encounter behavioral differences or bugs when running HHVM, you can report them at our GitHub site (hhvm.com/repo) and we will help get them resolved.

dparoski · on March 20, 2014

Engineer working on Hack here.

You can actually write your example a bit more concisely in Hack: "$y ==> $y + 1".

iso8859-1 · on March 21, 2014

Return statements are optional? So it's just the last statement in a function that'll get returned?

dparoski · on March 21, 2014

The "return" keyword is not needed if the right hand side of the lambda arrow ("==>") is an expression that is not wrapped in curly braces.

If the right hand side _is_ wrapped in curly braces then it is treated as a list of statements, in which case you do need to use the "return" keyword.

dparoski · on March 22, 2014

Edit: My previous comment was referring to the second occurrence of "return" in your example (after the "==>" arrow). Just noticed I mistakenly dropped the first occurrence of "return"; the first occurrence of "return" in your example is needed.

dparoski · on Nov 30, 2012

As noted elsewhere in this thread, a stack-based design typically produces more compact bytecode. Compactness was a concern for us because of the size of FB's PHP code base. Also, generally speaking a stack-based design tends to be easier to deal with when working to get a prototype VM up and running quickly.

Many of the advantages of register-based designs (ability to optimize by rewriting the program at the bytecode level, ability to write faster interpreters, ability to map bytecode registers to physical registers, etc.) weren't particularly attractive to us because we knew we were going to build an x64 JIT that did its own analysis and optimization to take advantage of type information observed at run time.

Thus, we drafted a stack-based design for HipHop bytecode. It captured PHP's semantics correctly and happened to fit in fairly well with PHP's evaluation order, so we ran with it and here we are.
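For readers unfamiliar with the distinction, here is a toy sketch in Python of the same expression compiled both ways. It is not HipHop bytecode: the instruction names, the tuple encoding, and every function below are hypothetical, and it makes no claim about real encoded sizes. It only shows that stack instructions leave their operands implicit and follow left-to-right evaluation order, while register instructions name every operand.

```python
# Expression tree for a + b * c: a variable name, or (op, left, right).
EXPR = ("+", "a", ("*", "b", "c"))


def compile_stack(node, out):
    """Emit stack code: operators take their operands from the stack."""
    if isinstance(node, str):
        out.append(("push", node))
    else:
        op, left, right = node
        compile_stack(left, out)   # left operand is evaluated first
        compile_stack(right, out)
        out.append((op,))          # no operand names needed
    return out


def compile_register(node, out, counter):
    """Emit three-address code: each instruction names dest and sources."""
    if isinstance(node, str):
        return node
    op, left, right = node
    lhs = compile_register(left, out, counter)
    rhs = compile_register(right, out, counter)
    counter[0] += 1
    dest = f"t{counter[0]}"
    out.append((op, dest, lhs, rhs))
    return dest


def apply_op(op, lhs, rhs):
    return lhs + rhs if op == "+" else lhs * rhs


def run_stack(code, env):
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(env[instr[1]])
        else:
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(apply_op(instr[0], lhs, rhs))
    return stack.pop()


def run_register(code, env):
    regs = dict(env)
    dest = None
    for op, dest, lhs, rhs in code:
        regs[dest] = apply_op(op, regs[lhs], regs[rhs])
    return regs[dest]
```

Compiling EXPR with compile_stack yields three pushes followed by two operand-free operator instructions, while compile_register yields two instructions that each spell out a destination and two sources.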

pbiggar · on Nov 30, 2012

Makes total sense, thanks!

Would using a register-based bytecode not have been useful for the x64 JIT?

kmavm · on Nov 30, 2012

Check out our HHIR work. It's an SSA-based IR that gets us most of the advantages of a register representation. But it is at a much lower level than the bytecode.