Speaking as someone who's hit the barriers of PHP performance for many years, this is a good start.
But I wish it had been done a little more nicely from the community point of view (the >4gb strings stuff).
I'm considerably less talented than Dmitry, and I spent several months of my life unsuccessfully trying to write a JIT for PHP; most of what stopped me was the rest of the Zend engine itself.
While I was maintaining PHP-APC, I spent many weeks trying to write a basic block JIT for PHP, for when Zend is using the CGOTO core (FYI, if you are still using APC, switch to Zend OpCache).
This would compile code that didn't contain any jumps into a native chunk and point the opcode's handler at that chunk instead.
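A toy Python model of the basic-block idea described above may help. Everything in it is hypothetical and is not APC or Zend code:

- the four-opcode VM (`ADD`, `MUL`, `JMPZ`, `RETURN`);
- the `Op` class;
- the `jit()` helper;
- the Python closure standing in for the native chunk.

The sketch finds jump-free runs of opcodes and fuses each run into one handler. It then swaps that handler into the run's first opcode, so the dispatch loop skips the rest of the run.

```python
# Toy VM: one accumulator, four opcodes. All names here are hypothetical.
JUMPS = {"JMPZ", "RETURN"}  # opcodes that end a basic block


def op_add(ops, pc, state):
    state["acc"] += ops[pc].arg
    return pc + 1


def op_mul(ops, pc, state):
    state["acc"] *= ops[pc].arg
    return pc + 1


def op_jmpz(ops, pc, state):
    # Jump to arg if the accumulator is zero, otherwise fall through.
    return ops[pc].arg if state["acc"] == 0 else pc + 1


def op_return(ops, pc, state):
    return None  # stop the dispatch loop


HANDLERS = {"ADD": op_add, "MUL": op_mul, "JMPZ": op_jmpz, "RETURN": op_return}


class Op:
    def __init__(self, name, arg=0):
        self.name, self.arg = name, arg
        self.handler = HANDLERS[name]  # the "handler location" the VM dispatches to


def run(ops):
    """The interpreter's dispatch loop: call each opcode's handler in turn."""
    state, pc = {"acc": 0}, 0
    while pc is not None:
        pc = ops[pc].handler(ops, pc, state)
    return state["acc"]


def find_straight_runs(ops):
    """Return (start, end) ranges of jump-free opcodes entered only at start."""
    leaders = {0}
    for i, op in enumerate(ops):
        if op.name == "JMPZ":
            leaders.add(op.arg)  # a jump target must begin a block
        if op.name in JUMPS:
            leaders.add(i + 1)  # so must the instruction after a jump
    runs, start = [], None
    for i, op in enumerate(ops):
        if start is not None and (i in leaders or op.name in JUMPS):
            runs.append((start, i))
            start = None
        if start is None and op.name not in JUMPS:
            start = i
    if start is not None:
        runs.append((start, len(ops)))
    return runs


def compile_run(ops, start, end):
    """Stand-in for emitting native code: one closure that executes the run."""
    steps = [(ops[i].name, ops[i].arg) for i in range(start, end)]

    def fused(_ops, _pc, state):
        acc = state["acc"]
        for name, arg in steps:
            acc = acc + arg if name == "ADD" else acc * arg
        state["acc"] = acc
        return end  # resume dispatch after the whole block

    return fused


def jit(ops):
    """Swap the first handler of each multi-op run for its compiled chunk."""
    for start, end in find_straight_runs(ops):
        if end - start > 1:
            ops[start].handler = compile_run(ops, start, end)
    return ops


def program():
    # acc = ((0 + 2) * 3) + 1; if acc == 0 skip the ADD 10; return acc
    return [Op("ADD", 2), Op("MUL", 3), Op("ADD", 1),
            Op("JMPZ", 5), Op("ADD", 10), Op("RETURN")]
```

`run(program())` and `run(jit(program()))` both return 17. After `jit()`, the dispatch loop makes four handler calls instead of six.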
The little I did actually do ended up being fairly involved assembly rewrites of the inner loop.
No matter what I did, the issues of the bytecode organization (the ->result reference) and the lack of type verifiability in the code generation resulted in me slowly throwing away every prototype somewhere between the for loop and running the default benchmark.php.
I haven't read through all the changes yet, but IMHO the Zend engine will be an absolute pain to deal with until we get type inference/verifiability into the bytecode format, so that integers get integer register ops in the JIT instead of always being zval_* based.
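To make the integer-register point concrete, here is a minimal Python sketch of forward type inference over a hypothetical ADD-only bytecode (`dest = lhs + rhs`).

- When both operands are provably integers, it picks a specialised opcode.
- When an operand is unknown (say, a function parameter), it falls back to a generic zval-style add.

The opcode names are made up. The sketch also deliberately ignores PHP's promotion of overflowing ints to float. A real pass would have to guard against that or rule it out with range analysis.

```python
# Hypothetical ADD-only bytecode: each instruction is (dest, lhs, rhs),
# meaning dest = lhs + rhs. Operands are int/float literals or variable names.

def operand_type(operand, env):
    """Infer an operand's type: literals are known, variables come from env."""
    if isinstance(operand, int):
        return "int"
    if isinstance(operand, float):
        return "float"
    return env.get(operand, "any")  # e.g. a parameter: type unknown


def specialise(code):
    """Return (dest, opcode, lhs, rhs) tuples with a type-specialised opcode."""
    env, out = {}, []
    for dest, lhs, rhs in code:
        lt, rt = operand_type(lhs, env), operand_type(rhs, env)
        if lt == rt == "int":
            # Could become a machine integer add on registers. NOTE: PHP
            # promotes int overflow to float; this sketch ignores that.
            opcode, result = "ADD_INT", "int"
        elif {lt, rt} <= {"int", "float"}:
            opcode, result = "ADD_DOUBLE", "float"
        else:
            # Unknown operand: generic add that inspects zval type tags at run time.
            opcode, result = "ADD_ZVAL", "any"
        env[dest] = result
        out.append((dest, opcode, lhs, rhs))
    return out


example = [("a", 1, 2), ("b", "a", 3), ("c", "b", 1.5), ("d", "p", "a")]
for instr in specialise(example):
    print(instr)
```

Here `a` and `b` get the integer add and `c` gets the double add. `d` depends on the unknown `p`, so it falls back to the generic path.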
But a cleanup was due. And a faster VM (either HHVM or PHPng++) is good news for the regular PHP users.
Pierre Joye from Microsoft started a big refactor in the open (making string lengths size_t everywhere), while Dmitry Stogov of Zend worked in total secret on an even bigger refactor, phpng, completely ignoring the size_t work (also known as the 64-bit work). A feud on PHP Internals followed, not the first heated debate there... that mailing list is a textbook example of how not to run an open source community. In the end, Pierre decided it's better to cooperate, and now there's a vote (http://news.php.net/php.internals/74352) that seems likely to pass, so size_t is coming on top of phpng too.
Most disheartening is the way Zeev Suraski pleaded for votes against the size_t/int64 patch in a mail[1] titled "A call for help (urgent)" because, I quote, it "negates months of hard work". That was only a few days after Zend's secret project had been uncovered.
The size_t/int64 refactor has been going through several iterations and RFCs since 2013.
Sure, but it seems pretty irrational to suddenly say 'Hey guys, we've been working on this code without telling anyone, so this community project that people have been working on for months should be abandoned'.
They could have been integrating the changes in their code as they went, and then launched with 'Hey, we have this cool new project, including the optimization work that people have been doing'. Instead, they kept their work secret, knowing that their work was incompatible with another large refactor that people were contributing to, and then said 'hey, abandon all the work you were doing because who cares'.
I'm one of those weird developers who loves PHP despite its warts, and pushes the poor language to its limit. I love the stuff that's happening in the community (PHP-FIG, the PSR standards, Composer/Packagist, and the like).
The thing is, that is all being achieved despite how badly the internals is run. Seriously, I tried to get involved and was turned off entirely by how it's run as an "old boys" clique where new developers and changes are considered enemies. It's really sad, and I think PHP could become a much nicer language overall if that changed. Some people who are far smarter than myself have the same opinion too, so it's not just me...
A lot of people I know who work with PHP on a daily basis agree that PHP is making significant progress despite the PHP internals culture, and not because of it.
I think what shocked me the most about that debate was how much the phpng developers tried to undermine Pierre Joye, by even contradicting their own goals.
In particular, their primary advertised concern was the 4% increase in memory usage, but the whole purpose of phpng seems to be to make PHP more compatible with JIT compilation later on, and JITs almost universally seem to require large amounts of memory to work really well.
I feel like you're giving a very one-sided representation of the issue. There were some very real concerns about the memory usage of the proposed modifications to phpng (you can see http://news.php.net/php.internals/74284 for an analysis) - what should have happened is that we just went over those changes and decided which parts are feasible and which are not. This happened eventually and now there's a full consensus on how this change is supposed to be implemented (64bit on LLP64, size_t string lengths, uint32_t array lengths). However before that happened the whole discussion was basically a pissing contest between Pierre and Zeev. Pierre was most certainly not the one who got us back to cooperating.
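This Python `ctypes` sketch gives a rough illustration of why field widths matter for memory. It is not a reproduction of the analysis linked above. It compares two hypothetical string headers that differ only in the type of the length field. The layouts are assumptions, not phpng's actual `zend_string`.

On a typical 64-bit platform, widening `len` from `uint32_t` to `size_t` costs 8 bytes: 4 bytes of wider field plus 4 bytes of alignment padding. Meanwhile, a `uint32_t` length caps strings just under 4 GiB.

```python
import ctypes

# Two hypothetical string headers; NOT phpng's real zend_string layout.
class StrHeaderLen32(ctypes.Structure):
    _fields_ = [("refcount", ctypes.c_uint32),
                ("len", ctypes.c_uint32),   # 4-byte length
                ("hash", ctypes.c_uint64)]


class StrHeaderLenSizeT(ctypes.Structure):
    _fields_ = [("refcount", ctypes.c_uint32),
                ("len", ctypes.c_size_t),   # 8 bytes on 64-bit, needs 8-byte alignment
                ("hash", ctypes.c_uint64)]


MAX_LEN_UINT32 = 2**32 - 1  # largest length a uint32_t field can hold

for cls in (StrHeaderLen32, StrHeaderLenSizeT):
    print(f"{cls.__name__}: {ctypes.sizeof(cls)} bytes")
print(f"uint32_t length limit: {MAX_LEN_UINT32} bytes, just under 4 GiB")
```

On a 64-bit build this reports 16 versus 24 bytes per header. Whether that matters in practice depends on how many such headers a real workload allocates.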
I don't think so. Pierre worked on this in the open, whilst the phpng team did their work in secret. That makes it at least inconsiderate of them that Pierre was not told that the changes might break optimisations in phpng - there should have been a clear communication from them that major changes were coming to php which would affect the utility of his work, and that he should consider holding off.
Then, the arguments made in the discussions were basically beside the point. Overall memory usage is not really important on modern hardware, since memory is cheap. The real concerns are time performance - if the performance is degraded, it would be a valid reason to reject some of the changes. The vast majority of criticisms were just along the lines of 'but why do we even need 4GB+ strings' - I can see exactly why Pierre would get frustrated at having to parrot the same line. Performance in current php was clearly fine, and performance in phpng was only ever mentioned in completely vague terms - the most empirical it got was 'i guess 20-30% worse' which isn't really a trustworthy figure by any means.
The burden was on the phpng team to recognise that there were changes in the standard php pipeline that would potentially invalidate some optimisations. Coming in at a late stage and saying that it would ruin everything, having made no effort to stop someone wasting much time, was at least inconsiderate and probably rude also.
Then attempting to effectively filibuster the situation by repeatedly firing irrelevant arguments at Pierre, then recruiting randoms to try and vote against it as some kind of 'we are being undermined' campaign, then trying to change the voting rules, it all just smacked of a very amateurish and/or rude community. Yes, they were mostly by the same people, but at some point someone could have very easily come out and pointed out (for example) the inherent fallacy in the 4GB+ argument, or the fact that the phpng developers should have informed Pierre earlier that their new optimisations might conflict. It seemed like a number of people were happy to let a few others argue for their preferred outcome for the wrong reasons.
The arguments of 'wasted data' aren't convincing - the arguments of 'wasted data causes poor performance' would be convincing were there any data whatsoever to back it up, and if those who argued actually worked from that standpoint.
My problem (as someone who read the mailing list) was simply that the very real concerns were brushed under the carpet, and that the situation was allowed to develop in the first place.
There's some seriously missing context to this post. It sounds like they are responding to some expectations posted elsewhere, but they don't link to what they're responding to. Or perhaps it's an update to something announced earlier, but they don't link to the past announcement. It's not clear what PHPng's relationship, if any, is to 5.5 or 5.6 in this post.
Anyone have any pointers to the conversation they are participating in with this post?
(mavci's link is an O'Reilly summary of the state of the PHP ecosystem and I didn't see any mention of PHPng.)
Yes, this is exactly what I was looking for. Thanks, rjknight. Someone visiting php.net without knowing the context would be seriously confused by this post (as I was).
There was a link a couple of days ago where I learned about phpng for the first time. It was some argument between php developers, where core devs argued that a change that made php5 nicer (standardised 64/32-bit field lengths and formats) made the phpng work harder (it undid some structure size optimisations). Maybe people discovered phpng that way and started asking questions about it? I can't find the actual link unfortunately.
If you Google "PHPng", you'll find a number of articles from the past few days about how the project is going to bring fancy stuff like a JIT compiler to the next version of PHP. Presumably this is to ensure expectations are set correctly and people don't think things are around the corner that actually aren't even in the country yet.
Isn't Hack the PHP of the future, available today? To me it seems that with Hack there are no excuses anymore to still use PHP. I'd be glad if someone could tell me some downsides of Hack; it seems too good to be true. And how do the JITs compare?
The problem I see is that Hack runs on HHVM rather than the standard PHP runtime, which is ubiquitous (some would argue that's its primary or only virtue).
So Facebook or any other company running code on their own servers can shift to HHVM with (relative) ease, but open source projects (like WordPress) and libraries can only switch once everyone else switches, which leaves us in a catch-22, so Hack remains forever in a symbiotic/parasitic relationship with PHP.
you're probably interested in the downsides of HHVM, not Hack. Hack is really just a PHP frontend built on top of HHVM. HHVM is the JIT.
- it runs really well on expensive FB hardware. there's no consideration given to anything else (performance-wise) in development. that's not to say it's slow, but it works best on 64GB servers with fat SSDs.
- it's a huge moving target when it comes to php5 compatibility -- there are a few intentional inconsistencies and a lot of unintentional ones. fixes for zend incompatibilities have a whack-a-mole effect: the typical patch fixes one inconsistency and introduces a few more.
- bad documentation and bad code quality
- it is open-source, but only in the most superficial sense. it's really an FB internal project, so good luck making any changes that help you but don't help FB
it's still better than zend PHP, and, to be fair, most of the incompatibilities are in cases where zend behaves stupidly. source: i was an HHVM contributor. (edited to space out my ascii list)
I half disagree with the third point - the documentation is certainly lacking (I recently used Redis on an HHVM site, and the adaptor had no documentation that it even existed; just read the source to figure it out), but it's a new-ish, fast-moving project, so that's not too surprising.
The code quality, however, isn't 'bad' by any stretch of the imagination. The FB team managing it is very knowledgeable about PHP internals, and is appropriately strict about determining what zend-compatibility fixes to commit. Their code is quite well-written overall, which makes me care a whole lot less about how poorly it's documented.
Odd. If I recall correctly, there was even a detailed FB PHP coding style guide in the HHVM Git repo; that was afterwards removed though - I don't know why.
I'm sorry to hear about your experience contributing to HHVM. Which commit(s) were you talking about? I'd rather stop doing whatever got in your way so others don't have the same experience.
really, the big problem was that the HHVM team didn't use CMake, so there was no concern with how long the build took and it was frequently broken. I had two instances where I spent hours fixing bugs that were fixed internally at FB at the same time, but hadn't been pushed. I know there was an ongoing effort to move completely into the open, but it was frustrating. I don't think these are persistent problems with HHVM, just reasons why the project wasn't really ready for primetime in the OSS world (open academy, specifically).
the fact that no one who doesn't work at FB is allowed to merge code (correct me if I'm wrong) made me feel like I was helping FB more than the OSS community. I get that HHVM is business critical to FB, but if you want it to truly get community support it needs to be spun off from FB.
edit: and I will add that my language was way too harsh. it's not hard to get in a PR provided it doesn't break anything in an FB internal test suite which I can't see (which never was a problem for me, but I could see it being one) and didn't degrade FB performance.
From the github page it looks like external folks are submitting pull requests. It looks like they require external people to sign the CLA, but it doesn't look like it's a case of "good luck making any changes that help you but don't help FB" per your comment.
I've found the same as you - people are getting their PRs when needed. In the earlier days of the project, it may have been true that they were less accepting of any non-critical changes (as they were still getting the core together), but in my experience they're open and eager to get feedback from the public.
Facebook moves fast. I'm not saying that they're going to abandon Hack or give it away to Apache or something, but if I were a corporation, sticking with vanilla PHP is attractive because it has a large developer community, and its future isn't dependent on the whim of one company.
Otherwise, yeah, Hack is an improved PHP. To be clear, it adds additional functionality to PHP, but is still entirely backwards compatible. PHP >= 5.4 still has solid bones.
Hack also works really nicely with PHP - I think a lot of PMs hear about hack and think it's a workalike but completely separate language, not realizing that you can (and almost always will) have hacklang running alongside php. You're capable of declaring a class in hack (with strict-typing, method return types, better collection objects than just having 'Array' everywhere), with that class directly extending a PHP class.
80% of the reason I use hacklang is just so I can extend whatever base PHP classes my CMS provides, but implement my methods without having to add the piles of manual type-checking that PHP requires.
I've not tried this yet, though my boss has given me the go-ahead. What editor do you use to find out how it's going? I struggle a little bit with it because I've yet to get vim-hack running. Do you just stick to the "code, save, refresh" cycle? Does Hack give you type checking error messages in the web page? Can you run a "check types" command on code from the terminal?
Unfortunately the only way to get type errors statically right now is to run `hh_client` at the command line (see http://docs.hhvm.com/manual/en/install.hack.bootstrapping.ph...); this is what vim-hack does for you. HHVM will only report type errors that it can see at runtime, which won't catch the edge cases or error cases you didn't think to test, which kind of defeats the purpose of having the static type system. For a really simple integration, you can run `watch hh_client` in a second window, which re-runs the checker every 2 seconds by default (the interval is configurable with `-n`; see `man watch`).
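If `watch` re-running the checker on a timer feels wasteful, a small script can re-run it only when a source file changes. That is closer to running it on save. This is a minimal Python sketch that assumes `hh_client` is on the `PATH` and is run with no arguments from the project root, as in the `watch hh_client` suggestion. The `.php`/`.hh` extension list and the polling interval are further assumptions to adjust.

```python
import os
import subprocess
import time

SOURCE_EXTS = (".php", ".hh")  # assumption: adjust to your project's extensions


def snapshot(root, exts=SOURCE_EXTS):
    """Map every source file under root to its last-modified time."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                mtimes[path] = os.path.getmtime(path)
    return mtimes


def check(root, cmd=("hh_client",)):
    """Run the type checker once from root; return (exit status, output)."""
    proc = subprocess.run(list(cmd), cwd=root, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def watch_and_check(root, cmd=("hh_client",), interval=1.0):
    """Poll root and re-run the checker whenever a file is added, removed or saved."""
    last = None
    while True:
        current = snapshot(root)
        if current != last:
            _status, output = check(root, cmd)
            print(output, end="", flush=True)
            last = current
        time.sleep(interval)

# Usage (runs until interrupted): watch_and_check("/path/to/project")
```

Polling modification times keeps the sketch dependency-free. A filesystem-notification library would react faster on large trees.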
This separation is different than many other languages, and lots of folks have found it confusing. It largely exists for technical reasons, and we should be able to have a better UX for end users; this is something I've been thinking a lot about and want to try to improve in the language going forward. (I work full-time on the Hack team.) Please do give it a shot and let us know how it goes!
Funny, I was in the IRC channel this evening having a chat to Simon and managed to work all this out! I'm actually quite excited by it, I think the type checking that's available + unit testing will catch most of the edge cases I was looking to catch, although that "doesn't check global scope" threw me for a bit when getting it all up and running!
I'm really interested to see what you end up coming up with UX wise. What sort of edge cases do you think the `hh_*` apps will miss? Running `hh_client` on save isn't a bad thing, IMO, and having it output JSON for easy integration is amazing :) But I'm curious what the type checker will miss currently?
Is there a public mailing list where these sorts of discussions happen, or is it more internal to Facebook? I'm really interested in Hack (as is the company I work for; we push PHP to its limits, so we're keen to use something that can catch even more bugs from day 0), so thanks very much for pushing PHP even further forward!
Are the technical reasons for missing type errors that wouldn't be triggered at run-time insurmountable? Or is it more a "finding the right UX to expose it"?
It's not that the typechecker will miss things, it's that, for folks coming from other languages, it's not clear that there even exists a separate typechecker you have to run!
There isn't a mailing list right now. Some discussions just happen in person since we sit next to each other, but we're trying to make things as transparent as we can. Lots of stuff (and hopefully more moving forward) happens on #hhvm on Freenode or on GitHub issues, basically the same channels as the HHVM project itself. There is also nontrivial discussion during code review, which is very unfortunately all internal right now; we hope to have that all moved external as soon as we can. (There are a lot of tricky integrations with internal tools that need to happen for that to work.)
Statically typed usually means the compiler checks types during compilation and throws away information about them after generating the target code (usually; there are exceptions like Java). So I doubt it will be able to tell you anything about types in the output (the web page) unless you do something with runtime contracts.
Actually Java works in the same way; the confusion stems from imprecise use of jargon. Java tends to use the word "type" for things which would more precisely be called "tags". Tags are the run-time information which distinguish different values of the same type, and can't be erased in general.
Java adds a bunch of rules about handling tags, for example object values have a "class" tag; class tags must be statically specified (AKA "type signatures"); class tags can be pattern-matched automatically (AKA dynamic dispatch, method overloading and inheritance), allowing functions to be defined in separate chunks (AKA methods); functions can only be applied to arguments which will match a pattern (AKA "type checking"), etc.
These rules are checked at compile time as well as the types. Unfortunately all these different concepts tend to be grouped under the umbrella term "type checking", which makes fine-grained discussion and comparisons to other languages more difficult.
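The types-versus-tags distinction can be seen in Python, used here only as an analogy; this is not Java's mechanism. Annotations are static information that the runtime does not enforce. Every value, though, carries a run-time class tag. `functools.singledispatch` picks an implementation by pattern-matching on that tag, much like dynamic dispatch.

```python
from functools import singledispatch


def double(x: int) -> int:
    # "int" is static information only: Python does not check it at run time,
    # so catching double("ab") is the job of a separate checker such as mypy.
    return x * 2


@singledispatch
def describe(value) -> str:
    return "something else"  # fallback when no registered tag matches


@describe.register
def _(value: int) -> str:  # chosen when the value's class tag is int
    return "an int"


@describe.register
def _(value: str) -> str:  # chosen when the value's class tag is str
    return "a str"


for v in (3, "x", 2.5):
    print(type(v).__name__, "->", describe(v))
print(double("ab"))  # runs fine: prints abab
```

The annotation on `double` changes nothing at run time. The class tag of each value, however, decides which `describe` body runs.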
Hack doesn't get compiled, heh, it's interpreted so it's quite a bit different to what I'm used to. It's more like TypeScript in that regard, but TypeScript is still compiled... I've never used an interpreted typed language before, hence the question!
Well HHVM is a JIT compiler isn't it? Also it applies just as well to an intermediate representation (like bytecode, or some simpler language that gets evaluated).
HHVM enforces the checks purely dynamically as it actually executes code. There's a totally separate tool that does the full static type analysis. See my sibling comment for details: https://news.ycombinator.com/item?id=7808586
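Purely dynamic enforcement, as described above for HHVM, has a practical consequence: a type error on a branch your tests never execute goes unreported. Here is a rough Python analogy, which is not how HHVM works. The hypothetical `enforce` decorator below checks plain-class argument annotations at call time only. As a result, the bad call in `render()` is caught only when its `fallback` branch actually runs. A static checker, like `hh_client` for Hack, would flag it without running anything.

```python
import functools
import inspect


def enforce(fn):
    """Hypothetical decorator: check plain-class argument annotations per call."""
    sig = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = fn.__annotations__.get(name)
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(f"{fn.__name__}(): {name} must be "
                                f"{expected.__name__}, got {type(value).__name__}")
        return fn(*args, **kwargs)

    return wrapper


@enforce
def label(n: int) -> str:
    return f"#{n}"


def render(items, fallback=False):
    if fallback:
        return [label("none")]  # type bug on a branch tests may never reach
    return [label(i) for i in items]


print(render([1, 2]))  # ['#1', '#2']: the bug stays hidden
```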
I guess hack is not backward compatible with some older php code.
Particularly code that mingles php and html together. For a lot of us this doesn't matter (frameworks and templating engines), but a lot of tutorials mix the html and php.
Most frameworks work with hack now, so you're right, it would be easier if there were just one version of "php".
Hack has incompatibilities with some current PHP code too. Particularly the different syntax for generators (PHP 5.5) and the upcoming syntax for variadic functions (PHP 5.6).
Can you elaborate on these incompatibilities, particularly generators? I work on the project and the only generator incompatibility I know of is that what PHP5 calls "Generator" we call "Continuation", but only since we haven't gotten around to renaming ours yet :) I'm also pretty sure HHVM supports 5.6 variadic functions in PHP code, or will very soon; the Hack typechecker currently does not, but that is strictly a missing feature, and something we do want to support.
Hack's backed by its own open-source community too, and in its own way, is also backed by the PHP community. Laravel, symfony, cakephp, doctrine, etc. are often run on HHVM and support hacklang applications. What makes PHP better makes hack better, and hack has a growing community of developers making hack better too.
I think it will be great if PHP's internal developers start focusing on HHVM.
From the comments it seems like some people are afraid that FB will keep driving the HHVM project its own way. But that depends on contributions: FB may lose its grip on HHVM if people outside FB start contributing more to it.
The inner-loop assembly rewrites mentioned in the first comment are written up at http://notmysock.org/blog/php/optimising-ze2