throwawayGogEng's comments

throwawayGogEng · on Feb 11, 2017

Coming from companies that use reasonably-sized git repos, I absolutely hated Google's VCS.

Here are some of my pain points with it:

* No branches. If you want to make a temporary code branch, you create a CL (Google's version of a pull request), but never submit it. This means nobody else can collaborate on it with you, and it must be manually updated to HEAD.

* No CL collaboration. Unlike Git branches, CLs can only contain changes from one user.

* No stable branch. Since everything is essentially on one long branch, it's a real hassle when a project is broken at HEAD. Sure, integration tests should ideally prevent this. In practice, HEAD is often broken. Teams have created bash scripts and mailing lists to determine 'stable' old versions that can be checked out for development.

* Single versions of libraries. Any library that is used is also checked into the VCS. However, only one version of the library can exist in the codebase, which is rarely updated. However, there are exceptions to this.

At one point, Sergey mentioned bringing Google "up to industry standards" regarding VCSs. However, that would be a monumental task and I doubt it will happen.

mikekchar · on Feb 11, 2017

It's interesting. This is exactly how it's worked at every large company I've been at. You have these enormous source trees that are essentially unbranchable. Then you have load build teams that build the latest checked-in version. When you want to do anything that uses a component you aren't working on, you grab the built version and use it locally. Then you hope like crazy that any changes you make don't break any downstream users (who are integrating with yesterday's build). If the build breaks (which happens frequently), then people use builds from 1, 2, or 3 days ago, leading to inevitable "integration hell".

Although I seriously doubt that Google has this problem, one of the biggest drawbacks of the scheme is that nobody knows how to build anything -- not even their own project. If you are coordinating with another project, then you're always having to wait days for the load build to finish so you can get their changes. If the build breaks for an unrelated reason, you can lose a whole week of development to screwing around.

When working in that kind of environment, I tend to maintain my own VCS, then squash my commits when integrating with the main VCS. I also do all my own load build. Everywhere I've worked, I've been heavily criticised (usually by management) for doing this, but productivity has been on my side, so people reluctantly accept it. I often find it strange how so many people prefer doing it the other way...

throwawayGogEng · on Feb 11, 2017

> If the build breaks for an unrelated reason, you can lose a whole week of development to screwing around.

This happens often on some projects at Google (although 2 days is the longest I've seen it broken). Others have told me they use this time for documenting code and writing design documents.

> When working in that kind of environment, I tend to maintain my own VCS, then squash my commits when integrating with the main VCS.

Google actually has a tool that allows developers to use Git on their local machine, which is squashed into a CL when pushed. However, some projects are too reliant on the old system for this to work.

jmillikin · on Feb 11, 2017

Everything you posted is wrong, which makes me believe you know everything you posted is wrong. Nobody writes things that inaccurate by accident.

For the benefit of YC, here are corrections:

  > No branches. If you want to make a temporary code branch,
  > you create a CL (Google's version of a pull request), but
  > never submit it. This means nobody else can collaborate on
  > it with you, and it must be manually updated to HEAD.

Piper has branches, which can be committed to as normal by any number of engineers. It's common to have both branches for developing certain features ("dev branches"), and branches pinned to a stable base version with cherry-picked bug fixes ("release branches").

  > No CL collaboration. Unlike Git branches, CLs can only
  > contain changes from one user.

CLs are equivalent to Git commits. Collaboration is expected to occur via a series of CLs, just like Git-based projects have large changes made via a series of smaller commits.

  > No stable branch. Since everything is essentially on one
  > long branch, it's a real hassle when a project is broken
  > at HEAD. Sure, integration tests should ideally prevent
  > this. In practice, HEAD is often broken. Teams have
  > created bash scripts and mailing lists to determine
  > 'stable' old versions that can be checked out for
  > development.

There is a global testing system for the entire repository, which is used to decide whether a particular project's tests pass. Commits on which all relevant tests pass are the branch point for releases. This is similar to the Linux kernel's dev model, where stable releases are cut at known-healthy points in an evolving codebase.
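To illustrate the idea of cutting releases at known-healthy commits, here is a minimal sketch. It is not how Google's testing system works internally; `CommitStatus`, `latestGreen`, and the test target names are hypothetical, and the only assumption is that each mainline commit has a pass/fail record per test target.

```go
package main

// CommitStatus is a hypothetical record of test outcomes at one commit.
type CommitStatus struct {
	Number int             // commit number, increasing along the single mainline
	Passed map[string]bool // test target -> whether it passed at this commit
}

// allPassed reports whether every relevant target passed at commit c.
// A target with no recorded result counts as not passing.
func allPassed(c CommitStatus, relevant []string) bool {
	for _, target := range relevant {
		if !c.Passed[target] {
			return false
		}
	}
	return true
}

// latestGreen returns the highest commit number at which all relevant
// targets passed. The second result is false if no such commit exists.
func latestGreen(history []CommitStatus, relevant []string) (int, bool) {
	best, found := 0, false
	for _, c := range history {
		if allPassed(c, relevant) && (!found || c.Number > best) {
			best, found = c.Number, true
		}
	}
	return best, found
}
```

A project that only cares about a few targets can be green at a commit where unrelated projects are red, which is why "relevant tests" matters in the description above.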

Important libraries define more rigorous releases, similar to Git labels, which are updated automatically every day or two. These both reduce the number of tests that need to run, and reduce the chance of errors in low-level code affecting many teams.

  > Single versions of libraries. Any library that is used is
  > also checked into the VCS. However, only one version of
  > the library can exist in the codebase, which is rarely
  > updated. However, there are exceptions to this.

Many third-party open-source libraries have multiple versions, and new upstream releases are added when there's either a security/bug fix, or someone wants a new feature.

Three of the four languages most often used at Google (Python, C++, Go) do not allow multiple versions of a library to be linked into a single process due to symbol conflicts. This is a limitation of those languages, not of the Google repository, and it affects any company that allows use of third-party code. The standard recommendation at Google is to avoid dependency hell by sharding large binaries and using RPCs to communicate. This development model has many advantages that have been documented elsewhere.

throwawayGogEng · on Feb 11, 2017

Thanks for the info!

> Piper has branches

I've worked on two teams so far, neither of which used the branches you described. They opted to either flag off features or keep long-running CLs.

However, I'll try to learn more about Piper branches. I doubt my team will make large workflow shifts, but it would still be good to understand!

> CLs are equivalent to Git commits

I'd argue that CLs are not equivalent to Git commits, the equivalent would be a CL snapshot. Best practices in git are to have frequent, small commits. CLs tend to be much larger, and the review process means that having small CLs would greatly slow workflow.

> There is a global testing system for the entire repository, which is used to decide whether a particular project's tests pass. Commits on which all relevant tests pass are the branch point for releases.

Yes, but this doesn't help with development. When HEAD is broken, a developer has to choose between developing on an outdated codebase, or developing on a broken codebase.

> Many third-party open-source libraries have multiple versions, and new upstream releases are added when there's either a security/bug fix, or someone wants a new feature.

Popular libraries may be updated more often; however, other libraries don't have many resources dedicated to them. After a brief correspondence with the team that manages third-party libraries, I decided it would be easier to implement the feature myself instead of following whatever process was required to update the library. And no, I wasn't trying to use 2 versions of the same library.

Despite your assertion, I'm not trying to write anything incorrect, and I appreciate your response.

JeremyBanks · on Feb 11, 2017

While Piper does technically have SVN-like branches, they're so unwieldy and poorly supported by lots of tooling (our team could never get TAP and our slightly-unusual testing to work on branches) that they're a much more niche tool than something like Git branches.

I agree with you with respect to CLs/snapshots. I would sometimes try chains of DIFFBASE-linked CLs in a crude emulation of linear Git branches (I want to experiment with taking things different ways, and I want the version control to store/back up my work), which sort-of worked when you're writing the code, but merging could be nasty. But there was also ad-hoc Git hosting available internally. I started using that for my experimenting, squashing into Piper commits when it was ready to share. It wasn't terrible, though still not optimal for collaboration.

Bahamut · on Feb 11, 2017

> Best practices in git are to have frequent, small commits.

I'd qualify that: it applies only on a working branch. It's preferable to squash it all into 1 commit when merging upstream in order to make reverting painless if there is something wrong with the proposed change.

Lots of small commits cause massive pain with git :( .

jmillikin · on Feb 11, 2017

I don't know which team you work on, so this advice might not be appropriate for your situation. Feel free to email me and I'll try to help you get things sorted. I won't make any attempt to link your corp username to these comments. Same for any other Googlers reading this chain.

  > I've worked on two teams so far, neither of which used the
  > branches you described. They opted to either flag off
  > features or keep long-running CLs.
  > However, I'll try to learn more about Piper branches. I
  > doubt my team will make large workflow shifts, but it
  > would still be good to understand!

Putting new features or changed behavior behind a flag is a good process. Encourage your teammates to avoid long-running CLs in favor of submitting flag-guarded code. You should treat every run of [g4 patch] as a warning that the CL is dragging on too long.
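As a rough illustration of flag-guarded code, here is a minimal sketch using Go's standard `flag` package. It is not taken from any Google codebase; the flag name `enable_new_greeting` and the functions `greet` and `run` are made up for the example.

```go
package main

import (
	"flag"
	"fmt"
)

// greet keeps the old behavior as the default and only takes the new
// code path when the (hypothetical) flag is switched on.
func greet(name string, useNewGreeting bool) string {
	if useNewGreeting {
		return fmt.Sprintf("Welcome back, %s!", name)
	}
	return fmt.Sprintf("Hello, %s", name)
}

// run parses args with its own FlagSet so the flag handling can be
// exercised from tests as well as from a real main function.
func run(args []string) (string, error) {
	fs := flag.NewFlagSet("greeter", flag.ContinueOnError)
	newGreeting := fs.Bool("enable_new_greeting", false, "use the new greeting (hypothetical flag)")
	name := fs.String("name", "world", "name to greet")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return greet(*name, *newGreeting), nil
}
```

A `main` function would call `run(os.Args[1:])` and print the result. The point is that the unfinished code path can be submitted in small pieces while the default behavior stays unchanged until the flag is flipped.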

Dev branches require a change in workflow, and can be unhelpful if a team has bigger process issues (as yours sounds like it does). I recommend looking into them and trying out the codelab, but don't start advocating for them just yet.

Release branches are very important. If your team is not currently using them, that needs to be fixed ASAP. Look into Rapid[1], try it out for a small CLI utility. Advocate for all releases to be done via Rapid. Despite the name it can be slower than plain [g4 sync; blaze build] but the extra steps it runs are important and useful.

[1] https://landing.google.com/sre/book/chapters/release-enginee...

  > I'd argue that CLs are not equivalent to Git commits, the
  > equivalent would be a CL snapshot. Best practices in git
  > are to have frequent, small commits. CLs tend to be much
  > larger, and the review process means that having small
  > CLs would greatly slow workflow.

I'm a Go readability reviewer, so I review a lot of CLs written by people in other parts of the company. My firm belief is that CLs should be small and frequent. CLs start to get hard to review at around 300 lines of feature code. If you are regularly reviewing large CLs, push back and ask the authors to split them up. Often these CLs are trying to do too many things at once and you can find a good fracture point to split them into 2-3 parts.

If you are regularly writing large CLs, or long chains of diffbase'd CLs, that's a sign that your codebase may be poorly factored. Take a step back from the tactical level and look at what your CLs are touching. Is the UI's HTML formatting mixed into the business logic? Are you touching the same file over and over? Move things around, use includes, use data embeds. Replace long param lists with data structures. All this standard software engineering advice applies 2x when working with other devs.
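For the "replace long param lists with data structures" point, a small Go sketch of what that can look like follows. `ReportOptions` and `renderReport` are hypothetical names invented for illustration.

```go
package main

import "fmt"

// ReportOptions groups what would otherwise be a long positional
// parameter list. New fields can be added without touching every caller.
type ReportOptions struct {
	Title      string
	Author     string
	Width      int // columns; zero means "use the default"
	ShowHeader bool
}

// renderReport takes one options value instead of many arguments.
func renderReport(o ReportOptions) string {
	if o.Width == 0 {
		o.Width = 80 // default width when the caller leaves it unset
	}
	return fmt.Sprintf("%s by %s (%d cols, header=%t)", o.Title, o.Author, o.Width, o.ShowHeader)
}
```

Callers name only the fields they care about, for example `renderReport(ReportOptions{Title: "Q1", Author: "ada"})`, so a later CL that adds an option tends to touch one struct rather than every call site.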

  > Yes, but this doesn't help with development. When HEAD is
  > broken, a developer has to choose between developing on an
  > outdated codebase, or developing on a broken codebase.

If HEAD is broken, your first priority should be getting HEAD fixed. Whether that means fixing the code or rolling back to an earlier version, you should not accept a broken HEAD.

After it's fixed, look at why it broke. Why was a CL that broke things allowed to be submitted? Do you have proper presubmit tests? Consider adding your TAP project(s) to METADATA files so Piper will make sure tests pass and binaries build before allowing the submit.

If other teams' changes are breaking your code, add your TAP project(s) to their METADATA or help them improve their own test coverage.

  > After a brief correspondence with the team that manages
  > third-party libraries, I decided it would be easier to
  > implement the feature myself instead of following
  > whatever process was required to update the library.

Third-party code has special rules that might prevent you from doing something reasonable. Ask the team for help with the process. It's a natural developer instinct to write new code instead of trying to update shared dependencies. If you can fight that instinct and get the new dep version imported, it will improve not just your project but the projects of everyone who might use that dep in the future.

pyb · on Feb 11, 2017

I've heard Google's VCS is basically their own reimplementation of Perforce. Are there any major differences?

shemnon42 · on Feb 11, 2017

Other than "works at Google scale?" That's not a dig at Perforce; it scales well, but not to the absurd scale Google dials it up to. That is the major feature that drives any differences.

tehlike · on Feb 11, 2017

Git5 bridge exists, though it is not supported.

A single version of a library is actually a good decision; it makes the problem of sub-libraries pulling in different versions of the same code manageable and error-free.

Not to mention security audits and ease of maintenance.

Disclaimer: goog emp