My technical recap of Git Merge 2019

Every year, GitHub, GitLab, Atlassian and others organize a conference about Git and its evolution. This is my recap of the 2019 event in Brussels.


All the talks were packed into a single day. This year the event took place on February 1st in Brussels: a long and exhausting, but very interesting day. All ticket proceeds are donated to the Software Freedom Conservancy, which helps promote, improve, develop and defend Free, Libre and Open-Source Software projects.

The future of software by Deb Nicholson

Deb Nicholson opened Git Merge 2019 with her talk. It served as the opening keynote and, as usual for a keynote, it was more of a social talk than a tech talk.

Deb is the Director of Community Operations at the Software Freedom Conservancy. She is also a developer herself, not only a manager. She shared her first experience of using Git and reminded us that we were all newbies once. As a new Git user, she faced some mean behavior.

The future of software, particularly in the open-source community, depends on what we are doing now, how we behave and how we interact with others.

How can we make free software better?

Are we open to new developers, even if they are not experienced? To build the future of software, we have to share our knowledge with newcomers and help them grow.

We won’t be here forever

Pro-tip from her: no matter how good you are at what you are doing, be kind to everyone.

Tales in scalability: how Google has seen users break Git by Ivan Frade & Minh Thai

Ivan Frade and Minh Thai both work on the Git server team at Google. Their goal is to serve Git data fast, everywhere.

Google hosts hundreds of thousands of repositories, including very active ones such as Chromium and Android. The numbers they showed are pretty much jaw-dropping.

The problem with these big repos is that everything is really slow. Each time a developer wants to push code, Git returns the entire repo index, which can take up to 20 seconds. 20 seconds may look like nothing, but 20 seconds at each commit is simply not workable.

To solve this issue, the Google team modified the Git instances on its servers to return only part of the index each time someone pushes code. They do this by computing a bitmap table (a compact encoding of the commit graph) to work out what the client actually needs.
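To picture the bitmap idea, here is a minimal sketch. It is not Google's server code: the tiny commit graph and the function names are made up for illustration. Each commit gets one bit, so "everything reachable from a commit" fits in a single integer, and working out what a client is missing becomes a bitwise operation.

```python
# Toy commit graph: commit id -> list of parent ids (hypothetical data).
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
}

# Give every commit a fixed bit position so a set of commits fits in one int.
BIT = {commit: 1 << i for i, commit in enumerate(sorted(PARENTS))}


def reachable_bitmap(tip):
    """Return a bitmap (int) of every commit reachable from `tip`."""
    bitmap, stack = 0, [tip]
    while stack:
        commit = stack.pop()
        if bitmap & BIT[commit]:
            continue  # already visited
        bitmap |= BIT[commit]
        stack.extend(PARENTS[commit])
    return bitmap


def commits_to_send(wants, haves):
    """Commits the client is missing: reachable(wants) minus reachable(haves)."""
    want_bits = 0
    for tip in wants:
        want_bits |= reachable_bitmap(tip)
    have_bits = 0
    for tip in haves:
        have_bits |= reachable_bitmap(tip)
    missing = want_bits & ~have_bits
    return sorted(c for c, bit in BIT.items() if missing & bit)
```

A real server would store precomputed bitmaps for selected commits instead of walking the graph on every request; the walk here only keeps the sketch short.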

The first hypothesis was that a client doesn’t really need the whole repository to work. Obviously, this hypothesis was made by Google engineers for Google’s repositories. It’s a specific use case. For another project and another team, this hypothesis could be wrong.

The what, how and why of scaling repositories by Johan Abildskov

Johan Abildskov is a DevOps expert and a consultant at Praqma. He brings a more rational approach to the way we choose between a monorepo and many repositories. He explained that he adapts the repo choice to the project and tries not to be biased by his personal preferences.

monolithic repositories suck!

He clearly states that in many situations, it is useful to use many repos. Furthermore, he said that the argument his clients use the most to justify a monorepo is “Even Google and Microsoft use monorepo!”. That is not an argument. Even Google and Microsoft can fail.

With a monorepo, it is hard to continuously deliver new features. For example, Microsoft has to rebuild Windows entirely when a developer updates an icon in the desktop manager options. That takes a lot of time and server resources, and the same process applies to every developer. A regular CD (continuous delivery) server cannot easily handle this. The low-cost solution is to use many repos.

Johan also warns about the extreme opposite of the monorepo. Breaking down every single part of your system into a different repository is counterproductive. He showed a screenshot of someone waiting for 24 pull requests on 24 different repositories to fix a single JIRA ticket. Some units of a system should stay together. Break down your app wisely.

Bridging the gap: transitioning Git to SHA-256 by Brian M. Carlson

An interesting talk from Brian M. Carlson, Git Ecosystem Engineer at GitHub, about the Git transition to SHA-256. Like every transition, it will be done in several steps so as not to break everything.

Note: for those who want to know why Git has to transition to SHA-256, please refer to this Stack Overflow answer.

This transition is a breaking change, and it is complicated because Git is a distributed system: not everyone will upgrade their Git at the same time, yet the client and the server must be able to understand each other. The answer to this problem is a four-step transition, including a step where both algorithms will be supported.

This transition will not affect the way developers use Git. In the end, versions of Git with SHA-1 and ones with SHA-256 will interoperate.
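To make the change concrete, here is a small sketch of how an object name is derived under each algorithm. It relies on the fact that Git hashes a short header followed by the content; the function name is mine, and the sketch says nothing about how the transition steps themselves are implemented.

```python
import hashlib


def git_object_id(kind, content, algorithm="sha1"):
    """Hash a '<kind> <size>' header, a NUL byte, then the content itself."""
    header = f"{kind} {len(content)}".encode() + b"\0"
    return hashlib.new(algorithm, header + content).hexdigest()


blob = b"hello\n"
sha1_id = git_object_id("blob", blob, "sha1")      # 40 hex characters
sha256_id = git_object_id("blob", blob, "sha256")  # 64 hex characters
```

The same file therefore gets two completely different names depending on the algorithm, which is why old and new clients need a translation step to talk to each other.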

More info on transition

Lightning talk: Git protocols: still tinkering after all these years? by Brandon Williams

Brandon Williams works on the Git team at Facebook and previously held the same position at Google.

The Git wire protocol is used to push to and pull from a remote repository. It had not changed for years. Brandon worked on version 2 and showed us a test of this version on a fetch: it was 4 times faster than the previous one. The v2 protocol was announced in May 2018 and has been available since Git 2.18.

At the time of writing, Git clients default to version 0 of the protocol, so you have to opt in manually. You can enable the v2 protocol on selected repositories if you don’t want it as a global configuration.

To enable the new protocol, run git config protocol.version 2 inside a repository, or git config --global protocol.version 2 to enable it for all your repositories.
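As far as I understand it, one reason v2 can be faster is that the client may ask the server for only the refs it cares about instead of receiving all of them. The sketch below builds such a request in Git's pkt-line framing (every line is prefixed with its length as four hex digits). The helper names are mine, and a real client sends additional capability lines that are left out here.

```python
FLUSH_PKT = b"0000"  # ends a request
DELIM_PKT = b"0001"  # protocol v2: separates the command from its arguments


def pkt_line(payload):
    """Prefix `payload` with its total length as 4 hex digits (prefix included)."""
    return f"{len(payload) + 4:04x}".encode() + payload


def ls_refs_request(ref_prefixes):
    """Build a v2 `ls-refs` request limited to the given ref prefixes."""
    lines = [pkt_line(b"command=ls-refs\n"), DELIM_PKT]
    for prefix in ref_prefixes:
        lines.append(pkt_line(f"ref-prefix {prefix}\n".encode()))
    lines.append(FLUSH_PKT)
    return b"".join(lines)
```

Calling ls_refs_request(["refs/heads/"]) asks only for branches, which matters a lot on servers that hold hundreds of thousands of refs.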

Here is a list of other improvements of the v2 protocol:

Main improvement

Lightning talk: Native Git support for large objects by Terry Parker

Terry Parker is a developer at Google, where he leads the git-core and Git server teams.

Having large binary files in a Git project can be a huge pain, even for Google. Terry came to present a new Git feature called “partial clone”, which should help us work with large repositories.

When cloning a repo, you can filter which objects (trees, commits, blobs) you want to retrieve. Later, when you use Git, it retrieves missing objects automatically if they are needed.

Large objects can be large files, but not necessarily, so this feature can be used as an alternative to Git LFS, or the two can be used together. The main difference between them is that with LFS you have all the objects, but some blob objects are just links to a file on another server (the LFS server), whereas with partial clone you don’t have those blob objects at all.

It’s a great feature. The only downside is that it could encourage monorepos… Just kidding.
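For reference, here is a sketch of what a partial clone looks like in practice. It only builds the command line; the helper name and the URL in the usage note are hypothetical, and the server has to support filtering for the clone to work.

```python
def partial_clone_command(url, max_blob_size=None):
    """Return the argv for a partial clone of `url`.

    Without `max_blob_size`, no blob is downloaded up front (`blob:none`).
    With it (for example "1m"), blobs of that size or larger are left out.
    """
    spec = "blob:none" if max_blob_size is None else f"blob:limit={max_blob_size}"
    return ["git", "clone", f"--filter={spec}", url]
```

You could run it with subprocess.run(partial_clone_command("https://example.com/big-repo.git"), check=True). Blobs that were skipped are fetched later, when a checkout or a diff actually needs them.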

Git for games: current problems and solutions by John Austin

John Austin is a video game developer. Most files used in this field are images and videos, which are binary files rather than text files. Developers often have conflicts on these files.

The problem is that, because they are not text files, Git does not know how to resolve the conflicts: it cannot pinpoint what changed inside a binary file. For example, when you change the background color of an image, the entire file changes, not just one line of it. If two developers are working on the same image, Git can’t automatically merge the two versions. Git asks the developers which version they’d like to keep, but choosing one version erases the work done on the other.

This is why classic merging does not work for this type of file.

Binary conflicts are lost work

John had to find a workflow that keeps conflicts on images to a minimum when several developers are working on the same part of a project. His idea was to warn developers when their commit contains a binary file that has a more recent modification in another Git branch. Using a pre-commit hook, the Git Global Graph checks for binary modifications and warns the developer before committing.

The Git Global Graph is an open-source project written in Rust that you can contribute to.
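The check described above can be sketched as follows. This is not how the Git Global Graph is implemented (that project is written in Rust); the extension list, the data layout and the function names are assumptions made for the example.

```python
BINARY_EXTENSIONS = {".png", ".jpg", ".psd", ".fbx", ".wav"}  # assumed list


def is_binary(path):
    """Crude check based on the file extension."""
    return any(path.lower().endswith(ext) for ext in BINARY_EXTENSIONS)


def binary_conflict_warnings(staged_files, current_branch, last_modified):
    """List warnings for staged binary files changed more recently elsewhere.

    `last_modified` maps branch name -> {path: timestamp of the latest commit
    touching that path on that branch}.
    """
    ours = last_modified.get(current_branch, {})
    warnings = []
    for path in staged_files:
        if not is_binary(path):
            continue  # text files can go through a normal merge
        for branch, files in last_modified.items():
            if branch == current_branch or path not in files:
                continue
            if files[path] > ours.get(path, 0):
                warnings.append(f"{path}: newer version on branch '{branch}'")
    return warnings
```

A pre-commit hook would call this with the staged files and refuse the commit (or just print the warnings) when the list is not empty.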

The art of patience: why you should bother teaching Git to designers by Belén Barros Pena

Belén Barros Pena is a designer who shared her experience at the company that hired her. She was put in a team where she worked directly with developers (who were using Git), and she could not work with them properly because she didn’t know how to use Git.

Her team, unlike all the teams she had been in before, decided to teach her how to use it! It took her two years to learn Git. It was difficult for the developers to explain Git and its surrounding jargon to someone who was not a developer. She had to learn concepts she was not familiar with, such as trees, branches, or commits. The developers had to simplify their explanations and leave out concepts that were useless for her work.

(Image: “some things can never be simple”)

In her experience, it is essential for developers to teach designers how to use Git, so that designers can be more involved in the design and review process. Because she was able to push code and reject pull requests, she became an active participant in the creation process.

Belén gave aspiring mentors some tips:

  • Always do things with the mentees, never do things for them. (to create muscle memory)
  • Be sure the mentees take notes.
  • Teach using the command line interface.
  • Be sure the mentees use the command line interface too. (to create a good mental model based on how Git works and not how a GUI works)

Two interesting quotes from her: “The wall between developers and designers is not made of bricks, but of attitudes.” and “If we want more designers participating in FOSS, we need to help them to do so.”

Lightning talk: Version control for law: Posey Rule in the U.S. Congress by Ari Hershowitz

Ari Hershowitz is a lawyer and former neuroscientist who learned how to code. The problem he is trying to solve is the modernization of public institutions. He works for an R&D company in this field, where his job is to create document comparison software that works for US law.

Why not use Git or another version control system? Ari wrote a piece about that: What are the nontechnical barriers to adopting a version control system for use in writing bills and new laws?

One of the main problems is how laws are written. We keep adding text to laws, but we never erase or delete existing laws. It’s not exactly “versioning”, and Git does not work that way.

His team decided to create a new open XML standard specifically designed for law, called USLM (United States Legislative Markup). They converted the entire US code of law to this markup standard. With USLM, machines can understand changes in the law and will ultimately be able to show the changes in the law at any point in time.

Lightning talk: Git, the annotated notepad by Aniket Subhash Kadam

Aniket Subhash Kadam has a problem with reloading context when he comes back to work the next day, or even when he comes back from a coffee break. He could never remember what he had done before the break or what he was about to commit.

That is where the “notepad” idea comes from. Why not write down what he is doing, so that he can remember it after his break?

But what can he write that is as descriptive as possible? Code… So, how can he know what code he wrote just before the break? Git, of course. Git can easily show you the diff between your current work and the last commits, so you can read only what you’ve just worked on.

You can also make a WIP commit and use the commit description as an annotation on the work in progress.

If the last commit is not enough to get your brain to reload the context, navigate through your commit history, read the diffs line by line, and read the descriptions. Git is an annotated notepad.
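As a small illustration of the routine above, here is a sketch that gathers the output of a few read-only Git commands into one text you can read after a break. The choice of commands and the function name are mine, not something Aniket showed.

```python
import subprocess

# Read-only commands that answer "where was I?".
CONTEXT_COMMANDS = [
    ["git", "status", "--short"],         # files touched but not committed
    ["git", "diff", "HEAD"],              # uncommitted changes, staged or not
    ["git", "log", "-n", "3", "--stat"],  # last three commits with their files
]


def reload_context(repo_path="."):
    """Run each command in `repo_path` and return the combined output."""
    sections = []
    for command in CONTEXT_COMMANDS:
        result = subprocess.run(
            command, cwd=repo_path, capture_output=True, text=True, check=False
        )
        sections.append("$ " + " ".join(command) + "\n" + result.stdout)
    return "\n".join(sections)
```

Running print(reload_context()) from inside a repository prints the three sections one after another.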

Aniket gave us some best practices that make his method more useful:

  • atomic commits
  • read diff line by line before committing (do a code review of your own code)
  • write descriptive commit descriptions
  • refactor the best you can before committing

Side note: Aniket loves GUIs. He was the only speaker who showed us a GUI to explain his thoughts. It is not a joke.

Version control for visual learners by Veronica Hanus

Veronica Hanus is a former geology researcher who worked on the Curiosity Mars rover. She then decided to become a web developer and is self-taught. Her problem is that she could never see the visual changes between two commits, because Git is not visual at all.

Commit descriptions are supposed to solve this problem, but have you ever tried to work out what changed on a website when only the font of the logo image changed? A message like “change font on logo” does not help: if ten commits carry that same message, how do you know which font each of them used? Adding the name of the font might help if it is a font you use every day, but for the vast majority of fonts, it doesn’t work.

She presented the alternatives she tried to solve this issue:

First, she took a screenshot of the project at each commit and put it in a folder that was committed with the rest of the code. It was painful and not really satisfying.

She then tried to automate the work using Puppeteer, WebDriver or Selenium. At every commit, a pre-commit Git hook starts the tool, which automatically takes a screenshot, puts it in the screenshot folder, and so on. It was better.

Finally, she found tools like Zeplin, Abstract or Sketch, which can be plugged into a version control system (Git here). This is what she uses now. These tools are mainly used in big companies and are not very well known by freelance developers or by those who work in small companies.

Git & version control in the enterprise: a panel conversation with Atlassian, GitHub and GitLab

This talk was a bit different because of its format. Instead of one person talking to the audience, we had 3 representatives of the 3 main Git forges talking to each other:

  • Erik van Zijst, a Principal Engineer from Atlassian. He is focused on Bitbucket development.
  • James Ramsay, a Product Manager from GitLab
  • Briana Swift, a Trainer from GitHub

The discussion was moderated by CB Bailey, who is a Software Developer at Bloomberg.

The theme was how company teams are using Git. For example, do they mainly use monorepos or not? If you really want to know, James from GitLab said that a fair number of users are asking for monorepos, whereas Erik from Atlassian said there were very few.

When CB asked about the significant differences between OSS and enterprise workflows, the entire panel agreed on access control: rules, permissions, restrictions. Enterprises want more flexibility in how they can restrict what a developer or a group of developers can do on a project. Open-source projects don’t have this problem because they use the forking workflow, which gives contributors no rights on the original project: they can only fork it and open a pull request from their fork.

CB Bailey’s last question was: “What’s next for Git?”. As a trainer, Briana thinks we could always improve how we teach Git to newcomers because “Git is accessible, but it’s not easy”. James talked about the tools we build on top of Git. He thinks we should “get better in our visual representation, so that new users can feel confident and powerful using Git.”

Lightning talk: Gitbase, SQL interface to Git repositories by Javier Fontan

Javier Fontan works for source{d} and maintains an open-source project that lets you retrieve information from a Git repository using a language similar to SQL. It treats a Git repository as if it were a database.

This project is very useful for retrieving data easily without having to learn all about Git’s API. He explained the problems he went through. For example, if his software had to search the Git repository in real time, queries would take a very long time. So it scans the repo and stores part of the data in a real database.
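The "scan once, then query with SQL" idea can be sketched with SQLite. The table layout and the sample rows below are invented for the example; they are not Gitbase's actual schema.

```python
import sqlite3

# Pretend these rows came from one scan of the repository (made-up data).
SCANNED_COMMITS = [
    ("a1b2c3", "alice", "Fix login redirect"),
    ("d4e5f6", "bob", "Add search endpoint"),
    ("0718ab", "alice", "Update README"),
]


def build_index(rows):
    """Store the scanned commits in an in-memory SQL database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE commits (hash TEXT, author TEXT, message TEXT)")
    db.executemany("INSERT INTO commits VALUES (?, ?, ?)", rows)
    return db


def commits_per_author(db):
    """Answer a question about history with plain SQL instead of Git calls."""
    query = (
        "SELECT author, COUNT(*) FROM commits "
        "GROUP BY author ORDER BY COUNT(*) DESC, author"
    )
    return db.execute(query).fetchall()
```

Once the data sits in a database, questions such as "who commits the most?" no longer require walking the repository on every query.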

Technical contributions towards scaling for Windows by John Briggs

Like the previous speakers from big companies such as Google, Facebook and GitHub, John Briggs from Microsoft came to talk about scaling Git, because of the gigantic repositories his company hosts.

“Huge repository” does not do justice to the Windows repository. Just take a look at this picture:

There is only one branch for all the Windows engineers, with more than 3 million commits on it. Since this was too big a problem to work around, Microsoft decided to contribute to Git to make it support such a large repository.

John was there to show how Microsoft contributes to Git and which Git features were built by Microsoft engineers. The main one is the multi-pack-index, available since Git 2.20. It reduces the time needed to find an object when a repository contains many packfiles.
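Here is a toy model of why a shared index helps. It is only a sketch of the idea, not Git's on-disk format: the pack names, object ids and offsets are made up.

```python
from bisect import bisect_left

# Each pack has its own index: object id -> offset (made-up values).
PACKS = {
    "pack-1": {"0a": 12, "7f": 340},
    "pack-2": {"3c": 8, "b2": 96},
    "pack-3": {"5d": 57, "e9": 410},
}


def find_in_each_pack(oid):
    """Without a shared index: probe the packs one after another."""
    for name, index in PACKS.items():
        if oid in index:
            return name, index[oid]
    return None


def build_multi_pack_index(packs):
    """Merge every pack index into one list sorted by object id."""
    return sorted(
        (oid, name, offset)
        for name, index in packs.items()
        for oid, offset in index.items()
    )


def find_in_multi_pack_index(midx, oid):
    """With a shared index: a single binary search over all packs."""
    position = bisect_left(midx, (oid,))
    if position < len(midx) and midx[position][0] == oid:
        return midx[position][1], midx[position][2]
    return None
```

With three packs the difference is invisible, but with thousands of packs, one binary search replaces thousands of separate lookups.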

His team also worked on Git’s commit-graph feature. They first changed some Git algorithms to make VSTS more responsive: VSTS was, for example, faster at showing the branch graph than the git log --graph command. Once everything was working, they decided to bring this feature to the whole Git community by submitting their code for review by the Git core team.

How a Git-based education cultivates more resilient developers by Ben Greenberg

Ben Greenberg or Rabbi Greenberg is a developer advocate and a former rabbi. His idea is that learning software development using Git is better than just learning software development.

He talked about his own experience and that of his alumni. Learners who use Git are forced to write small pieces of code, and they have to break down the problem they want to solve into small parts. This is only one example of how using Git while learning helps newbies learn how to think.

He showed us his first commits from when he was learning to code. We could see how his approach to a technical issue, and the way he thinks about it, improved over time.

It was a really great day. Listening to these speakers and talking with them was very inspiring. Furthermore, here are some other things we liked:

  • it was a very inclusive conference
  • no junk food but many delicious fruits
  • several speakers did not start out as developers. It was interesting to learn about their experiences.

We learned about the Git ecosystem and about the developer community that uses Git. For example:

  • Why Git should be used by designers too
  • What are the issues game developers have when they use Git
  • That there are many domains where Git could be useful, including law

If you want to learn more about the life and experience side of Git Merge 2019, I wrote a second article about what we usually don’t talk about when writing an event recap.


If you have any questions or advice, please leave a comment below! I’ll be really glad to read it. Also, if you like this post, don’t forget to share it with your friends. It helps a lot!