If you end up working on a software project with other people, you’re probably going to encounter version control systems. While there are a number of version control systems out there right now, they all do basically the same thing: give programmers a way to share code with each other while staying on the same page.
Imagine for a moment you and two friends want to work on a game together. How are you going to do it? The simplest thing would be to have one computer that’s the “work computer” and work together in person, maybe pair-programming or even taking turns coding. But what about working on your own time or at your own home? It’s way more convenient to be able to write code whenever you have the time instead of having everyone get together.
What do you do instead? Well, the obvious thing to try might be emailing each other with the updated code every time you’re ready to share it. You can send the files that you change to each other and then download the code and put it in the right place.
This will work fine, if slightly annoyingly, as long as you always work on different files. But imagine one of your friends decided to fix the collision detection for your game at the same time you did, and then emailed out their changes only minutes after you sent yours. Now someone has to figure out whether the two sets of changes conflict. Maybe you both worked on the exact same part, in which case only one of your fixes should be accepted. Or maybe you actually fixed two different parts and there’s no conflict at all. Either way, you have to coordinate who is going to check the files, resolve the conflict, if any, and email out the fixed version. If you don’t coordinate carefully, multiple people could try to fix the conflict at once, and then you’d need to deal with conflicts in the conflict fixes, and so on.
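To make that concrete, here’s a minimal Python sketch of the check someone has to do by hand in the email workflow: compare each person’s version against the common starting point and see whether the edited lines overlap. The function names (`changed_ranges`, `edits_conflict`) and the sample collision code are made up for illustration, and real version control tools use more sophisticated merge logic than this.

```python
import difflib


def changed_ranges(base, edited):
    """Return the (start, end) line ranges of `base` that `edited` changed."""
    matcher = difflib.SequenceMatcher(a=base, b=edited, autojunk=False)
    return [
        (i1, i2)
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def edits_conflict(base, mine, theirs):
    """Return True if the two edits touch the same region of `base`.

    Ranges that overlap or merely touch count as a conflict, which is
    deliberately conservative (it also catches two insertions at one spot).
    """
    for a_start, a_end in changed_ranges(base, mine):
        for b_start, b_end in changed_ranges(base, theirs):
            if a_start <= b_end and b_start <= a_end:
                return True
    return False


# Hypothetical collision-detection code that everyone started from.
base = [
    "def collide(a, b):",
    "    dx = a.x - b.x",
    "    dy = a.y - b.y",
    "    dist = dx * dx + dy * dy",
    "    return dist < 10",
]

# You fix the last line; one friend fixes line 2; another also fixes the last line.
mine = base[:4] + ["    return dist < (a.r + b.r) ** 2"]
friend_elsewhere = base[:1] + ["    dx = abs(a.x - b.x)"] + base[2:]
friend_same_line = base[:4] + ["    return dist <= 10"]

print(edits_conflict(base, mine, friend_elsewhere))
print(edits_conflict(base, mine, friend_same_line))
```

The first pair of edits changes different parts of the file, so both fixes can be kept. The second pair changes the same line, so a person has to pick one.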
If this is all sounding very painful, it absolutely is. That’s why, for a few decades now, we’ve been using software to handle all of this annoying, easy-to-mess-up coordination and conflict resolution between files. As you’ve probably guessed, that software is a version control system.
There are two basic types of version control: centralized and distributed. They differ in how the collections of source code, called repositories, are handled. Centralized means that a single computer holds the master copy of the software: everyone working on the project updates their code from that master repository and pushes their changes back to it. Distributed means that there’s no master copy and that, instead, everyone’s repositories are on equal footing with each other.
One of the most common centralized version control systems is called Subversion. The most popular distributed version control systems are Git and Mercurial.
Most version control systems share the same basic workflow:
- get, or check out, the newest source code
- add, or stage, your proposed changes
- finalize, or commit, your changes so they are officially a part of the source code
Distributed systems, like Git, have an additional step: if you want your changes reflected in another copy of the source code, you then push your commits. In centralized systems like Subversion, committing and pushing are generally the same thing because there’s only one “true” version of the source code and everyone is working from copies of it.
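The workflow above can be sketched as a toy, in-memory model in Python. Everything here (the `Repo` class, its method names, and the file contents) is invented for illustration. It isn’t how Git or Subversion store anything, but it shows how checking out, staging, committing, and pushing relate to each other.

```python
class Repo:
    """Toy repository: an ordered list of commits, each a snapshot of files."""

    def __init__(self):
        self.commits = []  # list of (message, {filename: contents})
        self.staged = {}   # proposed changes that are not yet committed

    def checkout(self):
        """Return a copy of the newest snapshot (empty if nothing is committed)."""
        return dict(self.commits[-1][1]) if self.commits else {}

    def stage(self, filename, contents):
        """Add a proposed change to the next commit."""
        self.staged[filename] = contents

    def commit(self, message):
        """Make the staged changes an official part of this repository."""
        snapshot = self.checkout()
        snapshot.update(self.staged)
        self.commits.append((message, snapshot))
        self.staged = {}

    def push(self, other):
        """Send commits that `other` is missing (simplified: no merging)."""
        if other.commits != self.commits[:len(other.commits)]:
            raise ValueError("other repository has commits we do not have")
        other.commits.extend(self.commits[len(other.commits):])


shared = Repo()  # the copy your friends pull from
mine = Repo()    # your own copy

mine.stage("game.py", "print('hello')")
mine.commit("Add game entry point")
print(shared.checkout())  # the commit exists only in your copy so far

mine.push(shared)
print(shared.checkout())  # now the shared copy has it too
```

In a centralized system you could picture `commit` and `push` collapsing into one step, since there would be only the `shared` repository to commit to.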
Now, what if you’re working by yourself? Is there any point in using version control for your personal projects? Absolutely! The other big thing that version control does for you is let you experiment with your code with a safety net. Pretty much every version control system out there lets you revert changes and create different branches of source code. Reverting changes means going back to some previous state of the code. So if it turns out that you made a mistake and coded yourself into a corner, you can just go back to the point where it worked and try again! If you want to play it even safer, you can start a different branch where you can experiment as much as you want without affecting the “main” version of your code. For example, if I want to try rewriting my project in Ruby instead of Python, I could make a new branch for that. If I later decide the Ruby version is way better and is the one I want to keep working on, I can merge it back into my main branch.
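Here’s a rough Python sketch of that safety net, again as a toy model rather than how any real tool works. The `History` class and its methods are hypothetical, each “commit” is just a string describing the code at that point, and `merge` only handles the easy case where the main branch hasn’t changed since the branch was created.

```python
class History:
    """Toy project history with named branches; each commit is a string."""

    def __init__(self, initial):
        self.branches = {"main": [initial]}
        self.current = "main"

    def head(self):
        """Return the newest commit on the current branch."""
        return self.branches[self.current][-1]

    def commit(self, contents):
        self.branches[self.current].append(contents)

    def revert(self, steps=1):
        """Add a new commit that restores the state from `steps` commits ago."""
        log = self.branches[self.current]
        log.append(log[-1 - steps])

    def branch(self, name):
        """Start a new branch from the current one and switch to it."""
        self.branches[name] = list(self.branches[self.current])
        self.current = name

    def switch(self, name):
        self.current = name

    def merge(self, name):
        """Bring `name` into the current branch (easy case only)."""
        ours = self.branches[self.current]
        theirs = self.branches[name]
        if theirs[:len(ours)] != ours:
            raise ValueError("branches diverged; a real merge would be needed")
        self.branches[self.current] = list(theirs)


project = History("python version, working")
project.commit("python version, broken experiment")
project.revert()                # back to the point where it worked
print(project.head())

project.branch("ruby-rewrite")  # experiment without touching main
project.commit("ruby version")
project.switch("main")
print(project.head())           # main is unaffected by the experiment

project.merge("ruby-rewrite")   # decide the rewrite is the keeper
print(project.head())
```

Note that `revert` here adds a new commit instead of deleting the broken one, so the mistake stays in the history in case you ever want to look at it again.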
If you’re working on a big software project started by other people, you’ll need to use whatever system they already use to manage their code. Every company and open source project will have already made their choice on this front. For example, Microsoft keeps the source code to Windows in a Git repository that’s over 250GB in size. Facebook keeps its main source code in a Mercurial repository, though they used to use Subversion and Git.
If you’re starting your own project, you might also want a way to host your code. If you and the people you want to work with have access to a server where you can set up a version controlled repository, then that’s an easy option. For most people, though, the easier option is to use a service that can provide hosting for you. Three of the most popular source code hosting sites are SourceForge (https://sourceforge.net/), GitHub (https://github.com/), and Bitbucket (https://bitbucket.org/). SourceForge will host Subversion, Git, and Mercurial repositories. Bitbucket hosted both Git and Mercurial until it dropped Mercurial support in 2020, and now hosts only Git. GitHub will only host Git repositories.
The catch used to be that most of these services were only free if you allowed your source code to be open source and available to all. That has loosened: GitHub and Bitbucket now both offer free private repositories, though with limits, such as Bitbucket’s free plan only covering projects with a few people working on them.
No matter what, though, for any size of project you probably want to use some form of version control.