Migrating open source code repositories from CVS to GitHub

Clever Synchronization

As an example of a change in a forked project, I'll add a line to the Changes file announcing an imaginary bug fix under open-source-dude's ID. git diff shows the difference between the local workspace and the local repository (Figure 7). The following commit, with the comment 'Imaginary Changes', writes the changes to the local repository. To upload the commit to the publicly visible fork on GitHub, open-source-dude would give the git push origin master command, pushing the main branch of the local repository, master, to the Git repository, origin. Git is efficient when it comes to synchronizing and will not take long because there is little to synchronize. Whereas CVS takes ages to discover that the workspace and server repositories are synchronized, especially with larger repositories, Git will be back with you in a fraction of a second in cases like this.

The developer has thus made a change to the local repository and committed it to the forked repository on GitHub. If you click the Network button on the project page, GitHub will show you a slightly blocky Flash graphic (Figure 8) telling you that the owner of the fork has added a feature to the project that the original project has not yet incorporated. In most cases, a fork on GitHub will lead to the new functionality being integrated with the original; after all, nobody intends to take over the maintenance burden of a project, just for a simple new feature. Forks are just a temporary approach to pushing project development along and a way of asking the maintainer to merge your work once it has reached a stable state. To facilitate this communication, GitHub displays a Pull Request button on the fork's project page, and all the contributor has to do is type in a few explanatory sentences for the maintainer in the box that pops up (Figure 9). Pressing the Send Pull Request button sends the message to the maintainer of the project and optional recipients.

Take It, Please!

The project maintainer receives the message via his GitHub project page and by email. One weak point in the workflow at the time of this writing is that the email appears to come from no-reply territory, and the maintainer can't just mail the sender back; instead, the maintainer has to access the GitHub page in his browser.

Assuming the maintainer approves of the new feature proposed by the contributor, he can pull the fork into his local repository. Instead of blindly trusting the contributor, the maintainer will set up a new branch for this. If you have experience with source control in Subversion or CVS, you will probably be groaning by now; after all, creating alternative branches and merging them with the main branch is a nightmare on these platforms. With Git, merging is just business as usual, and you can create a new branch or merge an existing branch in just a fraction of a second. A couple of clever ideas and advanced data storage techniques have helped Git solve a problem that has been around since the stone age of programming. The norm for developers is to open up a new branch for each bug they need to fix and to keep dozens of branches open at the same time.

Merged by the Maintainer

Figure 10 shows the sequence of commands the maintainer enters to add the new feature to a temporary new branch of the project. This assumes that the maintainer has set up a local clone of the original repository on GitHub and is now accessing it. To start, run remote add to create a new alias – say, open-source- dude-repo – for our favorite contributor's repository. The checkout command with the -b option then creates a new branch, open- source-dude-branch, in the local repository and moves to it by checking it out to the local workspace.

The following pull command in Figure 10, with the alias I created previously for the fork and the master:open-source- dude-branch parameter, reels in the master branch of the fork on GitHub and dumps it into the open-source-dude-branch branch of the local repository. In the new branch, by giving the diff command with the master branch as a reference, git shows the differences between open-source-dude-branch and the master branch (i.e., the main branch of the local repository).

If the maintainer decides to accept the patch, he gives the checkout master command to return to the main branch and then calls git merge with the --squash option and the branch to merge as the parameter (Figure 11). Whereas merge normally accepts all the changes in the fork, commit for commit, which could cause dozens of entries in the original project's log, --squash reduces them to a single commit and dumps it into Git's staging area, where it waits for the next git commit to send it into the repository. This means that the project maintainer can accept the patch into the project without generating more than a single log entry.

This revolutionary approach to contributing to open source projects has already reached CPAN, where hundreds of new modules see the light of day every week. To allow a CPAN module on search.cpan.org to display a link to the underlying GitHub repository (Figure 12), all you need is a META entry in Makefile.PL in the style of Figure 13; the CPAN software will handle the details automatically.

Buy Linux Magazine

Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

comments powered by Disqus

Direct Download

Read full article as PDF: