CIS 191: Linux and Unix

Class 6, March 18, 2015

What’s Git All About, Anyway?

Outline

Version Control Systems

Git in a Nutshell

The Git Philosophy and Data Model

Git Branching, Merging, and Rebasing

Undoing in Git

What is version control, anyway?

• Version control is a sort of record for our code
• It takes the bookkeeping away from you, and takes care of it all automagically!
  – Well, not really automagically… more on that soon
• Also, it’s better than Dropbox or thumb drives
• Trust me.

Version Control – A History

• Version control systems (or VCSs) can be divided into three rough generations
  – First: Single file, no networking, one person at a time
    • Like revision control in a word document
    • Or Dropbox
    • Examples of these old-school systems are RCS and SCCS
  – Second: Multi-file, centralized system, many users
    • All changes are reflected on a central server; actions affect the remote copy
    • If you want to make a change, you have to merge all your changes with the remote copy
    • SVN, CVS, Microsoft Team Foundation Server


• Version control systems (or VCSs) can be divided into three rough generations
  – Third: Multi-file, distributed system, many users
    • Changes are reflected on your personal version of the repository first, and then on the remote second
    • You have to commit your changes to your local copy before you merge any changes from the remote!
    • More modern systems like Git, Mercurial, Bazaar (though I haven’t seen that one used in a long time)
  – http://ericsink.com/vcbe/html/history_of_version_control.html


• All three forms of VCS are in use today!
  – Your friends (and maybe past you) who work off of flash drives and Dropbox are using something similar to a first-generation tool, whether they know it or not!
  – Huge companies like Microsoft and hospitals are using second-generation tools, because they are entrenched in them
  – Pop computer science uses third-generation tools
    • So, startups and newer large companies
  – We’ll be learning Git, which is one of the canonical third-generation tools


So what is it?

• Git is a distributed version control system which really emphasizes speed, data integrity, and support for distributed, non-linear workflows
  – From Wikipedia: http://www.wikiwand.com/en/Git_(software)

And who made it?

• Remember Linus Torvalds?
  – He made the Linux kernel when he was 21
• Linus got tired of the available version control systems when he was working on Linux, around 2005.
  – They were slow
  – They stored too much
  – Distributed systems weren’t free
  – Various other complaints

• So he made his own!

Why use it?

• Git allows for development on local machines, even without an internet connection

• The master record for a project exists on a remote machine, which developers sync up with regularly

• “Merge conflicts” are handled cleverly, and with minimal suffering for the programmer

Why use it?

• But most of all, Git makes your life easier!
  – And safer, in case you make a mistake…

GitHub

• A place in the cloud to put your Git repositories!
• Launched in April 2008 to make all of your lives easier
• Projects can be accessed and modified using the standard git command line interface
  – More on exactly how to accomplish this in a bit


The Philosophy: Store Everything!

• Git stores things with some redundancy
• Rather than storing differences, like SVN and Bazaar, Git stores snapshots of your files as they are at a moment in time (i.e. when you commit them).
  – If a file hasn’t changed, then the system just stores a link to the most recent snapshot, for space efficiency reasons
• Easy access to previous versions (entire history, really)
• Fail-safes if you mess up
  – And lord knows you will.
  – We all do.

Git is a Content Addressable File System

• Uh… What?
• All this means is that Git refers to objects by hashes of their contents
  – Recall hashing: a function that maps some large X to some small identifier Y, preferably such that there is only one X for any Y.
• This means that Git can come up with names for things very easily
• It also means that if some corruption occurs in transit, Git will know about it right away
  – The hash Git uses serves as a checksum, as well!
• Other than that it’s just a CLI management suite!
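• A minimal sketch of content addressing, assuming git is installed. git hash-object prints the name Git would give to a piece of content; the sample strings are made up for the demo.

```shell
# The object name is computed from the bytes alone, so equal content
# always gets an equal name, and any change gives a different name.
first=$(printf 'hello\n' | git hash-object --stdin)
second=$(printf 'hello\n' | git hash-object --stdin)
other=$(printf 'hello!\n' | git hash-object --stdin)

echo "first:  $first"
echo "second: $second"   # identical to first
echo "other:  $other"    # one extra character, unrelated name
```

• Because the name doubles as a checksum, an object whose bytes no longer hash to its name is known to be corrupt.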

Git’s Core Repository

• When you run git init in a new or existing directory, Git creates a .git directory holding an empty repository
  – This is where the magic happens!
• This directory contains four important entries (and a couple of other useful entries as well)
  – HEAD file: points to the ref you currently have checked out
  – index file: where Git stores staging area information (it appears the first time you stage something)
  – objects dir: stores all the content for the project database (Git’s)
  – refs dir: stores pointers into commit objects in the database
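• A quick way to see these entries for yourself. This is a sketch in a throwaway directory; notes.txt is a made-up file name. Note that the index file only shows up once something has been staged.

```shell
repo=$(mktemp -d)            # scratch directory for the demo
cd "$repo"
git init -q

ls .git                      # HEAD, objects, refs, and a few others such as config

echo "draft" > notes.txt     # hypothetical file
git add notes.txt
ls .git                      # the index file is now listed as well
```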

Git and Blobs (objects)

• Git stores snapshots as “blobs”
• Blobs are the contents of a file at the time the file is committed
• You register a snapshot of the file to be stored as a blob when you add it to the staging area with $ git add
  – This means that if you add a file to the staging area with $ git add, then edit the file again, it will show up as both staged for commit and modified but not staged for commit!
    • Weird…
• But wait. How do we tie these blobs to a file name?
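• The “staged and modified at once” situation can be reproduced in a scratch repository. A sketch, with notes.txt as a made-up file name:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "first draft" > notes.txt
git add notes.txt                    # this snapshot is stored as a blob and staged
echo "second thoughts" >> notes.txt  # edit again without running git add

# Short status shows two columns: the staging area, then the working directory.
git status --short                   # prints "AM notes.txt"
```

• The A says the staged snapshot adds the file; the M says the copy on disk has changed since that snapshot was taken.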

Git Trees (objects)

• You can think of this as a simplified *nix-style file system
  – The analogues are everywhere!!!
• Trees are like directories
  – They point to either blobs (think files) or other trees (think directories!)

• They don’t store any information about who saved the snapshots, why they were saved, or when they were saved

Visually

[Figure omitted; image credit: git-scm.com]

Git commit Objects (objects)

• These objects store
  – the top-level tree for a project snapshot
  – information pulled from the user.name and user.email configuration settings for the author
  – the current time stamp
  – a blank line
  – the commit message

• If you think this looks similar to the entries for the git log command, then you did last week’s homework!
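• To look inside blobs, trees, and commits directly, git cat-file can print an object’s type (-t) or contents (-p). A sketch in a scratch repository; the identity and file names are invented for the demo.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"

mkdir src
echo 'echo hi' > src/hello.sh
echo "readme" > README
git add .
git commit -q -m "First snapshot"

git cat-file -p HEAD            # tree line, author, committer, blank line, message
git cat-file -p 'HEAD^{tree}'   # a blob entry for README and a tree entry for src
git cat-file -t HEAD:README     # blob
git cat-file -t HEAD:src        # tree
```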

Git’s Plumbing

• When you run git add and git commit, Git
  – stores blobs for the files that have changed
  – updates the index file
  – writes out trees
  – writes commit objects that reference the top-level trees
    • and the commits that came immediately before them

• The blob, the tree, and the commit are initially stored as separate files in your .git/objects directory

Visually

[Figure omitted; image credit: git-scm.com]

Git’s HEAD file

• This file is a symbolic reference to the current branch
  – It is to a reference what a symbolic link is to a file
• It points to the reference that you currently have checked out
  – Normally that is a branch
  – If you check out a tag or any other commit directly, HEAD holds that commit’s ID instead (a “detached HEAD”)
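• HEAD is an ordinary text file, so you can simply read it. A sketch in a scratch repository using Git’s default files-based ref storage; the branch name is pinned to master to match these slides, and the identity is made up.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master       # pin the branch name used in the slides
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "one"

on_master=$(cat .git/HEAD)
echo "$on_master"             # ref: refs/heads/master

git checkout -q -b feature
on_feature=$(cat .git/HEAD)
echo "$on_feature"            # ref: refs/heads/feature

git checkout -q --detach      # leave the branch; HEAD now stores a commit ID
cat .git/HEAD
```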

Git Tags

• Annotated tags have their own objects in the database as well
• They are very similar to commit objects in structure
  – Tagger
  – Date
  – Message
  – Pointer to the tagged object (usually a commit)
• Tags are basically just branches that never move
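• A sketch contrasting the two kinds of tag, in a scratch repository. An annotated tag (git tag -a) gets its own object; a lightweight tag is only a ref to the commit. Tag names and identity are invented for the demo.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "one"

git tag -a v1.0 -m "First release"   # annotated tag
git tag v1.0-light                   # lightweight tag

git cat-file -t v1.0                 # tag
git cat-file -t v1.0-light           # commit
git cat-file -p v1.0                 # object, type, tag, tagger, then the message
```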

Git Remotes

• Stored as another type of reference
  – Looks like “origin/master”
  – origin = remote name, master = branch name

• If you add a remote and push to it, then Git will store the value you last pushed to that remote for each branch in the refs/remotes directory

• You can also fetch changes from the remote and have them stored in these references before you merge them
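• A sketch of how a remote-tracking reference appears. A bare repository in a temporary directory stands in for the server here; in real use origin would be a URL. The identity is made up.

```shell
remote=$(mktemp -d)                  # stand-in for a remote server
git init -q --bare "$remote"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master       # pin the branch name used in the slides
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "one"

git remote add origin "$remote"
git push -q origin master

# After the push, origin/master records the value last pushed for master.
git rev-parse master origin/master   # prints the same commit ID twice
```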

Git Packfiles

• Git saves objects on disk initially in a “loose object format”.

• Sometimes, Git packs up several of these objects into a single binary file called a packfile
  – In order to save space and improve efficiency

• Git will automatically do this if you have too many loose objects around, if you run git gc, or if you push to a remote server

• Blobs that are reachable from a commit are put into compressed packfiles; dangling blobs that no commit points to stay loose
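• You can watch loose objects move into a packfile by running git gc by hand. A sketch in a scratch repository with a made-up identity; git count-objects -v reports loose objects as count and packed ones as in-pack.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "one"

git count-objects -v     # the blob, tree, and commit are loose at this point
git gc -q                # pack them up
git count-objects -v     # the objects are now reported under in-pack
ls .git/objects/pack     # a .pack file plus its .idx index
```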


Divergent Workflows

• Oftentimes, when working on a project, development will be done in multiple branches
  – This is usually done to maintain the integrity of the master branch (sometimes called the trunk)

• But what should we do when it’s time to bring these changes back to the trunk?

Merging

• This action creates a new merge commit in the target branch which ties together the histories of both branches

• This is a non-destructive operation
  – The existing branches are not changed in any way

• Unfortunately, this also means that the target branch will have an extraneous commit every time you need to incorporate changes
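• A sketch of a merge between two branches that have diverged, in a scratch repository. Branch, file, and identity names are invented for the demo.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master       # pin the branch name used in the slides
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature work"
feature_tip=$(git rev-parse feature)

git checkout -q master
echo trunk > trunk.txt
git add trunk.txt
git commit -q -m "trunk work"

git merge -q --no-edit feature           # adds a merge commit to master
git log --oneline --graph                # both lines of history, joined at the top
git cat-file -p HEAD | grep '^parent '   # the merge commit has two parents
```

• The feature branch itself is untouched; only master gained a commit.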

Visually

[Figure omitted; image credit: www.atlassian.com]

Rebasing

• This is a neater, less safe alternative to merging
  – $ git checkout feature
  – $ git rebase master

• This will move the entire feature branch to begin on the tip of the master branch
  – Rewriting project history to do so!

• Rebasing gives a much cleaner project history than merging does
  – Don’t have to read through a bunch of merge commits in the project commit log
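• The same diverged history, handled with the two rebase commands instead of a merge. A sketch in a scratch repository; names and identity are invented for the demo.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master       # pin the branch name used in the slides
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature work"
old_tip=$(git rev-parse feature)

git checkout -q master
echo trunk > trunk.txt
git add trunk.txt
git commit -q -m "trunk work"

git checkout -q feature
git rebase -q master         # replay feature's commit on top of master's tip
git log --oneline --graph    # a single straight line, no merge commit
```

• The feature commit now has a new ID, because it was rewritten with a new parent. That rewrite is why rebasing shared commits is risky.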

Visually

[Figure omitted; image credit: www.atlassian.com]

Interactive Rebasing

• You can run rebase interactively by calling it with the -i option
• Otherwise run the command as you normally would
• This can also be run pointing to an arbitrary commit in the history, to rewrite some commits or squash them together

When not to Rebase

• When other people are using the branch!
• For example, never rebase master onto a feature branch if other people are using master
  – You will introduce new commits before the commits currently in the master branch
  – This will be… very confusing for your teammates
• In fact, git won’t even let you push a branch that you’ve rebased in this way (the push is rejected as a non-fast-forward)…
  – Normally, you only want to rebase commits you haven’t pushed yet

Visually – A Bad Rebase

[Figure omitted; image credit: www.atlassian.com]

Bullying Git

• You can force Git to take your rebased branches
• Simply call git push -f <remote> <your branch>
• This can cause quite a mess for the rest of your team, though, so only use this command when you are absolutely sure you know what you are doing
  – Or if you are working on an independent project or branch

Cleaning up a Feature with Rebase

• Rebase can take a commit earlier in the feature branch as its base, as well as a parent branch

• If you specify an earlier commit in the feature, you can clean up just the commits that follow it!
  – Note that this will not incorporate upstream (from the remote) changes

• If you want to rewrite the entire feature using this method, you can get the commit ID of the original base with git merge-base feature master.
  – This returns the base commit that would be used if you were to call git merge; pass this commit ID to rebase for cleanup

Incorporating Upstream

• Once you pull changes from a branch you’re working on collaboratively, your branch history may look like this:

[Figure omitted; image credit: www.atlassian.com]

Incorporating Upstream (with merge)

• You can merge the upstream changes (git pull’s default behavior)

[Figure omitted; image credit: www.atlassian.com]

Incorporating Upstream (with rebase)

• Or you can rebase the changes
  – You can force git to do this with git pull --rebase

[Figure omitted; image credit: www.atlassian.com]

Incorporating Upstream (with rebase)

• This doesn’t violate our assumptions about the branch; we’re only moving our local commits
  – So other people will be able to cleanly and easily incorporate our changes

• Git won’t allow you to violate these assumptions unless you force it to!

• You’re just taking what you did and moving it to the end of what your friend has already done


Undoing Changes

• One of the primary uses for Git (or any VCS, really) is undoing changes to the source, or reverting to an older version

• In Git, there are three modes of undoing

Reset, Checkout, and Revert

• When exactly do we use one of these?
• Each behaves differently
  – The differences are often subtle, however!

• It’s easy to mix up which should be used when…

Scope of Undo

• Note that git reset and git checkout can operate on both full commits and on individual files.
  – git revert can only operate on full commits

• If you pass a file path (or paths) to git reset or git checkout, then you are limiting what they will affect

• Otherwise these two commands will operate on an entire commit

Recall these three scopes

• The working directory
  – This just contains the code as you see it

• Staged snapshot
  – This is the code you’ve edited that you’re ready to commit at the next git commit

• Commit history
  – Git’s database of commit snapshots that you can navigate and add to

Git Reset (Commit Level)

• This moves the tip of the current branch to a different commit
  – A good way to throw away changes that haven’t been shared

• The behavior of this command can be affected with flags
  – --soft: Only move the branch head; don’t update the staged snapshot or the working directory
  – --mixed: Update the staged snapshot as well, but not the working directory (default option)
  – --hard: Update the staged snapshot and the working directory. This is a good way to throw out local changes

• This is most commonly used with the current HEAD
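• A side-by-side sketch of the three flags, each applied to the same two-commit history in a scratch repository. File contents and identity are made up; the short status columns are staging area first, working directory second.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"
echo first > file.txt
git add file.txt
git commit -q -m "first"
echo second > file.txt
git commit -q -am "second"
second=$(git rev-parse HEAD)

git reset -q --soft HEAD~1             # only the branch tip moves back
soft_status=$(git status --short)      # "M  file.txt": the change is still staged
git reset -q --hard "$second"          # restore the starting point

git reset -q --mixed HEAD~1            # branch tip and staged snapshot move back
mixed_status=$(git status --short)     # " M file.txt": the change is only on disk
git reset -q --hard "$second"

git reset -q --hard HEAD~1             # everything moves back
hard_status=$(git status --short)      # empty: nothing left to report
cat file.txt                           # first
```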

Git Reset and updating staging area

• Git Reset can be used to clear out the staging area or to throw away changes you’ve made in your working directory

• git reset HEAD
  – Just unstage everything in the staging area; your edits stay in the working directory

• git reset --hard HEAD
  – Throw away all uncommitted changes, staged or not, so tracked files match the last commit

• Be careful when calling git reset on something other than the HEAD commit
  – Can run into the same problem as with rebasing (rewriting history)

Git Reset (File Level)

• When you run this command with a file path, the staged snapshot will be updated to match the version from the specified commit
  – So it’s a good way to unstage changes to a particular file

• This is most commonly used with the HEAD commit
  – Since there’s no reason to re-commit the current version!

• The reset flags do nothing for the file level operation
  – The staged snapshot is always updated
  – The working directory is never updated
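• A sketch of the file-level reset: the staged copy goes back to the HEAD version while the file on disk keeps your edit. Scratch repository, made-up identity and contents.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"
echo first > file.txt
git add file.txt
git commit -q -m "first"

echo second > file.txt
git add file.txt
git status --short               # "M  file.txt": the edit is staged

git reset -q HEAD -- file.txt    # staged snapshot goes back to HEAD's version
git status --short               # " M file.txt": the edit is unstaged
cat file.txt                     # second (the working directory was not touched)
```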

Git Checkout (Commit Level)

• We’ve used this a lot, but what does it do?
• It behaves a lot like the reset command
  – Except it doesn’t move the branch pointers

• Only the HEAD pointer moves!
• This is useful for quickly inspecting an old version of your project, for example
• Note that checking out a commit (rather than a branch) will put you in a detached HEAD state!
  – In other words, commits you make will not have a branch pointing to them!

• Make sure you check out a new branch before adding commits in a detached HEAD state
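• A sketch of inspecting an old commit and then getting out of the detached HEAD state safely. Scratch repository; inspect-old is a made-up branch name, and the identity is invented.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master       # pin the branch name used in the slides
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"
echo first > file.txt
git add file.txt
git commit -q -m "first"
echo second > file.txt
git commit -q -am "second"

git checkout -q HEAD~1           # look at the old version; master does not move
cat file.txt                     # first

# symbolic-ref fails quietly when HEAD is not attached to a branch
state=$(git symbolic-ref -q HEAD || echo "detached")
echo "$state"                    # detached

git checkout -q -b inspect-old   # give any new commits a branch to live on
git symbolic-ref HEAD            # refs/heads/inspect-old
```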

Git Checkout (File Level)

• git checkout with a file path behaves similarly to git reset, except it also updates the working directory

• When used with the HEAD revision, this works similarly to git reset --hard HEAD, except only on the specified files

• This can also be used to revert to an old version of a file
  – git checkout <commit> -- <files>

• If you check out an old version of a file, it is staged for you; commit it to revert to that old version
  – This effectively throws out all changes to the file since that revision
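• A sketch of restoring one file from an earlier commit and committing the result. Scratch repository, made-up identity and contents.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"
echo first > file.txt
git add file.txt
git commit -q -m "first"
echo second > file.txt
git commit -q -am "second"

git checkout -q HEAD~1 -- file.txt   # old version goes to the index and to disk
cat file.txt                         # first
git status --short                   # "M  file.txt": the old version is staged
git commit -q -m "Restore file.txt to its first version"
```

• History now has three commits; nothing was rewritten, the old content was simply committed again.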

Git Revert

• Reverting undoes a commit by creating a new commit
  – A non-destructive operation
  – This is kind of like the merge operation of undoing

• All this does is create a new commit that does the opposite of a specified commit, to revert it

• It’s a great tool for undoing committed changes
  – As opposed to git reset, which effectively undoes uncommitted changes

• This has the potential to overwrite files in the working directory; you’ll be asked to commit or stash changes that would be lost in the revert operation first
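• A sketch of reverting the most recent commit in a scratch repository. The --no-edit flag accepts the default commit message; identity and contents are made up.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"
echo first > file.txt
git add file.txt
git commit -q -m "first"
echo second > file.txt
git commit -q -am "second"

git revert --no-edit HEAD    # adds a commit that applies the opposite change
git log --oneline            # three commits: first, second, and the revert
cat file.txt                 # first
```

• Both the original commit and its undoing remain in history, which is what makes this safe on shared branches.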

Git Stash

• The stash is a stack in git, which stores your modifications
• Running git stash will store your local changes in the stack, and also reset your working directory to the HEAD revision
  – As though you had run git reset --hard HEAD

• You can then retrieve your changes with git stash pop
• There are also commands to pop a certain stash level, or delete the whole stash, and so on
• See git help stash
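• A sketch of a stash round trip in a scratch repository. Identity and contents are made up.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"
echo first > file.txt
git add file.txt
git commit -q -m "first"

echo "work in progress" >> file.txt
git stash -q                     # push the change onto the stash
during=$(cat file.txt)
echo "$during"                   # first (the file is back at the HEAD version)
git stash list                   # one entry, stash@{0}

git stash pop -q                 # bring the change back and drop the entry
cat file.txt                     # first, then work in progress
```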
