The internals of git
konstantin-nazarov
It’s Like A Filesystem
In many ways you can just see git as a filesystem — it is content-addressable, and it has a notion of versioning.
When you put a file in git, it compresses the data, stores it under .git/objects/,
and names the stored file after the SHA-1 hash of the original data (prefixed with a short header).
The following examples are shell sessions, so you can try them
yourself. Just remember to use your own SHA-1 values,
not mine.
# write a simple file to the object database
$ echo 'homer' | git hash-object -w --stdin
4aa0bfa07f1680c50a1567ecc37bc3b6aa567b8f
# -w means actually write the data, not just hash it.
$ find .git/objects -type f
.git/objects/4a/a0bfa07f1680c50a1567ecc37bc3b6aa567b8f
$ git cat-file -p 4aa0b
homer
$ git cat-file -t 4aa0b
blob
# the object file is zlib-compressed, so open it in binary mode
$ python3
>>> import zlib
>>> f = open('.git/objects/4a/a0bfa07f1680c50a1567ecc37bc3b6aa567b8f', 'rb')
>>> print(zlib.decompress(f.read()))
b'blob 6\x00homer\n'
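The same inspection can be wrapped in a small helper. This is a minimal sketch that only handles loose (unpacked) objects and requires the full 40-digit name; `parse_object` and `read_loose_object` are names invented here, not git APIs.

```python
import os
import zlib


def parse_object(raw: bytes):
    """Split decompressed object data into (type, size, body)."""
    header, _, body = raw.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type, int(size), body


def read_loose_object(git_dir: str, object_id: str):
    """Read objects/<2 hex>/<38 hex> under git_dir and parse it."""
    path = os.path.join(git_dir, "objects", object_id[:2], object_id[2:])
    with open(path, "rb") as f:
        return parse_object(zlib.decompress(f.read()))
```

Called as `read_loose_object('.git', '4aa0bfa07f1680c50a1567ecc37bc3b6aa567b8f')` in the repository from the session above, this should give back the type `blob`, the size 6 and the bytes of `homer` plus a newline.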
# Create directory structure
$ mkdir foo
$ echo "test" > foo/bar
$ echo "test2" > baz
# This is just a way to create the tree
$ git update-index --add foo/bar baz
$ git write-tree
af6c7364afaa4488d8c6edd44306b91b20dcba93
# This is what a plain tree object looks like
$ git cat-file -p af6c7
100644 blob 180cf8328022becee9aaa2577a8f84ea2b9f3827 baz
100644 blob 4200aa606ead5dd5777a0b391f085cc4f4690d04 bigfile.dat
040000 tree 701ce0a12c61f997c092d30121a256d17144766a foo
# And the child tree
$ git cat-file -p 701ce0
100644 blob 9daeafb9864cf43055ae93beb0afd6c7d144bfa4 bar
# And the data file
$ git cat-file -p 9daea
test
In general, a raw (decompressed) tree object is laid out like this:
# format (line breaks added for readability; each SHA-1 is stored as 20 raw bytes):
tree [content size]\0
[mode] [file/folder name]\0[SHA-1 of referenced blob or tree]
...
[mode] [file/folder name]\0[SHA-1 of referenced blob or tree]
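To make the layout concrete, here is a hedged sketch that assembles a raw tree object from a list of entries and hashes it. The helper names `tree_object` and `object_id` are hypothetical, and the sketch assumes the entries are already in the order git expects; it does not sort them.

```python
import hashlib


def tree_object(entries):
    """Build raw tree bytes from (mode, name, hex_sha1) tuples.

    Assumes the caller passes entries in the order git expects.
    """
    body = b""
    for mode, name, hex_sha1 in entries:
        body += mode.encode() + b" " + name.encode() + b"\x00"
        body += bytes.fromhex(hex_sha1)  # 20 raw bytes, not 40 hex characters
    return b"tree %d\x00" % len(body) + body


def object_id(raw: bytes) -> str:
    """An object's name is the SHA-1 of its header plus body."""
    return hashlib.sha1(raw).hexdigest()


if __name__ == "__main__":
    # Hashes taken from this transcript; the raw form stores a directory's
    # mode as 40000, without the leading zero that cat-file -p prints.
    raw = tree_object([
        ("100644", "baz", "df6b0d2bcc76e6ec0fca20c227104a4f28bac41b"),
        ("100644", "bigfile.dat", "4200aa606ead5dd5777a0b391f085cc4f4690d04"),
        ("40000", "foo", "701ce0a12c61f997c092d30121a256d17144766a"),
    ])
    print(raw[:8], object_id(raw))
```

Each entry costs the length of its mode and name plus 22 bytes (a space, a NUL and the 20-byte hash), which is where the content size in the header comes from.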
Let’s try the same trick with Python.
Since part of the data is binary, I’ve done a bit of pretty-printing.
$ python3
>>> import zlib
>>> f = open('.git/objects/46/c826e9c8119915961f6acb01f6f842fb1e444a', 'rb')
>>> d = zlib.decompress(f.read())
>>> # split the "tree <size>" header from the entries at the first NUL byte
>>> head, _, tail = d.partition(b'\x00')
>>> print(head.decode())
>>> while tail:
...     pos = tail.find(b'\x00')
...     # "<mode> <name>", then the 20 raw SHA-1 bytes printed as hex
...     print(tail[:pos].decode() + " " + tail[pos+1:pos+21].hex())
...     tail = tail[pos+21:]
...
Result:
tree 100
100644 baz df6b0d2bcc76e6ec0fca20c227104a4f28bac41b
100644 bigfile.dat 4200aa606ead5dd5777a0b391f085cc4f4690d04
40000 foo 701ce0a12c61f997c092d30121a256d17144766a
# Get the last tree we've created
$ git write-tree
46c826e9c8119915961f6acb01f6f842fb1e444a
# actually do the commit
$ echo '1st commit' | git commit-tree 46c82
afa322a9790619a18ec6e751469008551b3a5c77
# and read back the raw commit file
$ git cat-file -p afa32
tree 46c826e9c8119915961f6acb01f6f842fb1e444a
author Konstantin Nazarov <[email protected]> 1421934034 +0300
committer Konstantin Nazarov <[email protected]> 1421934034 +0300

1st commit
# change the tree
$ echo "test4" > baz
$ git update-index --add baz
$ git write-tree
fb74bbb3f99afed23612d2f03e5cd80775bd2f8a
# commit it (also specify a parent)
$ echo '2nd commit' | git commit-tree fb74b -p afa32
224dde75daa6879629304840aa1fd3a76187aaba
# see how it's changed
$ git cat-file -p 224dd
tree fb74bbb3f99afed23612d2f03e5cd80775bd2f8a
parent afa322a9790619a18ec6e751469008551b3a5c77
author Konstantin Nazarov <[email protected]> 1421934840 +0300
committer Konstantin Nazarov <[email protected]> 1421934840 +0300

2nd commit
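Since a commit is plain text, its fields are easy to pull apart. The sketch below parses the kind of output shown above into a dictionary plus a message. `parse_commit` is a hypothetical helper, and it assumes simple one-line headers; it makes no attempt to handle multi-line headers such as signatures.

```python
def parse_commit(text: str):
    """Split `git cat-file -p <commit>` output into header fields and message."""
    # Headers and message are separated by the first blank line.
    head, _, message = text.partition("\n\n")
    fields = {"parent": []}
    for line in head.split("\n"):
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parent"].append(value)  # a merge commit has several parents
        else:
            fields[key] = value
    return fields, message


if __name__ == "__main__":
    # The author line is a placeholder, not the one from the session above.
    sample = (
        "tree fb74bbb3f99afed23612d2f03e5cd80775bd2f8a\n"
        "parent afa322a9790619a18ec6e751469008551b3a5c77\n"
        "author A U Thor <author@example.com> 1421934840 +0300\n"
        "committer A U Thor <author@example.com> 1421934840 +0300\n"
        "\n"
        "2nd commit\n"
    )
    fields, message = parse_commit(sample)
    print(fields["tree"], fields["parent"], message.strip())
```

Following the `parent` values from commit to commit is all that is needed to walk the history; a commit with an empty parent list is a root commit.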
$ python3
>>> import zlib
>>> f = open('.git/objects/af/a322a9790619a18ec6e751469008551b3a5c77', 'rb')
>>> d = zlib.decompress(f.read())
>>> # show the NUL after the header as a line break
>>> print(d.replace(b'\x00', b'\n').decode())
Result:
commit 197
tree 46c826e9c8119915961f6acb01f6f842fb1e444a
author Konstantin Nazarov <[email protected]> 1421934034 +0300
committer Konstantin Nazarov <[email protected]> 1421934034 +0300

1st commit
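Going the other way, a commit object can be assembled by hand. This is a sketch under the assumption that the commit has only the basic fields shown above; `commit_object` is an invented name, and the identity string in the usage example is a placeholder.

```python
import hashlib


def commit_object(tree, parents, author, committer, message):
    """Assemble raw commit bytes: a "commit <size>" header, NUL, then the text."""
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]
    lines += ["author " + author, "committer " + committer, "", message]
    body = "\n".join(lines).encode() + b"\n"
    return b"commit %d\x00" % len(body) + body


if __name__ == "__main__":
    who = "A U Thor <author@example.com> 1421934034 +0300"  # placeholder identity
    raw = commit_object("46c826e9c8119915961f6acb01f6f842fb1e444a", [], who, who, "1st commit")
    print(hashlib.sha1(raw).hexdigest())
```

Because the author, committer and timestamps are part of the hashed bytes, two people committing the same tree get different commit names, which is why your SHA-1 values will differ from the ones in this transcript.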
Branches are just references to the top commit!
$ cat .git/refs/heads/master
6566bfcd3a111ea6a1cf594301c39c7c4b1baf3c
$ git cat-file -t 6566bf
commit
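Reading a branch is therefore just reading a small text file. The sketch below does exactly that. `read_branch` is a hypothetical helper that only looks at loose ref files, so a branch that git has moved into .git/packed-refs would not be found by it.

```python
import os


def read_branch(git_dir: str, branch: str) -> str:
    """Return the commit id stored in refs/heads/<branch> under git_dir.

    Only loose ref files are handled; packed refs are not looked up.
    """
    with open(os.path.join(git_dir, "refs", "heads", branch)) as f:
        return f.read().strip()


if __name__ == "__main__":
    # Assumes the current directory is a repository with a master branch.
    print(read_branch(".git", "master"))
```

Moving a branch amounts to writing a different commit id into that file, which is why creating a branch in git is so cheap.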