three + 1 different data representations

I’m currently implementing the git status command, which compares different states of the workspace. Three places in storage (as opposed to in memory) hold information about the files.

  1. The actual files of the current version

  2. The object database that is saved under .git/objects

    The objects stored are blobs, trees and commits, each compressed with the deflate algorithm. So far, I’ve noticed that the database does not hold FileStat data. I think of it as a giant HashMap with the SHA-1 hash as the key and the contents as the value. A tree corresponds to a directory and contains a list of other trees and blobs. This recursive data structure was a bit tricky to express in Rust.

  3. The staged files in the index file

    These are the files in the “staged” area, which are added when we run the git add command. The index does not hold any file contents, just a list of entries.
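
The object database described in item 2 could be sketched roughly as follows. This is a minimal illustration and not the actual implementation: the names `Oid`, `TreeEntry`, `Object`, `Database` and `list_files` are made up for the example, the ids are placeholder strings rather than real SHA-1 hashes, and compression is left out entirely. Storing the ids of the children in a tree, instead of the child objects themselves, is one way to avoid a directly recursive type in Rust.

```rust
use std::collections::HashMap;

/// Hex-encoded object id (hypothetical alias; a real id is a SHA-1 hash).
type Oid = String;

/// One entry of a tree: only the last path component plus the child's id.
#[derive(Debug, Clone, PartialEq)]
struct TreeEntry {
    name: String,
    oid: Oid,
}

/// The three kinds of objects kept in the database (simplified fields).
#[derive(Debug, Clone, PartialEq)]
enum Object {
    Blob(Vec<u8>),
    Tree(Vec<TreeEntry>),
    Commit { tree: Oid, message: String },
}

/// The database seen as a map from object id to object.
#[derive(Default)]
struct Database {
    objects: HashMap<Oid, Object>,
}

impl Database {
    fn insert(&mut self, oid: &str, object: Object) {
        self.objects.insert(oid.to_string(), object);
    }

    /// Walk a tree recursively and collect (full path, blob id) pairs.
    fn list_files(&self, tree_oid: &str, prefix: &str, out: &mut Vec<(String, Oid)>) {
        if let Some(Object::Tree(entries)) = self.objects.get(tree_oid) {
            for entry in entries {
                // The tree only knows the child's name, so the full path
                // has to be rebuilt while descending.
                let path = if prefix.is_empty() {
                    entry.name.clone()
                } else {
                    format!("{}/{}", prefix, entry.name)
                };
                match self.objects.get(&entry.oid) {
                    Some(Object::Tree(_)) => self.list_files(&entry.oid, &path, out),
                    Some(Object::Blob(_)) => out.push((path, entry.oid.clone())),
                    _ => {} // missing objects and commits are skipped here
                }
            }
        }
    }
}
```

Calling `list_files` on a root tree id with an empty prefix yields a flat list of paths, which is a convenient shape for comparing against the index.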

On top of this, the program defines models for each of these representations and keeps the information in memory while executing commands. git status runs two comparisons: the first compares the files (1) with the index (3), and the second compares the database (2) with the index (3). It was a bit confusing, since the different formats involve different models and hold different data.
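
The two comparisons can be sketched like this. It is a simplified illustration under several assumptions: `Stat`, `IndexEntry`, `Change`, `workspace_changes` and `staged_changes` are hypothetical names, the stat data is reduced to two fields, and the database side is assumed to be already flattened into a map from path to hash. A stat mismatch is reported as a modification right away, whereas a fuller implementation would probably hash the file contents before deciding.

```rust
use std::collections::BTreeMap;

/// Reduced file stat data (hypothetical subset of the real fields).
#[derive(Debug, Clone, PartialEq)]
struct Stat {
    size: u64,
    mtime: u64,
}

/// An index entry holds a hash and stat data, but no file contents.
#[derive(Debug, Clone, PartialEq)]
struct IndexEntry {
    oid: String,
    stat: Stat,
}

#[derive(Debug, PartialEq)]
enum Change {
    Added,
    Modified,
    Deleted,
    Untracked,
}

/// Comparison 1: files (1) against the index (3), based on stat data.
fn workspace_changes(
    workspace: &BTreeMap<String, Stat>,
    index: &BTreeMap<String, IndexEntry>,
) -> Vec<(String, Change)> {
    let mut changes = Vec::new();
    for (path, entry) in index {
        match workspace.get(path) {
            None => changes.push((path.clone(), Change::Deleted)),
            Some(stat) if *stat != entry.stat => changes.push((path.clone(), Change::Modified)),
            Some(_) => {}
        }
    }
    // Files on disk that the index does not know about.
    for path in workspace.keys() {
        if !index.contains_key(path) {
            changes.push((path.clone(), Change::Untracked));
        }
    }
    changes
}

/// Comparison 2: database (2) against the index (3), based on hashes.
/// `head` maps each path in the committed tree to its blob hash.
fn staged_changes(
    head: &BTreeMap<String, String>,
    index: &BTreeMap<String, IndexEntry>,
) -> Vec<(String, Change)> {
    let mut changes = Vec::new();
    for (path, entry) in index {
        match head.get(path) {
            None => changes.push((path.clone(), Change::Added)),
            Some(oid) if *oid != entry.oid => changes.push((path.clone(), Change::Modified)),
            Some(_) => {}
        }
    }
    // Paths in the committed tree that are no longer in the index.
    for path in head.keys() {
        if !index.contains_key(path) {
            changes.push((path.clone(), Change::Deleted));
        }
    }
    changes
}
```

The first function never looks at hashes and the second never looks at stat data, which mirrors the fact that each storage place only holds part of the information.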

models

| | 1. Files | 2. Database | 3. Index |
|---|---|---|---|
| Blob | Yes | Yes | Yes |
| Tree (directory) | Yes | Yes | No |
| Commit | No | Yes | No |

data

| | 1. Files | 2. Database | 3. Index |
|---|---|---|---|
| contents of the data | Yes | Yes | No |
| path | Yes | Only the child's file name [1] | Yes |
| file stat | Yes | No | Yes |
| sha1 hash | No | Yes | Yes |

  1. The last component of the path; it is the directory’s name if the entry is a folder.