Inside the .git directory

Introduction

The goal of this article is to demystify the .git directory a little. It’s not hard to pick up a basic git workflow (create a branch, change some files, commit, and push), but people often feel intimidated by git once the task at hand goes beyond those few simple commands.

Note: This article does assume that the reader has some basic familiarity with git already.

Basic orientation

Before diving into it, let’s touch on a few important concepts:

The repository: technically, the repository itself is the contents of the .git directory, not the files outside it. This may sound counterintuitive, but the files and directories next to .git make up the working directory, which is entirely optional for git. It’s possible to have a git repository that consists of nothing but the contents of the .git directory. This is called a bare repository, and it’s very common for remote repositories on servers to be bare.
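To see the difference for ourselves, here is a minimal sketch that creates a regular and a bare repository side by side. The scratch directory and the names regular and bare.git are made up for the illustration.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)                     # scratch directory, removed on exit
trap 'rm -rf "$tmp"' EXIT

git init -q "$tmp/regular"           # repository (.git) plus a working directory
git init -q --bare "$tmp/bare.git"   # repository only, no working directory

ls -a "$tmp/regular"                 # shows .git; project files would sit next to it
ls "$tmp/bare.git"                   # HEAD, config, objects, refs, ... at the top level

regular_is_bare=$(git -C "$tmp/regular" rev-parse --is-bare-repository)
bare_is_bare=$(git -C "$tmp/bare.git" rev-parse --is-bare-repository)
echo "regular: $regular_is_bare, bare: $bare_is_bare"
```

In the bare repository, the files we would normally expect inside .git sit directly at the top level, which is why such directories are conventionally named with a .git suffix.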

The working directory: these are the actual project files, outside of the .git directory. By default, the working directory reflects the latest commit on the current branch. Once git checkout is issued for another branch or commit, it reflects the state of that commit instead.

The index (staging area): this is the file that sits between the working directory and the repository. When something in the working directory is modified, git knows that a change exists, but it is up to the user to issue a git add to move it to the staging area. That’s when the added files are flagged for inclusion in the next commit. Basically, think of it as “the proposed contents of the next commit”.
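A small sketch of that hand-over, assuming a throwaway repository and a made-up file called notes.txt:

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/repo"
cd "$tmp/repo"

echo "first draft" > notes.txt             # hypothetical project file
untracked=$(git status --porcelain)        # git notices the file, but it is not staged
echo "before git add: $untracked"

git add notes.txt                          # move the change into the staging area
staged=$(git diff --cached --name-only)    # paths that would go into the next commit
echo "after git add:  $staged"

git ls-files --stage                       # mode, blob id, stage number and path from .git/index
```

Nothing has been committed at this point. The staged state lives in the .git/index file, and git ls-files --stage is one way to peek at it.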

The .git directory

As established, this directory is the actual repository, so let’s dive into one:

root@linuxpc:/datadisk/git/repo01# (master) ls .git
branches  hooks  info  logs  objects  refs  COMMIT_EDITMSG  config  description  HEAD  index  ORIG_HEAD  packed-refs

Let’s unpack these one by one, in no particular order:

config: this is a text file that contains the configuration for the current repository. This is where a repository can be set to bare, where the URL of the remote repository is configured, where the branch tracking rules live (which local branch maps to which remote branch), and where any local overrides of the user’s identity go.

This is not an exhaustive list, and there’s a lot of other fun stuff like sparse-checkout (changing the working directory from having all tracked files present to having just a subset).
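As a rough illustration, the commands below write a few of these settings and then print the resulting file. The identity and the remote URL are placeholders, and nothing is contacted over the network.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/repo"
cd "$tmp/repo"

# Local overrides of the user's identity (placeholder values).
git config --local user.name "Example Committer"
git config --local user.email "committer@example.com"

# Register a remote; the URL is hypothetical and is only stored, not fetched.
git remote add origin https://example.com/project.git

cat .git/config                            # plain text, grouped into [sections]

bare=$(git config --get core.bare)
url=$(git config --get remote.origin.url)
echo "core.bare=$bare remote.origin.url=$url"
```

Every git config call here is just a convenient way of editing .git/config, and editing the file by hand has the same effect.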

HEAD: this file either contains a reference to a branch (for example ref: refs/heads/master), or it directly contains a commit ID. Which of the two depends on whether the user is on a branch (i.e. git status says On branch followed by the branch name) or in a detached HEAD state, pointing directly to a commit.
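Both states are easy to reproduce in a scratch repository. The sketch below assumes nothing beyond a working git installation, and the identity is a placeholder.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.name "Example"             # placeholder identity for the test commit
git config user.email "example@example.com"
git commit -q --allow-empty -m "first commit"

on_branch=$(cat .git/HEAD)                 # a line starting with "ref: refs/heads/"
echo "on a branch: $on_branch"

git checkout -q --detach                   # stay on the same commit, but leave the branch
detached=$(cat .git/HEAD)                  # now a plain commit id
echo "detached:    $detached"
```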

description: a small text file where a description of what the repository represents can be given. Web and GUI frontends (gitweb, for example) can make use of this.

info: the info directory is less commonly touched directly. But, for example, if the earlier mentioned sparse checkout feature is used (e.g. via git config core.sparseCheckout true), then a file called sparse-checkout can be created inside it, listing patterns for the files and directories that should remain in the working directory.
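Here is a hedged sketch of that manual route, using a scratch repository with two made-up directories, docs and src. In this sketch the sparse-checkout file lists docs/, and only that directory stays in the working directory after git read-tree re-applies HEAD.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"

mkdir docs src                             # hypothetical project layout
echo "guide" > docs/guide.txt
echo "code" > src/main.c
git add .
git commit -q -m "initial import"

git config core.sparseCheckout true
mkdir -p .git/info
echo "docs/" > .git/info/sparse-checkout   # one pattern per line
git read-tree -mu HEAD                     # re-apply HEAD to the working directory

ls                                         # what is left in the working directory
```

Both directories are still part of every commit; only the working directory has been narrowed. Newer git versions also ship a git sparse-checkout command that manages this file for us.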

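hooks: this is the first directory with some more significant substance to it. Scripts placed here are run by git before or after certain operations, such as committing, merging, or pushing. A hook only runs if the file is named exactly after the hook (without the .sample suffix) and is executable.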

Git usually comes with a few samples, to give an idea of the type of hooks that are possible:

root@linuxpc:/datadisk/git/repo01# (master) ls .git/hooks
applypatch-msg.sample  fsmonitor-watchman.sample  pre-applypatch.sample  pre-merge-commit.sample    pre-push.sample    pre-receive.sample       update.sample
commit-msg.sample      post-update.sample         pre-commit.sample      prepare-commit-msg.sample  pre-rebase.sample  push-to-checkout.sample

Sidenote on git hooks

There are a few ways I find these useful, and I’ll give two brief examples. The first is a pre-commit hook:

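#!/bin/bash
# pre-commit hook: strip a local-only setting from staged .cnf files.

SETTING_REGEX='^(enforce_read_only_slaves=).+$'

# Read the staged file names line by line, so names with spaces survive.
git diff --cached --name-only |
while IFS= read -r FILE; do
    # Only touch .cnf files that still exist in the working directory.
    if [[ "$FILE" == *.cnf && -f "$FILE" ]]; then
        sed -Ei "/${SETTING_REGEX}/d" "$FILE"
        git add -- "$FILE"
        echo "$FILE re-added due to setting removal..."
    fi
done

exit 0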

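It deletes the lines that match a regex from the staged .cnf files, and adds the files again so that the cleaned-up version is what ends up in the commit. The second example is a post-merge hook: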

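#!/bin/bash
# post-merge hook: restore the ownership of files changed by the merge.

OWNER="apache:apache"
DIR_PATH="/etc/httpd/htdocs/<application directory>"

cd "$DIR_PATH" || exit 1
# Hooks can run with GIT_DIR set; unset it so git finds the repo from $DIR_PATH.
unset GIT_DIR

# Files that differ between the previous HEAD and the HEAD after the merge.
git diff-tree -r --name-only --no-commit-id 'HEAD@{1}' HEAD |
while IFS= read -r file; do
  [ -e "$file" ] && chown "$OWNER" "$file"
done

echo "post-merge ownership processing done..."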

It basically gets the list of files changed during a git pull, and resets their ownership.

If a hook turns out to be useful across many git repositories, it can be placed in a template directory such as $HOME/.git_template/hooks/. Once git is pointed at that directory (for example via the init.templateDir setting), the hook is copied into every newly created or cloned repo.
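The mechanism can be tried out without touching the real home directory. In this sketch a temporary directory stands in for the template directory, and the hook itself is a made-up one-liner.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

template="$tmp/git_template"               # stand-in for a directory under $HOME
mkdir -p "$template/hooks"
cat > "$template/hooks/pre-commit" <<'EOF'
#!/bin/sh
echo "pre-commit hook from the template ran"
EOF
chmod +x "$template/hooks/pre-commit"      # hooks must be executable to run

# --template applies to this one call; init.templateDir would make it the default.
git init -q --template="$template" "$tmp/newrepo"

ls -l "$tmp/newrepo/.git/hooks"            # the hook was copied into the new repo
```

To make this the default for every new repository, something along the lines of git config --global init.templateDir followed by the chosen path should do. Existing repositories are not changed by this.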

With that note out of the way, let’s continue with the rest of the .git directory:

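objects: this directory essentially stores all the repository content, as a file-system based key-value store. The key is the object’s hash: the first two characters name a sub-directory, and the remaining characters name the file inside it. The value is the compressed content of the object.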

The value can be of four main types: commit, tree, blob and tag.

Each of these is worth talking about on its own merit, as there’s quite a lot to unpack here.
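Before looking at each type, here is a small sketch of the key-value idea itself. It stores the text hello as an object by hand and then finds it again on disk; the scratch repository is only there for the illustration.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/repo"
cd "$tmp/repo"

# Store some content as an object; git prints the key (the object id).
blob_id=$(echo "hello" | git hash-object -w --stdin)

# The key maps to a path: first two characters = directory, the rest = file name.
dir=$(echo "$blob_id" | cut -c1-2)
file=$(echo "$blob_id" | cut -c3-)
object_path=".git/objects/$dir/$file"
ls -l "$object_path"

type=$(git cat-file -t "$blob_id")         # the kind of object stored under the key
content=$(git cat-file -p "$blob_id")      # the value, decompressed
echo "$blob_id is a $type containing: $content"
```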

Commit

Let’s check a commit so that we can explore this:

commit 40db5b2c3155e0056d3f2fb38a672ad2eb29bb87 (HEAD -> master, origin/master)
Author: root 
Date:   Sat Oct 4 22:44:13 2025 +0200

    remove redundant explainer

So a commit with the hash 40db5b2c3155e0056d3f2fb38a672ad2eb29bb87 was made. We can use git cat-file to inspect the object behind it:

root@linuxpc:/datadisk/git/repo01# (master) git cat-file -p 40db5b2c3155e0056d3f2fb38a672ad2eb29bb87
tree 3881df217cd25cf597f86e71ffa8c5c210e65b6b
parent bf3da7913687524f683906bc6cb6420515bc5725
author root  1759610653 +0200
committer root  1759610653 +0200

remove redundant explainer

What this tells us is that a commit object is just a pointer to a tree object with some additional information: the parent commit, the author and committer, the commit time, the commit message, and a signature, if there is one. Here, the commit points to the tree 3881df217cd25cf597f86e71ffa8c5c210e65b6b.
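The same relationship can be checked in any repository. The sketch below builds a one-commit scratch repo (the file name and identity are placeholders) and compares the tree line of the commit object with what git rev-parse reports.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"

echo "hello" > README
git add README
git commit -q -m "add README"

git cat-file -p HEAD                       # raw commit object: tree, author, committer, message

type=$(git cat-file -t HEAD)
tree_from_commit=$(git cat-file -p HEAD | sed -n '1s/^tree //p')
tree_from_revparse=$(git rev-parse 'HEAD^{tree}')
echo "$type points to tree $tree_from_commit"
```

Since this is the first commit in the scratch repo, its object has no parent line at all.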

Let’s explore this tree.

Tree

root@linuxpc:/datadisk/git/repo01# (master) git cat-file -p 3881df217cd25cf597f86e71ffa8c5c210e65b6b
100644 blob 1b58029ebb69d4119920101f0ef9367c20879815    LICENSE
100644 blob 0c03f81996732db9c2b469b3c37cff1b0591df8b    Makefile
100644 blob 02b7ee29dd0d4f07811e8fd553857c962c86ff16    README
100644 blob f97a69b8497848184d7f1a2dfc39b0116d790e00    compat.h
...

The output lists the files in the top-level directory of the project at the time of the commit. That is pretty accurate as to what a tree generally is: a snapshot of a directory listing. If the project contains sub-directories, each one shows up as an entry of type tree that points to another tree object.

What all of this shows us so far is that a commit isn’t really a patch or a set of changes, but a complete snapshot of the entire project at a specific point in time.
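To see a nested tree, a project needs at least one sub-directory. This sketch uses a scratch repository with a made-up src directory and lists the tree at both levels.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"

mkdir src                                  # hypothetical layout
echo "int main(void) { return 0; }" > src/main.c
echo "docs" > README
git add .
git commit -q -m "initial import"

top=$(git ls-tree HEAD)                    # top level: README is a blob, src is a tree
nested=$(git ls-tree --name-only HEAD src/)   # entries of the tree behind src
printf '%s\n' "$top"
echo "inside src: $nested"
```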

Blob

In the above tree listing, we can see multiple “blobs”. Basically, a blob object stores the content of a file without metadata like the filename, extension, timestamps, or permissions. That metadata is not stored here because it’s either already present in the tree that references the blob (the name and the file mode), or doesn’t need to be tracked by git at all, such as creation timestamps.

Looking into a blob, we can basically recover the file at the time of the relevant commit:

root@linuxpc:/datadisk/git/repo01# (master) git cat-file -p 0c03f81996732db9c2b469b3c37cff1b0591df8b | sed 8q
.POSIX:

NAME = 
VERSION = 

# paths
PREFIX = /usr/local
MANPREFIX = ${PREFIX}/man
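One consequence of blobs carrying only content is that two files with identical content share a single blob, whatever they are called. A quick sketch with two made-up file names:

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/repo"
cd "$tmp/repo"

echo "same content" > one.txt              # hypothetical files with identical content
cp one.txt two.txt
git add one.txt two.txt

id_one=$(git hash-object one.txt)          # blob id computed from the content alone
id_two=$(git hash-object two.txt)
git ls-files --stage                       # both index entries carry the same blob id

content=$(git cat-file -p "$id_one")
echo "one.txt -> $id_one"
echo "two.txt -> $id_two"
```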

Tag

Tags are pointers to other git objects, most commonly to commits, with additional metadata like annotations, tagger information, and a message.

The main idea of tags is that they are (generally speaking) static. Once a commit is tagged, the label sticks with it forever. This is useful when a deployment process outside of git needs to hook into git, because it gets a stable name to track that doesn’t depend on where the branches currently point.
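The tag objects described above are what git tag -a creates. A plain git tag without -a creates a so-called lightweight tag, which is only a reference and carries no metadata of its own. The sketch below shows the difference; the tag names and the identity are placeholders.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.name "Example"             # placeholder identity, also used as tagger
git config user.email "example@example.com"
git commit -q --allow-empty -m "first commit"

git tag v0.1-light                         # lightweight: a ref straight to the commit
git tag -a v1.0 -m "first release"         # annotated: a tag object with tagger and message

light_type=$(git cat-file -t v0.1-light)   # type of the object the ref names
annot_type=$(git cat-file -t v1.0)
git cat-file -p v1.0                       # object, type, tag, tagger, message

target=$(git rev-parse 'v1.0^{commit}')    # peel the tag object to the commit behind it
echo "lightweight: $light_type, annotated: $annot_type"
```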

refs: the best way to understand this directory is to think about what some of the basic git commands actually do for us.

When a repository is cloned, we know that it results in a local repository with a default branch (master here), which is linked to the remote master branch through a remote connection named “origin”, along with all of the remote branches.

Initially, the working directory of the repo would only contain a copy of the files that are present in the remote master branch. Although the contents of the remote branches are cloned and present in the local repo, their file contents aren’t initially visible.

What this tells us is that, on top of keeping track of local branches, there must be some way to keep track of and access the contents of remote branches.

This is where references (refs) come into the picture.

As mentioned before, cloning a repo creates an “origin” pointing back to the remote repository. Git stores these remote-tracking branches as references, and updates the config file accordingly.

So basically, we can link all of this back to commands we’re already familiar with:

git branch -> basically lists the contents of .git/refs/heads/ and its sub-directories

git branch -r -> basically lists the contents of .git/refs/remotes/ and its sub-directories

git tag -l -> basically lists the contents of .git/refs/tags/

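The main takeaway is that, in a simplified way, git commands just walk through the contents of those directories to know what branches, tags, and remote branches are available. (Simplified, because refs can also be stored in the packed-refs file that showed up in the directory listing earlier.)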

We can always take a look manually:

root@linuxpc:/datadisk/git/repo01# (master) cat .git/refs/heads/master
40db5b2c3155e0056d3f2fb38a672ad2eb29bb87

What this shows is that a branch is just a text file containing the id of the commit it currently points to. So, basically:

branch -> points to a commit id, which effectively represents the current state of things in that branch.

commit -> a snapshot of the project at a particular moment in time. When a commit is created, a new commit object is added, and the branch reference is updated to point to it. Each commit records its parent, pointing to the previous commit, which is how changes over time can be traced.
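We can watch the branch file move as commits are made. This sketch assumes the default file-based ref storage, where a fresh branch lives as a loose file under .git/refs/heads/. The identity is a placeholder.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"

git commit -q --allow-empty -m "first commit"
branch=$(git symbolic-ref --short HEAD)    # master or main, depending on the git setup
first=$(git rev-parse HEAD)
ref_before=$(cat ".git/refs/heads/$branch")

git commit -q --allow-empty -m "second commit"
second=$(git rev-parse HEAD)
ref_after=$(cat ".git/refs/heads/$branch") # the same file now holds the new commit id
parent=$(git rev-parse 'HEAD^')            # the parent recorded in the second commit

echo "$branch: $ref_before -> $ref_after (parent of new commit: $parent)"
```

The branch file changed from the first commit id to the second, while the second commit points back to the first through its parent line.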