When I first used Git and GitHub, like many others I thought they were the same thing: Git was the local version and GitHub the remote version, which just passed our files from VS Code to GitHub so that I didn’t have to worry about storage, backups, and data.
BUT I WAS WRONG, AND MANY PEOPLE STILL GET IT WRONG.
Then what is Git?
git help git
Yes, these are the words of Linus Torvalds: Git is a “stupid content tracker”. He designed it from a filesystem perspective. It’s not like traditional SCMs but more like a versioned filesystem that is “content addressable”.
“I really really designed it coming at the problem from the viewpoint of a filesystem person… and I actually have absolutely zero interest in creating a traditional SCM system.”
~ Linus Torvalds
Why do you think he compared it with SCMs, and how is it different?
Let’s have a look. (Honestly, I haven’t used any other SCMs, but this is what I have read about them.)
He called it a content-addressable filesystem because, unlike other SCMs, Git’s object store doesn’t look data up by filename or date. It stores each piece of content under a hash computed from that content and retrieves it by that hash.
Git’s characteristics are:
You give it content, and it stores it under a unique hash.
You can address that content using the hash.
eg: you store: "Arma vs Code"
Hashcode: a1b2c3d4...
//Later if you ask using the hashcode
you ask: a1b2c3d4
git returns: "Arma vs Code"
So now can we say that Git is a storage system where content is addressed and stored by its own fingerprints?
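To make the idea concrete, here is a minimal Python sketch of a content-addressable store. The hashing follows how Git names a blob by default (SHA-1 over a small header plus the content); the `ContentStore` class and its `put`/`get` methods are hypothetical names I made up for illustration, not part of Git.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 name Git gives a blob holding this content."""
    # Git hashes "blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ContentStore:
    """Toy in-memory store keyed by content hash (hypothetical helper)."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        key = git_blob_hash(content)
        self._objects[key] = content  # same content always lands on the same key
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = ContentStore()
key = store.put(b"Arma vs Code")
print(key)             # the fingerprint of the content
print(store.get(key))  # asking with the fingerprint returns the content
```

Notice that the key is derived only from the content, so there is no filename anywhere in this store.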
This is what the Git folder structure looks like when you initialize a repository in any directory.
This is how it looks after you commit your work.
Now you may wonder: what are these files and folders, and why do we need all of them?
*Let’s see how a single commit travels from our codebase to Git’s database:*
What happens when it takes a snapshot? We haven’t yet seen how exactly Git stores it in the database, so let’s have a look at that:
STEP 1: Blob Objects
A blob is Git’s way of storing the CONTENT of a file, without any metadata like filename, permissions, or directory structure.
Blobs help manage space efficiently: if the same file content appears in multiple places, Git stores it only once.
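Here is a rough sketch of why identical content is stored only once. It mimics the loose-object layout (zlib-compressed data under `objects/<first 2 hex chars>/<remaining 38>`), but it writes into a temporary directory rather than a real repository, and `write_blob` is a hypothetical helper name.

```python
import hashlib
import os
import tempfile
import zlib


def write_blob(objects_dir: str, content: bytes) -> str:
    """Store content as a loose blob object and return its hash."""
    data = b"blob " + str(len(content)).encode() + b"\0" + content
    sha = hashlib.sha1(data).hexdigest()
    path = os.path.join(objects_dir, sha[:2], sha[2:])
    if not os.path.exists(path):  # identical content is already stored
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(data))
    return sha


objects_dir = tempfile.mkdtemp()  # stand-in for .git/objects
first = write_blob(objects_dir, b"same content\n")   # e.g. a.txt
second = write_blob(objects_dir, b"same content\n")  # e.g. copy/b.txt
stored = [name for _, _, files in os.walk(objects_dir) for name in files]
print(first == second, len(stored))
```

Two "files" with the same content produce the same hash, so only one object ends up on disk.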
STEP 2: Tree Objects
Imagine Git only had blobs. Then it would look something like this:
Blob_123: “function hello() {“
Blob_456: ” return ‘world’;”
Blob_789: “}”
But you’d have NO WAY to know:
which files these belong to, how they’re organized in directories, and what the project structure looks like.
Trees solve this by creating a “map” of your project. Trees are why Git can track your entire project structure, not just individual files.
Without trees: “I have file content X, Y, Z”
With trees: “At 3:15 PM, my project looked exactly like THIS”
Think of a tree object like a folder contents listing or a table of contents:
NEW: blob for main.js (new content)
NEW: tree for src/ (points to new main.js blob, same utils.py blob)
REUSE: tree for docs/ (unchanged)
REUSE: blob for server.js (unchanged)
REUSE: blob for README.md (unchanged)
REUSE: blob for config.json (unchanged)
NEW: root tree (points to new src/ tree, same docs/ tree)
Trees help Git compare faster. Only changed objects are created! Unchanged parts are reused, which is what makes Git’s snapshot-based storage efficient. A tree object contains a list of entries, each with a mode, an object type (blob or tree), the object’s hash, and a name.
Trees reveal how Git thinks about your project as a hierarchical structure of content, not just a collection of files!
This is why Linus designed Git this way: it’s fundamentally a filesystem that takes complete snapshots, not a version control system that tracks changes. The tree object is what makes this snapshot approach possible and efficient.
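The NEW/REUSE idea can be sketched in Python. This follows the tree object format as I understand it (each entry is `<mode> <name>\0` plus the 20 raw hash bytes, sorted by name, with directories compared as if their name ended in `/`). The file contents and the `snapshot` helper are made up for illustration.

```python
import hashlib


def hash_object(kind: bytes, body: bytes) -> str:
    """Hash an object the way Git does: '<kind> <size>\\0' + body."""
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_hash(entries) -> str:
    """entries: list of (mode, name, hex_sha); '40000' marks a subtree."""
    def sort_key(entry):
        mode, name, _ = entry
        return (name + "/").encode() if mode == "40000" else name.encode()

    body = b""
    for mode, name, sha in sorted(entries, key=sort_key):
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(sha)
    return hash_object(b"tree", body)


# Hypothetical file contents, only here to get some blob hashes.
utils = hash_object(b"blob", b"def helper(): pass\n")
readme = hash_object(b"blob", b"# demo\n")


def snapshot(main_content: bytes):
    main = hash_object(b"blob", main_content)
    src = tree_hash([("100644", "main.js", main), ("100644", "utils.py", utils)])
    docs = tree_hash([("100644", "README.md", readme)])
    root = tree_hash([("40000", "src", src), ("40000", "docs", docs)])
    return src, docs, root


old_src, old_docs, old_root = snapshot(b"console.log('v1');\n")
new_src, new_docs, new_root = snapshot(b"console.log('v2');\n")
print(old_docs == new_docs)  # docs/ tree is reused
print(old_src == new_src)    # src/ tree is new
```

Only `main.js` changed, so only its blob, the `src/` tree, and the root tree get new hashes. Everything else keeps its old hash and is simply pointed to again.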
STEP 3: Creating the Commit Object
What is a commit?
A commit is nothing but a picture of your content at a point in time, which is also called a snapshot.
This is where Git creates the actual commit that will become part of your project’s history. Let me break down exactly what happens:
What Gets Stored in the Commit Object
The commit object’s content is plain text (stored compressed, like every other object) and contains:
tree 8f4a8b8e7d8e8f8a8b8e7d8e8f8a8b8e7d8e8f8a
parent a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2
author Arma Sahar 1698765432 +0000
committer Arma Sahar 1698765432 +0000
The commit object is the crucial piece that ties your snapshot to the project’s history and provides all the metadata needed to understand who did what and when.
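As a sketch, this is roughly how the commit’s own hash comes out of that text. The tree and parent values are the ones from the example above; the email address and the commit message are placeholders I added, because a real commit also carries an email in the author/committer lines and a message after a blank line. `build_commit` is a hypothetical helper.

```python
import hashlib


def build_commit(tree, parent, who, timestamp, tz, message):
    """Return (hash, body) for a commit object with at most one parent."""
    lines = [f"tree {tree}"]
    if parent is not None:  # the very first commit has no parent line
        lines.append(f"parent {parent}")
    lines.append(f"author {who} {timestamp} {tz}")
    lines.append(f"committer {who} {timestamp} {tz}")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    header = b"commit " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest(), body


who = "Arma Sahar <arma@example.com>"  # placeholder email (assumption)
sha, body = build_commit(
    "8f4a8b8e7d8e8f8a8b8e7d8e8f8a8b8e7d8e8f8a",
    "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2",
    who, 1698765432, "+0000",
    "Update main.js",                  # placeholder message (assumption)
)
print(body.decode())
print(sha)
```

Because the parent hash is part of the hashed text, changing anything earlier in history changes every commit hash after it.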
Now that we have all the details ready, Git needs to know where to attach the new commit in your project’s history. That’s when it needs to consult HEAD.
STEP 4: Update HEAD
But what is HEAD?
HEAD is Git’s way of answering the question: “Where am I right now in my project’s history?”
What does HEAD contain?
On the main branch, the .git/HEAD file contains ref: refs/heads/main. This means I am currently on my main branch; HEAD holds the location I am at.
Git uses the HEAD to determine where to record the new commit. If HEAD points to a branch, then that branch is updated to point to the new commit. If HEAD is detached, then HEAD itself is updated to the new commit.
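The decision described above can be sketched like this. It only parses the text of HEAD and says which file would receive the new commit hash; `where_to_record` is a hypothetical name, and real Git has more cases than this.

```python
def where_to_record(head_text: str) -> str:
    """Return the path (relative to .git/) that gets the new commit hash."""
    text = head_text.strip()
    if text.startswith("ref: "):
        # HEAD points at a branch: the branch file is updated.
        return text[len("ref: "):]
    # Detached HEAD: HEAD itself holds a commit hash and is updated.
    return "HEAD"


print(where_to_record("ref: refs/heads/main\n"))
print(where_to_record("a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2\n"))
```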
STEP 5: Update branch reference
In Git, a branch is essentially a pointer to a commit. The branch reference is stored in the .git/refs/heads/ directory as a file named after the branch.
What “Update Branch Reference” Means
When we “update the branch reference,” we’re simply changing the content of that text file to point to a new commit.
If Git created the commit but didn’t update the branch reference:
The commit would exist in the database -> But no branch would point to it -> It would be “lost” (hard to find) in the object database -> git log wouldn’t show it -> It could be garbage collected eventually.
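A tiny simulation shows why that pointer matters. Commits are modelled as a dictionary from a commit id to its parent, and refs as a dictionary from branch name to commit id. The short ids `c1`, `c2`, `c3` and the `log` function are made up for illustration and are not how Git stores anything.

```python
def log(commits, refs, branch):
    """Walk from the branch tip back through parents, like a simple git log."""
    history = []
    current = refs.get(branch)
    while current is not None:
        history.append(current)
        current = commits[current]  # step to the parent
    return history


commits = {"c1": None, "c2": "c1"}
refs = {"refs/heads/main": "c2"}

commits["c3"] = "c2"                        # the new commit exists in the database
before = log(commits, refs, "refs/heads/main")

refs["refs/heads/main"] = "c3"              # update the branch reference
after = log(commits, refs, "refs/heads/main")

print(before)
print(after)
```

Before the reference is updated, the walk starts at `c2` and never reaches `c3`, even though `c3` is sitting in the database.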
Finally, the index is updated so that it matches the new commit. This is why after a commit your staging area is clean and ready for the next set of changes you want to prepare for committing!
Git succeeded not because it was designed as version control, but precisely because it wasn’t.