Git Exposed, Part 1: Git Isn’t What You Think It Is

When I first used Git and GitHub, like many others I thought Git and GitHub were basically the same thing: Git was the local version and GitHub was the remote version, which just passed my files from VS Code to GitHub so that I didn’t have to worry about storage, backups and data.

BUT I WAS WRONG, AND MANY PEOPLE STILL GET IT WRONG TODAY.

Then what is Git?

git help git

Yes, these are the words of Linus Torvalds. He designed it from a filesystem perspective: Git is a “stupid content tracker”. It is less like a traditional SCM and more like a “content addressable” filesystem that has versioning.

“I really really designed it coming at the problem from the viewpoint of a filesystem person… and I actually have absolutely zero interest in creating a traditional SCM system.”
~ Linus Torvalds

Why do you think he compared it with SCMs, and how is it different?
Let’s have a look. (Honestly, I haven’t used any other SCMs, but this is what I have read about them.)

He called it a content-addressable filesystem because, unlike other SCMs, Git’s object store doesn’t care about filenames, dates, or relations between files. It just stores and retrieves data using the hash it generates from the content itself. Every object gets one, not only commits.

Git’s characteristics are:

You give it content, it stores it under a unique hash, and
you can address that content later using the hash.

eg: you store: "Arma vs Code"
    Hash code: a1b2c3d4...

//Later if you ask using the hash code
    you ask: a1b2c3d4
    git returns: "Arma vs Code"
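
To make the idea concrete, here is a minimal Python sketch of a content-addressable store. It is only a toy: the ContentStore class and its put/get methods are names I made up for illustration, and it hashes the raw bytes directly, so its keys will not match real Git object IDs.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: the key is derived from the content."""

    def __init__(self):
        self._objects = {}  # full hex digest -> stored bytes

    def put(self, content: bytes) -> str:
        # The "address" is simply the SHA-1 fingerprint of the content.
        key = hashlib.sha1(content).hexdigest()
        self._objects[key] = content
        return key

    def get(self, prefix: str) -> bytes:
        # Accept an abbreviated key as long as it matches exactly one object.
        matches = [k for k in self._objects if k.startswith(prefix)]
        if len(matches) != 1:
            raise KeyError(f"{prefix!r} matches {len(matches)} objects")
        return self._objects[matches[0]]


store = ContentStore()
key = store.put(b"Arma vs Code")
print(key[:8], store.get(key[:8]))
```

Notice that nobody chooses the key. It falls out of the content, so storing the same content twice gives the same key.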

So now can we say that Git is a storage system where content is addressed and stored by its own fingerprints?

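This is what the .git folder looks like when you initialize Git in any directory: it contains HEAD, config, description, and the hooks/, info/, objects/ and refs/ folders.

After you commit your work, it also contains an index file, COMMIT_EDITMSG, a logs/ folder, and new entries under objects/ and refs/heads/.

Now you may think: what are these files and folders, and why do we need all of them?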

Let’s see how a single commit travels from our codebase to Git’s database:

What happens when it takes a snapshot? We haven’t yet seen how exactly it stores things in the database, so let’s have a look at that:

STEP 1: Blob Objects

A blob is Git’s way of storing the CONTENT of a file, without any metadata like filename, permissions, or directory structure.

Blobs help Git manage space efficiently: if you have the same file content in multiple places, Git stores it only once.
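
Here is a small Python sketch of why identical content is stored only once. It assumes a repository using SHA-1 (Git's default), where a blob's ID is the SHA-1 of a short header, "blob", the size and a NUL byte, followed by the content. The file paths and contents are hypothetical.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical working tree: two paths hold identical content.
files = {
    "src/license.txt": b"MIT\n",
    "docs/license.txt": b"MIT\n",
    "README.md": b"hello\n",
}

# The object database is keyed by ID only; the path never enters the hash.
objects = {}
for path, content in files.items():
    objects[git_blob_id(content)] = content

print(len(files), "files ->", len(objects), "blobs")  # prints: 3 files -> 2 blobs
```

If you have Git installed, you can compare git_blob_id(b"hello\n") with the output of running echo hello | git hash-object --stdin.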

STEP 2: Tree Objects

Imagine Git only had blobs. Then it would look something like this (a real blob holds the full content of one file; short snippets are shown here for brevity):

Blob_123: “function hello() {“
Blob_456: ” return ‘world’;”
Blob_789: “}”

But you’d have NO WAY to know:
Which files these belong to, how they’re organized in directories and what the project structure looks like

Trees solve this by creating a “map” of your project. Trees are why Git can track your entire project structure, not just individual files.

Without trees: “I have file content X, Y, Z”
With trees: “At 3:15 PM, my project looked exactly like THIS”

Think of a tree object like a folder contents listing or a table of contents:

For example, if only src/main.js changes, the next snapshot needs:

NEW: blob for main.js (new content)
NEW: tree for src/ (points to new main.js blob, same utils.js blob)
REUSE: tree for docs/ (unchanged)
REUSE: blob for server.js (unchanged)
REUSE: blob for README.md (unchanged)
REUSE: blob for config.json (unchanged)
NEW: root tree (points to new src/ tree, same docs/ tree)

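Trees also help Git compare faster: if two trees have the same hash, everything beneath them is identical, so Git doesn’t need to look inside. Only changed objects are created! Unchanged parts are reused, which is what keeps Git’s snapshot-based storage efficient. A tree object contains a list of entries, each with a mode, a type (blob or tree), the hash of the object it points to, and a name.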

Trees reveal how Git thinks about your project: as a hierarchical structure of content, not just a collection of files!
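
The reuse of unchanged parts can be sketched in Python. This assumes a SHA-1 repository, where a tree's body is a sorted run of entries of the form mode, space, name, NUL byte, then the 20 raw bytes of the child's ID. The helper names and file contents are hypothetical, and the sorting is simplified (real Git orders directory names as if they ended in "/").

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Hash an object the way Git does: '<kind> <size>' + NUL + body."""
    header = kind.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(entries) -> str:
    """entries: iterable of (mode, name, hex_id) tuples."""
    body = b""
    # Simplified ordering by name; good enough for the names used here.
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(hex_id)
    return object_id("tree", body)


def snapshot(main_js: bytes) -> dict:
    """Build a tiny hypothetical project; only main.js varies between calls."""
    utils = object_id("blob", b"export const x = 1;\n")
    readme = object_id("blob", b"# Demo\n")
    guide = object_id("blob", b"Guide\n")
    src = tree_id([("100644", "main.js", object_id("blob", main_js)),
                   ("100644", "utils.js", utils)])
    docs = tree_id([("100644", "guide.md", guide)])
    root = tree_id([("40000", "src", src), ("40000", "docs", docs),
                    ("100644", "README.md", readme)])
    return {"src": src, "docs": docs, "root": root}


before = snapshot(b"console.log('v1');\n")
after = snapshot(b"console.log('v2');\n")
for name in ("src", "docs", "root"):
    print(name, "reused" if before[name] == after[name] else "new")
```

Changing one file gives a new blob, a new src tree and a new root tree, while the docs tree keeps exactly the same ID and so needs no new object.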

This is why Linus designed Git this way: it’s fundamentally a filesystem that takes complete snapshots, not a version control system that tracks changes. The tree object is what makes this snapshot approach possible and efficient.

STEP 3: Creating the Commit Object

What is a commit?

A commit is nothing but a picture of your content at one moment, which is also called a snapshot.

This is where Git creates the actual commit that will become part of your project’s history. Let me break down exactly what happens:

What Gets Stored in the Commit Object
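The commit object is a small piece of plain text (compressed before it is written to disk) that contains the following lines, followed by a blank line and the commit message: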

tree 8f4a8b8e7d8e8f8a8b8e7d8e8f8a8b8e7d8e8f8a
parent a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2
author Arma Sahar  1698765432 +0000
committer Arma Sahar  1698765432 +0000

The commit object is the crucial piece that ties your snapshot to the project’s history and provides all the metadata needed to understand who did what and when.
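
As a rough sketch, this is how such a commit object could be assembled and hashed in Python, assuming a SHA-1 repository. The email address and commit message are hypothetical placeholders, and commit_id is a name I made up. The tree and parent values are the ones from the example above.

```python
import hashlib


def commit_id(tree, parent, author, timestamp, message):
    """Build the text of a commit object and return (object ID, text bytes)."""
    lines = [f"tree {tree}"]
    if parent is not None:  # the very first commit has no parent line
        lines.append(f"parent {parent}")
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest(), body


TREE = "8f4a8b8e7d8e8f8a8b8e7d8e8f8a8b8e7d8e8f8a"
PARENT = "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
AUTHOR = "Arma Sahar <arma@example.com>"  # hypothetical email

cid, text = commit_id(TREE, PARENT, AUTHOR, 1698765432, "Add hello function")
print(text.decode("utf-8"))
print("commit id:", cid)
```

Because the parent's ID is part of the hashed text, a commit's ID depends on its entire history. Change any ancestor and every later commit ID changes too.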

Now that we have all the details ready, Git needs to know where to attach the new commit in your project’s history. That’s where HEAD comes in.

STEP 4: Update HEAD

But what is HEAD?
HEAD is Git’s way of answering the question: “Where am I right now in my project’s history?”

What does HEAD contain?

Usually a single line such as “ref: refs/heads/main”. This means I am currently on my main branch: HEAD contains the location I am at.

Git uses the HEAD to determine where to record the new commit. If HEAD points to a branch, then that branch is updated to point to the new commit. If HEAD is detached, then HEAD itself is updated to the new commit.
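
The two cases can be told apart just by looking at the text inside the HEAD file. Here is a minimal Python sketch; read_head is a hypothetical helper, not something Git provides.

```python
def read_head(head_text: str):
    """Return ("branch", ref_name) or ("detached", commit_id)."""
    value = head_text.strip()
    if value.startswith("ref: "):
        # Symbolic reference: HEAD names a branch file instead of a commit.
        return ("branch", value[len("ref: "):])
    # Otherwise HEAD holds a commit ID directly (detached HEAD).
    return ("detached", value)


print(read_head("ref: refs/heads/main\n"))
print(read_head("a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2\n"))
```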

STEP 5: Update branch reference

In Git, a branch is essentially a pointer to a commit. The branch reference is stored in the .git/refs/heads/ directory as a file named after the branch.

What “Update Branch Reference” Means
When we “update the branch reference,” we’re simply changing the content of that text file to point to a new commit.
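
Here is a simplified Python sketch of that update, run against a throwaway directory that only imitates the layout of .git. The advance_branch function is hypothetical, and real Git does more here (lock files, reflogs, packed refs), so treat it as an illustration of the idea only.

```python
import tempfile
from pathlib import Path


def advance_branch(git_dir: Path, new_commit: str) -> str:
    """Point the branch named by HEAD at new_commit; return what was updated."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]
        ref_path = git_dir / ref
        ref_path.parent.mkdir(parents=True, exist_ok=True)
        # The whole "branch" is this one line of text.
        ref_path.write_text(new_commit + "\n")
        return ref
    # Detached HEAD: there is no branch file, so HEAD itself moves.
    (git_dir / "HEAD").write_text(new_commit + "\n")
    return "HEAD"


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"  # a stand-in directory, not a real repository
    git_dir.mkdir()
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    updated = advance_branch(git_dir, "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2")
    stored = (git_dir / "refs" / "heads" / "main").read_text().strip()
    print(updated, "->", stored)
```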

If Git created the commit but didn’t update the branch reference:

The commit would exist in the database -> But no branch would point to it -> It would be “lost” (hard to find) in the object database -> git log wouldn’t show it -> It could be garbage collected eventually.

Finally, the index is updated so that it matches the new commit, leaving nothing staged. This is why, after a commit, your staging area is clean and ready for the next set of changes you want to prepare for committing!

Git succeeded not because it was designed as version control, but precisely because it wasn’t.
