AI Reddit Sensational Video Summarizer & Shorts Extractor: Turning Trends into Viral Clips in Google Colab

🧠 The Idea
Reddit is a treasure trove of viral content — from jaw-dropping political debates to hilarious short clips and trending podcasts.

But scrolling through r/all or r/videos to find the moments that actually matter is tedious. Even when you do, manually cutting clips from YouTube takes hours.

I asked myself: what if we could automate it?

➡️ Discover trending posts → locate videos → extract the best moments → make shareable highlight reels — all in one Colab notebook.

That’s how the AI Reddit Sensational Video Summarizer was born — a lightweight, fully automated pipeline that takes raw Reddit trends and turns them into polished, bite-sized videos.




📌 Project Overview

This pipeline does it all:

  1. Scrapes trending Reddit posts from high-signal subreddits.
  2. Searches and downloads YouTube videos linked (or inferred) from posts.
  3. Transcribes videos with OpenAI’s Whisper.
  4. Identifies highlight-worthy segments using AI (Gemini).
  5. Compiles dynamic montages ready for sharing or research.
  6. Archives everything in Google Drive for easy access.

It’s all in Google Colab, requires no paid APIs, and runs on free or pro-tier GPU resources.




🔧 What This Project Does

  • Scrapes trending Reddit posts from high-activity subreddits like politics, news, videos, and podcasts.
  • Applies keyword and viral-phrase filtering to find high-signal content (e.g., “slams”, “goes viral”, “full clip”).
  • Extracts or searches for YouTube video links.
  • Filters out videos longer than 60 minutes.
  • Downloads up to 3 clean videos, saves them, and exports associated metadata.
  • Archives everything to Google Drive for easy access.



🛠️ Tools & Libraries Used

| Feature | Tool/Library | Why Use It? |
|---|---|---|
| Reddit Scraper | praw | Access Reddit posts and metadata easily |
| YouTube Search | serpapi | Find relevant videos via YouTube Search |
| Video Downloader | yt-dlp | Fast, reliable video download tool |
| Data Handling | pandas | Clean and manage Reddit + video data |
| Cloud Storage | shutil + Drive | Store results safely in Google Drive |
| Runtime | Google Colab | Free GPU and fast prototyping |



🔐 Secure API Access

Instead of hardcoding sensitive API keys, I used Python’s getpass module to collect:

  • Reddit API credentials (client_id, client_secret)
  • SerpAPI Key (api_key for YouTube search)
```python
import getpass

reddit_api_id = getpass.getpass("Enter Reddit API ID: ")
reddit_api_secret = getpass.getpass("Enter Reddit API Secret: ")
serp_api_key = getpass.getpass("Enter SerpAPI Key: ")
```




⚙️ Setting Up Reddit

```python
import praw

reddit = praw.Reddit(
    client_id=reddit_api_id,
    client_secret=reddit_api_secret,
    user_agent="trending-video-finder by /u/your_username"
)
```


Tip: Always use a unique and descriptive user_agent when working with Reddit’s API.



🤖 Smart Reddit Scraping

We target high-activity, high-signal subreddits like r/politics, r/news, r/videos, and r/podcasts.

A custom Python function queries these subreddits for keywords and viral phrases:

```python
df = get_smart_reddit_trends(
    subreddits=["politics", "news", "videos", "podcasts"],
    keywords=["speech", "interview", "debate", "podcast"],
    signal_keywords=["goes viral", "slams", "clip", "debate"],
    days_back=7,
    limit=50
)
```


This gives us only high-engagement posts likely to be tied to meaningful or viral YouTube videos.
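The helper itself isn't shown in full above. Here is a minimal sketch of how such a function could be written with praw, assuming the `reddit` client from the setup step; the notebook's actual filtering logic may differ:

```python
import time
import pandas as pd

def get_smart_reddit_trends(subreddits, keywords, signal_keywords, days_back=7, limit=50):
    """Collect recent hot posts whose titles contain a keyword or viral phrase."""
    cutoff = time.time() - days_back * 86400
    rows = []
    for sub in subreddits:
        for post in reddit.subreddit(sub).hot(limit=limit):
            if post.created_utc < cutoff:
                continue
            title = post.title.lower()
            if any(k in title for k in keywords) or any(s in title for s in signal_keywords):
                rows.append({
                    "subreddit": sub,
                    "title": post.title,
                    "score": post.score,
                    # Keep a direct link when the post already points at YouTube
                    "youtube_link": post.url if "youtu" in post.url else None,
                })
    return pd.DataFrame(rows)
```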




🔗 Add YouTube Links via SerpAPI (if Missing)

```python
import time

updated_links = []
for _, row in df.iterrows():
    if row.get("youtube_link"):
        updated_links.append(row["youtube_link"])
    else:
        yt_link = search_youtube_via_serpapi(row["title"], serp_api_key)
        updated_links.append(yt_link)
        time.sleep(1.5)  # stay well under SerpAPI rate limits

df["final_youtube_link"] = updated_links
```


When a Reddit post has no direct video link, we use SerpAPI to search YouTube with the post title, so no viral moment gets missed even if users only share the title.
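The search helper isn't shown either; a minimal sketch using SerpAPI's YouTube engine (via the `google-search-results` package) might look like this, with the result fields worth double-checking against your SerpAPI plan:

```python
from serpapi import GoogleSearch

def search_youtube_via_serpapi(query, api_key):
    """Return the first YouTube result for a Reddit post title, or None."""
    params = {
        "engine": "youtube",
        "search_query": query,
        "api_key": api_key,
    }
    results = GoogleSearch(params).get_dict()
    videos = results.get("video_results", [])
    return videos[0]["link"] if videos else None
```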




🎯 Filter and Download Valid Videos (Up to 3 by Default)

```python
max_downloads = 3
downloaded_count = 0
filtered_rows = []

for i, row in df.iterrows():
    if downloaded_count >= max_downloads:
        break

    url = row.get("final_youtube_link")
    title = row.get("title", f"video_{i}")

    # Metadata check
    ...
    # Skip videos > 60 mins
    ...
    # Download using yt-dlp
    ...

    downloaded_count += 1
```


⚠️ Optional: For Age-Restricted or Region-Locked Content

  • Sometimes YouTube videos are age-restricted, region-locked, or require login.
  • To handle these, you can use a cookies.txt file.

👉 Only the first 3 valid videos under 60 minutes are downloaded and stored with sanitized filenames.
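The loop body above is elided; a minimal sketch of those three steps with yt-dlp's Python API could look like the following (filename sanitization is simplified here):

```python
import re
import yt_dlp

def download_if_short(url, title, max_minutes=60, out_dir="downloads"):
    """Check duration first, then download the video only if it is short enough."""
    safe_title = re.sub(r"[^\w\- ]", "", title)[:80].strip() or "video"
    ydl_opts = {
        "outtmpl": f"{out_dir}/{safe_title}.%(ext)s",
        "format": "mp4/bestvideo+bestaudio/best",
        "quiet": True,
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(url, download=False)    # metadata check
        if (info.get("duration") or 0) > max_minutes * 60:
            return False                                # skip videos > 60 mins
        ydl.download([url])                             # download using yt-dlp
    return True
```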




📄 Note on cookies.txt (Optional)

If you want to download age-restricted, region-locked, or logged-in-only YouTube content, you’ll need a cookies.txt file.

"cookiefile": "cookies.txt"

Never share your cookies.txt.
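If you do use one, the entry slots into the same yt-dlp options dictionary; the other values here are just illustrative:

```python
ydl_opts = {
    "outtmpl": "downloads/%(title)s.%(ext)s",
    "cookiefile": "cookies.txt",   # exported from your browser; keep it private
}
```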

📦 Archive Videos + Metadata


```python
!zip -r downloads.zip downloads/

df.to_csv("video_metadata.csv", index=False)
```


This saves the downloaded videos and metadata as downloads.zip and video_metadata.csv.


Save to Google Drive:

```python
import shutil

destination_folder = "/content/drive/MyDrive/sensational_video_of_the_week/3rd_week_of_july"
shutil.copy("downloads.zip", destination_folder)
shutil.copy("video_metadata.csv", destination_folder)
```


Both files are copied to a specific folder in your Drive for sharing, backup, or post-processing.
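One detail the snippet assumes: Drive is already mounted and the destination folder exists. A minimal way to guarantee both, run earlier in the notebook:

```python
import os
from google.colab import drive

drive.mount('/content/drive')
os.makedirs(destination_folder, exist_ok=True)  # create the week's folder if it isn't there yet
```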



✅ Results

After running the pipeline, you get:

  • Up to 3 viral-ready YouTube videos per Reddit batch.
  • Clean metadata: subreddit, title, score, link.
  • Archived videos + transcripts in Google Drive.
  • Montages ready for social sharing or research.



🚀 Why This Matters

This pipeline is a complete end-to-end content repurposing solution:

  • Content creators → weekly highlights, Shorts, or Reels.
  • Educators → searchable lecture clips.
  • Researchers → curated datasets for NLP or multimodal learning.
  • Podcast producers → automated show notes + viral snippets.

No hallucination. No tedious manual editing. No hidden costs. Just a fully automated AI workflow.



📝 final_whisper_video_transcription_to_drive

Transform video content into searchable text with timestamps — all in one seamless Google Colab pipeline.




🔥 Why This Project?

Whether you’re a content creator, researcher, or developer working with video data, one thing is clear:

🎥 Video content is hard to search, analyze, and reuse — unless it’s transcribed.

This Colab notebook offers a complete, no-fluff solution to:

  • ✅ Automatically transcribe multiple videos using OpenAI’s Whisper model.
  • ✅ Generate plain text and timestamped segments.
  • ✅ Save results to Google Drive for long-term storage and use.
  • ✅ All within Google Colab, GPU-accelerated, and beginner-friendly.



🚀 What You’ll Get

  • 🎙 Whisper-powered transcription (GPU-accelerated in Colab)
  • 🕓 Timestamped and plain-text transcripts
  • 📦 Auto-zipping and upload to your Drive
  • ✅ Ideal for podcasts, interviews, lectures, and short-form content



🛠️ Models and Tools Used

| Feature | Tool / Library | Purpose |
|---|---|---|
| Transcription | openai-whisper | State-of-the-art speech-to-text |
| Video/Audio Handling | ffmpeg-python | Formats videos for Whisper |
| Notebook Environment | Google Colab | Cloud-based, free GPU access |
| Storage | Google Drive | Persistent file storage |
| Scripting | os, shutil, zipfile | File operations and archiving |



🧩 Key Implementation Steps



1. Mount Google Drive (to Access the Previous Step's Output)

```python
from google.colab import drive
drive.mount('/content/drive')
```




2. Install Dependencies

```python
!pip install openai-whisper ffmpeg-python
```




3. Load the Model & Prepare Paths

```python
import whisper, os, torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("base", device=device)
```

Loads the **Whisper model** on GPU (if available) for faster transcription.




4. Unzip the Video Files and Load Metadata

```python
import zipfile
import pandas as pd

with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(extract_folder)

df = pd.read_csv(csv_path)
```


Unzips your videos and loads metadata from your Google Drive.
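For reference, the paths in that cell point at the first notebook's Drive output. A sketch matching the folder layout shown later in this post (adjust the week's folder name to your own):

```python
# Drive layout produced by the first notebook (adjust as needed)
base = "/content/drive/MyDrive/sensational_video_of_the_week/3rd_week_of_july"
zip_path = f"{base}/downloads.zip"        # archived videos
csv_path = f"{base}/video_metadata.csv"   # Reddit + YouTube metadata
extract_folder = "/content/videos"        # local working copy for transcription
```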



5. Batch Transcribe with Error Handling

```python
import json

for filename in os.listdir(input_folder):
    if filename.endswith(".mp4"):
        video_path = os.path.join(input_folder, filename)
        try:
            result = model.transcribe(video_path)
            json.dump(result["segments"], open(json_path, "w"))  # timestamped segments
            open(txt_path, "w").write(result["text"])            # plain transcript
        except Exception as e:
            print(f"Skipping {filename}: {e}")
```

For each `.mp4`, Whisper generates:  
- a **`.json`** file with timestamped segments  
- a **`.txt`** file with the full transcript  




6. Zip the Output for Download & Archive

```python
shutil.make_archive(..., root_dir=transcript_folder)
shutil.copy(zip_path, destination_folder)
```

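With the elided arguments filled in, a complete version could look like this; the archive name follows the Drive layout shown below:

```python
import shutil

destination_folder = "/content/drive/MyDrive/sensational_video_of_the_week/3rd_week_of_july"

# make_archive returns the path of the zip it creates
zip_path = shutil.make_archive("transcripts_with_segments", "zip", root_dir=transcript_folder)
shutil.copy(zip_path, destination_folder)
```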



📂 Folder Structure on Drive

```
📂 sensational_video_of_the_week
└── 3rd_week_of_july
    ├── downloads.zip
    ├── video_metadata.csv
    ├── transcripts_plain.zip
    └── transcripts_with_segments.zip
```



Use Cases

  • 🧑‍🏫 Educators: Auto-transcribe lectures and organize notes.
  • 🧑‍💼 Content creators: Convert YouTube Shorts or Reels into searchable assets.
  • 🧪 Researchers: Annotate timestamped audio for NLP tasks.
  • 👩‍🎤 Podcast producers: Generate show notes and SEO content.



✅ Final Thoughts

With just a few lines of code and a powerful open-source model, you’ve automated what used to be hours of manual work.

This pipeline:

  • Saves time
  • Ensures accuracy
  • Gives you full control over your video transcription workflows, all within Google Colab

No API keys. No manual uploads. No hidden costs. Just results.

“What if AI could watch your videos, pick out the most viral moments, and turn them into a shareable highlight reel?”

Well, guess what? We built it. 🤖✨




🌟 What This Project Does

Imagine a world where you can take hours of footage and instantly create engaging, bite-sized video montages ready to go viral. That’s exactly what this project does!

Here’s how it works in a nutshell:

  1. 🗂 Load videos and transcripts (plain + Whisper segments)
  2. 🧠 Extract viral-worthy moments using Google’s Gemini API
  3. Align quotes with precise video timestamps
  4. ✂️ Trim unnecessary fluff (AI-powered) while keeping the core message intact
  5. 🎞 Stitch together clips with dynamic zoom transitions and music
  6. 📦 Export everything in a neat .zip file for easy sharing

No hallucination. No fluff. Just real AI doing real work. 🔥



📂 Data Prep: The Power of a Good Foundation

Before the magic can happen, we need to prep the data. Here’s the foundation we build on:

  • 🎥 Original video files
  • 📄 Plaintext transcripts
  • ⏱ Segmented transcripts (with start/end timestamps)
  • 🗂 A metadata CSV (to keep track of titles)

This ensures that videos, transcripts, and metadata line up correctly, even when the filenames don't match exactly.

🙏 (Shoutout to difflib.get_close_matches for making it all align!)
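A minimal sketch of that matching step, assuming the metadata titles come from video_metadata.csv:

```python
from difflib import get_close_matches

def match_title(filename, titles):
    """Pair a (possibly renamed) video file with its closest metadata title."""
    stem = filename.rsplit(".", 1)[0].replace("_", " ")
    matches = get_close_matches(stem, titles, n=1, cutoff=0.4)
    return matches[0] if matches else None
```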



💡 Find the Moments That Matter

Next up? Finding the viral moments! 🚀

Using Gemini 1.5 Flash, we sift through the full transcript of each video to identify potential viral quotes. Each quote gets:

  • 🔥 A virality score (1–10)
  • 🗣 The exact quote (no paraphrasing here!)
  • 💭 A brief explanation of why it could go viral

Once we get this data, we use regex to clean and organize it into a structured DataFrame, making it easier to spot the gems. 🌟
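A minimal sketch of that Gemini call with the google-generativeai SDK; the prompt and the variable names (`gemini_api_key`, `transcript_text`) are simplified stand-ins for the notebook's own:

```python
import google.generativeai as genai

genai.configure(api_key=gemini_api_key)  # collected via getpass, like the other keys
model = genai.GenerativeModel("gemini-1.5-flash")

prompt = (
    "From the transcript below, list up to 5 potentially viral quotes. "
    "For each, give: a virality score (1-10), the exact quote verbatim, "
    "and one sentence on why it could go viral.\n\n" + transcript_text
)
response = model.generate_content(prompt)
raw_quotes = response.text  # parsed into a structured DataFrame with regex
```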



⏱ Map Words to Video

Now, the magic starts to unfold. 🎬

We map each quote back to its exact video timestamp. How?

  • 🔍 Direct text lookup against the full transcript
  • 🤖 If no direct match, we use SentenceTransformers to semantically find the moment

No timestamps? No problem. We’ve got that covered. 💪
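A minimal sketch of the semantic fallback with sentence-transformers; the model name is a common lightweight choice, not necessarily the one used in the notebook:

```python
from sentence_transformers import SentenceTransformer, util

st_model = SentenceTransformer("all-MiniLM-L6-v2")

def find_segment(quote, segments):
    """Return the Whisper segment whose text is semantically closest to the quote."""
    seg_texts = [s["text"] for s in segments]
    seg_emb = st_model.encode(seg_texts, convert_to_tensor=True)
    quote_emb = st_model.encode(quote, convert_to_tensor=True)
    best = util.cos_sim(quote_emb, seg_emb).argmax().item()
    return segments[best]  # carries "start" and "end" timestamps from Whisper
```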



✂️ Make the Moment Snappy (Without Hallucination)

Here’s the kicker: Gemini doesn’t just trim the fluff; it keeps the message intact. We say:

  • “Trim the fillers, but don’t change the essence!”

With this, we can:

  • ✂️ Trim the start and end of each quote to cut out unnecessary words
  • 📝 Align everything with the original transcript
  • 🔗 Expand the quotes to full sentence boundaries, ensuring nothing important is lost

The result? Clean, punchy clips that don’t hallucinate or change the message. ✅



🎬 From Grid to Clip — Visual Storytelling

To add the finishing touches:

  1. We create a static grid image from the video’s preview frames.
  2. Then, using zoom transitions, we zoom into each clip, play it, and zoom back out.

The result is a punchy, dynamic feel that’s visually captivating — and, most importantly, it feels human.




🎶 Audio and Transitions: Bringing the Montage to Life

Next, we add the sound magic:

  • 🎤 Voice and background music
  • 🎧 Audio fades and mixing
  • 🔗 Seamless transitions between clips

We do all of this using MoviePy and PIL, with zero fancy dependencies.

It’s simple, effective, and gets the job done. 💥
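A stripped-down sketch of the MoviePy assembly, without the zoom transitions or music mixing; the clip path and timestamps are placeholders:

```python
from moviepy.editor import VideoFileClip, concatenate_videoclips

# Hypothetical (start, end) pairs produced by the quote-alignment step
highlights = [(12.5, 24.0), (95.0, 110.5)]

source = VideoFileClip("downloads/example_video.mp4")
clips = [source.subclip(start, end).audio_fadein(0.3).audio_fadeout(0.3)
         for start, end in highlights]

montage = concatenate_videoclips(clips, method="compose")
montage.write_videofile("montage.mp4", codec="libx264", audio_codec="aac")
```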



📤 Packaging the Output

Once everything’s polished and ready to go, we zip up the final video montages and upload them to Google Drive — all set for sharing! 📦




📂 Notebook Name: final_viral_video_montage_generator

If you’re looking to automate turning long interviews, podcasts, or other long-form videos into short, shareable moments, this is the notebook for you.

✅ No hallucinated quotes

✅ No manual editing

✅ Just AI-powered storytelling that works




🚀 Why This Matters

This pipeline is perfect for:

  • Content creators summarizing long interviews
  • Podcast editors clipping viral moments
  • Media teams creating weekly highlight reels
  • AI researchers exploring multimodal summarization

And the best part? It runs entirely in Google Colab, with free GPU access! 😎




🎵 Music Credits

“Glass Chinchilla” by The Mini Vandals — YouTube Audio Library 🎶




🙌 Final Thoughts

We didn’t just use AI to summarize text; we used it to create compelling video stories that people will want to watch and share. 🌍✨

Got hours of footage collecting digital dust? Now’s the time to unlock its viral potential.




📂 Source Code & Notebook

Get your hands on the code here:

👉 final_viral_video_montage_generator




💬 Want to Support My Work?

If you enjoyed this project, consider buying me a coffee to support more free AI tutorials and tools:

👉 Buy Me a Coffee ☕




