Build Agentic Video RAG with Strands Agents and Containerized Infrastructure


🇻🇪🇨🇱 Dev.to Linkedin GitHub Twitter Instagram Youtube
Linktr

In the previous post, “Ask Your Video: Build a Containerized RAG Application for Visual and Audio Analysis”, we showed how to build a scalable containerized video processing pipeline using AWS Step Functions and Amazon ECS. This post shows how to transform that infrastructure into intelligent agent tools using the Strands Agents framework.

You’ll discover how your existing containerized video processing capabilities become autonomous AI tools that understand natural language requests and execute video analysis workflows.



Why Transform Your Pipeline into Agent Tools?

Your containerized video processing system handles workflows efficiently, but interacting with these capabilities requires technical knowledge of APIs, parameters, and workflows. By integrating with Strands Agents, you can:

  • Interact using natural language with your video processing pipeline
  • Automate multi-step workflows through intelligent decision-making
  • Combine video analysis with other business processes
  • Scale intelligent interactions across your organization



Architecture: From Pipeline to Agent Tools

The transformation builds upon your existing infrastructure:



Your Existing Containerized Pipeline

⚠️ Infrastructure Requirement: Deploy the container-video-embeddings CDK stacks before running this notebook.



🔄 Processing Workflow

The architecture processes videos through these automated steps:

  1. 📤 Video Upload: Video uploaded to S3 bucket triggers Lambda function
  2. ⚡ Workflow Orchestration: Step Functions initiates parallel processing streams
  3. 🎬 Visual Pipeline: ECS tasks extract frames using containerized FFmpeg → Bedrock generates image embeddings
  4. 🎤 Audio Pipeline: Transcribe converts speech to text → Semantic chunking → Bedrock generates text embeddings
  5. 📊 Vector Storage: Lambda functions store all embeddings in Aurora PostgreSQL with pgvector
  6. 🌐 API Access: API Gateway provides endpoints for search queries and status monitoring



New Agent Integration Layer

  • Strands Agents providing intelligent orchestration
  • Custom tools wrapping your existing APIs
  • Memory capabilities for context-aware interactions
  • Multi-modal reasoning combining visual and audio insights

The agent intelligently orchestrates your containerized infrastructure through automatic workflow management, upload detection, and status monitoring.
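Conceptually, a custom tool is just a Python function the agent can call; in Strands you expose it by registering the function with the `@tool` decorator. The sketch below wraps the retrieval API: the endpoint URL is a placeholder (the real one is read from SSM), and the payload shape mirrors the validation snippet shown later in this post.

```python
import json
from urllib import request

# Placeholder endpoint; the deployed URL is stored in SSM at /videopgvector/api_retrieve.
API_ENDPOINT = "https://example.execute-api.us-east-1.amazonaws.com/retrieve"

def build_retrieve_payload(query: str, k: int = 5) -> dict:
    """Request body expected by the retrieval API."""
    return {"query": query, "method": "retrieve", "k": k}

def search_video_content(query: str, k: int = 5) -> dict:
    """Semantic search over processed video content via the deployed API.

    In Strands, decorating this function with @tool exposes it to the agent.
    """
    data = json.dumps(build_retrieve_payload(query, k)).encode()
    req = request.Request(API_ENDPOINT, data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())
```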



Building the Video Processing Agent Tool



Prerequisites

  1. Complete the containerized setup from the previous blog post



Clone & Setup Environment

git clone https://github.com/build-on-aws/langchain-embeddings.git
cd langchain-embeddings/notebooks

# Create virtual environment
python -m venv venv

# Activate environment (macOS/Linux)
source venv/bin/activate
# Or on Windows
# venv\Scripts\activate

# Install requirements
pip install -r requirements.txt

The notebook 07_video_embeddings_container_with_strands_agents.ipynb demonstrates how to create a video_embeddings_container tool that interfaces with your containerized pipeline. The tool provides video processing and retrieval capabilities using the AWS infrastructure deployed by the container-video-embeddings CDK stacks.

Features:

  • Upload videos to S3 and trigger automated processing
  • Search video content using semantic similarity
  • List processed videos and their metadata
  • Uses deployed AWS infrastructure (no local processing required)
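The upload path in the first feature is a single S3 put; the S3 event notification then triggers processing automatically. A minimal sketch, assuming a `videos/` key prefix (check your stack's actual trigger configuration):

```python
import os

def video_key(filename: str, prefix: str = "videos/") -> str:
    """Derive the S3 object key for an uploaded video (the prefix is an assumption)."""
    return prefix + os.path.basename(filename)

def upload_video(filename: str, bucket: str) -> str:
    """Upload a local video file; the resulting S3 event kicks off the pipeline."""
    import boto3  # deferred import so video_key stays testable without AWS deps
    key = video_key(filename)
    boto3.client("s3").upload_file(filename, bucket, key)
    return key
```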



🔍 Infrastructure Validation

Before starting, validate that the infrastructure exists by running these lines from the notebook:

import boto3
import requests

AWS_REGION = "us-east-1"  # set to the region where your stacks are deployed

ssm_client = boto3.client('ssm', region_name=AWS_REGION)
stepfunctions_client = boto3.client('stepfunctions', region_name=AWS_REGION)

try:
    api_endpoint = ssm_client.get_parameter(Name="/videopgvector/api_retrieve", WithDecryption=True)["Parameter"]["Value"]
    bucket_name = ssm_client.get_parameter(Name="/videopgvector/bucket_name", WithDecryption=True)["Parameter"]["Value"]
    state_machine_arn = ssm_client.get_parameter(Name="/videopgvector/state_machine_arn", WithDecryption=True)["Parameter"]["Value"]

    print("✅ Infrastructure parameters found:")
    print(f"🔗 API Endpoint: {api_endpoint}")
    print(f"🪣 S3 Bucket: {bucket_name}")
    print(f"⚙️ State Machine: {state_machine_arn.split(':')[-1]}")

    test_payload = {"query": "test", "method": "retrieve", "k": 1}
    response = requests.post(api_endpoint, json=test_payload, timeout=10)
    print(f"🌐 API Status: {response.status_code} - {'✅ Connected' if response.status_code == 200 else '❌ Error'}")

except Exception as e:
    print(f"❌ Infrastructure check failed: {str(e)}")
    print("💡 Deploy the container-video-embeddings stacks first")



🧠 Memory-Enhanced Strands Agent

To add S3 memory capabilities for personalized cloud interactions, fill in your Amazon S3 bucket information in the code below.

For detailed S3 memory tool explanation, see notebook 06.

import os

from strands import Agent
from s3_memory import s3_vector_memory

# S3 Vectors Configuration
os.environ['VECTOR_BUCKET_NAME'] = 'YOUR-S3-BUCKET'            # Your S3 Vector bucket
os.environ['VECTOR_INDEX_NAME'] = 'YOUR-VECTOR-INDEX'          # Your vector index
os.environ['AWS_REGION'] = 'us-east-1'                         # AWS region
os.environ['EMBEDDING_MODEL'] = 'amazon.titan-embed-text-v2:0' # Bedrock embedding model

# model, video_embeddings_aws, display_video_images, and CLOUD_SYSTEM_PROMPT
# are defined in earlier cells of the notebook
cloud_memory_agent = Agent(
    model=model,
    tools=[video_embeddings_aws, display_video_images, s3_vector_memory],
    system_prompt=CLOUD_SYSTEM_PROMPT
)

print("✅ Memory-enhanced agent created!")




🎬 Video Processing

This notebook uses the same video from notebooks 05 and 06: AWS re:Invent 2024 session on “AI self-service support with knowledge retrieval using PostgreSQL” (YouTube link). The video covers:

  • Vector databases and embeddings for AI applications
  • Amazon Aurora PostgreSQL with pgvector for scalable vector storage
  • RAG (Retrieval Augmented Generation) implementations
  • Amazon Bedrock Agents for intelligent customer support
  • Real-world use cases and technical demonstrations

This content works well for testing our video analysis system with technical presentations that include both visual slides and detailed explanations.



🧪 Test Agents with Natural Language Video Analysis

Interact with your containerized pipeline through natural conversation:

print("🧪 Testing Cloud Video Agent")
print("=" * 50)

response1 = cloud_video_agent(
    "Search for content about 'aurora database scalability' using the cloud infrastructure. Return the top 5 results and explain what you found."
)
print(f"Cloud Agent Response: {response1.message}")

List Videos:

response2 = cloud_video_agent(
    "List all processed videos in the cloud storage and provide a summary of available content with processing statistics."
)
print(f"\nCloud Storage Summary: {response2.message}")

Try the memory capabilities:

USER_ID = "cloud_test_user"

response3 = cloud_memory_agent(f"""For user {USER_ID}:
1. Store my interest in cloud architecture and scalable video processing
2. Search the video content for 'production deployment' and 'scalability'
3. Remember key insights about cloud-native video processing
4. Provide a personalized summary based on my cloud architecture interests""")

print(f"\nPersonalized Cloud Analysis: {response3.message}")


response5 = cloud_memory_agent(f"I am {USER_ID}. Give me a summary of the content of each video")
print(f"\nPersonalized Cloud Analysis: {response5.message}")




Conclusion

By integrating Strands Agents with your containerized video processing pipeline, you’ve created a system that combines scalable infrastructure with intelligent interaction capabilities. This approach provides:

  • Natural language access to video processing workflows
  • Intelligent orchestration of containerized services
  • Scalable processing with cost-effective resource management
  • Multi-modal analysis combining visual and audio insights

The combination of containerized infrastructure with Strands Agents creates a foundation for advanced video analysis applications that can grow with your needs. Your containerized pipeline now responds to natural language, makes intelligent decisions, and provides comprehensive video analysis capabilities through conversational interfaces.



Ready to create your own Strands agent?

Here are some resources:


Thanks!





