The previous post, “Ask Your Video: Build a Containerized RAG Application for Visual and Audio Analysis”, showed how to build a scalable containerized video processing pipeline using AWS Step Functions and Amazon ECS. This post shows how to transform that infrastructure into intelligent agent tools using the Strands Agents framework.
You’ll discover how your existing containerized video processing capabilities become autonomous AI tools that understand natural language requests and execute video analysis workflows.
Why Transform Your Pipeline into Agent Tools?
Your containerized video processing system handles workflows efficiently, but interacting with these capabilities requires technical knowledge of APIs, parameters, and workflows. By integrating with Strands Agents, you can:
- Interact using natural language with your video processing pipeline
- Automate multi-step workflows through intelligent decision-making
- Combine video analysis with other business processes
- Scale intelligent interactions across your organization
Architecture: From Pipeline to Agent Tools
The transformation builds upon your existing infrastructure:
Your Existing Containerized Pipeline
⚠️ Infrastructure Requirement: Deploy the container-video-embeddings CDK stacks before running this notebook.
🔄 Processing Workflow
The architecture processes videos through these automated steps:
- 📤 Video Upload: Video uploaded to S3 bucket triggers Lambda function
- ⚡ Workflow Orchestration: Step Functions initiates parallel processing streams
- 🎬 Visual Pipeline: ECS tasks extract frames using containerized FFmpeg → Bedrock generates image embeddings
- 🎤 Audio Pipeline: Transcribe converts speech to text → Semantic chunking → Bedrock generates text embeddings
- 📊 Vector Storage: Lambda functions store all embeddings in Aurora PostgreSQL with pgvector
- 🌐 API Access: API Gateway provides endpoints for search queries and status monitoring
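You can peek at step 2 of this workflow directly. Below is a minimal sketch that lists recent Step Functions executions, assuming the state machine ARN that the CDK stacks store in SSM Parameter Store (the same parameter read in the validation step later in this post) and a us-east-1 deployment:
import boto3

AWS_REGION = "us-east-1"  # assumed region; match your deployment

# Read the state machine ARN stored by the CDK stacks
ssm = boto3.client("ssm", region_name=AWS_REGION)
state_machine_arn = ssm.get_parameter(Name="/videopgvector/state_machine_arn")["Parameter"]["Value"]

# Each execution corresponds to one uploaded video
sfn = boto3.client("stepfunctions", region_name=AWS_REGION)
executions = sfn.list_executions(stateMachineArn=state_machine_arn, maxResults=5)["executions"]
for execution in executions:
    print(f"{execution['name']}: {execution['status']}")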
New Agent Integration Layer
- Strands Agents providing intelligent orchestration
- Custom tools wrapping your existing APIs
- Memory capabilities for context-aware interactions
- Multi-modal reasoning combining visual and audio insights
The agent intelligently orchestrates your containerized infrastructure through automatic workflow management, upload detection, and status monitoring.
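To make the tool-wrapping idea concrete, here is a minimal sketch of the pattern Strands uses: a plain Python function decorated with @tool, whose docstring tells the agent when and how to call it. The function name and body below are illustrative, not the notebook’s full video_embeddings_container implementation; the SSM parameter and payload shape match the validation code shown later in this post.
import boto3
import requests
from strands import tool

@tool
def search_video_content(query: str, k: int = 5) -> dict:
    """Search processed video content by semantic similarity.

    Args:
        query: Natural language description of what to find.
        k: Number of results to return.
    """
    # Look up the retrieve API endpoint the CDK stacks stored in SSM
    ssm = boto3.client("ssm", region_name="us-east-1")  # assumed region
    api_endpoint = ssm.get_parameter(Name="/videopgvector/api_retrieve")["Parameter"]["Value"]
    payload = {"query": query, "method": "retrieve", "k": k}
    return requests.post(api_endpoint, json=payload, timeout=30).json()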
Building the Video Processing Agent Tool
Prerequisites
- Complete the containerized setup from the previous blog post
Clone & Setup Environment
git clone https://github.com/build-on-aws/langchain-embeddings.git
cd langchain-embeddings/notebooks
# Create virtual environment
python -m venv venv
# Activate environment (macOS/Linux)
source venv/bin/activate
# Or on Windows
# venv\Scripts\activate
# Install requirements
pip install -r requirements.txt
The notebook 07_video_embeddings_container_with_strands_agents.ipynb demonstrates how to create a video_embeddings_container tool that interfaces with your containerized pipeline. The tool provides video processing and retrieval capabilities using the AWS infrastructure deployed by the container-video-embeddings CDK stacks.
Features:
- Upload videos to S3 and trigger automated processing
- Search video content using semantic similarity
- List processed videos and their metadata
- Uses deployed AWS infrastructure (no local processing required)
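The first feature needs nothing more than an S3 upload: the bucket’s event notification triggers the Lambda function, which starts the Step Functions workflow. A minimal sketch, assuming the bucket name parameter shown in the validation step below (the file name and object key are illustrative):
import boto3

AWS_REGION = "us-east-1"  # assumed region; match your deployment

ssm = boto3.client("ssm", region_name=AWS_REGION)
bucket_name = ssm.get_parameter(Name="/videopgvector/bucket_name")["Parameter"]["Value"]

# The upload alone kicks off processing via the S3 event notification
s3 = boto3.client("s3", region_name=AWS_REGION)
s3.upload_file("my_video.mp4", bucket_name, "videos/my_video.mp4")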
🔍 Infrastructure Validation
Before starting, validate that the infrastructure exists by running these lines from the notebook:
import boto3
import requests

AWS_REGION = "us-east-1"  # set to the region where the stacks are deployed

ssm_client = boto3.client('ssm', region_name=AWS_REGION)
stepfunctions_client = boto3.client('stepfunctions', region_name=AWS_REGION)

try:
    api_endpoint = ssm_client.get_parameter(Name="/videopgvector/api_retrieve", WithDecryption=True)["Parameter"]["Value"]
    bucket_name = ssm_client.get_parameter(Name="/videopgvector/bucket_name", WithDecryption=True)["Parameter"]["Value"]
    state_machine_arn = ssm_client.get_parameter(Name="/videopgvector/state_machine_arn", WithDecryption=True)["Parameter"]["Value"]

    print("✅ Infrastructure parameters found:")
    print(f"🔗 API Endpoint: {api_endpoint}")
    print(f"🪣 S3 Bucket: {bucket_name}")
    print(f"⚙️ State Machine: {state_machine_arn.split(':')[-1]}")

    # Smoke-test the retrieve API with a minimal query
    test_payload = {"query": "test", "method": "retrieve", "k": 1}
    response = requests.post(api_endpoint, json=test_payload, timeout=10)
    print(f"🌐 API Status: {response.status_code} - {'✅ Connected' if response.status_code == 200 else '❌ Error'}")
except Exception as e:
    print(f"❌ Infrastructure check failed: {str(e)}")
    print("💡 Deploy the container-video-embeddings stacks first")
🧠 Memory-Enhanced Strands Agent
To add S3 memory capabilities for personalized cloud interactions, fill in your Amazon S3 bucket information in the code below. For a detailed explanation of the S3 memory tool, see notebook 06.
import os
from strands import Agent
from s3_memory import s3_vector_memory

# S3 Vectors Configuration
os.environ['VECTOR_BUCKET_NAME'] = 'YOUR-S3-BUCKET'  # Your S3 Vector bucket
os.environ['VECTOR_INDEX_NAME'] = 'YOUR-VECTOR-INDEX'  # Your vector index
os.environ['AWS_REGION'] = 'us-east-1'  # AWS region
os.environ['EMBEDDING_MODEL'] = 'amazon.titan-embed-text-v2:0'  # Bedrock embedding model

# model, video_embeddings_aws, display_video_images, and CLOUD_SYSTEM_PROMPT
# are defined in earlier cells of the notebook
cloud_memory_agent = Agent(
    model=model,
    tools=[video_embeddings_aws, display_video_images, s3_vector_memory],
    system_prompt=CLOUD_SYSTEM_PROMPT
)
print("✅ Memory-enhanced agent created!")
🎬 Video Processing
This notebook uses the same video from notebooks 05 and 06: AWS re:Invent 2024 session on “AI self-service support with knowledge retrieval using PostgreSQL” (YouTube link). The video covers:
- Vector databases and embeddings for AI applications
- Amazon Aurora PostgreSQL with pgvector for scalable vector storage
- RAG (Retrieval Augmented Generation) implementations
- Amazon Bedrock Agents for intelligent customer support
- Real-world use cases and technical demonstrations
This content works well for testing our video analysis system with technical presentations that include both visual slides and detailed explanations.
🧪 Test Agents with Natural Language Video Analysis
Interact with your containerized pipeline through natural conversation:
print("🧪 Testing Cloud Video Agent")
print("=" * 50)
response1 = cloud_video_agent(
    "Search for content about 'aurora database scalability' using the cloud infrastructure. Return the top 5 results and explain what you found."
)
print(f"Cloud Agent Response: {response1.message}")
List Videos:
response2 = cloud_video_agent(
    "List all processed videos in the cloud storage and provide a summary of available content with processing statistics."
)
print(f"\nCloud Storage Summary: {response2.message}")
Try the memory capabilities:
USER_ID = "cloud_test_user"
response3 = cloud_memory_agent(f"""For user {USER_ID}:
1. Store my interest in cloud architecture and scalable video processing
2. Search the video content for 'production deployment' and 'scalability'
3. Remember key insights about cloud-native video processing
4. Provide a personalized summary based on my cloud architecture interests""")
print(f"\nPersonalized Cloud Analysis: {response3.message}")
response5 = cloud_memory_agent(f"I am {USER_ID}. Give me a summary of the content of each video.")
print(f"\nPersonalized Cloud Analysis: {response5.message}")
Conclusion
By integrating Strands Agents with your containerized video processing pipeline, you’ve created a system that combines scalable infrastructure with intelligent interaction capabilities. This approach provides:
- Natural language access to video processing workflows
- Intelligent orchestration of containerized services
- Scalable processing with cost-effective resource management
- Multi-modal analysis combining visual and audio insights
The combination of containerized infrastructure with Strands Agents creates a foundation for advanced video analysis applications that can grow with your needs. Your containerized pipeline now responds to natural language, makes intelligent decisions, and provides comprehensive video analysis capabilities through conversational interfaces.
Ready to create your own Strands agent?
Here are some resources:
Thanks!