Mastering Email Rate Limits — A Deep Dive into Resend API and Cloud Run Debugging

How we solved persistent rate limiting issues and implemented robust batch email functionality for our AI-powered learning platform

The Challenge: When Emails Start Failing

Picture this: You’ve built a beautiful lesson scheduling system that sends personalized daily lessons to your users. Everything works perfectly in development, but in production, you start seeing mysterious email failures:

This was exactly the situation we faced with DailyMastery, our AI-powered learning platform. Despite having a solid architecture built with Nx monorepo, Fastify microservices, and deployed on Google Cloud Run, our email delivery system was hitting Resend’s rate limits consistently.

Understanding the Problem

The Initial Architecture

Our system had a straightforward approach:

Lesson scheduler triggered by Cloud Scheduler (3x daily)
Individual API calls to Resend for each lesson email
No rate limiting protection
Basic error handling

// The problematic approach
for (const lesson of lessons) {
  const emailResult = await this.emailService.sendEmail({
    to: user.email,
    subject: lesson.title,
    html: lessonContent,
  })
  // Rate limits hit here when processing multiple users rapidly
}

The Root Cause

Resend enforces a 2 requests per second rate limit across all endpoints. When our scheduler processed multiple users with multiple lessons each, we quickly exceeded this limit, causing cascading failures.

The Investigation: Debugging Cloud Run Like a Pro

Step 1: Enhanced Logging Strategy

The first step was implementing comprehensive logging to understand exactly what was happening:

async sendEmail(options: EmailOptions): Promise<{ success: boolean; error?: string; messageId?: string }> {
  try {
    console.log('Attempting to send email via Resend:', {
      to: options.to,
      subject: options.subject,
      htmlLength: options.html?.length || 0
    });

    const result = await this.resend.emails.send(emailData);
    console.log('Resend API response:', result);

    // Critical: Check if Resend returned an error (they don't throw exceptions!)
    if (result.error) {
      console.error('Resend API returned error:', result.error);
      return {
        success: false,
        error: result.error.message || JSON.stringify(result.error),
      };
    }

    return {
      success: true,
      messageId: result.data?.id
    };
  } catch (error) {
    console.error('Failed to send email via Resend:', error);
    return {
      success: false,
      error: error instanceof Error ? error.message : 'Unknown error',
    };
  }
}

Step 2: Cloud Run Log Analysis Techniques

You can view logs in the Google Cloud console, but if you like the terminal approach, then here are the essential Cloud Run debugging commands we used:

# Real-time log monitoring during issues
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=lesson-scheduler" \
  --limit=50 --format="value(timestamp,textPayload)" | head -20

# Filter for specific errors
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=lesson-scheduler" \
  --limit=100 | grep -E "(rate_limit|429|Failed)"

# Check logs from specific time range
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=lesson-scheduler AND timestamp>=\"2025-09-03T16:00:00Z\"" \
  --limit=50

The Solution: Multi-Layer Rate Limiting Strategy

Layer 1: Proper Error Detection

The first critical insight: Resend doesn’t throw exceptions for rate limits. They return error objects:

// Before: Missing rate limit detection
const result = await this.resend.emails.send(emailData)
return { success: true, messageId: result.data?.id } // Wrong!

// After: Comprehensive error checking
const result = await this.resend.emails.send(emailData)

if (result.error) {
  console.error('Resend API returned error:', result.error)
  return {
    success: false,
    error: result.error.message || JSON.stringify(result.error),
  }
}

if (!result.data) {
  return {
    success: false,
    error: 'No data returned from Resend API',
  }
}

return {
  success: true,
  messageId: result.data.id,
}

Layer 2: Intelligent Batch Email Implementation with Auto-Chunking

Resend offers a batch API that can send up to 100 emails in a single request. Here’s our implementation with smart fallback and automatic chunking for larger batches:

async sendBatchLessonEmails(
  emails: BatchEmailOptions[]
): Promise<{ success: boolean; error?: string; messageIds?: string[]; sentCount?: number; failedCount?: number }> {
  try {
    if (emails.length === 0) {
      return { success: true, messageIds: [], sentCount: 0, failedCount: 0 };
    }

    console.log(`Sending batch of ${emails.length} lesson emails`);

    // Automatic chunking: Split into batches of 100 (Resend limit)
    const batchSize = 100;
    const batches: BatchEmailOptions[][] = [];
    for (let i = 0; i < emails.length; i += batchSize) {
      batches.push(emails.slice(i, i + batchSize));
    }

    let totalSent = 0;
    let totalFailed = 0;
    const allMessageIds: string[] = [];
    const errors: string[] = [];

    // Process each chunk sequentially with rate limiting
    for (let i = 0; i < batches.length; i++) {
      const batch = batches[i];
      console.log(`Sending batch ${i + 1}/${batches.length} with ${batch.length} emails`);

      const result = await this.emailService.sendBatchEmails(batch);

      if (result.success) {
        totalSent += batch.length;
        if (result.messageIds) {
          allMessageIds.push(...result.messageIds);
        }
        console.log(`✅ Batch ${i + 1} sent successfully (${batch.length} emails)`);
      } else {
        totalFailed += batch.length;
        errors.push(`Batch ${i + 1} failed: ${result.error || 'Unknown error'}`);
        console.error(`❌ Batch ${i + 1} failed: ${result.error}`);
      }

      // Add delay between batches to avoid rate limiting
      if (i < batches.length - 1) {
        console.log('⏳ Waiting 1 second between batches to avoid rate limiting...');
        await new Promise(resolve => setTimeout(resolve, 1000));
      }
    }

    const overallSuccess = totalFailed === 0;

    return {
      success: overallSuccess,
      messageIds: allMessageIds,
      sentCount: totalSent,
      failedCount: totalFailed,
      error: errors.length > 0 ? errors.join('; ') : undefined,
    };
  } catch (error) {
    console.error('Error sending batch lesson emails:', error);
    return {
      success: false,
      error: error instanceof Error ? error.message : 'Unknown error',
      sentCount: 0,
      failedCount: emails.length,
    };
  }
}

Layer 3: Architectural Refactoring

We completely restructured our email sending flow:

// Old approach: Send emails immediately during processing
for (const user of users) {
  for (const lesson of lessons) {
    await sendEmail(user, lesson) // Rate limit hit here
  }
}

// New approach: Prepare first, then batch send
const allPreparedEmails: PreparedEmail[] = []

// Phase 1: Prepare all emails
for (const user of users) {
  for (const lesson of lessons) {
    const preparedEmail = await this.prepareLessonEmail(user, lesson)
    if (preparedEmail) {
      allPreparedEmails.push({ email: preparedEmail, user, lesson })
    }
  }
}

// Phase 2: Batch send with intelligent chunking and accurate tracking
if (allPreparedEmails.length > 0) {
  console.log(`🚀 Sending ${allPreparedEmails.length} emails in optimized batches`)

  const emailsToSend = allPreparedEmails.map((pe) => pe.email)

  // The EmailCoordinator handles chunking internally for 100+ email limits
  const batchResult = await this.emailCoordinator.sendBatchLessonEmails(emailsToSend)

  if (batchResult.success) {
    // Only mark lessons as sent based on actual successful send count
    const successfulEmails = allPreparedEmails.slice(0, batchResult.sentCount)
    await this.markLessonsAsSent(successfulEmails)
  }
}

Layer 4: Smart Retry Logic

For critical emails like admin reports, we implemented retry logic:

async sendSummaryReport(result: SchedulerResult): Promise<void> {
  // Add delay to avoid rate limiting after lesson emails
  console.log('⏳ Waiting 3 seconds to avoid rate limiting...');
  await new Promise(resolve => setTimeout(resolve, 3000));

  const emailResult = await this.emailService.sendEmail({
    to: this.reportingEmail,
    subject: `📊 Lesson Scheduler Report - ${result.timeSlot}`,
    html: htmlContent,
  });

  if (!emailResult.success) {
    // Retry logic for rate limits
    if (emailResult.error?.includes('rate_limit_exceeded') ||
        emailResult.error?.includes('Too many requests')) {
      console.log('⏳ Rate limited - waiting 10 seconds and retrying...');
      await new Promise(resolve => setTimeout(resolve, 10000));

      const retryResult = await this.emailService.sendEmail({
        to: this.reportingEmail,
        subject: `📊 Lesson Scheduler Report - ${result.timeSlot} (Retry)`,
        html: htmlContent,
      });

      if (retryResult.success) {
        console.log('✅ Summary report sent successfully on retry');
      }
    }
  }
}

Key Debugging Techniques for Cloud Run

1. Structured Logging

Use consistent emoji prefixes and structured data:

// Good logging practices
console.log('🚀 BATCH EMAIL MODE: Sending emails', { count: emails.length })
console.log('✅ Email sent successfully', { messageId, recipient })
console.log('❌ Email failed', { error: error.message, recipient })
console.log('⏳ Rate limiting delay', { delayMs: 600 })

2. Revision Tracking

Always verify your code is actually deployed:

# Check current revision
gcloud run services describe lesson-scheduler \
  --region=us-central1 \
  --format="value(status.latestReadyRevisionName)"

# List recent revisions with images
gcloud run revisions list --service=lesson-scheduler \
  --format="table(metadata.name,status.conditions[0].status,spec.containers[0].image)"

3. Real-time Monitoring

Set up log streaming during debugging:

# Stream logs in real-time
gcloud beta logging tail "resource.type=cloud_run_revision AND resource.labels.service_name=lesson-scheduler"

# Filter for errors only
gcloud beta logging tail "resource.type=cloud_run_revision AND resource.labels.service_name=lesson-scheduler" \
  --filter="severity>=ERROR"

Results and Performance Impact

Before Optimization

❌ Cascading failures due to rate limits
❌ Poor user experience with missed lessons
❌ Manual intervention required daily

After Optimization

✅ 0% email failure rate
✅ Automatic handling of up to 100 emails per batch
✅ Zero manual intervention required

Performance Metrics

Final Test Results:
- totalUsers: 1
- emailsSent: 3
- emailsFailed: 0
- errorCount: 0
- processingTime: 7.048s
- batchEmailsUsed: true

Best Practices Learned

1. Email Service Design Patterns

interface EmailService {
  // Always return structured results
  sendEmail(options: EmailOptions): Promise<{
    success: boolean
    error?: string
    messageId?: string
  }>

  // Batch operations with fallback
  sendBatchEmails(emails: BatchEmailOptions[]): Promise<{
    success: boolean
    error?: string
    messageIds?: string[]
    sentCount?: number
    failedCount?: number
  }>
}

2. Rate Limiting Strategies

Batch First: Use batch APIs when available
Automatic Chunking: Split large batches into 100-email chunks automatically
Intelligent Delays: 600ms between individual requests (< 2 req/sec) and 1s between batch chunks
Accurate Tracking: Only mark operations as successful based on actual send counts
Exponential Backoff: Longer delays for retries
Circuit Breaker: Fail fast on repeated rate limits

Conclusion

Building a robust email delivery system requires understanding both your email provider’s limitations and your cloud platform’s deployment mechanics. Our journey from 30% failure rates to 0% taught us that:

Rate limiting is real – Plan for it from day one
Batch APIs are game-changers – Use them when available
Docker caching can bite you – Structure your builds carefully
Comprehensive logging saves time – Invest in good observability
Fallback strategies are essential – Always have a Plan B

The combination of Resend’s batch API, intelligent fallback logic, proper rate limiting, and robust Cloud Run debugging techniques transformed our email delivery from a daily headache into a reliable, scalable system.

Remember: In production systems, it’s not just about making it work—it’s about making it work reliably, even when things go wrong.

This blog post is based on our real-world experience building DailyMastery, an AI-powered learning platform. The techniques described here are battle-tested in production and have been processing thousands of lesson emails daily without issues.