Skip to content

Circuit Breakers & Resilience

The "Calling a Friend" Analogy

You call your friend. They don't pick up. What do you do?

  1. Try again immediately (maybe they were grabbing their phone)
  2. Wait a bit and try again (maybe they were in the bathroom)
  3. Give up after 3 attempts
  4. Leave a voicemail (fallback mechanism)

Traditional event buses don't do any of this. One error = crash.

typescript
// Fragile code
bus.on('api:fetch', async (data) => {
  const result = await fetch('/api/data'); // Network fails → crash
  updateUI(result);
});

The Problem: Networks Are Hostile

In production, everything fails:

  • Network timeouts (server unresponsive)
  • Server errors (500, 503)
  • Rate limiting (429 Too Many Requests)
  • DNS failures (can't resolve hostname)
  • Transient errors (works on retry)

The 3am wake-up call scenario:

2:47am: API endpoint fails
2:47am: Your app crashes completely
2:48am: Users flood social media
3:00am: Your phone rings 😱

The Solution: Automatic Retries + Circuit Breakers

Nexus provides resilience options when subscribing:

typescript
bus.on('api:fetch', fetchHandler, {
  attempts: 3,        // Retry up to 3 times
  backoff: 1000,      // Wait 1 second between retries
  fallback: (error) => {
    // Called if all retries fail
    showOfflineMode();
  }
});

No need to wrap in try/catch or manually implement retry logic!

How Circuit Breakers Work

Behind the scenes:

typescript
async function execute(fn: Listener, payload: any, options: SubscribeOptions) {
  let attempts = (options.attempts || 0) + 1; // Default: 1 attempt
  const backoff = options.backoff || 1000;    // Default: 1s
  
  while (attempts > 0) {
    try {
      await fn(payload);
      return; // Success! Exit loop
    } catch (err) {
      attempts--;
      
      if (attempts > 0) {
        // Retry after backoff
        await new Promise(resolve => setTimeout(resolve, backoff));
      } else {
        // All retries exhausted
        options.fallback?.(err);
      }
    }
  }
}

Basic Example: API Fetch with Retry

typescript
import { Nexus } from '@caeligo/nexus-orchestrator';

const bus = new Nexus();

// Handler with automatic retries
bus.on('weather:fetch', async (payload) => {
  const response = await fetch(`/api/weather?city=${payload.city}`);
  
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}`);
  }
  
  const data = await response.json();
  bus.emit('weather:data', data);
}, {
  attempts: 3,       // Try 3 times total
  backoff: 2000,     // Wait 2 seconds between retries
  fallback: (error) => {
    console.error('Weather API failed:', error);
    bus.emit('weather:error', { 
      message: 'Unable to fetch weather. Using cached data.' 
    });
  }
});

// Trigger the fetch
bus.emit('weather:fetch', { city: 'New York' });

// Timeline:
// t=0:    First attempt → fails (network timeout)
// t=2s:   Second attempt → fails (server error)
// t=4s:   Third attempt → fails
// t=4s:   Fallback called

Real-World Example: E-Commerce Checkout

typescript
interface CheckoutPayload {
  orderId: string;
  amount: number;
  paymentToken: string;
}

const bus = new Nexus();

// Payment processing with exponential backoff
bus.on('checkout:process', async (payload: CheckoutPayload) => {
  const response = await fetch('/api/checkout', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      orderId: payload.orderId,
      amount: payload.amount,
      token: payload.paymentToken
    })
  });
  
  if (!response.ok) {
    // Throw to trigger retry
    throw new Error(`Payment failed: ${response.status}`);
  }
  
  const result = await response.json();
  
  // Success!
  bus.emit('checkout:success', {
    orderId: payload.orderId,
    confirmationNumber: result.confirmationId
  });
}, {
  attempts: 3,           // Critical: try 3 times
  backoff: 3000,         // 3 seconds between retries
  fallback: (error) => {
    // All retries failed - escalate to support
    bus.emit('checkout:failure', {
      orderId: payload.orderId,
      error: error.message,
      timestamp: Date.now()
    });
    
    // Log to monitoring service
    logError('CRITICAL_CHECKOUT_FAILURE', {
      orderId: payload.orderId,
      error: error.message
    });
    
    // Show user-friendly error
    bus.emit('ui:notification', {
      type: 'error',
      message: 'Payment processing failed. Your card was not charged. Please contact support.',
      action: {
        label: 'Contact Support',
        handler: () => openSupportChat()
      }
    });
  }
});

// Success and failure handlers
bus.on('checkout:success', (data) => {
  window.location.href = `/order/${data.orderId}/confirmation`;
});

bus.on('checkout:failure', (data) => {
  // Show retry button
  document.getElementById('retry-button')?.classList.remove('hidden');
});

Real-World Example: Real-Time Notifications

typescript
// WebSocket connection with reconnection logic
let ws: WebSocket | null = null;

function connectWebSocket() {
  ws = new WebSocket('wss://notifications.example.com');
  
  ws.onmessage = (event) => {
    const notification = JSON.parse(event.data);
    bus.feed('notification:received', notification);
  };
  
  ws.onerror = () => {
    bus.emit('websocket:error', { timestamp: Date.now() });
  };
}

// Handle connection errors with retries
bus.on('websocket:error', () => {
  ws?.close();
  ws = null;
  
  // Retry connection
  bus.emit('websocket:reconnect', {});
}, {
  attempts: 5,        // Try 5 times
  backoff: 5000,      // 5 seconds between attempts
  fallback: () => {
    // Give up after 5 attempts
    console.error('WebSocket connection permanently failed');
    bus.emit('ui:notification', {
      type: 'warning',
      message: 'Real-time updates unavailable. Using polling instead.'
    });
    
    // Fall back to HTTP polling
    startPolling();
  }
});

bus.on('websocket:reconnect', () => {
  connectWebSocket();
});

Edge Case: Exponential Backoff

Linear backoff (1s, 1s, 1s) isn't ideal for overloaded servers. Use exponential backoff:

typescript
// Custom backoff strategy
let retryCount = 0;

bus.on('api:heavy', async (payload) => {
  // Simulate API call
  const response = await fetch('/api/heavy-operation', {
    method: 'POST',
    body: JSON.stringify(payload)
  });
  
  if (!response.ok) throw new Error('API Error');
  
  return response.json();
}, {
  attempts: 5,
  backoff: 1000 * Math.pow(2, retryCount++), // 1s, 2s, 4s, 8s, 16s
  fallback: () => {
    retryCount = 0; // Reset for next time
    showErrorMessage();
  }
});

Note: Nexus uses linear backoff by default. Implement exponential backoff manually if needed.

Edge Case: Partial Success

Sometimes you want to retry only specific errors:

typescript
bus.on('api:conditional', async (payload) => {
  try {
    const response = await fetch('/api/data');
    const data = await response.json();
    
    if (response.status === 429) {
      // Rate limited - should retry
      throw new Error('RETRY:Rate limited');
    }
    
    if (response.status === 404) {
      // Not found - don't retry, this is a permanent error
      bus.emit('api:not-found', payload);
      return; // Success (exit normally, no throw)
    }
    
    if (!response.ok) {
      // Other error - retry
      throw new Error(`HTTP ${response.status}`);
    }
    
    bus.emit('api:success', data);
  } catch (err) {
    // Only retry if error message starts with "RETRY:"
    if (err.message.startsWith('RETRY:')) {
      throw err; // Propagate for retry
    }
    // Otherwise, log and exit gracefully
    console.error('Permanent error:', err);
  }
}, {
  attempts: 3,
  backoff: 2000
});

Edge Case: Retry with Jitter

Add randomness to prevent "thundering herd" (all clients retrying simultaneously):

typescript
function randomBackoff(base: number): number {
  return base + Math.random() * 1000; // Add 0-1s jitter
}

bus.on('api:popular', fetchHandler, {
  attempts: 3,
  backoff: randomBackoff(2000), // 2s ± 1s
  fallback: handleError
});

Pattern: Retry with Progress Indication

Show users that retries are happening:

typescript
let currentAttempt = 0;

bus.on('file:upload', async (payload) => {
  currentAttempt++;
  
  // Update UI
  bus.emit('ui:upload-status', {
    status: 'uploading',
    attempt: currentAttempt,
    message: `Uploading... (attempt ${currentAttempt}/3)`
  });
  
  // Upload logic
  const formData = new FormData();
  formData.append('file', payload.file);
  
  const response = await fetch('/api/upload', {
    method: 'POST',
    body: formData
  });
  
  if (!response.ok) {
    throw new Error('Upload failed');
  }
  
  // Success
  currentAttempt = 0;
  bus.emit('ui:upload-status', {
    status: 'success',
    message: 'Upload complete!'
  });
}, {
  attempts: 3,
  backoff: 2000,
  fallback: () => {
    currentAttempt = 0;
    bus.emit('ui:upload-status', {
      status: 'error',
      message: 'Upload failed after 3 attempts. Please try again.'
    });
  }
});

Pattern: Combining with RPC

Resilience works with request/reply:

typescript
// Server handler with retries
bus.on('user:fetch', async (payload) => {
  const user = await database.getUser(payload.id);
  bus.reply(payload, user);
}, {
  attempts: 3,
  backoff: 1000,
  fallback: (error) => {
    // Database down - reply with error
    bus.reply(payload, { error: 'Database unavailable' });
  }
});

// Client with timeout
try {
  const user = await bus.request('user:fetch', { id: 123 }, 10000);
  
  if (user.error) {
    console.error('Server error:', user.error);
  } else {
    console.log('User:', user);
  }
} catch (err) {
  console.error('Request timeout:', err);
}

Pattern: Health Check System

Monitor system health and disable features on failure:

typescript
let apiHealthy = true;

// Periodic health check
setInterval(() => {
  bus.emit('health:check', { service: 'api' });
}, 30000); // Every 30 seconds

bus.on('health:check', async (payload) => {
  const response = await fetch('/api/health');
  
  if (!response.ok) {
    throw new Error('Health check failed');
  }
  
  // Success - mark as healthy
  apiHealthy = true;
  bus.emit('health:status', { service: 'api', healthy: true });
}, {
  attempts: 3,
  backoff: 5000,
  fallback: () => {
    // All retries failed - mark as unhealthy
    apiHealthy = false;
    bus.emit('health:status', { service: 'api', healthy: false });
    
    // Disable features that depend on the API
    disableCheckoutButton();
    showMaintenanceMessage();
  }
});

// Use health status to conditionally emit events
function addToCart(product) {
  if (!apiHealthy) {
    bus.emit('ui:notification', {
      type: 'warning',
      message: 'Shopping cart temporarily unavailable. Please try again later.'
    });
    return;
  }
  
  bus.emit('cart:add', product);
}

Testing Resilience

typescript
import { describe, it, expect, vi } from 'vitest';
import { Nexus } from '@caeligo/nexus-orchestrator';

describe('Resilience', () => {
  it('should retry on failure', async () => {
    const bus = new Nexus();
    let callCount = 0;
    
    bus.on('api:call', () => {
      callCount++;
      if (callCount < 3) {
        throw new Error('Simulated failure');
      }
      // Success on 3rd attempt
    }, {
      attempts: 3,
      backoff: 10 // Short backoff for testing
    });
    
    bus.emit('api:call', {});
    
    // Wait for retries
    await new Promise(resolve => setTimeout(resolve, 100));
    
    expect(callCount).toBe(3);
  });
  
  it('should call fallback on exhaustion', async () => {
    const bus = new Nexus();
    let fallbackCalled = false;
    
    bus.on('api:fail', () => {
      throw new Error('Always fails');
    }, {
      attempts: 2,
      backoff: 10,
      fallback: () => {
        fallbackCalled = true;
      }
    });
    
    bus.emit('api:fail', {});
    
    await new Promise(resolve => setTimeout(resolve, 100));
    
    expect(fallbackCalled).toBe(true);
  });
});

Performance Considerations

Latency: Retries add attempts × backoff milliseconds in worst case.

Throughput: Retries don't block other events (they're async).

Memory: Each retry creates a new timeout - cleaned up automatically.

Recommendation: Set attempts based on failure likelihood:

  • Critical operations (payment): 3-5 attempts
  • Non-critical (analytics): 1-2 attempts
  • Idempotent operations: Higher attempts OK

Next Steps

Now that your listeners are bulletproof, learn how to test them:

Quick Reference

Basic retry:

typescript
bus.on(event, handler, { attempts: 3 });

With backoff:

typescript
bus.on(event, handler, { 
  attempts: 3, 
  backoff: 2000  // 2 seconds
});

With fallback:

typescript
bus.on(event, handler, {
  attempts: 3,
  backoff: 1000,
  fallback: (error) => {
    console.error('Failed:', error);
  }
});

Best practices:

  • Always set fallback for critical operations
  • Use reasonable backoff (1-5 seconds)
  • Don't retry non-idempotent operations (e.g., payment processing)
  • Log retry attempts for debugging

Released under the MIT License.