Circuit Breakers & Resilience
The "Calling a Friend" Analogy
You call your friend. They don't pick up. What do you do?
- Try again immediately (maybe they were grabbing their phone)
- Wait a bit and try again (maybe they were in the bathroom)
- Give up after 3 attempts
- Leave a voicemail (fallback mechanism)
Traditional event buses don't do any of this. One error = crash.
// Fragile code
bus.on('api:fetch', async (data) => {
  const result = await fetch('/api/data'); // Network fails → crash
  updateUI(result);
});
The Problem: Networks Are Hostile
In production, everything fails:
- Network timeouts (server unresponsive)
- Server errors (500, 503)
- Rate limiting (429 Too Many Requests)
- DNS failures (can't resolve hostname)
- Transient errors (works on retry)
The 3am wake-up call scenario:
2:47am: API endpoint fails
2:47am: Your app crashes completely
2:48am: Users flood social media
3:00am: Your phone rings 😱
The Solution: Automatic Retries + Circuit Breakers
Nexus provides resilience options when subscribing:
bus.on('api:fetch', fetchHandler, {
  attempts: 3, // Retry up to 3 times
  backoff: 1000, // Wait 1 second between retries
  fallback: (error) => {
    // Called if all retries fail
    showOfflineMode();
  }
});
No need to wrap in try/catch or manually implement retry logic!
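The three options can be summarized as a type. The sketch below is inferred from the examples on this page and is not the library's published type definition; `ResilienceOptionsSketch` and `resolveOptions` are hypothetical names, and the defaults are the ones that appear in the `execute` listing in the next section.

```typescript
// Sketch of the option shape, inferred from the examples on this page.
// This is NOT the library's published type definition.
interface ResilienceOptionsSketch {
  attempts?: number;                    // how many tries to allow (see the loop in the next section)
  backoff?: number;                     // wait between tries, in milliseconds
  fallback?: (error: unknown) => void;  // runs once every try has failed
}

// Hypothetical helper: applies the same defaults as the execute() listing
// (attempts || 0, backoff || 1000).
function resolveOptions(options: ResilienceOptionsSketch = {}) {
  return {
    attempts: options.attempts || 0,
    backoff: options.backoff || 1000,
    fallback: options.fallback,
  };
}

const defaults = resolveOptions();                               // attempts 0, backoff 1000
const custom = resolveOptions({ attempts: 3, backoff: 2000 });   // attempts 3, backoff 2000
// Because the defaults use `||`, a backoff of 0 counts as "not set".
const zeroBackoff = resolveOptions({ backoff: 0 });              // backoff 1000
```

One consequence worth noting: with `||`-style defaults, an explicit `backoff: 0` cannot be used to retry immediately.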
How Circuit Breakers Work
Behind the scenes, each listener call runs through a retry loop along these lines:
async function execute(fn: Listener, payload: any, options: SubscribeOptions) {
  let attempts = (options.attempts || 0) + 1; // Default: 1 attempt
  const backoff = options.backoff || 1000; // Default: 1s
  while (attempts > 0) {
    try {
      await fn(payload);
      return; // Success! Exit loop
    } catch (err) {
      attempts--;
      if (attempts > 0) {
        // Retry after backoff
        await new Promise(resolve => setTimeout(resolve, backoff));
      } else {
        // All retries exhausted
        options.fallback?.(err);
      }
    }
  }
}
Basic Example: API Fetch with Retry
import { Nexus } from '@caeligo/nexus-orchestrator';
const bus = new Nexus();
// Handler with automatic retries
bus.on('weather:fetch', async (payload) => {
  const response = await fetch(`/api/weather?city=${payload.city}`);
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}`);
  }
  const data = await response.json();
  bus.emit('weather:data', data);
}, {
  attempts: 3, // Try 3 times total
  backoff: 2000, // Wait 2 seconds between retries
  fallback: (error) => {
    console.error('Weather API failed:', error);
    bus.emit('weather:error', {
      message: 'Unable to fetch weather. Using cached data.'
    });
  }
});
// Trigger the fetch
bus.emit('weather:fetch', { city: 'New York' });
// Timeline:
// t=0: First attempt → fails (network timeout)
// t=2s: Second attempt → fails (server error)
// t=4s: Third attempt → fails
// t=4s: Fallback called
Real-World Example: E-Commerce Checkout
interface CheckoutPayload {
  orderId: string;
  amount: number;
  paymentToken: string;
}
const bus = new Nexus();
// The fallback receives only the error, so remember which order is in flight.
// This assumes one checkout at a time, which is typical for a checkout page.
let pendingOrderId: string | null = null;
// Payment processing with a fixed 3-second backoff.
// Retrying is only safe if /api/checkout treats a repeated orderId as the same charge.
bus.on('checkout:process', async (payload: CheckoutPayload) => {
  pendingOrderId = payload.orderId;
  const response = await fetch('/api/checkout', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      orderId: payload.orderId,
      amount: payload.amount,
      token: payload.paymentToken
    })
  });
  if (!response.ok) {
    // Throw to trigger retry
    throw new Error(`Payment failed: ${response.status}`);
  }
  const result = await response.json();
  // Success!
  bus.emit('checkout:success', {
    orderId: payload.orderId,
    confirmationNumber: result.confirmationId
  });
}, {
  attempts: 3, // Critical: try 3 times
  backoff: 3000, // 3 seconds between retries
  fallback: (error) => {
    const orderId = pendingOrderId;
    // All retries failed - escalate to support
    bus.emit('checkout:failure', {
      orderId,
      error: error.message,
      timestamp: Date.now()
    });
    // Log to monitoring service
    logError('CRITICAL_CHECKOUT_FAILURE', {
      orderId,
      error: error.message
    });
    // Show user-friendly error
    bus.emit('ui:notification', {
      type: 'error',
      message: 'Payment processing failed. Your card was not charged. Please contact support.',
      action: {
        label: 'Contact Support',
        handler: () => openSupportChat()
      }
    });
  }
});
// Success and failure handlers
bus.on('checkout:success', (data) => {
  window.location.href = `/order/${data.orderId}/confirmation`;
});
bus.on('checkout:failure', (data) => {
  // Show retry button
  document.getElementById('retry-button')?.classList.remove('hidden');
});
Real-World Example: Real-Time Notifications
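// WebSocket connection with reconnection logic
let ws: WebSocket | null = null;
// Resolves once the socket opens and rejects if the connection attempt fails.
// The rejection is what lets the retry options on 'websocket:reconnect' take effect.
function connectWebSocket(): Promise<void> {
  return new Promise((resolve, reject) => {
    const socket = new WebSocket('wss://notifications.example.com');
    let opened = false;
    socket.onopen = () => {
      opened = true;
      ws = socket;
      resolve();
    };
    socket.onmessage = (event) => {
      const notification = JSON.parse(event.data);
      bus.feed('notification:received', notification);
    };
    socket.onerror = () => {
      if (!opened) reject(new Error('WebSocket connection failed'));
    };
    socket.onclose = () => {
      // An established connection dropped: start the reconnect cycle
      if (opened) bus.emit('websocket:error', { timestamp: Date.now() });
    };
  });
}
// A dropped connection triggers a reconnect
bus.on('websocket:error', () => {
  ws = null;
  bus.emit('websocket:reconnect', {});
});
// Reconnect with retries: each rejected connectWebSocket() counts as a failed attempt
bus.on('websocket:reconnect', () => connectWebSocket(), {
  attempts: 5, // Try 5 times
  backoff: 5000, // 5 seconds between attempts
  fallback: () => {
    // Give up after 5 attempts
    console.error('WebSocket connection permanently failed');
    bus.emit('ui:notification', {
      type: 'warning',
      message: 'Real-time updates unavailable. Using polling instead.'
    });
    // Fall back to HTTP polling
    startPolling();
  }
});
Edge Case: Exponential Backoff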
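Fixed backoff (1s, 1s, 1s) isn't ideal for overloaded servers. Use exponential backoff:
// The backoff option is a single number, evaluated once when the listener
// is registered, so the growing delays are implemented inside the handler
const MAX_RETRIES = 5;
const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));
bus.on('api:heavy', async (payload) => {
  for (let retry = 0; ; retry++) {
    try {
      const response = await fetch('/api/heavy-operation', {
        method: 'POST',
        body: JSON.stringify(payload)
      });
      if (!response.ok) throw new Error('API Error');
      return await response.json();
    } catch (err) {
      if (retry >= MAX_RETRIES) throw err; // Out of retries: Nexus calls the fallback
      await sleep(1000 * Math.pow(2, retry)); // 1s, 2s, 4s, 8s, 16s
    }
  }
}, {
  // No attempts option here: the handler does its own retrying
  fallback: () => {
    showErrorMessage();
  }
});
Note: Nexus uses a fixed backoff by default (the same delay before every retry). Implement exponential backoff manually if needed.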
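To make the difference concrete, a fixed schedule and an exponential one can be computed side by side. This is a standalone sketch; `fixedDelays` and `exponentialDelays` are hypothetical helper names and not part of Nexus, and the optional cap is an assumption added to keep long retry chains bounded.

```typescript
// Hypothetical helpers for comparing backoff schedules (not a Nexus API).

// Fixed backoff: the same wait before every retry.
function fixedDelays(baseMs: number, retries: number): number[] {
  return Array.from({ length: retries }, () => baseMs);
}

// Exponential backoff: the wait doubles before each retry, capped at maxMs.
function exponentialDelays(baseMs: number, retries: number, maxMs = Infinity): number[] {
  return Array.from({ length: retries }, (_, i) => Math.min(baseMs * 2 ** i, maxMs));
}

console.log(fixedDelays(1000, 5));             // [1000, 1000, 1000, 1000, 1000]
console.log(exponentialDelays(1000, 5));       // [1000, 2000, 4000, 8000, 16000]
console.log(exponentialDelays(1000, 5, 5000)); // [1000, 2000, 4000, 5000, 5000]
```

With five retries, the fixed schedule waits 5 seconds in total while the uncapped exponential one waits 31 seconds, which gives an overloaded server much more room to recover.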
Edge Case: Partial Success
Sometimes you want to retry only specific errors:
bus.on('api:conditional', async (payload) => {
  try {
    const response = await fetch('/api/data');
    if (response.status === 429) {
      // Rate limited - should retry
      throw new Error('RETRY:Rate limited');
    }
    if (response.status === 404) {
      // Not found - don't retry, this is a permanent error
      bus.emit('api:not-found', payload);
      return; // Success (exit normally, no throw)
    }
    if (!response.ok) {
      // Other error - retry (the prefix is what marks it as retryable)
      throw new Error(`RETRY:HTTP ${response.status}`);
    }
    // Parse the body only after the status checks: error responses may not be JSON
    const data = await response.json();
    bus.emit('api:success', data);
  } catch (err) {
    // Only retry if error message starts with "RETRY:"
    if (err instanceof Error && err.message.startsWith('RETRY:')) {
      throw err; // Propagate for retry
    }
    // Otherwise, log and exit gracefully
    console.error('Permanent error:', err);
  }
}, {
  attempts: 3,
  backoff: 2000
});
Edge Case: Retry with Jitter
Add randomness to prevent "thundering herd" (all clients retrying simultaneously):
function randomBackoff(base: number): number {
  return base + Math.random() * 1000; // Add 0-1s jitter
}
bus.on('api:popular', fetchHandler, {
  attempts: 3,
  backoff: randomBackoff(2000), // Evaluated once at registration: 2-3s, different for each client
  fallback: handleError
});
Pattern: Retry with Progress Indication
Show users that retries are happening:
let currentAttempt = 0;
bus.on('file:upload', async (payload) => {
  currentAttempt++;
  // Update UI
  bus.emit('ui:upload-status', {
    status: 'uploading',
    attempt: currentAttempt,
    message: `Uploading... (attempt ${currentAttempt}/3)`
  });
  // Upload logic
  const formData = new FormData();
  formData.append('file', payload.file);
  const response = await fetch('/api/upload', {
    method: 'POST',
    body: formData
  });
  if (!response.ok) {
    throw new Error('Upload failed');
  }
  // Success
  currentAttempt = 0;
  bus.emit('ui:upload-status', {
    status: 'success',
    message: 'Upload complete!'
  });
}, {
  attempts: 3,
  backoff: 2000,
  fallback: () => {
    currentAttempt = 0;
    bus.emit('ui:upload-status', {
      status: 'error',
      message: 'Upload failed after 3 attempts. Please try again.'
    });
  }
});
Pattern: Combining with RPC
Resilience works with request/reply:
// Server handler with retries
bus.on('user:fetch', async (payload) => {
  try {
    const user = await database.getUser(payload.id);
    bus.reply(payload, user);
  } catch (err) {
    // The fallback receives only the error, so attach the request to it
    throw Object.assign(err instanceof Error ? err : new Error(String(err)), { request: payload });
  }
}, {
  attempts: 3,
  backoff: 1000,
  fallback: (error) => {
    // Database down - reply with error
    bus.reply(error.request, { error: 'Database unavailable' });
  }
});
// Client with timeout
try {
  const user = await bus.request('user:fetch', { id: 123 }, 10000);
  if (user.error) {
    console.error('Server error:', user.error);
  } else {
    console.log('User:', user);
  }
} catch (err) {
  console.error('Request timeout:', err);
}
Pattern: Health Check System
Monitor system health and disable features on failure:
let apiHealthy = true;
// Periodic health check
setInterval(() => {
  bus.emit('health:check', { service: 'api' });
}, 30000); // Every 30 seconds
bus.on('health:check', async (payload) => {
  const response = await fetch('/api/health');
  if (!response.ok) {
    throw new Error('Health check failed');
  }
  // Success - mark as healthy
  apiHealthy = true;
  bus.emit('health:status', { service: 'api', healthy: true });
}, {
  attempts: 3,
  backoff: 5000,
  fallback: () => {
    // All retries failed - mark as unhealthy
    apiHealthy = false;
    bus.emit('health:status', { service: 'api', healthy: false });
    // Disable features that depend on the API
    disableCheckoutButton();
    showMaintenanceMessage();
  }
});
// Use health status to conditionally emit events
function addToCart(product) {
  if (!apiHealthy) {
    bus.emit('ui:notification', {
      type: 'warning',
      message: 'Shopping cart temporarily unavailable. Please try again later.'
    });
    return;
  }
  bus.emit('cart:add', product);
}
Testing Resilience
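The tests that follow rely on handlers that fail a set number of times. One way to factor that out is a small helper. This is a sketch that is independent of both Nexus and Vitest; `failTimes` is a hypothetical name introduced for illustration.

```typescript
// Hypothetical test helper: returns a handler that throws for the first
// `failures` calls and succeeds afterwards, plus a counter for assertions.
function failTimes(failures: number) {
  let calls = 0;
  const handler = (): void => {
    calls++;
    if (calls <= failures) {
      throw new Error(`Simulated failure ${calls}`);
    }
  };
  return { handler, callCount: () => calls };
}

// Demo: fails twice, then succeeds on the third call.
const flaky = failTimes(2);
const outcomes: string[] = [];
for (let i = 0; i < 3; i++) {
  try {
    flaky.handler();
    outcomes.push('ok');
  } catch {
    outcomes.push('error');
  }
}
console.log(outcomes);          // ['error', 'error', 'ok']
console.log(flaky.callCount()); // 3
```

Passing `flaky.handler` to `bus.on` would replace the hand-written counter in the first test, and `failTimes(Infinity).handler` gives an always-failing handler for the fallback test.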
import { describe, it, expect, vi } from 'vitest';
import { Nexus } from '@caeligo/nexus-orchestrator';
describe('Resilience', () => {
  it('should retry on failure', async () => {
    const bus = new Nexus();
    let callCount = 0;
    bus.on('api:call', () => {
      callCount++;
      if (callCount < 3) {
        throw new Error('Simulated failure');
      }
      // Success on 3rd attempt
    }, {
      attempts: 3,
      backoff: 10 // Short backoff for testing
    });
    bus.emit('api:call', {});
    // Wait for retries
    await new Promise(resolve => setTimeout(resolve, 100));
    expect(callCount).toBe(3);
  });
  it('should call fallback on exhaustion', async () => {
    const bus = new Nexus();
    let fallbackCalled = false;
    bus.on('api:fail', () => {
      throw new Error('Always fails');
    }, {
      attempts: 2,
      backoff: 10,
      fallback: () => {
        fallbackCalled = true;
      }
    });
    bus.emit('api:fail', {});
    await new Promise(resolve => setTimeout(resolve, 100));
    expect(fallbackCalled).toBe(true);
  });
});
Performance Considerations
Latency: In the worst case, retries add attempts × backoff milliseconds of waiting, on top of the time each attempt itself takes.
Throughput: Retries don't block other events (they're async).
Memory: Each retry creates a new timeout - cleaned up automatically.
Recommendation: Set attempts based on failure likelihood:
- Critical operations (payment): 3-5 attempts
- Non-critical (analytics): 1-2 attempts
- Idempotent operations: Higher attempts OK
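As a rough worked example of the latency point, the delay added by retries can be estimated before picking values. The sketch assumes a fixed backoff and that every try fails; `worstCaseWaitMs` and `worstCaseTotalMs` are hypothetical helpers, and `retries` here means the number of retries after the first call.

```typescript
// Hypothetical helpers for estimating worst-case latency (not a Nexus API).

// Time spent waiting between tries when every try fails.
function worstCaseWaitMs(retries: number, backoffMs: number): number {
  return Math.max(0, retries) * backoffMs;
}

// Total time until the fallback runs, given an upper bound on how long
// a single try can take (for example, a request timeout).
function worstCaseTotalMs(retries: number, backoffMs: number, perTryMs: number): number {
  const tries = Math.max(0, retries) + 1; // first call plus retries
  return tries * perTryMs + worstCaseWaitMs(retries, backoffMs);
}

console.log(worstCaseWaitMs(3, 2000));       // 6000
console.log(worstCaseTotalMs(3, 2000, 500)); // 8000
```

If the resulting number is longer than a user would reasonably wait, lowering the retry count or the backoff is usually a better trade than keeping them waiting.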
Next Steps
Now that your listeners are bulletproof, learn how to test them:
- Chaos Monkey - Simulate failures in development
- AI Prediction - Predict and prevent failures
- Teleportation - Resilient client/server communication
Quick Reference
Basic retry:
bus.on(event, handler, { attempts: 3 });
With backoff:
bus.on(event, handler, {
  attempts: 3,
  backoff: 2000 // 2 seconds
});
With fallback:
bus.on(event, handler, {
  attempts: 3,
  backoff: 1000,
  fallback: (error) => {
    console.error('Failed:', error);
  }
});
Best practices:
- Always set fallback for critical operations
- Use reasonable backoff (1-5 seconds)
- Don't retry non-idempotent operations (e.g., payment processing without an idempotency key)
- Log retry attempts for debugging
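The last point can be handled without editing each handler. Below is a minimal sketch of a wrapper that logs every failed call and then rethrows, so whatever retry logic sits above it still sees the error. `withRetryLogging`, `Handler`, and the `log` parameter are hypothetical names, and the sketch assumes the same wrapped function is invoked again on each retry, as in the `execute` listing earlier on this page.

```typescript
// Hypothetical logging wrapper (not a Nexus API).
type Handler<T> = (payload: T) => void | Promise<void>;

function withRetryLogging<T>(
  name: string,
  handler: Handler<T>,
  log: (message: string) => void = console.warn
): Handler<T> {
  let failures = 0; // consecutive failures, shared by all calls to this handler
  return async (payload: T) => {
    try {
      await handler(payload);
      failures = 0; // reset after a success
    } catch (error) {
      failures++;
      const reason = error instanceof Error ? error.message : String(error);
      log(`[${name}] call failed (${failures} in a row): ${reason}`);
      throw error; // rethrow so the retry logic still runs
    }
  };
}

// Demo with an always-failing handler and an in-memory log.
const messages: string[] = [];
const wrapped = withRetryLogging<string>(
  'demo',
  async () => { throw new Error('boom'); },
  (message) => { messages.push(message); }
);
const demo = wrapped('payload').catch(() => { /* already logged */ });
```

With Nexus, the wrapped function would be passed in place of the original handler, for example `bus.on('api:fetch', withRetryLogging('api:fetch', fetchHandler), { attempts: 3 })`.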
