Type‑Safe Circuit Breaker in Node.js: A TypeScript & OpenTelemetry Playbook
Introduction
Micro‑services and third‑party APIs are the backbone of modern Node.js applications, but they also introduce a fragile dependency: an external endpoint can become slow, start returning errors, or disappear altogether. A circuit breaker protects your service from cascading failures by short‑circuiting calls that are likely to fail, giving the downstream system time to recover.
When you add TypeScript into the mix, you get compile‑time guarantees about request and response shapes, but most circuit‑breaker libraries ship with any‑typed payloads, forcing you to cast or lose type safety. This article shows how to:
- Design a generic, type‑safe circuit breaker that works with any async function.
- Wire the breaker into OpenTelemetry for tracing and metrics.
- Use the breaker in a realistic scenario (a payment‑gateway client).
By the end you’ll have a reusable utility you can drop into any Node.js codebase without sacrificing type safety or observability.
1. The circuit‑breaker problem in a nutshell
A classic circuit breaker has three states:
| State | Entered when | Behaviour while in this state |
|---|---|---|
| Closed | Service is healthy | Calls pass through normally. |
| Open | Failure threshold exceeded (e.g., 5 errors in 30 s) | Calls are rejected immediately. |
| Half‑Open | After a cool‑down period | Allows a limited number of “probe” calls; success closes the circuit, failure re‑opens it. |
The pattern is simple, but implementing it type‑safely is tricky because the breaker must accept any async function and preserve its input and output types.
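The three states can also be read as a small transition function. The following is a minimal sketch rather than the implementation used later in this article: the names BreakerState, BreakerEvent and nextState are assumptions made for illustration, and a single successful probe is enough to close the circuit here.

```typescript
// Hypothetical names: BreakerState, BreakerEvent and nextState are illustrative only.
type BreakerState =
  | { kind: 'closed'; failures: number }
  | { kind: 'open' }
  | { kind: 'halfOpen' };

type BreakerEvent = 'success' | 'failure' | 'cooldownElapsed';

/** Pure transition function: returns the next state without mutating the input. */
function nextState(
  state: BreakerState,
  event: BreakerEvent,
  failureThreshold: number,
): BreakerState {
  switch (state.kind) {
    case 'closed': {
      if (event === 'success') return { kind: 'closed', failures: 0 };
      if (event === 'failure') {
        const failures = state.failures + 1;
        return failures >= failureThreshold
          ? { kind: 'open' }
          : { kind: 'closed', failures };
      }
      return state; // a cool-down event means nothing while closed
    }
    case 'open':
      // Only the cool-down moves an open circuit forward.
      return event === 'cooldownElapsed' ? { kind: 'halfOpen' } : state;
    case 'halfOpen':
      if (event === 'success') return { kind: 'closed', failures: 0 };
      if (event === 'failure') return { kind: 'open' };
      return state;
  }
}
```

Because the function is pure, every row of the table can be checked with a plain assertion and no timers.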
2. Core design goals
| Goal | Why it matters |
|---|---|
| Generic API | The breaker should wrap (...args: A) => Promise<T> for any argument tuple A and result type T. |
| Strong typing of errors | Distinguish between transient (circuit‑breaker‑eligible) and permanent errors. |
| Observability | Export OpenTelemetry spans and metrics for each state transition and request. |
| Configurable | Time windows, thresholds, and back‑off strategies must be adjustable per instance. |
| Testable | Pure functions and injectable clocks make unit testing deterministic. |
3. Type‑safe API surface
/** Options that control the breaker’s behaviour */
export interface CircuitBreakerOptions {
  /** Number of consecutive transient failures before opening the circuit */
  failureThreshold: number;
  /** Time (ms) the circuit stays open before trying half‑open */
  resetTimeout: number;
  /** Number of successful probe calls required in half‑open state before the circuit closes */
  halfOpenMaxCalls: number;
}

/** Result of a wrapped call */
export type CircuitResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: Error; reason: 'open' | 'rejected' };
The public wrap function looks like this:
function wrap<T, A extends any[]>(
  fn: (...args: A) => Promise<T>,
  opts: CircuitBreakerOptions,
  otel: OpenTelemetryHelper,
): (...args: A) => Promise<CircuitResult<T>> { … }
A captures the argument tuple, preserving the exact signature of the original function. The returned wrapper returns a discriminated union (ok flag) so callers must handle both success and failure paths explicitly—no more unchecked any.
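To see what the discriminated union buys the caller, here is a small sketch. CircuitResult is copied from above; unwrapOr is a hypothetical helper introduced only for this example.

```typescript
type CircuitResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: Error; reason: 'open' | 'rejected' };

/** Hypothetical helper: return the value on success, otherwise a fallback. */
function unwrapOr<T>(result: CircuitResult<T>, fallback: T): T {
  if (result.ok) {
    // Narrowed to the success branch: `value` exists, `error` does not.
    return result.value;
  }
  // Narrowed to the failure branch: `error` and `reason` exist, `value` does not.
  return fallback;
}

const success: CircuitResult<number> = { ok: true, value: 42 };
const rejected: CircuitResult<number> = {
  ok: false,
  error: new Error('Circuit open'),
  reason: 'open',
};

// Reading `.value` on the failure branch does not compile, so the check on `ok`
// cannot be skipped by accident.
const fromSuccess = unwrapOr(success, 0);
const fromRejected = unwrapOr(rejected, 0);
```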
4. Implementing the state machine
We model the breaker as a small class that keeps its mutable state private and exposes a single public method, execute.
import { SpanStatusCode } from '@opentelemetry/api';

enum State {
  Closed,
  Open,
  HalfOpen,
}

export class CircuitBreaker<A extends any[], T> {
  private state = State.Closed;
  private failures = 0;
  private lastFailure = 0;
  private halfOpenCalls = 0;

  constructor(
    private readonly fn: (...args: A) => Promise<T>,
    private readonly opts: CircuitBreakerOptions,
    private readonly otel: OpenTelemetryHelper,
    private readonly now: () => number = () => Date.now(),
  ) {}

  async execute(...args: A): Promise<CircuitResult<T>> {
    // 1️⃣ Record entry span
    const span = this.otel.startSpan('circuitBreaker.execute', {
      attributes: { 'circuit.state': State[this.state] },
    });
    // 2️⃣ Short‑circuit if open
    if (this.state === State.Open) {
      span.setAttribute('circuit.rejected', true);
      span.end();
      return { ok: false, error: new Error('Circuit open'), reason: 'open' };
    }
    // 3️⃣ Allow call (closed or half‑open)
    try {
      const value = await this.fn(...args);
      this.recordSuccess();
      // SpanStatusCode.OK is 1; a literal 0 would mean UNSET
      span.setStatus({ code: SpanStatusCode.OK });
      span.end();
      return { ok: true, value };
    } catch (err) {
      this.recordFailure(err as Error);
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
      span.end();
      return { ok: false, error: err as Error, reason: 'rejected' };
    }
  }

  /** Called when the wrapped call succeeds */
  private recordSuccess() {
    if (this.state === State.HalfOpen) {
      this.halfOpenCalls++;
      if (this.halfOpenCalls >= this.opts.halfOpenMaxCalls) {
        this.transitionTo(State.Closed);
      }
    } else {
      // Reset failure count on any success while closed
      this.failures = 0;
    }
  }

  /** Called when the wrapped call throws */
  private recordFailure(err: Error) {
    // Only count transient errors; isTransient is a helper you supply (you can customise this)
    if (!isTransient(err)) return;
    this.failures++;
    this.lastFailure = this.now();
    if (this.state === State.Closed && this.failures >= this.opts.failureThreshold) {
      this.transitionTo(State.Open);
    } else if (this.state === State.HalfOpen) {
      this.transitionTo(State.Open);
    }
  }

  /** Handles state transitions and emits metrics */
  private transitionTo(newState: State) {
    if (this.state === newState) return;
    const prev = this.state;
    this.state = newState;
    this.halfOpenCalls = 0;
    this.failures = newState === State.Closed ? 0 : this.failures;
    // Open → schedule reset (a real timer, so tests need fake timers to reach half‑open)
    if (newState === State.Open) {
      setTimeout(() => this.transitionTo(State.HalfOpen), this.opts.resetTimeout);
    }
    // Emit OpenTelemetry metric
    this.otel.recordMetric('circuit.state_change', 1, {
      attributes: {
        from: State[prev],
        to: State[newState],
      },
    });
  }
}
Why this is type‑safe
- The generic parameters A and T flow from the original function to the wrapper, so the compiler knows exactly what arguments are accepted and what type is returned on success.
- CircuitResult<T> forces callers to check ok before accessing value, eliminating accidental undefined errors.
- The isTransient helper can be typed to accept only the Error subclasses you deem retryable, keeping error handling explicit.
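The isTransient helper is left open above. One possible shape is sketched below, assuming errors carry either an HTTP status or a Node-style error code. HttpError, TaggedError and the list of retryable codes are hypothetical and should be adapted to whatever your client library actually throws.

```typescript
// Hypothetical error type: the article does not define one.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

// Assumed shape of errors that may carry extra fields.
interface TaggedError extends Error {
  status?: number;
  code?: string;
}

// Network-level error codes treated as retryable in this sketch (an assumption).
const TRANSIENT_CODES = ['ECONNRESET', 'ECONNREFUSED', 'ETIMEDOUT', 'EAI_AGAIN'];

/** Sketch of isTransient: true for failures that may succeed on a later attempt. */
function isTransient(err: Error): boolean {
  const tagged = err as TaggedError;
  if (typeof tagged.status === 'number') {
    // 5xx and 429 suggest the downstream is struggling; other 4xx are the caller's fault.
    return tagged.status >= 500 || tagged.status === 429;
  }
  if (typeof tagged.code === 'string') {
    return TRANSIENT_CODES.indexOf(tagged.code) !== -1;
  }
  return false; // unknown errors do not count towards opening the circuit
}
```

Returning false for unknown errors is a deliberate choice in this sketch: a bug in the caller's own code should not open the circuit for everyone else.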
5. OpenTelemetry helper
A thin wrapper keeps the breaker code tidy and lets you swap the OTEL SDK version without touching the core logic.
import { trace, metrics, Attributes, Span, SpanOptions } from '@opentelemetry/api';

export class OpenTelemetryHelper {
  private tracer = trace.getTracer('circuit-breaker');
  private meter = metrics.getMeter('circuit-breaker');

  /** Counter for state changes */
  private stateChangeCounter = this.meter.createCounter('circuit.state_change', {
    description: 'Number of circuit breaker state transitions',
  });

  startSpan(name: string, opts?: SpanOptions): Span {
    return this.tracer.startSpan(name, opts);
  }

  recordMetric(name: string, value: number, opts?: { attributes?: Attributes }) {
    if (name === 'circuit.state_change') {
      this.stateChangeCounter.add(value, opts?.attributes);
    }
    // Extend with more metrics as needed
  }
}
With this helper, each call to the breaker automatically creates a span (circuitBreaker.execute) and records a counter whenever the state changes. You can also add a Histogram for latency if you wish.
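The latency histogram mentioned above could be wired in roughly as follows. To keep the sketch runnable without the SDK, it depends only on a small LatencyRecorder interface, which is an assumption of this example; an OpenTelemetry Histogram obtained from meter.createHistogram(...) has a compatible record(value, attributes) method and could be passed in its place. The timed helper and the outcome attribute are hypothetical names.

```typescript
// Assumption: a minimal structural type so the sketch runs without the SDK.
interface LatencyRecorder {
  record(value: number, attributes?: Record<string, string>): void;
}

/** Hypothetical helper: time an async call and record its duration in ms. */
async function timed<T>(
  recorder: LatencyRecorder,
  fn: () => Promise<T>,
  now: () => number = () => Date.now(),
): Promise<T> {
  const start = now();
  let outcome = 'success';
  try {
    return await fn();
  } catch (err) {
    outcome = 'failure';
    throw err;
  } finally {
    // Runs on both paths, so failed calls are measured too.
    recorder.record(now() - start, { outcome });
  }
}
```

Inside execute, the call to this.fn(...args) could be wrapped with timed so that successes and failures land in the same histogram, separated by the outcome attribute.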
6. Real‑world example: a payment‑gateway client
Suppose we have a thin wrapper around a third‑party payment API:
export interface ChargeRequest {
  amountCents: number;
  currency: 'USD' | 'EUR';
  sourceToken: string;
}

export interface ChargeResponse {
  id: string;
  status: 'succeeded' | 'failed';
  receiptUrl: string;
}

export class PaymentGateway {
  async charge(req: ChargeRequest): Promise<ChargeResponse> {
    // Imagine a fetch call here
    const resp = await fetch('https://api.payments.example/charge', {
      method: 'POST',
      body: JSON.stringify(req),
      headers: { 'Content-Type': 'application/json' },
    });
    if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
    // Unvalidated cast: validate the payload at runtime if you do not fully trust the API
    return (await resp.json()) as ChargeResponse;
  }
}
Wrapping it with a type‑safe breaker
const otel = new OpenTelemetryHelper();
const breaker = new CircuitBreaker(
  (req: ChargeRequest) => new PaymentGateway().charge(req),
  {
    failureThreshold: 3,
    resetTimeout: 15_000,
    halfOpenMaxCalls: 2,
  },
  otel,
);

async function chargeWithProtection(req: ChargeRequest) {
  const result = await breaker.execute(req);
  if (!result.ok) {
    if (result.reason === 'open') {
      // Fallback logic, e.g., queue for later processing
      console.warn('Payment service unavailable – queuing request');
    }
    throw result.error;
  }
  return result.value;
}
What we gain
- Compile‑time safety – chargeWithProtection only accepts ChargeRequest and returns ChargeResponse on success.
- Observability – Every attempt is a span; state changes are emitted as metrics, visible in Jaeger or Prometheus.
- Resilience – After three consecutive failures, the circuit opens and subsequent calls are rejected instantly, protecting downstream order‑processing pipelines.
7. Testing the breaker
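Because the clock is injected (the now function) and state transitions are deterministic, the breaker can be unit‑tested without waiting in real time. One caveat: the Open → Half‑Open reset is scheduled with setTimeout, so enable Jest's fake timers to keep that timer from outliving the test.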
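import { expect, jest, test } from '@jest/globals';

test('opens after threshold', async () => {
  // The breaker schedules its reset with setTimeout; fake timers keep that
  // timer from outliving the test.
  jest.useFakeTimers();
  // Explicit function types (Jest 29+ generic form) keep the mocks assignable
  const fakeNow = jest.fn<() => number>()
    .mockReturnValueOnce(0) // start
    .mockReturnValue(1000); // subsequent calls
  const otel = new OpenTelemetryHelper();
  // Assumes isTransient() classifies this error as transient
  const fn = jest.fn<() => Promise<string>>()
    .mockRejectedValue(new Error('Transient'));
  const breaker = new CircuitBreaker(fn, {
    failureThreshold: 2,
    resetTimeout: 5000,
    halfOpenMaxCalls: 1,
  }, otel, fakeNow);
  // First failure – stays closed
  await expect(breaker.execute()).resolves.toMatchObject({ ok: false, reason: 'rejected' });
  // Second failure – reaches the threshold; the call itself still ran, so it is 'rejected'
  await expect(breaker.execute()).resolves.toMatchObject({ ok: false, reason: 'rejected' });
  // Third call – the circuit is now open, so the wrapped function is not invoked
  await expect(breaker.execute()).resolves.toMatchObject({ ok: false, reason: 'open' });
  expect(fn).toHaveBeenCalledTimes(2);
  // Verify state change metric was recorded
  // (implementation‑specific assertions omitted)
  jest.useRealTimers();
});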
The test demonstrates that the breaker’s logic is pure apart from the injected clock and OpenTelemetry side‑effects, making it easy to verify edge cases such as rapid successive failures or successful probes in half‑open mode.
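One edge case that deserves a test is the Open → Half‑Open transition. The class above schedules it with setTimeout, so a test has to advance fake timers to reach it. As an alternative design, not the implementation shown in this article, the cool‑down can be derived from the injected clock each time the state is read. CooldownGate and its method names are hypothetical.

```typescript
// Hypothetical sketch: derives open / half-open from an injected clock, no timers.
class CooldownGate {
  private openedAt: number | null = null; // null means the circuit is not open
  private readonly resetTimeout: number;
  private readonly now: () => number;

  constructor(resetTimeout: number, now: () => number = () => Date.now()) {
    this.resetTimeout = resetTimeout;
    this.now = now;
  }

  /** Record that the circuit has just opened. */
  open(): void {
    this.openedAt = this.now();
  }

  /** Record a successful probe: the circuit closes again. */
  close(): void {
    this.openedAt = null;
  }

  /** Current state, computed lazily from the clock. */
  status(): 'closed' | 'open' | 'halfOpen' {
    if (this.openedAt === null) return 'closed';
    return this.now() - this.openedAt >= this.resetTimeout ? 'halfOpen' : 'open';
  }
}

// Usage: time is just a variable, so the test advances it directly.
let fakeTime = 0;
const gate = new CooldownGate(5_000, () => fakeTime);
gate.open();
fakeTime = 4_999;
const beforeCooldown = gate.status(); // cool-down not yet elapsed
fakeTime = 5_000;
const afterCooldown = gate.status(); // cool-down elapsed, probes allowed
```

The trade‑off is that the state change is only observed when a call arrives, so a circuit.state_change metric for Open → Half‑Open would be emitted on the next request rather than at the moment the timeout expires.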
8. Best practices checklist
| ✅ | Practice |
|---|---|
| ✅ | Strongly type the wrapped function – use generic tuple A and return type T. |
| ✅ | Separate transient vs. permanent errors – implement isTransient(err) based on HTTP status codes, network errors, etc. |
| ✅ | Instrument every transition – spans for calls, counters for state changes, histograms for latency. |
| ✅ | Avoid global singletons – create a breaker per downstream service or per configuration to keep metrics meaningful. |
| ✅ | Graceful fallback – when the breaker returns { reason: 'open' }, enqueue work, return cached data, or surface a user‑friendly error. |
| ✅ | Monitor health – set up alerts on the circuit.state_change metric when the circuit stays open for too long. |
| ✅ | Keep the reset timeout reasonable – too short leads to flapping; too long delays recovery. |
| ✅ | Test with deterministic clocks – inject now and use jest.useFakeTimers() for time‑based tests. |
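The "avoid global singletons" item can be sketched as a small registry that hands out one breaker per downstream service. BreakerRegistry is a hypothetical helper, and the factory below returns a stand‑in object rather than a real CircuitBreaker so the example stays self‑contained.

```typescript
// Hypothetical helper: lazily creates one instance per downstream service name.
class BreakerRegistry<B> {
  private readonly instances = new Map<string, B>();
  private readonly create: (service: string) => B;

  constructor(create: (service: string) => B) {
    this.create = create;
  }

  /** Return the existing breaker for a service, or create it on first use. */
  get(service: string): B {
    let breaker = this.instances.get(service);
    if (breaker === undefined) {
      breaker = this.create(service);
      this.instances.set(service, breaker);
    }
    return breaker;
  }
}

// Usage with a stand-in object; a real factory would construct a CircuitBreaker
// with per-service options.
let created = 0;
const registry = new BreakerRegistry((service) => {
  created++;
  return { service, failures: 0 };
});
const payments = registry.get('payments');
const inventory = registry.get('inventory');
```

Keying breakers by service name also gives a natural value for a metric attribute, so state changes for different downstreams are not mixed in one series.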
9. Putting it all together
Below is a minimal index.ts that you can drop into a Node.js project:
import { CircuitBreaker, OpenTelemetryHelper } from './circuitBreaker';
import { PaymentGateway, ChargeRequest } from './paymentGateway';

const otel = new OpenTelemetryHelper();
const paymentBreaker = new CircuitBreaker(
  (req: ChargeRequest) => new PaymentGateway().charge(req),
  {
    failureThreshold: 5,
    resetTimeout: 30_000,
    halfOpenMaxCalls: 3,
  },
  otel,
);

export async function charge(req: ChargeRequest) {
  const result = await paymentBreaker.execute(req);
  if (!result.ok) {
    if (result.reason === 'open') {
      // Example fallback: push to a message queue
      // (enqueueForLater is application‑specific and not shown here)
      await enqueueForLater(req);
    }
    throw result.error;
  }
  return result.value;
}
Run your service with the OpenTelemetry SDK exporting to your backends (e.g., Jaeger for traces, Prometheus for metrics) and you’ll see:
- A trace per payment attempt, annotated with circuit.state.
- A circuit.state_change metric that you can graph to spot frequent openings.
- Automatic correlation between the payment request span and downstream HTTP client spans (if you instrument fetch/axios).
Conclusion
A circuit breaker is a classic resilience pattern, but in a TypeScript codebase it often becomes a source of any‑driven bugs. By parameterising the wrapper with generic argument and return types, we retain full compile‑time safety. Adding a thin OpenTelemetry layer gives you the observability needed to operate at scale, while the state‑machine implementation stays small enough to be audited and tested.
Take the snippets above, adapt the isTransient logic to your domain, and you’ll have a production‑ready, type‑safe circuit breaker that plays nicely with modern observability stacks. Happy coding!