## Summary

A Circuit Breaker is used in service design to stop repeated attempts at an operation that is likely to fail.

## States

There are three states:

1. **Closed:** Operations are allowed to proceed, with failure counts monitored. The failure count is reset once enough time has passed since the last failure. If the failure count hits its threshold, the state changes to **Open**.
2. **Open:** Operations are rejected for a period of time, after which the state changes to **Half-Open**.
3. **Half-Open:** Operations are allowed to proceed, with success counts monitored. If the success count hits its threshold, the state changes to **Closed**. If any error is hit, the state changes to **Open**.

## Logic

The per-state handlers below are Python-style pseudocode: `perform_operation`, `set_state`, and `NOW` are placeholders, and each state's variables persist for as long as that state is active.

### Closed

```python
def on_state_activated():
    failure_count = 0
    last_failure_timestamp = None

def on_operation_request():
    # Forget old failures once enough time has passed since the last one.
    if last_failure_timestamp is not None and NOW() - last_failure_timestamp >= RESET_FAILURE_INTERVAL:
        failure_count = 0

    perform_operation()

    if success:
        return result
    else:
        failure_count += 1
        last_failure_timestamp = NOW()

        if failure_count >= FAILURE_THRESHOLD:
            set_state(OPEN)

        return failure
```

### Open

```python
def on_state_activated():
    # The timeout is checked lazily on the next request, so no timer is needed.
    state_activated_timestamp = NOW()

def on_operation_request():
    if NOW() - state_activated_timestamp >= OPEN_TIMEOUT:
        set_state(HALF_OPEN)
        # Re-dispatch the request so the Half-Open handler runs it.
        return on_operation_request()
    else:
        return failure
```

### Half-Open

```python
def on_state_activated():
    success_count = 0

def on_operation_request():
    perform_operation()

    if success:
        success_count += 1

        if success_count >= SUCCESS_THRESHOLD:
            set_state(CLOSED)

        return result
    else:
        # A single failure sends the breaker straight back to Open.
        set_state(OPEN)
        return failure
```

## Additional Considerations

* Timeouts could use exponential (or similar) backoff.
* As this will often be global, all state should be thread-safe (governed by a lock).

Microsoft:

> The pattern is customizable and can be adapted according to the type of the possible failure. For example, you can apply an increasing timeout timer to a circuit breaker.
> You could place the circuit breaker in the **Open** state for a few seconds initially, and then if the failure hasn't been resolved increase the timeout to a few minutes, and so on. In some cases, rather than the **Open** state returning failure and raising an exception, it could be useful to return a default value that is meaningful to the application.

## Implementations

### Python

This is exception-based, and should be tailored for specific circumstances.

```python
import threading
from datetime import datetime, timedelta
from enum import Enum
from typing import Any, Callable, Optional


class CircuitBreakerState(Enum):
    CLOSED = 0
    OPEN = 1
    HALF_OPEN = 2


class CircuitBreaker:
    # Constants
    CLOSED_RESET_FAILURES_INTERVAL_MS = ...
    CLOSED_FAILURE_THRESHOLD = ...
    HALF_OPEN_SUCCESS_THRESHOLD = ...
    OPEN_TIMEOUT_MS = ...

    # Common state
    _state: CircuitBreakerState
    state_active_timestamp: datetime
    lock: threading.Lock

    # Closed state
    closed_failure_count: int
    closed_last_failure_timestamp: Optional[datetime]

    # Half-open state
    half_open_success_count: int

    def __init__(self) -> None:
        self.lock = threading.Lock()
        # Goes through the setter, which also initializes the timestamp and counters.
        self.state = CircuitBreakerState.CLOSED

    @property
    def state(self) -> CircuitBreakerState:
        return self._state

    @state.setter
    def state(
        self,
        new_state: CircuitBreakerState,
    ) -> None:
        if not self.lock.acquire(blocking=False):
            # Another thread is managing state. Bail.
            return

        try:
            # Assign the backing field; assigning self.state here would recurse.
            self._state = new_state
            self.state_active_timestamp = datetime.now()
            self.closed_failure_count = 0
            self.closed_last_failure_timestamp = None
            self.half_open_success_count = 0
        finally:
            self.lock.release()

    def run_operation(
        self,
        handler: Callable,
    ) -> Any:
        # Note: only state transitions take the lock; counter updates below do not.
        now = datetime.now()

        if self.state == CircuitBreakerState.CLOSED:
            # The circuit breaker is closed.
            if (self.closed_last_failure_timestamp is not None
                    and now - self.closed_last_failure_timestamp
                    >= timedelta(milliseconds=self.CLOSED_RESET_FAILURES_INTERVAL_MS)):
                # Enough time passed. Reset the failure count.
                self.closed_failure_count = 0

            try:
                return handler()
            except Exception:
                self.closed_failure_count += 1
                self.closed_last_failure_timestamp = datetime.now()

                if self.closed_failure_count >= self.CLOSED_FAILURE_THRESHOLD:
                    self.state = CircuitBreakerState.OPEN

                raise
        elif self.state == CircuitBreakerState.OPEN:
            # The circuit breaker is open.
            if now - self.state_active_timestamp >= timedelta(milliseconds=self.OPEN_TIMEOUT_MS):
                self.state = CircuitBreakerState.HALF_OPEN

                # Re-run the operation.
                return self.run_operation(handler)
            else:
                raise Exception('...')
        elif self.state == CircuitBreakerState.HALF_OPEN:
            # The circuit breaker is half-open.
            try:
                result = handler()
            except Exception:
                # Any failure while half-open reopens the circuit.
                self.state = CircuitBreakerState.OPEN
                raise

            self.half_open_success_count += 1

            if self.half_open_success_count >= self.HALF_OPEN_SUCCESS_THRESHOLD:
                self.state = CircuitBreakerState.CLOSED

            return result
```

> [!seealso] See Also:
> * [Circuit Breaker — #Microsoft](https://learn.microsoft.com/en-us/azure/architecture/patterns/circuit-breaker)
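
The increasing-timeout idea from Additional Considerations could be sketched as a small helper that hands out the **Open** timeout and grows it each time the breaker trips again before recovering. This is only an illustration under assumptions: `OpenTimeoutBackoff`, `next_timeout`, `reset`, the doubling factor, and the 5-second and 5-minute values are hypothetical and not part of the implementation above.

```python
from datetime import timedelta


class OpenTimeoutBackoff:
    """Hypothetical helper: grows the Open timeout geometrically, up to a cap."""

    def __init__(
        self,
        base: timedelta,
        maximum: timedelta,
        factor: float = 2.0,
    ) -> None:
        self.base = base
        self.maximum = maximum
        self.factor = factor
        self._next = base

    def next_timeout(self) -> timedelta:
        """Return the timeout for the Open period that is about to start."""
        timeout = self._next
        # Grow the following timeout, but never beyond the cap.
        self._next = min(self._next * self.factor, self.maximum)
        return timeout

    def reset(self) -> None:
        """Go back to the base timeout once the breaker has closed again."""
        self._next = self.base


# Example values are assumptions chosen for illustration.
backoff = OpenTimeoutBackoff(
    base=timedelta(seconds=5),
    maximum=timedelta(minutes=5),
)
timeouts = [backoff.next_timeout() for _ in range(8)]
backoff.reset()
after_reset = backoff.next_timeout()
```

In a breaker like the one above, the transition to **Open** would call `next_timeout()` instead of reading a fixed constant, and the transition to **Closed** would call `reset()`. Capping the timeout keeps a long outage from pushing recovery probes arbitrarily far apart.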