High reliability and high resilience: The operating system for safer, adaptive care

By David Varnes, Principal, Strategic Consulting, and Tejal Gandhi, Chief Safety and Transformation Officer, Press Ganey.

Healthcare is experiencing a complexity shock. Variability in acuity, staffing patterns, technology, and the realities of sociotechnical work make safety not just a matter of preventing discrete failures but also of building systems that can detect, adapt to, and recover from risk in real time. Some have made the case that High Reliability Organizing (HRO) has become too rule-bound or static, and that Human-Centered Resilience Engineering (HRE) should replace it. Healthcare is often nonlinear and emergent, and it can be unknowable in the moment, so we do need proactive sensing, human-factors design, and adaptive capacity. But framing our future as HRE instead of HRO risks a false choice. This framing fundamentally misinterprets what HRO is and what healthcare safety systems demand.

Resilience has always been a core principle of HRO, embedded in its DNA through the classic “commitment to resilience.” In complex systems, the safest organizations keep both anticipation and containment strong (to use Weick and Sutcliffe’s terms): they prevent known failure modes and adapt to unknown conditions in real time.

To move forward, we need to think beyond HRO as just a set of tools or behaviors. Instead, we need to view and understand it as the operating system (OS) of a healthcare organization. HRO establishes how a system senses, interprets, escalates, and responds to signals of risk. Resilience, far from being an alternative, should be viewed as an application layer that extends this OS—enhancing foresight, flexibility, and human-centered adaptability. The safest organizations do not choose reliability or resilience; they run both simultaneously, with each strengthening the other.

Why HRO still matters

A high-functioning OS provides consistency, signal clarity, and dependable coordination. In the clinical environment, reliability routines such as standardized handoffs, structured communication, robust event identification and analysis, daily safety huddles, and escalation pathways create the predictable backbone that makes sensemaking possible.

This foundational reliability matters now more than ever. According to Press Ganey’s “State of Healthcare Safety 2026” report, 46.6% of healthcare workers rate their organization’s overall safety culture as low—a finding that directly threatens the system’s ability to detect and respond to early risks. Additionally, safety culture is not a soft metric. It is one of the strongest predictors of workforce stability, with seven of the top 10 employee engagement drivers linked directly to safety-related perceptions. And when organizations learn effectively and embed HRO principles and practices, safety culture improves.

High reliability practices reduce background noise so teams can see anomalies sooner. They create the cognitive and operational space that makes resilience possible. Without a stable architecture of routines, processes, and shared mental models, adaptive capacity collapses under pressure.

Resilience as the application layer

If HRO defines the OS, resilience is the capability layer that enables the system to navigate variability without breaking. Resilience is proactive: scanning for weak signals, enabling dynamic replanning, and making it easier for clinicians to adjust safely when the unexpected occurs.

Critics sometimes position resilience as a corrective to rigidity. But when implemented properly, HRO already contains a core principle of resilience: the ability to detect, respond to, and recover from unexpected events. In fact, Weick and Sutcliffe state, “A commitment to resilience is evident in management practices and organizational norms that encourage conceptual slack.” Modern HRE practices build in additional capacity by refining or expanding many mature HRO practices that already exist, such as:

  • Leading indicator dashboards that reveal emerging risk
  • Protected “safe table” forums where staff surface weak signals
  • Human factors redesign to reduce cognitive burden
  • Flexible staffing and teaming approaches that absorb variability

Why choosing between HRO and HRE is counterproductive

Some have suggested that HRO is “too Safety I,” focused only on preventing bad outcomes. But that critique confuses implementation with intent. In practice, the most effective safety programs already blend Safety I and Safety II approaches—standardizing where variability creates risk, and enabling adaptation where variability is unavoidable.

The data strongly supports this blended model. Organizations with robust safety event reporting, a classic reliability behavior, are 8x more likely to score in the top quartile on learning, teamwork, and collaboration. And those with stronger relational trust and teamwork, core elements of resilience, are 50–80% more likely to achieve top performance on clinical safety outcomes like CAUTI, CLABSI, and pressure injuries.

We should, therefore, retire the suggestion that HRO is inherently “past its sell-by date.” Instead, let’s measure whether our local implementation of HRO has kept pace with complexity. If HRO is practiced as policy policing, rigid standardization, or metric chasing, it will underperform. If it is practiced as disciplined reliability that frees people’s attention for adaptation, it becomes the stable base HRE needs.

Design, not exhortation

One of the clearest lessons from both HRO and HRE is that exhortation does not change behavior. Culture and design, however, do. Safety improves not because clinicians try harder, but because systems are configured so the right actions are the easiest actions.

Human factors improvements, redesigned workflows, and better-integrated technologies reduce cognitive load, making it more likely that reliable and resilient behaviors are performed consistently. And when leaders close the loop on reporting (another form of design), trust deepens and reporting increases, further strengthening the OS. In fact, reporting systems with faster feedback and stronger learning loops dramatically outperform peers on both culture and outcomes.

What a unified reliability–resilience OS looks like

The HRO operating system provides:

  • High-signal routines: Huddles, structured handoffs, standard workflows
  • Strong detection functions: Reporting, escalation, monitoring
  • Robust learning systems: Event triage, analysis, robust action plans and implementation
  • A culture of psychological safety: Removal of fear, increased transparency
  • Clear behavioral expectations: Non-technical skills for all
  • Clear coordination architecture: Roles, expectations, communication pathways
  • Reduced noise: Reliable execution that reveals anomalies

The resilience application layer adds:

  • Proactive sensing and weak signal detection
  • Capacity buffers and slack
  • Adaptive teaming and dynamic replanning
  • Human-centered design that aligns with real work-as-done

Together, they create a system that prevents known failure modes while flexibly managing unexpected ones.

The question is not whether HRO is obsolete. It’s whether our local implementation of HRO has grown to meet the complexity of modern care. When HRO is treated as the operating system, not simply a compliance program, the result is a system that can sense risk earlier, respond more effectively, and recover more quickly.

The safest organizations of the next decade will be those that invest in both sides of this equation: reliability that creates order and signal, and resilience that makes the system adaptable, humane, and strong under pressure.