Implementing Critical User Journey (CUJ) SLOs requires existing observability data, but that data needs to be built around the right journeys, which in turn requires knowing what the business considers critical. This dependency loop is a sequencing problem with a clear resolution: business priorities must be established before technical work begins.

The Instrumentation Dependency

When tasked with SLO implementation, the instinct is often to start in the observability platform. The issue is that meaningful SLO targets require baseline data, and baseline data requires instrumentation scoped to the right journeys. Without business input defining which journeys matter most, instrumentation choices are guesses.

The result is usually one of two outcomes: SLOs that are technically measurable but not business-relevant, or SLOs that reflect the right journeys but lack the data to set credible targets.

The Correct Sequence

The resolution is organizational before technical:

  1. Business defines CUJ priority - which user journeys deliver the most value or carry the most risk if they fail
  2. SRE maps to technical components - what services, APIs, and dependencies compose that journey
  3. Assess observability gaps - what instrumentation is missing to measure the journey end-to-end
  4. Prioritize instrumentation work - ranked by journey criticality
  5. Engineering implements - instrumentation belongs in the product roadmap
  6. SRE defines SLOs with real data - targets set from measured baselines

Steps 1 and 5 are frequently skipped. Skipping step 1 means instrumentation is built around what is easy to measure rather than what matters. Skipping step 5 means SRE ends up owning engineering work that has no roadmap priority and no deadline.
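
Step 6 can be sketched in a few lines of Python. This is a minimal illustration, assuming the journey already emits daily good/total event counts; the `DailyCount` type, the candidate target list, and the "strictest target the baseline already meets" rule are assumptions made for the example, not a prescribed method.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass
class DailyCount:
    """Hypothetical daily roll-up for one journey: good events vs. all events."""
    good: int
    total: int


def baseline_sli(days: Iterable[DailyCount]) -> float:
    """Measured baseline: good events divided by total events over the window."""
    days = list(days)
    good = sum(d.good for d in days)
    total = sum(d.total for d in days)
    if total == 0:
        raise ValueError("no events recorded; the journey is not yet measurable")
    return good / total


def propose_target(
    baseline: float,
    candidates: Sequence[float] = (0.999, 0.995, 0.99, 0.95, 0.9),  # assumed menu
) -> Optional[float]:
    """Return the strictest candidate target that the baseline already meets."""
    for target in sorted(candidates, reverse=True):
        if baseline >= target:
            return target
    return None  # baseline is below every candidate


def error_budget(target: float, total_events: int) -> int:
    """Number of bad events the target tolerates over the same event volume."""
    return round((1 - target) * total_events)
```

With three days of 10,000 events each and 20, 30, and 10 failures, the baseline is 0.998, so this rule would propose 0.995 rather than an aspirational 0.999.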

Observability Gap Analysis

Before SLO definition work begins, five dimensions need assessment:

  • Service-level metrics - does each service emit availability, latency, and error rate?
  • Distributed tracing - can a request be traced across service boundaries end-to-end?
  • User journey tagging - are requests tagged with journey context, not just service context?
  • Frontend/synthetic monitoring - is the user-facing layer observable, not just backend services?
  • Business metrics integration - is technical failure correlated with business impact?
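
To make the journey-tagging dimension concrete, here is a small in-memory sketch of what "journey context, not just service context" can mean for a metric. The `JourneyMetrics` class, the `"untagged"` bucket, and the service and journey names are hypothetical; a real system would attach the same label through its own metrics or tracing library.

```python
from collections import Counter
from typing import Optional


class JourneyMetrics:
    """Counts requests keyed by (journey, service, outcome)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, service: str, ok: bool, journey: Optional[str] = None) -> None:
        # Requests without journey context land in an explicit "untagged"
        # bucket, which makes the tagging gap itself visible.
        outcome = "ok" if ok else "error"
        self.counts[(journey or "untagged", service, outcome)] += 1

    def success_ratio(self, journey: str) -> Optional[float]:
        """Request-level success ratio across every service in the journey."""
        ok = sum(
            n for (j, _svc, outcome), n in self.counts.items()
            if j == journey and outcome == "ok"
        )
        total = sum(n for (j, _svc, _out), n in self.counts.items() if j == journey)
        return ok / total if total else None
```

Because the journey is part of the key, the same request counts can be rolled up per journey or per service. Note that this ratio is per request, not per completed journey attempt; measuring attempts end-to-end would additionally need trace or session correlation.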

Gaps are scope constraints, not blockers. A CUJ SLO that covers an instrumented subset of the journey with documented limitations is more useful than an aspirational SLO that cannot be reliably measured.
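
One way to record the assessment so that gaps end up as documented scope is sketched below. The dimension keys mirror the five bullets above; the `GapReport` type and the `"Full"` state are assumptions added for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

# The five dimensions from the checklist, as hypothetical keys.
DIMENSIONS = (
    "service_metrics",
    "distributed_tracing",
    "journey_tagging",
    "frontend_monitoring",
    "business_metrics",
)


@dataclass
class GapReport:
    journey: str
    covered: List[str]
    gaps: List[str]

    @property
    def state(self) -> str:
        """Instrumentation state: 'None', 'Partial', or 'Full' (assumed label)."""
        if not self.gaps:
            return "Full"
        if not self.covered:
            return "None"
        return "Partial"

    def limitations(self) -> str:
        """One-line scope note to publish alongside the SLO."""
        if not self.gaps:
            return f"{self.journey}: no known observability gaps"
        return f"{self.journey}: SLO does not cover {', '.join(self.gaps)}"


def assess(journey: str, coverage: Dict[str, bool]) -> GapReport:
    """Split the five dimensions into covered and missing; absent keys count as gaps."""
    covered = [d for d in DIMENSIONS if coverage.get(d, False)]
    gaps = [d for d in DIMENSIONS if not coverage.get(d, False)]
    return GapReport(journey, covered, gaps)
```

Treating a missing answer as a gap is a deliberately conservative choice: an unassessed dimension should not silently count as coverage.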

Instrumentation Prioritization

Not all gaps need to be closed before starting. Prioritize by journey criticality against current coverage:

Journey Criticality | Instrumentation State | Action
High                | None                  | Instrument before defining SLO targets
High                | Partial               | Enhance and document remaining gaps
Low                 | None                  | Defer - document and move on
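
The matrix can also be expressed as a simple lookup, which is handy when triaging a backlog of journeys. This is a sketch: the function name is hypothetical, and combinations the table does not list deliberately return `None` rather than guessing an action.

```python
from typing import Optional

# (journey criticality, instrumentation state) -> action, copied from the table.
ACTIONS = {
    ("High", "None"): "Instrument before defining SLO targets",
    ("High", "Partial"): "Enhance and document remaining gaps",
    ("Low", "None"): "Defer - document and move on",
}


def action_for(criticality: str, state: str) -> Optional[str]:
    """Look up the action for a journey; unlisted combinations return None."""
    key = (criticality.strip().capitalize(), state.strip().capitalize())
    return ACTIONS.get(key)
```

For example, `action_for("high", "partial")` yields "Enhance and document remaining gaps", while a combination such as low criticality with partial coverage is left for the team to decide.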

The Iterative Nature of Observability

Instrumentation is not a one-time activity. Once baselines are established, new gaps become apparent. Service architectures change over time, and instrumentation needs to keep pace. A quarterly review cycle works well: reassess gap coverage, refine SLIs where the underlying service has changed, and reprioritize remaining instrumentation work.

flowchart TD
    A[Current State: Poor Observability] --> B{Can We Measure CUJs?}
    B -->|Check Existing Instrumentation| C[Observability Gap Analysis]
    C --> C1[Service-level metrics exist?]
    C --> C2[Distributed tracing complete?]
    C --> C3[User journey tagging present?]
    C --> C4[Frontend/synthetic monitoring?]
    C --> C5[Business metrics integrated?]
    C1 --> D{Gaps Found}
    C2 --> D
    C3 --> D
    C4 --> D
    C5 --> D
    D -->|Major Gaps| E[THE CHICKEN-EGG PROBLEM]
    E --> E1[Can't define SLOs without metrics]
    E --> E2[Can't build metrics without knowing what to measure]
    E --> E3[Can't know what to measure without business priorities]
    E1 --> F[BLOCKER: Need Business Input First]
    E2 --> F
    E3 --> F
    F --> G[Correct Sequence]
    G --> G1[Step 1: Business Defines CUJ Priority]
    G1 --> G2[Step 2: SRE Maps to Technical Components]
    G2 --> G3[Step 3: Assess Observability Gaps]
    G3 --> G4[Step 4: Prioritize Instrumentation Work]
    G4 --> G5[Step 5: Engineering Implements]
    G5 --> G6[Step 6: SRE Defines SLOs with Real Data]
    G6 --> H[Instrumentation Prioritization Matrix]
    H --> H1{Critical CUJ + No Metrics?}
    H1 -->|Yes| I[HIGH Priority: Instrument Immediately]
    H1 -->|No| H2{Critical CUJ + Partial Metrics?}
    H2 -->|Yes| J[MEDIUM Priority: Enhance Instrumentation]
    H2 -->|No| H3{Low-impact CUJ + No Metrics?}
    H3 -->|Yes| K[LOW Priority: Defer]
    I --> L[Iterative Improvement Cycle]
    J --> L
    K --> L
    L --> L1[Implement Priority Instrumentation]
    L1 --> L2[Measure Baseline with New Data]
    L2 --> L3[Discover New Gaps]
    L3 --> L4[Refine SLIs]
    L4 --> L5[Quarterly Review: Re-prioritize]
    L5 --> G3
    style A fill:#ffebee
    style E fill:#fff3e0
    style F fill:#ffcdd2
    style G fill:#e8f5e9
    style I fill:#c8e6c9
    style L fill:#e1f5fe

Key Takeaways

  • Business input precedes instrumentation decisions - the sequence is organizational before technical
  • Instrumentation is engineering work and belongs in the product roadmap, not as an SRE side task
  • Document observability gaps clearly - they define the scope of your SLOs, not a failure to deliver them
  • Treat observability as an ongoing cycle, not a project milestone
