PLAYBOOK · 05

Playbook · Observability

Metrics, traces, and logs as one queryable store. OTLP in, Prometheus text out, traces as a connected graph. Replaces the Splunk + Prometheus + Tempo tier with one binary.

Three indices

Create one index per signal (metrics, traces, logs) so each ingest parser can route to its own index directly.

$ curl -sX PUT http://localhost:8080/v1/indices/metrics -d '{"fields":{
    "@timestamp":"date","name":"keyword","value":"float","labels":"keyword"
  }}'

$ curl -sX PUT http://localhost:8080/v1/indices/traces -d '{"fields":{
    "@timestamp":"date","trace_id":"keyword","span_id":"keyword",
    "parent_span_id":"keyword","service":"keyword","operation":"keyword",
    "duration_us":"long","http_status":"integer"
  }}'

$ curl -sX PUT http://localhost:8080/v1/indices/logs -d '{"fields":{
    "@timestamp":"date","service":"keyword","level":"keyword",
    "trace_id":"keyword","span_id":"keyword","message":"text"
  }}'
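For setup scripts, the same three calls can be sketched in Python. This is a minimal sketch using only the standard library; the field mappings and URL paths are copied from the curl commands above, while the `Content-Type` header and the helper names (`build_index_requests`, `create_indices`) are assumptions made for illustration, not part of XERJ.

```python
import json
import urllib.request

# Field mappings copied from the curl commands above.
INDEX_FIELDS = {
    "metrics": {
        "@timestamp": "date", "name": "keyword",
        "value": "float", "labels": "keyword",
    },
    "traces": {
        "@timestamp": "date", "trace_id": "keyword", "span_id": "keyword",
        "parent_span_id": "keyword", "service": "keyword",
        "operation": "keyword", "duration_us": "long",
        "http_status": "integer",
    },
    "logs": {
        "@timestamp": "date", "service": "keyword", "level": "keyword",
        "trace_id": "keyword", "span_id": "keyword", "message": "text",
    },
}


def build_index_requests(base_url="http://localhost:8080"):
    """Return one PUT request per index, without sending anything."""
    requests = []
    for name, fields in INDEX_FIELDS.items():
        body = json.dumps({"fields": fields}).encode("utf-8")
        requests.append(urllib.request.Request(
            url=f"{base_url}/v1/indices/{name}",
            data=body,
            method="PUT",
            # Assumption: the server accepts a JSON content type.
            headers={"Content-Type": "application/json"},
        ))
    return requests


def create_indices(base_url="http://localhost:8080"):
    """Send the requests; urlopen raises HTTPError on a 4xx/5xx reply."""
    for req in build_index_requests(base_url):
        with urllib.request.urlopen(req) as resp:
            print(req.full_url, resp.status)
```

Building the requests separately from sending them makes it possible to inspect or log each payload before anything touches the server.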

OTLP ingest · collector-free

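# your app already exports OTLP: point it at XERJ
# note: over OTLP/HTTP, OpenTelemetry SDKs append v1/traces to this base URL;
# set OTEL_EXPORTER_OTLP_TRACES_ENDPOINT instead to have the URL used exactly as written
$ OTEL_EXPORTER_OTLP_ENDPOINT=http://xerj:8080/v1/indices/traces/otlp \
  ./your-app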

Service map via trace graph

{
  "query": { "range": { "@timestamp": { "gte": "now-1h" } } },
  "aggs": {
    "services": {
      "terms": { "field": "service", "size": 50 },
      "aggs": {
        "p95": { "percentiles": { "field": "duration_us", "percents": [95] } },
        "error_rate": {
          "filter": { "range": { "http_status": { "gte": 500 } } }
        }
      }
    }
  },
  "size": 0
}
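Note that the `error_rate` filter above yields a count of 5xx spans per service, not a ratio. To get an actual rate, divide that count by the bucket's total. The sketch below does this in Python, assuming an Elasticsearch-style response shape (`aggregations.services.buckets[]` with `key`, `doc_count`, `p95.values["95.0"]`, and `error_rate.doc_count`). That shape, the function name, and the sample values are assumptions for illustration, so check them against what XERJ actually returns.

```python
def summarize_services(response):
    """Turn the aggregation response into per-service p95 and error rate.

    Assumes an Elasticsearch-style response shape (not confirmed for XERJ).
    """
    rows = []
    for bucket in response["aggregations"]["services"]["buckets"]:
        total = bucket["doc_count"]                 # all spans for the service
        errors = bucket["error_rate"]["doc_count"]  # spans with http_status >= 500
        p95_us = bucket["p95"]["values"]["95.0"]
        rows.append({
            "service": bucket["key"],
            "spans": total,
            "p95_ms": p95_us / 1000.0,
            "error_rate": errors / total if total else 0.0,
        })
    # Highest error rate first, so problem services surface at the top.
    rows.sort(key=lambda r: r["error_rate"], reverse=True)
    return rows


# Hypothetical response used only to show the calculation.
sample_response = {
    "aggregations": {"services": {"buckets": [
        {"key": "cart", "doc_count": 100,
         "p95": {"values": {"95.0": 120000.0}},
         "error_rate": {"doc_count": 0}},
        {"key": "checkout", "doc_count": 200,
         "p95": {"values": {"95.0": 840000.0}},
         "error_rate": {"doc_count": 10}},
    ]}}
}

for row in summarize_services(sample_response):
    print(row)
```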

Correlate a slow trace with its logs

# 1. find the most recent slow span in the traces index (duration_us >= 5,000,000, i.e. 5 s)
{
  "query": {
    "range": { "duration_us": { "gte": 5000000 } }
  },
  "size": 1,
  "sort": [ { "@timestamp": "desc" } ]
}

# 2. pull the logs carrying that trace id from the logs index, oldest first
{
  "query": { "term": { "trace_id": "4f9c...2a1e" } },
  "sort":  [ { "@timestamp": "asc" } ],
  "size":  200
}
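The two steps chain naturally: the `trace_id` returned by the first query feeds the second. Here is a minimal Python sketch of that flow. The `search(index, body)` callable is a hypothetical stand-in for whatever client you use to run a query and return the matching documents as a list; it is not a XERJ API.

```python
def slow_trace_query(min_duration_us=5_000_000):
    """Step 1: newest span at or above the duration threshold."""
    return {
        "query": {"range": {"duration_us": {"gte": min_duration_us}}},
        "size": 1,
        "sort": [{"@timestamp": "desc"}],
    }


def logs_for_trace_query(trace_id, limit=200):
    """Step 2: logs for one trace, in chronological order."""
    return {
        "query": {"term": {"trace_id": trace_id}},
        "sort": [{"@timestamp": "asc"}],
        "size": limit,
    }


def logs_for_latest_slow_trace(search, min_duration_us=5_000_000):
    """Run both steps; returns (trace_id, logs) or (None, []) if nothing is slow.

    `search(index, body)` is a hypothetical helper that returns a list of
    matching documents as dicts.
    """
    spans = search("traces", slow_trace_query(min_duration_us))
    if not spans:
        return None, []
    trace_id = spans[0]["trace_id"]
    return trace_id, search("logs", logs_for_trace_query(trace_id))
```

Passing `search` in as a parameter keeps the correlation logic independent of the transport, so it can be exercised with an in-memory fake before pointing it at a live server.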

Scrape XERJ itself with Prometheus

Recursive: scrape /v1/metrics to watch XERJ's own health while it serves the observability workload.
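For a quick check without a Prometheus server, the endpoint can also be read by hand. The sketch below fetches `/v1/metrics` and parses the Prometheus text format into `(name, labels, value)` tuples. It is a simplified parser: labels are kept as raw text and timestamps are ignored. The metric names in the usage example are hypothetical, and the host and port are assumed to match the earlier curl commands.

```python
import re
import urllib.request

_SAMPLE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)'   # metric name
    r'(\{.*\})?'                     # optional label set, kept as raw text
    r'\s+(\S+)'                      # sample value
    r'(?:\s+-?\d+)?\s*$'             # optional timestamp (ignored)
)


def parse_prometheus_text(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, labels, value = match.groups()
        try:
            samples.append((name, labels or "", float(value)))
        except ValueError:
            continue                           # skip unparseable values
    return samples


def scrape(base_url="http://localhost:8080"):
    """Fetch and parse XERJ's own metrics endpoint."""
    with urllib.request.urlopen(f"{base_url}/v1/metrics") as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

Usage, with made-up metric names:

```python
text = '''# TYPE xerj_ingest_docs_total counter
xerj_ingest_docs_total{index="traces"} 1234
xerj_up 1 1700000000000
'''
for name, labels, value in parse_prometheus_text(text):
    print(name, labels, value)
```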

Dashboards

Open the playground → INGEST · PIPELINE and SYSTEM · OVERVIEW. Every panel shows a real Prometheus metric from the engine's common::metrics module.

Source · engine/crates/otlp/src/lib.rs · common/src/metrics.rs