flyteorg/flyte

[flyte2] Instrument the runs service reconcilers (abort-reconciler) with Prometheus metrics

Open

#7,449 opened on May 29, 2026

Labels: flyte2, good first issue

Description

Part of #7445. Depends on #7446 (the /metrics endpoint + Scope must exist first).

Summary

Add Prometheus metrics to the runs service background reconcilers (starting with the abort reconciler) to observe queue depth, processing throughput, retries, and failures.

Background

runs/service/abort_reconciler.go runs as a background worker (registered in runs/setup.go via sc.AddWorker("abort-reconciler", ...)). It has a worker pool, a bounded queue (QueueSize: 1000), and retry logic (MaxAttempts, InitialDelay, MaxDelay). None of this is currently observable via metrics.

What to do

  1. Thread the metrics Scope (from #7446) into service.NewAbortReconciler(...) (extend its config/constructor).
  2. Emit metrics such as:
    • current queue depth / pending items (gauge)
    • items processed (counter, labeled by success/failure)
    • retries / attempts (counter)
    • per-item processing latency (timer/histogram)
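
As a rough illustration of step 2, the sketch below records success/failure counts, retries, and total processing time around a single item. It uses only the standard library: the `counter` type, the `abortMetrics` struct, `processItem`, and `errNoAttempts` are hypothetical stand-ins, not the reconciler's actual code and not the promutils API. In the real change these fields would presumably be Prometheus counters and a stopwatch or histogram obtained from the Scope, and the retry loop would also apply the configured delays.

```go
package main

import (
	"errors"
	"sync/atomic"
	"time"
)

// errNoAttempts is a hypothetical sentinel returned when maxAttempts < 1.
var errNoAttempts = errors.New("maxAttempts must be at least 1")

// counter is a stdlib-only stand-in for a Prometheus counter (assumption).
type counter struct{ n atomic.Int64 }

func (c *counter) Inc()         { c.n.Add(1) }
func (c *counter) Value() int64 { return c.n.Load() }

// abortMetrics groups the signals listed in the issue. Field names are
// illustrative only.
type abortMetrics struct {
	Succeeded counter
	Failed    counter
	Retries   counter
	// LatencyNs accumulates total processing time in nanoseconds; a real
	// histogram would bucket individual observations instead.
	LatencyNs atomic.Int64
}

// processItem calls fn up to maxAttempts times, counting each attempt after
// the first as a retry, and records the final outcome and elapsed time.
// Backoff delays between attempts are omitted to keep the sketch short.
func processItem(m *abortMetrics, maxAttempts int, fn func() error) error {
	if maxAttempts < 1 {
		return errNoAttempts
	}
	start := time.Now()
	defer func() { m.LatencyNs.Add(int64(time.Since(start))) }()

	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if attempt > 1 {
			m.Retries.Inc()
		}
		if err = fn(); err == nil {
			m.Succeeded.Inc()
			return nil
		}
	}
	m.Failed.Inc()
	return err
}
```

Keeping the metrics in one struct that is passed by pointer makes it straightforward to construct them once and to assert on them from a unit test.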

Acceptance criteria

  • /metrics exposes abort-reconciler queue depth, processed count (success/failure), retry count, and processing latency.
  • Metrics use a dedicated sub-scope, e.g. scope.NewSubScope("abort_reconciler"), created once.
  • A unit test verifies that processing an item updates the relevant counters/gauges.

Pointers

  • runs/service/abort_reconciler.go — the reconciler implementation and its run loop.
  • runs/setup.go:64-73 — where NewAbortReconciler is constructed and registered as a worker.
  • flytestdlib/promutils/scope.go — Scope helpers (MustNewGauge, MustNewCounter, MustNewStopWatch, NewSubScope).

Notes for contributors

  • The gauge for queue depth should be updated as items are enqueued/dequeued (or sampled periodically).
  • This is independent of #7447 and #7448; all three consume the same Scope from #7446.
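
One possible way to keep the queue-depth gauge current on enqueue and dequeue is sketched below. It assumes the bounded queue is a buffered channel, which the issue does not state; `gauge`, `instrumentedQueue`, and its methods are hypothetical names, and `gauge` is a stdlib-only stand-in for a Prometheus gauge.

```go
package main

import "sync/atomic"

// gauge is a stdlib-only stand-in for a Prometheus gauge (assumption).
type gauge struct{ v atomic.Int64 }

func (g *gauge) Set(n int64)  { g.v.Store(n) }
func (g *gauge) Value() int64 { return g.v.Load() }

// instrumentedQueue wraps a buffered channel and mirrors its length into a
// gauge after every successful enqueue or dequeue.
type instrumentedQueue struct {
	items chan string
	depth *gauge
}

func newInstrumentedQueue(size int, depth *gauge) *instrumentedQueue {
	return &instrumentedQueue{items: make(chan string, size), depth: depth}
}

// TryEnqueue adds id without blocking and reports false when the queue is full.
func (q *instrumentedQueue) TryEnqueue(id string) bool {
	select {
	case q.items <- id:
		q.depth.Set(int64(len(q.items)))
		return true
	default:
		return false
	}
}

// TryDequeue removes the oldest id without blocking and reports false when
// the queue is empty.
func (q *instrumentedQueue) TryDequeue() (string, bool) {
	select {
	case id := <-q.items:
		q.depth.Set(int64(len(q.items)))
		return id, true
	default:
		return "", false
	}
}
```

With several workers the value written by `Set` can lag briefly behind the true length, which is usually acceptable for a gauge. The periodic-sampling alternative would read `len(q.items)` from a ticker loop and leave the enqueue and dequeue paths untouched.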
