This document defines how ratelord forecasts the future state of constraint pools.
The system’s core value proposition is predictive governance: instead of blocking actions only when a limit is hit (reactive), ratelord blocks or shapes actions when the risk of future exhaustion becomes unacceptable.
Raw counters ("4500/5000 remaining") are insufficient for decision-making. A counter without a rate is meaningless. ratelord treats observations as inputs to a probabilistic model that outputs Time-to-Exhaustion (TTE) and Risk.
All governance decisions are made in the time domain.
- Wrong: "Deny because remaining < 500."
- Right: "Deny because P90 TTE is 5 minutes, but reset is in 60 minutes."
The future is uncertain. Agents burst, providers suffer latency, and clocks drift. Therefore, ratelord never predicts a single number. It predicts a distribution:
- P50 (Expected): The median outcome. "We probably have 30 minutes left."
- P90 (Conservative): The safe bet. "We are 90% sure we have at least 15 minutes left."
- P99 (Worst Case): The defensive bound. "In the worst 1% of scenarios, we survive 5 minutes."
Governance policies typically gate on P90 or P99 to ensure reliability.
Burn rate (
To balance responsiveness with stability, ratelord tracks burn rate using Exponential Moving Averages over multiple windows:
-
Instantaneous (
$B_{inst}$ ): Very short window (e.g., last 1 minute or last$N$ events). Detects spikes immediately. -
Trend (
$B_{trend}$ ): Medium/Long window (e.g., 15 minutes, 1 hour). Captures baseline load.
We do not just track the mean burn rate
-
Stable Agent: Low variance
$\rightarrow$ P50 and P99 are close. -
Bursty Agent: High variance
$\rightarrow$ P99 is much lower (shorter TTE) than P50.
If no usage events occur, the burn rate must decay toward zero (or a known floor).
- Decay Function:
ratelordapplies a decay factor based on time-since-last-event. - Silence != Safety: A silent provider might mean a network partition, not zero usage. See "Uncertainty" below.
TTE is the estimated duration until a pool's capacity reaches zero, assuming no reset occurs.
Fundamentally: $$ TTE = \frac{\text{Remaining Capacity}}{\text{Burn Rate}} $$
Because Burn Rate is a distribution ratelord approximates quantiles:
-
P50 TTE: Derived from Mean Burn Rate. $$ TTE_{P50} \approx \frac{C_{rem}}{B_{mean}} $$
-
P99 TTE (Worst Case): Derived from a high-percentile Burn Rate (e.g., Mean + 2$\sigma$ or Peak observed). $$ TTE_{P99} \approx \frac{C_{rem}}{B_{P99}} $$
Note: As $C_{rem}$ approaches zero, TTE approaches zero regardless of burn rate.
The absolute TTE is less important than the relationship between TTE and Time-to-Reset (TTR).
"Will we exhaust before the reset?"
We are safe if: $$ TTE_{min} > TTR_{max} $$
Risk is the probability that exhaustion occurs before reset: $$ P(\text{Exhaustion}) = P(TTE < TTR) $$
- If
$P(\text{Exhaustion}) > \text{Threshold}$ (e.g., 10%), the system enters a Yellow/Warning state. - If
$P(\text{Exhaustion})$ is near 100%, the system enters a Red/Critical state (throttling active).
The Safety Margin is the buffer between the worst-case survival time and the reset time. $$ \text{Margin} = TTE_{P99} - TTR $$
- Positive Margin: "Even at worst-case burn, we survive past the reset."
- Negative Margin: "At worst-case burn, we die before reset."
- Fixed Windows: TTR is precise (modulo clock skew).
- Rolling Windows: TTR is complex (capacity trickles back in).
- Unknown/Jitter: TTR has uncertainty bounds.
ratelord conservatively assumes the latest probable reset time (
Predictions are not flat; they exist at every node in the constraint graph where a limit or burn rate can be observed.
Scope: The Provider's limit (e.g., GitHub REST Core for an Org).
- The most authoritative prediction.
- Aggregates usage from all identities and scopes sharing this pool.
- If this TTE is critical, everyone sharing the pool gets throttled.
Scope: Specific Token / Identity.
- Tracks the burn rate attributable to a single identity.
- Useful for detecting "noisy neighbors" or rogue agents.
- Used to target
approve_with_modifications(throttle specific offender) rather than global panic.
Scope: Repo, Project, or Environment.
- "How fast is the
frontend-repoCI pipeline burning budget?" - Used for high-level resource planning and "virtual" limits.
Forecasts must be honest about what they don't know.
As data ages (time since last provider_poll_observed), uncertainty grows.
-
Mechanism: Inflate the variance
$\sigma$ of the burn rate model as a function of staleness. - Result: P99 TTE drops rapidly (becomes more conservative) as data gets old. "If we haven't seen data in 5 minutes, assume the worst."
If a provider does not report a reset time (or we haven't seen one yet):
- Assume
$TTR = \infty$ (worst case) or a provider-specific heuristic default. - Risk becomes effectively 100% if any burn exists, forcing conservative throttling until a reset is observed.
If provider_error events occur:
- Freeze
$C_{rem}$ at last known value (or decay it pessimistically). - Widen confidence intervals.
- Governance policies typically switch to "Safe Mode" (deny non-critical intents).
-
usage_observed: Used to update$C_{rem}$ and calculate instantaneous burn$B_{inst}$ . -
reset_observed: Used to anchor the$TTR$ calculation. -
constraint_observed: Updates the capacity ceiling$C_{total}$ . -
provider_poll_observed: Timestamp provides the "freshness" anchor.
The Forecaster emits forecast_computed events. These are the only signals Policy uses to make decisions.
Event Payload (forecast_computed):
{
"provider_id": "github",
"pool_id": "rest_core",
"scope_id": "org:acme",
"as_of_ts": 1700000000,
"tte": {
"p50_seconds": 3600,
"p90_seconds": 1800,
"p99_seconds": 300
},
"risk": {
"probability_exhaustion_before_reset": 0.05,
"safety_margin_seconds": 240,
"ttr_seconds": 60
},
"burn_rate": {
"mean": 1.5,
"variance": 0.2,
"unit": "req/sec"
}
}- Cold Start: How do we predict burn rate for a brand new identity with zero history? (Likely need a "pessimistic default" or "learning phase" state).
- Rolling Windows: Mathematically modeling TTE for rolling windows (where capacity returns continuously) is complex. Do we simulate it, or approximate with "worst case fixed window"?
- Seasonality: Should the burn rate model eventually account for time-of-day (e.g., "CI bursts at 9am")? (Out of scope for Phase 1, but architectural hooks should exist).
- Feedback Loops: If
ratelordthrottles an agent, observed burn rate drops. This increases TTE, which might relax throttling, causing burn to spike again. We need to distinguish "natural burn" from "throttled burn".