# Wave 2: Calibration Drift Moat Ship Summary

Date: 2026-05-12

Wave 2 makes Mithrandir's model self-audit visible: lane health is computed from forward-logged predictions, reliability diagrams are surfaced on Methods, and betting boards can automatically warn when a lane drifts.

## Commit Ladder

| Step | Commit | Summary |
| --- | --- | --- |
| Step 1 | `42a9cab` | Built lane-health computation from forward logs, outcome joins, rolling calibration windows, scheduler wiring, and loader access. |
| Step 2 | `fc98fa9` | Added reliability diagrams and lane-health track-record table to `/methods`. |
| Step 3 | `948498a` | Added additive lane-health banners and track-record strips to allowed betting boards. |
| Step 4 | this commit | Verification pass and this ship summary. |

## Public Surfaces

### Methods Page

`/methods` now includes a Lane Health section near the top of the page. It renders:

- A compact table with market, lane, props logged, Brier, ECE, 14-day trend, and drift status.
- SVG reliability diagrams for market-lane combinations with real forward-log data.
- Honest tracking states for market-lane combinations without enough logged predictions yet.

This is the epistemic moat from the audit: users can see whether the model is behaving, not only what it outputs.

### Betting Boards

The allowed betting boards now consume the same lane-health artifact through shared macros:

- HR boards show real track-record stats for Palantir, Fangorn, and Valinor.
- Non-HR boards show "tracking in progress" states until their forward logs accumulate.
- Drift banners render only when `drift_flag=True`; current HR lanes are healthy, so no warning banner appears today.

The Step 3 changes were intentionally additive and did not structurally rebuild betting pages.
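
The display rules above can be sketched as a small decision function. This is an illustration only, not the shared macros themselves: the function name `lane_display_state`, the returned keys, and the use of a missing `ece` value to mean "no forward-log metrics yet" are assumptions. Only `drift_flag` is named in this summary.

```python
from typing import Any, Dict, Mapping


def lane_display_state(row: Mapping[str, Any]) -> Dict[str, Any]:
    """Decide what a board shows for one market-lane row (hypothetical helper).

    Assumes `row` is a plain dict with an `ece` value (None when the lane has
    no graded forward-log data yet) and a plain Python bool `drift_flag`.
    """
    has_metrics = row.get("ece") is not None
    return {
        # Real stats when metrics exist, otherwise the tracking state.
        "strip": "track_record" if has_metrics else "tracking_in_progress",
        # The banner renders only when drift_flag is exactly True.
        "banner": row.get("drift_flag") is True,
    }


hr_valinor = {"market": "Home Runs", "lane": "Valinor", "ece": 0.0307, "drift_flag": False}
total_bases = {"market": "Total Bases", "lane": "Palantir", "ece": None, "drift_flag": False}

print(lane_display_state(hr_valinor))
print(lane_display_state(total_bases))
```

Keeping the banner and the strip as independent outputs mirrors the additive intent of Step 3: either element can be dropped into a board without the other.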

## Current Lane Health State

Snapshot: `data/derived/live/lane_health_2026-05-12.parquet`

| Market | Lane | Props Logged | Brier | ECE | AUC | Drift |
| --- | --- | ---: | ---: | ---: | ---: | --- |
| Home Runs | Palantir | 360 | 0.1127 | 0.0208 | 0.6442 | No |
| Home Runs | Fangorn | 360 | 0.1144 | 0.0195 | 0.5619 | No |
| Home Runs | Valinor | 360 | 0.1142 | 0.0307 | 0.6151 | No |
| Strikeouts | Palantir/Fangorn/Valinor | 0 | -- | -- | -- | Tracking |
| Total Bases | Palantir/Fangorn/Valinor | 0 | -- | -- | -- | Tracking |
| Hits | Palantir/Fangorn/Valinor | 0 | -- | -- | -- | Tracking |
| Pitcher Outs | Palantir/Fangorn/Valinor | 0 | -- | -- | -- | Tracking |
| Futures | Palantir/Fangorn/Valinor | 0 | -- | -- | -- | Tracking |

The HR Valinor ECE (`0.0307`) is higher than the P3 M1 held-out promotion figure (`0.0149`) because this artifact is computed from the small current forward log (360 props) rather than the replay evaluation window. It is still below the `0.05` drift threshold and is presented as a live-health read, not a replacement for the promotion backtest.

## Thresholds

Rolling-window metrics require at least 50 graded predictions. Windows with fewer than 50 rows render as insufficient sample.
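
As a rough sketch of how a window's metrics and the 50-row gate could fit together: the Brier score and ECE below follow their standard definitions, but the equal-width binning, the bin count of 10, and the names `window_metrics`, `MIN_GRADED`, and `N_BINS` are assumptions, not details confirmed by this summary.

```python
from typing import Dict, List, Optional, Sequence

MIN_GRADED = 50  # from this summary: smaller windows are "insufficient sample"
N_BINS = 10      # assumption: the bin count is not stated in this summary


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = N_BINS
) -> float:
    """Sample-weighted mean |avg predicted prob - observed rate| over equal-width bins."""
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, p in enumerate(probs):
        bins[min(int(p * n_bins), n_bins - 1)].append(i)
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(probs[i] for i in members) / len(members)
        hit_rate = sum(outcomes[i] for i in members) / len(members)
        ece += (len(members) / total) * abs(mean_p - hit_rate)
    return ece


def window_metrics(
    probs: Sequence[float], outcomes: Sequence[int]
) -> Optional[Dict[str, float]]:
    """Return metrics for one window, or None when the sample is insufficient."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    if len(probs) < MIN_GRADED:
        return None
    return {
        "brier": brier_score(probs, outcomes),
        "ece": expected_calibration_error(probs, outcomes),
    }
```

Returning `None` rather than a metric computed on too few rows lets the rendering layer show the insufficient-sample state without inspecting counts itself.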

Drift warning threshold:

- `ECE > 0.05`, or
- a meaningful worsening (rise) in last-14-day ECE versus the prior 14-day window, with Brier used as a tiebreak.
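
One possible reading of this rule, written out as code. Only the `0.05` ECE limit comes from this summary. The size of a "meaningful" change (`ECE_TREND_DELTA`) and the exact way Brier breaks ties are hypothetical choices made for illustration, as is the function name `drift_flag` and its signature.

```python
from typing import Optional

ECE_LIMIT = 0.05        # from this summary
ECE_TREND_DELTA = 0.01  # hypothetical: the summary does not quantify "meaningful"


def drift_flag(
    ece_overall: Optional[float],
    ece_last14: Optional[float],
    ece_prior14: Optional[float],
    brier_last14: Optional[float],
    brier_prior14: Optional[float],
) -> bool:
    """Return True when a lane should show a drift warning (illustrative only)."""
    # Rule 1: absolute calibration error above the limit.
    if ece_overall is not None and ece_overall > ECE_LIMIT:
        return True

    # Rule 2 needs both 14-day windows; a None window means insufficient sample.
    if ece_last14 is None or ece_prior14 is None:
        return False

    ece_rise = ece_last14 - ece_prior14
    if ece_rise > ECE_TREND_DELTA:
        return True  # clear worsening in calibration

    # Assumed tiebreak: a small ECE rise counts only if Brier also worsened.
    if ece_rise > 0 and brier_last14 is not None and brier_prior14 is not None:
        return brier_last14 > brier_prior14
    return False


# Current HR Valinor ECE, with no 14-day windows supplied for this example.
print(drift_flag(0.0307, None, None, None, None))
```

Under these assumptions a lane with no populated 14-day windows can only be flagged by the absolute ECE limit, which matches the behavior expected while forward logs are still small.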

ECE is the primary drift signal because calibration reliability is Valinor's explicit lane purpose and because the audit's moat is about model trustworthiness, not raw ranking power.

## Outcome Join Logic

The lane-health computation first tries to use the existing HR grading artifacts, then falls back to MLB Stats API boxscore outcomes by date/team/opponent/player when no 2026 graded matches are available.
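
A minimal sketch of that try-then-fall-back join, assuming predictions are plain dicts keyed by date, team, opponent, and player. The names `join_outcomes` and `fetch_fallback` and the source labels are hypothetical. The fallback is passed in as a callable so the sketch makes no claims about the actual MLB Stats API client.

```python
from typing import Any, Callable, Dict, List, Mapping, Sequence, Tuple

Key = Tuple[str, str, str, str]  # (date, team, opponent, player)
Outcomes = Dict[Key, int]        # 1 if the prop hit, 0 otherwise


def join_key(pred: Mapping[str, Any]) -> Key:
    return (pred["date"], pred["team"], pred["opponent"], pred["player"])


def join_outcomes(
    predictions: Sequence[Mapping[str, Any]],
    graded: Outcomes,
    fetch_fallback: Callable[[Sequence[Mapping[str, Any]]], Outcomes],
) -> Tuple[List[Dict[str, Any]], str]:
    """Attach outcomes to predictions, preferring existing grading artifacts.

    The fallback is used only when the graded artifacts match none of the
    predictions, mirroring the "no graded matches available" condition.
    """
    lookup, source = graded, "hr_grading"
    if not any(join_key(p) in graded for p in predictions):
        lookup, source = fetch_fallback(predictions), "stats_api_fallback"

    # Predictions without an outcome stay ungraded and are left out.
    joined = [
        {**p, "outcome": lookup[join_key(p)]}
        for p in predictions
        if join_key(p) in lookup
    ]
    return joined, source
```

Returning the source label alongside the joined rows makes it easy to log how often the fallback path is taken, which is the carried debt described next.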

Carried debt: the existing HR grading pipeline or its backing DB did not contain 2026 outcome matches, which forced the MLB Stats API fallback. This is safe for HR today, but it will bite again when K, TB, Hits, Outs, and Futures forward logs accumulate unless the grading pipeline is updated to persist current-season outcomes consistently.

## Scheduler

New daily task:

- `compute_lane_health` at `06:00`, after prediction logging and grading-oriented tasks.

Verification run:

- `python scripts/ops/schedule.py --once --force --task compute_lane_health --target-date 2026-05-12 --skip-health-check`
- Result: success, metric `18.0`, output `data/derived/live/lane_health_2026-05-12.parquet`.

## Verification

- `/methods`: 200, reliability diagrams render.
- `/betting/home-run-edges?lane=valinor`: 200, HR lane track record renders.
- `/betting/fangorn`: 200, HR lane track record renders.
- `/betting/consensus`: 200, cross-market lane health renders.
- `/betting/consensus?view=split`: 200, cross-market lane health renders.
- `/betting/total-base-value-board?lane=palantir`: 200, tracking-in-progress state renders.
- Tests: `61` passed.
- Chalk discipline: template grep shows chalk usage only in `daily.html`.

## Carried Debts

- Methods page methodology stubs are still placeholder content, by prior user direction.
- K Valinor features and Wave 4 features remain pending until K Valinor work resumes.
- Design Refresh Sub-step 13 betting-page structural rebuild remains pending K Valinor coordination.
- Non-HR forward logs need to accumulate before lane health can populate those markets.
- Current-season outcome grading should be fixed centrally so future market health does not depend on per-market Stats API fallbacks.

## Explicit Non-Goals

- No new ML training.
- No K-specific work.
- No structural betting-page rebuild.
- No Methods methodology content fill.
- No full mobile pass.
