# Self-Correcting Lane Architecture + K Valinor Promotion Ship Summary

Date: 2026-05-13

This milestone adds the production safety layer Mithrandir needed before expanding Valinor lanes. Every market now has the forward-log and outcome-join path needed for lane health, production lanes can recalibrate nightly from graded outcomes, and K Valinor is live with an explicit caveat while its forward log accumulates.

## Commit Ladder

| Step | Commit | Summary |
| --- | --- | --- |
| Step 1 | `d5da84e` | Extended forward logging and outcome-join/grading infrastructure across TB, Hits, Outs, and Futures, and generalized lane health to all graded markets. |
| Step 2 | `a68a559` | Built generic daily lane recalibration, wired HR Valinor as the reference implementation, and accepted a recalibration artifact with a 45% validation ECE improvement. |
| Step 2.5 | `36a3eb5` | Profiled HR daily production, added calibrator caching/thread caps, and confirmed matchups generation is the real bottleneck rather than recalibration. |
| Step 3 | `a03e58d` | Rolled scheduled recalibration status/artifacts across production and research lanes with honest accepted/skipped/rejected states surfaced in lane health. |
| Step 4 | `fb6c4b0` | Promoted K Valinor to production with the approved artifact, daily recalibration hook, forward-log accumulation, and caveat banner. |
| Step 5 | this commit | Verification pass and this ship summary. |

## New Infrastructure

### Forward Logs + Outcome Joins

Prediction forward logs now exist or are wired for every market:

- `data/derived/predictions_log/hr_predictions_<date>.parquet`
- `data/derived/predictions_log/k_predictions_<date>.parquet`
- `data/derived/predictions_log/tb_predictions_<date>.parquet`
- `data/derived/predictions_log/hits_predictions_<date>.parquet`
- `data/derived/predictions_log/outs_predictions_<date>.parquet`
- `data/derived/predictions_log/futures_predictions_<date>.parquet`

Grading is live for HR, K, TB, Hits, and Outs. Futures grading is deferred because futures outcomes resolve at season end and do not fit the daily boxscore pattern.

The shared outcome-join utility in `scripts/shared/outcome_join.py` is now the market-agnostic path for joining logged predictions to realized outcomes. Market-specific wrappers call it instead of each market inventing its own join logic.
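As a rough illustration of what a market-agnostic join can look like, the sketch below left-joins logged predictions to realized outcomes so that ungraded rows stay visible. It is not the code in `scripts/shared/outcome_join.py`; the join key and the field names (`game_date`, `player_id`, `market`, `line`, `actual`) are assumptions made for the example.

```python
from typing import Iterable, List

# Hypothetical join key; the real utility's key columns are not shown in this summary.
JOIN_KEY = ("game_date", "player_id", "market")


def join_outcomes(predictions: Iterable[dict], outcomes: Iterable[dict]) -> List[dict]:
    """Left-join logged predictions to realized outcomes on JOIN_KEY.

    Predictions with no matching outcome are kept with graded=False, so
    ungraded rows remain countable instead of silently dropping out.
    """
    outcome_by_key = {tuple(o[k] for k in JOIN_KEY): o for o in outcomes}
    joined = []
    for pred in predictions:
        outcome = outcome_by_key.get(tuple(pred[k] for k in JOIN_KEY))
        row = dict(pred)  # copy so the logged prediction is not mutated
        row["graded"] = outcome is not None
        row["actual"] = outcome["actual"] if outcome is not None else None
        # An over prop hits when the realized stat exceeds the line.
        row["hit_over"] = (row["actual"] > row["line"]) if row["graded"] else None
        joined.append(row)
    return joined
```

A market-specific wrapper would then only need to map its own columns onto the shared key before calling the join.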

### Daily Recalibration

`scripts/shared/recalibrate_lane.py` refits only the calibration layer, not the base model. It:

- Reads the rolling 30-day graded forward log for a market/lane.
- Requires at least 50 graded rows before fitting.
- Fits beta calibration by default.
- Validates the new calibration on the most recent 7-day slice.
- Accepts only when validation ECE improves.
- Writes accepted calibration artifacts under `outputs/models/<market>_<lane>/`.
- Writes `latest_recalibration_status.json` for accepted, rejected, and skipped lanes.
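The acceptance gate described above can be sketched as follows. This is a minimal illustration, not the logic in `scripts/shared/recalibrate_lane.py`: the equal-width ECE binning, the function names, and `MIN_VALIDATION_ROWS` are assumptions, and the calibrator fit itself is left out. Only the 50-row minimum and the status strings come from this summary.

```python
from typing import Optional, Sequence, Tuple

MIN_GRADED_ROWS = 50      # stated in this summary
MIN_VALIDATION_ROWS = 20  # hypothetical; the real validation-split threshold is not stated


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Equal-width ECE: sample-weighted mean of |avg predicted - observed rate| per bin."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and the same length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(max(int(p * n_bins), 0), n_bins - 1)  # clamp so p == 1.0 lands in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for rows in bins:
        if rows:
            avg_p = sum(p for p, _ in rows) / len(rows)
            rate = sum(y for _, y in rows) / len(rows)
            ece += len(rows) / len(probs) * abs(avg_p - rate)
    return ece


def decide_recalibration(
    n_rolling_rows: int,
    current_probs: Sequence[float],
    candidate_probs: Sequence[float],
    outcomes: Sequence[int],
) -> Tuple[str, Optional[float], Optional[float]]:
    """Return (status, ece_before, ece_after) for one lane's validation slice."""
    if n_rolling_rows < MIN_GRADED_ROWS:
        return ("skipped_insufficient_sample", None, None)
    if len(outcomes) < MIN_VALIDATION_ROWS:
        return ("skipped_insufficient_validation_split", None, None)
    ece_before = expected_calibration_error(current_probs, outcomes)
    ece_after = expected_calibration_error(candidate_probs, outcomes)
    # Accept only on a strict improvement; ties keep the current calibration.
    status = "accepted" if ece_after < ece_before else "rejected_no_ece_improvement"
    return (status, ece_before, ece_after)
```

Comparing both calibrations on the same held-out slice keeps the decision honest: a candidate that merely fits the training window better never reaches production.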

Inference paths load saved calibration artifacts or status pointers only. No model fitting happens during request rendering.
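A read-only lookup along these lines would keep request rendering free of any fitting. The sketch assumes the status file carries an `active_artifact` field naming the calibration currently in force; that field name and the function name are hypothetical, since the status schema is not shown in this summary.

```python
import json
from pathlib import Path
from typing import Optional


def resolve_calibration_artifact(lane_dir: Path) -> Optional[Path]:
    """Return the calibration artifact a lane should load, or None.

    Reads only the small JSON status pointer and never fits anything.
    """
    status_path = lane_dir / "latest_recalibration_status.json"
    if not status_path.exists():
        return None
    status = json.loads(status_path.read_text())
    # "active_artifact" is a hypothetical field: the artifact in force after the
    # latest run, whether that run was accepted, rejected, or skipped.
    name = status.get("active_artifact")
    if not name:
        return None
    artifact = lane_dir / name
    return artifact if artifact.exists() else None
```

Returning `None` rather than raising lets the caller fall back to its existing behavior when a lane has never been recalibrated.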

### K Valinor Production

K Valinor is production-wired from:

`outputs/models/k_valinor/k_valinor_lgbm_beta_20260512T223856Z.joblib`

The lane writes:

- `valinor_prob_over`
- `valinor_prob_under`
- `valinor_coverage_status=production`
- `valinor_model_version=k_valinor_lgbm_beta_20260512T223856Z`
- `valinor_prob_4_plus` through `valinor_prob_9_plus`
- `k_valinor_production_fallback_used`
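For intuition only, one way a ladder such as `valinor_prob_4_plus` through `valinor_prob_9_plus` could be derived from a projected strikeout total and an empirical residual sample is shown below. How the production lane actually represents its residual distribution is not described in this summary, so the function name and the list-of-residuals representation are assumptions, and the calibration step is omitted.

```python
from typing import Dict, Sequence


def k_ladder(projected_k: float, residuals: Sequence[float]) -> Dict[str, float]:
    """Estimate P(K >= n) for n = 4..9 from a projection plus empirical residuals.

    Each residual is treated as an equally likely (actual - projected) error, so
    the probability is the share of residuals that push the total to n or above.
    """
    if not residuals:
        raise ValueError("need at least one residual")
    ladder = {}
    for n in range(4, 10):
        hits = sum(1 for r in residuals if projected_k + r >= n)
        ladder[f"valinor_prob_{n}_plus"] = hits / len(residuals)
    return ladder
```

Because every rung is computed against the same residual sample, the ladder is non-increasing in `n` by construction.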

The K board caveat banner renders on `/betting/strikeouts?lane=valinor` until K Valinor reaches 200 graded forward-logged predictions:

`K Valinor recently deployed. Backtest sample limited. Calibration adapting daily - see /methods for current lane health.`
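The gating rule is simple enough to state in a few lines. The sketch below uses the 200-prediction threshold and banner text from this summary; the function and constant names are illustrative, not the site's actual code.

```python
from typing import Optional

CAVEAT_THRESHOLD = 200  # graded forward-logged predictions, per this summary

CAVEAT_TEXT = (
    "K Valinor recently deployed. Backtest sample limited. "
    "Calibration adapting daily - see /methods for current lane health."
)


def caveat_banner(graded_count: int) -> Optional[str]:
    """Return the banner text while the lane is below the threshold, else None."""
    return CAVEAT_TEXT if graded_count < CAVEAT_THRESHOLD else None
```

With the 14 graded rows currently logged for K Valinor, `caveat_banner(14)` returns the banner text.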

## Current Lane Health

Snapshot: `data/derived/live/lane_health_2026-05-12.parquet`

| Market | Lane | Health | Props Logged | Brier | ECE | Drift |
| --- | --- | --- | ---: | ---: | ---: | --- |
| Home Runs | Palantir | healthy | 360 | 0.1127 | 0.0208 | No |
| Home Runs | Fangorn | healthy | 360 | 0.1144 | 0.0195 | No |
| Home Runs | Valinor | healthy | 360 | 0.1137 | 0.0190 | No |
| Strikeouts | Palantir | insufficient_sample | 30 | 0.2325 | 0.0899 | No |
| Strikeouts | Fangorn | insufficient_sample | 8 | 0.2581 | 0.1550 | No |
| Strikeouts | Valinor | insufficient_sample | 14 | 0.2409 | 0.1700 | No |
| Total Bases | Palantir | drift | 91 | 0.1961 | 0.1350 | Yes |
| Total Bases | Fangorn | tracking_in_progress | 0 | -- | -- | No |
| Total Bases | Valinor | tracking_in_progress | 0 | -- | -- | No |
| Hits | Palantir | tracking_in_progress | 0 | -- | -- | No |
| Pitcher Outs | Palantir | insufficient_sample | 16 | 0.2591 | 0.1557 | No |
| Futures | Palantir | tracking_in_progress | 0 | -- | -- | No |

The K Palantir ECE of `0.0899` remains a watch item as the K forward log grows. With only 30 logged props it may be small-sample noise, but it is exactly the kind of calibration gap K Valinor and nightly recalibration are meant to catch.

The Total Bases Palantir drift flag also rests on an early sample of 91 logged props. It is intentionally visible rather than hidden, but should be rechecked once the forward log has a fuller sample.
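For readers unfamiliar with the columns, the sketch below shows the standard Brier score and one plausible way the four health labels in the table could be assigned. The label names come from the table; the `min_sample=50` cutoff and the function names are assumptions, chosen only because they are consistent with the rows shown.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probability and the 0/1 outcome."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def lane_status(props_logged: int, drift: bool, min_sample: int = 50) -> str:
    """Assign a health label; min_sample is a hypothetical cutoff."""
    if props_logged == 0:
        return "tracking_in_progress"
    if props_logged < min_sample:
        return "insufficient_sample"
    return "drift" if drift else "healthy"
```

A constant 0.5 forecast scores a Brier of 0.25, which is a useful reference point when reading the small-sample rows.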

## Recalibration Status

Forced recalibration run date: `2026-05-12`

| Market/Lane | Status | Rolling Rows | Validation ECE Change |
| --- | --- | ---: | --- |
| Home Runs Valinor | rejected_no_ece_improvement | 360 | `0.0174 -> 0.0190` |
| Home Runs Palantir | rejected_no_ece_improvement | 360 | `0.0156 -> 0.0160` |
| Home Runs Fangorn | accepted | 360 | `0.0294 -> 0.0057` |
| Strikeouts Palantir | skipped_insufficient_sample | 30 | -- |
| Strikeouts Fangorn | skipped_insufficient_sample | 8 | -- |
| Strikeouts Valinor | skipped_insufficient_sample | 14 | -- |
| Total Bases Palantir | skipped_insufficient_validation_split | 91 | -- |
| Total Bases Fangorn | skipped_insufficient_sample | 0 | -- |
| Hits Palantir | skipped_no_data | 0 | -- |
| Pitcher Outs Palantir | skipped_insufficient_sample | 16 | -- |
| Futures Palantir | skipped_insufficient_sample | 0 | -- |

This is the intended behavior: recalibration accepts clear improvements, rejects worse fits, and records honest skip reasons instead of silently changing production probabilities.

## Scheduler

Forced checks completed:

- `run_strikeout_edge_refresh` for `2026-05-07`: success, 14 K board rows.
- `grade_strikeout_predictions` for `2026-05-07`: success, 14 graded rows.
- `compute_lane_health` for `2026-05-12`: success, 18 lane-health rows.
- All `recalibrate_*` tasks for HR, K, TB, Hits, Outs, and Futures: success, each wrote an accepted/rejected/skipped status artifact.
- `train_k_valinor`: success, wrote a fresh metadata artifact for the scheduler check.

Production remains pinned to the approved K Valinor artifact from Step 4, not the verification retrain artifact.

## Verification

Routes:

- `/`: 200, `1.881s`
- `/projections`: 200, `0.573s`
- `/methods`: 200, `2.833s`
- `/betting/strikeouts?lane=valinor`: 200, `8.889s`
- `/betting/consensus`: 200, `0.878s`
- `/models/valinor`: 200, `0.832s`

The stabilization-pass targets hold for the core routes: `/` under 4s, `/projections` under 1s, and `/methods` under 4s. The K board route is still slow, but stage timing shows the cost is `load_betting_alignment_status` plus `load_betting_hub`, not K Valinor inference or recalibration. K board artifact loading itself is about `0.004s`.
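Stage timing of the kind referenced here can be collected with a small context manager. This is a generic sketch, not the site's instrumentation; `stage` and `slowest_stages` are hypothetical helper names.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List


@contextmanager
def stage(name: str, timings: Dict[str, float]) -> Iterator[None]:
    """Accumulate wall-clock seconds spent inside the block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the elapsed time even if the stage raises.
        timings[name] = timings.get(name, 0.0) + (time.perf_counter() - start)


def slowest_stages(timings: Dict[str, float], top: int = 2) -> List[str]:
    """Return the names of the `top` most expensive stages, slowest first."""
    return sorted(timings, key=timings.get, reverse=True)[:top]
```

Wrapping each loader call, for example `with stage("load_betting_hub", timings): ...`, yields a per-request breakdown that makes it clear which stage dominates a slow route.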

Click-through checks:

- `/methods` renders Lane Health with recalibration status.
- `/betting/strikeouts?lane=valinor` renders K Valinor probabilities and the caveat banner.
- `/betting/consensus` recognizes Strikeouts as an eligible three-lane market. It has no visible K aligned/split rows yet because current Fangorn coverage is sparse and the only three-lane K rows are `mixed`.
- `/models/valinor` shows Strikeouts as production.

Tests:

- `python -m unittest discover -s tests`
- Result: 73 tests passed.

Chalk discipline:

- Template grep shows chalk usage only in `daily.html`.

## Critical Carried Debts

- K Valinor production uses the fallback path, not the full trained feature matrix. The daily K board currently lacks enough of the training feature matrix for direct LightGBM serving, so production uses matchup-adjusted projected K plus the learned residual distribution and beta calibration. Rows expose this via `k_valinor_production_fallback_used=True`. Future work should build the full K Valinor training feature matrix in the daily pipeline so the base LightGBM model runs in production.
- K Valinor caveat banner persists until 200 forward-logged graded predictions accumulate.
- HR matchups generation remains the real HR daily-production bottleneck at about 37 seconds. It predates recalibration and is not blocking this milestone, but it is the next meaningful HR performance target.
- Wave 2 cumulative drift threshold does not yet enforce the same minimum-sample threshold as rolling windows. This should be a small follow-up so tiny samples cannot trigger alarming cumulative drift banners prematurely.
- Other Valinor lanes for Outs, TB, and Hits are still missing. The same forward-log, replay, evaluation, recalibration, caveat-gated playbook should apply when those milestones land.
- Design Refresh Sub-step 13, the structural betting-page rebuild, is now unblocked by K Valinor production but was intentionally not part of this milestone.
- Betting hub/alignment loading is still heavy on K board routes. It is separate from recalibration and should be handled as a future site-performance pass.
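For the cumulative drift debt listed above, the follow-up could be as small as a sample-size guard in front of the threshold check. The sketch below is illustrative only: how drift is actually measured is not described in this summary, so the ECE-based rule, the parameter names, and any threshold values passed in are assumptions.

```python
def cumulative_drift_flag(
    ece: float, n_graded: int, ece_threshold: float, min_sample: int
) -> bool:
    """Flag drift only when the sample is large enough to trust the metric."""
    if n_graded < min_sample:
        return False  # too few graded rows: suppress the banner, keep tracking
    return ece > ece_threshold
```

Passing the same `min_sample` that rolling windows already use would make the two checks behave consistently.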

## Explicit Non-Goals

- No K Valinor full-feature production wiring beyond the transparent fallback path.
- No new ML training for non-K markets.
- No structural betting-page rebuild.
- No Methods methodology content fill.
- No mobile pass.
