Home · Track Record
Track Record
Paulo de Vries’s public dated probabilistic forecasts. Scored after resolution with Brier (1950) and decomposed via Murphy (1973). Aggregate calibration recomputes on each resolution. The methodology proven on the operator’s own predictions before applied at registry scale.
Why this exists
Calibration Ledger’s thesis is that predictive sources should be graded publicly, with append-only dated forecasts and Brier-scored outcomes. Operating a registry that grades others requires the same discipline applied first to the operator. This page is that discipline made visible.
Each entry below is recorded before the resolution date. Probability is locked at posting. Outcome is recorded once a public, verifiable source confirms resolution. Brier score is computed deterministically. Aggregate metrics (Reliability, Resolution, Uncertainty per Murphy 1973) recompute on every resolved row.
Methodology is documented at /methodology/. Operator identity is at /about/.
Aggregate
- Total forecasts posted
- 5
- Resolved
- 0
- Unresolvable
- 0
- Mean Brier score
- — (no resolutions yet)
- Calibration curve
- — (renders after first 10 resolutions)
Scoring engine: lib/brier.ts. Runs at build time; deterministic; recomputes on every deploy. Source code: open in repo for audit.
Open forecasts (5)
| Posted | Question | P(YES) | Resolves | Domain | Source |
|---|---|---|---|---|---|
| 2026-04-27 | EU AI Act Article 50 (transparency obligations for providers and deployers of certain AI systems) becomes enforceable on schedule on 2026-08-02. | 85% | 2026-08-02 | geopolitics | link |
| 2026-04-27 | S&P 500 index closing value on 2026-12-31 (last trading day) is higher than its closing value on 2026-01-02 (first trading day of 2026). | 62% | 2026-12-31 | markets | link |
| 2026-04-27 | A frontier large language model (any vendor — OpenAI, Anthropic, Google DeepMind, Meta, etc.) scores ≥85% on GPQA-Diamond by 2026-12-31, with the score reported in an official model card or peer-reviewed evaluation. | 55% | 2026-12-31 | ai_benchmarks | link |
| 2026-04-27 | Anthropic publicly announces a successor model to Claude Opus 4 (named e.g. 'Claude 5', 'Claude Opus 5', or any next-major-tier model) by 2027-03-31. | 70% | 2027-03-31 | technology | link |
| 2026-04-27 | calibrationledger.com/track-record/ has at least 10 publicly posted dated probabilistic forecasts (status open OR resolved) on 2027-04-27. | 50% | 2027-04-27 | other | link |
Resolved forecasts (0)
No resolutions yet. After each resolution the row moves from Open to Resolved and contributes to the aggregate calibration metrics above.
Discipline commitments
- Append-only. Probability assigned at posting is locked. Edits to question text after posting append a `corrected:` note rather than replacing the original.
- Public source for resolution. Every resolution cites a verifiable URL (e.g. official report, market settlement, news of record).
- No retroactive deletion. Forecasts that resolved badly are kept visible. Hiding bad predictions defeats the purpose.
- Resolution date stated at posting. No moving goalposts.
- Domain mix declared. Forecasts span multiple domains (geopolitics, AI benchmarks, markets, weather, sports, technology timelines) so the aggregate calibration is cross-vertical, not narrow-domain.
Machine-readable
The forecast log is also exposed as a machine-readable JSON feed at /api/forecasts.json (CC-BY-4.0, parseable schema documented in the file). LLM crawlers and RAG retrieval systems prefer the structured feed over HTML parsing.
Last verified: 2026-04-27. Page version 0.2 (scoring engine wired; awaiting first forecast posting). Operator: Paulo de Vries. Contact: contact@calibrationledger.com.