Track Record

Paulo de Vries’s public, dated probabilistic forecasts, scored after resolution with the Brier score (Brier 1950) and decomposed following Murphy (1973). Aggregate calibration is recomputed on each resolution. The methodology is proven on the operator’s own predictions before it is applied at registry scale.

Why this exists

Calibration Ledger’s thesis is that predictive sources should be graded publicly, with append-only dated forecasts and Brier-scored outcomes. Operating a registry that grades others requires the same discipline applied first to the operator. This page is that discipline made visible.

Each entry below is recorded before its resolution date. The probability is locked at posting. The outcome is recorded once a public, verifiable source confirms resolution. The Brier score is computed deterministically. Aggregate metrics (Reliability, Resolution, Uncertainty per Murphy 1973) are recomputed each time a row resolves.

Methodology is documented at /methodology/. Operator identity is at /about/.

Aggregate

  • Total forecasts posted: 9
  • Resolved: 0
  • Unresolvable: 0
  • Mean Brier score: n/a (no resolutions yet)
  • Calibration curve: n/a (renders after the first 10 resolutions)

Scoring engine: lib/brier.ts. It runs at build time, is deterministic, and recomputes on every deploy. The source code is open in the repository for audit.
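
As a rough illustration of what a build-time aggregate step of this kind might look like, here is a minimal TypeScript sketch. It is not the contents of lib/brier.ts: the `Forecast` shape, its field names, and the `aggregate` function are assumptions made for this example, as is the choice to exclude unresolvable rows from the mean.

```typescript
// Hypothetical record shape; field names are illustrative, not the real schema.
type Outcome = "yes" | "no" | "unresolvable" | null; // null = still open

interface Forecast {
  question: string;
  probability: number; // P(YES) in [0, 1], locked at posting
  outcome: Outcome;
}

// Brier score for one binary forecast: (probability - outcome)^2.
function brierScore(probability: number, outcomeIsYes: boolean): number {
  const diff = probability - (outcomeIsYes ? 1 : 0);
  return diff * diff;
}

interface Aggregate {
  total: number;
  resolved: number;
  unresolvable: number;
  meanBrier: number | null; // null until at least one forecast resolves
}

// Pure function of the forecast list, so the same input always yields
// the same aggregate (assumed here: unresolvable rows are not scored).
function aggregate(forecasts: Forecast[]): Aggregate {
  const scored = forecasts.filter(
    (f) => f.outcome === "yes" || f.outcome === "no"
  );
  const unresolvable = forecasts.filter(
    (f) => f.outcome === "unresolvable"
  ).length;
  const sum = scored.reduce(
    (acc, f) => acc + brierScore(f.probability, f.outcome === "yes"),
    0
  );
  return {
    total: forecasts.length,
    resolved: scored.length,
    unresolvable: unresolvable,
    meanBrier: scored.length > 0 ? sum / scored.length : null,
  };
}
```

With only open forecasts in the list, `meanBrier` stays `null`, which corresponds to the "no resolutions yet" state shown above.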

Open forecasts (9)

| Posted | Question | P(YES) | Resolves | Domain | Source |
| --- | --- | --- | --- | --- | --- |
| 2026-04-27 | The U.S. Federal Reserve holds the federal funds target rate at or below its 2026-04 level through the FOMC meeting on 2026-06-17. | 78% | 2026-06-17 | markets | link |
| 2026-04-27 | EU AI Act Article 50 (transparency obligations for providers and deployers of certain AI systems) becomes enforceable on schedule on 2026-08-02. | 85% | 2026-08-02 | geopolitics | link |
| 2026-04-27 | S&P 500 index closing value on 2026-12-31 (last trading day) is higher than its closing value on 2026-01-02 (first trading day of 2026). | 62% | 2026-12-31 | markets | link |
| 2026-04-27 | A frontier large language model (any vendor: OpenAI, Anthropic, Google DeepMind, Meta, etc.) scores ≥85% on GPQA-Diamond by 2026-12-31, with the score reported in an official model card or peer-reviewed evaluation. | 55% | 2026-12-31 | ai_benchmarks | link |
| 2026-04-27 | OpenAI launches a model branded as 'GPT-5' (or successor explicitly identified as next major generation) generally available to ChatGPT consumer + API users by 2026-12-31. | 65% | 2026-12-31 | technology | link |
| 2026-04-27 | Global mean surface temperature for calendar year 2026 (per NOAA NCEI annual climate report) is warmer than 2024 (which set the prior record at +1.46°C above 1850-1900 baseline per WMO). | 35% | 2027-01-15 | weather | link |
| 2026-04-27 | English Wikipedia monthly active editors (≥5 edits/month) on 2027-01-31 is higher than on 2026-04-30, per Wikimedia Foundation public statistics. | 42% | 2027-02-15 | technology | link |
| 2026-04-27 | Anthropic publicly announces a successor model to Claude Opus 4 (named e.g. 'Claude 5', 'Claude Opus 5', or any next-major-tier model) by 2027-03-31. | 70% | 2027-03-31 | technology | link |
| 2026-04-27 | calibrationledger.com/track-record/ has at least 10 publicly posted dated probabilistic forecasts (status open OR resolved) on 2027-04-27. | 50% | 2027-04-27 | other | link |

Resolved forecasts (0)

No resolutions yet. After each resolution the row moves from Open to Resolved and contributes to the aggregate calibration metrics above.

Discipline commitments

  • Append-only. Probability assigned at posting is locked. Edits to question text after posting append a `corrected:` note rather than replacing the original.
  • Public source for resolution. Every resolution cites a verifiable URL (e.g. official report, market settlement, news of record).
  • No retroactive deletion. Forecasts that resolved badly are kept visible. Hiding bad predictions defeats the purpose.
  • Resolution date stated at posting. No moving goalposts.
  • Domain mix declared. Forecasts span multiple domains (geopolitics, AI benchmarks, markets, weather, sports, technology timelines) so the aggregate calibration is cross-vertical, not narrow-domain.
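
The append-only rule for question text can be sketched in TypeScript roughly as follows. The `PostedForecast` shape and the `appendCorrection` helper are hypothetical names chosen for this example; the page does not say how corrections are actually stored.

```typescript
// Hypothetical record shape, for illustration only.
interface PostedForecast {
  readonly question: string; // original text, never replaced
  readonly probability: number; // locked at posting
  readonly corrections: readonly string[]; // appended `corrected:` notes
}

// Returns a new record with the note appended. The original question and
// probability are carried over unchanged, and the input is not mutated.
function appendCorrection(f: PostedForecast, note: string): PostedForecast {
  return {
    question: f.question,
    probability: f.probability,
    corrections: f.corrections.concat(["corrected: " + note]),
  };
}
```

Because every field is `readonly` and the helper only appends, there is no code path in this sketch that overwrites the posted question or probability.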

Machine-readable

The forecast log is also exposed as a machine-readable JSON feed at /api/forecasts.json (CC-BY-4.0; the schema is documented in the file). LLM crawlers and retrieval-augmented generation (RAG) systems should use the structured feed rather than parse the HTML.
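
A consumer of the feed might validate entries along these lines. This is only a sketch: the `FeedEntry` fields and the top-level `forecasts` key are assumptions made for this example, and the schema documented in /api/forecasts.json itself is authoritative.

```typescript
// Assumed entry shape; the real schema is documented in the feed itself
// and may use different field names.
interface FeedEntry {
  posted: string; // ISO date, e.g. "2026-04-27"
  question: string;
  probability: number; // P(YES) in [0, 1]
  resolves: string; // ISO date
  domain: string;
}

// Type guard that checks one parsed value against the assumed shape.
function isFeedEntry(x: unknown): x is FeedEntry {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  const p = r["probability"];
  return (
    typeof r["posted"] === "string" &&
    typeof r["question"] === "string" &&
    typeof p === "number" &&
    p >= 0 &&
    p <= 1 &&
    typeof r["resolves"] === "string" &&
    typeof r["domain"] === "string"
  );
}

// Parses feed text and keeps only entries matching the assumed shape.
function parseFeed(text: string): FeedEntry[] {
  const data: unknown = JSON.parse(text);
  if (typeof data !== "object" || data === null) return [];
  const items = (data as { forecasts?: unknown }).forecasts;
  return Array.isArray(items) ? items.filter(isFeedEntry) : [];
}
```

In practice the input text would be the body of an HTTP response for /api/forecasts.json; the sketch takes a string so it can be exercised without a network call.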

Frequently asked questions

What is this page?

A public, append-only log of Paulo de Vries’s dated probabilistic forecasts across geopolitics, AI benchmarks, markets, technology, and weather. Each forecast is timestamped before resolution; outcomes are scored with Brier (1950) and decomposed via Murphy (1973). The page exists to demonstrate operator track-record discipline before Calibration Ledger applies the same methodology to other forecasters at registry scale.

How are forecasts scored?

Each resolved binary forecast is scored with the Brier formula: (probability − outcome)². Aggregate Brier is the mean across all resolved forecasts; lower is better. Once 10 resolved forecasts accumulate, the score decomposes via Murphy 1973 into reliability − resolution + uncertainty, with reliability bins computed at 10-percentage-point intervals.
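
The decomposition described above can be sketched in TypeScript as follows. The `Resolved` type and the `murphyDecomposition` function are illustrative names, not the project's actual code, and the binning rule (ten equal-width bins, with a probability of exactly 1 placed in the top bin) is an assumption. Note that reliability − resolution + uncertainty reproduces the mean Brier score exactly only when all forecasts within a bin share the same probability; otherwise the binned identity is approximate.

```typescript
interface Resolved {
  probability: number; // P(YES) in [0, 1]
  outcome: 0 | 1; // 1 = YES, 0 = NO
}

interface Decomposition {
  reliability: number;
  resolution: number;
  uncertainty: number;
}

function murphyDecomposition(rows: Resolved[], binCount = 10): Decomposition {
  const n = rows.length;
  if (n === 0) throw new Error("no resolved forecasts");

  // Overall base rate of YES outcomes.
  const baseRate = rows.reduce((s, r) => s + r.outcome, 0) / n;

  // Accumulate per-bin counts and sums; bin k covers [k/binCount, (k+1)/binCount).
  const bins: { count: number; sumP: number; sumO: number }[] = [];
  for (let k = 0; k < binCount; k++) bins.push({ count: 0, sumP: 0, sumO: 0 });
  rows.forEach((r) => {
    const k = Math.min(binCount - 1, Math.floor(r.probability * binCount));
    bins[k].count += 1;
    bins[k].sumP += r.probability;
    bins[k].sumO += r.outcome;
  });

  // reliability = (1/N) * sum_k n_k * (meanForecast_k - meanOutcome_k)^2
  // resolution  = (1/N) * sum_k n_k * (meanOutcome_k - baseRate)^2
  let reliability = 0;
  let resolution = 0;
  bins.forEach((b) => {
    if (b.count === 0) return;
    const meanP = b.sumP / b.count;
    const meanO = b.sumO / b.count;
    reliability += b.count * (meanP - meanO) * (meanP - meanO);
    resolution += b.count * (meanO - baseRate) * (meanO - baseRate);
  });

  return {
    reliability: reliability / n,
    resolution: resolution / n,
    uncertainty: baseRate * (1 - baseRate),
  };
}
```

Lower reliability is better (forecast probabilities match observed frequencies), while higher resolution is better (bins separate from the base rate); uncertainty depends only on the outcomes, not on the forecaster.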

When does the 12-month track-record clock start?

The 12-month clock starts on the first resolved forecast, currently expected on 2026-06-17 (the federal funds rate forecast, which has the earliest resolution date in the open list). A full 12 months of resolved-and-scored history is a prerequisite for the Calibration Ledger Phase 1 launch (target Q3 2027), alongside academic credibility, a signed enterprise LOI, and data-licensing agreements.

Last verified: 2026-04-29. Page version 0.2 (scoring engine wired; awaiting first resolution). Operator: Paulo de Vries. Contact: contact@calibrationledger.com.