
Track Record

Name: Paulo de Vries Forecast Track Record
Creator: Paulo de Vries
Published: 2026-04-27
License: https://creativecommons.org/licenses/by/4.0/

Paulo de Vries’s public, dated probabilistic forecasts. Each forecast is scored after resolution with the Brier score (Brier 1950) and decomposed via Murphy (1973). Aggregate calibration recomputes on each resolution. The methodology is proven on the operator’s own predictions before it is applied at registry scale.

Why this exists

Calibration Ledger’s thesis is that predictive sources should be graded publicly, with append-only dated forecasts and Brier-scored outcomes. Operating a registry that grades others requires the same discipline applied first to the operator. This page is that discipline made visible.

Each entry below is recorded before the resolution date. Probability is locked at posting. Outcome is recorded once a public, verifiable source confirms resolution. Brier score is computed deterministically. Aggregate metrics (Reliability, Resolution, Uncertainty per Murphy 1973) recompute on every resolved row.

Methodology is documented at /methodology/. Operator identity is at /about/.

Aggregate

Total forecasts posted: 9
Resolved: 0
Unresolvable: 0
Mean Brier score: — (no resolutions yet)
Calibration curve: — (renders after first 10 resolutions)

Scoring engine: lib/brier.ts. Runs at build time; deterministic; recomputes on every deploy. Source code: open in repo for audit.
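As a rough illustration of what a deterministic scorer of this kind involves, here is a minimal TypeScript sketch. It is not the contents of lib/brier.ts. The names ResolvedForecast, brierScore, and meanBrier are assumptions made for this example, as is the choice to encode a YES outcome as 1 and a NO outcome as 0.

```typescript
// Hypothetical row shape for illustration; the real lib/brier.ts may differ.
type Outcome = 0 | 1; // assumed encoding: 1 = resolved YES, 0 = resolved NO

interface ResolvedForecast {
  probability: number; // P(YES) locked at posting, expressed in [0, 1]
  outcome: Outcome;
}

// Brier score for a single binary forecast: (probability - outcome)^2.
function brierScore(probability: number, outcome: Outcome): number {
  if (!(probability >= 0 && probability <= 1)) {
    throw new RangeError(`probability must be in [0, 1], got ${probability}`);
  }
  return (probability - outcome) ** 2;
}

// Mean Brier score across resolved rows; lower is better.
// Returns null when nothing has resolved yet, so a page can show a placeholder.
function meanBrier(rows: readonly ResolvedForecast[]): number | null {
  if (rows.length === 0) return null;
  const total = rows.reduce(
    (sum, row) => sum + brierScore(row.probability, row.outcome),
    0,
  );
  return total / rows.length;
}
```

Under these assumptions, a forecast posted at 78% would score (0.78 − 1)² = 0.0484 if it resolves YES and (0.78 − 0)² = 0.6084 if it resolves NO. The function has no inputs other than the locked probability and the recorded outcome, which is what makes the result reproducible on every build.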

Open forecasts (9)

Posted	Question	P(YES)	Resolves	Domain	Source
2026-04-27	The U.S. Federal Reserve holds the federal funds target rate at or below its 2026-04 level through the FOMC meeting on 2026-06-17.	78%	2026-06-17	markets	link
2026-04-27	EU AI Act Article 50 (transparency obligations for providers and deployers of certain AI systems) becomes enforceable on schedule on 2026-08-02.	85%	2026-08-02	geopolitics	link
2026-04-27	S&P 500 index closing value on 2026-12-31 (last trading day) is higher than its closing value on 2026-01-02 (first trading day of 2026).	62%	2026-12-31	markets	link
2026-04-27	A frontier large language model (any vendor — OpenAI, Anthropic, Google DeepMind, Meta, etc.) scores ≥85% on GPQA-Diamond by 2026-12-31, with the score reported in an official model card or peer-reviewed evaluation.	55%	2026-12-31	ai_benchmarks	link
2026-04-27	OpenAI launches a model branded as 'GPT-5' (or successor explicitly identified as next major generation) generally available to ChatGPT consumer + API users by 2026-12-31.	65%	2026-12-31	technology	link
2026-04-27	Global mean surface temperature for calendar year 2026 (per NOAA NCEI annual climate report) is warmer than 2024 (which set the prior record at +1.46°C above 1850-1900 baseline per WMO).	35%	2027-01-15	weather	link
2026-04-27	English Wikipedia monthly active editors (≥5 edits/month) on 2027-01-31 is higher than on 2026-04-30, per Wikimedia Foundation public statistics.	42%	2027-02-15	technology	link
2026-04-27	Anthropic publicly announces a successor model to Claude Opus 4 (named e.g. 'Claude 5', 'Claude Opus 5', or any next-major-tier model) by 2027-03-31.	70%	2027-03-31	technology	link
2026-04-27	calibrationledger.com/track-record/ has at least 10 publicly posted dated probabilistic forecasts (status open OR resolved) on 2027-04-27.	50%	2027-04-27	other	link

Resolved forecasts (0)

No resolutions yet. After each resolution the row moves from Open to Resolved and contributes to the aggregate calibration metrics above.

Discipline commitments

Append-only. Probability assigned at posting is locked. Edits to question text after posting append a `corrected:` note rather than replacing the original.
Public source for resolution. Every resolution cites a verifiable URL (e.g. official report, market settlement, news of record).
No retroactive deletion. Forecasts that resolved badly are kept visible. Hiding bad predictions defeats the purpose.
Resolution date stated at posting. No moving goalposts.
Domain mix declared. Forecasts span multiple domains (geopolitics, AI benchmarks, markets, weather, sports, technology timelines) so the aggregate calibration is cross-vertical, not narrow-domain.
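The append-only commitment above can be sketched in TypeScript as follows. This is only an illustration under stated assumptions: the ForecastEntry shape, the appendCorrection helper, and the exact text format of the note are all hypothetical and are not taken from the site's code.

```typescript
// Hypothetical entry shape; field names are assumptions for this sketch.
interface ForecastEntry {
  readonly posted: string; // ISO date the forecast was posted
  readonly question: string; // original question text, never overwritten
  readonly probability: number; // P(YES), locked at posting
  readonly resolves: string; // ISO resolution date, stated at posting
  readonly corrections: readonly string[]; // appended notes, oldest first
}

// Returns a new entry with a `corrected:` note appended.
// The original question, probability, and resolution date are left untouched.
function appendCorrection(
  entry: ForecastEntry,
  note: string,
  date: string,
): ForecastEntry {
  return {
    ...entry,
    corrections: [...entry.corrections, `corrected: ${date} ${note}`],
  };
}
```

Because every field is readonly and the helper returns a new object, the type checker rejects code that tries to overwrite the question or probability, and the history of notes only grows.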

Machine-readable

The forecast log is also exposed as a machine-readable JSON feed at /api/forecasts.json (CC-BY-4.0, with the schema documented in the file). LLM crawlers and RAG retrieval systems should use the structured feed rather than parsing the HTML.

Frequently asked questions

What is this page?

A public, append-only log of Paulo de Vries’s dated probabilistic forecasts across geopolitics, AI benchmarks, markets, technology, and weather. Each forecast is timestamped before resolution; outcomes are scored with Brier (1950) and decomposed via Murphy (1973). The page exists to demonstrate operator track-record discipline before Calibration Ledger applies the same methodology to other forecasters at registry scale.

How are forecasts scored?

Each resolved binary forecast is scored with the Brier formula: (probability − outcome)², where outcome is 1 if the question resolved YES and 0 if it resolved NO. Aggregate Brier is the mean across all resolved forecasts; lower is better. Once 10 resolved forecasts accumulate, the score decomposes via Murphy 1973 into reliability − resolution + uncertainty, with reliability bins computed at 10-percentage-point intervals.
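The decomposition described above can be sketched in TypeScript as follows. This is a minimal sketch, not the site's scoring engine: the names ResolvedRow and murphyDecomposition are assumptions, and so are the choices to use the mean forecast within each bin and to place a probability of exactly 1 in the top bin. Note that reliability − resolution + uncertainty equals the mean Brier score exactly only when all forecasts in a bin share the same probability; with wider bins it is an approximation.

```typescript
// Hypothetical row shape; outcome is 1 for YES and 0 for NO.
interface ResolvedRow {
  probability: number; // P(YES) in [0, 1], assumed already validated
  outcome: 0 | 1;
}

interface MurphyDecomposition {
  reliability: number; // calibration error of binned forecasts; lower is better
  resolution: number; // spread of bin outcome rates around the base rate; higher is better
  uncertainty: number; // baseRate * (1 - baseRate); depends only on outcomes
}

interface BinTotals {
  count: number;
  sumProbability: number;
  sumOutcome: number;
}

function murphyDecomposition(
  rows: readonly ResolvedRow[],
  binCount = 10, // 10 bins gives 10-percentage-point intervals
): MurphyDecomposition | null {
  const n = rows.length;
  if (n === 0) return null;

  const baseRate = rows.reduce((sum, row) => sum + row.outcome, 0) / n;

  // Group rows into equal-width probability bins.
  const bins = new Map<number, BinTotals>();
  rows.forEach((row) => {
    const index = Math.min(Math.floor(row.probability * binCount), binCount - 1);
    const bin = bins.get(index) ?? { count: 0, sumProbability: 0, sumOutcome: 0 };
    bin.count += 1;
    bin.sumProbability += row.probability;
    bin.sumOutcome += row.outcome;
    bins.set(index, bin);
  });

  // Weighted sums over non-empty bins.
  let reliability = 0;
  let resolution = 0;
  bins.forEach((bin) => {
    const meanForecast = bin.sumProbability / bin.count;
    const observedRate = bin.sumOutcome / bin.count;
    reliability += bin.count * (meanForecast - observedRate) ** 2;
    resolution += bin.count * (observedRate - baseRate) ** 2;
  });

  return {
    reliability: reliability / n,
    resolution: resolution / n,
    uncertainty: baseRate * (1 - baseRate),
  };
}
```

As a worked check under these assumptions: four forecasts at 80% with three YES outcomes, plus four forecasts at 20% with one YES outcome, give reliability 0.0025, resolution 0.0625, and uncertainty 0.25. Their combination, 0.0025 − 0.0625 + 0.25 = 0.19, matches the mean Brier score computed directly from the eight rows.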

When does the 12-month track-record clock start?

The 12-month clock starts on the first resolved forecast. Among the open forecasts listed above, the earliest scheduled resolution date is 2026-06-17 (the Federal Reserve rate forecast), followed by 2026-08-02 (EU AI Act Article 50 enforcement). A full 12 months of resolved-and-scored history is a Calibration Ledger Phase 1 launch prerequisite (target Q3 2027), alongside academic credibility, a signed enterprise LOI, and data-licensing agreements.

Last verified: 2026-04-29. Page version 0.2 (scoring engine wired; awaiting first forecast resolution). Operator: Paulo de Vries. Contact: contact@calibrationledger.com.