
GPT-4 (OpenAI) — pre-RLHF vs post-RLHF calibration

Source class
AI models
Metric
Expected Calibration Error (ECE) on multiple-choice benchmarks
Reported value
pre-RLHF: well-calibrated; post-RLHF: degraded calibration (per OpenAI's own measurement)
Measured
2023-03-15

Context

OpenAI's GPT-4 Technical Report explicitly states that the pre-trained base GPT-4 model is well-calibrated on multiple-choice benchmarks (calibration plot in §3.2 of the report) and that RLHF post-training degraded that calibration. This is a rare publisher-acknowledged calibration regression for a frontier LLM.
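For readers unfamiliar with the metric cited above, a minimal sketch of binned Expected Calibration Error follows. This is an illustration of the standard equal-width-bin definition, not OpenAI's evaluation code; the function name and toy data are our own.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: partition predictions into equal-width confidence bins
    and return the bin-weight-averaged gap |accuracy - mean confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Left-closed bins; the last bin also includes confidence == 1.0.
        in_bin = (confidences >= lo) & ((confidences < hi) | (hi == 1.0) & (confidences <= hi))
        if not in_bin.any():
            continue
        ece += in_bin.mean() * abs(correct[in_bin].mean() - confidences[in_bin].mean())
    return ece

# Perfectly calibrated toy data: 80%-confident answers, right 8 times out of 10.
perfect = expected_calibration_error([0.8] * 10, [1] * 8 + [0] * 2)
# Overconfident toy data: 90%-confident answers, right only half the time.
overconf = expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5)
```

A well-calibrated model drives the first quantity toward zero; the second is the kind of gap the post-RLHF plot exhibits.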

Citation

OpenAI (2023). GPT-4 Technical Report. arXiv:2303.08774. §3.2 "Calibration".

https://arxiv.org/abs/2303.08774

What Phase 1 launch will add

Calibration Ledger has not independently recomputed the value above. Phase 1 launch (target Q3 2027, gated on prerequisites) will add the following for this source class:

  • Independent recomputation from the original outcome data, under data-licensing agreement
  • Time-windowed breakdown (rolling 3-month, 12-month, lifetime)
  • Cross-domain calibration (does this source calibrate uniformly across topical verticals?)
  • Append-only timestamp anchoring of every score so retroactive revisions are visible
  • Per-source citation page with full Murphy decomposition (Reliability − Resolution + Uncertainty)
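The Murphy decomposition named in the last bullet can be sketched as follows. This is an illustrative NumPy version of the standard Brier-score decomposition, not Calibration Ledger's production code; the identity BS = Reliability − Resolution + Uncertainty holds exactly when all forecasts within a bin share the same value.

```python
import numpy as np

def murphy_decomposition(forecasts, outcomes, n_bins=10):
    """Murphy decomposition of the Brier score for binary outcomes:
    BS = Reliability - Resolution + Uncertainty (exact for binned forecasts)."""
    f = np.asarray(forecasts, dtype=float)
    o = np.asarray(outcomes, dtype=float)
    n = len(f)
    base_rate = o.mean()
    uncertainty = base_rate * (1.0 - base_rate)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    reliability = 0.0
    resolution = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Left-closed bins; the last bin also includes forecast == 1.0.
        in_bin = (f >= lo) & ((f < hi) | (hi == 1.0) & (f <= hi))
        nk = in_bin.sum()
        if nk == 0:
            continue
        fk = f[in_bin].mean()   # mean forecast in the bin
        ok = o[in_bin].mean()   # observed frequency in the bin
        reliability += (nk / n) * (fk - ok) ** 2
        resolution += (nk / n) * (ok - base_rate) ** 2
    return reliability, resolution, uncertainty

# Toy data: two forecast levels, each matching its observed frequency.
rel, res, unc = murphy_decomposition([0.2] * 5 + [0.8] * 5,
                                     [0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
```

On this toy data reliability is (near) zero, so the Brier score reduces to Uncertainty minus Resolution; a per-source page would report all three terms.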


Last verified: 2026-04-28. Cited; Calibration Ledger has not independently recomputed this finding. Independent recomputation in Phase 1 (Q3 2027). Operator: Paulo de Vries. Contact: contact@calibrationledger.com.