Anthropic — Claude / language model self-knowledge

Source class
AI models
Metric
P(IK): probability the model assigns to 'I know the answer'. P(True): probability the model assigns that its proposed answer is true.
Reported value
Large language models are well-calibrated on their own knowledge, with calibration improving with model scale.
Measured
2022-07-11

Context

An Anthropic study finding that base language models are well calibrated both on whether they know the answer to a question (P(IK)) and on whether their proposed answers are true (P(True)). This is a calibration-adjacent finding for AI models: not predictive forecasting per se, but the same proper-scoring-rule machinery applied to a model's self-confidence on factual questions.
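As a concrete illustration, here is a minimal sketch of the P(True)-style elicitation, assuming access to the model's raw logits for the 'True' and 'False' continuation tokens. The two-choice prompt wording is paraphrased from the paper; the function name and signature are ours, not the paper's.

    import math

    def p_true(logit_true: float, logit_false: float) -> float:
        """Two-way softmax over the logits a model assigns to the
        'True' and 'False' continuations of a self-evaluation prompt
        such as 'Is the proposed answer: (A) True (B) False'."""
        z = max(logit_true, logit_false)  # subtract max for numerical stability
        e_true = math.exp(logit_true - z)
        e_false = math.exp(logit_false - z)
        return e_true / (e_true + e_false)

For example, p_true(2.1, -0.4) is roughly 0.92; that confidence is what gets scored against whether the proposed answer was in fact correct.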

Citation

Kadavath, S., Conerly, T., Askell, A., et al. (2022). Language Models (Mostly) Know What They Know. arXiv:2207.05221.

https://arxiv.org/abs/2207.05221

What Phase 1 launch will add

Calibration Ledger has not independently recomputed the value above. For this source class, Phase 1 launch (target Q3 2027, gated on prerequisites) will add:

  • Independent recomputation from the original outcome data, under data-licensing agreement
  • Time-windowed breakdown (rolling 3-month, 12-month, lifetime)
  • Cross-domain calibration (does this source calibrate uniformly across topical verticals?)
  • Append-only timestamp anchoring of every score, so retroactive revisions are visible (see the second sketch after this list)
  • Per-source citation page with a full Murphy decomposition of the Brier score (Reliability − Resolution + Uncertainty; see the first sketch after this list)
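The Murphy decomposition named in the last item can be computed from per-question (forecast, outcome) pairs. A minimal sketch, assuming binary outcomes and equal-width probability bins; the identity Brier = Reliability − Resolution + Uncertainty holds exactly when forecasts are replaced by their bin means, and approximately otherwise (a within-bin variance term remains). All names are illustrative, not Calibration Ledger's actual pipeline.

    from collections import defaultdict

    def murphy_decomposition(forecasts, outcomes, n_bins=10):
        # Group forecasts into equal-width probability bins.
        n = len(forecasts)
        base_rate = sum(outcomes) / n  # overall hit rate
        bins = defaultdict(list)
        for p, o in zip(forecasts, outcomes):
            bins[min(int(p * n_bins), n_bins - 1)].append((p, o))
        reliability = resolution = 0.0
        for items in bins.values():
            n_k = len(items)
            p_bar = sum(p for p, _ in items) / n_k  # mean forecast in bin
            o_bar = sum(o for _, o in items) / n_k  # observed frequency in bin
            reliability += n_k * (p_bar - o_bar) ** 2
            resolution += n_k * (o_bar - base_rate) ** 2
        uncertainty = base_rate * (1.0 - base_rate)
        return reliability / n, resolution / n, uncertainty

A well-calibrated source has low reliability (stated confidences match observed frequencies) and high resolution (its confidences separate hits from misses).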
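The append-only anchoring item admits many implementations; one common tamper-evidence scheme is a SHA-256 hash chain, sketched below. Calibration Ledger's actual mechanism is not described on this page, so every name here is hypothetical.

    import hashlib
    import json
    import time

    def append_score(chain, record):
        # Each entry commits to the hash of its predecessor, so editing
        # any past score changes every subsequent hash and is detectable.
        prev_hash = chain[-1]["hash"] if chain else "0" * 64
        entry = {
            "timestamp": time.time(),
            "record": record,
            "prev_hash": prev_hash,
        }
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        chain.append(entry)
        return entry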


Last verified: 2026-04-28. Cited; Calibration Ledger has not independently recomputed this finding. Independent recomputation in Phase 1 (Q3 2027). Operator: Paulo de Vries. Contact: contact@calibrationledger.com.