Polymarket Calibration Tracker

Methodology

This tracker measures calibration: whether markets that trade at X% actually resolve YES X% of the time.

Data source: Polymarket's Gamma API for resolved events, plus the CLOB orderbook for historical price snapshots (best bid/ask midpoint).
Markets analyzed: Binary (YES/NO) markets with clear resolution. Each outcome of a multi-outcome market (e.g., "Who will win the election?") is included as its own binary contract, which means the 0-5% bin is heavily populated with losing candidates.
Price snapshots: The YES token best-bid/ask midpoint price at 24 hours, 7 days, and 30 days before market close.
Binning: Markets grouped into probability buckets (0-5%, 5-15%, 15-25%, ..., 95-100%). Dot size on the chart is proportional to sample size. Faded bins with * have fewer than 50 markets.
Brier score: Mean squared error between the forecast probability and the outcome (1 for YES, 0 for NO). 0.25 = coin-flip baseline (always forecasting 50%). 0.02 = weather-forecast-grade accuracy. Lower is better.
Perfect calibration = the dashed diagonal. Points above the line = events resolved YES more often than the price implied, so the market underpriced the event. Below = overpriced.
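
The binning and Brier score described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tracker's actual code: the names (BIN_EDGES, bin_index, brier_score, calibration_table) are hypothetical, a price exactly on a bin edge is assumed to go to the higher bin, each bin is assumed to be plotted at its mean price, and the sample data is made up.

```python
from bisect import bisect_right
from collections import defaultdict

# Bin edges from the text: 0-5%, 5-15%, ..., 85-95%, 95-100%.
BIN_EDGES = [0.0, 0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95, 1.0]
MIN_SAMPLE = 50  # bins with fewer markets are the faded, starred ones


def bin_index(price):
    """Map a price in [0, 1] to a bin index.

    Assumption: a price exactly on an edge goes to the higher bin,
    except 1.0, which goes to the last bin.
    """
    if not 0.0 <= price <= 1.0:
        raise ValueError(f"price out of range: {price}")
    return min(bisect_right(BIN_EDGES, price) - 1, len(BIN_EDGES) - 2)


def brier_score(samples):
    """Mean squared error between price and outcome (1 = YES, 0 = NO)."""
    samples = list(samples)
    if not samples:
        raise ValueError("no samples")
    return sum((price - int(yes)) ** 2 for price, yes in samples) / len(samples)


def calibration_table(samples):
    """Per-bin sample count, mean price, and observed YES rate."""
    count = defaultdict(int)
    price_sum = defaultdict(float)
    yes_count = defaultdict(int)
    for price, resolved_yes in samples:
        i = bin_index(price)
        count[i] += 1
        price_sum[i] += price
        yes_count[i] += int(resolved_yes)

    rows = []
    for i in sorted(count):
        mean_price = price_sum[i] / count[i]
        yes_rate = yes_count[i] / count[i]
        rows.append({
            "bin": (BIN_EDGES[i], BIN_EDGES[i + 1]),
            "n": count[i],
            "mean_price": mean_price,
            "yes_rate": yes_rate,
            # Positive gap = point above the diagonal = event was underpriced.
            "gap": yes_rate - mean_price,
            "low_sample": count[i] < MIN_SAMPLE,
        })
    return rows


# Made-up (price, resolved_yes) pairs, for illustration only.
samples = [(0.02, False), (0.03, False), (0.50, True), (0.50, False), (0.97, True)]
for row in calibration_table(samples):
    print(row)
print("Brier score:", brier_score(samples))
print("Coin-flip baseline:", brier_score([(0.5, True), (0.5, False)]))
```

The last line shows where the 0.25 baseline comes from: a forecast of 50% is off by 0.5 whichever way the market resolves, and 0.5 squared is 0.25.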

Important caveat: ~75% of price-tracked markets fall in the 0-5% bin (multi-outcome market losers). Middle bins (20-80%) contain fewer than 35 markets each, so the calibration curve in this range has wide confidence intervals. The overall Brier score is dominated by the extreme-probability bins. Interpret the middle of the curve with caution.
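
To get a feel for how wide those intervals are, here is a rough sketch using the Wilson score interval for a binomial proportion. The choice of the Wilson interval is an assumption made for illustration (the tracker does not say which method it uses), the function name wilson_interval is hypothetical, and the counts in the example are invented.

```python
import math


def wilson_interval(yes_count, n, z=1.96):
    """Wilson score interval for an observed YES rate.

    z=1.96 corresponds to roughly 95% confidence. Returns (low, high),
    clamped to [0, 1].
    """
    if n <= 0 or not 0 <= yes_count <= n:
        raise ValueError("need n > 0 and 0 <= yes_count <= n")
    p_hat = yes_count / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / n + z * z / (4 * n * n)
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Invented example: a middle bin with 30 markets, 15 of which resolved YES.
low, high = wilson_interval(15, 30)
print(f"observed 50%, interval {low:.2f} to {high:.2f}")

# Same observed rate with 3,000 markets, for comparison.
low_big, high_big = wilson_interval(1500, 3000)
print(f"observed 50%, interval {low_big:.2f} to {high_big:.2f}")
```

With only a few dozen markets in a bin, the interval spans tens of percentage points, so a point that sits well off the diagonal may still be consistent with good calibration.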

Other limitations: Price history is limited to the top 2,000 markets by volume. Cancelled or ambiguously resolved markets are excluded. Categories are inferred from question text via keyword matching.
