Essay · Quant & Markets
The Vol Surface That Kept Calling the Crowd Wrong
A longer look at the volatility-surface trade behind the Deribit-Polymarket tool: why a better fit never closed the gap, and why the gap was the whole point.
The trade had a name in my head: VolyPoly. Fit a volatility surface to Deribit's Bitcoin options, turn it into a probability for whatever Polymarket was asking, and bet when the two disagreed. Across four currencies and several hundred paper positions, it lost on nearly all of them. The interesting part is not that it lost but why no amount of engineering fixed it.
The Surface Was Fine
None of the loss came from a bad fit. The surface was well-behaved: clean implied-volatility points, an SVI parametrisation that tracked the smile without inventing arbitrage, fit errors in the third decimal place. The model worked correctly. It just produced probabilities that were confidently wrong. Against what actually happened, it scored a Brier of about 0.30; the Polymarket mid-prices scored about 0.13. By that measure the crowd's error was less than half the model's. The tool assumed Polymarket was the inefficient venue. The data said the opposite.
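For readers who have not met it, the Brier score is just the mean squared difference between a forecast probability and the 0/1 outcome, so lower is better. The sketch below shows the comparison in miniature. The function name and all the numbers in it are made up for illustration; they are not the paper book.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    if not probs:
        raise ValueError("need at least one forecast")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Hypothetical data: four resolved markets, 1 = event happened.
outcomes = [0, 0, 1, 0]
model_probs = [0.60, 0.55, 0.40, 0.70]   # surface-implied (illustrative)
market_probs = [0.20, 0.30, 0.70, 0.25]  # Polymarket mids (illustrative)

model_brier = brier_score(model_probs, outcomes)
market_brier = brier_score(market_probs, outcomes)
print(f"model={model_brier:.4f} market={market_brier:.4f}")
```

One reference point worth keeping in mind: under this definition a forecaster who always says 0.5 scores exactly 0.25.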
Two Probabilities Wearing the Same Clothes
The two numbers are not the same kind of thing, even though both read as percentages. An options-implied probability is risk-neutral: the real-world chance of an event, distorted by what people pay to hedge against it. That distortion is the variance risk premium. On equities it is a couple of points. On Bitcoin it is enormous, because crash insurance is expensive and everyone wants it, and it inflates the implied probability of large moves.
Polymarket prices sit closer to a physical probability: what people actually expect. So subtract the options number from the Polymarket number, call it "edge," and much of what you measured is just that premium, not a mispricing anyone can collect. The crowd also often knows about a catalyst a surface fitted to quotes cannot see. The model mistook both for opportunity.
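To make the mismatch concrete, here is a minimal sketch of how an implied volatility becomes a risk-neutral probability, and how the naive "edge" falls out of it. It assumes a Black-76 style model with a single volatility read off the surface at the strike, so the probability of finishing above the strike is N(d2). That is a simplification: it ignores the smile-slope correction a real digital price would carry, and it is not necessarily how the tool itself did the conversion. All function names and inputs are hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def risk_neutral_prob_above(forward: float, strike: float,
                            iv: float, t_years: float) -> float:
    """N(d2): risk-neutral chance of settling above `strike` (flat-vol assumption)."""
    if min(forward, strike, iv, t_years) <= 0:
        raise ValueError("all inputs must be positive")
    total_vol = iv * math.sqrt(t_years)
    d2 = (math.log(forward / strike) - 0.5 * total_vol ** 2) / total_vol
    return norm_cdf(d2)


def naive_edge(polymarket_mid: float, options_prob: float) -> float:
    """Polymarket number minus options number, as described in the text."""
    return polymarket_mid - options_prob


# Hypothetical inputs: a far out-of-the-money "will BTC finish above K?" question.
forward, strike, t_years = 100_000.0, 120_000.0, 0.25
p_base = risk_neutral_prob_above(forward, strike, iv=0.60, t_years=t_years)
p_rich = risk_neutral_prob_above(forward, strike, iv=0.80, t_years=t_years)
edge = naive_edge(polymarket_mid=0.15, options_prob=p_base)
print(f"p_base={p_base:.3f} p_rich={p_rich:.3f} edge={edge:+.3f}")
```

With these inputs, a richer implied volatility pushes up the probability assigned to the large move, which is the mechanism by which an expensive premium shows up as apparent edge. Nothing in the function can tell premium from mispricing.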
Everything I Tried, and Why It Did Nothing
The natural reaction is to fix the fit, so I tried the obvious things: a haircut to shrink implied volatility toward realised, and vega weighting on the smile. Both were A/B-tested against the paper book. Together they moved the Brier score by a few thousandths, against a gap of about 0.16. Statistically detectable, economically nothing.
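For concreteness, the two adjustments look roughly like this. It is a sketch under stated assumptions: the haircut is written as a simple linear shrink toward realised volatility, and the vega weights use the undiscounted Black-76 vega. The function names, the linear form, and the weighting scheme are illustrative choices, not a record of the exact implementation.

```python
import math
from typing import Sequence


def haircut_vol(implied_vol: float, realised_vol: float, haircut: float) -> float:
    """Shrink implied vol toward realised: 0 keeps implied, 1 returns realised."""
    if not 0.0 <= haircut <= 1.0:
        raise ValueError("haircut must be in [0, 1]")
    return implied_vol - haircut * (implied_vol - realised_vol)


def black76_vega(forward: float, strike: float, iv: float, t_years: float) -> float:
    """Undiscounted Black-76 vega: F * phi(d1) * sqrt(T)."""
    total_vol = iv * math.sqrt(t_years)
    d1 = (math.log(forward / strike) + 0.5 * total_vol ** 2) / total_vol
    pdf = math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi)
    return forward * pdf * math.sqrt(t_years)


def vega_weighted_sse(model_ivs: Sequence[float], market_ivs: Sequence[float],
                      vegas: Sequence[float]) -> float:
    """Squared fit error with weights proportional to vega (normalised to sum to 1)."""
    total = sum(vegas)
    return sum(v / total * (m - q) ** 2
               for m, q, v in zip(model_ivs, market_ivs, vegas))


# Hypothetical numbers: vega weighting emphasises near-the-money quotes.
atm_vega = black76_vega(100.0, 100.0, iv=0.60, t_years=0.25)
otm_vega = black76_vega(100.0, 200.0, iv=0.60, t_years=0.25)
adjusted = haircut_vol(implied_vol=0.60, realised_vol=0.40, haircut=0.5)
print(f"atm_vega={atm_vega:.2f} otm_vega={otm_vega:.2f} adjusted_iv={adjusted:.2f}")
```

Both knobs act on the volatility that goes into the surface. Neither changes what kind of probability comes out of it.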
That null result was the actual finding. If polishing the fit cannot close the gap, the gap is not a fitting error. It is structural, baked into the difference between a risk-neutral and a physical probability, and no surface technique converts one into the other. I had spent too long treating a conceptual mismatch as a numerical one.
The only lever that moved the book was time to expiry. Short horizons were roughly break-even; the longer bucket was a disaster and made up nearly half the trades. Over a longer horizon any genuine mispricing gets competed away, leaving the structural premium, which is not yours to take. That ceiling is the point.
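The horizon split is easy to reproduce on any trade log. The sketch below groups trades by days to expiry and reports the mean P&L and share of trades per bucket. The seven-day cutoff, the field layout, and the sample trades are all assumptions made for illustration; the actual bucket boundaries are not the ones shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

CUTOFF_DAYS = 7.0  # hypothetical boundary between "short" and "long"


def horizon_bucket(days_to_expiry: float, cutoff_days: float = CUTOFF_DAYS) -> str:
    """Label a trade by its time to expiry."""
    return "short" if days_to_expiry <= cutoff_days else "long"


def summarise_by_horizon(
    trades: Iterable[Tuple[float, float]],
    cutoff_days: float = CUTOFF_DAYS,
) -> Dict[str, Dict[str, float]]:
    """Mean P&L and share of trades per bucket; each trade is (days_to_expiry, pnl)."""
    pnl_sum: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    for days, pnl in trades:
        label = horizon_bucket(days, cutoff_days)
        pnl_sum[label] += pnl
        count[label] += 1
    total = sum(count.values())
    return {
        label: {"mean_pnl": pnl_sum[label] / count[label],
                "share": count[label] / total}
        for label in count
    }


# Made-up trades, in units of stake: (days to expiry, realised P&L).
trades = [(2.0, 0.05), (5.0, -0.05), (30.0, -0.40), (45.0, -0.20)]
summary = summarise_by_horizon(trades)
for label, stats in summary.items():
    print(label, stats)
```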
The One Real Lesson
There was a genuine accident worth being honest about. One currency, the thinnest of the four, had the best win rate and the smallest gap to the market. Tempting to read as the model finally working. It was not: that market just had a less efficient crowd on the other side, easier to beat. The lesson is uncomfortable for anyone who likes building models: the quality of the counterparty mattered more than the quality of the model.
Which points at the only constructive move. If the options market structurally overprices tails, you do not bet against the crowd; you sell that premium directly, on the side that collects it. The same gap that made VolyPoly lose is harvestable from the other side. The failed trade was the right observation pointed in the wrong direction.
The companion failure, for a completely different structural reason, is the high-frequency directional postmortem. And the main piece, with the interactive that shows the two curves splitting in the tails, is Options vs the Crowd.