Options’ Implied Probability: A Dive into Risk-Neutral Densities

Jan 6, 2024

In this article, I will describe how to retrieve the options-implied probability distribution from option prices. The result is a set of probability density functions, also referred to as “Risk-Neutral Densities” (RND) or “State-Price Densities” (SPD). These are probably the closest thing we have today to the market’s expectations of the future: an implied view of the future.

Options drive the market. Major volumes, unusual activities at some strikes, and high open interest at some levels are important factors to take into account while trading.

Furthermore, options can be viewed as part of the market makers’ book. Market makers must hedge their exposure to stay delta-neutral, most often by buying the underlying. Recall GameStop and the huge volumes of calls bought by individual investors. The market makers were selling those calls and hedging themselves by buying the shares. This was one of the factors that led to the massive squeeze.

The article will touch on data acquisition, the transformation of the options prices with implied futures prices, the construction and interpolation of a volatility surface, the conversion of implied volatilities back to prices, the implementation of the Breeden-Litzenberger formula, and the interpretation of results.

Note that the analysis was made in November 2023. I just thought that it would be interesting to take another look at the results after SPY appreciated considerably (Spot when doing analysis ~ 420, now ~ 468).

I) Data Acquisition

When delving into RNDs, the first crucial step is acquiring and refining the data. I will use Yahoo Finance because it is free. However, it is by no means a reliable data source, and accuracy varies with the timing of the download. The most reliable data is typically obtained during, or at the close of, trading sessions, when liquidity is at its peak and option prices are most precise. It is not suitable for trading purposes, but it is good enough for exploring the topic.

To expedite the process and access data efficiently, I use OpenBB Terminal. This open-source platform serves as a great alternative to Bloomberg or Reuters (and it is free). With a simple command, one can quickly acquire the entire options chain for a given stock or index. For instance, to fetch the SPY options chain from Yahoo Finance, the following line of code is employed:

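from openbb_terminal.sdk import openbb  # assumes the OpenBB Terminal SDK (v3)

# Load the full SPY options chain from Yahoo Finance
chain_raw = openbb.stocks.options.load_options_chains("SPY", source="YahooFinance")
df_options = chain_raw.chains  # one row per contract
df_options.head()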

As you can see, the ITM region is noisy (we will handle it later).

Navigating through data cleaning is pivotal, albeit not the most thrilling part to discuss here. Options quotes from Yahoo are subject to anomalies — sudden spikes or drops in prices that are outliers rather than reflections of true market prices. Identifying and filtering out these anomalies is vital to avoid skewing our RND analysis.

Furthermore, maintaining the monotonicity of put/call prices (call prices should fall and put prices should rise as the strike increases) and keeping an eye on the volume of options contracts is important as well. Volume is a proxy for liquidity, and illiquid contracts might introduce unnecessary noise.

The other factor is standard expirations. Many expirations are non-standard, and the options expiring on those dates are not liquid enough. It might be a good idea to exclude those dates from the analysis.
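The filters described above can be sketched roughly as follows. This is only an illustration, not the code used for the article: the column names (`expiration`, `strike`, `price`, `volume`), the volume threshold, and the third-Friday rule for "standard" expirations are all assumptions, and the quotes are made up.

```python
import pandas as pd


def is_third_friday(date: pd.Timestamp) -> bool:
    """Standard monthly expirations normally fall on the third Friday."""
    return date.weekday() == 4 and 15 <= date.day <= 21


def clean_calls(calls: pd.DataFrame, min_volume: int = 10) -> pd.DataFrame:
    """Keep liquid, standard-expiration calls whose prices do not rise with strike."""
    out = calls.copy()
    out["expiration"] = pd.to_datetime(out["expiration"])
    out = out[out["expiration"].apply(is_third_friday)]
    out = out[out["volume"].fillna(0) >= min_volume]
    out = out.sort_values(["expiration", "strike"])
    # A call at a higher strike cannot cost more than one at a lower strike,
    # so drop any quote above the running minimum within its expiration.
    running_min = out.groupby("expiration")["price"].cummin()
    return out[out["price"] <= running_min].reset_index(drop=True)


# Made-up quotes, only to exercise the filters
raw = pd.DataFrame({
    "expiration": ["2023-12-15"] * 4 + ["2023-12-22"],
    "strike": [400.0, 410.0, 420.0, 430.0, 400.0],
    "price": [25.0, 30.0, 10.0, 5.0, 26.0],
    "volume": [100, 100, 100, 1, 100],
})
cleaned = clean_calls(raw)  # keeps the 400 and 420 strikes of the December 15 expiry
```

The running-minimum rule is deliberately crude: a single spuriously low quote would knock out the valid quotes after it, so a real pipeline would probably pair it with an outlier filter.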

II) Transformation of the options prices with implied futures prices

Figure: The prices of calls at every d; x = Strike, y = Close (call prices)

Before settling on the best way of handling this particular data set, let’s quickly review our choices. The first decision is whether to interpolate prices or implied volatilities.

Price interpolation seems reasonable: in the end we will work with prices, and we would get them right away. However, interpolating prices can introduce arbitrage opportunities in volatility space, and in many, if not most, cases it does. The more robust approach is to interpolate implied volatilities (IVs), and that’s what I’ll do.

Now, should we use puts or calls? The answer is both, and there are two ways of doing that. The first is to calculate implied volatilities for puts and calls separately, then join the IVs of OTM puts and OTM calls at the ATM point (OTM contracts are more liquid). With this approach, we get more information than by working with puts or calls alone, and we effectively exclude ITM options. OTM options are more frequently traded for many reasons (e.g. hedging, convexity). However, this operation can lead to discrepancies. In my case, the put and call volatilities did not match ATM, and the result wasn’t satisfactory. That is probably because Yahoo Finance data has some time lag, so the spot of the underlying might not match the option prices, which would flaw the analysis.

To circumvent these problems, we can use an alternative approach: calculate implied futures prices to get the spot right, then calculate the prices of ITM options from put-call parity. This is the approach presented in the paper by Ait-Sahalia and Lo (1998).

The first step is to retrieve the futures price for each maturity.

To derive the implied futures price from put-call parity, we have to use the most reliable contracts, which are the ATM puts and calls. We can then plug the derived futures price back into put-call parity, which returns the prices of ITM options. This method circumvents three issues: it avoids unreliable data (ITM contracts), it gives the futures/spot price at the time the option prices were recorded, and it removes the need to forecast the future dividend yield. The result allows us to drop either put or call prices without any loss of information.
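As a minimal sketch of this step, assuming European-style put-call parity written in forward terms, C - P = exp(-rT) (F - K), the implied futures price and a rebuilt ITM call could look like this. The function names, the flat rate, and all quotes are hypothetical; SPY options are American-style, so parity holds only approximately.

```python
import math


def implied_forward(call: float, put: float, strike: float, rate: float, t: float) -> float:
    """Forward price implied by put-call parity: C - P = exp(-r*t) * (F - K)."""
    return strike + math.exp(rate * t) * (call - put)


def call_from_put(put: float, strike: float, forward: float, rate: float, t: float) -> float:
    """Call price at a strike, rebuilt from the put at the same strike via parity."""
    return put + math.exp(-rate * t) * (forward - strike)


# Hypothetical ATM quotes for one maturity (not market data)
F = implied_forward(call=12.40, put=10.15, strike=420.0, rate=0.05, t=0.25)

# Rebuild a deep ITM call (strike 380) from the liquid OTM put at the same strike
itm_call = call_from_put(put=1.90, strike=380.0, forward=F, rate=0.05, t=0.25)
```

In practice one might average the implied forward over a few near-ATM strikes rather than rely on a single pair of quotes.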

III) Construction of Volatility Surface and interpolation

Now that we have all the prices, we can proceed to volatility surface interpolation. I won’t delve into the details of what implied volatility is, but in simple words, it is the volatility input that makes the Black-Scholes formula return the market price. When assessing the price of contracts, it is convenient to think in implied volatility terms, because when trading options you trade volatility. The trend component (the so-called drift) can be eliminated through delta-hedging (in reality it is not that straightforward), leaving volatility as the only source of randomness in the option’s profit and loss.
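To make the definition concrete, here is a small sketch that inverts the pricing formula for volatility by bisection. It uses the Black-76 form (Black-Scholes written on the forward), which is my assumption since we work with implied futures prices; the function names and inputs are illustrative only.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black76_call(forward: float, strike: float, t: float, rate: float, vol: float) -> float:
    """Call price on a forward (Black-76 form of Black-Scholes)."""
    d1 = (math.log(forward / strike) + 0.5 * vol**2 * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    return math.exp(-rate * t) * (forward * norm_cdf(d1) - strike * norm_cdf(d2))


def implied_vol(price, forward, strike, t, rate, lo=1e-4, hi=5.0, tol=1e-8):
    """Bisection: the call price increases with vol, so the root is unique."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if black76_call(forward, strike, t, rate, mid) > price:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Round trip on made-up inputs: price an option at 20% vol, then recover the vol
price = black76_call(forward=425.0, strike=430.0, t=0.25, rate=0.05, vol=0.20)
vol = implied_vol(price, forward=425.0, strike=430.0, t=0.25, rate=0.05)
```

Bisection is slow but hard to break; a Newton or Brent solver would be faster on a full chain.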

There are dozens of volatility models, such as local volatility, stochastic volatility, and others. However, the goal of this article is to obtain accurate risk-neutral PDFs, not the “perfect” volatility surface for pricing other contracts. Our goal is to obtain a consistent volatility surface, from which we can recover option prices and analyze RNDs.

The interpolation we will use is a cubic spline (piecewise polynomials of degree 3, sometimes described as order 4). This type of interpolation seems to yield consistent results.
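A minimal sketch of spline interpolation for a single maturity, using SciPy’s `CubicSpline`, is shown below. The strikes and implied volatilities are made up, and interpolating one smile at a time (rather than the whole surface) is a simplification of mine.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical smile for one maturity (values are illustrative, not market data)
strikes = np.array([380.0, 400.0, 410.0, 420.0, 430.0, 440.0, 460.0])
ivs = np.array([0.24, 0.20, 0.185, 0.17, 0.16, 0.155, 0.16])

# Piecewise cubic interpolant that passes through every quoted point
smile = CubicSpline(strikes, ivs)

# Evaluate on a fine strike grid; this dense curve is what gets priced later
fine_strikes = np.arange(380.0, 460.5, 0.5)
fine_ivs = smile(fine_strikes)
```

A spline that passes exactly through noisy quotes can wiggle, so a smoothing spline may be preferable on real data.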

IV) Conversion of implied volatilities back to prices and implementation of the Breeden-Litzenberger formula

Having interpolated the entire volatility surface and obtained finely gridded curves for each maturity, we now convert the implied volatilities back to prices and apply the Breeden-Litzenberger formula to calculate the RNDs, which give insight into the market’s expectations of future price movements.

By reverse-engineering the Black-Scholes model, one can recover implied distributions, which can be highly informative for investors, traders, or risk professionals. Integrating these densities allows a more detailed analysis of market participants’ expectations, market sentiment, or option pricing.

To obtain these densities, we have to differentiate the call prices twice with respect to the strike. The essential formulas are below:
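In its standard textbook form, the Breeden-Litzenberger result and its central finite-difference approximation read as follows. The notation is my own: C is the call price, K the strike, r the risk-free rate, T the time to expiration, ΔK the strike spacing, and f_Q the risk-neutral density.

$$
f_Q(K) = e^{rT}\,\frac{\partial^2 C(K, T)}{\partial K^2}
\approx e^{rT}\,\frac{C(K + \Delta K, T) - 2\,C(K, T) + C(K - \Delta K, T)}{(\Delta K)^2}
$$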

After performing the calculations on market prices, the values obtained are our RND values. As we can see, they look like ordinary probability density functions. As the time to expiration increases, the tails get fatter and the center of the RNDs shifts toward higher values.

Are they 100% correct? Probably not, as the data was quite noisy and needed a lot of filtering, interpolation, outlier exclusion, etc. Better data would probably result in more consistent densities. Nevertheless, the area under each density is 1 +/- 0.01, which is very satisfactory.
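The differentiation and the area check can be sketched as follows. This is an illustration under stated assumptions, not the article’s code: call prices come from the Black-76 form on a forward, a flat 20% volatility stands in for the interpolated smile, the strike grid is evenly spaced, and all inputs are made up.

```python
import numpy as np
from scipy.stats import norm


def black76_call(forward, strikes, t, rate, vols):
    """Vectorised call prices on a forward (Black-76 form)."""
    d1 = (np.log(forward / strikes) + 0.5 * vols**2 * t) / (vols * np.sqrt(t))
    d2 = d1 - vols * np.sqrt(t)
    return np.exp(-rate * t) * (forward * norm.cdf(d1) - strikes * norm.cdf(d2))


def risk_neutral_density(strikes, calls, rate, t):
    """Breeden-Litzenberger: exp(r*t) times the second strike derivative of the call price."""
    dk = strikes[1] - strikes[0]  # assumes an evenly spaced strike grid
    second = (calls[2:] - 2.0 * calls[1:-1] + calls[:-2]) / dk**2
    return strikes[1:-1], np.exp(rate * t) * second


# Hypothetical inputs for one maturity
forward, rate, t = 425.0, 0.05, 0.25
strikes = np.arange(200.0, 700.0, 0.5)
vols = np.full_like(strikes, 0.20)  # stand-in for the interpolated smile
calls = black76_call(forward, strikes, t, rate, vols)

k, density = risk_neutral_density(strikes, calls, rate, t)

# Trapezoidal rule; the result should be close to 1 if the grid covers the tails
area = np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(k))
```

With real, noisy inputs the second difference amplifies every kink in the price curve, which is why the smoothing done in volatility space matters so much.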

V) Interpretation

The risk-neutral densities are indeed risk-neutral, which means that they are not real-world probabilities: they are correct in a world indifferent to risk. This is the correct framework for pricing derivatives; however, a straightforward translation to our world could be incorrect.

Nevertheless, there exists a link. Investors' sentiments are reflected in prices, which converted to the densities should be interpretable. Consequently, changes in RNDs are an arbitrage-free reflection of their preferences and the relative weight attached to the upside and downside. Their skin in the game, measured by the prices, can be somehow attached to these probabilities.

There are also some applications of RNDs in risk management and trading. Let’s consider this example:

Firstly, we have our density functions. We can either agree that they are correct or not. If we agree with them, we infer that the market participants are right and it somehow self-regulates. Consequently, RNDs can help with detecting spot anomalies and help in deciding whether the changes are short-lived or significant.

We can also disagree with them. We might think, for example, that the probability density function looks too narrow and thin-tailed, and we can bet on it. To trade the 2nd and 4th moments (variance and kurtosis) of such a distribution, we can sell ATM straddles and buy OTM straddles in greater quantity, or, for example, buy calendar spreads. Both trades allow us to bet on the price distribution.
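Before forming a view on whether a density is too narrow or too thin-tailed, it helps to measure its moments. A rough sketch, assuming the density is sampled on a strike grid, is shown below; the helper names are hypothetical and a Gaussian stands in for a real RND.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid depending on a NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def density_moments(k, density):
    """Mean, variance, skewness and kurtosis of a density sampled on grid k."""
    w = density / trapezoid(density, k)  # renormalise so the area is exactly 1
    mean = trapezoid(k * w, k)
    var = trapezoid((k - mean) ** 2 * w, k)
    skew = trapezoid((k - mean) ** 3 * w, k) / var**1.5
    kurt = trapezoid((k - mean) ** 4 * w, k) / var**2  # 3 for a Gaussian
    return mean, var, skew, kurt


# Stand-in density: a Gaussian centred at 425 with standard deviation 20
k = np.linspace(225.0, 625.0, 4001)
density = np.exp(-0.5 * ((k - 425.0) / 20.0) ** 2) / (20.0 * np.sqrt(2.0 * np.pi))
mean, var, skew, kurt = density_moments(k, density)
```

A kurtosis above 3 indicates fatter tails than a Gaussian with the same variance, which is the kind of feature the trades above are betting on.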

Conclusion

Enhancements like fitting the tails to a Generalized Pareto Distribution (GPD) for far-OTM strikes would improve the analysis by preventing underestimation of extreme outcomes. However, despite minor deviations in the total probability for further expirations, I am content with the present values, given that the primary objective is pure analysis rather than derivative pricing. Maybe when I get access to more reliable data, I will redo the analysis, but until then follow and subscribe to the stories; a lot of interesting articles are in the making.

I didn’t include the code; however, I can provide it if someone is interested. Just reach out to me here or on LinkedIn.

I hope you enjoyed this article. For any questions, feel free to reach out, I’m happy to help.

References:

Yacine Ait-Sahalia and Andrew Lo (1998). Nonparametric Estimation of State-Price Densities Implicit in Financial Asset Prices.
Stephen Figlewski. https://www.stern.nyu.edu/sites/default/files/assets/documents/con_044169.pdf