Modeling and Structure of Prediction Markets

This article was originally written in Chinese and published on RedNote (a Chinese social media platform). It introduces the theoretical foundations of prediction markets and, for the most part, does not discuss practical strategies.

The following translation was produced using AI and does not guarantee full correctness.

1. Introduction to Prediction Markets

A prediction market is a public market that trades a special kind of derivative (a contract) whose value is determined by the outcome of an event. A typical example is a binary claim: if Mamdani is elected Mayor of New York City, it pays you $\$1$; otherwise it pays nothing. It is not hard to see that this is essentially the same as sports betting: an exotic option extended to general real-world events. Wolfers & Zitzewitz (2004) provide a fairly comprehensive introduction and list several common formats. The two authors also produced many follow-up studies together, which interested readers can explore further.

The first major attempt at a prediction market was the Iowa Electronic Markets (IEM), established in 1988. It was authorized by the CFTC (the same agency that later authorized Kalshi) to trade publicly, but only as an academic experiment. It continues to operate today, but without significant growth in scale. Over the subsequent 30 years, many prediction markets emerged; however, due to regulatory pressure, limited progress in digitization, and other factors, almost none managed to sustain long-term operation.

As external conditions evolved, Kalshi was founded in 2018 and gradually grew; it obtained a CFTC license in 2021. During the COVID era, a similar web3 project, Polymarket, was created; it also grew rapidly and has since become dominant among on-chain prediction markets.

IEM, Kalshi, and Polymarket all use the same core prediction-market product: the winner-take-all contract. This also suggests why it is among the easiest contract types to productize and popularize. Below we discuss, in detail, the market mechanism and modeling based on this contract type.

2. Market Model

2.1 Contracts

A “winner-take-all” contract is similar to the example at the beginning: by contract terms, it pays $1$ unit if the event occurs. On Kalshi and Polymarket, the unit is $\$1$ (USD). Here “winner-take-all” has two meanings.

First, only those who predict correctly receive the payoff. For a fair coin toss, let the sample space be $\Omega = \{H, T\}$, the realized outcome be $A \in \Omega$, and suppose the derivative predicts $H$, i.e., the payoff is $X = \mathbf{1}_{A=H}$. Then its expected payoff is $$ \mathbb{E}X = \mathbb{P}(A=H) = 0.5. $$
Second, for events that may have multiple mutually exclusive outcomes, among the family of contracts that predict those outcomes, only one contract will ultimately be correct and pay out, while the others pay nothing. In the coin-toss example, if another derivative predicts $T$ with payoff $Y$, then $X + Y = 1$ and the two events are mutually exclusive. More generally, if $\Omega = \{A_1, \ldots, A_n\}$, the event outcome is $A$, and the $i$-th contract has payoff $X_i = \mathbf{1}_{A = A_i}$, then $\mathbb{E} X_i = \mathbb{P}(A = A_i) =: p_i$ and $$ \sum_{i=1}^n p_i = 1. $$ Many real-world events are not binary but have several mutually exclusive outcomes (e.g., the New York City mayoral election). Hence, on Kalshi and Polymarket, a “market” often appears as a family of contracts: they differ from each other but jointly predict the same event.

2.2 Arbitrage

Readers familiar with asset pricing theory will recognize that such a family of prediction-market contracts corresponds exactly to a set of Arrow–Debreu (A–D) securities and constitutes a complete market. With a risk-free asset (cash), under a no-arbitrage assumption, the prices of $n-1$ contracts determine the price of the $n$-th.

For simplicity, assume the short-term risk-free rate is $0$ (cash pays no interest). Let the price of contract $i$ be $q_i$. The payoff of the $n$-th contract is $$ X_n = 1 - \sum_{i=1}^{n - 1} X_i. $$ The right-hand side can be replicated by holding $1$ unit of cash and shorting $1$ unit of each of the other $n-1$ contracts. If $X_n$ is priced such that $$ q_n > 1 - \sum_{i=1}^{n - 1} q_i, $$ then $X_n$ is overpriced relative to its replicating portfolio. One can sell (short) $X_n$ for $q_n$ and buy the replicating portfolio for $1 - \sum_{i=1}^{n - 1} q_i$, locking in a risk-free profit today. At maturity, the payoffs offset exactly, yielding a sure gain. A symmetric argument applies when the inequality is reversed. Therefore, no-arbitrage implies $$ \sum_{i=1}^n q_i = 1. $$ This is not the same as the probability identity in Section 2.1: it is an identity about prices, not probabilities. Only under zero interest and the risk-neutral assumption (investors care only about expected payoff and ignore risk) can we equate fair price with expected payoff, i.e., $q_i = \mathbb{E}X_i = p_i$. In practice, the favorite–longshot bias is very common: low-probability outcomes tend to be overpriced, so for such events $A_i$ one often has $q_i > p_i$. Burgi et al. (2025) provide empirical evidence for this phenomenon on Kalshi.
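The price identity can be turned into a simple check. The following is a minimal sketch, assuming zero interest, no fees, and that every quoted price can be traded in either direction; the function name `complete_set_arbitrage` and the sample prices are hypothetical.

```python
from typing import Sequence


def complete_set_arbitrage(yes_prices: Sequence[float], tol: float = 1e-9) -> tuple[str, float]:
    """Classify a family of mutually exclusive, exhaustive Yes contracts.

    Assumes zero interest, no fees, and two-way tradable prices
    (simplifying assumptions, not platform rules).
    Returns (action, riskless profit per complete set).
    """
    total = sum(yes_prices)
    if total > 1 + tol:
        # The full set sells for more than its sure payoff of 1: sell one of each.
        return "sell_all", total - 1
    if total < 1 - tol:
        # The full set costs less than its sure payoff of 1: buy one of each.
        return "buy_all", 1 - total
    return "none", 0.0


print(complete_set_arbitrage([0.50, 0.30, 0.25]))  # prices sum above 1
print(complete_set_arbitrage([0.50, 0.30, 0.15]))  # prices sum below 1
```

On a cash-collateralized platform, "selling one of each" would in practice mean buying the corresponding No contracts, so real opportunities also depend on spreads and fees.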

A direct application of the no-arbitrage identity is Polymarket’s “negative risk” (NegRisk) mechanism. Since the payoff of the negative (No) prediction on outcome $n$ is $$ 1 - X_n = \sum_{i=1}^{n - 1}X_i, $$ Polymarket can algorithmically convert the No contract into a portfolio consisting of all other Yes contracts. These have the same theoretical payoff, and no-arbitrage implies that the price of the No contract equals $1 - q_n$.

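The term “negative risk” can also be understood as the negative correlation among payoffs of the Yes contracts. For $i \ne j$ the outcomes are mutually exclusive, so $X_i X_j = 0$ and $\text{cov}(X_i, X_j) = -p_i p_j$. With $0 < p_i, p_j < 1$ this gives $$ \text{corr}(X_i, X_j) = -\sqrt{\frac{p_i p_j}{(1 - p_i)(1 - p_j)}} < 0, $$ and for a fixed total $p_i + p_j = p \le 1$ the correlation is minimized (most negative) when $p_i = p_j$.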
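The sign and the minimizing split can be checked numerically. The sketch below relies on the fact that mutually exclusive indicator payoffs satisfy $X_i X_j = 0$, so their covariance is $-p_i p_j$; the function name `yes_correlation` and the total of $0.6$ are illustrative choices.

```python
import math


def yes_correlation(p_i: float, p_j: float) -> float:
    """Correlation of two mutually exclusive indicator payoffs.

    Uses cov(X_i, X_j) = -p_i * p_j, which holds because X_i * X_j = 0.
    Requires 0 < p_i, p_j < 1 and p_i + p_j <= 1.
    """
    cov = -p_i * p_j
    var_i = p_i * (1 - p_i)
    var_j = p_j * (1 - p_j)
    return cov / math.sqrt(var_i * var_j)


# Hold p_i + p_j = 0.6 fixed and vary the split between the two outcomes.
for p_i in (0.1, 0.2, 0.3):
    print(p_i, round(yes_correlation(p_i, 0.6 - p_i), 4))
```

The printed values become more negative as the split approaches $p_i = p_j$, and in the two-outcome case $p_i = p_j = 0.5$ the correlation reaches $-1$.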

Finally, for longer-horizon markets, asset-pricing theory under no-arbitrage implies $$ \sum_{i=1}^n q_i = \frac{1}{1 + r}, $$ where $r$ is the risk-free interest rate over the period until resolution. This helps explain why long-horizon prediction markets with extreme probabilities are not well-suited to a continuous double-auction format: if $$ \exists\; i \in \mathbb{Z}^+,\; i \le n, \; \text{s.t.}\; q_i \ge \frac{1}{1+r}, $$ then buying contract $i$ cannot beat investing the same cash at the risk-free rate even if the outcome is certain, so the market stops behaving rationally. This also reflects the effect of a minimum tick size, suggesting that questions such as “Will Jesus return before 2030?” are not suitable as long-horizon markets.

3. Market Structure

3.1 Symmetric Markets

Like traditional financial markets, prediction markets typically use a continuous double-auction: buyers and sellers continuously post quotes and are matched by the platform. Unlike traditional markets, for the same event, the Yes and No contracts are perfectly complementary and, in effect, are two sides of the same underlying claim. For a Yes buyer, the counterparty is also a No buyer. This follows from the arbitrage principle discussed earlier and is enforced by the market mechanism. We call the pair of Yes and No for the same event a “symmetric market.”

Using the same notation as before, let $\Omega = \{A_1, \ldots, A_n\}$. Let the Yes contract prices be $q_1, \ldots, q_n$, and the No contract prices be $q_1^\prime, \ldots, q_n^\prime$ (we also use $q_i$ and $q_i^\prime$ to refer to the contracts themselves). Consider $q_n$ and $q_n^\prime = 1 - q_n$.

As a concrete trading example, suppose that in the order book for $q_n$, the level-1 ask is $47¢$. Then the symmetric order book for $q_n^\prime$ will have a level-1 bid at $53¢$, and they sum to $100¢ = \$1$. Meanwhile, their share quantities correspond exactly (and across all other price levels as well). These two order books are trading the same underlying claim.

Now suppose trader $P1$ has no prior position and aggressively buys $1$ share at the level-1 ask in $q_n$. The counterparty is $P2$, and the effective balance-sheet impact is:

$P1$: decreases cash by $47¢$ and buys one share of the $q_n$ (Yes) contract.

$P2$: decreases cash by $53¢$ and buys one share of the $q_n^\prime$ (No) contract; or increases cash by $47¢$ by selling one share of an existing $q_n$ (Yes) position.

Trader $P2$ may use either of these equivalent forms, or a mixture. For example, $P2$ can decrease cash by $$ x\cdot 53¢ - (1 - x)\cdot 47¢ = (100 x - 47)¢, $$ while buying $x$ shares of $q_n^\prime$ and selling $1 - x$ shares of $q_n$, as long as the contract’s minimum tick size constraint is satisfied (here, $100x \in \mathbb{Z}$).
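The mixture formula is easy to tabulate. Below is a small sketch using the 47¢ ask from the example; the function name `p2_cash_outflow_cents` is hypothetical, and a negative result means $P2$ receives cash.

```python
def p2_cash_outflow_cents(x: float, yes_ask_cents: int = 47) -> float:
    """Net cash P2 pays when filling one Yes share bought by P1.

    x is the fraction of the fill done by buying No at (100 - yes_ask) cents;
    the remaining 1 - x is done by selling existing Yes at yes_ask cents.
    A negative result means P2 receives cash.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    no_price_cents = 100 - yes_ask_cents
    return x * no_price_cents - (1 - x) * yes_ask_cents


for x in (0.0, 0.25, 1.0):
    print(x, p2_cash_outflow_cents(x))
```

The endpoints $x = 0$ and $x = 1$ recover the two pure forms: receiving 47¢ for selling Yes, or paying 53¢ for buying No.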

3.2 Cash Collateralization

At this point, one important question remains for single-event markets: how do prediction markets ensure the symmetry between Yes and No? The answer is cash collateralization.

Kalshi, Polymarket, and similar platforms are cash-collateralized markets, meaning that every participant must post cash equal to 100% of their maximum resolution obligation (margin). In the example above, $P1$ pays $q_n$ and forms a counterparty position with $P2$. At resolution:

If the outcome is Yes, then $P1$ receives $\$1$. Part of this is the cash $P1$ paid upfront, and the remainder is cash that $P2$ posted upfront, equal to $\$1 - q_n = q_n^\prime$ (the No price).

If the outcome is No, then $P1$ does not receive the $\$1$ payoff; instead the locked collateral is paid to $P2$, who predicted correctly. For $P1$, this is equivalent to having paid $q_n$ for a losing Yes position.

Operationally, when the trade occurs, the mechanism immediately locks (escrows) $q_n + q_n^\prime = \$1$, and at resolution returns this amount to the side that predicted correctly.

Cash collateralization guarantees several properties:

Yes and No form a symmetric market.
The symmetric market always resolves exactly (is fully solvent) in all states.
There is no short selling: participants cannot hold negative positions.

This greatly simplifies market and risk management, and it also explains where initial liquidity comes from: the Yes buyer and the No buyer meet via continuous quoting, each posts collateral and receives the corresponding contract, and the two complementary contracts “merge” like puzzle pieces into a locked $1$ unit payoff.
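The escrow logic can be sketched in a few lines. This is an illustrative model only, assuming one share, integer cent prices, and no fees; `settle_escrow` is a hypothetical name, not a platform function.

```python
def settle_escrow(yes_price_cents: int, outcome_yes: bool) -> dict[str, int]:
    """Net profit in cents for the Yes buyer (P1) and the No buyer (P2).

    Both sides post their price at trade time, so exactly 100 cents is
    escrowed; the whole escrow goes to the side that predicted correctly.
    """
    if not 0 < yes_price_cents < 100:
        raise ValueError("price must be strictly between 0 and 100 cents")
    no_price_cents = 100 - yes_price_cents
    escrow = yes_price_cents + no_price_cents  # always 100
    p1_payout = escrow if outcome_yes else 0
    p2_payout = escrow - p1_payout
    return {
        "P1": p1_payout - yes_price_cents,
        "P2": p2_payout - no_price_cents,
    }


print(settle_escrow(47, outcome_yes=True))   # P1 gains what P2 posted
print(settle_escrow(47, outcome_yes=False))  # P2 gains what P1 posted
```

In both states the two profits sum to zero and the payout never exceeds the escrowed amount, which is the solvency property listed above.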

3.3 Splitting and Merging

Cash collateralization is essentially about merging and splitting mutually exclusive contracts. In prediction markets, any contract position can be viewed as a merge of cash and other contracts, and likewise can be split into cash and other contracts. In the example above, $q_n$ can be split into $\$1$ and $-q_n^\prime$, while $q_n$ and $q_n^\prime$ can be merged into $\$1$. Recall also the no-arbitrage identity from Section 2.2: $$ \sum_{i=1}^n q_i = 1. $$ Therefore, choosing any $s_i \in \{q_i, q_i^\prime\}$, the bundle $s = (s_1, \ldots, s_n)$ forms a complete market: each contract can be split into other contracts (and cash), any $n - 1$ contracts of equal size can be merged into one contract (and cash), and all $n$ contracts can be merged into cash.

Because Polymarket is tokenized, it has an incentive to use as few token types as possible to fully describe a market (if event states are represented as dimensions, this corresponds to choosing a linearly independent set of tokens that spans all payoffs, i.e., a basis), while still enabling redemption into cash. This is analogous to an arcade where different machines require different tokens (contracts), and users obtain tokens by exchanging fiat-like money (USDC) for a complete set of tokens. In our model, a natural strategy is to use the $n$ token types $q_1, \ldots, q_n$, so that one can net (redeem/offset) into USDC and cover both Yes and No contracts.

Polymarket enforces splitting and merging at the smart-contract level: it can represent a No contract as a portfolio of other Yes contracts (the NegRisk mechanism), and it supports conversion between USDC and the set of tokens $q_1, \ldots, q_n$ (what it calls splitting & merging, i.e., minting/burning tokens). The main benefit is improved capital efficiency: if a trader’s holdings collectively cover all outcomes, they can settle naturally at resolution.
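Splitting and merging can be illustrated with a toy ledger. The class below is a hypothetical sketch of the accounting idea only; it is not Polymarket's smart-contract interface, and the names `ToyMarket`, `split`, and `merge` are chosen for illustration.

```python
class ToyMarket:
    """Toy ledger for one trader in an n-outcome market (illustrative only)."""

    def __init__(self, n_outcomes: int, cash: float) -> None:
        self.n = n_outcomes
        self.cash = cash
        self.tokens = [0] * n_outcomes  # holdings of each Yes token

    def split(self, size: int) -> None:
        """Lock `size` units of cash and mint `size` of every Yes token."""
        if size > self.cash:
            raise ValueError("not enough cash to collateralize")
        self.cash -= size
        self.tokens = [held + size for held in self.tokens]

    def merge(self, size: int) -> None:
        """Burn `size` of every Yes token and release `size` units of cash."""
        if any(held < size for held in self.tokens):
            raise ValueError("a complete set is required to merge")
        self.tokens = [held - size for held in self.tokens]
        self.cash += size


market = ToyMarket(n_outcomes=3, cash=10.0)
market.split(4)
print(market.cash, market.tokens)
market.merge(4)
print(market.cash, market.tokens)
```

A split followed by a merge of the same size returns the ledger to its starting state, which is why a complete set of tokens is always worth exactly one unit of cash.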

Kalshi has a related mechanism called flip selling and collateral return. Its mechanism design is more powerful in that it supports directional markets with non-mutually-exclusive outcomes. The documentation gives the following example:

A hypothetical example of a directional market group is “TSA check-ins today?” with the markets:

“Will there be above 1,000,000 TSA check-ins today?”

“Will there be above 2,000,000 TSA check-ins today?”

“Will there be above 3,000,000 TSA check-ins today?”

If it is true that there are above 3,000,000 TSA check-ins today then it must be true that there were above 2,000,000 TSA check-ins today. In this example, you buy Yes for “above 1,000,000 check-ins” for 80¢ and No for “above 3,000,000 check-ins” for 70¢. You’ve invested a total of $\$1.50$, but you’d be guaranteed to be paid out $\$1$. Therefore, the maximum amount you could lose at resolution is $\$1.50 - \$1 = \$0.50$. Instead of taking the full $\$1.50$ of your available funds, we take only $\$0.50$ and mark down your position value by the returned $\$1$, leading to an invested value of $\$0.50$. If you hold both contracts until resolution, then you will receive $\$1$ if both your positions are correct (there are 2,000,000 TSA check-ins today).
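The arithmetic in this example generalizes to any set of threshold positions: the collateral actually at risk is the total cost minus the worst-case payout. The following is a rough sketch of that netting idea, assuming one contract per position and ignoring ties at a threshold; `required_collateral` is a hypothetical helper, not Kalshi's margin engine.

```python
def required_collateral(positions: list[tuple[float, str, float]]) -> float:
    """Total cost minus the guaranteed (worst-case) payout, in dollars.

    Each position is (threshold, side, price) on an "above threshold?" market,
    with side "yes" or "no" and one contract per position.
    """
    thresholds = sorted({k for k, _, _ in positions})
    # One representative outcome per region: below all thresholds, between
    # each adjacent pair, and above all thresholds.
    scenarios = [thresholds[0] - 1.0]
    scenarios += [(a + b) / 2 for a, b in zip(thresholds, thresholds[1:])]
    scenarios.append(thresholds[-1] + 1.0)

    def payout(outcome: float) -> float:
        total = 0.0
        for k, side, _ in positions:
            above = outcome > k
            # A Yes contract pays when the outcome is above k, a No when it is not.
            total += 1.0 if above == (side == "yes") else 0.0
        return total

    cost = sum(price for _, _, price in positions)
    return cost - min(payout(s) for s in scenarios)


tsa = [(1_000_000, "yes", 0.80), (3_000_000, "no", 0.70)]
print(required_collateral(tsa))
```

For the two positions in the quoted example, at least one contract pays in every scenario, so the worst-case payout is $\$1$ and the amount at risk matches the $\$0.50$ stated in the documentation.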

4. Pricing

4.1 Price Process

All prediction-market contracts have an explicit time horizon: the event state will be determined within a known finite time window. We can therefore model the price process of a Yes contract. Suppose the terminal payoff is a random variable $X \in \{0, 1\}$, the start time is $0$, the maturity is $T$, and the contract price $q(t)$ is a stochastic process over $t \in [0, T]$.

In prediction markets it is common to treat the pricing-measure probability as a proxy for the real-world probability ($\mathbb P = \mathbb Q$). If the risk-free rate is $r=0$, then $$ q(t) = \mathbb{E} (X \mid \mathcal{F}_t) = \mathbb{P}(X = 1 \mid \mathcal{F}_t) \in [0, 1], $$ where $\mathcal{F}_t$ is the $\sigma$-algebra representing the information available to investors at time $t$. For $t \le s \le T$, the tower property gives $$ \mathbb{E}(q(s) \mid \mathcal{F}_t) = \mathbb{E}\left( \mathbb{E} (X \mid \mathcal{F}_s) \mid \mathcal F_t \right) = \mathbb E(X \mid \mathcal{F}_t) = q(t). $$ This shows that $q(t)$ is a martingale: at any time, it equals the current best estimate $\mathbb{P}(X = 1 \mid \mathcal F_t)$. This is also why both Kalshi and Polymarket often present the recent market price $q(t)$ as an implied probability and, for ease of understanding, may display only the probability rather than the raw price. In markets with spreads and transaction costs (e.g., Kalshi), the displayed “market probability” is not necessarily a price at which one can trade, so impulsive trading based on it can lead to unexpected losses.
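A toy information structure makes the martingale property concrete. In the sketch below, the event is assumed to be "at least $k$ heads in $n$ fair coin flips" and information arrives one flip at a time; this setup and the function name `price` are illustrative assumptions, not part of the original model.

```python
from math import comb


def price(heads_so_far: int, flips_done: int, n: int = 10, k: int = 6) -> float:
    """q(t) = P(at least k heads in n fair flips | flips seen so far)."""
    remaining = n - flips_done
    need = max(k - heads_so_far, 0)
    if need > remaining:
        return 0.0
    return sum(comb(remaining, j) for j in range(need, remaining + 1)) / 2**remaining


# Martingale check: today's price equals the average of tomorrow's two prices.
h, t = 3, 5
now = price(h, t)
nxt = 0.5 * price(h + 1, t + 1) + 0.5 * price(h, t + 1)
print(now, nxt)
```

After the final flip the price is exactly $0$ or $1$, matching the terminal condition $q(T) \in \{0, 1\}$.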

Different event types correspond to different price processes. In fact, as long as $q(t)$ is a martingale, always stays within $[0, 1]$, and satisfies $q(T) \in \{0, 1\}$, there exists an information structure under which it is a valid price process. This implies that, in theory, the information revealed by market behavior is limited: it cannot fully explain the event itself. Different information structures correspond to different shapes of $q(t)$. For example, an idealized “Schrödinger’s cat” event reveals information only at the final instant, whereas an event like “will BTC be above a certain level by month-end” receives a continuous flow of information, and the prediction-market price should continuously incorporate that information.

4.2 Digital Options

To price $q(t)$ via an event, we need to specify a process for an underlying event. Here it is useful to borrow standard modeling from traditional financial markets. In traditional finance, a non-path-dependent Yes/No contract whose payoff depends only on the event state at maturity is called a digital option (or binary option). Its underlying event is typically the relationship between a financial asset’s price and a threshold at maturity—for example, whether the S&P 500 closing price on the last day of January is above $7000$. A comparison is:

Aspect	Prediction Market Contract	Digital Option
Payoff	No or Yes (0/1)	No or Yes (0/1)
Underlying	Well-defined real-world event	Asset price relative to a threshold
Purpose	Information aggregation / forecasting (oracle)	Risk hedging
Participant	Informed trader / gambler	Hedger / speculator / institutional investor

For a Yes option with threshold $K$, under standard modeling, let the asset price at time $t$ be $S_t$ and assume $$ d \log S_t = \left(\mu - \frac12 \sigma^2\right)\; dt + \sigma\; dW_t, $$ so that $$ S_T = S_t \cdot \exp \left( \left( \mu - \frac12 \sigma^2\right) (T - t) + \sigma (W_T - W_t) \right), $$ where $W_t$ is standard Brownian motion. Then $$ q(t) = \mathbb P (S_T \ge K \mid \mathcal F_t) = \Phi(d_1), $$ where $\Phi$ is the standard normal CDF and $$ d_1 = \frac{\log (S_t / K) + \left(\mu - \frac12 \sigma^2\right) (T - t)}{\sigma \sqrt{T - t}}. $$

The unknowns above are the volatility $\sigma$ and the drift $\mu$. In practice, one may estimate a volatility surface from current market data to identify mispricings, or adjust prices using additional information (alpha) about the asset process $S_t$. Standard option strategies can be used as references.
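The digital-option formula $q(t) = \Phi(d_1)$ can be evaluated with the standard library alone. This is a minimal sketch under the geometric Brownian motion assumption of this section, with annualized $\mu$ and $\sigma$ and $\tau = T - t$ in years; the function names are hypothetical.

```python
from math import erf, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def digital_yes_price(s: float, k: float, mu: float, sigma: float, tau: float) -> float:
    """P(S_T >= K | S_t = s) under geometric Brownian motion, zero interest.

    tau = T - t in years; mu and sigma are annualized. Returns Phi(d_1) with
    d_1 as defined in the text.
    """
    if tau <= 0:
        # At or past maturity the payoff is already determined.
        return 1.0 if s >= k else 0.0
    d1 = (log(s / k) + (mu - 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return norm_cdf(d1)


# Hypothetical inputs: spot 100, threshold 105, zero drift, 40% vol, one month.
print(digital_yes_price(100.0, 105.0, 0.0, 0.40, 1.0 / 12.0))
```

Inverting this function for $\sigma$ given an observed price is one way to back out an implied volatility per threshold, in the spirit of the volatility-surface idea mentioned above.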

4.3 Probability Distribution

We now return to the distribution question raised at the beginning. For a digital-option-like contract market, a family of contracts implicitly encodes the probability distribution of the underlying. For example, consider a Kalshi BTC market such as “Bitcoin price today at 5pm EST.” If pricing satisfies the assumptions in Section 4.1 and we treat $\mathbb P = \mathbb Q$, then we can read off the risk-neutral distribution of BTC price $S$ at maturity $T$.

For instance, if the market implies $$ \mathbb{P} (S_T \ge \$90{,}000) = 78\% \quad \text{and} \quad \mathbb{P} (S_T \ge \$90{,}500) = 54\%, $$ then subtracting gives the interval probability $$ \mathbb P(\$90{,}000 \le S_T < \$90{,}500) = 78\% - 54\% = 24\%. $$ For prediction-market contracts of the digital-option type, distribution estimation depends only on the relationship between $\mathbb P$ and $\mathbb Q$, not on the detailed modeling of the price process. Note that only when $\mathbb P = \mathbb Q$ can it be interpreted as a real (physical) probability distribution.
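The same subtraction extends to a whole ladder of thresholds. Below is a small sketch that turns quotes of the form $\mathbb P(S_T \ge K)$ into interval probabilities; it reads the 78% figure as the quote for the $\$90{,}000$ threshold, as the subtraction in the text implies, and the function name `bucket_probabilities` is hypothetical.

```python
def bucket_probabilities(survival: dict[float, float]) -> list[tuple[float, float, float]]:
    """Turn threshold quotes P(S_T >= K) into interval probabilities.

    Returns (lower, upper, probability) for each adjacent pair of thresholds,
    where the interval is lower <= S_T < upper.
    """
    ks = sorted(survival)
    out = []
    for lo, hi in zip(ks, ks[1:]):
        prob = survival[lo] - survival[hi]
        if prob < 0:
            # A higher threshold cannot be more likely than a lower one.
            raise ValueError(f"quotes at {lo} and {hi} are not monotone")
        out.append((lo, hi, prob))
    return out


quotes = {90_000: 0.78, 90_500: 0.54}  # the two figures used in the text
print(bucket_probabilities(quotes))
```

A negative bucket would signal non-monotone quotes across thresholds, which is itself a pricing inconsistency worth inspecting.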

4.4 One-Touch Options

Another common prediction-market contract has the form of a one-touch option: within a specified time window (rather than only at maturity), if the underlying price touches a threshold, it pays $1$ unit of cash at maturity. For a typical up one-touch option, at time $t$ let the threshold satisfy $K > S_t$ and define the stopping time $$ \tau_K = \inf \{u \ge t : S_u \ge K\}. $$ By the reflection principle and a change of measure, one can derive $$ q(t) = \mathbb P (\tau_K \le T \mid \mathcal F_t)= \Phi(d_1) + \left(\frac{S_t}{K} \right)^{-\frac{2(\mu - \frac12 \sigma^2)}{\sigma^2}}\Phi(d_2), $$ where $$ d_1 = \frac{\log (S_t / K) + \left(\mu - \frac12 \sigma^2\right) (T - t)}{\sigma \sqrt{T - t}}, $$ and $$ d_2 = \frac{\log (S_t / K) - \left(\mu - \frac12 \sigma^2\right) (T - t)}{\sigma \sqrt{T - t}}. $$ Here $d_1$ is the same as in Section 4.2. Therefore, $\Phi(d_1)$ is exactly the digital-option price. When $(\mu - \frac12 \sigma^2)$ is small, $d_1 \approx d_2$ and the power term multiplying $\Phi(d_2)$ is approximately $1$, so $$ q(t) \approx 2 \Phi(d_1), $$ i.e., the one-touch option price is approximately twice the digital-option price. This provides a quick way to estimate the underlying’s maturity distribution from one-touch prices. (Another approach is to assume the short-term underlying follows a scaled, driftless Brownian motion; the reflection principle then yields a similar conclusion.)
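The exact formula and the factor-of-two approximation can be compared directly. This is a minimal sketch under the same geometric Brownian motion assumption, with annualized parameters and $\tau = T - t$ in years; the function names and the sample inputs are hypothetical.

```python
from math import erf, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def one_touch_up(s: float, k: float, mu: float, sigma: float, tau: float) -> tuple[float, float]:
    """Return (exact touch probability, 2 * Phi(d_1) approximation) for K > S_t."""
    if k <= s:
        raise ValueError("the barrier must be above the current price")
    nu = mu - 0.5 * sigma**2  # drift of log S_t
    d1 = (log(s / k) + nu * tau) / (sigma * sqrt(tau))
    d2 = (log(s / k) - nu * tau) / (sigma * sqrt(tau))
    exact = norm_cdf(d1) + (s / k) ** (-2.0 * nu / sigma**2) * norm_cdf(d2)
    return exact, 2.0 * norm_cdf(d1)


# Hypothetical inputs: spot 100, barrier 110, zero drift, 30% vol, one month.
exact, approx = one_touch_up(100.0, 110.0, 0.0, 0.30, 1.0 / 12.0)
print(exact, approx)
```

When the log-drift $\mu - \frac12 \sigma^2$ is exactly zero the two numbers coincide; for short horizons and moderate volatility they typically stay close.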

As an example from Polymarket, a market may resolve under a rule like:

This market will resolve to “Yes” if, on any trading day, the official CME resolution price for the Active Month (front month) of Silver (SI) futures is equal to or above the listed price by the final trading day of January 2026. Otherwise, the market will resolve to “No”.

If the market-implied probability for the $\$110$ threshold contract is $22\%$, then by the approximation above we can estimate that the probability the CME month-end settlement price exceeds $\$110$ is about $11\%$.
