Theory 3 papers

Theory Digest — Mar 30, 2026

Today’s Digest at a Glance

Today’s papers examine stability and duality in optimal transport problems, empirical Bayes methods with improved deconvolution rates, and supermartingale couplings in mathematical finance.

Schrödinger Bridges

Schrödinger bridges address the problem of finding the most likely random evolution connecting two prescribed probability distributions. The classical approach connects an initial distribution μ₀ to a final distribution μ₁ by finding the probability measure on path space that minimizes relative entropy with respect to a reference process, subject to the marginal constraints. However, this formulation fixes the diffusion coefficient and controls only the drift, so it offers no flexibility in the diffusion behavior along the path.

The core mathematical framework starts with the entropy-regularized optimal transport problem:

\[\min_{\pi \in \Pi(\mu_0, \mu_1)} \int c(x,y) d\pi(x,y) + \gamma H(\pi|\mu_0 \otimes \mu_1)\]
where $H(\pi \mid \mu_0 \otimes \mu_1)$ is the relative entropy of $\pi$ with respect to the product measure and $\gamma > 0$ is the regularization parameter. The Schrödinger bridge extends this to the dynamic setting by considering probability measures on path space that satisfy the marginal constraints while minimizing relative entropy against a reference diffusion process.

Intuitively, Schrödinger bridges find the “most natural” way to morph one distribution into another while accounting for the inherent randomness of diffusion processes.
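
For discrete marginals, the static entropy-regularized problem above can be solved by alternating rescalings (Sinkhorn iterations). The following is a minimal sketch under stated assumptions: the support points, weights, quadratic cost, and the value of `gamma` are illustrative choices, not taken from any of the papers.

```python
import math

def sinkhorn(mu, nu, cost, gamma, iters=500):
    """Entropic OT between discrete measures mu and nu with regulariser gamma.

    Returns pi[i][j] = u[i] * K[i][j] * v[j] * mu[i] * nu[j], where
    K = exp(-cost / gamma) is the Gibbs kernel relative to the product mu (x) nu.
    """
    n, m = len(mu), len(nu)
    K = [[math.exp(-cost[i][j] / gamma) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Rescale rows to match mu, then columns to match nu.
        u = [1.0 / sum(K[i][j] * nu[j] * v[j] for j in range(m)) for i in range(n)]
        v = [1.0 / sum(K[i][j] * mu[i] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] * mu[i] * nu[j] for j in range(m)] for i in range(n)]

# Illustrative data (assumed): three source atoms, two target atoms, quadratic cost.
xs, ys = [0.0, 1.0, 2.0], [0.5, 1.5]
mu, nu = [0.2, 0.5, 0.3], [0.4, 0.6]
cost = [[(x - y) ** 2 for y in ys] for x in xs]
pi = sinkhorn(mu, nu, cost, gamma=0.5)
row_sums = [sum(row) for row in pi]
col_sums = [sum(pi[i][j] for i in range(len(mu))) for j in range(len(nu))]
```

After convergence the row and column sums of `pi` reproduce `mu` and `nu`; smaller `gamma` moves the coupling closer to the unregularized optimal plan at the price of slower convergence.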

Empirical Bayes with Nonparametric Maximum Likelihood

Empirical Bayes methods face a fundamental challenge when estimating mixing distributions from noisy observations: the classical nonparametric maximum likelihood estimator (NPMLE) achieves only logarithmic rates for deconvolution problems. Standard approaches model $X_i = \theta_i + \epsilon_i$ where $\theta_i$ are drawn from an unknown mixing distribution $H$ and $\epsilon_i$ are Gaussian noise, but the discrete nature of the NPMLE limits its performance.

The smooth NPMLE approach addresses this by introducing a hierarchical Gaussian smoothing step. Instead of directly using the discrete NPMLE $\hat{H}_n$, the method forms a smooth prior estimator:

\[g_{\lambda_n}(\xi) = \int \phi_{\lambda_n}(\xi - u) d\hat{H}_n(u)\]

where $\phi_{\lambda_n}$ is a Gaussian kernel with bandwidth $\lambda_n$. This smoothing operation transforms the discrete measure into a continuous density while preserving the maximum likelihood structure. The bandwidth $\lambda_n$ is chosen to balance bias and variance, typically scaling as $n^{-\alpha}$ for some $\alpha > 0$.

The key insight is that smoothing the mixing distribution estimate leads to polynomial rather than logarithmic deconvolution rates, dramatically improving the quality of posterior inference while maintaining computational tractability through convex optimization.
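
Because the NPMLE is a discrete measure, the smoothing integral reduces to a finite Gaussian mixture. The sketch below evaluates it for a hypothetical two-atom estimate; the atoms, weights, and bandwidth are assumptions chosen for illustration.

```python
import math

def gaussian_pdf(x, sd):
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def smooth_density(xi, atoms, weights, bandwidth):
    """g_lambda(xi) = sum_j w_j * phi_lambda(xi - u_j) for a discrete estimate."""
    return sum(w * gaussian_pdf(xi - u, bandwidth) for u, w in zip(atoms, weights))

# Hypothetical discrete estimate with two atoms (not fitted to data).
atoms, weights = [-2.0, 2.0], [0.5, 0.5]
grid = [-10.0 + 0.01 * k for k in range(2001)]
# Riemann sum over [-10, 10]; the smoothed estimate should integrate to about 1.
mass = sum(smooth_density(x, atoms, weights, 1.0) for x in grid) * 0.01
```

The smoothed estimate is a genuine density even though the underlying measure has only two atoms.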

Supermartingale Optimal Transport

Supermartingale optimal transport extends classical optimal transport by requiring that coupled processes satisfy a supermartingale constraint: $\mathbb{E}[Y_t \mid \mathcal{F}_s] \leq Y_s$ for all $s \leq t$. This constraint arises naturally in mathematical finance where it captures no-arbitrage conditions or risk management requirements. The challenge is that supermartingale constraints are much more restrictive than simple marginal constraints, making existence and stability theorems more delicate.

The mathematical framework seeks to minimize $\mathbb{E}[c(X,Y)]$ over all couplings $(X,Y)$ where $X \sim \mu$, $Y \sim \nu$, and the two-step process $(X, Y)$ is a supermartingale, that is, $\mathbb{E}[Y \mid X] \leq X$. Unlike classical optimal transport, the feasible set of couplings depends crucially on the temporal structure and information flow.

The key technical innovation involves decomposing supermartingale couplings using critical points where the cumulative distribution functions of the marginals coincide, allowing for a constructive approach to approximation and stability analysis.
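
For finitely supported marginals, the one-step supermartingale constraint is a row-by-row check on conditional means. A minimal sketch, with illustrative atoms and couplings that are assumptions rather than examples from the paper:

```python
def is_supermartingale_coupling(xs, ys, pi, tol=1e-12):
    """Check E[Y | X = x_i] <= x_i for every row i that carries mass.

    pi[i][j] is the joint mass on the pair (xs[i], ys[j]).
    """
    for x, row in zip(xs, pi):
        mass = sum(row)
        if mass > 0:
            cond_mean = sum(p * y for p, y in zip(row, ys)) / mass
            if cond_mean > x + tol:
                return False
    return True

xs, ys = [0.0, 2.0], [-1.0, 1.0, 3.0]
# Both couplings have marginals (0.5, 0.5) and (0.25, 0.5, 0.25).
feasible = [[0.25, 0.25, 0.0], [0.0, 0.25, 0.25]]    # conditional means 0 and 2
infeasible = [[0.0, 0.25, 0.25], [0.25, 0.25, 0.0]]  # first row has mean 2 > 0
```

Two couplings with identical marginals can differ in feasibility, which is why the constraint set depends on more than the marginals alone.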

Reading Guide

The first paper develops a unified framework that interpolates between Schrödinger bridges and Bass martingale transport through a parameter β, establishing duality theory despite non-coercive diffusion control. The second paper tackles the empirical Bayes estimation problem by introducing hierarchical smoothing to achieve polynomial deconvolution rates. The third paper focuses on stability properties of supermartingale optimal transport, proving approximation theorems that are crucial for numerical implementation and robustness analysis.


Bridging Schrödinger and Bass: A Semimartingale Optimal Transport Problem with Diffusion Control

Authors: Pierre Henry-Labordere, Grégoire Loeper, Othmane Mazhar, Huyên Pham et al. (5 authors) · Institution: NYU Tandon, Ecole Polytechnique, Université Paris Cité, Monash University, BNP Paribas · Category: math.PR

Establishes strong duality and existence for a semimartingale optimal transport problem that interpolates between Schrödinger bridges and Bass martingale transport via a parameter $\beta$.

Tags: optimal transport stochastic control Schrödinger bridge martingale transport semimartingales HJB equations inf-convolution heat equation


Problem Formulation
  1. Motivation: This paper studies optimal transport problems for continuous-time stochastic processes that interpolate between two classical regimes: the Schrödinger bridge (which controls drift with fixed diffusion) and martingale optimal transport exemplified by the Bass problem (which controls diffusion with zero drift). Such interpolation is important for understanding the spectrum of entropic vs martingale transport and provides variational foundations for data-driven diffusion models.

  2. Mathematical setup: Let $T > 0$ and consider the canonical space $\Omega = C([0,T], \mathbb{R}^d)$ with canonical process $X_t(\omega) = \omega(t)$. Define $\mathcal{P}$ as the set of probability measures $P$ on $\Omega$ under which $X$ has the diffusion decomposition

    \[X_t = X_0 + \int_0^t \alpha^P_s ds + \int_0^t \sigma^P_s dW^P_s\]
    for some $P$-Brownian motion $W^P$ and characteristics $\nu^P = (\alpha^P, \sigma^P)$ valued in $\mathbb{R}^d \times S^d_+$ with $\int_0^T (|\alpha^P_t| + |\sigma^P_t|^2) dt < \infty$ $P$-a.s. Given marginal distributions $\mu_0, \mu_T \in \mathcal{P}_2(\mathbb{R}^d)$, define the transport constraint set
    \[\mathcal{P}(\mu_0, \mu_T) = \{P \in \mathcal{P} : P \circ X_0^{-1} = \mu_0, P \circ X_T^{-1} = \mu_T\}\]

    The cost function is

    \[c(a,b) = \frac{1}{2}|a|^2 + \frac{\beta}{2}|b - \text{Id}|^2\]

    for $a \in \mathbb{R}^d, b \in S^d_+$, where $\beta > 0$ is a parameter and $\text{Id}$ is the identity matrix.

    Assumptions:

    1. $\mu_0, \mu_T \in \mathcal{P}_2(\mathbb{R}^d)$ (finite second moments)
    2. $T > 0$ finite
    3. $\beta > 0$ fixed parameter
  3. Toy example: When $d=1$, $\mu_0 = \delta_0$, $\mu_T = \delta_1$, and $T=1$, the linear interpolation gives $X_t = t$ with drift $\alpha_t \equiv 1$ and diffusion $\sigma_t \equiv 0$. The cost is $J_0(P^{\text{lin}}) = \frac{1}{2} + \frac{\beta d}{2} = \frac{1}{2} + \frac{\beta}{2}$. This illustrates how the cost balances drift control (entropy-like) with diffusion penalization.

  4. Formal objective: The Schrödinger-Bridge-Bass (SBB) problem is

    \[\text{SBB}(\mu_0, \mu_T) = \inf_{P \in \mathcal{P}(\mu_0, \mu_T)} \mathbb{E}^P\left[\int_0^T c(\nu^P_t) dt\right]\]
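
The toy computation in item 3 can be reproduced by integrating the running cost along the deterministic path $X_t = t$. This is a small sketch in dimension one; the value `beta = 2.0` and the helper names are assumptions made for illustration.

```python
def running_cost(a, b, beta):
    """c(a, b) = |a|^2 / 2 + beta * |b - 1|^2 / 2 in dimension d = 1."""
    return 0.5 * a * a + 0.5 * beta * (b - 1.0) ** 2

def total_cost(drift, diffusion, beta, T=1.0, steps=1000):
    """Midpoint Riemann sum of c(drift(t), diffusion(t)) over [0, T]."""
    dt = T / steps
    return sum(
        running_cost(drift((k + 0.5) * dt), diffusion((k + 0.5) * dt), beta)
        for k in range(steps)
    ) * dt

beta = 2.0
# Linear interpolation X_t = t: unit drift, zero diffusion.
cost_linear = total_cost(lambda t: 1.0, lambda t: 0.0, beta)
```

With these controls the result matches $\frac{1}{2} + \frac{\beta}{2}$, so for `beta = 2.0` the total is 1.5.
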
Method

The method establishes a complete duality theory for the SBB problem despite lack of coercivity in the diffusion component.

Key steps:

  1. Dual formulation via penalization: Rewrite the constrained problem as

    \[\text{SBB}(\mu_0, \mu_T) = \inf_{P \in \mathcal{P}_0(\mu_0)} \sup_{\psi \in C_w} \mu_T(\psi) + J^{\psi}(P)\]

    where $J^{\psi}(P) = \mathbb{E}^P[-\psi(X_T) + \int_0^T c(\nu^P_t) dt]$.

  2. Hamilton-Jacobi-Bellman equation: The value function $V^{\psi}_0(x)$ satisfies the HJB equation

    \[\partial_t v + H(\nabla v, D^2 v) = 0\]

    with Hamiltonian

    \[H(p,A) = \frac{1}{2}|p|^2 + \frac{\beta}{2}\text{tr}[\beta(\beta\text{Id} - A)^{-1} - \text{Id}]\]

    on domain $\{(p,A) : A < \beta\text{Id}\}$.

  3. Reduced dual via inf-convolution: Define the Moreau transform

    \[T^+_\beta[\phi](x) = \inf_{y \in \mathbb{R}^d} \left\{\phi(y) + \frac{\beta}{2}|x-y|^2\right\}\]

    The reduced dual problem becomes

    \[V_{\text{red}}(\mu_0, \mu_T) = \sup_{\phi \in C^{\text{conv}}_w} J(\phi)\]

    where

    \[J(\phi) = \mu_T(T^+_\beta[\phi]) - \mu_0(T^+_\beta[u^{\phi}_T])\]

    and $u^{\phi}_s(y) = \log(N_s * e^{\phi})(y)$ with $N_s$ the heat kernel.

  4. Characterization via Schrödinger-Bass system: The optimal solution is characterized by the system:

    • $h_t = N_{T-t} * e^{\hat{\phi}}$ solves backward heat equation
    • $\nu_t = N_t * \nu_0$ solves forward Fokker-Planck equation
    • Transport map $Y_t = \text{id} - \frac{1}{\beta}\nabla T^+_\beta[u^{\hat{\phi}}_{T-t}]$
    • Inverse map $X_t(y) = y + \frac{1}{\beta}\nabla \log h_t(y)$

    Applied to toy example: For $d=1$, $\mu_0 = \delta_0$, $\mu_T = \delta_1$, the optimal $\hat{\phi}$ and corresponding maps $Y_t, X_t$ provide explicit feedback representations for the optimal drift and diffusion coefficients, interpolating between Schrödinger bridge ($\beta \to \infty$) and Bass martingale ($\beta \to 0$).
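
The diffusion part of the Hamiltonian in step 2 can be sanity-checked in dimension one. The sketch below assumes that the diffusion coefficient $b$ enters the second-order term as $\frac{1}{2} b^2 A$, so that this part equals $\sup_b \{\frac{1}{2} b^2 A - \frac{\beta}{2}(b-1)^2\}$; that reading is an assumption, and the grid search over $b \in [0, 20]$ is only a numerical stand-in for the supremum.

```python
def hamiltonian_diffusion_part(A, beta):
    """Closed form (beta / 2) * (beta / (beta - A) - 1), valid for A < beta, d = 1."""
    return 0.5 * beta * (beta / (beta - A) - 1.0)

def sup_over_diffusion(A, beta, b_max=20.0, n=200001):
    """Grid approximation of sup over b in [0, b_max] of b^2 A / 2 - beta (b - 1)^2 / 2."""
    best = float("-inf")
    for k in range(n):
        b = b_max * k / (n - 1)
        best = max(best, 0.5 * b * b * A - 0.5 * beta * (b - 1.0) ** 2)
    return best

beta, A = 2.0, 0.5
closed = hamiltonian_diffusion_part(A, beta)   # maximiser is b = beta / (beta - A)
numeric = sup_over_diffusion(A, beta)
```

The two values agree up to grid error. As $A$ approaches $\beta$ the maximiser $b = \beta/(\beta - A)$ blows up, which is consistent with the domain restriction $A < \beta\,\text{Id}$.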

Novelty & Lineage

Step 1 — Prior work: The closest papers are:

  1. “Entropic optimal transport between unbalanced measures” (Chizat et al., 2016) - established duality theory for Schrödinger bridges with drift control only
  2. “Martingale Benamou-Brenier: a probabilistic perspective” (Backhoff-Veraguas et al., 2017) - studied Bass martingale transport with diffusion control only
  3. “Semimartingale optimal transport” (Tan & Touzi, 2013) - developed general framework but required coercivity in diffusion component

    Step 2 — Delta: This paper introduces a unified interpolating problem that bridges Schrödinger and Bass regimes via parameter $\beta$. The key technical advance is establishing strong duality and existence despite the lack of coercivity in the diffusion component - a condition that was essential in prior semimartingale transport theory.

    Step 3 — Theory-specific assessment:

    • The main theorem is moderately surprising. While interpolation between these regimes is natural, proving strong duality without coercivity required non-trivial technical innovations including careful analysis of inf-convolution operators and heat equation regularization.
    • The proof technique introduces genuinely new elements, particularly the use of quadratic inf-convolution to handle the non-coercive diffusion term and the coupled Schrödinger-Bass bridge system characterization.
    • Bounds appear to be sharp in limiting cases ($\beta \to 0, \infty$) where they recover known Schrödinger and Bass results, but no general lower bounds are established for intermediate $\beta$.

    The technical condition $\beta T > 1$ for existence is restrictive and the paper doesn’t fully explore its necessity.

    Verdict: INCREMENTAL — solid extension bridging two known theories with some technical innovation, but the interpolation idea is natural and the main results largely confirmatory of expected behavior.

Proof Techniques

The proof employs several key technical innovations:

  1. Convexity and continuity analysis: Proves the value function $F(m) = \text{SBB}(\mu_0, m)$ is convex and continuous on $\mathcal{P}_2(\mathbb{R}^d)$ using Gyöngy’s Markovian projection and time-change arguments with Young’s inequality:

    \[|x-y|^2 \leq \frac{|x|^2}{1-\eta} + \frac{|y|^2}{\eta}\]
  2. Fenchel-Moreau duality without coercivity: Applies Fenchel-Moreau theorem despite non-coercive cost by showing the extended function $\bar{F}$ is proper, convex, and $\sigma(\mathcal{M}_2, C_w)$-lower semicontinuous.

  3. Heat equation regularization: For $\beta$-convex $\phi$, proves $h^{\phi}_t = N_{T-t} * e^{\phi} > 0$ solves the backward heat equation:

    \[\partial_t h^{\phi} + \frac{1}{2}\Delta h^{\phi} = 0\]

    and $\tilde{u}^{\phi} = \log h^{\phi}$ satisfies Cole-Hopf equation:

    \[\partial_t \tilde{u}^{\phi} + \frac{1}{2}(\Delta \tilde{u}^{\phi} + |\nabla \tilde{u}^{\phi}|^2) = 0\]
  4. Inf-convolution envelope analysis: Shows the minimizer $Y_t(x)$ in

    \[v^{\phi}_t(x) = \inf_y \left\{\tilde{u}^{\phi}_t(y) + \frac{\beta}{2}|x-y|^2\right\}\]

    is unique and satisfies the curvature bound:

    \[D^2 \tilde{u}^{\phi}_t + \frac{\beta}{1+\beta(T-t)}I \succeq 0\]
  5. Verification argument: Establishes $V^{\psi} = v^{\phi}$ by constructing explicit controls via Girsanov change of measure using the density:

    \[\frac{dQ^m}{dQ} = \frac{h^{\phi}(t_m, Y_{t_m})}{h^{\phi}(t,y)}\]
  6. Measurable selection for dual attainment: Uses measurable selection theorems to construct $\eta$-optimal policies and prove existence via compactness arguments in the space of $\beta$-concave functions.
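
The quadratic inf-convolution used in item 4 can be explored numerically on a grid. The sketch below uses $\phi(y) = |y|$, for which the envelope is known in closed form (the Huber function); this test function and the grid are illustrative assumptions, not objects from the paper.

```python
def moreau_envelope(phi, x, beta, grid):
    """Grid approximation of inf over y of phi(y) + beta * |x - y|^2 / 2."""
    return min(phi(y) + 0.5 * beta * (x - y) ** 2 for y in grid)

def huber(x, beta):
    """Exact quadratic inf-convolution of phi(y) = |y|."""
    if abs(x) <= 1.0 / beta:
        return 0.5 * beta * x * x
    return abs(x) - 0.5 / beta

beta = 2.0
grid = [-5.0 + 0.001 * k for k in range(10001)]
points = [-3.0, -0.3, 0.0, 0.2, 1.7]
approx = [moreau_envelope(abs, x, beta, grid) for x in points]
exact = [huber(x, beta) for x in points]
```

The envelope is smooth even though $|y|$ has a kink at the origin, which illustrates the regularizing role that inf-convolution plays in the argument.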

Experiments & Validation

Purely theoretical. Empirical validation would require implementing the coupled Schrödinger-Bass bridge system numerically, testing the interpolation behavior as $\beta$ varies, and comparing against known Schrödinger bridge and Bass martingale solutions in limiting cases. The theoretical framework could be tested on synthetic 2D Gaussian transport problems where ground truth is computable.

Limitations & Open Problems

Limitations:

  1. The condition $\beta T > 1$ for existence is RESTRICTIVE - it excludes short-time and small-$\beta$ regimes, which significantly narrows applicability
  2. Results limited to finite time horizon $T < \infty$ - TECHNICAL limitation needed for heat kernel analysis
  3. Second moment conditions $\mu_0, \mu_T \in \mathcal{P}_2(\mathbb{R}^d)$ are NATURAL for optimal transport theory
  4. Euclidean space $\mathbb{R}^d$ restriction is TECHNICAL - could likely extend to manifolds

    Open problems:

  1. Determine necessity of the condition $\beta T > 1$ - is this purely technical or fundamental?
  2. Extend to infinite time horizon and study long-time behavior of the interpolating bridges

Empirical Bayes Estimation and Inference via Smooth Nonparametric Maximum Likelihood

Authors: Taehyun Kim, Bodhisattva Sen · Institution: Columbia University · Category: math.ST

Introduces hierarchical Gaussian smoothing for empirical Bayes that achieves polynomial (vs logarithmic) deconvolution rates while maintaining convex optimization and enabling smooth posterior inference.

Tags: empirical_bayes nonparametric_maximum_likelihood deconvolution normal_mixtures posterior_inference uncertainty_quantification minimax_rates optimal_coverage_sets


Problem Formulation

Motivation: The empirical Bayes g-modeling approach via nonparametric maximum likelihood estimator (NPMLE) suffers from two key limitations: the NPMLE is necessarily discrete (yielding discrete posteriors), and deconvolution rates are logarithmically slow. This matters for large-scale inference in normal means problems where smooth uncertainty quantification is desired.

Mathematical setup: Consider hierarchical normal location mixture model for $i = 1, \ldots, n$:

\[X_i \mid \theta_i \overset{ind}{\sim} N(\theta_i, 1)\]
\[\theta_i \mid \xi_i \overset{ind}{\sim} N(\xi_i, c_*^2)\]
\[\xi_i \overset{iid}{\sim} H_* \in \mathcal{P}(\mathbb{R})\]

This implies marginal distribution $G_* = H_* \star N(0, c_*^2)$ where $\star$ denotes convolution. The marginal density of $\theta_i$ is:

\[g_{H_*}(\theta) := \int \phi_{c_*}(\theta - \xi) dH_*(\xi)\]

Assumptions:

  1. $c_* \geq 0$ is known (relaxed later)
  2. $H_* \in \mathcal{P}(\mathbb{R})$ is unknown
  3. Observations are independent

    Toy example: When $H_* = \frac{1}{2}\delta_{-2} + \frac{1}{2}\delta_2$ and $c_* = 1$, the true prior becomes $G_* = \frac{1}{2}N(-2,1) + \frac{1}{2}N(2,1)$. The smooth NPMLE can recover this two-component Gaussian mixture, while classical NPMLE yields a discrete approximation.

    Formal objective: Estimate the NPMLE $\hat{H}_n$ by solving:

    \[\hat{H}_n \in \arg\max_{H \in \mathcal{P}(\mathbb{R})} \sum_{i=1}^n \log f_H(X_i)\]

    where $f_H(x) = \int \phi_{\sigma_*}(x - \xi) dH(\xi)$ with $\sigma_*^2 := 1 + c_*^2$.

Method

Method: The smooth NPMLE approach consists of three steps:

  1. Estimate mixing distribution: Solve the convex optimization problem:

    \[\hat{H}_n \in \arg\max_{H \in \mathcal{P}(\mathbb{R})} \sum_{i=1}^n \log \int \phi_{\sigma_*}(X_i - \xi) dH(\xi)\]
  2. Form smooth prior estimator:

    \[g_{\hat{H}_n}(\theta) := \int \phi_{c_*}(\theta - \xi) d\hat{H}_n(\xi)\]
  3. Compute posterior mean: Using hierarchical normal-normal structure:

    \[\hat{\theta}_i = \alpha_* X_i + (1-\alpha_*) \hat{\xi}_i\]
    where $\alpha_* := \frac{c_*^2}{c_*^2 + 1}$ and $\hat{\xi}_i := \mathbb{E}_{\hat{H}_n}[\xi_i \mid X_i]$.

    Application to toy example: For $H_* = \frac{1}{2}\delta_{-2} + \frac{1}{2}\delta_2$ and $c_* = 1$:

    • $\sigma_*^2 = 2$, so marginal likelihood involves $\phi_{\sqrt{2}}(X_i - \xi)$
    • $\hat{H}_n$ will have at most $n$ support points
    • $g_{\hat{H}_n}(\theta) = \sum_j w_j \phi_1(\theta - \xi_j)$ where $\{(\xi_j, w_j)\}$ are the atoms/weights of $\hat{H}_n$
    • Posterior mean: $\hat{\theta}_i = 0.5 X_i + 0.5 \hat{\xi}_i$ (since $\alpha_* = 0.5$)
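
The three steps can be sketched end to end on simulated data from the toy model. The digest does not specify a solver, so this sketch makes its own assumptions: the NPMLE is approximated on a fixed grid, and the weights are fitted by EM fixed-point updates as a stand-in for a convex solver. The grid, sample size, seed, and iteration count are all illustrative.

```python
import math
import random

def normal_pdf(x, sd):
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def grid_npmle(data, grid, sd, iters=100):
    """Fixed-grid approximation of the NPMLE weights via EM fixed-point updates."""
    w = [1.0 / len(grid)] * len(grid)
    lik = [[normal_pdf(x - xi, sd) for xi in grid] for x in data]
    for _ in range(iters):
        new_w = [0.0] * len(grid)
        for row in lik:
            denom = sum(wj * lj for wj, lj in zip(w, row))
            for j, lj in enumerate(row):
                new_w[j] += w[j] * lj / denom   # posterior weight of atom j
        w = [v / len(data) for v in new_w]
    return w

def posterior_mean(x, grid, w, c):
    """theta_hat = alpha * x + (1 - alpha) * E[xi | X = x], alpha = c^2 / (c^2 + 1)."""
    sd = math.sqrt(1.0 + c * c)
    post = [wj * normal_pdf(x - xi, sd) for xi, wj in zip(grid, w)]
    xi_hat = sum(p * xi for p, xi in zip(post, grid)) / sum(post)
    alpha = c * c / (c * c + 1.0)
    return alpha * x + (1.0 - alpha) * xi_hat

random.seed(0)
c = 1.0
sigma = math.sqrt(1.0 + c * c)
# Marginally X = xi + N(0, 1 + c^2) with xi uniform on {-2, +2}.
data = [random.choice([-2.0, 2.0]) + random.gauss(0.0, sigma) for _ in range(200)]
grid = [-6.0 + 0.25 * k for k in range(49)]
weights = grid_npmle(data, grid, sigma)
theta_hat = posterior_mean(3.0, grid, weights, c)
```

The weights remain a probability vector at every iteration, and the posterior mean shrinks the observation halfway toward the estimated $\hat{\xi}_i$ because $\alpha_* = 0.5$ when $c_* = 1$.
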
Novelty & Lineage

Prior work:

  1. Kiefer-Wolfowitz (1956), Jiang-Zhang (2009), Saha-Guntuboyina (2020): Classical NPMLE achieves near-parametric denoising rates but logarithmic deconvolution rates due to discreteness.
  2. Efron (2014), Bovy et al. (2011): Smooth prior estimation via spline bases or finite Gaussian mixtures, but with non-convexity issues.
  3. Soloff et al. (2021): Established $O((\log n)^{-1})$ deconvolution rates for classical NPMLE.

    Delta: This paper introduces hierarchical Gaussian smoothing that:

    • Achieves polynomial $O(n^{-\alpha_*})$ deconvolution rates vs. logarithmic rates
    • Maintains convexity of optimization (unlike finite mixture EM)
    • Enables smooth posterior inference and optimal marginal coverage sets

    Theory-specific assessment:

    • Main theorem surprising: Yes - polynomial deconvolution rates were not expected in this generality
    • Proof technique: Genuinely new - uses Plancherel theorem to relate $L^2$ prior estimation to Hellinger marginal estimation, exploiting supersmooth structure of Gaussian convolution
    • Bound tightness: Theorem 2.2 proves the $n^{-\alpha_*}$ rate is asymptotically minimax over exponentially smooth classes $G_{r,\zeta,L;exp}$

    Verdict: SIGNIFICANT — The polynomial deconvolution rate breakthrough and minimax optimality represent clear theoretical advances that empirical Bayes researchers should know about.

Proof Techniques

Main proof strategy: The key insight is connecting prior density estimation to marginal density estimation via Plancherel’s theorem.

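  1. Fourier analysis connection: For the estimation error $\|g_{\hat{H}_n} - g_{H_*}\|_{L^2}^2$, use Plancherel. Since the characteristic function of $g_H$ is $\phi_H(t) e^{-c_*^2 t^2/2}$:

    \[\|g_{\hat{H}_n} - g_{H_*}\|_{L^2}^2 = \frac{1}{2\pi} \int |\phi_{\hat{H}_n}(t) - \phi_{H_*}(t)|^2 e^{-c_*^2 t^2} dt\]
  2. Key identity: Since $\phi_{f_H}(t) = \phi_H(t) e^{-\sigma_*^2 t^2/2}$, we have:

    \[|\phi_{\hat{H}_n}(t) - \phi_{H_*}(t)|^2 = e^{\sigma_*^2 t^2} |\phi_{f_{\hat{H}_n}}(t) - \phi_{f_{H_*}}(t)|^2\]
  3. Critical observation: The exponents combine as:

    \[e^{-c_*^2 t^2} \cdot e^{\sigma_*^2 t^2} = e^{(\sigma_*^2 - c_*^2)t^2} = e^{t^2}\]

    since $\sigma_*^2 = 1 + c_*^2$. This weight grows with $|t|$, so the marginal error controls the prior error only at low frequencies. At high frequencies the factor $e^{-c_*^2 t^2}$, together with $|\phi_{\hat{H}_n}(t) - \phi_{H_*}(t)| \leq 2$, gives a Gaussian tail. Balancing the two regimes yields the exponent $\alpha_*$ in the final bound.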

  4. Rate transfer: Apply existing marginal density bounds:

    \[\mathbb{E}[H^2(f_{\hat{H}_n}, f_{H_*})] \lesssim \epsilon_n^2(M, S, H_*)\]
  5. Final bound: Combining via Plancherel yields:

    \[\mathbb{E}[\|g_{\hat{H}_n} - g_{H_*}\|_{L^2}^2] \lesssim \epsilon_n^{2\alpha_*}(M, S, H_*)\]

    The power $\alpha_* = \frac{c_*^2}{c_*^2 + 1} \in (0,1)$ captures the smoothing benefit - larger $c_*$ gives faster rates.
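
One way to see where an exponent of the form $\alpha_*$ can come from is a frequency-cutoff balance. This is an assumption about the shape of the argument, not a quotation from the paper: below a cutoff $T$ the marginal error $\epsilon^2$ is amplified by at most $e^{T^2}$, while above it the Gaussian factor leaves a tail of order $e^{-c_*^2 T^2}$. The sketch equates the two terms and ignores constants and logarithmic factors.

```python
import math

def smoothing_exponent(c):
    """alpha_* = c^2 / (c^2 + 1)."""
    return c * c / (c * c + 1.0)

def balanced_cutoff(eps, c):
    """Cutoff T solving exp(T^2) * eps^2 = exp(-c^2 * T^2)."""
    return math.sqrt(2.0 * math.log(1.0 / eps) / (1.0 + c * c))

eps, c = 1e-3, 1.0
T = balanced_cutoff(eps, c)
low_freq = math.exp(T * T) * eps ** 2     # amplified marginal error below the cutoff
high_freq = math.exp(-c * c * T * T)      # Gaussian tail above the cutoff
target = eps ** (2.0 * smoothing_exponent(c))
```

At the balanced cutoff both terms equal $\epsilon^{2\alpha_*}$, matching the form of the final bound; a larger $c_*$ pushes the exponent toward 1.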

Experiments & Validation

Primarily theoretical. The paper provides simulated illustrations in Figures 1-2 comparing the smooth NPMLE with the classical discrete NPMLE for two-component normal mixture and Laplace priors. A fuller empirical validation would require:

  1. Deconvolution experiments: Compare $L^2$ convergence rates of smooth vs classical NPMLE across different $c_*$ values and sample sizes
  2. Coverage experiments: Test marginal coverage properties of proposed confidence sets vs standard methods
  3. Real data validation: Apply to genomics, astronomy, or other empirical Bayes applications where smooth priors are plausible
  4. Computational benchmarks: Compare optimization time for convex smooth NPMLE vs non-convex finite mixture EM
Limitations & Open Problems

Limitations:

  1. Smoothness assumption - NATURAL: Restricting priors to Gaussian convolutions $G_* = H_* \star N(0, c_*^2)$ is reasonable for many applications where some smoothing is expected.

  2. Compact support restriction (Assumption A1) - TECHNICAL: Only needed for misspecification results; likely removable with more sophisticated entropy arguments.

  3. Known noise variance - NATURAL: Assuming $\sigma_i^2$ known is standard in empirical Bayes; extensions to unknown variances possible.

  4. Identifiability of $c_*$ - RESTRICTIVE: Only the largest Gaussian component $c_0$ is identifiable, requiring additional estimation step that affects downstream uncertainty.

    Open problems:

  1. Non-Gaussian likelihoods: Extend polynomial deconvolution rates to exponential families beyond normal
  2. Adaptive smoothing: Develop data-driven methods for choosing $c_*$ that optimize bias-variance tradeoff automatically

Stability of supermartingale optimal transport problems

Authors: Shuoqing Deng, Gaoyue Guo, Dominykas Norgilas · Institution: HKUST, Université Paris-Saclay, North Carolina State University · Category: math.PR

Establishes stability of weak supermartingale optimal transport problems by proving approximation theorems in adapted Wasserstein topology and monotonicity principles for optimal couplings.

Tags: optimal transport supermartingale couplings adapted Wasserstein distance stability theory weak optimal transport mathematical finance stochastic orders approximation theory


Problem Formulation
  1. Motivation: Constrained optimal transport has become central to mathematical finance, particularly martingale optimal transport (MOT) for model-independent pricing. This paper studies weak supermartingale optimal transport (WSOT), where costs depend on conditional laws rather than just point locations, extending both classical supermartingale transport and weak martingale transport.

  2. Mathematical setup: Let $(\Omega, \mathcal{F}, P)$ be a probability space. For probability measures $\mu, \nu \in \mathcal{P}_r(\mathbb{R})$, define the supermartingale constraint set:

    \[\Pi_S(\mu, \nu) := \left\{\pi \in \Pi(\mu, \nu) : \int_{\mathbb{R}} y \pi_x(dy) \leq x \text{ for } \mu\text{-a.e. } x\right\}\]

    where $\pi(dx, dy) = \mu(dx)\pi_x(dy)$ is a disintegration. The feasibility condition is characterized by the decreasing convex order:

    \[\Pi_S(\mu, \nu) \neq \emptyset \iff \mu \preceq_{cd} \nu\]

    For a cost function $C: \mathbb{R} \times \mathcal{P}_r \to \mathbb{R}$, the weak supermartingale optimal transport problem is:

    \[V_S^C(\mu, \nu) := \inf_{\pi \in \Pi_S(\mu, \nu)} \int_{\mathbb{R}} C(x, \pi_x) \mu(dx)\]

    Assumptions:

    1. $C$ is measurable and continuous in the second argument
    2. Growth condition: $|C(x, m)| \leq K(1 + |x|^r + \int |y|^r m(dy))$ for some $K > 0$
    3. Convergence assumption: $(\mu^k, \nu^k) \to (\mu, \nu)$ in $W_r$ with $\mu^k \preceq_{cd} \nu^k$
  3. Toy example: Take $\mu = \delta_0$, $\nu = \frac{1}{2}\delta_{-1} + \frac{1}{2}\delta_1$, and $C(x, m) = \int y^2 m(dy)$. Then $\mu \preceq_{cd} \nu$, since the put potentials $P_\mu(t) = \int (t - y)^+ \mu(dy)$ satisfy $P_\mu(t) = t^+ \leq \frac{1}{2}(t+1)^+ + \frac{1}{2}(t-1)^+ = P_\nu(t)$ for all $t$. Because $\mu$ is a Dirac mass, the only coupling is $\pi = \delta_0 \otimes \nu$; it is feasible because $\int y \, \nu(dy) = 0 \leq 0$, so $V_S^C(\mu, \nu) = \int y^2 \nu(dy) = 1$.

  4. Formal objective: The main quantities to establish are:

    \[\lim_{k \to \infty} AW_r(\pi^k, \pi) = 0\]

    for approximating sequences, and:

    \[\lim_{k \to \infty} V_S^C(\mu^k, \nu^k) = V_S^C(\mu, \nu)\]
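
The order condition and the value in the toy example can be checked directly for discrete measures. A minimal sketch, assuming the put potential is $P_\mu(t) = \int (t - y)^+ \mu(dy)$ and checking the inequality only on a finite grid of $t$ values:

```python
def put_function(atoms, weights, t):
    """P(t) = integral of (t - y)^+ against a discrete measure."""
    return sum(w * max(t - y, 0.0) for y, w in zip(atoms, weights))

mu = ([0.0], [1.0])
nu = ([-1.0, 1.0], [0.5, 0.5])
ts = [-3.0 + 0.01 * k for k in range(601)]
# Grid check of P_mu <= P_nu, the put-potential form of the order condition.
dominated = all(put_function(*mu, t) <= put_function(*nu, t) + 1e-12 for t in ts)

# mu is a Dirac mass, so the only coupling is delta_0 (x) nu and the kernel at 0 is nu.
kernel_mean = sum(w * y for y, w in zip(*nu))   # must be <= 0 for feasibility
value = sum(w * y * y for y, w in zip(*nu))     # C(0, nu) = integral of y^2 d nu
```

Here the kernel mean is 0, so the constraint holds with equality, and the cost of the unique coupling is 1.
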
Method

The main method consists of several key steps for constructing approximating supermartingale couplings:

  1. Irreducible decomposition: Decompose $\pi \in \Pi_S(\mu, \nu)$ using the critical point $x^* := \sup\{x \in \mathbb{R} : P_\mu(x) = P_\nu(x)\}$ into:

    \[\pi = \sum_{n \geq -1} \pi_n\]

    where $\pi_{-1}$ is diagonal transport, $\pi_0 \in \Pi_S(\mu_0, \nu_0)$ is the supermartingale component, and $\pi_n \in \Pi_M(\mu_n, \nu_n)$ are martingale components for $n \geq 1$.

  2. Regularization and localization: Apply truncation to kernels:

    \[\pi_x^R := \pi_x \wedge_{cd} \left(\frac{R - \overline{\pi_x}}{2R}\delta_{-R} + \frac{R + \overline{\pi_x}}{2R}\delta_R\right)\]
    for $|x| \leq R$, and affine contraction $\pi_x^\alpha := (T_{x,\alpha})_\#\pi_x$ where $T_{x,\alpha}(y) := \alpha y + (1-\alpha)x$.
  3. Target measure construction: Define intermediate measures:

    \[\nu^{R,\alpha,k} := \nu^k \wedge_c (\mu^k \vee_{cd} T_{\Delta_k}\#\nu^{R,\alpha})\]

    where $\Delta_k := \text{bary}(\nu^k) - \text{bary}(\nu^{R,\alpha})$.

  4. Barycentre correction: For kernels $\hat{\pi}_x^k$ that may violate the supermartingale constraint, define correction factors:

    \[c_k(x) := \frac{(\int y \hat{\pi}_x^k(dy) - x)^+}{\int (x-y) \nu_{-}^{R,\alpha,k}(dy)}\]

    and corrected kernels:

    \[\tilde{\pi}_x^k := \frac{\hat{\pi}_x^k + c_k(x) \nu_{-}^{R,\alpha,k}}{1 + c_k(x) \nu_{-}^{R,\alpha,k}(\mathbb{R})}\]
  5. Completion and gluing: Show that remainder measures satisfy:

    \[\mu_{\text{rem}}^k := \mu^k - (1-2\varepsilon)\hat{\mu}^k \preceq_{cd} \nu_{\text{rem}}^k := \varepsilon\nu^k + (1-\varepsilon)\nu^{R,\alpha,k} - (1-2\varepsilon)\tilde{\nu}^k\]

    This requires proving the key inequality on $J^c$:

    \[P_{\nu^{R,\alpha,k}} - P_{\mu^k} \geq P_{(1-2\varepsilon)\tilde{\nu}^k} - P_{(1-2\varepsilon)\hat{\mu}^k}\]

    Applied to toy example: For $\mu = \delta_0$, $\nu = \frac{1}{2}\delta_{-1} + \frac{1}{2}\delta_1$, the method constructs approximating sequences by first localizing around a compact interval containing 0, then using the free mass from the negative part of the support to correct barycentres while maintaining the supermartingale constraint $\int y \pi_0(dy) \leq 0$.
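
The barycentre correction in step 4 can be illustrated on discrete measures. The sketch below assumes a kernel of total mass 1 and a free-mass measure supported strictly below $x$; the specific atoms and weights are hypothetical.

```python
def mean(atoms, weights):
    """First moment of a discrete measure."""
    return sum(w * y for y, w in zip(atoms, weights))

def barycentre_correction(x, kernel, free_mass):
    """Mix a kernel with left-tail mass so that its barycentre is at most x.

    kernel and free_mass are (atoms, weights) pairs; kernel has total mass 1
    and every atom of free_mass lies strictly below x.
    """
    k_atoms, k_w = kernel
    f_atoms, f_w = free_mass
    excess = max(mean(k_atoms, k_w) - x, 0.0)
    c = excess / sum(w * (x - y) for y, w in zip(f_atoms, f_w))
    norm = 1.0 + c * sum(f_w)
    atoms = list(k_atoms) + list(f_atoms)
    weights = [w / norm for w in k_w] + [c * w / norm for w in f_w]
    return atoms, weights

x = 0.0
kernel = ([-1.0, 2.0], [0.5, 0.5])       # barycentre 0.5 > x, constraint violated
free_mass = ([-3.0, -2.0], [0.1, 0.1])   # hypothetical left-tail mass
atoms, weights = barycentre_correction(x, kernel, free_mass)
```

When the constraint is violated, the correction factor adds just enough left-tail mass to bring the barycentre back to exactly $x$; when it already holds, the factor is zero and the kernel is unchanged.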

Novelty & Lineage

Prior work: The closest prior results are:

  1. Beiglböck, Jourdain, Margheriti, Pammer (2020): Established approximation theorems for martingale optimal transport in adapted Wasserstein topology, proving that martingale couplings can be approximated when marginals converge.

  2. Guo and Obłój (2019): Initiated stability theory for weak martingale optimal transport, focusing on costs depending on conditional laws rather than just locations.

  3. Beiglböck and Juillet (2016): Developed monotonicity principles for martingale optimal transport, introducing cyclical monotonicity and left-curtain couplings.

    Delta: This paper extends the martingale approximation theory of Beiglböck et al. to the supermartingale setting. The key technical advance is handling inequality constraints ($\int y \pi_x(dy) \leq x$) rather than equality constraints ($\int y \pi_x(dy) = x$).

    Theory-specific assessment:

    • Main theorem predictability: The extension from martingales to supermartingales is natural but technically demanding. The result follows expected patterns from martingale theory.
    • Proof techniques: The proof requires genuinely new techniques beyond the martingale case. The completion step (Proposition 3.9) involves a delicate analysis of put potentials on complementary regions where both sides of the inequalities are non-trivial, unlike in the martingale case.
    • Bound tightness: No lower bounds are established. The approximation rates depend on Wasserstein convergence rates but are not quantified optimally.

    The monotonicity principle (Theorem 2.6) extends known characterizations but follows established patterns from martingale optimal transport.

    Verdict: INCREMENTAL — This is a solid technical extension of martingale approximation theory to supermartingales, requiring new proof techniques but following predictable directions from prior work.

Proof Techniques

The proof employs several key techniques:

  1. Irreducible decomposition and reduction: Uses the critical point characterization to decompose supermartingale couplings. The key reduction shows it suffices to prove the result for irreducible components where $I = \{x : P_\mu(x) < P_\nu(x)\}$ is a single interval.

  2. Two-stage regularization: First applies kernel truncation preserving supermartingale property:

    \[W_1(\pi_x^R, \pi_x) \leq 2\int |y| \pi_x(dy)\]

    Then affine contraction with explicit bound:

    \[AW_1(\pi^\alpha, \pi) \leq (1-\alpha)\left(\int |x|\mu(dx) + \int |y|\nu(dy)\right)\]
  3. Localization via adapted Wasserstein lemma: The crucial Lemma 4.2 provides approximation on compact sets. For compact $K \subset \mathbb{R}$ and target measures, constructs sub-couplings $\hat{\pi}^k$ with:

    \[AW_1(\hat{\pi}^k, (1-\varepsilon_k)\pi|_{K \times \mathbb{R}}) \to 0\]
  4. Barycentre correction using free mass: The key technical innovation corrects supermartingale violations by mixing with measures from the left tail. For correction factors:

    \[c_k(x) = \frac{(\int y \hat{\pi}_x^k(dy) - x)^+}{\int (x-y) \nu_{-}^{R,\alpha,k}(dy)}\]

    This ensures $\int y \tilde{\pi}_x^k(dy) \leq x$ while controlling approximation error.

  5. Put potential comparison on complementary sets: The most delicate step proves the domination:

    \[P_{\nu^{R,\alpha,k}} - P_{\mu^k} \geq P_{(1-2\varepsilon)\tilde{\nu}^k} - P_{(1-2\varepsilon)\hat{\mu}^k} \quad \text{on } J^c\]

    This requires careful analysis using definitions of all auxiliary measures, exploiting that the $\varepsilon$ terms provide necessary slack for the inequality.

  6. Strassen-type completion: Uses the decreasing convex order characterization to complete sub-probability measures to full couplings, then applies quantitative Strassen theorems:

    \[\int |z-y| M^k(dz,dy) \leq 2W_1(\varepsilon\nu^k + (1-\varepsilon)\nu^{R,\alpha,k}, \nu^k)\]
  7. Gluing estimate: The final approximation bound combines all steps:

    \[\limsup_{k \to \infty} AW_1(\pi^k, \pi) \leq C\varepsilon\]

    where $C$ depends only on first moments of limiting marginals.
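
The affine-contraction bound in item 2 rests on a kernel-level estimate, $W_1(\pi_x^\alpha, \pi_x) \leq (1-\alpha)\int |x - y| \pi_x(dy)$, which integrates to the stated bound by the triangle inequality. That intermediate inequality is a reading of the argument rather than a statement quoted from the paper. The sketch checks it for a hypothetical discrete kernel.

```python
def w1_discrete(a_atoms, a_w, b_atoms, b_w):
    """W_1 on the real line as the integral of |F_a - F_b| between consecutive atoms."""
    points = sorted(set(a_atoms) | set(b_atoms))
    total, Fa, Fb = 0.0, 0.0, 0.0
    for left, right in zip(points, points[1:]):
        Fa += sum(w for y, w in zip(a_atoms, a_w) if y == left)
        Fb += sum(w for y, w in zip(b_atoms, b_w) if y == left)
        total += abs(Fa - Fb) * (right - left)
    return total

def affine_contraction(x, atoms, alpha):
    """Push atoms through T_{x,alpha}(y) = alpha * y + (1 - alpha) * x."""
    return [alpha * y + (1.0 - alpha) * x for y in atoms]

x, alpha = 0.0, 0.75
atoms, weights = [-2.0, 0.0, 1.0], [0.5, 0.25, 0.25]   # barycentre -0.75 <= x
contracted = affine_contraction(x, atoms, alpha)
distance = w1_discrete(contracted, weights, atoms, weights)
bound = (1.0 - alpha) * sum(w * abs(x - y) for y, w in zip(atoms, weights))
new_mean = sum(w * y for y, w in zip(contracted, weights))
```

In one dimension the map $T_{x,\alpha}$ is increasing, so the coupling $(y, T_{x,\alpha}(y))$ is monotone and the bound is attained here. The contracted barycentre $\alpha \bar{y} + (1-\alpha)x$ stays below $x$ whenever $\bar{y} \leq x$, so the supermartingale constraint is preserved.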

Experiments & Validation

Purely theoretical. The paper contains no numerical experiments or empirical validation.

Empirical validation would involve:

  1. Computing optimal supermartingale couplings numerically for specific marginal pairs
  2. Testing approximation quality as discretization parameters vary
  3. Comparing convergence rates with theoretical predictions
  4. Applying to model-independent pricing problems in mathematical finance with real market data
Limitations & Open Problems

Limitations:

  1. Dimension restriction: Results proven only on $\mathbb{R}$. In higher dimensions, even martingale optimal transport can lose stability properties — RESTRICTIVE (significantly limits applicability to multivariate financial models).

  2. Growth and regularity conditions: Cost function $C$ requires polynomial growth bounds $|C(x,m)| \leq K(1 + |x|^r + \int |y|^r m(dy))$ and continuity in second argument - NATURAL (standard in optimal transport literature).
  3. Wasserstein convergence assumption: Requires $W_r$-convergence of marginals, which is stronger than weak convergence — TECHNICAL (likely removable with more work, as weak convergence plus moment bounds often suffices).

  4. No quantitative rates: Approximation bounds are not explicit in terms of $W_r(\mu^k, \mu)$ and $W_r(\nu^k, \nu)$ — TECHNICAL (refinement of proof techniques could yield rates).

  5. Convexity requirement for monotonicity: Theorem 2.6 requires cost convexity in second argument — NATURAL (standard assumption for monotonicity principles in optimal transport).

    Open problems:

  1. Multidimensional extension: Can stability results extend to $\mathbb{R}^d$ with $d \geq 2$? The martingale case already fails in higher dimensions, suggesting fundamental obstructions.

  2. Quantitative approximation rates: Establish explicit bounds $AW_r(\pi^k, \pi) \leq f(W_r(\mu^k, \mu), W_r(\nu^k, \nu))$ for some function $f$, potentially with optimal dependence on problem parameters.