Theory 3 papers

Theory Digest — May 4, 2026

Today’s Digest at a Glance

Preliminary

Today’s papers explore advanced theoretical frameworks for Markov chain analysis, stochastic optimization, and statistical convergence, introducing novel entropy correction methods and extending classical results to more general settings.

Hypocoercive Entropy Analysis

Hypocoercivity is a fundamental concept in kinetic theory that addresses the challenge of proving exponential convergence to equilibrium for degenerate diffusion processes where the noise does not act directly on all components of the state space. The classical example is the underdamped Langevin equation, where random perturbations only affect velocities but not positions directly, yet the system still converges to equilibrium through the coupling between position and velocity.

The naive approach of applying standard coercivity techniques fails because the generator of the process is not elliptic—it lacks sufficient “mixing” in all directions. Hypocoercivity theory overcomes this by constructing modified entropy functionals that capture the indirect mixing through the system’s internal dynamics. The key insight is to add correction terms to the standard relative entropy that account for the transport of mass along characteristics of the kinetic equation.

Mathematically, instead of studying just the entropy $\text{Ent}_\mu(g) = \int g \log g \, d\mu$ of a density $g$ relative to $\mu$, hypocoercive methods introduce modified functionals of the form $H_\epsilon(g) = \text{Ent}_\mu(g) + \epsilon \mathcal{C}(g)$ where $\mathcal{C}(g)$ is a corrector that couples different components of the state space. The corrector compensates for the degeneracy by coupling the components that receive noise to those that do not, so that the modified functional also dissipates in directions where direct noise is absent.

Wasserstein correctors represent a recent advancement in hypocoercive analysis that use optimal transport theory to construct these correction terms. Rather than ad-hoc geometric constructions, Wasserstein correctors are built from transport maps that optimally move mass from the current distribution to equilibrium. The corrector $\text{COT}(g) = \int \Pi_v(vg) \cdot (x - T_q(x)) \mu_x(dx)$ measures the correlation between velocity currents and spatial displacement from optimal transport, creating a bridge between kinetic theory and optimal transport that enables sharp entropy decay rates.

Reading Guide

The first paper develops sharp hypocoercive bounds using Wasserstein correctors for underdamped Langevin dynamics, establishing fundamental convergence rates for MCMC theory. The second paper extends Merton’s portfolio optimization to rough volatility models with jumps using martingale methods and Riccati BSDEs. The third paper provides variance control techniques for Markov chain weak convergence, connecting to the broader theme of quantitative convergence analysis across different stochastic systems.


A sharp hypocoercive entropy decay estimate for underdamped Langevin dynamics

Authors: Jianfeng Lu · Institution: Duke University · Category: math.AP

Establishes sharp $\sqrt{\rho}$-rate entropy decay for underdamped Langevin dynamics using a novel Wasserstein corrector that couples velocity current to optimal transport displacement.

Tags: hypocoercivity Langevin dynamics entropy methods optimal transport MCMC theory kinetic Fokker-Planck logarithmic Sobolev inequalities

Problem Formulation
  1. Motivation: The underdamped Langevin dynamics is a fundamental model for sampling from high-dimensional distributions and studying nonequilibrium relaxation. Understanding its convergence rate to equilibrium is crucial for both theoretical analysis and practical MCMC algorithms.

  2. Mathematical setup: Consider the underdamped Langevin dynamics on $\mathbb{R}^d \times \mathbb{R}^d$ with position-velocity state $(x,v)$. The invariant measure is:

    \[\mu(dx \, dv) \propto e^{-U(x) - |v|^2/2} dx \, dv\]

    where $U$ is the potential function. The dynamics follows the Fokker-Planck equation:

    \[\partial_t p_t + v \cdot \nabla_x p_t - \nabla U(x) \cdot \nabla_v p_t = \gamma \nabla_v \cdot (v p_t + \nabla_v p_t)\]

    Assumptions:

    • $U$ is convex on $\mathbb{R}^d$, and $\mu_x(dx) \propto e^{-U(x)} dx$ satisfies a logarithmic Sobolev inequality with constant $\rho$: $\text{Ent}_{\mu_x}(f) \leq \frac{1}{2\rho} \int \frac{|\nabla f|^2}{f} d\mu_x$
    • $U$ has polynomial growth with tame confining bounds

  3. Toy example: When $d=1$ and $U(x) = \rho x^2/2$ (quadratic potential), the system reduces to an Ornstein-Uhlenbeck process in phase space. The optimal transport displacement $\xi_q(x) = x - T_q(x)$ measures how far the current position marginal is from equilibrium.

  4. Formal objective: Establish exponential decay of relative entropy:

    \[\text{Ent}(p_t | \mu) \leq C e^{-\Lambda t} \text{Ent}(p_0 | \mu)\]

    with a rate $\Lambda$ of order $\sqrt{\rho}$, the optimal scaling in $\rho$.
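
For the quadratic toy potential the $\sqrt{\rho}$ scale can be checked by hand, because the dynamics are linear. The sketch below assumes the SDE form $dx = v\,dt$, $dv = -\rho x\,dt - \gamma v\,dt + \sqrt{2\gamma}\,dW$ associated with the Fokker-Planck equation above, and computes the slowest decay rate of the mean $(E[x], E[v])$ from the drift matrix. This is only the relaxation rate of the mean, not the entropy constant proved in the paper, and the function name is illustrative.

```python
import numpy as np

def mean_decay_rate(rho: float, gamma: float) -> float:
    """Slowest exponential decay rate of E[(x, v)] for U(x) = rho * x**2 / 2."""
    # Drift matrix of the linear system d/dt E[(x, v)] = drift @ E[(x, v)].
    drift = np.array([[0.0, 1.0], [-rho, -gamma]])
    # The rate is minus the largest real part among the eigenvalues.
    return float(-np.max(np.linalg.eigvals(drift).real))

rho = 0.01
for big_gamma in (0.5, 1.0, 2.0, 4.0):
    gamma = big_gamma * np.sqrt(rho)  # friction scaled as Gamma * sqrt(rho)
    ratio = mean_decay_rate(rho, gamma) / np.sqrt(rho)
    print(f"Gamma = {big_gamma}: rate / sqrt(rho) = {ratio:.3f}")
```

Analytically the rate equals $\gamma/2$ for $\gamma \leq 2\sqrt{\rho}$ and $(\gamma - \sqrt{\gamma^2 - 4\rho})/2$ for larger $\gamma$, so it peaks at $\gamma = 2\sqrt{\rho}$ with value $\sqrt{\rho}$, compared with $\rho$ for the mean of the overdamped dynamics.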

Method

The method introduces a modified entropy functional with a nonlinear Wasserstein corrector:

\[H_\epsilon(g) = \text{Ent}_\mu(g) + \epsilon \text{COT}(g)\]

where the corrector is:

\[\text{COT}(g) = \int \Pi_v(vg) \cdot (x - T_q(x)) \mu_x(dx)\]

Here $\Pi_v$ denotes averaging over velocity, $q = \Pi_v g$ is the position marginal, and $T_q$ is the Brenier optimal transport map from $q\mu_x$ to $\mu_x$.

Algorithm steps:

  1. Choose friction parameter $\gamma = \Gamma\sqrt{\rho}$ and corrector weight $\epsilon = \theta\sqrt{\rho}$
  2. Set $\theta = \min\{\Gamma/12, 1/(4\Gamma)\}$
  3. Establish equivalence: $(1-\theta)\text{Ent}(g) \leq H_\epsilon(g) \leq (1+\theta)\text{Ent}(g)$
  4. Prove Lyapunov inequality: $\frac{d}{dt} H_\epsilon(g_t) \leq -\lambda_\Gamma \sqrt{\rho} H_\epsilon(g_t)$
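
The parameter choices in steps 1 and 2 are simple enough to tabulate. The following is a minimal sketch; the function name and the dictionary keys are illustrative and not taken from the paper.

```python
import math

def corrector_parameters(rho: float, big_gamma: float) -> dict:
    """Friction gamma, corrector weight epsilon and theta from steps 1-2."""
    theta = min(big_gamma / 12.0, 1.0 / (4.0 * big_gamma))
    return {
        "gamma": big_gamma * math.sqrt(rho),   # friction parameter
        "epsilon": theta * math.sqrt(rho),     # weight of the corrector in H_epsilon
        "theta": theta,
    }

for big_gamma in (0.5, 1.0, math.sqrt(3.0), 4.0):
    print(big_gamma, corrector_parameters(rho=1.0, big_gamma=big_gamma))
```

The two branches of the minimum cross at $\Gamma = \sqrt{3}$, so $\theta \leq \sqrt{3}/12 \approx 0.144$ for every $\Gamma$, and the equivalence constants $1 \pm \theta$ in step 3 stay within about 15% of 1.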

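    Applied to toy example: For quadratic $U(x) = \rho x^2/2$ in $d=1$, the Brenier map is the monotone rearrangement $T_q = F_{\mu_x}^{-1} \circ F_{q\mu_x}$ (with $F$ denoting cumulative distribution functions), which is affine whenever the position marginal $q\mu_x$ is Gaussian. The displacement $\xi_q(x) = x - T_q(x)$ satisfies $\int |\xi_q|^2 q \, d\mu_x = W_2^2(q\mu_x, \mu_x)$, so the corrector pairs the velocity current with a displacement whose size is the Wasserstein-2 distance between current and target position marginals.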
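
In one dimension the Brenier map can be evaluated directly as a monotone rearrangement. The sketch below assumes a Gaussian current marginal (a hypothetical choice made only for illustration) and the target $\mu_x = \mathcal{N}(0, 1/\rho)$ of the quadratic potential; it uses the standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

def brenier_map_1d(source: NormalDist, target: NormalDist, x: float) -> float:
    """Monotone rearrangement T = F_target^{-1} o F_source (1D Brenier map)."""
    return target.inv_cdf(source.cdf(x))

rho = 4.0
target = NormalDist(mu=0.0, sigma=rho ** -0.5)  # mu_x for U(x) = rho * x**2 / 2
source = NormalDist(mu=1.0, sigma=1.0)          # hypothetical current position marginal

for x in (-1.0, 0.0, 1.0, 2.5):
    displacement = x - brenier_map_1d(source, target, x)  # xi_q(x) = x - T_q(x)
    print(f"x = {x}: displacement = {displacement:.4f}")
```

For two Gaussians the map is affine, $T(x) = m_{\text{target}} + (\sigma_{\text{target}}/\sigma_{\text{source}})(x - m_{\text{source}})$, which gives a quick sanity check on the numerical values.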

Novelty & Lineage

Step 1 — Prior work: Closest papers include:

  • Dolbeault-Mouhot-Schmeiser (2015): “Hypocoercivity for linear kinetic equations” - established $O(\sqrt{m})$ rates using modified $L^2$ methods where $m$ is Poincaré constant
  • Cao-Lu-Wang (2021): “Space-time Poincaré inequality approach” - achieved sharp $O(\sqrt{m})$ rates for $L^2$ convergence under convexity
  • Villani (2009): “Hypocoercivity memoir” - systematic functional framework using mixed position-velocity correctors

Step 2 — Delta: This paper contributes:

  • First sharp $\sqrt{\rho}$ entropy convergence rate (vs. previous $L^2$ results)
  • Nonlinear Wasserstein corrector replacing standard linear mixed derivatives
  • Constants depending only on LSI constant $\rho$, not auxiliary Sobolev norms
  • Extension from $L^2$ to entropy setting with explicit constants

Step 3 — Theory-specific assessment:

  • Main theorem is not surprising given prior $L^2$ results, but entropy case required genuinely new techniques
  • Proof introduces novel Wasserstein corrector - this is a non-trivial technical innovation beyond assembling known lemmas
  • Achieves optimal $\sqrt{\rho}$ rate matching known lower bounds for this problem class
  • The $\sqrt{\rho}$ scaling appears tight; the explicit constants are not claimed to be optimal

Verdict: SIGNIFICANT — clear advance extending sharp hypocoercive rates from $L^2$ to entropy with new Wasserstein technique that entropy theorists should know.

Proof Techniques

Main proof uses modified entropy method with Wasserstein corrector. Key steps:

  1. Corrector bound: Establishes

    \[|\text{COT}(g)| \leq \rho^{-1/2} \text{Ent}(g)\]

    using Talagrand’s inequality and current estimates.

  2. Wasserstein acceleration inequality: For position marginal $\nu_t = q_t \mu_x$, proves

    \[\frac{d}{dt} \text{COT}(g_t) \leq J(g_t) + \int \xi_t \cdot \left[\partial_t j_t - \nabla_x^* \left(\frac{j_t \otimes j_t}{q_t}\right)\right] d\mu_x\]

    using Benamou-Brenier formula and characteristic flow analysis.

  3. Stress estimate for Brenier maps: Key technical inequality

    \[-A(q) + S(q,\Theta) \leq \frac{1}{\beta} \text{Ent}_v(g) - \text{Ent}_x(q)\]

    where $A(q) = \int \nabla q \cdot \xi_q d\mu_x$ and $S(q,\Theta) = \int \xi_q \cdot \nabla_x^* \Theta d\mu_x$. Uses:

    • Alexandrov second derivative: $DT = G(x)dx + D^s T$ with $G \geq 0$
    • Monge-Ampère identity: $q(x)r(x) = r(T(x))\det G(x)$
    • BV integration by parts for singular transport maps
  4. Lyapunov inequality closure: Combining above gives

    \[\frac{d}{dt} H_\epsilon(g_t) \leq -\sqrt{\rho}\left[\frac{\Gamma}{2} I_v(g_t) + \frac{\theta}{2} \text{Ent}_x(q_t)\right]\]
  5. Approximation argument: Uses Hérau-Nier smoothing to extend from regular solutions to arbitrary finite-entropy initial data.
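
As a worked check of how these pieces combine, the following short derivation uses only the corrector bound from step 1, the weight $\epsilon = \theta\sqrt{\rho}$, and the Lyapunov inequality stated in the Method section; $\text{Ent}(g)$ is shorthand for $\text{Ent}_\mu(g)$. The corrector bound gives the equivalence of $H_\epsilon$ and the entropy:

\[\epsilon |\text{COT}(g)| \leq \theta\sqrt{\rho} \cdot \rho^{-1/2} \text{Ent}(g) = \theta \, \text{Ent}(g) \quad \Longrightarrow \quad (1-\theta)\text{Ent}(g) \leq H_\epsilon(g) \leq (1+\theta)\text{Ent}(g)\]

Grönwall's lemma applied to $\frac{d}{dt} H_\epsilon(g_t) \leq -\lambda_\Gamma \sqrt{\rho} H_\epsilon(g_t)$ then yields

\[\text{Ent}(g_t) \leq \frac{H_\epsilon(g_t)}{1-\theta} \leq \frac{e^{-\lambda_\Gamma \sqrt{\rho} t}}{1-\theta} H_\epsilon(g_0) \leq \frac{1+\theta}{1-\theta} e^{-\lambda_\Gamma \sqrt{\rho} t} \text{Ent}(g_0)\]

so the formal objective holds with $C = (1+\theta)/(1-\theta)$ and $\Lambda = \lambda_\Gamma \sqrt{\rho}$. Since $\theta = \min\{\Gamma/12, 1/(4\Gamma)\} \leq \sqrt{3}/12$, this prefactor is at most about 1.34.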

Experiments & Validation

Purely theoretical. Empirical validation would involve:

  • Numerical verification of the $\sqrt{\rho}$ scaling for various potentials
  • Comparison with overdamped Langevin (rate $\rho$) showing acceleration
  • Testing constants’ dependence on dimension $d$ and friction parameter $\Gamma$
  • MCMC sampling experiments on high-dimensional log-concave targets

Limitations & Open Problems

Limitations:

  1. Convexity assumption on $U$ - RESTRICTIVE (rules out multimodal distributions, many ML applications)
  2. Polynomial growth bounds (Assumption 2.2) - TECHNICAL (used only for regularity approximation)
  3. Logarithmic Sobolev inequality assumption - NATURAL (standard in this literature)
  4. Explicit constants may not be optimal - TECHNICAL (proof technique limitation)

Open problems:

  1. Extension to non-convex potentials while maintaining $\sqrt{\rho}$ rates
  2. Removing polynomial growth assumptions or replacing with weaker tail conditions

Optimal Merton’s Problem under Multivariate Affine Volterra Models with Jumps

Authors: Sigui Brice Dro, Emmanuel Gnabeyeu · Institution: Sorbonne Université · Category: math.OC

Extends Merton’s portfolio optimization to multivariate rough volatility models with jumps using martingale optimality principle and Riccati BSDEs with jumps.

Tags: portfolio optimization rough volatility Volterra processes backward stochastic differential equations jumps Merton problem affine models fractional processes

Problem Formulation

Motivation: This paper extends Merton’s portfolio optimization to multivariate Volterra models with jumps. Classical approaches fail because these models are non-Markovian and non-semimartingale, preventing use of Hamilton-Jacobi-Bellman equations.

Mathematical setup: Consider a financial market with $d+1$ assets: one bond $S^0_t = S^0_0 e^{\int_0^t r(s)ds}$ and $d$ risky assets following:

\[dS^i_t = S^i_t(r(t) + \theta_i \sqrt{V^i_t})dt + S^i_t \sqrt{V^i_t} dB^i_t\]

The volatility process $V = (V^1, \ldots, V^d)^T$ follows a multivariate affine Volterra equation:

\[V_t = \phi(t)V_0 + \int_0^t K(t-s)\left[(\mu(s) + DV_s)\,ds + \sigma^v \varsigma(s)\sqrt{\text{diag}(V_s)}\,dW_s + \int_E \eta(e)\,\tilde{N}(ds, de)\right]\]

Here:

  1. $K = \text{diag}(K_1, \ldots, K_d)$ are completely monotone kernels
  2. $D$ is a matrix with $D_{ij} \geq 0$ for $i \neq j$
  3. $\tilde{N}(dt, de) = N(dt, de) - \xi(V_t, de)dt$ is compensated Poisson random measure
  4. $\xi(V_t, de) = \nu_0(de) + \sum_{i=1}^d V^i_t \nu_i(de)$

    Toy example: When $d=2$, $K_i(t) = t^{\alpha_i-1}/\Gamma(\alpha_i)$ with $\alpha_1 = 0.6, \alpha_2 = 0.9$, and $\eta(e) = \kappa e \mathbf{1}_{e>0}$, this captures rough volatility with different Hurst parameters and upward jumps.

    Formal objective: Maximize expected utility:

    \[V(x_0, V_0) = \sup_{\alpha \in \mathcal{A}} E[U(X^{\alpha}_T)]\]

    where $U$ is exponential, power, or logarithmic utility.

Method

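The method uses the martingale optimality principle: construct a family of processes $J^{\alpha}$, one for each admissible strategy $\alpha$, with terminal value $J^{\alpha}_T = U(X^{\alpha}_T)$ and an initial value that does not depend on $\alpha$, such that $J^{\alpha}$ is a supermartingale for every $\alpha$ and a martingale for the optimal strategy.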

For exponential utility $U(x) = -\frac{1}{\gamma}e^{-\gamma x}$:

\[J^{\alpha}_t = -\frac{1}{\gamma}\exp(-\gamma x_0 e^{\int_0^T r(s)ds})\exp\left(-\gamma \int_0^t e^{\int_s^T r(u)du}[\alpha_s^T dB_s + \alpha_s^T \lambda_s ds] + \gamma Y_t\right)\]

The triplet $(Y, \Lambda, U)$ solves a Riccati BSDE with jumps (here $U$ denotes the jump component of the BSDE solution, not the utility function):

\[dY_t = -\left[\frac{1}{2\gamma}|\lambda_t + \gamma \Sigma \Lambda_t|^2 - \frac{\gamma}{2}|\Lambda_t|^2 - \int_E h_{\gamma}(U_t(e))\xi(V_t, de)\right]dt + \Lambda_t^T dW_t + \int_E U_t(e) \tilde{N}(dt, de)\]

where $h_{\gamma}(x) = \frac{e^{\gamma x} - \gamma x - 1}{\gamma}$.

The solution is characterized by:

\[\Lambda^i_t = \sigma^v_i \varsigma_i(t) \psi_i(T-t) \sqrt{V^i_t}\]

\[U_t(e) = \psi(T-t)^T \eta(e)\]

where $\psi$ solves the inhomogeneous Riccati-Volterra equation:

\[\psi_i(t) = \int_0^t K_i(t-s)\left[\zeta_i - \frac{\theta_i^2}{2\gamma} + F_i(T-s, \psi(s))\right]ds\]

Applied to toy example: For the 2D rough Heston with jumps, $\psi_1, \psi_2$ solve coupled fractional Riccati equations with different $\alpha_i$ parameters, producing optimal strategies that depend on both roughness and jump intensity.
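
To make the structure of such equations concrete, here is a minimal sketch of a scalar Volterra equation $\psi(t) = \int_0^t K(t-s) F(\psi(s))\,ds$ with the fractional kernel $K(t) = t^{\alpha-1}/\Gamma(\alpha)$. It uses a simple explicit product-integration (rectangle) scheme, not the fractional Adams-Bashforth-Moulton method reported in the paper, and it treats one uncoupled component only. The right-hand side `F_demo` and its coefficients are hypothetical.

```python
import math
import numpy as np

def solve_volterra_riccati(alpha, F, T=1.0, n=200):
    """Explicit product-integration scheme for psi(t) = int_0^t K(t-s) F(psi(s)) ds
    with the fractional kernel K(t) = t**(alpha - 1) / Gamma(alpha)."""
    h = T / n
    lags = np.arange(1, n + 1, dtype=float)
    # Exact integral of K over one grid cell whose far end lies `lag` steps back.
    weights = h**alpha * (lags**alpha - (lags - 1.0)**alpha) / math.gamma(alpha + 1.0)
    psi = np.zeros(n + 1)
    f_vals = np.zeros(n)
    for i in range(1, n + 1):
        # F is frozen at the left endpoint of the newest cell [t_{i-1}, t_i].
        f_vals[i - 1] = F(psi[i - 1])
        # Cell j (j = 0..i-1) has lag i - j, hence the reversed weight slice.
        psi[i] = np.dot(weights[:i][::-1], f_vals[:i])
    return np.linspace(0.0, T, n + 1), psi

# Hypothetical quadratic right-hand side, chosen only for illustration.
def F_demo(psi):
    return -0.5 + 0.3 * psi + 0.2 * psi**2

for alpha in (0.6, 0.9):
    t_grid, psi = solve_volterra_riccati(alpha, F_demo)
    print(f"alpha = {alpha}: psi(T) = {psi[-1]:.4f}")
```

For $\alpha = 1$ the weights all equal the step size and the scheme reduces to explicit Euler for $\psi' = F(\psi)$; for constant $F$ it reproduces $\psi(t) = F\, t^{\alpha}/\Gamma(\alpha+1)$ exactly, which gives two easy checks.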

Novelty & Lineage

Prior work:

  1. Abi Jaber et al. (2019) - rough Heston models without jumps, single utility
  2. Gnabeyeu (2026) - multivariate fake stationary Volterra models, no jumps
  3. Hernandez & Warin (2020) - rough Heston portfolio optimization, exponential utility only

    Delta: This paper adds jump components to multivariate affine Volterra models while solving Merton problems for three utility types (exponential, power, logarithmic) simultaneously.

    Theory-specific assessment:

    • Main theorem extends known BSDE techniques to jump-diffusion Volterra setting
    • Proof technique combines Volterra integral equation theory with backward SDE methods for jumps
    • The Riccati BSDE with jumps is new but follows predictable extension pattern
    • Bounds are not compared to known lower bounds (none mentioned)

    The mathematical development is solid but represents a natural combination of existing techniques rather than fundamentally new insights. The affine structure makes the problem tractable through known exponential-affine representations.

    Verdict: INCREMENTAL — extends known rough volatility portfolio optimization to include jumps using standard BSDE techniques.

Proof Techniques

The proof strategy involves three main stages:

  1. Martingale Property via Generalized Novikov Criterion: Key inequality for stochastic exponential:

    \[E\left[\exp\left(\frac{1}{2}\langle M^c \rangle_T + \int_0^T \int_E [(U_s(e)-1)e^{U_s(e)} + 1]\xi(V_s, de)ds\right)\right] < \infty\]
  2. Riccati BSDE with Jumps Construction: The driver has quadratic growth in $\Lambda$ and exponential growth in jumps:

    \[f(t, y, \Lambda, U) = \frac{1}{2\gamma}|\lambda_t + \gamma\Sigma\Lambda_t|^2 - \frac{\gamma}{2}|\Lambda_t|^2 - \int_E h_{\gamma}(U_t(e))\xi(V_t, de)\]

    The key exponential integrability condition:

    \[E\left[\exp\left(a(p)\int_0^T (|\lambda_s|^2 + |\Lambda_s|^2)ds\right)\right] < \infty\]
  3. Volterra Equation Analysis: Existence of solution $\psi \in C([0,T], \mathbb{R}^d)$ to:

    \[\psi_i(t) = \int_0^t K_i(t-s)[c_i + F_i(T-s, \psi(s))]ds\]

    Technical insight: The adjusted forward process $g_t(s) = E[V_s - \int_s^t K(s-u)DV_u du \mid \mathcal{F}_t]$ preserves affine structure despite non-Markovian dynamics, enabling explicit Laplace transforms via exponential-affine representation.

Experiments & Validation

Numerical Setup: Two-dimensional rough Heston with jumps using fractional kernels $K_i(t) = t^{\alpha_i-1}/\Gamma(\alpha_i)$ with $\alpha_1 = 0.6, \alpha_2 = 0.9$ (Hurst parameters $H_1 = 0.1, H_2 = 0.4$). Parameters: $V_0 = (0.01, 0.03)^T$, $\mu_0 = (2.0, 2.5)^T$, $D = \text{diag}(-0.2, -0.6)$, jump intensity $\beta > 0$ with upward-only jumps $\eta(e) = \kappa e \mathbf{1}_{e>0}$.

Numerical Methods:

  • Volterra process simulation via K-integrated Euler-Maruyama scheme
  • Riccati-Volterra equations solved using fractional Adams-Bashforth-Moulton method
  • Time horizon $T = 1$ year with $n = 200$ time steps

Key Results:

  • Rougher volatility (smaller $H$) leads to more negative $\psi$ values for exponential utility
  • Optimal investment demand decreases with roughness for power utility, increases for exponential utility
  • Jump arrivals cause instantaneous volatility spikes followed by slow decay
  • Risk aversion parameter $\gamma$ significantly affects strategy sensitivity to roughness

Limitations & Open Problems

Limitations:

  1. TECHNICAL: Assumption 3.1 requires exponential integrability conditions and bounded risk premium parameters - needed for martingale property proofs but likely removable with more sophisticated techniques

  2. TECHNICAL: Completely monotone kernel assumption restricts to specific fractional and exponential decay forms - standard in Volterra literature but excludes some interesting cases

  3. RESTRICTIVE: Jump structure limited to affine dependence on volatility levels $\xi(V_t, de) = \nu_0(de) + \sum_i V^i_t \nu_i(de)$ - significantly narrows model flexibility

  4. NATURAL: Continuous stock price assumption with jumps only in volatility - widely used in rough volatility literature

  5. TECHNICAL: Power utility requires finite explosion time $T_{\max}$ constraint - artifact of quadratic Riccati growth

Open Problems:

  1. Extend to jump-diffusion stock prices with state-dependent jump intensities
  2. Develop portfolio optimization for general (non-affine) Volterra volatility models using alternative techniques

Implications of weak convergence rates of Markov transition kernels

Authors: Austin Brown · Institution: Texas A&M University · Category: math.ST

Extends weak convergence bounds of Markov kernels to explicit variance control for unbounded Lipschitz functions via truncation techniques.

Tags: markov-chains mcmc convergence-rates weak-convergence wasserstein-distance central-limit-theorems high-dimensional-statistics metropolis-hastings

Problem Formulation

Motivation: Markov chain Monte Carlo (MCMC) methods are fundamental in statistics and machine learning, but obtaining convergence bounds for high-dimensional problems is challenging. Traditional techniques based on total variation bounds often fail or degenerate in high dimensions, while weak convergence bounds (convergence for bounded Lipschitz functions) are easier to establish but provide limited guarantees.

Mathematical setup: Let $X$ be a metric space with metric $d(\cdot,\cdot): X \times X \to \mathbb{R}_+$. Let $\Pi$ be the target probability measure and $(P_t)_{t \in T}$ be Markov transition kernels with unique invariant measure $\Pi$. For the bounded Lipschitz (BL) norm:

\[\|P_t(x,\cdot) - \Pi\|_{BL(d)} = \sup\left\{|P_t f(x) - \int_X f d\Pi| : \|f\|_{Lip(d)} + \|f\|_\infty \leq 1\right\}\]

where $\|f\|_{Lip(d)} = \sup_{x,y \in X, x \neq y} |f(y) - f(x)|/d(x,y)$.

Assumption 1: There exist $M: X \to [1,\infty)$ and rate function $R: T \to (0,1]$ strictly decreasing to 0 such that:

\[\|P_t(x,\cdot) - \Pi\|_{BL(d)} \leq M(x) R(t)\]

The spread-to-fluctuation ratio (SFR) of order $p \geq 1$ for function $f: X \to \mathbb{R}$:

\[SFR_p(f) = \frac{\|f - \int_X f d\Pi\|_{L^p(\Pi)}}{\|f\|_{Lip(d)}}\]

Toy example: When $X = \mathbb{R}^d$ with $d(x,y) = \|x-y\|_2$, $\Pi = \mathcal{N}(0,I_d)$, and $f(x) = x_1$ (first coordinate), then $SFR_2(f) = 1$, since $f$ has unit variance and Lipschitz constant 1. The main difficulty is extending convergence from bounded to unbounded Lipschitz functions.
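
The toy value can be reproduced numerically. The sketch below evaluates $SFR_p$ for $f(x) = x_1$ under $\mathcal{N}(0, I_d)$ with Gauss-Hermite quadrature (`numpy.polynomial.hermite_e.hermegauss`); since $f$ depends on one coordinate only, the integral is one-dimensional. The function name is illustrative, and the quadrature is exact only for even $p$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def sfr_first_coordinate(p: int, degree: int = 40) -> float:
    """SFR_p of f(x) = x_1 under N(0, I_d) with the Euclidean metric."""
    nodes, weights = hermegauss(degree)       # quadrature for weight exp(-x**2 / 2)
    weights = weights / weights.sum()         # normalise to the N(0, 1) law
    # f already has mean zero, so no centring is needed.
    centred_norm = (weights @ np.abs(nodes) ** p) ** (1.0 / p)
    lipschitz = 1.0                           # |x_1 - y_1| <= ||x - y||_2, with equality attained
    return float(centred_norm / lipschitz)

for p in (2, 4, 6):
    print(f"SFR_{p} = {sfr_first_coordinate(p):.4f}")
```

The ratio does not depend on the dimension $d$ here, and for $p = 4$ it equals $3^{1/4}$ because the fourth moment of a standard Gaussian is 3.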

Formal objective: Control the variance of conditional expectations:

\[\|P_t f - \int_X f d\Pi\|^2_{L^2(\Pi)} = \int_X \left|P_t f - \int_X f d\Pi\right|^2 d\Pi\]

Method

The main method extends weak convergence bounds to variance control via truncation techniques.

Key steps:

  1. For Lipschitz function $\varphi$ with $\int \varphi d\Pi = 0$ and $\|\varphi\|_{Lip(d)} \leq 1$, define truncation $\psi_r = (-r) \vee (r \wedge \varphi)$

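  2. Decompose using convexity of $|\cdot|^q$, then Jensen's inequality and invariance of $\Pi$ under $P_t$ to drop $P_t$ from the first term: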

    \[\int_X |P_t \varphi|^q d\Pi \leq 2^{q-1} \int_X |\varphi - \psi_r + \int \psi_r d\Pi|^q d\Pi + 2^{q-1} \int_X |P_t \psi_r - \int \psi_r d\Pi|^q d\Pi\]
  3. Control bounded term using Assumption 1:

    \[\int_X |P_t \psi_r - \int \psi_r d\Pi|^q d\Pi \leq 2^{q-1} \int_X M^{q-1} d\Pi \, R(t)^{q-1} (r + r^q)\]
  4. Control tail term using Hölder inequality:

    \[\int_X |\varphi - \psi_r + \int \psi_r d\Pi|^q d\Pi \leq \frac{2^q \int_X |\varphi|^{qp} d\Pi}{r^{q(p-1)}}\]
  5. Optimize over truncation level:

    \[r = \left(\frac{2(p-1) \int_X |\varphi|^{qp} d\Pi}{\int_X M^{q-1} d\Pi \, R(t)^{q-1}}\right)^{1/(qp)}\]
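
The truncation level in step 5 can be checked numerically. One can verify that it is the stationary point of the tail bound from step 4 plus the $r^q$ part of the bounded-term bound from step 3, each carrying the $2^{q-1}$ prefactor from step 2; how the lower-order $r$ term in $(r + r^q)$ is handled is not spelled out in this summary, so the sketch drops it. All input values below are hypothetical.

```python
def optimal_truncation(q, p, moment_qp, m_integral, rate):
    """Truncation level r from step 5.

    moment_qp  : int |phi|^{qp} dPi
    m_integral : int M^{q-1} dPi
    rate       : R(t)
    """
    return (2.0 * (p - 1.0) * moment_qp / (m_integral * rate ** (q - 1))) ** (1.0 / (q * p))

def leading_error(r, q, p, moment_qp, m_integral, rate):
    """Tail bound plus the r**q part of the bounded-term bound (lower-order r term dropped)."""
    tail = 2.0 ** (q - 1) * 2.0 ** q * moment_qp / r ** (q * (p - 1))
    bounded = 2.0 ** (q - 1) * 2.0 ** (q - 1) * m_integral * rate ** (q - 1) * r ** q
    return tail + bounded

# Hypothetical inputs: q = p = 2, phi = x_1 under N(0, 1) so int |phi|^4 dPi = 3,
# and M(x) = 1 + x**2 in one dimension so int M dPi = 2.
q, p, moment_qp, m_integral, rate = 2, 2, 3.0, 2.0, 1e-3
r_star = optimal_truncation(q, p, moment_qp, m_integral, rate)
for scale in (0.5, 0.9, 1.0, 1.1, 2.0):
    print(scale, leading_error(scale * r_star, q, p, moment_qp, m_integral, rate))
```

As $R(t)$ decreases the truncation level grows like $R(t)^{-(q-1)/(qp)}$, which is how the tail moment of order $qp$ enters the final rate.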

    Main result (Theorem 2): For $q \geq 2$, $p > 1$, and Lipschitz $f$:

    \[\|P_t f - \int_X f d\Pi\|^q_{L^q(\Pi)} \leq c_1 \left(\int_X M^{q-1} d\Pi\right)^{1-1/(qp)} R(t)^{(q-1)(1-1/p)} \|f - \int_X f d\Pi\|^q_{L^{qp}(\Pi)}\]

    Toy example application: For $f(x) = x_1$ with $\mathcal{N}(0,I_d)$, if $M(x) = 1 + |x|^2$ and $R(t) = e^{-\kappa t}$, then:

    \[\|P_t f - \mathbb{E}[f]\|^2_{L^2(\Pi)} \leq C e^{-\kappa t(1-1/p)}\]

Novelty & Lineage

Prior work:

  1. Bakry, Cattiaux & Guillin (2008): Rate of convergence for ergodic Markov processes using Lyapunov vs Poincaré - achieved $L^2(\Pi)$ bounds but required total variation convergence

  2. Durmus, Fort & Moulines (2016): Subgeometric convergence in Wasserstein distance - established weak convergence bounds but only for bounded Lipschitz functions

  3. Röckner & Wang (2001): Weak Poincaré inequalities and $L^2$-convergence - connected spectral gaps to weak convergence but in reversible geometric case only

    Delta: This paper extends weak convergence bounds (which only control bounded Lipschitz functions) to explicit variance bounds for unbounded Lipschitz functions. The key innovation is the truncation technique that optimally balances tail control vs bounded approximation error.

    Theory-specific assessment:

    • Main theorem: Not particularly surprising - the truncation approach is natural, though the optimal choice of truncation level and resulting rates are non-obvious
    • Proof technique: Routine assembly of known truncation methods, convexity, and Hölder inequalities - no fundamentally new technique
    • Bound tightness: No lower bounds provided to assess tightness of the $(1-1/p)$ rate degradation

    The equivalence result (Proposition 5) characterizing weak convergence through $L^2$ bounds is more interesting but still follows standard duality arguments.

    Verdict: INCREMENTAL — solid extension of existing weak convergence theory but using predictable truncation techniques with expected rate degradation.

Proof Techniques

Main proof strategy (Theorem 2):

  1. Truncation decomposition: For $\varphi$ with $\|\varphi\|_{Lip(d)} \leq 1$, define $\psi_r = (-r) \vee (r \wedge \varphi)$ and use convexity:

    \[\int_X |P_t \varphi|^q d\Pi \leq 2^{q-1} \int_X |\varphi - \psi_r + \int \psi_r d\Pi|^q d\Pi + 2^{q-1} \int_X |P_t \psi_r - \int \psi_r d\Pi|^q d\Pi\]
  2. Bounded term control: Apply Assumption 1 to $\psi_r$ (bounded Lipschitz):

    \[\int_X |P_t \psi_r - \int \psi_r d\Pi|^q d\Pi \leq 2^{q-1} \int_X M^{q-1} d\Pi \, R(t)^{q-1} (r + r^q)\]
  3. Tail term control using Hölder: Key inequality:

    \[\int_X |\varphi - \psi_r + \int \psi_r d\Pi|^q d\Pi \leq 2^q \int_{|\varphi| > r} |\varphi|^q d\Pi\]

    Apply Hölder with exponents $p$ and $p/(p-1)$:

    \[\int_{|\varphi| > r} |\varphi|^q d\Pi \leq \left(\int_X |\varphi|^{qp} d\Pi\right)^{1/p} \Pi(|\varphi| > r)^{1-1/p}\]

    Then Markov inequality: $\Pi(|\varphi| > r) \leq \|\varphi\|_{L^{qp}(\Pi)}^{qp} / r^{qp}$
  4. Optimization: Balance the two error terms by choosing:

    \[r = \left(\frac{2(p-1) \int_X |\varphi|^{qp} d\Pi}{\int_X M^{q-1} d\Pi \, R(t)^{q-1}}\right)^{1/(qp)}\]
  5. Rate combination: Substituting optimal $r$ yields the rate $R(t)^{(q-1)(1-1/p)}$ with integrability requirement $\int_X M^{q-1} d\Pi < \infty$.

    The reversible case improvement uses the semigroup property $P_{2t} = P_t^2$ to get better constants.
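
The Hölder and Markov steps in item 3 combine to $\int_{|\varphi| > r} |\varphi|^q d\Pi \leq \int_X |\varphi|^{qp} d\Pi / r^{q(p-1)}$. A small sanity check of this inequality, assuming $\varphi(x) = x$ under $\Pi = \mathcal{N}(0,1)$ with $q = p = 2$ (a hypothetical instance where the left side has a closed form), might look like this:

```python
from statistics import NormalDist

def gaussian_tail_second_moment(r: float) -> float:
    """Exact value of int_{|x| > r} x**2 dPi for Pi = N(0, 1)."""
    std = NormalDist()
    # Integration by parts: int_r^inf x**2 pdf(x) dx = r * pdf(r) + (1 - cdf(r)).
    return 2.0 * (r * std.pdf(r) + 1.0 - std.cdf(r))

def holder_markov_bound(r: float, q: int = 2, p: int = 2, moment_qp: float = 3.0) -> float:
    """Bound int |phi|^{qp} dPi / r^{q(p-1)}; the default moment is E[x**4] = 3."""
    return moment_qp / r ** (q * (p - 1))

for r in (0.5, 1.0, 2.0, 3.0):
    print(r, gaussian_tail_second_moment(r), holder_markov_bound(r))
```

The bound is loose for Gaussian tails, which decay much faster than any polynomial; it is designed for targets where only a finite moment of order $qp$ is available.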

Experiments & Validation

Purely theoretical. The paper provides three applications demonstrating the theory:

  1. Preconditioned Crank-Nicolson MCMC: Shows chi-squared divergence bounds for infinite-dimensional Metropolis-Hastings where total variation techniques fail

  2. Stochastic Gradient Descent: Establishes polynomial convergence rates $O(t^{-(p-1)/(2\alpha)})$ for locally strongly convex objectives with tail condition

  3. Stochastic delay equations: Proves exponential convergence when drift and minorization conditions are unavailable

    Empirical validation would require implementing these algorithms and measuring empirical variance decay rates compared to the theoretical bounds, particularly testing the dependence on the spread-to-fluctuation ratio and integrability conditions.

Limitations & Open Problems

Limitations:

  1. Integrability requirement: Need $\int_X M^{q-1} d\Pi < \infty$ - TECHNICAL (often satisfied when Assumption 1 holds via drift conditions, but not automatic)

  2. Rate degradation: Get $R(t)^{(q-1)(1-1/p)}$ instead of $R(t)^{q-1}$ - NATURAL (expected cost of handling unbounded functions)

  3. Higher moment requirement: Functions must be in $L^{qp}(\Pi)$ rather than $L^q(\Pi)$ - TECHNICAL (price for truncation approach, could potentially be improved)

  4. SFR condition: First result requires $t \geq R^{-1}(SFR_{qp}(f)^{qp/(q-1)})$ - RESTRICTIVE (makes the bound non-uniform in function class)

  5. No reversibility improvement for general $q$: The $R(2t)$ improvement only works when $q=2$ - TECHNICAL (limitation of current proof technique)

Open problems:

  1. Optimal rate degradation: Is the $(1-1/p)$ exponent loss optimal, or can it be improved? No lower bounds provided.

  2. Moment requirement: Can the $L^{qp}(\Pi)$ requirement be reduced to $L^q(\Pi)$ while maintaining constructive bounds?