Mar 28, 2026 Theory 3 papers

Theory Digest — Mar 28, 2026

Today’s Digest at a Glance

Today’s digest explores conformal prediction for causal inference, Bayesian optimization regret analysis, and multimarginal optimal transport discretization methods.

Conformal Prediction

Conformal prediction is a distribution-free method for constructing prediction intervals with finite-sample coverage guarantees. The core challenge is that we want conditional coverage P(Y ∈ C(X)

X = x) ≥ 1 - α for any x, but standard conformal methods only guarantee marginal coverage P(Y ∈ C(X)) ≥ 1 - α. The naive approach of applying split conformal prediction directly often fails to achieve conditional coverage because the calibration scores may not be identically distributed conditional on X.

The key insight is to reformulate conditional coverage as marginal coverage over a carefully chosen family of distributions. For instrumental variable settings, we define a shift family over instrument values and apply conformal prediction to achieve coverage uniformly over this family. The method works by computing nonconformity scores on a calibration set, then using quantiles of these scores to construct prediction intervals. This transforms the impossible conditional coverage problem into a tractable marginal coverage problem over the practitioner’s chosen shift family.

Spectral Risk Measures

Spectral risk measures are functionals that evaluate the risk of a random variable by weighting its tail quantiles according to a spectral function φ. Unlike coherent risk measures like CVaR which weight tail quantiles uniformly, spectral risk measures allow flexible risk preferences by varying the weight function φ(u) over the quantile level u ∈ [0,1]. For a random variable X with distribution function F, the spectral risk measure is R_φ(X) = ∫₀¹ φ(u)F⁻¹(u)du.

The challenge in multimarginal optimal transport with spectral risk objectives is that we need to optimize over the space of probability measures satisfying multiple marginal constraints simultaneously. The naive approach of discretizing each marginal separately leads to exponential complexity in the number of marginals. Spectral risk measures add another layer of complexity because they involve integration over quantiles rather than simple expectations.

The key mathematical insight is to discretize the admissible couplings using uniformly weighted point clouds and enforce marginal constraints through Wasserstein penalization. This transforms the infinite-dimensional optimization problem over measures into a finite-dimensional problem over particle locations, while the spectral risk objective becomes a finite sum over the particles weighted by the spectral function.

Reading Guide

The first paper advances conformal prediction from standard regression to causal settings with instrumental variables, providing a bridge between distribution-free inference and causal inference. The second paper provides the first theoretical analysis of practical Efficient Global Optimization, filling a gap between theory and practice in Bayesian optimization. The third paper tackles numerical methods for a challenging class of optimal transport problems that combine multiple marginal constraints with complex risk objectives.

Conformal Prediction for Nonparametric Instrumental Regression

Authors: Masahiro Kato · Institution: University of Tokyo · Category: econ.EM

Extends conformal prediction to instrumental variable regression by reformulating conditional coverage as marginal coverage over practitioner-chosen IV shift families, with finite-sample guarantees.

Tags: conformal-prediction instrumental-variables nonparametric-regression causal-inference prediction-intervals covariate-shift conditional-coverage distribution-free-inference

arXiv · PDF

Problem Formulation

Motivation: Nonparametric instrumental variable (NPIV) regression is essential for causal inference when regressors are endogenous. Existing methods focus on estimating the structural function $h_0$, but practitioners need prediction intervals for future outcomes $Y$ with finite-sample guarantees.
Mathematical setup: Let $Y \in \mathcal{Y} \subseteq \mathbb{R}$ be a scalar outcome, $X \in \mathcal{X} \subseteq \mathbb{R}^{k_X}$ be an endogenous variable, and $Z \in \mathcal{Z} \subseteq \mathbb{R}^{k_Z}$ be instrumental variables. The data-generating process follows:
\[Y = h_0(X) + \varepsilon\]
where $h_0: \mathcal{X} \to \mathcal{Y}$ is the structural function and $\varepsilon$ satisfies:
\[E[\varepsilon | Z] = 0\]
Assumptions:
1. $(Y, X, Z)$ follow joint distribution $P$
2. Observations ${(Y_i, X_i, Z_i)}_{i=1}^n$ are i.i.d. copies
3. $\varepsilon$ is sub-Gaussian with mean zero
4. $h_0$ is uniquely identified under the conditional moment restriction

Toy example: When $d_X = d_Z = 1$, $Z \sim \text{Unif}[-1,1]$, and $h_0(x) = x^2$, the challenge is constructing intervals $\hat{C}(x,z)$ such that $P(Y_{n+1} \in \hat{C}(X_{n+1}, Z_{n+1})

Z_{n+1} = z) = 1-\alpha$ for all $z$, despite endogeneity preventing direct regression of $Y$ on $X$.

Formal objective: Construct prediction intervals satisfying:
\[P(Y_{n+1} \in \hat{C}(X_{n+1}, Z_{n+1}) | Z_{n+1} = z) = 1-\alpha, \quad \forall z \in \mathcal{Z}\]

Method

The method reformulates conditional coverage as marginal coverage over IV shifts and applies conformal prediction. Three radius classes are considered:

$(X,Z)$-indexed radius $\mathcal{T}_{XZ}$: Intervals of form:
\[\hat{C}_{XZ}(x,z) = [\hat{h}(x) - \hat{\tau}_{XZ}(x,z), \hat{h}(x) + \hat{\tau}_{XZ}(x,z)]\]
$Z$-indexed radius $\mathcal{T}_Z$ (IV-CCP):
\[\hat{C}_Z(x,z) = [\hat{h}(x) - \hat{\tau}_Z(z), \hat{h}(x) + \hat{\tau}_Z(z)]\]
Algorithm steps:
Split data into training $I_{tr}$ and calibration $I_{cal}$
Fit NPIV estimator $\hat{h}$ on $I_{tr}$
Compute nonconformity scores $S_i = Y_i - \hat{h}(X_i) $ for $i \in I_{cal}$
Solve augmented quantile regression:
\[\hat{g}_s \in \arg\min_{g \in \mathcal{G}} \frac{1}{m+1}\left[\sum_{i \in I_{cal}} \rho_\alpha(S_i - g(Z_i)) + \rho_\alpha(s - g(z))\right]\]
where $\rho_\alpha(u) = \alpha \max{u,0} + (1-\alpha)\max{-u,0}$
Set $\hat{\tau}_Z(z)$ as largest accepted $s$
1. $X$-indexed radius $\mathcal{T}_X$: Uses importance weighting to convert conditional coverage to weighted unconditional moments, then applies weighted conformal recalibration.
Toy example application: For $d_X = d_Z = 1$ with linear features $\phi(z) = (1,z)$, the method protects against all density ratio tilts of form $f(z) = \beta_0 + \beta_1 z$ with $\beta_0, \beta_1 \geq 0$.

Novelty & Lineage

Prior work:

Conformal prediction (Vovk et al. 2005): Distribution-free finite-sample coverage under exchangeability, but exact conditional coverage impossible
Conditional conformal guarantees (Gibbs et al. 2025): Finite-dimensional conditional conformal machinery for covariate shifts
Weighted conformal prediction (Tibshirani et al. 2019): Single fixed target shift with known density ratios

Delta: This paper specializes conditional conformal methods to instrumental variable settings, reformulating IV conditional coverage as marginal coverage over practitioner-chosen IV shift classes.

Theory assessment:
- Main theorem is a straightforward application of Gibbs et al. (2025) to IV settings - not surprising given prior conditional conformal work
- Proof technique assembles known lemmas from conformal prediction literature with importance weighting ideas from Kato et al. (2022)
- No lower bounds provided to assess tightness of $d/(m+1)$ calibration complexity scaling
- The importance-weighted approach for $X$-indexed intervals requires density ratio estimation and loses distribution-free property
Verdict: INCREMENTAL — Solid application of existing conditional conformal machinery to IV problems, but represents expected extension rather than fundamental advance.

Proof Techniques

Moment characterization: Uses the equivalence between conditional coverage and infinite-dimensional moment conditions:
\[P(Y_{n+1} \in \hat{C}(X_{n+1}, Z_{n+1}) | Z_{n+1} = z) = 1-\alpha \iff E[f(Z_{n+1})[1\{Y_{n+1} \in \hat{C}(X_{n+1}, Z_{n+1})\} - (1-\alpha)]] = 0\]
for all bounded measurable $f$.
Finite-dimensional relaxation: Replaces full measurable class with user-specified $\mathcal{F} \subset \mathcal{M}$ and converts to one-sided inequality:
\[E[f(Z_{n+1})[1\{Y_{n+1} \in \hat{C}(X_{n+1}, Z_{n+1})\} - (1-\alpha)]] \geq 0, \quad \forall f \in \mathcal{F}\]
Augmented quantile regression: For each test point, solves LP dual of:
\[\min_{g \in \mathcal{G}} \frac{1}{m+1}\left[\sum_{i \in I_{cal}} \rho_\alpha(S_i - g(Z_i)) + \rho_\alpha(s - g(z))\right]\]
Uses exchangeability of calibration scores conditional on training split.
Stability analysis: Key bound showing robustness to NPIV estimation error:
\[|\hat{\tau}_Z(z) - \hat{\tau}_Z^*(z)| \leq \|\hat{h} - h_0\|_\infty\]
Length inflation bound: Under regularity conditions, calibration contributes:
\[|\hat{\tau}_Z^*(z) - \tau_0(z)| \leq \frac{d}{(m+1)p_{\min}} + o_p(1)\]
where $\tau_0(z)$ is oracle conditional quantile and $p_{\min}$ is minimum conditional density.

Experiments & Validation

Datasets: Three synthetic NPIV designs with increasing difficulty:

Dataset 1: $d_X = d_Z = 1$, nonlinear structural function and heteroskedastic errors
Dataset 2: $d_X = 3, d_Z = 1$, higher-dimensional endogenous variables
Dataset 3: $d_X = d_Z = 3$, fully multivariate setting

Baselines: Different radius models (bins, linear, RKHS, MLP) across three radius classes (XZ-indexed, Z-indexed, X-indexed).

Key results:
- Coverage ratios consistently around target 90% level
- Z-indexed (IV-CCP) provides good coverage with moderate interval lengths
- X-indexed intervals show wider variation and longer lengths
- All methods robust across different IV shift patterns (observed, linear tilt, local tilt, step tilt)
Coverage ratios: Range 0.889-0.922 across experiments, demonstrating empirical validity of finite-sample guarantees.

Limitations & Open Problems

Limitations:

Finite-dimensional feature maps only - RESTRICTIVE (limits to simple shift patterns, excludes complex nonlinear IV distribution changes)
X-indexed method requires density ratio estimation - TECHNICAL (loses distribution-free property, adds estimation error)
Single-shift guarantee for X-indexed class - RESTRICTIVE (cannot handle simultaneous coverage over shift families)
Regularity conditions for length bounds - TECHNICAL (local density lower bounds, oracle quantile transfer assumptions)
No lower bounds provided - NATURAL (common in conformal prediction literature)

Open problems:
Exact finite-sample family-wise coverage for X-indexed prediction intervals C(X) under IV shift classes
Adaptive feature map selection that balances approximation quality against calibration complexity in high-dimensional IV settings

Practical Efficient Global Optimization is No-regret

Authors: Jingyi Wang, Haowei Wang, Nai-Yuan Chiang, Juliane Mueller et al. (6 authors) · Institution: Lawrence Livermore National Laboratory · Category: stat.ML

First cumulative regret bound for practical EGO showing it achieves O(T^1/2√γ_T log^1/2(T)) regret, proving the widely-used algorithm is no-regret.

Tags: bayesian-optimization gaussian-processes regret-bounds expected-improvement no-regret-algorithms numerical-stability

arXiv · PDF

Problem Formulation

Motivation: Efficient Global Optimization (EGO) is one of the most widely used noise-free Bayesian optimization algorithms. In practice, a small positive nugget $\epsilon > 0$ is added to the covariance matrix for numerical stability, creating “practical EGO.” Despite widespread use, cumulative regret bounds for practical EGO remained unestablished, leaving open whether it is a no-regret algorithm.
Mathematical setup: Consider optimization problem:
\[\min_{x \in C} f(x)\]
where $C \subset \mathbb{R}^d$ is compact and $f: \mathbb{R}^d \to \mathbb{R}$ lies in RKHS $H_k(C)$ with $|f|_{H_k} \leq B$. The kernel satisfies $k(x,x’) \leq 1$ and $k(x,x) = 1$. Practical EGO uses GP posterior:
\[\mu_t(x) = k_t(x)^T (K_t + \epsilon I)^{-1} f_{1:t}\] \[\sigma_t^2(x) = k(x,x) - k_t(x)^T (K_t + \epsilon I)^{-1} k_t(x)\]
where $\epsilon > 0$ is the nugget. The Expected Improvement acquisition function is:
\[EI_t(x) = (f_t^+ - \mu_t(x))\Phi(z_t(x)) + \sigma_t(x)\phi(z_t(x))\]
where $z_t(x) = \frac{f_t^+ - \mu_t(x)}{\sigma_t(x)}$ and $f_t^+ = \min_{i=1,…,t} f(x_i)$.

Assumptions:
1. $f \in H_k(C)$ with $|f|_{H_k} \leq B$
2. Kernel is bounded: $k(x,x’) \leq 1$, $k(x,x) = 1$
3. Domain $C$ is compact
4. Nugget $\epsilon > 0$
Toy example: When $d=2$, $C = [-1,1]^2$, squared exponential kernel with length scale $l=1$, and $\epsilon = 10^{-6}$, the posterior variance at any point $x$ satisfies $\sigma_t(x) \geq \sqrt{\epsilon/(t+\epsilon)}$, ensuring positive uncertainty even at observed points.
Formal objective: Establish cumulative regret bound:
\[R_T = \sum_{t=1}^T (f(x_t) - f(x^*))\]

Method

The method analyzes practical EGO through novel regret decomposition. Key steps:

Derive instantaneous regret bound by considering two cases: $f_{t-1}^+ \leq f(x_t)$ and $f_{t-1}^+ > f(x_t)$
For case 1, bound regret as:
\[r_t \leq \mu_{t-1}(x_t) - f_{t-1}^+ + c_B EI_{t-1}(x_t) + B\sigma_{t-1}(x_t)\]
Establish positive lower bound on $EI_{t-1}(x_t)$ using nugget:
\[EI_{t-1}(x_t) \geq \tau(-B)\sqrt{\frac{\epsilon}{t+\epsilon}}\]
This yields the instantaneous regret bound:
\[r_t \leq c_{B1}\max\{f_{t-1}^+ - f(x_t), 0\} + (c_{B\epsilon}(\epsilon,t) + B + c_B(B + \phi(0)))\sigma_{t-1}(x_t)\]
where $c_{B\epsilon}(\epsilon,t) = \log^{1/2}(\frac{t+\epsilon}{2\pi\epsilon\tau^2(-B)})$.
Sum over $T$ iterations using telescoping argument for exploitation terms and maximum information gain $\gamma_T$ for exploration terms.

Applied to toy example: With SE kernel, $d=2$, $\epsilon=10^{-6}$, the bound becomes $O(T^{1/2}\log^{2}(T))$, confirming no-regret property as $R_T/T \to 0$.

Novelty & Lineage

Prior work:

“Information-theoretic regret bounds for Gaussian process optimization in the bandit setting” (Srinivas et al., 2010) - established regret bounds for UCB but not EI
“Convergence rates of efficient global optimization algorithms” (Bull, 2011) - proved simple regret bounds for EGO, not cumulative regret
Various modified EI papers (Wang & Jegelka 2017, Nguyen et al. 2017) - required artificial modifications to EI for regret analysis

Delta: This paper provides the first cumulative regret bound for practical EGO as actually implemented, without modifications to the algorithm. Key technical innovation is showing how the nugget $\epsilon > 0$ enables the analysis by providing positive lower bounds on $EI_{t-1}(x_t)$.

Theory-specific assessment:
- Main theorem is somewhat predictable given known techniques, but the specific challenge of analyzing unmodified EI with nugget required novel technical insights
- Proof technique combines standard GP confidence bounds with new exploitation term telescoping - not entirely routine due to EI’s non-convex nature
- Bounds match expected rates $O(T^{1/2}\sqrt{\gamma_T})$ but constants depend on nugget in complex ways
- No known lower bounds for comparison
The analysis reveals fundamental role of nugget in making EGO theoretically tractable - without it, the current proof framework breaks down.

Verdict: INCREMENTAL — solid extension proving no-regret property for widely-used algorithm, but uses expected techniques and rates.

Proof Techniques

Main proof strategy uses novel two-case analysis for instantaneous regret:

Case decomposition: Split based on whether $f_{t-1}^+ - f(x_t) \leq 0$ or $> 0$
Key confidence bound: Apply standard RKHS result:
\[|f(x) - \mu_{t-1}(x)| \leq B\sigma_{t-1}(x)\]
Critical nugget-dependent lower bound: Show that nugget provides:
\[\sigma_{t-1}(x) \geq \sqrt{\frac{\epsilon}{t+\epsilon}}\]
leading to:
\[EI_{t-1}(x_t) \geq \tau(-B)\sqrt{\frac{\epsilon}{t+\epsilon}}\]
Exploitation term telescoping: For subsequence ${x_{t_i}}$ where $f_{t_i-1}^+ > f(x_{t_i})$:
\[\sum_{i} (f_{t_i-1}^+ - f(x_{t_i})) \leq \sum_i (f(x_{t_{i-1}}) - f(x_{t_{i+1}})) \leq 2B\]
Maximum information gain bound: Apply standard result:
\[\sum_{t=1}^T \sigma_{t-1}(x_t) \leq \sqrt{C_\gamma(\epsilon)T\gamma_T(\epsilon)}\]
where $C_\gamma(\epsilon) = \frac{2}{\log(1+1/\epsilon)}$.
Function $\tau$ properties: Use monotonicity and bounds:
\[\tau(z) = z\Phi(z) + \phi(z) > 0\] \[\tau'(z) = \Phi(z) > 0\]
The technical insight is that nugget $\epsilon > 0$ prevents posterior variance from vanishing at observed points, enabling the critical lower bound on $EI_{t-1}(x_t)$ needed for the analysis.

Experiments & Validation

Experiments on two groups:

GP-sampled functions: 2D and 4D problems with known SE and Matérn kernels (ν=2.5), 20-40 initial points, 200 EGO iterations, nuggets ε ∈ {10^-10, 10^-6, 10^-4}
Synthetic benchmarks: Rosenbrock, six-hump camel, Hartmann6, Branin, Michalewicz functions with nuggets ε ∈ {10^-2, 10^-4, 10^-6}

Key numbers: For GP-sampled functions, cumulative regret decreases as optimization progresses (e.g., SE 2D: from 0.596 to 0.040-0.097 depending on ε). Synthetic problems show consistent no-regret behavior with ε=10^-4 performing best in 3/5 cases.

Results support theory: all experiments demonstrate sublinear cumulative regret R_T/T → 0, confirming no-regret property. Choice of nugget affects performance as predicted by theory.

Limitations & Open Problems

Limitations:

Analysis requires nugget ε > 0 - TECHNICAL (needed for current proof technique but standard in practice)
RKHS assumption on objective function - NATURAL (standard in GP-BO theory)
Bounded kernel assumption k(x,x’) ≤ 1 - NATURAL (satisfied after normalization)
Cannot extend to pure EGO (ε = 0) - TECHNICAL (current proof framework breaks down)
Nugget choice guidance is complex, kernel-dependent - TECHNICAL (bounds involve multiple competing terms)
Constants in regret bound depend on problem parameters in complex ways - TECHNICAL (typical in GP-BO analysis)

Open problems:
Establish whether pure EGO (without nugget) is no-regret - fundamental theoretical gap
Derive optimal nugget choice that minimizes actual cumulative regret (not just upper bound)

Particle method for a nonlinear multimarginal optimal transport problem

Authors: Adrien Cances, Quentin Mérigot, Luca Nenna · Institution: École Polytechnique · Category: math.OC

Provides first quantitative convergence analysis for particle discretization of nonlinear multimarginal optimal transport problems with spectral risk objectives.

Tags: optimal transport spectral risk measures quantization theory penalty methods multimarginal transport numerical optimization risk management particle methods

arXiv · PDF

Problem Formulation

Motivation: This problem arises in risk management where we need to find the riskiest dependence structure among random variables $X_1, \ldots, X_D$ with known marginals. Unlike standard optimal transport, the objective involves spectral risk measures which are inherently nonlinear.
Mathematical setup: Let $X_j \subseteq \mathbb{R}^{d_j}$ be convex sets and $\rho_j \in \mathcal{P}_p(X_j)$ probability measures with finite $p$-moment. The product space is $X = X_1 \times \cdots \times X_D$. A spectral risk measure is defined as:
\[R_\alpha(\mu) = \int_0^1 F_\mu^{-1}(t)\alpha(t)dt\]
where $\alpha: (0,1) \to [0,+\infty)$ is nondecreasing with $\int_0^1 \alpha(t)dt = 1$ and $F_\mu^{-1}$ is the quantile function.

Assumptions:
1. $\alpha \in L^q(0,1)$ for some $q \in (1,\infty)$
2. Cost function $c: X \to \mathbb{R}$ is $\beta$-Hölder continuous
3. $\frac{\beta}{p} + \frac{1}{q} \leq 1$
Toy example: When $D=2$, $X_1 = X_2 = [0,1]$, $\rho_1 = \rho_2 = \mathcal{U}[0,1]$, and $c(x_1,x_2) = x_1 + x_2$ with $\alpha = 2\mathbf{1}_{(1/2,1)}$ (CVaR at level 1/2), this reduces to finding the coupling that maximizes the expected value of the top 50% of sums.
Formal objective:
\[\max_{\gamma \in \Gamma(\rho_1,\ldots,\rho_D)} R_\alpha(c_\#\gamma)\]

Method

The method discretizes admissible couplings using uniformly weighted point clouds and enforces marginal constraints through Wasserstein penalization.

The discretized problem is:

\[\min_{Y \in X^N} -R_\alpha(c_\#\delta_Y) + \lambda_N \sum_{j=1}^D W_p(\rho_j, \pi_j^\#\delta_Y)^p\]

where $\delta_Y = \frac{1}{N}\sum_{i=1}^N \delta_{y_i}$ is the uniform point cloud and $\lambda_N > 0$ is the penalty parameter.

Key steps:

Approximate the nonlinear objective using its equivalent linear formulation with additional marginal $\rho_0 = \alpha_#\mathcal{L}_{(0,1)}$
Use penalty method to enforce marginal constraints via Wasserstein distances
Optimize penalty parameter $\lambda_N$ based on quantization error bounds

Applied to toy example: For the 2D uniform case, the algorithm would place $N$ points $(y_1,y_2)$ to maximize $\sum_{i=1}^N p_{\sigma(i)} c(y_i)$ where $p_i$ are CVaR weights and $\sigma$ orders the $c(y_i)$ values, subject to Wasserstein penalties ensuring projections approximate uniform marginals.

Novelty & Lineage

Prior work:

“Optimal transport approach to Michael’s selection theorem” (Ennaji et al., 2021) - established equivalence between nonlinear spectral risk problem and linear multimarginal transport
“Uniform quantization of absolutely continuous measures” (Mérigot-Mirebeau, 2016) - provided quantization error bounds in terms of box dimension
“Multimarginal optimal transport with repulsive costs” (Carlier, 2003; Pass, 2015) - analyzed supermodular costs and comonotone solutions

Delta: This paper provides the first quantitative convergence analysis for Lagrangian particle methods applied to nonlinear multimarginal transport. It establishes convergence rates in terms of uniform quantization errors and box dimension of optimal solutions.

Theory-specific assessment:
- Main convergence theorems are technically solid but follow predictable quantization theory
- Proof techniques combine standard penalty method analysis with existing quantization bounds
- Convergence rates match those expected from uniform quantization theory - no surprises
- The supermodular case leverages known structural results (comonotone optimizers) in straightforward way
- No comparison to known lower bounds provided
The paper makes useful but incremental contributions - it extends known discretization techniques to a new problem class without fundamental methodological innovations.

Verdict: INCREMENTAL — solid theoretical analysis of expected difficulty extending standard quantization results to spectral risk transport problems.

Proof Techniques

The proofs use penalty method analysis combined with uniform quantization theory.

Main ingredients:

Hölder continuity of the spectral risk functional:
\[|R_\alpha(c_\#\gamma) - R_\alpha(c_\#\tilde{\gamma})| \leq \|\alpha\|_{L^q} C_H W_p(\gamma,\tilde{\gamma})^\beta\]
Block approximation lemma: For point cloud $\delta_Y$, construct joint measure $\gamma_N \in \Gamma(\rho_1,\ldots,\rho_D)$ satisfying:
\[W_p(\gamma_N, \delta_Y) \lesssim \left(\sum_{j=1}^D W_p(\rho_j, \delta_{Y_j})^p\right)^{1/p}\]
Upper/lower bound strategy: - Upper bound: approximate optimal $\gamma^*$ by $N$-point cloud giving $\min F_N - \min F \lesssim u_N^\beta + \lambda_N u_N^p$ - Lower bound: approximate discrete minimizer by constructed $\gamma_N$ giving $\min F_N - \min F \gtrsim \lambda_N v_N^p - C v_N^\beta$
Optimization of penalty parameter using Young’s inequality:
\[\lambda_N = u_N^{-(p-\beta)}\]

yields matching bounds $ \min F_N - \min F = O(u_N^\beta)$.
For supermodular costs, comonotone structure allows:
\[e_{p,N}(\gamma_{mon}) \lesssim \left(\sum_{j=1}^D e_{p,N}(\rho_j)^p\right)^{1/p}\]

Experiments & Validation

Numerical experiments on several test cases:

Surplus cost: $c(x_1,x_2,x_3) = -

x_1+x_2+x_3

^2$ with uniform, triangular, and semicircle marginals. Shows solution concentrates on hyperplane as expected.

Partial barycenters: Reproduces Kitagawa-Pass non-monotonicity example for partial Wasserstein barycenters. Notes discretization falls into local minima.
Risk management examples: Various spectral risk measures applied to financial risk factors.

Implementation uses L-BFGS optimization with semi-discrete optimal transport for penalty terms. Uses $p=2$ and unidimensional marginals for computational tractability.

Key limitation: Method tends to get trapped in local minima for partial barycenter problems, requiring careful initialization and penalty parameter tuning.

Limitations & Open Problems

Penalty parameter tuning requires problem-specific knowledge - TECHNICAL (theoretical rates may not reflect practical performance)
Assumption that cost function is Hölder continuous - NATURAL (standard regularity assumption)
Restriction to finite $p$-moments for marginals - NATURAL (needed for Wasserstein distances)
Box dimension bounds may be loose in practice - TECHNICAL (dimension-dependent rates can be pessimistic)
Supermodular case requires monotonicity assumptions - RESTRICTIVE (excludes many practical cost functions)
Numerical method gets trapped in local minima - RESTRICTIVE (limits practical applicability)

Open problems:
Establish lower bounds for the discretization error to determine optimality of current rates
Develop initialization strategies to avoid local minima in partial transport settings