May 2, 2026 Theory 3 papers

Theory Digest — May 2, 2026

Today’s Digest at a Glance

Today’s theory digest explores minimax optimal generative modeling, efficient Bayesian sensitivity analysis, and privacy mechanisms for geometric data through three complementary statistical perspectives.

Denoising Score Matching

Denoising score matching addresses a fundamental obstacle: the score function $\nabla_x \log p(x)$ cannot be used as a regression target directly, because the data density $p$ is unknown and estimating a density together with its derivatives is statistically unstable. Finite-difference approximations to the gradient are both computationally expensive and unstable in high dimensions.

The core insight is to learn the score function through a denoising regression problem. Given noisy observations $X_t = X + \sigma_t \xi$ where $\xi \sim \mathcal{N}(0,I)$, the score function can be recovered by solving:

\[\hat{s}_t = \arg\min_s E\left[\left\|\frac{\xi}{\sigma_t} + s(X_t, t)\right\|^2\right]\]

This works because the optimal solution satisfies $s^*(x_t, t) = \nabla_{x_t} \log p(x_t)$, where $p(x_t)$ is the density of the noisy data.

Intuitively, instead of trying to estimate the score directly from clean data, we learn to predict the noise that was added to create noisy observations—a much more tractable regression target that naturally encodes the score information.
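
As a minimal sketch of the regression above, assume one-dimensional data $X \sim \mathcal{N}(0,1)$ and a single fixed noise level $\sigma_t = 0.5$ (both illustrative choices, not taken from the papers). Then $X_t \sim \mathcal{N}(0, 1+\sigma_t^2)$, the true score is $-x/(1+\sigma_t^2)$, and a linear model $s(x) = a x$ fitted on the denoising objective should recover $a \approx -1/(1+\sigma_t^2)$. The function name `fit_linear_dsm` is hypothetical.

```python
import random

def fit_linear_dsm(n_samples=100_000, sigma=0.5, seed=0):
    """Fit s(x) = a * x by minimizing the empirical denoising loss
    mean((xi / sigma + a * x_t)**2), with X ~ N(0, 1) as a toy assumption."""
    rng = random.Random(seed)
    num = 0.0  # accumulates (xi / sigma) * x_t
    den = 0.0  # accumulates x_t ** 2
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)    # clean sample
        xi = rng.gauss(0.0, 1.0)   # injected noise
        x_t = x + sigma * xi       # noisy observation
        num += (xi / sigma) * x_t
        den += x_t * x_t
    # Closed-form least-squares minimizer of the quadratic loss in a.
    return -num / den

sigma = 0.5
a_hat = fit_linear_dsm(sigma=sigma)
a_true = -1.0 / (1.0 + sigma ** 2)  # slope of the score of N(0, 1 + sigma^2)
print(f"fitted slope {a_hat:.4f}, true score slope {a_true:.4f}")
```

The regression never touches the density of $X_t$: it only sees the noise $\xi$ that was added, which is the point of the denoising formulation.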

Heat Diffusion Mechanisms for Privacy

Heat diffusion mechanisms provide privacy for manifold-valued data by exploiting the geometric structure of Riemannian manifolds rather than treating them as embedded in Euclidean space. Standard differential privacy mechanisms like Gaussian noise addition fail on manifolds because they don’t respect the intrinsic geometry and can produce invalid outputs.

The approach uses the heat semigroup $P_t$ generated by Brownian motion on the manifold. For the BM mechanism, we release $Y \sim P_t(f(D), \cdot)$ where $f(D)$ is the query result and $P_t(x, \cdot)$ represents the distribution of Brownian motion started at $x$ after time $t$. The privacy budget is characterized through dimension-free Harnack inequalities:

\[\frac{P_t(x, A)}{P_t(y, A)} \leq \exp\left(\frac{d(x,y)^2}{4t} + \text{Ric}^- \cdot t\right)\]

where $\text{Ric}^-$ bounds the Ricci curvature from below.

This naturally adds noise that respects the manifold’s geometry while providing provable privacy guarantees that adapt to the manifold’s curvature—positive curvature (like spheres) requires less noise for the same privacy level than negative curvature (like hyperbolic spaces).

Reading Guide

The first paper establishes fundamental statistical limits for score-based generative models, providing the theoretical foundation that could guide practical implementations of denoising score matching. The second paper tackles computational efficiency in Bayesian model comparison, offering methods to explore sensitivity without repeated expensive model fits. The third paper extends privacy guarantees beyond Euclidean settings, enabling private analysis of geometric data like directional statistics or manifold-valued observations.

Statistical Analysis of Markovian Generative Modeling

Authors: Eddie Aamari, Arthur Stéphanovitch · Institution: CNRS, École Normale Supérieure, PSL · Category: math.ST

First finite-sample analysis of score-based generative models achieving minimax optimal Wasserstein rates via time-adaptive neural network architectures.

Tags: diffusion models score matching generative models statistical learning theory minimax rates Wasserstein distance stochastic differential equations

Problem Formulation

Motivation: Score-based generative models have achieved remarkable empirical success in generating high-quality samples. However, the theoretical understanding of their statistical properties, especially finite-sample guarantees and optimality, remains incomplete. Understanding worst-case performance is crucial for rigorous deployment.

Mathematical setup: Let $p^*$ be the unknown target distribution on $\mathbb{R}^d$. We consider Gaussian interpolating paths:

\[X_t^{\circ} = m_t(Z^{\circ}) + \sigma_t \xi^{\circ}\]

where $Z^{\circ} \sim p^*$, $\xi^{\circ} \sim N(0, I_d)$ independent, with boundary conditions $m_0(z) = 0$, $\sigma_0 = 1$ (noise) and $m_1(z) = z$, $\sigma_1 = 0$ (data). The marginal densities $p_t$ satisfy the Fokker-Planck equation:

\[\partial_t p_t = -\nabla \cdot (f_t p_t) + \Delta(\sigma_t^2 p_t)\]

Assumptions:

Target distribution $p^*$ has smooth density with controlled regularity
Score functions $\nabla \log p_t$ exist and have finite Fisher information
Neural network approximators satisfy universal approximation properties

Toy example: When $d=2$ and $p^* = N(0, I_2)$, the interpolating path becomes $X_t^{\circ} = (1-t)\xi^{\circ} + t Z^{\circ}$ with $\xi^{\circ}, Z^{\circ}$ independent standard Gaussians. The score function $\nabla \log p_t(x) = -x/((1-t)^2 + t^2)$ is explicitly computable.
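
The stated score can be checked in two lines, assuming the linear schedule $m_t(z) = tz$, $\sigma_t = 1-t$ that the toy path implies. The shorthand $v_t$ is introduced here for illustration. Since $\xi^{\circ}$ and $Z^{\circ}$ are independent standard Gaussians, their variances add:

\[X_t^{\circ} \sim N(0, v_t I_2), \qquad v_t = (1-t)^2 + t^2\]

\[\nabla \log p_t(x) = \nabla\left(-\frac{\|x\|^2}{2 v_t}\right) = -\frac{x}{v_t}\]

At $t = 1/2$, for instance, $v_t = 1/2$ and the score is $-2x$.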

Formal objective: Minimize the Wasserstein distance between generated and target distributions:
\[\inf_{\hat{s}} W_2(p^*, \hat{p}_1^{\hat{s}})\]
where $\hat{p}_1^{\hat{s}}$ is the distribution after running the generative SDE with learned score $\hat{s}$.

Method

Method: The approach combines generator matching with denoising score matching and time-adaptive neural networks.

Steps:

Denoising Score Matching: Learn score function via regression:
\[\hat{s}_t = \arg\min_s E\left[\left\|\frac{\xi^{\circ}}{\sigma_t} + s(X_t^{\circ}, t)\right\|^2\right]\]
Generator Matching: Construct the generative SDE with learned drift:
\[dX_t = a_t(X_t) dt + \sqrt{2b_t} dB_t\]
where the drift incorporates the learned score:
\[a_t(x) = \frac{\dot{\alpha}_t}{\alpha_t} x + (\sigma_{fwd,t}^2 + b_t^2) \hat{s}_t(x)\]
Time-Adaptive Architecture: Partition time interval $[0,1]$ into blocks and use different network architectures per block to handle varying regularity properties.

Application to toy example: For the 2D Gaussian case, the method learns:
- Score network $\hat{s}_t(x) \approx -x/((1-t)^2 + t^2)$
- Drift becomes $a_t(x) = -x/((1-t)^2 + t^2)$
- Generates samples by solving the SDE from $t=0$ to $t=1$

Novelty & Lineage

Prior work:

“Score-based generative modeling through stochastic differential equations” (Song et al. 2020) - established the continuous-time SDE framework for diffusion models
“Denoising diffusion probabilistic models” (Ho et al. 2020) - introduced discrete-time denoising approach
Various works on optimal transport and Wasserstein distances in generative modeling

Delta: This paper provides the first finite-sample statistical analysis with minimax optimal rates for score-based models. Key additions:
- Rigorous error propagation analysis from learned scores to final distributions
- Construction of time-adaptive neural network classes achieving optimal rates
- Unified generator matching framework beyond diffusions
- Sharp Wasserstein convergence rates under smoothness assumptions
Theory-specific assessment:
- Main theorem: Achieving minimax rates $n^{-s/(2s+d)}$ for $s$-smooth targets is significant but not entirely surprising given similar rates in density estimation
- Proof technique: Combines standard approximation theory with novel stability analysis via backward Kolmogorov equations - the stability analysis appears genuinely new
- Bound tightness: Claims optimality but no explicit lower bound construction provided in this excerpt
Verdict: INCREMENTAL - While the finite-sample analysis is valuable and technically solid, the results follow predictable patterns from nonparametric statistics. The time-adaptive architecture is a reasonable engineering contribution but not a fundamental breakthrough.

Proof Techniques

Main proof strategy:

Stability Analysis via Backward Kolmogorov: Key insight using the dual formulation:
\[\frac{d}{dt} KL(p_t | q_t) = \int_{\mathbb{R}^d} \langle u_t(x) - \hat{u}_t(x), \nabla \log \frac{p_t(x)}{q_t(x)} \rangle p_t(x) dx\]
Error Propagation Bound: Central inequality controlling how score errors propagate:
\[KL(p^* | \hat{p}_1) \leq KL(p_0 | \tilde{p}_0) + \int_0^1 \int_{\mathbb{R}^d} \frac{(\sigma_t^2 + b_t^2)^2}{4b_t^2} \|\nabla \log p_t - \hat{s}_t\|^2 p_t dx dt\]
Score Regularity Analysis: Derives bounds on score smoothness:
\[\|\nabla \log p_t\|_{C^k} \leq C(t) \|p^*\|_{C^{k+2}}\]
for appropriate time-dependent constants $C(t)$.
Approximation-Generalization Decomposition:
- Time partitioning with adaptive network complexity per block
- Blockwise approximation error: $O(m^{-s/d})$ where $m$ is network size
- Generalization bound via Rademacher complexity: $O(\sqrt{\log m/n})$
Final Rate Assembly: Balancing approximation and generalization yields the minimax rate $n^{-s/(2s+d)}$ after optimizing over network architecture parameters.
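
The balancing step can be made explicit under a simplifying assumption that is not taken from the paper: suppose the squared estimation error of a size-$m$ network scales like $m/n$ up to logarithmic factors, and the squared approximation error like $m^{-2s/d}$. Equating the two gives:

\[m^{-2s/d} \asymp \frac{m}{n} \;\Longrightarrow\; m \asymp n^{d/(2s+d)} \;\Longrightarrow\; m^{-s/d} \asymp n^{-s/(2s+d)}\]

For example, $s = 2$ and $d = 4$ give $m \asymp n^{1/2}$ and a rate of $n^{-1/4}$.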

Experiments & Validation

Purely theoretical. The paper focuses entirely on statistical analysis without empirical validation.

Natural empirical validation would include:

Comparing predicted rates with observed convergence on synthetic targets with known smoothness
Testing time-adaptive vs. fixed architectures on benchmark datasets
Verifying stability predictions under various noise schedules

Limitations & Open Problems

Limitations:

RESTRICTIVE: Assumes target distribution has smooth density with finite moments - excludes many practical cases with heavy tails or discontinuities
TECHNICAL: Time-adaptive architecture requires knowing target smoothness a priori - not realistic in practice
RESTRICTIVE: Analysis limited to Gaussian interpolating paths - doesn’t cover more general coupling strategies
TECHNICAL: Requires exact knowledge of forward process parameters - approximation errors in practice not analyzed
NATURAL: Focuses on Wasserstein distance - other metrics might be more relevant for some applications

Open problems:
Extension to unknown smoothness (adaptive rates) and non-smooth targets
Analysis of discretization errors in numerical SDE solvers and their impact on final rates

Efficient Bayes Factor Sensitivity Analysis via Posterior Density Ratios

Authors: František Bartoš, Eric-Jan Wagenmakers, Maarten Marsman, Don van den Bergh · Institution: University of Amsterdam · Category: stat.ME

Proposes a method to recover the entire Bayes factor sensitivity curve from a single additional model fit, using posterior density ratios in an extended model with a hyperprior on the sensitivity parameter.

Tags: bayesian-inference sensitivity-analysis bayes-factors density-estimation mcmc computational-statistics meta-analysis hypothesis-testing

Problem Formulation

Motivation: Bayes factor sensitivity analysis examines how evidence for competing hypotheses depends on prior specifications. Standard approaches require refitting models at each hyperparameter value, making computational costs scale linearly with grid size. In complex models (meta-analyses, structural equation models), this becomes prohibitively expensive.

Mathematical setup: Consider data $y$ and competing hypotheses $H_0$ and $H_1(\gamma)$, where $\gamma$ is a hyperparameter governing prior specification under $H_1$. The Bayes factor is:

\[BF_{10}(\gamma) = \frac{p(y|H_1(\gamma))}{p(y|H_0)} = \frac{Z(\gamma)}{Z_0}\]

where $Z(\gamma) = \int p(y|\xi,H_1(\gamma))p(\xi|H_1(\gamma))d\xi$ is the marginal likelihood. Define an extended model $H_\gamma$ placing hyperprior $\pi(\gamma)$ on $\gamma$:

\[p(y|H_\gamma) = \int Z(\gamma)\pi(\gamma)d\gamma\]

The posterior of $\gamma$ under $H_\gamma$ is:

\[p(\gamma|y,H_\gamma) = \frac{Z(\gamma)\pi(\gamma)}{p(y|H_\gamma)}\]

Assumptions:

Bayes factor $BF_{10}(\gamma)$ is well-defined for all $\gamma$ in sensitivity range
Hyperparameter $\gamma$ enters only through prior specification on model parameters
Extended model $H_\gamma$ reduces to $H_1(\gamma^*)$ when $\gamma$ fixed at $\gamma^*$
Anchor point $\gamma_0$ lies in interior of posterior support

Toy example: In a Bayesian t-test with $H_1: \delta \sim \text{Cauchy}(0,r)$, the sensitivity parameter is $\gamma = r \in [0.01, 2]$. The default $r = \sqrt{2}/2$ serves as the anchor $\gamma_0$, and the extended model assigns $r \sim \text{Uniform}(0.01, 2)$.

Formal objective: Recover the entire sensitivity curve $BF_{10}(\gamma)$ for $\gamma \in [\gamma_L, \gamma_U]$ from a single additional model fit, avoiding the $K$ separate model fits that a $K$-point grid would require.

Method

The method decomposes the target Bayes factor using transitivity:

\[BF_{10}(\gamma_x) = BF_{10}(\gamma_0) \times \text{MLR}(\gamma_x, \gamma_0)\]

where $\text{MLR}(\gamma_x, \gamma_0) = Z(\gamma_x)/Z(\gamma_0)$ is the marginal likelihood ratio.

Key identity exploiting posterior density ratios:

\[\text{MLR}(\gamma_x, \gamma_0) = \frac{p(\gamma_x|y,H_\gamma)}{p(\gamma_0|y,H_\gamma)} \times \frac{\pi(\gamma_0)}{\pi(\gamma_x)}\]

Main result:

\[BF_{10}(\gamma_x) = BF_{10}(\gamma_0) \times \frac{p(\gamma_x|y,H_\gamma)}{p(\gamma_0|y,H_\gamma)} \times \frac{\pi(\gamma_0)}{\pi(\gamma_x)}\]

Algorithm:

Fit $H_0$ and $H_1(\gamma_0)$ to obtain anchor $BF_{10}(\gamma_0)$
Fit extended model $H_\gamma$ with hyperprior $\pi(\gamma)$
Estimate posterior density $\hat{p}(\gamma|y,H_\gamma)$ using IWMDE
Compute $BF_{10}(\gamma_x)$ via main formula for all desired $\gamma_x$
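
The last step of the algorithm is a pointwise rescaling of the anchor Bayes factor. A minimal sketch follows, with a hypothetical helper `sensitivity_curve` and a toy model chosen here only because its marginal likelihood is known in closed form: $y|\theta \sim N(\theta, 1)$, $\theta|\gamma \sim N(0, \gamma^2)$ under $H_1$, $\theta = 0$ under $H_0$, and a uniform hyperprior on $[1, 3]$. In practice `post_density` would be the estimated posterior density rather than an exact one.

```python
import math

def sensitivity_curve(bf_anchor, gamma0, grid, post_density, prior_density):
    """Apply BF10(g) = BF10(g0) * [p(g|y) / p(g0|y)] * [pi(g0) / pi(g)] on a grid."""
    p0 = post_density(gamma0)
    pi0 = prior_density(gamma0)
    return [bf_anchor * (post_density(g) / p0) * (pi0 / prior_density(g))
            for g in grid]

# Toy model (illustrative assumption, not from the paper).
y = 2.0

def marginal_lik(gamma):
    # Integrating theta out gives y ~ N(0, 1 + gamma^2).
    var = 1.0 + gamma ** 2
    return math.exp(-0.5 * y * y / var) / math.sqrt(2.0 * math.pi * var)

z0 = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)  # p(y | H0)
lo, hi = 1.0, 3.0

def prior(gamma):
    return 1.0 / (hi - lo)  # uniform hyperprior density

def posterior(gamma):
    # Unnormalized posterior suffices: the normalizer cancels in the ratio.
    return marginal_lik(gamma) * prior(gamma)

gamma0 = 1.5
grid = [1.0, 1.5, 2.0, 2.5, 3.0]
curve = sensitivity_curve(marginal_lik(gamma0) / z0, gamma0, grid, posterior, prior)
exact = [marginal_lik(g) / z0 for g in grid]
for g, c, e in zip(grid, curve, exact):
    print(f"gamma={g:.1f}  recovered BF10={c:.4f}  exact BF10={e:.4f}")
```

Because only density ratios enter, any normalizing constant of the posterior drops out, which is why a single fit of the extended model is enough.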

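IWMDE estimator:
\[\hat{p}(\gamma^*|y,H_\gamma) = \frac{1}{n}\sum_{i=1}^n w(\gamma_i|\theta_i) \frac{p(\gamma^*,\theta_i|y,H_\gamma)}{p(\gamma_i,\theta_i|y,H_\gamma)}\]
where $(\gamma_i, \theta_i)$ are posterior draws under $H_\gamma$ and $w(\cdot|\theta)$ is a normalized weight density over $\gamma$. The weight is evaluated at the sampled $\gamma_i$, which is what makes the average unbiased for $p(\gamma^*|y,H_\gamma)$.
Because $\gamma$ enters only through priors, the likelihood cancels: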
\[\frac{p(\gamma^*,\theta_i|y,H_\gamma)}{p(\gamma_i,\theta_i|y,H_\gamma)} = \frac{p(\theta_i|\gamma^*)\pi(\gamma^*)}{p(\theta_i|\gamma_i)\pi(\gamma_i)}\]
Toy example application: For Cauchy prior $\delta \sim \text{Cauchy}(0,\gamma_i)$, each IWMDE term becomes ratio of Cauchy densities evaluated at sampled $\delta_i$, requiring no likelihood evaluations.
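
A minimal Monte Carlo sketch of this likelihood-free estimate follows. It swaps the Cauchy prior for a normal prior, $\theta|\gamma \sim N(0, \gamma^2)$ with $y|\theta \sim N(\theta, 1)$, purely so that $Z(\gamma)$ is available in closed form for comparison. The uniform hyperprior on $[1, 3]$, the uniform weight density, the rejection sampler standing in for MCMC, and all function names are assumptions made for illustration.

```python
import math
import random

def normal_pdf(x, sd):
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def marginal_lik(y, gamma):
    # y | theta ~ N(theta, 1) and theta | gamma ~ N(0, gamma^2) give y ~ N(0, 1 + gamma^2).
    return normal_pdf(y, math.sqrt(1.0 + gamma ** 2))

def sample_extended_posterior(y, lo, hi, n, rng):
    """Draw (gamma_i, theta_i) from the extended model with gamma ~ Uniform(lo, hi)."""
    # Z(gamma) is maximized over all gamma > 0 at gamma^2 = y^2 - 1 (needs |y| > 1),
    # so this value is a valid rejection-sampling envelope on any range.
    z_max = marginal_lik(y, math.sqrt(y * y - 1.0))
    draws = []
    while len(draws) < n:
        gamma = rng.uniform(lo, hi)
        if rng.random() * z_max <= marginal_lik(y, gamma):
            shrink = gamma ** 2 / (1.0 + gamma ** 2)
            theta = rng.gauss(shrink * y, math.sqrt(shrink))  # theta | gamma, y
            draws.append((gamma, theta))
    return draws

def iwmde_density(gamma_star, draws, lo, hi):
    """IWMDE estimate of p(gamma_star | y) using only prior-density ratios."""
    weight = 1.0 / (hi - lo)  # uniform weight density over gamma
    total = 0.0
    for gamma_i, theta_i in draws:
        # The uniform hyperprior ratio pi(gamma_star) / pi(gamma_i) equals 1.
        total += weight * normal_pdf(theta_i, gamma_star) / normal_pdf(theta_i, gamma_i)
    return total / len(draws)

rng = random.Random(0)
y, lo, hi = 2.0, 1.0, 3.0
draws = sample_extended_posterior(y, lo, hi, 20_000, rng)
gamma0, gamma_x = 1.5, 2.0
# With a uniform hyperprior, pi(gamma0) / pi(gamma_x) = 1 in the MLR identity.
mlr_hat = iwmde_density(gamma_x, draws, lo, hi) / iwmde_density(gamma0, draws, lo, hi)
mlr_exact = marginal_lik(y, gamma_x) / marginal_lik(y, gamma0)
print(f"IWMDE MLR {mlr_hat:.3f} vs exact {mlr_exact:.3f}")
```

The estimator never evaluates the likelihood of $y$: each term is a ratio of prior densities at a sampled $\theta_i$. The hyperprior range matters for stability, since prior ratios can become heavy-tailed when the evaluation point lies far from the sampled $\gamma_i$.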

Novelty & Lineage

Prior work:

Franck & Gramacy (2020) - “Bayes factor surface” visualization using Gaussian process surrogates for expensive computations
Fowlie (2024) - Bayes factor contours via Savage-Dickey density ratio in particle physics, varying likelihood parameters
Standard approaches - Brute-force grid evaluation requiring $K$ separate model fits for $K$-point sensitivity analysis

Delta: This paper combines two ideas:
Bayes factor transitivity decomposition through extended model with hyperprior on sensitivity parameter, and
IWMDE for density ratio estimation exploiting likelihood-free property when sensitivity parameter enters only through priors.

Theory-specific assessment:
- Main theorem: The main identity for $BF_{10}(\gamma_x)$ is mathematically straightforward, following from Bayes factor transitivity and posterior density definitions. Not surprising given the setup.
- Proof technique: Routine algebraic manipulation of probability ratios. The IWMDE application exploits known structure but requires no new mathematical insights.
- Bounds: No theoretical bounds provided. Empirical accuracy depends on density estimation quality, but no formal analysis of approximation error rates.
The computational insight is valuable: reducing $K$ model fits to a single extended fit plus density estimation. However, the mathematical content is a standard application of probability theory.

Verdict: INCREMENTAL — Solid computational contribution combining known techniques (Bayes factor transitivity, IWMDE) in a useful way, but mathematically routine.

Proof Techniques

The main result follows from routine probability manipulations:

Step 1 - Bayes factor transitivity:

\[BF_{10}(\gamma_x) = \frac{Z(\gamma_x)}{Z_0} = \frac{Z(\gamma_x)}{Z(\gamma_0)} \times \frac{Z(\gamma_0)}{Z_0}\]

Step 2 - Posterior density relationship: From definition of posterior under extended model $H_\gamma$:

\[p(\gamma|y,H_\gamma) = \frac{Z(\gamma)\pi(\gamma)}{p(y|H_\gamma)}\]

Taking ratio at two points:

\[\frac{p(\gamma_x|y,H_\gamma)}{p(\gamma_0|y,H_\gamma)} = \frac{Z(\gamma_x)\pi(\gamma_x)}{Z(\gamma_0)\pi(\gamma_0)}\]

Rearranging:

\[\frac{Z(\gamma_x)}{Z(\gamma_0)} = \frac{p(\gamma_x|y,H_\gamma)}{p(\gamma_0|y,H_\gamma)} \times \frac{\pi(\gamma_0)}{\pi(\gamma_x)}\]

Step 3 - IWMDE likelihood cancellation: The key technical insight is that the joint posterior ratio simplifies when $\gamma$ enters only through priors:

\[\frac{p(\gamma^*,\theta_i|y,H_\gamma)}{p(\gamma_i,\theta_i|y,H_\gamma)} = \frac{p(y|\theta_i)p(\theta_i|\gamma^*)\pi(\gamma^*)}{p(y|\theta_i)p(\theta_i|\gamma_i)\pi(\gamma_i)} = \frac{p(\theta_i|\gamma^*)\pi(\gamma^*)}{p(\theta_i|\gamma_i)\pi(\gamma_i)}\]

The likelihood $p(y|\theta_i)$ cancels exactly, reducing IWMDE to ratio of prior density evaluations.

No sophisticated inequalities or concentration bounds are employed - the derivation is purely algebraic manipulation of probability densities.

Experiments & Validation

Datasets:

Oosterwijk facial-feedback data (53 vs 57 participants, funniness ratings)
Bem (2011) precognition studies (K=9 experimental studies)

Baselines: Exact Bayes factors (t-test), kernel density estimation (KDE), bridge sampling validation

Key numbers:
- IWMDE approximation ratio stays within 1-2% of exact solution vs 10-20% for KDE at boundaries
- IWMDE accurate with 3,000 MCMC samples vs 300,000+ needed for stable KDE
- Computational savings: 13 minutes for 10-point grid vs ~1 minute for extended model approach
- 36-model robust meta-analysis: same accuracy as 4-model case without deterioration
Validation: Systematic comparison against exact solutions (t-test) and independent bridge sampling refits across sensitivity ranges. IWMDE consistently outperforms KDE in accuracy and stability, particularly with moderate sample sizes and in distribution tails.

Limitations & Open Problems

Limitations:

TECHNICAL: Method limited to low-dimensional sensitivity spaces (few hyperparameters) due to curse of dimensionality in density estimation - could potentially be addressed with better density estimators
TECHNICAL: Requires anchor point to lie in interior of posterior support - easily satisfied in practice by choosing reasonable reference value
TECHNICAL: Density ratio estimates become unreliable in regions with sparse posterior mass - manageable by choosing appropriate sensitivity ranges
NATURAL: Only addresses prior sensitivity, not likelihood specification sensitivity - complementary concern outside scope
TECHNICAL: Extended model adds hyperprior structure that may not reflect analyst’s actual uncertainty about hyperparameter - standard limitation of hierarchical modeling
NATURAL: Accuracy depends on quality of posterior density estimation, particularly for extreme tail evaluations

Open problems:
Theoretical analysis - Formal characterization of approximation error rates for IWMDE in this context, particularly convergence properties as MCMC sample size increases
Higher-dimensional extensions - Better density estimation methods or alternative approaches for simultaneous sensitivity over many hyperparameters beyond current 2-3 parameter limitation

Geometric Renyi Differential Privacy: Ricci Curvature Characterized by Heat Diffusion Mechanisms

Authors: Xiaotian Chang, Yangdi Jiang, Cyrus Mostajeran, Qirui Hu · Institution: Nanyang Technological University, Shanghai University of Finance and Economics · Category: stat.ML

Establishes privacy mechanisms for manifold-valued data using heat diffusion, with privacy budgets characterized by Ricci curvature through dimension-free Harnack inequalities.

Tags: differential privacy Riemannian geometry heat kernels Ricci curvature stochastic processes geometric statistics Renyi divergence manifold learning

Problem Formulation

Motivation: Differential privacy mechanisms for manifold-valued data are needed since non-Euclidean data (medical images, trajectories, shapes) contain sensitive information. Existing approaches embed data in Euclidean space, add noise, and project back, which distorts intrinsic geometry.
Mathematical setup: Let $(M, g)$ be a complete $m$-dimensional Riemannian manifold with Ricci curvature $\text{Ric} \geq -K$ for some $K$. Given a dataset $D$ and adjacent dataset $D'$ differing in one individual, let $f: \mathcal{D} \to M$ be a manifold-valued summary statistic with global sensitivity:
\[d(f(D), f(D')) \leq \Delta\]
Define the heat kernel $p(x,z,t)$ as the fundamental solution to:
\[\left(\frac{\partial}{\partial t} - \Delta_x\right) p(x,z,t) = 0\]
Let $(P_t)_{t \geq 0}$ be the heat semigroup:
\[P_t f(x) = \int_M p(x,z,t) f(z) \, d\text{vol}(z)\]
Assumptions:
1. $M$ is stochastically complete
2. $f$ has bounded global sensitivity $\Delta$
3. Ricci curvature satisfies $\text{Ric}(X) \geq -K|X|^2$
Toy example: When $M = \mathbb{R}^2$ with $g = I_2$, we have $K = 0$ and the heat kernel becomes $p(x,z,t) = (4\pi t)^{-1} \exp(-|x-z|^2/(4t))$, recovering standard Gaussian noise. The core difficulty is that on negatively curved manifolds, Brownian motion retains “memory at infinity.”
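Formal objective: Establish $(\alpha,\varepsilon)$-Rényi differential privacy, where:
\[D_\alpha(B_t(f(D)) \| B_t(f(D'))) \leq \varepsilon\]
Here $B_t(x)$ is read as the output law of the mechanism started at $x$ and run for time $t$, which for Brownian motion is $P_t(x, \cdot)$.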

Method

The method consists of two mechanisms:

BM Mechanism: Release $Y \sim P_t(f(D), \cdot)$ where $P_t$ is the heat semigroup (Brownian motion for time $t$).

Langevin Mechanism: For Hadamard manifolds, use drift $-\nabla V$ with $V(x) = \lambda d^2(o,x)/2$ to obtain confining diffusion:

\[dX_t = \sqrt{2}dB_t - \nabla V(X_t)dt\]

Key equations: Privacy budget for BM mechanism:

\[\varepsilon = \frac{K\alpha\Delta^2}{2(1-e^{-2Kt})}\]

Privacy budget for Langevin mechanism:

\[\varepsilon = \frac{(\lambda-K)\alpha\Delta^2}{2(1-e^{-2(\lambda-K)t})}\]

Application to toy example: For $\mathbb{R}^2$ with $K=0$, the BM mechanism gives $\varepsilon = \alpha\Delta^2/(4t)$, recovering the standard Gaussian mechanism. The heat kernel is exactly $p(x,z,t) = (4\pi t)^{-1}\exp(-|x-z|^2/(4t))$.
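
The flat limit can be checked numerically. The sketch below evaluates the BM-mechanism budget formula as stated above and confirms that it approaches $\alpha\Delta^2/(4t)$ as $K \to 0$. The function name `bm_privacy_budget` and the parameter values are illustrative assumptions.

```python
import math

def bm_privacy_budget(K, alpha, delta, t):
    """Renyi budget eps = K * alpha * delta^2 / (2 * (1 - exp(-2 * K * t)))."""
    if t <= 0:
        raise ValueError("diffusion time t must be positive")
    if abs(K * t) < 1e-12:
        # Flat limit K -> 0: 1 - exp(-2Kt) ~ 2Kt, giving alpha * delta^2 / (4t).
        return alpha * delta ** 2 / (4.0 * t)
    # -expm1(-2Kt) equals 1 - exp(-2Kt) but stays accurate for small K * t.
    return K * alpha * delta ** 2 / (2.0 * -math.expm1(-2.0 * K * t))

alpha, delta, t = 2.0, 0.1, 0.5
flat = bm_privacy_budget(0.0, alpha, delta, t)
nearly_flat = bm_privacy_budget(1e-9, alpha, delta, t)
curved = bm_privacy_budget(1.0, alpha, delta, t)
print(f"K=0: {flat:.6f}  K=1e-9: {nearly_flat:.6f}  K=1: {curved:.6f}")
```

For a fixed diffusion time, a larger $K$ (a weaker lower bound on Ricci curvature) yields a larger budget, so the same privacy level needs a longer diffusion time.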

Novelty & Lineage

Prior work:

Reimherr et al. (2021): “Riemannian Laplace mechanisms” - first intrinsic DP on manifolds via exponential families
Jiang et al. (2023): “Gaussian differential privacy on Riemannian manifolds” - extended GDP to manifolds
Soto et al. (2022): “Shape-preserving differential privacy” - structure-aware mechanisms

Delta: This paper connects Ricci curvature directly to privacy budgets via dimension-free Harnack inequalities. The key insight is using heat diffusion as a privacy mechanism rather than closed-form densities.

Theory-specific assessment:

Main theorem is somewhat predictable given known connections between curvature and diffusion
Proof technique combines standard semigroup theory with Harnack inequalities - not genuinely novel
The curvature-privacy connection is interesting but follows from established geometric analysis
No lower bounds established to assess tightness

Verdict: INCREMENTAL — Solid theoretical contribution connecting known tools (Harnack inequalities, heat kernels) to differential privacy, but the connection is natural given existing geometric analysis literature.

Proof Techniques

The main proof strategy uses dimension-free Harnack inequalities to control Rényi divergences.

Key inequality: For $\text{Ric}^V \geq -K$, the Harnack inequality states:

\[(P_t|f|)^\alpha (x) \leq P_t|f|^\alpha(y) \exp\left[\frac{K\alpha d^2(x,y)}{2(\alpha-1)(1-e^{-2Kt})}\right]\]

Proof stages:

Apply Harnack inequality with $f$ being the density ratio between $P_t(f(D), \cdot)$ and $P_t(f(D'), \cdot)$
Use sensitivity bound $d(f(D), f(D')) \leq \Delta$ to get:
\[\frac{(P_t p_1)^\alpha(z)}{P_t p_1^\alpha(z)} \leq \exp\left[\frac{K\alpha\Delta^2}{2(\alpha-1)(1-e^{-2Kt})}\right]\]
Convert to Rényi divergence using definition:
\[D_\alpha(P \| Q) = \frac{1}{\alpha-1} \log E_{z \sim Q}\left[\left(\frac{p(z)}{q(z)}\right)^\alpha\right]\]
The key insight is that the Harnack inequality provides exactly the pointwise control needed for Rényi divergence bounds.
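
As a sanity check on the Rényi divergence definition, the sketch below integrates it numerically in the flat one-dimensional case, where the heat kernel is a Gaussian with variance $2t$ and the two outputs are centered $\Delta$ apart. The result should match the closed form $\alpha\Delta^2/(4t)$. The helper names, the integration range, and the parameter values are assumptions chosen for illustration.

```python
import math

def renyi_divergence_1d(log_p, log_q, alpha, lo, hi, n=20_001):
    """D_alpha(P || Q) = log(E_Q[(p/q)^alpha]) / (alpha - 1), trapezoid rule on [lo, hi]."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        z = lo + i * h
        # (p/q)^alpha * q = exp(alpha * log p + (1 - alpha) * log q)
        f = math.exp(alpha * log_p(z) + (1.0 - alpha) * log_q(z))
        total += f * (0.5 if i in (0, n - 1) else 1.0)
    return math.log(total * h) / (alpha - 1.0)

def gaussian_log_pdf(mean, var):
    return lambda z: -0.5 * (z - mean) ** 2 / var - 0.5 * math.log(2.0 * math.pi * var)

alpha, delta, t = 3.0, 0.4, 0.25
var = 2.0 * t  # the kernel of d/dt - Laplacian has variance 2t per coordinate
numeric = renyi_divergence_1d(gaussian_log_pdf(0.0, var),
                              gaussian_log_pdf(delta, var),
                              alpha, -15.0, 15.0)
closed_form = alpha * delta ** 2 / (4.0 * t)
print(f"numeric {numeric:.6f} vs closed form {closed_form:.6f}")
```

On a curved manifold no such closed form is generally available, which is where the Harnack inequality takes over as the source of the bound.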

Experiments & Validation

Purely theoretical with some numerical illustrations. The paper includes:

Synthetic experiments on hyperbolic space showing privacy-utility tradeoffs
Comparisons with embedding-based approaches showing geometric distortion
Fréchet mean estimation experiments

Empirical validation would require: large-scale experiments on real manifold-valued datasets (medical images, shape data), comparison with other intrinsic DP mechanisms, and validation of theoretical utility bounds.

Limitations & Open Problems

Limitations:

Stochastic completeness assumption - TECHNICAL (needed for heat kernel uniqueness but often satisfied)
Global sensitivity requirement - NATURAL (standard in DP literature)
Restriction to Hadamard manifolds for Langevin mechanism - RESTRICTIVE (excludes many important manifolds like spheres)
No finite-sample convergence rates - TECHNICAL (asymptotic analysis only)

Open problems:
Extend Langevin mechanisms to compact manifolds with positive curvature
Develop adaptive sensitivity analysis that doesn’t require global bounds