Theory Digest — May 2, 2026
Today’s Digest at a Glance
Today’s theory digest explores minimax optimal generative modeling, efficient Bayesian sensitivity analysis, and privacy mechanisms for geometric data through three complementary statistical perspectives.
Denoising Score Matching
Denoising score matching addresses the fundamental challenge that directly learning score functions $\nabla_x \log p(x)$ requires computing intractable normalizing constants and estimating derivatives of density functions. The naive approach of finite-difference approximation to gradients suffers from both computational expense and statistical instability in high dimensions.
The core insight is to learn the score function through a denoising regression problem. Given noisy observations $X_t = X + \sigma_t \xi$ where $\xi \sim \mathcal{N}(0,I)$, the score function can be recovered by solving:
\[\hat{s}_t = \arg\min_s E\left[\left\|\frac{\xi}{\sigma_t} + s(X_t, t)\right\|^2\right]\]
This works because the optimal solution satisfies $s^*(x_t, t) = \nabla_{x_t} \log p(x_t)$, where $p(x_t)$ is the density of the noisy data.
Intuitively, instead of trying to estimate the score directly from clean data, we learn to predict the noise that was added to create noisy observations—a much more tractable regression target that naturally encodes the score information.
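The regression view can be checked on a one-dimensional case where the answer is known. The sketch below is illustrative only: it assumes clean data $X \sim \mathcal{N}(0,1)$ and a linear score model $s(x) = w x$ (both choices are assumptions made for this example, not part of any paper summarized here). The noisy density is then $\mathcal{N}(0, 1+\sigma^2)$, whose score is $-x/(1+\sigma^2)$, so the least-squares fit should recover $w \approx -1/(1+\sigma^2)$.

```python
import random


def fit_linear_score(sigma, n=100_000, seed=0):
    """Least-squares fit of s(x) = w * x to the denoising target -xi / sigma."""
    rng = random.Random(seed)
    sum_x_xi = 0.0
    sum_x_sq = 0.0
    for _ in range(n):
        x_clean = rng.gauss(0.0, 1.0)   # toy data distribution: X ~ N(0, 1)
        xi = rng.gauss(0.0, 1.0)        # injected standard Gaussian noise
        x_noisy = x_clean + sigma * xi  # X_t = X + sigma * xi
        sum_x_xi += x_noisy * xi
        sum_x_sq += x_noisy * x_noisy
    # Closed-form minimizer over w of sum_i (xi_i / sigma + w * x_i)^2.
    return -sum_x_xi / (sigma * sum_x_sq)


sigma = 0.5
w_hat = fit_linear_score(sigma)
w_true = -1.0 / (1.0 + sigma ** 2)  # slope of the true score of N(0, 1 + sigma^2)
print(w_hat, w_true)
```

The fit only ever sees the noise $\xi$ and the noisy sample, never a density or its gradient, which is the point of the denoising formulation.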
Heat Diffusion Mechanisms for Privacy
Heat diffusion mechanisms provide privacy for manifold-valued data by exploiting the geometric structure of Riemannian manifolds rather than treating them as embedded in Euclidean space. Standard differential privacy mechanisms like Gaussian noise addition fail on manifolds because they don’t respect the intrinsic geometry and can produce invalid outputs.
The approach uses the heat semigroup $P_t$ generated by Brownian motion on the manifold. For the BM mechanism, we release $Y \sim P_t(f(D), \cdot)$ where $f(D)$ is the query result and $P_t(x, \cdot)$ represents the distribution of Brownian motion started at $x$ after time $t$. The privacy budget is characterized through dimension-free Harnack inequalities:
\[\frac{P_t(x, A)}{P_t(y, A)} \leq \exp\left(\frac{d(x,y)^2}{4t} + \text{Ric}^- \cdot t\right)\]
where $\text{Ric}^-$ bounds the Ricci curvature from below.
This naturally adds noise that respects the manifold’s geometry while providing provable privacy guarantees that adapt to the manifold’s curvature—positive curvature (like spheres) requires less noise for the same privacy level than negative curvature (like hyperbolic spaces).
Reading Guide
The first paper establishes fundamental statistical limits for score-based generative models, providing the theoretical foundation that could guide practical implementations of denoising score matching. The second paper tackles computational efficiency in Bayesian model comparison, offering methods to explore sensitivity without repeated expensive model fits. The third paper extends privacy guarantees beyond Euclidean settings, enabling private analysis of geometric data like directional statistics or manifold-valued observations.
Statistical Analysis of Markovian Generative Modeling
Authors: Eddie Aamari, Arthur Stéphanovitch · Institution: CNRS, École Normale Supérieure, PSL · Category: math.ST
First finite-sample analysis of score-based generative models achieving minimax optimal Wasserstein rates via time-adaptive neural network architectures.
Tags: diffusion models, score matching, generative models, statistical learning theory, minimax rates, Wasserstein distance, stochastic differential equations
Problem Formulation
Motivation: Score-based generative models have achieved remarkable empirical success in generating high-quality samples. However, the theoretical understanding of their statistical properties, especially finite-sample guarantees and optimality, remains incomplete. Understanding worst-case performance is crucial for rigorous deployment.
Mathematical setup: Let $p^*$ be the unknown target distribution on $\mathbb{R}^d$. We consider Gaussian interpolating paths:
\[X_t^{\circ} = m_t(Z^{\circ}) + \sigma_t \xi^{\circ}\]
where $Z^{\circ} \sim p^*$ and $\xi^{\circ} \sim N(0, I_d)$ are independent, with boundary conditions $m_0(z) = 0$, $\sigma_0 = 1$ (noise) and $m_1(z) = z$, $\sigma_1 = 0$ (data). The marginal densities $p_t$ satisfy the Fokker-Planck equation:
\[\partial_t p_t = -\nabla \cdot (f_t p_t) + \Delta(\sigma_t^2 p_t)\]
Assumptions:
- Target distribution $p^*$ has smooth density with controlled regularity
- Score functions $\nabla \log p_t$ exist and have finite Fisher information
- Neural network approximators satisfy universal approximation properties
Toy example: When $d=2$ and $p^* = N(0, I_2)$, the interpolating path becomes $X_t^{\circ} = (1-t)\xi^{\circ} + t Z^{\circ}$ with $\xi^{\circ}, Z^{\circ}$ independent standard Gaussians. The score function $\nabla \log p_t(x) = -x/((1-t)^2 + t^2)$ is explicitly computable.
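The closed-form score in the toy example can be sanity-checked numerically. The sketch below assumes only what the example states: $X_t^{\circ}$ is a sum of independent Gaussians, so $p_t = N(0, v_t I_2)$ with $v_t = (1-t)^2 + t^2$. It compares the formula $-x/v_t$ against a central finite difference of $\log p_t$. The function names are hypothetical and chosen for this illustration.

```python
import math


def path_variance(t):
    """Per-coordinate variance of X_t = (1 - t) * xi + t * Z for independent standard normals."""
    return (1.0 - t) ** 2 + t ** 2


def log_density(x, t):
    """Log density of N(0, v_t * I_2) at the point x = (x0, x1)."""
    v = path_variance(t)
    return -(x[0] ** 2 + x[1] ** 2) / (2.0 * v) - math.log(2.0 * math.pi * v)


def score(x, t):
    """Closed-form score -x / v_t from the toy example."""
    v = path_variance(t)
    return (-x[0] / v, -x[1] / v)


def finite_difference_score(x, t, h=1e-5):
    """Central-difference gradient of log_density, used only as a check."""
    gx = (log_density((x[0] + h, x[1]), t) - log_density((x[0] - h, x[1]), t)) / (2.0 * h)
    gy = (log_density((x[0], x[1] + h), t) - log_density((x[0], x[1] - h), t)) / (2.0 * h)
    return (gx, gy)


point, time = (0.7, -1.2), 0.3
exact = score(point, time)
approx = finite_difference_score(point, time)
print(exact, approx)
```

Note that $v_t$ dips to $1/2$ at $t = 1/2$, so the score is steepest midway along the path rather than at either endpoint.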
Formal objective: Minimize the Wasserstein distance between generated and target distributions:
\[\inf_{\hat{s}} W_2(p^*, \hat{p}_1^{\hat{s}})\]
where $\hat{p}_1^{\hat{s}}$ is the distribution after running the generative SDE with learned score $\hat{s}$.
Method
Method: The approach combines generator matching with denoising score matching and time-adaptive neural networks.
Steps:
- Denoising Score Matching: Learn the score function via regression:
\[\hat{s}_t = \arg\min_s E\left[\left\|\frac{\xi^{\circ}}{\sigma_t} + s(X_t^{\circ}, t)\right\|^2\right]\]
- Generator Matching: Construct the generative SDE with learned drift:
\[dX_t = a_t(X_t) dt + \sqrt{2b_t} dB_t\]
where the drift incorporates the learned score:
\[a_t(x) = \frac{\dot{\alpha}_t}{\alpha_t} x + (\sigma_{fwd,t}^2 + b_t^2) \hat{s}_t(x)\]
- Time-Adaptive Architecture: Partition the time interval $[0,1]$ into blocks and use a different network architecture per block to handle varying regularity properties.
Application to toy example: For the 2D Gaussian case, the method learns:
- Score network $\hat{s}_t(x) \approx -x/((1-t)^2 + t^2)$
- Drift becomes $a_t(x) = -x/((1-t)^2 + t^2)$
- Generates samples by solving the SDE from $t=0$ to $t=1$
Novelty & Lineage
Prior work:
- “Score-based generative modeling through stochastic differential equations” (Song et al. 2020) - established the continuous-time SDE framework for diffusion models
- “Denoising diffusion probabilistic models” (Ho et al. 2020) - introduced discrete-time denoising approach
- Various works on optimal transport and Wasserstein distances in generative modeling
Delta: This paper provides the first finite-sample statistical analysis with minimax optimal rates for score-based models. Key additions:
- Rigorous error propagation analysis from learned scores to final distributions
- Construction of time-adaptive neural network classes achieving optimal rates
- Unified generator matching framework beyond diffusions
- Sharp Wasserstein convergence rates under smoothness assumptions
Theory-specific assessment:
- Main theorem: Achieving minimax rates $n^{-s/(2s+d)}$ for $s$-smooth targets is significant but not entirely surprising given similar rates in density estimation
- Proof technique: Combines standard approximation theory with novel stability analysis via backward Kolmogorov equations - the stability analysis appears genuinely new
- Bound tightness: Claims optimality but no explicit lower bound construction provided in this excerpt
Verdict: INCREMENTAL - While the finite-sample analysis is valuable and technically solid, the results follow predictable patterns from nonparametric statistics. The time-adaptive architecture is a reasonable engineering contribution but not a fundamental breakthrough.
Proof Techniques
Main proof strategy:
- Stability Analysis via Backward Kolmogorov: Key insight using the dual formulation:
\[\frac{d}{dt} KL(p_t | q_t) = \int_{\mathbb{R}^d} \langle u_t(x) - \hat{u}_t(x), \nabla \log \frac{p_t(x)}{q_t(x)} \rangle p_t(x) dx\]
- Error Propagation Bound: Central inequality controlling how score errors propagate:
\[KL(p^* | \hat{p}_1) \leq KL(p_0 | \tilde{p}_0) + \int_0^1 \int_{\mathbb{R}^d} \frac{(\sigma_t^2 + b_t^2)^2}{4b_t^2} \|\nabla \log p_t - \hat{s}_t\|^2 p_t dx dt\]
- Score Regularity Analysis: Derives bounds on score smoothness:
\[\|\nabla \log p_t\|_{C^k} \leq C(t) \|p^*\|_{C^{k+2}}\]
for appropriate time-dependent constants $C(t)$.
- Approximation-Generalization Decomposition: time partitioning with adaptive network complexity per block; blockwise approximation error $O(m^{-s/d})$, where $m$ is the network size; generalization bound via Rademacher complexity, $O(\sqrt{\log m/n})$.
- Final Rate Assembly: Balancing approximation and generalization yields the minimax rate $n^{-s/(2s+d)}$ after optimizing over network architecture parameters.
Experiments & Validation
Purely theoretical. The paper focuses entirely on statistical analysis without empirical validation.
Natural empirical validation would include:
- Comparing predicted rates with observed convergence on synthetic targets with known smoothness
- Testing time-adaptive vs. fixed architectures on benchmark datasets
- Verifying stability predictions under various noise schedules
Limitations & Open Problems
Limitations:
- RESTRICTIVE: Assumes target distribution has smooth density with finite moments - excludes many practical cases with heavy tails or discontinuities
- TECHNICAL: Time-adaptive architecture requires knowing target smoothness a priori - not realistic in practice
- RESTRICTIVE: Analysis limited to Gaussian interpolating paths - doesn’t cover more general coupling strategies
- TECHNICAL: Requires exact knowledge of forward process parameters - approximation errors in practice not analyzed
- NATURAL: Focuses on Wasserstein distance - other metrics might be more relevant for some applications
Open problems:
- Extension to unknown smoothness (adaptive rates) and non-smooth targets
- Analysis of discretization errors in numerical SDE solvers and their impact on final rates
Efficient Bayes Factor Sensitivity Analysis via Posterior Density Ratios
Authors: František Bartoš, Eric-Jan Wagenmakers, Maarten Marsman, Don van den Bergh · Institution: University of Amsterdam · Category: stat.ME
Proposes a method to recover the entire Bayes factor sensitivity curve from a single additional model fit, using posterior density ratios in an extended model with a hyperprior on the sensitivity parameter.
Tags: bayesian-inference sensitivity-analysis bayes-factors density-estimation mcmc computational-statistics meta-analysis hypothesis-testing
Problem Formulation
Motivation: Bayes factor sensitivity analysis examines how evidence for competing hypotheses depends on prior specifications. Standard approaches require refitting models at each hyperparameter value, making computational costs scale linearly with grid size. In complex models (meta-analyses, structural equation models), this becomes prohibitively expensive.
Mathematical setup: Consider data $y$ and competing hypotheses $H_0$ and $H_1(\gamma)$, where $\gamma$ is a hyperparameter governing prior specification under $H_1$. The Bayes factor is:
\[BF_{10}(\gamma) = \frac{p(y|H_1(\gamma))}{p(y|H_0)} = \frac{Z(\gamma)}{Z_0}\]
where $Z(\gamma) = \int p(y|\theta,H_1(\gamma))p(\theta|H_1(\gamma))d\theta$ is the marginal likelihood. Define an extended model $H_\gamma$ that places a hyperprior $\pi(\gamma)$ on $\gamma$.
The posterior of $\gamma$ under $H_\gamma$ is:
\[p(\gamma|y,H_\gamma) = \frac{Z(\gamma)\pi(\gamma)}{p(y|H_\gamma)}\]
Assumptions:
- Bayes factor $BF_{10}(\gamma)$ is well-defined for all $\gamma$ in sensitivity range
- Hyperparameter $\gamma$ enters only through prior specification on model parameters
- Extended model $H_\gamma$ reduces to $H_1(\gamma^*)$ when $\gamma$ fixed at $\gamma^*$
- Anchor point $\gamma_0$ lies in interior of posterior support
Toy example: In a Bayesian t-test with $H_1: \delta \sim \text{Cauchy}(0,r)$, the sensitivity parameter is $\gamma = r \in [0.01, 2]$. The default value is $r = \sqrt{2}/2$, and the extended model assigns $r \sim \text{Uniform}(0.01, 2)$.
Formal objective: Recover entire sensitivity curve $BF_{10}(\gamma)$ for $\gamma \in [\gamma_L, \gamma_U]$ from single additional model fit, avoiding $K$ separate model fits for $K$-point grid.
Method
The method decomposes target Bayes factor using transitivity:
\[BF_{10}(\gamma_x) = BF_{10}(\gamma_0) \times \text{MLR}(\gamma_x, \gamma_0)\]
where $\text{MLR}(\gamma_x, \gamma_0) = Z(\gamma_x)/Z(\gamma_0)$ is the marginal likelihood ratio.
Key identity exploiting posterior density ratios:
\[\text{MLR}(\gamma_x, \gamma_0) = \frac{p(\gamma_x|y,H_\gamma)}{p(\gamma_0|y,H_\gamma)} \times \frac{\pi(\gamma_0)}{\pi(\gamma_x)}\]
Main result:
\[BF_{10}(\gamma_x) = BF_{10}(\gamma_0) \times \frac{p(\gamma_x|y,H_\gamma)}{p(\gamma_0|y,H_\gamma)} \times \frac{\pi(\gamma_0)}{\pi(\gamma_x)}\]
Algorithm:
- Fit $H_0$ and $H_1(\gamma_0)$ to obtain anchor $BF_{10}(\gamma_0)$
- Fit extended model $H_\gamma$ with hyperprior $\pi(\gamma)$
- Estimate posterior density $\hat{p}(\gamma|y,H_\gamma)$ using IWMDE
- Compute $BF_{10}(\gamma_x)$ via main formula for all desired $\gamma_x$
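The last step of the algorithm can be illustrated on a conjugate model where every quantity has a closed form. The sketch below is not the paper's setting: it assumes $y \mid \theta \sim N(\theta, 1)$, $H_0: \theta = 0$, and $H_1(\gamma): \theta \sim N(0, \gamma^2)$, so that $Z(\gamma)$ is the $N(0, 1+\gamma^2)$ density at $y$. The hyperprior $e^{-\gamma}$ and all function names are hypothetical. The exact (unnormalized) posterior of $\gamma$ stands in for the density estimate that would come from MCMC output in practice; normalizing constants cancel in the ratios.

```python
import math


def normal_pdf(x, var):
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def marginal_likelihood(y, gamma):
    # Toy model: y | theta ~ N(theta, 1), theta ~ N(0, gamma^2)  =>  y ~ N(0, 1 + gamma^2).
    return normal_pdf(y, 1.0 + gamma ** 2)


def hyperprior(gamma):
    # Hypothetical hyperprior on gamma, left unnormalized (constants cancel in ratios).
    return math.exp(-gamma)


def posterior_density(y, gamma):
    # Unnormalized p(gamma | y, H_gamma) = Z(gamma) * pi(gamma); a stand-in for an estimate.
    return marginal_likelihood(y, gamma) * hyperprior(gamma)


def sensitivity_curve(bf_anchor, gamma0, grid, post, prior):
    """BF_10(g) = BF_10(gamma0) * post(g) / post(gamma0) * prior(gamma0) / prior(g)."""
    return [bf_anchor * post(g) / post(gamma0) * prior(gamma0) / prior(g) for g in grid]


y = 1.5
z0 = normal_pdf(y, 1.0)                          # marginal likelihood under H_0
gamma0 = 1.0                                     # anchor point
bf_anchor = marginal_likelihood(y, gamma0) / z0  # the single anchor Bayes factor
grid = [0.25, 0.5, 1.0, 2.0, 4.0]

curve = sensitivity_curve(bf_anchor, gamma0, grid,
                          lambda g: posterior_density(y, g), hyperprior)
direct = [marginal_likelihood(y, g) / z0 for g in grid]  # brute-force reference
print(curve)
print(direct)
```

In this toy case the two lists agree up to floating-point error, which is just the identity at work; with a real model the agreement is limited by how well the posterior density of $\gamma$ is estimated.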
IWMDE estimator:
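\[\hat{p}(\gamma^*|y,H_\gamma) = \frac{1}{n}\sum_{i=1}^n w(\gamma_i|\theta_i) \frac{p(\gamma^*,\theta_i|y,H_\gamma)}{p(\gamma_i,\theta_i|y,H_\gamma)}\]
where $(\gamma_i, \theta_i)$ are posterior draws from the extended model and $w(\cdot|\theta_i)$ is a weighting density over $\gamma$, evaluated at the draw $\gamma_i$. Because $\gamma$ enters only through the priors, the likelihood cancels: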
\[\frac{p(\gamma^*,\theta_i|y,H_\gamma)}{p(\gamma_i,\theta_i|y,H_\gamma)} = \frac{p(\theta_i|\gamma^*)\pi(\gamma^*)}{p(\theta_i|\gamma_i)\pi(\gamma_i)}\]
Toy example application: For the Cauchy prior $\delta \sim \text{Cauchy}(0,\gamma_i)$, each IWMDE term becomes a ratio of Cauchy densities evaluated at the sampled $\delta_i$, requiring no likelihood evaluations.
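A minimal sketch of the estimator for the Cauchy toy example follows. It assumes the uniform hyperprior on $[0.01, 2]$ from the toy example, so the hyperprior ratio equals one, and it additionally assumes a uniform weight density on the same interval, so the weight is the constant $1/(2 - 0.01)$. The function names and the two hand-made draws are placeholders for illustration, not MCMC output from the paper.

```python
import math


def cauchy_pdf(delta, scale):
    """Density of Cauchy(0, scale) at delta."""
    return scale / (math.pi * (scale ** 2 + delta ** 2))


def iwmde_density(r_star, samples, lower=0.01, upper=2.0):
    """Estimate p(r_star | y) from posterior draws (r_i, delta_i) of the extended model.

    Assumes a uniform hyperprior and a uniform weight density on [lower, upper],
    so each term is (1 / (upper - lower)) times a ratio of Cauchy prior densities.
    No likelihood evaluation is needed.
    """
    weight = 1.0 / (upper - lower)
    total = 0.0
    count = 0
    for r_i, delta_i in samples:
        total += weight * cauchy_pdf(delta_i, r_star) / cauchy_pdf(delta_i, r_i)
        count += 1
    return total / count


# Hand-made placeholder draws (r_i, delta_i), only to show the call shape.
draws = [(2.0, 1.0), (1.0, 0.3)]
estimate = iwmde_density(1.0, draws)
print(estimate)
```

The estimate is meaningful only for `r_star` inside the hyperprior support, and its accuracy in practice depends on how many posterior draws land near the evaluation point.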
Novelty & Lineage
Prior work:
- Franck & Gramacy (2020) - “Bayes factor surface” visualization using Gaussian process surrogates for expensive computations
- Fowlie (2024) - Bayes factor contours via Savage-Dickey density ratio in particle physics, varying likelihood parameters
- Standard approaches - Brute-force grid evaluation requiring $K$ separate model fits for $K$-point sensitivity analysis
Delta: This paper combines two ideas:
- Bayes factor transitivity decomposition through extended model with hyperprior on sensitivity parameter, and
- IWMDE for density ratio estimation exploiting likelihood-free property when sensitivity parameter enters only through priors.
Theory-specific assessment:
- Main theorem: The identity in equation (7) is mathematically straightforward, following from Bayes factor transitivity and posterior density definitions. Not surprising given the setup.
- Proof technique: Routine algebraic manipulation of probability ratios. The IWMDE application exploits known structure but requires no new mathematical insights.
- Bounds: No theoretical bounds provided. Empirical accuracy depends on density estimation quality, but no formal analysis of approximation error rates.
The computational insight is valuable: reducing $K$ model fits to single extended fit plus density estimation. However, the mathematical content is standard probability theory applications.
Verdict: INCREMENTAL — Solid computational contribution combining known techniques (Bayes factor transitivity, IWMDE) in a useful way, but mathematically routine.
Proof Techniques
The main result follows from routine probability manipulations:
Step 1 - Bayes factor transitivity:
\[BF_{10}(\gamma_x) = \frac{Z(\gamma_x)}{Z_0} = \frac{Z(\gamma_x)}{Z(\gamma_0)} \times \frac{Z(\gamma_0)}{Z_0}\]
Step 2 - Posterior density relationship: From the definition of the posterior under the extended model $H_\gamma$:
\[p(\gamma|y,H_\gamma) = \frac{Z(\gamma)\pi(\gamma)}{p(y|H_\gamma)}\]
Taking the ratio at two points:
\[\frac{p(\gamma_x|y,H_\gamma)}{p(\gamma_0|y,H_\gamma)} = \frac{Z(\gamma_x)\pi(\gamma_x)}{Z(\gamma_0)\pi(\gamma_0)}\]
Rearranging:
\[\frac{Z(\gamma_x)}{Z(\gamma_0)} = \frac{p(\gamma_x|y,H_\gamma)}{p(\gamma_0|y,H_\gamma)} \times \frac{\pi(\gamma_0)}{\pi(\gamma_x)}\]
Step 3 - IWMDE likelihood cancellation: The key technical insight is that the joint posterior ratio simplifies when $\gamma$ enters only through priors:
\[\frac{p(\gamma^*,\theta_i|y,H_\gamma)}{p(\gamma_i,\theta_i|y,H_\gamma)} = \frac{p(y|\theta_i)p(\theta_i|\gamma^*)\pi(\gamma^*)}{p(y|\theta_i)p(\theta_i|\gamma_i)\pi(\gamma_i)} = \frac{p(\theta_i|\gamma^*)\pi(\gamma^*)}{p(\theta_i|\gamma_i)\pi(\gamma_i)}\]
The likelihood $p(y|\theta_i)$ cancels exactly, reducing IWMDE to a ratio of prior density evaluations.
No sophisticated inequalities or concentration bounds are employed - the derivation is purely algebraic manipulation of probability densities.
Experiments & Validation
Datasets:
- Oosterwijk facial-feedback data (53 vs 57 participants, funniness ratings)
- Bem (2011) precognition studies (K=9 experimental studies)
Baselines: Exact Bayes factors (t-test), kernel density estimation (KDE), bridge sampling validation
Key numbers:
- IWMDE approximation ratio stays within 1-2% of exact solution vs 10-20% for KDE at boundaries
- IWMDE accurate with 3,000 MCMC samples vs 300,000+ needed for stable KDE
- Computational savings: 13 minutes for 10-point grid vs ~1 minute for extended model approach
- 36-model robust meta-analysis: same accuracy as 4-model case without deterioration
Validation: Systematic comparison against exact solutions (t-test) and independent bridge sampling refits across sensitivity ranges. IWMDE consistently outperforms KDE in accuracy and stability, particularly with moderate sample sizes and in distribution tails.
Limitations & Open Problems
Limitations:
- TECHNICAL: Method limited to low-dimensional sensitivity spaces (few hyperparameters) due to curse of dimensionality in density estimation - could potentially be addressed with better density estimators
- TECHNICAL: Requires anchor point to lie in interior of posterior support - easily satisfied in practice by choosing reasonable reference value
- TECHNICAL: Density ratio estimates become unreliable in regions with sparse posterior mass - manageable by choosing appropriate sensitivity ranges
- NATURAL: Only addresses prior sensitivity, not likelihood specification sensitivity - complementary concern outside scope
- TECHNICAL: Extended model adds hyperprior structure that may not reflect analyst’s actual uncertainty about hyperparameter - standard limitation of hierarchical modeling
- NATURAL: Accuracy depends on quality of posterior density estimation, particularly for extreme tail evaluations
Open problems:
- Theoretical analysis - Formal characterization of approximation error rates for IWMDE in this context, particularly convergence properties as MCMC sample size increases
- Higher-dimensional extensions - Better density estimation methods or alternative approaches for simultaneous sensitivity over many hyperparameters beyond current 2-3 parameter limitation
Geometric Renyi Differential Privacy: Ricci Curvature Characterized by Heat Diffusion Mechanisms
Authors: Xiaotian Chang, Yangdi Jiang, Cyrus Mostajeran, Qirui Hu · Institution: Nanyang Technological University, Shanghai University of Finance and Economics · Category: stat.ML
Establishes privacy mechanisms for manifold-valued data using heat diffusion, with privacy budgets characterized by Ricci curvature through dimension-free Harnack inequalities.
Tags: differential privacy, Riemannian geometry, heat kernels, Ricci curvature, stochastic processes, geometric statistics, Renyi divergence, manifold learning
Problem Formulation
Motivation: Differential privacy mechanisms for manifold-valued data are needed since non-Euclidean data (medical images, trajectories, shapes) contain sensitive information. Existing approaches embed data in Euclidean space, add noise, and project back, which distorts intrinsic geometry.
Mathematical setup: Let $(M, g)$ be a complete $m$-dimensional Riemannian manifold with Ricci curvature $\text{Ric} \geq -K$ for some $K$. Given a dataset $D$ and an adjacent dataset $D'$ differing in one individual, let $f: \mathcal{D} \to M$ be a manifold-valued summary statistic with global sensitivity:
\[d(f(D), f(D')) \leq \Delta\]
Define the heat kernel $p(x,z,t)$ as the fundamental solution to:
\[\left(\frac{\partial}{\partial t} - \Delta_x\right) p(x,z,t) = 0\]
Let $(P_t)_{t \geq 0}$ be the heat semigroup:
\[P_t f(x) = \int_M p(x,z,t) f(z) \, d\text{vol}(z)\]
Assumptions:
- $M$ is stochastically complete
- $f$ has bounded global sensitivity $\Delta$
- Ricci curvature satisfies $\text{Ric}(X) \geq -K|X|^2$
Toy example: When $M = \mathbb{R}^2$ with $g = I_2$, we have $K = 0$ and the heat kernel becomes $p(x,z,t) = (4\pi t)^{-1} \exp(-|x-z|^2/(4t))$, recovering standard Gaussian noise. The core difficulty is that on negatively curved manifolds, Brownian motion retains “memory at infinity.”
Formal objective: Establish $(\alpha,\varepsilon)$-Rényi differential privacy where:
\[D_\alpha(B_t(f(D)) \| B_t(f(D'))) \leq \varepsilon\]
Method
The method consists of two mechanisms:
BM Mechanism: Release $Y \sim P_t(f(D), \cdot)$ where $P_t$ is the heat semigroup (Brownian motion for time $t$).
Langevin Mechanism: For Hadamard manifolds, use drift $-\nabla V$ with $V(x) = \lambda d^2(o,x)/2$ to obtain confining diffusion:
\[dX_t = \sqrt{2}dB_t - \nabla V(X_t)dt\]
Key equations:
Privacy budget for the BM mechanism:
\[\varepsilon = \frac{K\alpha\Delta^2}{2(1-e^{-2Kt})}\]
Privacy budget for the Langevin mechanism:
\[\varepsilon = \frac{(\lambda-K)\alpha\Delta^2}{2(1-e^{-2(\lambda-K)t})}\]
Application to toy example: For $\mathbb{R}^2$ with $K=0$, the BM budget is read as its $K \to 0$ limit, $\varepsilon = \alpha\Delta^2/(4t)$, recovering the standard Gaussian mechanism. The heat kernel is exactly $p(x,z,t) = (4\pi t)^{-1}\exp(-|x-z|^2/(4t))$.
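The BM budget formula can be evaluated and inverted directly. The sketch below takes the stated closed form at face value; the function names are hypothetical. It handles $K = 0$ through the flat limit $\alpha\Delta^2/(4t)$ and solves for the diffusion time $t$ that meets a target $\varepsilon$. One consequence of the formula as written: for $K > 0$ the budget never drops below $K\alpha\Delta^2/2$ however long the diffusion runs, so smaller targets are rejected.

```python
import math


def bm_privacy_budget(K, alpha, delta, t):
    """epsilon = K * alpha * delta^2 / (2 * (1 - exp(-2 K t))), with the K -> 0 limit."""
    if K == 0.0:
        return alpha * delta ** 2 / (4.0 * t)
    # 1 - exp(-2 K t) = -expm1(-2 K t); expm1 stays accurate for small |K t|.
    return K * alpha * delta ** 2 / (-2.0 * math.expm1(-2.0 * K * t))


def bm_time_for_budget(K, alpha, delta, eps):
    """Diffusion time t at which the BM budget equals eps (inverse of the formula above)."""
    if K == 0.0:
        return alpha * delta ** 2 / (4.0 * eps)
    ratio = K * alpha * delta ** 2 / (2.0 * eps)
    if ratio >= 1.0:
        raise ValueError("eps is at or below the floor K * alpha * delta^2 / 2")
    return -math.log1p(-ratio) / (2.0 * K)


eps_flat = bm_privacy_budget(0.0, alpha=2.0, delta=0.5, t=0.3)
eps_curved = bm_privacy_budget(1.0, alpha=2.0, delta=0.5, t=0.3)
t_back = bm_time_for_budget(1.0, alpha=2.0, delta=0.5, eps=eps_curved)
print(eps_flat, eps_curved, t_back)
```

For the same $t$, the $K = 1$ budget is larger than the flat one, matching the intuition that a more negative curvature lower bound costs more noise for the same privacy level.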
Novelty & Lineage
Prior work:
- Reimherr et al. (2021): “Riemannian Laplace mechanisms” - first intrinsic DP on manifolds via exponential families
- Jiang et al. (2023): “Gaussian differential privacy on Riemannian manifolds” - extended GDP to manifolds
- Soto et al. (2022): “Shape-preserving differential privacy” - structure-aware mechanisms
Delta: This paper connects Ricci curvature directly to privacy budgets via dimension-free Harnack inequalities. The key insight is using heat diffusion as a privacy mechanism rather than closed-form densities.
Theory-specific assessment:
- Main theorem is somewhat predictable given known connections between curvature and diffusion
- Proof technique combines standard semigroup theory with Harnack inequalities - not genuinely novel
- The curvature-privacy connection is interesting but follows from established geometric analysis
- No lower bounds established to assess tightness
Verdict: INCREMENTAL — Solid theoretical contribution connecting known tools (Harnack inequalities, heat kernels) to differential privacy, but the connection is natural given existing geometric analysis literature.
Proof Techniques
The main proof strategy uses dimension-free Harnack inequalities to control Rényi divergences.
Key inequality: For $\text{Ric}^V \geq -K$, the Harnack inequality states:
\[(P_t|f|)^\alpha (x) \leq P_t|f|^\alpha(y) \exp\left[\frac{K\alpha d^2(x,y)}{2(\alpha-1)(1-e^{-2Kt})}\right]\]
Proof stages:
- Apply the Harnack inequality with $f$ being the density ratio between $P_t(f(D), \cdot)$ and $P_t(f(D'), \cdot)$
- Use the sensitivity bound $d(f(D), f(D')) \leq \Delta$ to get:
\[\frac{(P_t p_1)^\alpha(z)}{P_t p_1^\alpha(z)} \leq \exp\left[\frac{K\alpha\Delta^2}{2(\alpha-1)(1-e^{-2Kt})}\right]\]
- Convert to Rényi divergence using the definition:
\[D_\alpha(P \| Q) = \frac{1}{\alpha-1} \log E_{z \sim Q}\left[\left(\frac{p(z)}{q(z)}\right)^\alpha\right]\]
The key insight is that the Harnack inequality provides exactly the pointwise control needed for Rényi divergence bounds.
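The Rényi divergence definition can be checked against the flat-space budget $\alpha\Delta^2/(4t)$ by direct numerical integration. The sketch below is a one-dimensional illustration, not the paper's argument: it assumes two Gaussians with common variance $2t$ (the Euclidean heat kernel at time $t$) whose means differ by $\Delta$, and uses a plain trapezoid rule. Function names are hypothetical.

```python
import math


def log_normal_pdf(z, mean, var):
    return -0.5 * (z - mean) ** 2 / var - 0.5 * math.log(2.0 * math.pi * var)


def renyi_divergence_1d(alpha, mean_p, mean_q, var, num_points=20001, half_width=12.0):
    """D_alpha(P || Q) for P = N(mean_p, var), Q = N(mean_q, var) via the trapezoid rule.

    Uses E_Q[(p/q)^alpha] = integral of p^alpha * q^(1 - alpha), evaluated in log space
    so that tiny density values do not cause division by zero.
    """
    sd = math.sqrt(var)
    # The integrand is Gaussian-shaped with variance `var` around this center.
    center = alpha * mean_p + (1.0 - alpha) * mean_q
    start = center - half_width * sd
    step = 2.0 * half_width * sd / (num_points - 1)
    total = 0.0
    for i in range(num_points):
        z = start + i * step
        log_term = (alpha * log_normal_pdf(z, mean_p, var)
                    + (1.0 - alpha) * log_normal_pdf(z, mean_q, var))
        endpoint_weight = 0.5 if i in (0, num_points - 1) else 1.0
        total += endpoint_weight * math.exp(log_term)
    return math.log(total * step) / (alpha - 1.0)


alpha, delta, t = 2.0, 1.0, 0.5
numeric = renyi_divergence_1d(alpha, 0.0, delta, var=2.0 * t)
closed_form = alpha * delta ** 2 / (4.0 * t)
print(numeric, closed_form)
```

The two values should agree to high precision here, which confirms that the $K = 0$ budget is the exact Rényi divergence between shifted Gaussians rather than only an upper bound.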
Experiments & Validation
Purely theoretical with some numerical illustrations. The paper includes:
- Synthetic experiments on hyperbolic space showing privacy-utility tradeoffs
- Comparisons with embedding-based approaches showing geometric distortion
- Fréchet mean estimation experiments
Empirical validation would require: large-scale experiments on real manifold-valued datasets (medical images, shape data), comparison with other intrinsic DP mechanisms, and validation of theoretical utility bounds.
Limitations & Open Problems
Limitations:
- Stochastic completeness assumption - TECHNICAL (needed for heat kernel uniqueness but often satisfied)
- Global sensitivity requirement - NATURAL (standard in DP literature)
- Restriction to Hadamard manifolds for Langevin mechanism - RESTRICTIVE (excludes many important manifolds like spheres)
- No finite-sample convergence rates - TECHNICAL (asymptotic analysis only)
Open problems:
- Extend Langevin mechanisms to compact manifolds with positive curvature
- Develop adaptive sensitivity analysis that doesn’t require global bounds