Apr 7, 2026 Theory 3 papers

Theory Digest — Apr 7, 2026

Today’s Digest at a Glance

Today’s papers address three distinct theoretical challenges: quantifying concentration and convergence rates in disordered systems, applying generative modeling techniques to causal discovery, and solving portfolio optimization problems in non-Markovian settings.

McKean-Vlasov Equations and Propagation of Chaos

McKean-Vlasov equations are nonlinear stochastic differential equations where the drift and diffusion coefficients depend on the law (probability distribution) of the solution itself. These equations naturally arise when studying the limiting behavior of large particle systems, where each particle’s dynamics depends on the empirical distribution of all particles.

The canonical form is $dX_t = b(X_t, \mu_t) dt + \sigma(X_t, \mu_t) dW_t$ where $\mu_t$ is the law of $X_t$. The “propagation of chaos” phenomenon states that as the number of particles $N \to \infty$, the empirical measure of the particle system converges to the unique solution of the McKean-Vlasov equation. However, establishing quantitative convergence rates, especially in the “quenched” setting where randomness is fixed rather than averaged over, requires sophisticated concentration inequalities and coupling arguments.

The key insight is that the nonlinearity through the measure $\mu_t$ creates a self-consistency condition that must be resolved iteratively, making the analysis fundamentally different from linear SDEs.
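The particle approximation described above can be sketched numerically. The block below is a minimal illustration and is not taken from any of the papers: it assumes a hypothetical linear mean-field drift $b(x,\mu) = -(x - \int y\,\mu(dy))$, constant $\sigma = 1$, and a plain Euler-Maruyama discretization. The function name `simulate_particles` and all parameter values are illustrative choices.

```python
import numpy as np

def simulate_particles(n_particles, n_steps=200, dt=0.01, seed=0):
    """Euler-Maruyama for dX^i = -(X^i - mean_j X^j) dt + dW^i (illustrative drift)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=1.0, scale=1.0, size=n_particles)  # i.i.d. initial condition
    for _ in range(n_steps):
        empirical_mean = x.mean()  # the empirical measure enters only through its mean
        noise = rng.normal(size=n_particles) * np.sqrt(dt)
        x = x - (x - empirical_mean) * dt + noise
    return x

# For this drift the limiting law keeps mean 1.0, so the empirical mean of the
# particle system should fluctuate around 1.0 at a scale of roughly N^{-1/2}.
for n in (10, 1000, 100000):
    print(n, abs(simulate_particles(n).mean() - 1.0))
```

In this toy setting the self-consistency condition is trivial because the limiting mean is constant; for general drifts the law $\mu_t$ has to be tracked along the whole trajectory.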

Riccati Backward Stochastic Differential Equations (BSDEs)

Riccati BSDEs are a special class of backward stochastic differential equations where the driver function is quadratic in the $Z$ variable, taking the form $dY_t = -f(t, Y_t, Z_t) dt + Z_t dW_t$ with $f(t,y,z) = A_t y + B_t z + \frac{1}{2} z^T C_t z + D_t$. These equations arise naturally in stochastic optimal control problems, particularly in mean-variance portfolio optimization.

The quadratic structure allows for explicit solutions when coefficients are deterministic, but becomes significantly more complex with stochastic or path-dependent coefficients. The solution typically involves matrix-valued processes satisfying coupled Riccati equations, where the $Y$ process represents the value function and $Z$ captures the optimal control in feedback form.

In portfolio contexts, Riccati BSDEs encode the trade-off between expected return and risk, with the quadratic term capturing the variance penalty that makes mean-variance optimization mathematically tractable.

Reading guide: The first paper develops new quantitative tools for analyzing particle systems in disordered media, while the third paper applies Riccati BSDE techniques from stochastic control to portfolio selection in financial mathematics. The second paper bridges machine learning and causal inference by borrowing diffusion-based objectives from generative modeling.

Quantitative propagation of chaos and universality for asymmetric Langevin spin glass dynamics

Authors: Manuel Arnese, Kevin Hu · Institution: NYU, Simons Society of Fellows · Category: math.PR

Establishes first quantitative concentration and convergence rates for quenched propagation of chaos in Langevin spin glass dynamics with non-Gaussian disorder.

Tags: spin-glass-dynamics propagation-of-chaos random-matrix-theory quantitative-bounds disordered-systems McKean-Vlasov-equations Malliavin-calculus concentration-of-measure


Problem Formulation

Motivation: This problem studies quantitative propagation of chaos in disordered spin glass dynamics, which matters for understanding high-dimensional stochastic systems with random interactions and has applications in statistical physics, machine learning theory, and neural networks.
Mathematical setup: Fix $N \in \mathbb{N}$, $\beta > 0$, $A, T \in (0,\infty)$. Consider the asymmetric Langevin spin glass dynamics:
\[dX^i_t = -U'(X^i_t)dt + \frac{\beta}{\sqrt{N}} \sum_{j=1}^N J_{ij} X^j_t dt + dW^i_t, \quad i = 1,\ldots,N\]
where $U: (-A,A) \to \mathbb{R}$ is an even confining potential, $J = (J_{ij})_{i,j=1}^N$ is a random matrix with i.i.d. entries having mean 0, variance 1, and $W^1, \ldots, W^N$ are independent Brownian motions.

Assumptions:
1. $U = U_c + U_\ell$ where $U_c$ is convex and $U_\ell$ has Lipschitz derivative
2. Initial distribution $P^N_0$ is exchangeable, symmetric, and satisfies Poincaré inequality
3. Disorder entries satisfy T2 inequality with constant $\sigma^2$
Toy example: When $N=2$, $\beta=1$, $U(x) = x^2/2$, and $J$ has Gaussian entries, this reduces to two coupled Ornstein-Uhlenbeck processes with random coupling strength of order $N^{-1/2}$.
Formal objective: Establish quantitative bounds on quenched propagation of chaos:
\[\mathbb{P}\left(\left|\int f(x) d[P^{N,v}_{[t]}(J) - \mu^{\otimes k}_{[t]}](x)\right| \geq r\right) \leq c_0 e^{-c_1 r^2 N/k}\]

Method

The method proceeds in three main steps:

1. Concentration of measure: Show that the quenched law $P^{N,v}_{[t]}(J)$ concentrates around its averaged law $\mathbb{E}[P^{N,v}_{[t]}(J)]$ using entropy-Girsanov arguments to establish $N^{-1/2}$-Lipschitz regularity:
\[|F_f(J) - F_f(J')| \leq C\|f\|_\infty N^{-1/2} \|J-J'\|_{Fr}\]
2. Averaged propagation of chaos: For Gaussian disorder $G$, prove the averaged law converges to the limit:
\[W_2^2(\mathbb{E}[P^{N,v}_{[t]}(G)], \mu^{\otimes k}_{[t]}) \leq Ck/N\]
Key insight: Use the mimicking theorem to represent the averaged dynamics as:
\[dX^i_t = -U'(X^i_t)dt + \frac{\beta}{\sqrt{N}} X_t \cdot \mathbb{E}[G_i | X_{[t]}] dt + dW^i_t\]
Then apply Malliavin calculus to handle non-adapted integrands.
3. Universality: Show that Gaussian and general T2 disorder yield the same averaged limit using Stein’s method techniques.

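Toy example application: For the 2-particle case the rate $N^{-1/2} = 1/\sqrt{2}$ is of order one, so the bound is not informative there and the toy case only fixes notation. The content of the result is asymptotic: as $N$ grows, the quenched law of a fixed spin approaches the one-dimensional limit law $\mu$ at rate $O(N^{-1/2})$.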

Novelty & Lineage

Prior work:

Ben Arous-Guionnet (2019): Established qualitative quenched propagation of chaos for Gaussian disorder using large deviations
Ben Arous-Gheissari-Jagannath (2021): Extended qualitative results to high temperature regime
Jagannath (2021): Proved qualitative universality for non-Gaussian disorder but without rates

Delta: This paper provides the first quantitative rates for quenched propagation of chaos with non-Gaussian disorder. Key advances:
- concentration bounds with explicit constants
- Wasserstein convergence rates
- universality beyond the Gaussian case

Theory-specific assessment:
- Main theorem is somewhat predictable - concentration plus universality plus averaged propagation of chaos naturally combine
- Proof technique combining mimicking theorem + Malliavin calculus + Stein’s method is genuinely new and technically sophisticated
- Bounds appear reasonably tight for $k=1$ case ($N^{-1/2}$ rate) but deteriorate badly for large $k$ (authors acknowledge suboptimality)
The $N^{-1/2}$ rate for a single particle is slower than the classical mean-field rate $N^{-1}$, reflecting the fundamental impact of disorder. Lower bounds in Section 7 suggest this is optimal.

Verdict: SIGNIFICANT — Provides first quantitative theory for important disordered system with genuinely novel proof techniques, though bounds become loose for multiple particles.

Proof Techniques

The proof combines four sophisticated techniques:

Entropy-Girsanov regularity: Key inequality using relative entropy and Girsanov’s theorem:
\[H(P_{[T]}(J) \| P_{[T]}(J')) \leq \frac{\beta^2}{2N} \|J-J'\|_{Fr}^2 \int_0^T \|\mathbb{E}_J[X_t X_t^\top]\|_{op} dt\]
Combined with Pinsker’s inequality and Poincaré bounds on covariance to establish Lipschitz property.
Mimicking theorem representation: For averaged dynamics, the crucial SDE becomes:
\[dX^i_t = -U'(X^i_t)dt + \frac{\beta^2}{N} X_t \cdot \left(\int_0^t Q_s X_s dW^i_s\right) dt + dW^i_t\]
Malliavin calculus manipulation: The key technical insight is swapping $X_t$ with the stochastic integral:
\[\frac{\beta^2}{N} X_t \cdot \int_0^t Q_s X_s dW^i_s = \frac{\beta^2}{N} \int_0^t X_t \cdot Q_s X_s dW^i_s + \frac{\beta^2}{N} \int_0^t D^i_s X_t \cdot Q_s X_s ds\]
where $D^i_s X_t$ is the Malliavin derivative. Proving the error term vanishes requires delicate analysis of non-Markovian coefficients.
Stein’s method for universality: The key identity comparing conditional expectations:
\[\mathbb{E}[J_i | X_{[t]}] - \mathbb{E}[G_i | X_{[t]}] = \left(I + \frac{\beta^2}{N} \int_0^t X_s X_s^\top ds\right)^{-1} \frac{\mathbb{E}_Q[J_i f(J_i) - \nabla f(G_i)]}{\mathbb{E}_Q[f(J_i)]}\]
The right side is shown to be small using T2 inequality properties.
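
As a sketch of how the entropy bound yields the Lipschitz estimate from the Method section, assume $F_f(J) = \int f\, dP_{[T]}(J)$ and that the Poincaré-based covariance bounds give $\int_0^T \|\mathbb{E}_J[X_t X_t^\top]\|_{op}\, dt \leq C_T$ for some constant $C_T$ (both are assumptions made here for illustration, not statements from the paper). Pinsker’s inequality in the form $\|P - P'\|_{TV} \leq \sqrt{H(P \| P')/2}$, with $\|P - P'\|_{TV} = \sup_A |P(A) - P'(A)|$, then gives:
\[|F_f(J) - F_f(J')| \leq 2\|f\|_\infty \|P_{[T]}(J) - P_{[T]}(J')\|_{TV} \leq \|f\|_\infty \sqrt{2 H(P_{[T]}(J) \| P_{[T]}(J'))} \leq \beta \sqrt{C_T}\, \|f\|_\infty N^{-1/2} \|J-J'\|_{Fr}\]
Under these assumptions the constant in the $N^{-1/2}$-Lipschitz bound can be taken as $C = \beta\sqrt{C_T}$.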

Experiments & Validation

Purely theoretical. Empirical validation would require:

Numerical simulation of the N-particle SDE system for various disorder distributions
Comparison of empirical measures to theoretical limit $\mu$
Verification of concentration rates and universality claims across different $N$ values
Testing optimality of bounds, particularly the $N^{-1/2}$ rate for $k=1$
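
A first step toward such a numerical check could look like the sketch below. It is an illustration under stated assumptions rather than the authors' setup: it uses the quadratic potential $U(x) = x^2/2$ from the toy example on the whole real line (the paper works on $(-A, A)$), a standard normal initial condition, a plain Euler-Maruyama scheme, and $\beta = 0.5$. The function names `simulate_spin_glass` and `sample_disorder` are hypothetical.

```python
import numpy as np

def simulate_spin_glass(J, beta=0.5, T=1.0, n_steps=200, seed=0):
    """Euler-Maruyama for dX = -U'(X) dt + (beta/sqrt(N)) J X dt + dW with U(x) = x^2/2."""
    n = J.shape[0]
    dt = T / n_steps
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)  # symmetric i.i.d. initial condition (an assumption)
    for _ in range(n_steps):
        drift = -x + (beta / np.sqrt(n)) * (J @ x)
        x = x + drift * dt + rng.normal(size=n) * np.sqrt(dt)
    return x

def sample_disorder(n, kind, seed=1):
    """i.i.d. mean-0, variance-1 entries: Gaussian or Rademacher (+1/-1)."""
    rng = np.random.default_rng(seed)
    if kind == "gaussian":
        return rng.normal(size=(n, n))
    if kind == "rademacher":
        return rng.choice([-1.0, 1.0], size=(n, n))
    raise ValueError(f"unknown disorder kind: {kind}")

# Compare an empirical second moment across disorder laws and system sizes.
for n in (50, 200, 800):
    for kind in ("gaussian", "rademacher"):
        x_T = simulate_spin_glass(sample_disorder(n, kind))
        print(n, kind, np.mean(x_T ** 2))
```

If universality holds in this regime, the two disorder laws should give increasingly similar statistics as $N$ grows; a single run per configuration is only a sanity check, not a rate estimate.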

Limitations & Open Problems

Limitations:

T2 inequality assumption on disorder - TECHNICAL (restrictive but standard in random matrix theory)
Bounded confining potential requirement - NATURAL (standard for ensuring well-posedness)
Time horizon dependence of constants - TECHNICAL (prevents uniform-in-time results)
Deteriorating bounds for large k - RESTRICTIVE (makes results less useful for many-particle observables)

Open problems:
Optimal rates for $k \geq 2$: Conjecture is $\mathbb{E}[W_2(P^{k,N}(J), \mu^{\otimes k})] = O(k/\sqrt{N})$ but current bounds give $O(k N^{-1/k})$
Extension to the symmetric disorder case, which corresponds to the original SK model but has much more complex limit dynamics

Smoothing the Landscape: Causal Structure Learning via Diffusion Denoising Objectives

Authors: Hao Zhu, Di Zhou, Donna Slonim · Institution: Harvard Medical School, Tufts University · Category: cs.LG

Applies diffusion denoising objectives to causal structure learning, proving equivalence to standard SEM loss while achieving gradient smoothing and computational speedups through adaptive k-hop constraints.

Tags: causal-discovery structural-equation-models diffusion-models dag-learning continuous-optimization denoising-objectives gradient-smoothing


Problem Formulation

Motivation: Causal structure learning from observational data is critical for decision-making in genetics, epidemiology, and healthcare. Existing methods like NOTEARS and DAG-GNN face scalability and stability issues in high-dimensional settings, especially with feature-sample imbalance.
Mathematical setup: Given dataset $X \in \mathbb{R}^{n \times d}$ with $n$ samples and $d$ features, learn dependency graph $G$ represented by weighted adjacency matrix $W \in \mathbb{R}^{d \times d}$. Assumes structural equation model:
\[X = XW + E\]
where $E$ captures additive noise terms.

Assumptions:
1. Causal sufficiency (no unobserved confounders)
2. No selection bias (i.i.d. samples)
3. Acyclicity (true structure is DAG)
4. Additive noise model with independent errors
5. Non-Gaussian errors or equal variance Gaussian for identifiability
6. Faithfulness
Toy example: When $d=3$ with variables $X_1, X_2, X_3$ and true structure $X_1 \to X_2 \to X_3$, the adjacency matrix has $W_{12} \neq 0, W_{23} \neq 0$, others zero. Traditional methods optimize reconstruction loss but face volatile gradients in high dimensions.
Formal objective: Minimize denoising objective:
\[\min_W \frac{1}{2n}\|(X_t - X_t W) - \text{diag}(\sqrt{1-\alpha_t})Z(I-W)\|_F^2 + \lambda_1\|W\|_1 + \lambda_2\|W\|_2\]

Method

The method applies diffusion denoising to causal discovery through three steps:

1. Forward diffusion: Perturb data using standard diffusion schedule:
\[X_t = \sqrt{\alpha_t}X_0 + \sqrt{1-\alpha_t}Z\]
where $Z \sim N(0,I)$ and $\alpha_t = \prod_{i=1}^t(1-\beta_i)$.
2. Denoising objective: Learn adjacency matrix $W$ to predict added noise:
\[\min_W \frac{1}{2n}\|(X_t - X_t W) - \text{diag}(\sqrt{1-\alpha_t})Z(I-W)\|_F^2\]
3. Adaptive k-hop acyclicity: Replace expensive matrix exponential with k-hop constraint:
\[h(W,k,\gamma) = \sum_{j=1}^{k+1} \frac{1}{j!\gamma^{2j}}\text{tr}((\gamma W \circ \gamma W)^j)\]
Schedule increases $k$ during training: $k=3$ (early), $k=10$ (mid), $k=d$ (final).
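
The k-hop constraint is the NOTEARS series $\text{tr}(e^{W \circ W}) - d = \sum_{j \geq 1} \text{tr}((W \circ W)^j)/j!$ truncated at $j = k+1$. A minimal dense sketch follows; the function name `k_hop_penalty` is hypothetical, and this version uses full matrix products, so it does not attempt to reproduce the complexity reduction reported for the paper's implementation.

```python
import numpy as np

def k_hop_penalty(W, k, gamma=1.0):
    """h(W,k,gamma) = sum_{j=1}^{k+1} tr((gamma W o gamma W)^j) / (j! gamma^(2j))."""
    A = (gamma * W) * (gamma * W)   # elementwise square of the rescaled weights
    power = np.eye(W.shape[0])      # running product A^j
    factorial = 1.0
    total = 0.0
    for j in range(1, k + 2):
        power = power @ A
        factorial *= j
        total += np.trace(power) / (factorial * gamma ** (2 * j))
    return total

# Chain X1 -> X2 -> X3 (a DAG) versus a 2-cycle X1 <-> X2.
W_dag = np.array([[0.0, 1.5, 0.0],
                  [0.0, 0.0, -0.8],
                  [0.0, 0.0, 0.0]])
W_cycle = np.array([[0.0, 1.0],
                    [1.0, 0.0]])
print(k_hop_penalty(W_dag, k=3))    # 0.0 for an acyclic graph
print(k_hop_penalty(W_cycle, k=1))  # 1.0: the 2-cycle contributes tr(A^2)/2! = 2/2
```

Algebraically $\gamma$ cancels term by term, so in exact arithmetic it does not change the value of $h$; in this sketch it only rescales intermediate matrix powers.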

Toy example application: For $d=3$ case, method perturbs $X$ with noise, then learns $W$ by minimizing denoising error. The k-hop constraint prevents short cycles like $X_1 \to X_2 \to X_1$ early in training.

Novelty & Lineage

Prior work:

NOTEARS (Zheng et al., 2018): introduced continuous DAG constraint $h(W) = \text{tr}(e^{W \circ W}) - d$ for structure learning
DAG-GNN (Yu et al., 2019): extended to nonlinear case using autoencoder with structural constraints
DAGMA (Bello et al., 2022): improved optimization through log-determinant acyclicity characterization

Delta: This paper repurposes diffusion denoising objectives for causal discovery, proving equivalence to standard SEM loss while providing gradient smoothing. Introduces adaptive k-hop constraint reducing complexity from $O(d^3)$ to $O(d^2)$.

Theory-specific assessment:

Main theorem (objective equivalence) is straightforward algebraic manipulation, not surprising
Proof technique is routine - standard diffusion reparameterization applied to linear SEM
Gradient smoothing argument relies on known randomized smoothing theory
No comparison to known lower bounds provided
k-hop scheduling is engineering optimization, not fundamental theoretical advance

The connection to diffusion models is conceptually interesting but the theoretical contribution is limited to showing algebraic equivalence and applying existing smoothing results.

Verdict: INCREMENTAL — solid engineering improvements with modest theoretical justification, but lacks fundamental theoretical breakthroughs.

Proof Techniques

Main proof shows objective equivalence through algebraic manipulation:

Start with perturbed data:
\[X_t = \text{diag}(\sqrt{\alpha_t})X_0 + \text{diag}(\sqrt{1-\alpha_t})Z\]
Right-multiply by $W$:
\[X_t W = \text{diag}(\sqrt{\alpha_t})X_0 W + \text{diag}(\sqrt{1-\alpha_t})ZW\]
Subtract this from the perturbed data:
\[X_t - X_t W = \text{diag}(\sqrt{\alpha_t})(X_0 - X_0 W) + \text{diag}(\sqrt{1-\alpha_t})(Z - ZW)\]
Isolate original reconstruction term:
\[\text{diag}(\sqrt{\alpha_t})(X_0 - X_0 W) = (X_t - X_t W) - \text{diag}(\sqrt{1-\alpha_t})Z(I-W)\]
Key insight: The denoising objective minimizes the Frobenius norm of the right-hand side, which by this identity equals that of $\text{diag}(\sqrt{\alpha_t})(X_0 - X_0 W)$, a rescaled version of the original SEM reconstruction residual $X_0 - X_0 W$.
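
The identity can be checked numerically on the three-variable chain from the toy example. The sketch below assumes a single scalar $\alpha_t$ shared by all entries (so the diag factors reduce to scalars) and Gaussian noise $E$; the weights in `W_true` and the helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 3

# Toy chain X1 -> X2 -> X3 with X = X W + E, so X0 = E (I - W)^{-1}.
W_true = np.zeros((d, d))
W_true[0, 1], W_true[1, 2] = 1.2, -0.7
E = rng.normal(size=(n, d))
X0 = E @ np.linalg.inv(np.eye(d) - W_true)

# Forward diffusion with a single scalar alpha_t (an illustrative choice).
alpha_t = 0.8
Z = rng.normal(size=(n, d))
Xt = np.sqrt(alpha_t) * X0 + np.sqrt(1 - alpha_t) * Z

def denoising_residual(W):
    return (Xt - Xt @ W) - np.sqrt(1 - alpha_t) * Z @ (np.eye(d) - W)

def sem_residual(W):
    return X0 - X0 @ W

# The identity holds for any candidate W, not only the true one.
W_candidate = rng.normal(size=(d, d))
lhs = denoising_residual(W_candidate)
rhs = np.sqrt(alpha_t) * sem_residual(W_candidate)
print(np.allclose(lhs, rhs))  # True
```

Because the two residuals differ only by the factor $\sqrt{\alpha_t}$, the equivalence itself says nothing about smoothing; that part of the argument rests on the separate randomized smoothing result.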

Gradient smoothing argument invokes randomized smoothing theory: optimizing over perturbed inputs $X_t$ bounds the Lipschitz constant of gradients, preventing sharp local minima.

The k-hop constraint proof simply truncates the matrix exponential series expansion and maintains running products to achieve $O(d^2)$ complexity.

Experiments & Validation

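Synthetic benchmarks: Scale-Free and Erdős-Rényi graphs (d=20-5000, degree=10-500) with linear and 7 nonlinear functions (including sin, cos, tanh, ReLU, sigmoid, polynomial). Metrics: structural Hamming distance (SHD), true positive rate (TPR), false discovery rate (FDR), false positive rate (FPR), runtime.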
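
For readers unfamiliar with these metrics, the sketch below shows one common way to compute SHD, TPR and FDR from weighted adjacency matrices. Conventions vary between papers; this version assumes that a reversed edge counts once toward SHD and that estimated edges are obtained by thresholding $|W|$ at 0.3. Neither choice is confirmed by the paper, and `graph_metrics` is a hypothetical helper.

```python
import numpy as np

def graph_metrics(W_true, W_est, threshold=0.3):
    """SHD, TPR, FDR for directed graphs; entry [i, j] != 0 means edge i -> j."""
    B_true = (np.abs(W_true) > 0).astype(int)
    B_est = (np.abs(W_est) > threshold).astype(int)
    tp = int(np.sum((B_est == 1) & (B_true == 1)))
    fp = int(np.sum((B_est == 1) & (B_true == 0)))
    fn = int(np.sum((B_est == 0) & (B_true == 1)))
    # Reversed edge: predicted i -> j while the truth has only j -> i.
    reversed_edges = int(np.sum((B_est == 1) & (B_true == 0)
                                & (B_true.T == 1) & (B_est.T == 0)))
    # Assumed convention: a reversal counts once, not as one extra plus one missing.
    shd = fp + fn - reversed_edges
    tpr = tp / max(int(B_true.sum()), 1)
    fdr = fp / max(int(B_est.sum()), 1)
    return {"shd": shd, "tpr": tpr, "fdr": fdr}

# Truth: 0 -> 1 -> 2. Estimate: 0 -> 1 (correct), 2 -> 1 (reversed), 0 -> 2 (extra).
W_true = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
W_est = np.array([[0, 0.9, 0.5], [0, 0, 0], [0, 0.6, 0]], dtype=float)
print(graph_metrics(W_true, W_est))  # shd = 2, tpr = 0.5, fdr = 2/3
```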

Key results:

DDCD-Linear matches DAGMA/GOLEM performance with 10x speedup
DDCD-Nonlinear recovers nonlinear functions accurately
2000-node graph: 5.7 minutes vs 53.7 minutes for full DAG constraint
k-hop scheduling reduces runtime by 90% while maintaining acyclicity

Real data: Myocardial infarction dataset (qualitative analysis only). DDCD-Smooth identifies medically plausible causal clusters (cardiogenic shock → tachycardia, pulmonary edema → nitrate use).

Baselines: NOTEARS, NOTEARS-MLP, DAG-GNN, GOLEM, GAE, DAGMA with default hyperparameters.

Limitations & Open Problems

Limitations:

Assumes linear causal mechanisms in latent space - TECHNICAL (standard assumption, but limits to linear-in-latent models)
Requires additive Gaussian noise for theoretical guarantees - TECHNICAL (common in causal discovery, but restrictive for real applications)
Identifiability requires non-Gaussian noise or equal variances - TECHNICAL (standard identifiability condition)
Causal sufficiency assumption (no confounders) - RESTRICTIVE (rarely satisfied in observational studies)
k-hop scheduling hyperparameters need tuning - TECHNICAL (method-specific limitation)
Real-world evaluation limited to qualitative assessment - RESTRICTIVE (lack of ground truth limits validation)
DDCD-Smooth normalization may lose important scale information - TECHNICAL (trade-off between robustness and information preservation)

Open problems:
Extend theoretical guarantees to nonlinear mechanisms without latent linear structure
Develop principled methods for selecting k-hop schedule and diffusion hyperparameters automatically

On the mean-variance problem through the lens of multivariate fake stationary affine Volterra dynamics

Authors: Emmanuel Gnabeyeu · Institution: Sorbonne Université · Category: math.OC

Solves multivariate Markowitz portfolio optimization in fake stationary affine Volterra models using Riccati BSDE methods, extending rough volatility portfolio theory to time-dependent coefficients

Tags: portfolio optimization rough volatility Volterra processes mean-variance Riccati equations non-Markovian models stochastic control backward SDEs


Problem Formulation

Motivation: The continuous-time Markowitz mean-variance portfolio selection problem remains largely unsolved in rough volatility models due to their non-Markovian and non-semimartingale nature. These models better capture empirical volatility features like low Hölder regularity but pose significant challenges for classical stochastic control methods.
Mathematical setup: Consider a financial market on $[0,T]$ with $d+1$ securities: one risk-free bond $S^0_t$ with rate $r(t)$ and $d$ risky assets with dynamics:
\[dS^i_t = S^i_t(r(t) + \theta_i V^i_t)dt + S^i_t \sqrt{V^i_t} dB^i_t\]
where volatility $V = (V^1, \ldots, V^d)^\top$ follows the fake stationary affine Volterra process:
\[V_t = \phi(t)V_0 + \int_0^t K(t-s)[\mu(s) + DV_s]ds + \int_0^t K(t-s)\nu\varsigma(s)\sqrt{\text{diag}(V_s)}dW_s\]
Assumptions:
1. $K = \text{diag}(K_1, \ldots, K_d)$ with completely monotone kernels satisfying integrability and continuity conditions
2. Risk premium $\lambda = (\theta_1\sqrt{V^1}, \ldots, \theta_d\sqrt{V^d})^\top$ for constants $\theta_i \geq 0$
3. Correlation structure $W^i = \rho_i B^i + \sqrt{1-\rho_i^2} B^{\perp,i}$ between volatility and price processes

The wealth process under strategy $\alpha$ satisfies:
\[dX^\alpha_t = (r(t)X^\alpha_t + \alpha_t^\top \lambda_t)dt + \alpha_t^\top dB_t\]
Toy example: For $d=2$ with fractional kernels $K_i(t) = t^{\alpha_i-1}/\Gamma(\alpha_i)$ where $\alpha_i \in (1/2,1)$, constant correlation $\rho_1 = \rho_2 = -0.6$, and parameters $V^i_0 = 0.02$, $\theta_i = 0.1$, $\nu_i = 0.4$, the problem becomes a two-dimensional non-Markovian optimization with memory effects from the fractional kernels.
Formal objective: The Markowitz problem seeks:
\[V(m) := \inf_{\alpha \in \mathcal{A}} \{\text{Var}(X_T) : \text{s.t.} \; E[X_T] = m\}\]

Method

The method transforms the constrained mean-variance problem into solving a Riccati backward SDE through Lagrangian duality.

Key steps:

Convert to unconstrained problem:
\[V(m) = \max_\eta \min_\alpha E[(X^\alpha_T - (m-\eta))^2] - \eta^2\]
Introduce auxiliary process $(\Gamma, \Lambda)$ solving the Riccati BSDE:
\[d\Gamma_t = \Gamma_t[-2r(t) + |\lambda_t + \Sigma\Lambda_t|^2]dt + \Lambda_t^\top dW_t, \quad \Gamma_T = 1\]
Define $\Lambda^i_t = \nu_i \varsigma_i(t) \psi_i(T-t) \sqrt{V^i_t}$ where $\psi$ solves the Riccati-Volterra equation:
\[\psi_i(t) = \int_0^t K_i(t-s)[-\theta_i^2 + F_i(T-s, \psi(s))]ds\]
with
\[F_i(s, \psi) = -2\theta_i\rho_i\nu_i\varsigma_i(s)\psi_i + (D^\top\psi)_i + \frac{\nu_i^2}{2}(1-2\rho_i^2)(\varsigma_i(s)\psi_i)^2\]
The auxiliary process admits explicit representation:
\[\Gamma_t = \exp\left(2\int_t^T r(s)ds + \sum_{i=1}^d \int_t^T [-\theta_i^2 + F_i(s, \psi(T-s))]g^i_t(s)ds\right)\]
Optimal strategy via completion of squares:
\[\alpha^*_t = -(\lambda_t + \Sigma\Lambda_t)(X^{\alpha^*}_t - \xi^* e^{-\int_t^T r(s)ds})\]
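Applying to toy example: For the two-dimensional case with $V^1_0 = V^2_0 = 0.02$ and $\Sigma = \text{diag}(\rho_1, \rho_2)$, substituting $\lambda^i_t = \theta_i\sqrt{V^i_t}$ and $\Lambda^i_t = \nu_i \varsigma_i(t) \psi_i(T-t) \sqrt{V^i_t}$ into the optimal strategy gives:
\[\alpha^{*,i}_t = -(\theta_i + \rho_i\nu_i\varsigma_i(t)\psi_i(T-t))\sqrt{V^i_t}\,(X^*_t - \xi^* e^{-\int_t^T r(s)ds})\]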
This captures both direct risk premium exposure and volatility-correlation adjustments.
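
The conversion to an unconstrained problem rests on a one-line expansion, sketched here with $2\eta$ acting as the Lagrange multiplier (a standard argument, written out for illustration rather than quoted from the paper). For any admissible $\alpha$ and any $\eta \in \mathbb{R}$:
\[E[(X^\alpha_T - (m-\eta))^2] - \eta^2 = E[(X^\alpha_T - m)^2] + 2\eta\,(E[X^\alpha_T] - m)\]
When the constraint $E[X^\alpha_T] = m$ holds, the last term vanishes and $E[(X^\alpha_T - m)^2] = \text{Var}(X^\alpha_T)$, so the inner minimization over unconstrained $\alpha$ is a lower bound on $V(m)$ for every $\eta$, and the outer maximization over $\eta$ closes the gap under the usual convex duality conditions.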

Novelty & Lineage

Prior work:

Han & Wong (2020): solved the single-asset Volterra Heston case using a forward variance approach
Abi Jaber et al. (2021): addressed multivariate affine Volterra models but without time-dependent diffusion coefficients
El Euch & Rosenbaum (2019): established the rough Heston model foundation

Delta: This paper extends to multivariate fake stationary affine Volterra models with time-dependent diffusion coefficients $\varsigma(t)$. The key advance is handling the “fake stationarity” regime where mean and variance remain constant despite non-stationarity, requiring solving functional equations for the stabilizer function.

Theory-specific assessment:
- Main theorem is predictable extension of Han & Wong (2020) to higher dimensions, though the technical machinery for time-dependent coefficients is non-trivial
- Proof technique assembles known Riccati BSDE methods with Volterra equation theory - not genuinely novel but technically demanding
- No lower bounds provided for comparison of optimality gaps
- The “fake stationarity” condition creates additional functional constraints that previous work avoided
The extension to time-dependent diffusion and multivariate correlation structures represents solid but incremental progress over existing Volterra portfolio theory.

Verdict: INCREMENTAL — Expected multidimensional extension with time-dependent coefficients, building systematically on established single-asset framework.

Proof Techniques

The proof strategy combines three main technical components:

Riccati BSDE existence and representation: Uses completion of squares technique to show that optimal control satisfies:
\[d[\Gamma_t(\tilde{X}^\alpha_t)^2] = [\alpha_t + h_t\tilde{X}^\alpha_t]^\top \Gamma_t [\alpha_t + h_t\tilde{X}^\alpha_t] dt + \text{martingale terms}\]
where $h_t = \lambda_t + \Sigma\Lambda_t$. The key identity driving optimality is:
\[E[(\tilde{X}^\alpha_T)^2] = \Gamma_0(\tilde{X}^\alpha_0)^2 + E\left[\int_0^T [\alpha_s + h_s\tilde{X}^\alpha_s]^\top \Gamma_s [\alpha_s + h_s\tilde{X}^\alpha_s] ds\right]\]
Since $\Gamma_s > 0$, the integral term is nonnegative and vanishes exactly when $\alpha_s = -h_s\tilde{X}^\alpha_s$, which identifies the optimal control.
Exponential-affine representation for auxiliary process: The crucial technical insight uses the adjusted forward variance $g_t(s) = E[V_s - \int_t^s K(s-u)DV_u du | \mathcal{F}_t]$ to obtain:
\[\Gamma_t = \exp\left(2\int_t^T r(s)ds + \sum_{i=1}^d \int_t^T [-\theta_i^2 + F_i(s, \psi(T-s))] g^i_t(s) ds\right)\]
This leverages the exponential-affine transform:
\[E\left[\exp\left(\int_t^T m(ds) V_s\right) | \mathcal{F}_t\right] = \exp\left(\int_t^T [\tilde{F}(s, \tilde{\psi}(T-s))] g_t(s) ds\right)\]
Riccati-Volterra equation solvability: For existence of $\psi$, the paper applies fixed-point arguments to the integral equation:
\[\psi_i(t) = \int_0^t K_i(t-s)[-\theta_i^2 + F_i(T-s, \psi(s))] ds\]
The key boundedness condition ensures:
\[\sup_{t \in [0,T]} |\psi_i(t)| \leq \frac{|\theta_i|^2}{\bar{\lambda}_i}(1 - R_{\bar{\lambda}_i}(T))\]
where $R_{\bar{\lambda}_i}$ is the resolvent of kernel $K_i$.

Experiments & Validation

Numerical experiments focus on a two-dimensional fake stationary rough Heston model with fractional kernels $K_i(t) = t^{\alpha_i-1}/\Gamma(\alpha_i)$ where $\alpha_1 = 0.6$, $\alpha_2 = 0.9$.

Key parameters: $V^i_0 \sim \mathcal{N}(\mu^i_0/\lambda_i, v^i_0)$ with $c = (0.01, 0.03)^\top$, $\mu_0 = (2.0, 1.0)^\top$, correlation matrix $\Sigma = \text{diag}(-0.7, -0.55)$, risk premiums $\theta = (0.1, 0.12)^\top$, volatility parameters $\nu = (0.4, 0.32)^\top$.

Implementation uses K-integrated Euler-Maruyama scheme for Volterra process simulation and finite difference methods for Riccati-Volterra equation discretization. Results show efficient frontier as straight line with slope depending on rough volatility parameters.
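
To give a sense of what discretizing the Riccati-Volterra equation involves, here is a minimal sketch. It swaps in an explicit product-integration rectangle scheme, which is not necessarily the finite difference method used in the paper, and it assumes a single asset ($d = 1$), $\varsigma \equiv 1$ and $D = 0$. The function name `solve_riccati_volterra` is hypothetical.

```python
import math
import numpy as np

def solve_riccati_volterra(alpha, theta, rho, nu, T=1.0, n_steps=252):
    """Explicit product-integration scheme for
    psi(t) = int_0^t K(t-s) G(psi(s)) ds,  K(t) = t^(alpha-1)/Gamma(alpha),
    with G(psi) = -theta^2 - 2 theta rho nu psi + (nu^2/2)(1 - 2 rho^2) psi^2.
    Assumes d = 1, varsigma = 1 and D = 0 (simplifications, not the paper's setting)."""
    dt = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    psi = np.zeros(n_steps + 1)
    source = np.zeros(n_steps + 1)  # G(psi(t_j)), frozen on [t_j, t_{j+1})
    for n in range(1, n_steps + 1):
        source[n - 1] = (-theta**2 - 2 * theta * rho * nu * psi[n - 1]
                         + 0.5 * nu**2 * (1 - 2 * rho**2) * psi[n - 1] ** 2)
        # Exact integral of the kernel over each cell [t_j, t_{j+1}].
        upper = (t[n] - t[:n]) ** alpha
        lower = (t[n] - t[1:n + 1]) ** alpha
        weights = (upper - lower) / math.gamma(alpha + 1)
        psi[n] = np.dot(weights, source[:n])
    return t, psi

# First asset of the experiment: alpha = 0.6, theta = 0.1, rho = -0.7, nu = 0.4.
t, psi = solve_riccati_volterra(alpha=0.6, theta=0.1, rho=-0.7, nu=0.4)
print(psi[-1])
```

Integrating the kernel exactly over each cell avoids evaluating $K$ at its singularity at zero, which is the main reason to prefer this form over a naive Riemann sum when $\alpha_i < 1$.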

Datasets: Synthetic data generated from model with $T = 1$ year, $n = 252$ time steps. No real market data validation provided.

Limitations & Open Problems

Limitations:

TECHNICAL: Assumption 3.1 requires bounded growth condition $\max_i \sup_t [\theta_i^2 + \nu_i^2 \varsigma_i(t)^2 \psi_i(T-t)^2] \leq a/a(p)$ - this is needed for martingale properties but may be restrictive for large risk premiums.
NATURAL: Fake stationarity conditions require specific functional relationships between initial conditions and model parameters - widely satisfied in calibrated models.
RESTRICTIVE: Diagonal structure of matrix $D$ in volatility drift limits cross-asset volatility spillovers, though authors note extension is possible.
TECHNICAL: Correlation structure $W^i = \rho_i B^i + \sqrt{1-\rho_i^2} B^{\perp,i}$ assumes constant correlations - stochastic correlation would require additional state variables.
RESTRICTIVE: Weak solution framework means optimal strategies may not be pathwise unique, limiting practical implementation.

Open problems:
Extend to non-diagonal drift matrix $D$ with full cross-asset volatility interactions while maintaining analytical tractability
Develop robust numerical methods for high-dimensional Riccati-Volterra systems beyond the $d=2$ case