Mar 20, 2026 Theory 8 papers

Theory Digest — Mar 20, 2026

Today’s Digest at a Glance

Today’s digest spans control theory, statistical learning, and optimization, with several papers on stochastic systems and high-dimensional inference. A recurring theme is the connection between classical control problems and modern machine learning techniques.

Stochastic Optimal Control and Advanced Dynamics

Stochastic optimal control addresses the fundamental question: how do we make the best decisions when our system evolves randomly over time? Imagine piloting a drone through turbulent air or managing a financial portfolio amid market volatility. The mathematical framework starts with a state variable $X_t$ that follows a stochastic differential equation like $dX_t = f(X_t, u_t)dt + \sigma(X_t)dW_t$, where $u_t$ is our control input and $W_t$ represents random noise. The goal is to choose $u_t$ to minimize a cost function $J = \mathbb{E}[\int_0^T L(X_t, u_t)dt + \Phi(X_T)]$.
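As a rough illustration of this setup, the sketch below estimates the cost $J$ by Monte Carlo using an Euler-Maruyama discretization of the SDE. The scalar dynamics, the quadratic cost terms, and the two feedback policies are hypothetical choices made only for this example; none of them come from the papers in the digest.

```python
import numpy as np

def simulate_cost(f, sigma, policy, running_cost, terminal_cost,
                  x0=0.0, T=1.0, n_steps=200, n_paths=5000, seed=0):
    """Monte Carlo estimate of J = E[int_0^T L dt + Phi(X_T)] under a feedback policy."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, x0, dtype=float)
    cost = np.zeros(n_paths)
    for _ in range(n_steps):
        u = policy(x)
        cost += running_cost(x, u) * dt           # left-endpoint quadrature of the running cost
        dw = rng.normal(0.0, np.sqrt(dt), n_paths)
        x = x + f(x, u) * dt + sigma(x) * dw      # Euler-Maruyama step
    return float(np.mean(cost + terminal_cost(x)))

# Hypothetical scalar example: dX = u dt + 0.5 dW, L = x^2 + u^2, Phi = x^2.
f = lambda x, u: u
sigma = lambda x: 0.5 * np.ones_like(x)
L = lambda x, u: x**2 + u**2
Phi = lambda x: x**2

J_no_control = simulate_cost(f, sigma, lambda x: 0.0 * x, L, Phi, x0=1.0)
J_feedback = simulate_cost(f, sigma, lambda x: -x, L, Phi, x0=1.0)
print(J_no_control, J_feedback)
```

Comparing the two estimates shows how a simple stabilizing feedback lowers the expected cost relative to applying no control; finding the best such policy is what the HJB machinery addresses.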

The classical approach relies on the Hamilton-Jacobi-Bellman (HJB) equation, a partial differential equation that characterizes the optimal value function $V(x,t)$. However, solving HJB equations numerically becomes prohibitively expensive in high dimensions due to the “curse of dimensionality.” Recent advances use physics-informed neural networks (PINNs) to approximate solutions by embedding the PDE directly into the neural network’s loss function. This approach is particularly powerful for problems involving temperature control in Langevin dynamics, where the system’s “temperature” $T(x,t)$ controls the exploration-exploitation tradeoff in optimization algorithms.

A related challenge involves computing Malliavin derivatives—essentially derivatives with respect to the random driving process—which appear in backward stochastic differential equations (BSDEs). These equations, of the form $Y_t = \xi + \int_t^T g(s, Y_s, Z_s)ds - \int_t^T Z_s dW_s$, are notoriously difficult to solve computationally. The forward approach using Malliavin calculus offers a promising alternative by transforming the backward problem into a forward one that’s more amenable to numerical solution.

High-Dimensional Statistics and Hypothesis Testing

Modern data analysis frequently encounters situations where the number of parameters $p$ exceeds or is comparable to the sample size $n$. Traditional statistical methods often fail in this “high-dimensional” regime, requiring new theoretical frameworks and computational approaches. The fundamental challenge is that classical asymptotic theory assumes $n \to \infty$ with $p$ fixed, but modern applications require understanding the joint limit where both $n,p \to \infty$ with their ratio $p/n$ approaching a constant.

Random matrix theory provides the mathematical foundation for high-dimensional analysis. Key results like the Marchenko-Pastur theorem describe the eigenvalue distribution of sample covariance matrices $\frac{1}{n}X^TX$ when $X$ is an $n \times p$ random matrix. These results enable the construction of test statistics that maintain proper Type I error rates even when $p \approx n$. The deviation testing framework addresses a practically important but theoretically challenging problem: rather than testing whether two population means are exactly equal ($H_0: \mu_1 = \mu_2$), we want to test whether they’re “similar enough” ($H_0: \|\mu_1 - \mu_2\| \leq \delta$ for some tolerance $\delta > 0$). This non-standard hypothesis testing problem requires sophisticated techniques from empirical process theory and concentration inequalities.
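A small numerical check can make the Marchenko-Pastur statement concrete. The sketch below assumes i.i.d. standard normal entries (so the population covariance is the identity) and hypothetical sizes $n = 2000$, $p = 500$; it compares the extreme sample eigenvalues with the support edges $(1 \pm \sqrt{c})^2$ for $c = p/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 500                   # hypothetical sizes, aspect ratio c = p/n = 0.25
X = rng.standard_normal((n, p))    # i.i.d. N(0,1) entries: population covariance is I_p
S = X.T @ X / n                    # sample covariance matrix
eigs = np.linalg.eigvalsh(S)

c = p / n
lower, upper = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2   # Marchenko-Pastur support edges
print(eigs.min(), eigs.max(), (lower, upper))
```

Although every population eigenvalue equals 1, the sample eigenvalues spread over roughly $[0.25, 2.25]$, which is why classical fixed-$p$ procedures need correction in this regime.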

Multi-Agent Systems and Game Theory

Linear-Quadratic-Gaussian (LQG) games model scenarios where multiple agents make decisions simultaneously, each trying to optimize their own quadratic cost function while their systems evolve according to linear dynamics with Gaussian noise. Agent $i$’s system follows $dx_i = (A_i x_i + B_i u_i + \sum_{j \neq i} C_{ij} u_j)dt + \sigma_i dW_i$, and each agent minimizes $J_i = \mathbb{E}[\int_0^T (x_i^T Q_i x_i + u_i^T R_i u_i)dt + x_i^T(T) Q_{i,f} x_i(T)]$. The challenge is computing Nash equilibria—strategy profiles where no agent benefits from unilateral deviation.

In distributed settings, agents cannot observe all other agents’ states directly, making the estimation problem central to performance. Sparse estimation techniques like group lasso regularization ($\min_\beta \frac{1}{2}\|y - X\beta\|_2^2 + \lambda \sum_g \|\beta_g\|_2$) help identify which agents or sensors are most informative, reducing communication costs and improving robustness. The theoretical analysis requires establishing how sparsity-inducing penalties affect the game’s equilibrium structure and convergence properties.

Machine Learning Theory and Regularization

Modern deep learning’s success partly stems from implicit regularization effects that aren’t fully understood theoretically. The maximum entropy principle provides one lens for understanding why certain algorithmic choices work well in practice. When we impose constraints like $\mathbb{E}[f(X)] = c$ and seek the distribution that maximizes entropy $H(p) = -\int p(x) \log p(x) dx$, we often obtain Gaussian distributions or exponential families that appear naturally in machine learning algorithms.

Gaussian processes offer a principled framework for understanding neural network behavior in the infinite-width limit. A GP is defined by its mean function $m(x) = \mathbb{E}[f(x)]$ and covariance kernel $k(x,x') = \text{Cov}[f(x), f(x')]$. The choice of kernel encodes our assumptions about function smoothness and structure. Recent work explores how different regularization schemes affect the implicit kernel and representation learning capabilities of neural networks, particularly regarding calibration, that is, ensuring that predicted probabilities match true frequencies.

Causal Inference and Representation Learning

Causal representation learning asks: can we recover latent causal variables from observed data across multiple environments? The structural causal model framework posits that data is generated by $X = f(Z, N)$ where $Z$ are latent causal variables, $N$ represents noise, and $f$ is some mixing function. When we have interventional data from multiple domains (environments where different variables have been experimentally manipulated), identifiability becomes possible under certain conditions.

Empirical Bayes methods provide a natural framework for this problem by treating the latent variables as random effects with unknown prior distributions. The key insight is using score matching—fitting models by matching score functions $\nabla_x \log p(x)$ rather than densities directly—combined with Tweedie’s formula, which relates empirical Bayes estimates to score functions. This approach sidesteps many computational challenges while providing finite-sample guarantees for causal variable recovery.
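To see how Tweedie's formula ties a score function to an empirical Bayes estimate, consider the textbook Gaussian case: $x = z + \varepsilon$ with $z \sim N(0, \tau^2)$ and $\varepsilon \sim N(0, \sigma^2)$, where the formula reads $\mathbb{E}[z \mid x] = x + \sigma^2 \nabla_x \log p(x)$. The variances below are hypothetical, and the score is written in closed form rather than learned by score matching.

```python
import numpy as np

tau2, sigma2 = 4.0, 1.0    # hypothetical prior and noise variances

def score(x):
    # The marginal of x is N(0, tau2 + sigma2), so its score is -x / (tau2 + sigma2).
    return -x / (tau2 + sigma2)

def tweedie_posterior_mean(x):
    # Tweedie's formula: E[z | x] = x + sigma^2 * d/dx log p(x).
    return x + sigma2 * score(x)

x = np.array([-2.0, 0.0, 3.0])
closed_form = x * tau2 / (tau2 + sigma2)   # standard Gaussian-Gaussian posterior mean
print(tweedie_posterior_mean(x), closed_form)
```

The two expressions agree, which is the point of the formula: the posterior mean can be computed from the marginal score alone, without specifying the prior explicitly.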

Reading Guide

For control theorists and applied mathematicians: Start with papers 1 and 4 on stochastic optimal control, which showcase complementary numerical approaches to classical problems. Paper 8 connects to modern learning theory perspectives on control.

For statisticians: Papers 2 and 6 offer contrasting perspectives on high-dimensional inference—one focusing on hypothesis testing fundamentals, the other on causal structure discovery.

For machine learning researchers: Begin with paper 7 on regularization theory, then explore paper 6 for connections to causal representation learning. Paper 3 provides a bridge to control applications.

For systems researchers: Papers 3 and 8 both address multi-agent coordination but from different angles—game theory versus online learning. Paper 5 offers a glimpse into quantum extensions of classical control theory.

The papers naturally pair: (1,4) for stochastic control methods, (2,6) for high-dimensional statistical inference, (3,8) for multi-agent learning, and (5,7) for advanced theoretical frameworks.

State-dependent temperature control in Langevin diffusions using numerical exploratory Hamiltonian-Jacobi-Bellman equations

Authors: Taorui Wang, Xun Li, Gu Wang, Zhongqiang Zhang · Institution: Worcester Polytechnic Institute, Hong Kong Polytechnic University · Category: math.NA

Develops a PINN-based method to solve exploratory Hamilton-Jacobi-Bellman equations for computing state-dependent temperature schedules in Langevin dynamics, extending from 1D to high-dimensional nonconvex optimization problems.

Tags: stochastic optimization Hamilton-Jacobi-Bellman equations physics-informed neural networks Langevin dynamics nonconvex optimization temperature control high-dimensional PDEs exploratory control

Problem Formulation

Motivation: Choosing appropriate noise levels in Langevin dynamics is crucial for escaping local minima in nonconvex optimization, but determining optimal state-dependent temperature schedules remains challenging, especially in high dimensions where traditional methods become computationally intractable.
Mathematical setup: Consider the minimization problem
\[\min_{x \in \Omega} f(x)\]
where $\Omega \subset \mathbb{R}^d$ is a bounded domain and $f: \mathbb{R}^d \to \mathbb{R}$ is continuously differentiable. The controlled Langevin dynamics is given by
\[dX(t) = -\nabla f(X(t)) dt + h(X(t)) dW(t)\]
where $W(t)$ is a $d$-dimensional Brownian motion and $h(x)$ is the state-dependent noise coefficient. The value function is defined as
\[v(x) = \inf_{u \in \mathcal{A}_0(x)} \mathbb{E}\left[\int_0^\infty e^{-\rho t} f(X(t)) dt \mid X(0) = x\right]\]
The classical HJB equation is
\[-\rho v(x) + f(x) - \nabla f(x) \cdot \nabla v(x) + H(\nabla^2 v(x)) = 0\]
where $H(\cdot) = \inf_{u \in U} (u \text{Tr}(\cdot))$ and $U = [u_{min}, u_{max}]$. The exploratory HJB equation uses the regularized Hamiltonian
\[H_\lambda(\cdot) = -\lambda \ln \int_U \exp(-\lambda^{-1} \text{Tr}(\cdot) u) du\]
yielding
\[F_\lambda(\nabla^2 v_\lambda, \nabla v_\lambda, v_\lambda, x) := -\rho v_\lambda(x) + f(x) - \nabla v_\lambda(x) \cdot \nabla f(x) + H_\lambda(\nabla^2 v_\lambda(x)) = 0\]
Assumptions:
1. $\Omega \subset \mathbb{R}^d$ is a bounded domain with smooth boundary
2. $f \in C^2(\Omega)$
3. Global minima do not lie on the boundary $\partial \Omega$
4. Homogeneous Neumann boundary conditions: $\nabla v_\lambda(x) \cdot n(x) = 0$ for $x \in \partial \Omega$
Toy example: When $d = 1$ and $f(x) = x^4 - 2x^2$ on $\Omega = [-2, 2]$, this becomes a symmetric double-well potential with two global minima at $x = \pm 1$ (where $f = -1$) separated by a local maximum at $x = 0$. The noise coefficient should be large near the barrier at $x = 0$ and small near the global minima.
Formal objective: Determine the optimal noise coefficient
\[h_\lambda(x) := \sqrt{2 \frac{\int_U u \exp(-\lambda^{-1} u \Delta v_\lambda(x)) du}{\int_U \exp(-\lambda^{-1} u \Delta v_\lambda(x)) du}}\]
where $v_\lambda$ solves the exploratory HJB equation.

Method

The method consists of three main components:

Solve the exploratory HJB equation using Physics-Informed Neural Networks (PINNs) by minimizing the loss function:
\[\mathcal{L}(\phi; D_\Omega, D_{\partial\Omega}) = \alpha_{res} \frac{1}{N_\Omega} \sum_{i=1}^{N_\Omega} R_\lambda(x_i; \phi)^2 + \alpha_{\partial\Omega} \frac{1}{N_{\partial\Omega}} \sum_{\ell=1}^{N_{\partial\Omega}} B_{\partial\Omega}(x_\ell; \phi)^2\]
where $R_\lambda(x; \phi) = F_\lambda(\nabla^2 v_{\lambda,\phi}, \nabla v_{\lambda,\phi}, v_{\lambda,\phi}, x)$ and $B_{\partial\Omega}(x; \phi) = \nabla v_{\lambda,\phi}(x) \cdot n(x)$.
Compute the noise coefficient using a numerically stable evaluation:
\[h_{\lambda,\phi}(x) = \sqrt{2 \left(u_{min} + \delta \frac{(z-1) e^z + 1}{z(e^z - 1)}\right)}\]
where $z = -\delta \lambda^{-1} \Delta v_{\lambda,\phi}(x)$ and $\delta = u_{max} - u_{min}$.
Apply truncation for stability:
\[h_\lambda^\tau(x) = h_{\lambda,\phi}(x) \mathbf{1}_{\{h_{\lambda,\phi}(x) \geq \tau\}}\]
and run the discrete Langevin dynamics:
\[X_{k+1}^* = X_k^* - \eta \nabla f(X_k^*) + \sqrt{\eta} h_\lambda^\tau(X_k^*) \xi_k\]
where $\xi_k \sim \mathcal{N}(0, I_d)$.
Control bounds are set using:
\[\kappa = \frac{1}{2} \max_{1 \leq \ell \leq S} \|\nabla f(x^{(\ell)})\|_\infty^2\] \[u_{max} = C_\kappa \kappa\]
Applied to the toy example $f(x) = x^4 - 2x^2$: The PINN learns $v_{\lambda,\phi}(x)$ such that $\Delta v_{\lambda,\phi}(x) > 0$ near $x = \pm 1$ (so $z < 0$ and the coefficient stays close to $\sqrt{2u_{min}}$, giving small noise) and $\Delta v_{\lambda,\phi}(x) < 0$ near $x = 0$ (so $z > 0$ and the coefficient approaches $\sqrt{2u_{max}}$, giving large noise), enabling escape from the local maximum at $x = 0$.
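The noise-coefficient formula and the truncated Langevin update can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the stable evaluation rewrites the fraction as $1/(1 - e^{-z}) - 1/z$ with a Taylor fallback near $z = 0$, and, since no trained PINN is available here, $f''(x)$ is used as a stand-in for $\Delta v_{\lambda,\phi}$. The values of $\lambda$, $u_{min}$, $u_{max}$, and $s$ are also assumptions.

```python
import numpy as np

def mean_control_fraction(z):
    """((z-1)e^z + 1) / (z(e^z - 1)), evaluated without overflow; equals 1/2 at z = 0."""
    z = np.asarray(z, dtype=float)
    a = np.abs(z)
    safe = np.where(a < 1e-3, 1.0, a)              # avoid division by ~0
    g = -1.0 / np.expm1(-safe) - 1.0 / safe        # value at |z|, lies in [1/2, 1)
    g = np.where(a < 1e-3, 0.5 + a / 12.0, g)      # Taylor expansion near zero
    return np.where(z >= 0, g, 1.0 - g)            # symmetry: value(-z) = 1 - value(z)

def noise_coefficient(lap_v, lam, u_min, u_max):
    delta = u_max - u_min
    z = -delta * lap_v / lam
    return np.sqrt(2.0 * (u_min + delta * mean_control_fraction(z)))

def truncated_langevin(grad_f, lap_v, x0, lam, u_min, u_max, tau,
                       eta=1e-2, n_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)                  # one entry per 1D trajectory
    for _ in range(n_steps):
        h = noise_coefficient(lap_v(x), lam, u_min, u_max)
        h = np.where(h >= tau, h, 0.0)             # truncation: drop noise below tau
        x = x - eta * grad_f(x) + np.sqrt(eta) * h * rng.standard_normal(x.shape)
    return x

# Toy double well f(x) = x^4 - 2x^2 in 1D.
grad_f = lambda x: 4.0 * x**3 - 4.0 * x
# ASSUMPTION: f''(x) stands in for the Laplacian of the learned value function.
lap_v_standin = lambda x: 12.0 * x**2 - 4.0
u_min, u_max, lam = 0.01, 2.0, 0.1                 # hypothetical settings
tau = 0.25 * np.sqrt(2.0 * u_max)                  # tau = s * sqrt(2 u_max) with s = 1/4
x_final = truncated_langevin(grad_f, lap_v_standin, np.zeros(200),
                             lam, u_min, u_max, tau)
print(np.mean(np.abs(np.abs(x_final) - 1.0) < 0.05))
```

With this stand-in, trajectories started at the barrier receive large noise until they leave the region where the surrogate Laplacian is negative, after which the truncated dynamics reduce to plain gradient descent toward $x = \pm 1$.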

Novelty & Lineage

This extends the 1D exploratory HJB approach of Gao et al. [11] to high dimensions (2-6D) using PINNs. Prior work includes: Gao et al. [11] introduced state-dependent temperature control via exploratory HJB in 1D; Tang et al. [29] established convergence theory; Kim et al. [19] applied PINN policy iteration to exploratory HJB.

Key novelties:

First extension of eHJB-based temperature control to genuinely high-dimensional nonconvex problems
Direct PINN solution avoiding policy iteration loops (unlike [19])
Stable numerical implementation of log-partition operator without numerical quadrature
Principled control bound selection rules and truncation strategies
Focus on learning Laplacian structure rather than pointwise accuracy

The work differs fundamentally from existing HJB solvers by targeting the Laplacian structure rather than pointwise solutions, and solving offline rather than on-the-fly.

SIGNIFICANT

Proof Techniques

The main theoretical result (Theorem 4.1) provides error bounds for the PINN approximation:

Define the PINN residual error as:
\[F_\lambda(\nabla^2 v_{\lambda,\phi}, \nabla v_{\lambda,\phi}, v_{\lambda,\phi}, x) = R_\lambda(x; \phi) := \varepsilon \tilde{R}_\lambda(x; \phi)\]
where $\|\tilde{R}_\lambda\|_{L^\infty(\Omega)} = 1$.
The key error bound is:
\[u_{min} \|\Delta v_{\lambda,\phi} - \Delta v\|_{L^\infty(\Omega)} \leq C(\lambda + \varepsilon \|\tilde{R}_\lambda(\cdot, \phi)\|_{L^\infty})\]
The proof technique uses:
- Uniform ellipticity of the linearized operator since $u_{min} I \leq A_\lambda$
- The difference $w_\lambda = v_\lambda - v$ satisfies:
\[-\rho w_\lambda - \nabla f \cdot \nabla w_\lambda + A_\lambda \Delta w_\lambda = H(\nabla^2 v) - H_\lambda(\nabla^2 v)\]
- Interpolation inequality:
\[\|\nabla w_\lambda\|_{L^\infty} \leq C \|w_\lambda\|_{L^\infty}^{1/2} \|\Delta w_\lambda\|_{L^\infty}^{1/2}\]
- Lipschitz bounds on the Hamiltonian:
\[u_{min} \|X - Y\| \leq |H(X) - H(Y)| \leq u_{max} \|X - Y\|\]
Convergence analysis (Lemma C.1) shows:
\[\|H_\lambda(X) - H(X)\|_{C^{0,\gamma}} \leq C\lambda^{1-\gamma/\alpha}\]
for $X \in C^{0,\alpha}$, leading to the half-order convergence observed numerically.

Experiments & Validation

Datasets/Functions tested:

1D double-well function (validation against finite difference reference)
2D Gaussian mixture (25 components on [−1,5]²)
2D Easom function (global minimum in flat plateau)
6D Hartmann function (multimodal benchmark)

Key experimental setup:

1001 trajectories, T=1000 Langevin iterations
PINN: 5 hidden layers, width 64, tanh activation, 20k training iterations
LION optimizer, float64 precision
Interior collocation points: N_Ω = 16384
Boundary points: N_∂Ω varies by problem geometry

Key results:

Half-order convergence in λ confirmed (e_∞(2λ)/e_∞(λ) ≈ √2)
Successful optimization up to 6D with proper control bound selection
Truncation parameter τ = s√(2u_max) with s ∈ [1/16, 1/2] crucial for stability
Control bounds u_max ≥ κ (gradient-based scale) necessary for good performance
Method remains robust beyond low-dimensional test cases

Limitations & Open Problems

Limitations:

Bounded domains only - RESTRICTIVE (many optimization problems are on unbounded domains, though extension to unbounded case is planned future work)
Global minima must not lie on boundary - TECHNICAL (needed for current boundary condition treatment, could potentially be relaxed)
High computational cost due to PINN training - NATURAL (typical for neural PDE solvers, could be improved with tensor networks or other advanced techniques)
Requires smooth activation functions (C³) for theoretical guarantees - TECHNICAL (needed for residual bounds but may not be necessary in practice)
Performance depends on hyperparameter tuning (λ, τ, control bounds) - NATURAL (common in stochastic optimization methods)
Limited to relatively low dimensions (tested up to 6D) - RESTRICTIVE (curse of dimensionality affects PINN training, though better than classical grid methods)

Open problems:
Extension to unbounded domains with appropriate boundary treatments and convergence guarantees
Theoretical analysis of optimal truncation strategies and their effect on convergence rates

Deviation Tests for a High-dimensional Mean

Authors: Zengjing Chen, Ruihan Liu, Jianfeng Yao · Institution: Shandong University, University of Hong Kong, Chinese University of Hong Kong (Shenzhen) · Category: stat.ME

Develops the first systematic high-dimensional test for whether mean vectors are similar within a specified tolerance, using two-armed bandit processes to handle the technical challenges of non-standard parameter spaces.

Tags: high-dimensional-statistics hypothesis-testing random-matrix-theory two-armed-bandit deviation-testing bioequivalence sequential-analysis stochastic-differential-equations

Problem Formulation

Motivation: Standard high-dimensional hypothesis tests examine equality of means (μ = μ₀), but this is too restrictive for practical applications like biosimilar testing where we need to prove similarity within a tolerance. Such equality tests provide no information about the magnitude of differences.

Mathematical setup: Given high-dimensional data {xₜ ∈ ℝⁿ : t = 1,…,T} with population mean μ, where both n and T grow to infinity such that lim_{T→∞} n/T = c₀ ∈ [0,∞). The data model is:

\[x_t = \mu + \Gamma y_t\]

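where $\Gamma\Gamma' = \Sigma \in \mathbb{R}^{n \times n}$ with $\|\mu\|_2, \|\Sigma\| \leq O(1)$ and $\liminf_{T\to\infty} (1/T)\text{Tr}(\Sigma) > 0$. The innovation vectors $y_t = (y_{1,t},\ldots,y_{m,t})'$ have independent components with $E[y_{j,t}] = 0$, $E[y_{j,t}^2] = 1$, and $\sup_{j,t} |E[y_{j,t}^4] - 3| < \infty$.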

Assumptions:

Both sample size T and dimension n grow proportionally
Bounded moments and covariance structure as specified above

Toy example: When $n = 2$, $\mu = (0.5, 0.5)'$, $\Sigma = I_2$, $\mu_0 = 0$, and $d_0 = 0.8$, we test whether $\|\mu\|_2 = \sqrt{0.5} \approx 0.71$ is less than 0.8 (H₁ holds, so the test should reject H₀) or at least 0.8 (H₀ holds, so the test should not reject). The traditional test would ask if μ = 0 exactly, which is clearly false and uninformative.

Formal objective: Test the deviation hypothesis
\[H_0: ||\mu - \mu_0||_2 \geq d_0 \text{ versus } H_1: ||\mu - \mu_0||_2 < d_0\]
for prespecified tolerance d₀ > 0.

Method

The method constructs a two-armed bandit (TAB) test statistic using a sequential feedback mechanism.

Split data into T₁ and T₂ samples where lim_{T→∞} T₁/T = c₁ ∈ (0,1)
Define cross-products:
\[X_t = \frac{1}{T_1} \sum_{s=1}^{T_1} (x_s' x_{t+T_1} - d_0^2), \quad t = 1,...,T_2\]
Compute sample statistics:
\[\hat{\tau}_1 := \frac{1}{T_2} \sum_{t=1}^{T_2} X_t, \quad \hat{\sigma}_1^2 := \frac{1}{T_2} \sum_{t=1}^{T_2} X_t^2 - \hat{\tau}_1^2\]
Build the TAB process sequentially with adaptive weights θₜ:
\[M_{t,T_2}(\vec{\theta}_t) := \frac{1}{T_2} \sum_{s=1}^t \theta_s X_s + \frac{1}{\sqrt{T_2}} \sum_{s=1}^t \frac{\theta_s X_s}{\sqrt{\hat{\tau}_1^2 + \hat{\sigma}_1^2}}\]
where θ₁ = 1 and for t ≥ 2:
\[\theta_t = \begin{cases} 1 & \text{if } M_{t-1,T_2}(\vec{\theta}_{t-1}) \leq 0 \\ -1 & \text{if } M_{t-1,T_2}(\vec{\theta}_{t-1}) > 0 \end{cases}\]
Reject H₀ if $M_{T_2,T_2}(\vec{\theta}_{T_2}) > z_{\alpha/2}$

Toy example application: With n = 2, the cross-products X_t capture the deviation magnitude through inner products. The adaptive weights θₜ create negative feedback under H₀ (statistic concentrates near 0) and positive feedback under H₁ (statistic diverges), distinguishing the hypotheses.
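The construction above can be sketched directly from the displayed formulas. The function name, the choice $\mu_0 = 0$, and the simulated data are assumptions for illustration; the code only builds the cross-products, the plug-in moments, and the sequential TAB process, and does not implement the calibration of the rejection threshold.

```python
import numpy as np

def tab_statistic(x, d0, T1):
    """x: (T, n) data matrix. Returns M_{T2,T2}, the weights theta, and the cross-products X."""
    T = x.shape[0]
    T2 = T - T1
    xbar1 = x[:T1].mean(axis=0)
    X = x[T1:] @ xbar1 - d0**2           # X_t = (1/T1) sum_s (x_s' x_{t+T1} - d0^2)
    tau_hat = X.mean()
    sigma2_hat = (X**2).mean() - tau_hat**2
    scale = np.sqrt(tau_hat**2 + sigma2_hat)
    M, thetas = 0.0, np.empty(T2)
    for t in range(T2):
        theta = 1.0 if M <= 0 else -1.0  # theta_1 = 1 because the process starts at 0
        thetas[t] = theta
        M += theta * X[t] / T2 + theta * X[t] / (np.sqrt(T2) * scale)
    return M, thetas, X

# Hypothetical data: n = 50, T = 400, ||mu||_2 = 0.5, tolerance d0 = 0.8, mu_0 = 0.
rng = np.random.default_rng(0)
T, n, d0 = 400, 50, 0.8
mu = np.full(n, 0.5 / np.sqrt(n))
x = mu + rng.standard_normal((T, n))
M, thetas, X = tab_statistic(x, d0, T1=T // 2)
print(M, thetas[:5])
```

Because each weight depends only on the sign of the running value, the whole process is a single pass over the second half of the sample.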

Novelty & Lineage

Extends the univariate TAB framework from [6] to high-dimensional settings. The key novelty is handling the correlation between the test statistic M_{t-1,T_2} and the innovation X_t, which doesn’t exist in the univariate case where innovations are independent of the current process value. Prior work includes classical high-dimensional mean tests [1,4,3,20,14] for equality testing, and the original TAB approach [6] for univariate deviation testing. This is the first systematic extension to high-dimensional deviation testing using feedback processes.

SIGNIFICANT

Proof Techniques

The main challenge is proving that $M_{T_2,T_2}(\vec{\theta}_{T_2})$ converges to the bandit distribution $B(-\kappa_{1,T_2})$ despite correlations.

Consistency of estimators: Show
\[E[(\hat{\tau}_1 - \tau_1)^4] \leq O(T^{-2}), \quad E[(\hat{\sigma}_1^2 - \sigma_1^2)^2] \leq O(T^{-1})\]
using moment bounds and Minkowski inequality.

Correlation control: The key technical innovation handles $E[X_t \mid \mathcal{F}_{t-1}]$ where $\mathcal{F}_{t-1} = \sigma(X_1,\ldots,X_{t-1})$. Show the conditional expectation concentrates around its unconditional value:

\[\text{Var}(E[X_t \mid \mathcal{F}_{t-1}]) \leq O(T^{-1})\]

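Smooth approximation: Replace the non-smooth indicator $\mathbf{1}_{|y| \leq a}$ with the smooth function $\varphi_{a,\varepsilon}(y) = \Phi((a-y)/\varepsilon) - \Phi(-(a+y)/\varepsilon)$ and show convergence via:

\[\lim_{\varepsilon \downarrow 0} \sup_{y \in \mathbb{R}} |\varphi_{a,\varepsilon}(y) - \mathbf{1}_{|y| \leq a}| = 0\]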

SDE connection: Connect to stochastic differential equation
\[dY_s^{x,t} = α \text{sign}(Y_s^{x,t})ds + βdB_s\]
whose solution has the bandit distribution density.
Taylor expansion with controlled remainders: Use Itô’s formula and Dynkin’s formula to handle the sequential construction, controlling remainder terms to O(T^{-4/3}).

Experiments & Validation

Simulation experiments with significance level α = 0.05, data $x_t \sim N(\mu, \Sigma)$ where $\Sigma = [0.5^{|i-j|}]$ and $\mu = (n^{-1/2},\ldots,n^{-1/2})$ giving $\|\mu\|_2 = 1$. Tests various (n,T) combinations from (100,200) to (600,1200) with d₀ ranging from 0.5 to 1.5. Results confirm theoretical predictions: empirical rejection rates approach 1 when $d_0 < \|\mu\|_2$ and stay below α when $d_0 > \|\mu\|_2$.

Real data analysis on intestinal microbiota dataset (1006 samples, 130 phylogenetic groups) comparing younger (under 35, n=303) vs elder (35-50, n=313) groups. Found $\|\mu_1 - \mu_2\|_2 \approx 1.5$ at significance level 0.05, providing concrete magnitude information missing from standard equality tests.

Limitations & Open Problems

Requires proportional growth n/T → c₀ - NATURAL (standard in high-dimensional statistics)
Bounded fourth moments and spectral norm conditions - NATURAL (common regularity conditions)
Data splitting reduces effective sample size by factor T₁/T - TECHNICAL (needed for construction but potentially improvable)
Focus on L₂ norm only - RESTRICTIVE (other norms like L∞ may be more appropriate for some applications)
Two-sample extension requires equal subsample sizes N₀ - TECHNICAL (balancing constraint likely removable)

Open problems:
Extend to other norms (L∞, L₁) for different notions of similarity
Develop adaptive sample splitting strategies to optimize power while maintaining size control

Linear-Quadratic Gaussian Games with Distributed Sparse Estimation

Authors: Tianyu Qiu, Filippos Fotiadis, Xinjie Liu, Christian Ellis et al. (8 authors) · Institution: University of Texas at Austin, DEVCOM Army Research Laboratory · Category: eess.SY

Introduces sparse distributed estimation for LQG games using group lasso optimization with theoretical guarantees on estimation quality degradation bounds.

Tags: multi-agent systems linear quadratic gaussian games sparse estimation distributed control sensor selection group lasso kalman filtering nash equilibrium

Problem Formulation

Motivation: Multi-agent systems often operate under resource constraints where agents cannot afford to process all available observations when estimating system states. This is particularly relevant for robot swarms and distributed control systems where communication and computational resources are limited.

Mathematical setup: Consider an $N$-player discrete-time finite-horizon linear-quadratic Gaussian (LQG) game over time horizon $T$. Each agent $i \in [N]$ has state $x_t^i \in \mathbb{R}^{n_i}$ evolving according to:

\[x_{t+1}^i = A_t^i x_t^i + B_t^i u_t^i + w_t^i\]

where $w_t^i \sim \mathcal{N}(0, W_t^i)$ is process noise. The joint state is $x_t = [(x_t^1)^\top, \ldots, (x_t^N)^\top]^\top \in \mathbb{R}^n$.

Each agent $i$ observes:

\[y_t^i = C_t^i x_t + v_t^i\]

where $v_t^i \sim \mathcal{N}(0, V_t^i)$ is observation noise. Agent $i$ aims to minimize:

\[\min_{u_i} \sum_{t=1}^T \frac{1}{2} \mathbb{E}[(x_t^\top Q_t^i + 2(q_t^i)^\top)x_t + (u_t^\top R_t^i + 2(r_t^i)^\top)u_t]\]

Assumptions:

All agents know system matrices $(A_t, B_t, C_t^i, W_t, V_t)$ and Nash gains $(\Gamma_t, \alpha_t)$
Individual observations $y_t^i$ and estimates $\hat{x}_t^i$ are not shared between agents
Each observation $y_t^i$ can be partitioned into $P_i$ sensor groups

Toy example: Consider $N=2$ agents with scalar states, where agent 1 observes both agents ($C_1 = I_2$) but wants to sparsely select which agent to observe. With noise variances $V_1 = I_2$, the optimal Kalman gain would use both observations, but sparse estimation might zero out the gain for agent 2’s state if the regularization is high enough.

Formal objective: Design sparse estimation gains $K_t^i$ that solve:
\[\min_{K_t^i} \|K_t^i(C_t^i \Sigma_t^{i-} (C_t^i)^\top + V_t^i) - \Sigma_t^{i-} (C_t^i)^\top\|_F^2 + \sum_{\rho \in [P_i]} \lambda_t^{i\rho} \|K_t^i[\rho]\|_F\]

Method

The method consists of three main components:

Distributed state estimation: Each agent $i$ updates its state estimate using:
\[\hat{x}_{t+1}^{i-} = A_t \hat{x}_t^i + B_t \hat{u}_t^i\] \[\hat{x}_{t+1}^i = \hat{x}_{t+1}^{i-} + K_{t+1}^i (y_{t+1}^i - C_{t+1}^i \hat{x}_{t+1}^{i-})\]
Group lasso sparse estimation: For each agent $i$, solve the convex optimization:
\[\min_{K_t^i} \|K_t^i(C_t^i \Sigma_t^{i-} (C_t^i)^\top + V_t^i) - \Sigma_t^{i-} (C_t^i)^\top\|_F^2 + \sum_{\rho=1}^{P_i} \lambda_t^{i\rho} \|K_t^i[\rho]\|_F\]
Reset mechanism: After solving the group lasso, apply thresholding:
\[K_t^i[\rho] = \begin{cases} K_t^{i*}[\rho] & \text{if } \|K_t^i[\rho]\|_F \geq r_{th} \|K_t^{i*}[\rho]\|_F \\ 0 & \text{otherwise} \end{cases}\]
Control-adaptive regularization: Set regularization as:
\[\lambda_{t+1}^{ij} = \begin{cases} \frac{L_1}{\|\Gamma_t^i[j]\|_F} & \text{if } \|\Gamma_t^i[j]\|_F \neq 0 \\ L_2 & \text{otherwise} \end{cases}\]
Application to toy example: For the 2-agent scalar case, if $\Gamma_1[2] = 0.1$ (weak coupling), then $\lambda^{12} = L_1/0.1 = 10L_1$ would be high, promoting sparsity. If the resulting $\|K_1[2]\|_F < r_{th} \|K_1^*[2]\|_F$, then agent 1 would zero out observations of agent 2.
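The group lasso step and the thresholding rule can be sketched with a proximal gradient loop. This is one possible solver under stated assumptions, not necessarily the one used in the paper: sensor groups are taken to be column blocks of the gain, the function names are hypothetical, and the 2-agent numbers below are made up to mirror the toy example.

```python
import numpy as np

def sparse_gain(Sigma_prior, C, V, groups, lams, n_iter=500):
    """Proximal gradient for min_K ||K S - Sigma C^T||_F^2 + sum_g lam_g ||K[:, g]||_F."""
    S = C @ Sigma_prior @ C.T + V                    # innovation covariance
    B = Sigma_prior @ C.T
    step = 1.0 / (2.0 * np.linalg.norm(S, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    K = np.zeros_like(B)
    for _ in range(n_iter):
        K = K - step * 2.0 * (K @ S - B) @ S         # gradient step on the quadratic term
        for g, lam in zip(groups, lams):             # block soft-thresholding per sensor group
            norm = np.linalg.norm(K[:, g])
            shrink = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            K[:, g] *= shrink
    return K

def apply_reset(K, K_star, groups, r_th):
    """Thresholding rule as displayed above: keep K*[g] if ||K[g]|| >= r_th ||K*[g]||, else 0."""
    out = np.zeros_like(K)
    for g in groups:
        if np.linalg.norm(K[:, g]) >= r_th * np.linalg.norm(K_star[:, g]):
            out[:, g] = K_star[:, g]
    return out

# Hypothetical 2-agent scalar example: agent 1 observes both states (C = I_2, V = I_2).
Sigma_prior = np.array([[1.0, 0.3], [0.3, 1.0]])
C, V = np.eye(2), np.eye(2)
groups = [[0], [1]]                                  # one sensor group per observed agent
K_star = Sigma_prior @ C.T @ np.linalg.inv(C @ Sigma_prior @ C.T + V)   # Kalman gain
K_dense = sparse_gain(Sigma_prior, C, V, groups, lams=[0.0, 0.0])
K_sparse = sparse_gain(Sigma_prior, C, V, groups, lams=[0.0, 1000.0])
K_reset = apply_reset(K_sparse, K_star, groups, r_th=0.5)
print(K_dense, K_sparse, K_reset, sep="\n")
```

With zero penalties the loop recovers the Kalman gain, while a large penalty on the second group zeroes that column, which corresponds to agent 1 ignoring its measurement of agent 2.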

Novelty & Lineage

This work extends classical LQG game theory to incorporate sparse estimation. Prior work on LQG games (Shapley 1953, Gupta et al. 2014) focused on information structures but not sparsity. Recent works on sparse games (Liu et al. 2024, Chahine et al. 2023) considered deterministic settings without estimation uncertainty.

The key novelties are:

First formulation of sparse estimation in LQG games using group lasso
Distributed estimation framework where agents don’t share observations
Theoretical guarantee on reset mechanism preventing unbounded estimation degradation
Control-adaptive regularization that ties sparsity decisions to game-theoretic coupling

This combines sparse estimation techniques from sensor selection literature with multi-agent game theory in a novel way.

SIGNIFICANT

Proof Techniques

The main theoretical result (Theorem 1) provides sufficient conditions for the reset mechanism to trigger. The proof strategy involves:

First-order optimality analysis: For the group lasso problem, when $K_t^i[\rho] \neq 0$ for all $\rho$, the first-order condition gives:
\[2(K_t^i - K_t^{i*})(\Sigma_t^{i-} + V_t^i)^2 = -[\frac{\lambda_t^{i1} K_t^i[1]}{\|K_t^i[1]\|_F}, \ldots, \frac{\lambda_t^{iP_i} K_t^i[P_i]}{\|K_t^i[P_i]\|_F}]\]
Perturbation bound: This yields $K_t^i = K_t^{i*} + K_\lambda$ where:
\[\|K_\lambda[\rho]\|_F \leq \frac{\sqrt{P_i} \bar{\lambda}_t^i}{2[\sigma_{min}(\Sigma_t^{i-} + V_t^i)]^2}\]
Reverse triangle inequality: Using $K_t^i[\rho] = K_t^{i*}[\rho] + K_\lambda[\rho]$:
\[\|K_t^i[\rho]\|_F \geq \|K_t^{i*}[\rho]\|_F - \frac{\sqrt{P_i} \bar{\lambda}_t^i}{2[\sigma_{min}(\Sigma_t^{i-}) + \sigma_{min}(V_t^i)]^2}\]
Lower bound on optimal gain: Key insight is that:
\[\|K_t^{i*}[\rho]\|_F \geq \frac{\sigma_{min}(\Sigma_t^{i-})}{\sigma_{max}(\Sigma_t^{i-}) + \sigma_{max}(V_t^i)}\]
Sufficient condition derivation: Combining these bounds with the reset condition $\|K_t^i[\rho]\|_F \geq r_{th} \|K_t^{i*}[\rho]\|_F$ yields the final sufficient condition.
Contradiction argument: The proof shows that if the sufficient condition holds, then $K_t^i[\rho] = 0$ cannot be optimal, ensuring reset occurs.

Experiments & Validation

The paper validates the approach on a three-robot formation game simulation. The setup includes:

System: Double-integrator dynamics with 4D state (2D position + velocity) per robot, discretized with $\Delta t = 0.05$s over $T = 150$ timesteps.

Baselines: Compares control-adaptive regularization against static regularization levels $\lambda = 50$ and $\lambda = 1000$.

Metrics:

Sensor usage patterns (which robots observe which others)
Individual estimation covariance evolution
Trajectory tracking performance
Communication resource consumption

Key results: Control-adaptive regularization achieves 60-80% reduction in sensor usage while maintaining trajectory performance within 5% of the full-observation case. Static high regularization ($\lambda = 1000$) saves more communication but degrades tracking performance significantly.

The formation game shows the leader robot (R1) can effectively track its reference path using primarily self-measurements, while follower robots (R2, R3) selectively observe the leader more frequently than each other, reflecting the game’s coupling structure.

Limitations & Open Problems

Limitations:

TECHNICAL: Linear-quadratic structure required - the group lasso formulation and theoretical guarantees depend on the quadratic estimation objective and linear dynamics.
TECHNICAL: Finite horizon assumption - the analysis is restricted to finite-time games, though many applications require infinite horizon.
RESTRICTIVE: Individual observation assumption - each agent can only use its own sensors, which may be overly restrictive in some distributed systems where partial information sharing is feasible.
TECHNICAL: Gaussian noise assumption - the optimality of Kalman filtering and the specific covariance bounds rely on Gaussian process and observation noise.
RESTRICTIVE: Known system parameters - assumes all agents know the system matrices and Nash equilibrium gains, which requires significant coordination.

Open problems:
Extension to nonlinear dynamics and non-quadratic objectives using iterative LQ approximations
Online learning of the regularization parameters $\lambda_t^{i\rho}$ without knowing the Nash gains a priori

A Convergence-Guaranteed Algorithm for Stochastic Optimal Control Problems

Authors: Mohsen Amidzadeh · Institution: Aalto University · Category: math.OC

Develops a convergent iterative algorithm for stochastic optimal control that replaces computationally expensive backward stochastic differential equation simulation with forward Malliavin derivative computation.

Tags: stochastic optimal control Malliavin calculus stochastic maximum principle backward stochastic differential equations gradient projection methods computational finance stochastic differential equations sensitivity analysis

Problem Formulation

Motivation: Stochastic optimal control problems (SOCPs) are fundamental in sequential decision-making across finance, game theory, and filtering. Existing iterative algorithms based on the stochastic maximum principle require solving backward stochastic differential equations (BSDEs) that must be adapted to forward filtrations while satisfying terminal conditions, creating computational bottlenecks and curse of dimensionality issues.
Mathematical setup: Consider a filtered probability space $(\Omega, \mathcal{F}, P)$ with an $l$-dimensional Wiener process $W = \{w_t^i\}_{i=1}^l$ and natural filtration $\mathcal{F}_t$. The controlled SDE is:
\[dx_t^u = a(x_t^u, u_t) dt + \sum_{i=1}^l b_i(x_t^u, u_t) dw_t^i\]
with initial condition $x_0$ given, where $x_t^u \in \mathbb{R}^n$, $u_t \in U \subset \mathbb{R}^k$ is $\mathcal{F}_t$-adapted control, $U$ is open convex, $a: \mathbb{R}^n \times U \to \mathbb{R}^n$ is drift, and $b_i: \mathbb{R}^n \times U \to \mathbb{R}^n$ are diffusion functions. The cost functional is:
\[J(u, x_0) = E\left[\int_0^T L(x_t^u, u_t, t) dt + h(x_T^u)\right]\]
Assumptions:
1. Functions $a, b_i, L, h$ are measurable with bounded partial derivatives
2. $U$ is nonempty convex set with square-integrable elements
3. All processes satisfy standard regularity conditions for Malliavin calculus
Toy example: When $n = k = 1$, $l = 1$, consider Black-Scholes control:
\[dx_t = u_t x_t dt + \sigma x_t dw_t\]
with cost $J = \frac{1}{2}\int_0^T E[(x_t - x_t^*)^2] dt + \frac{1}{2}\int_0^T u_t^2 dt$. The Malliavin derivative becomes $D_s x_t = b(x_s, s) \exp(\int_s^t [a_x(\tau) - \frac{1}{2}b_x(\tau)^2] d\tau + \int_s^t b_x(\tau) dw_\tau)$.
Formal objective: Find optimal control $u^*$ solving:
\[\max_{u \in U} J(u, x_0) \text{ subject to the controlled SDE}\]

Method

The method replaces backward BSDE simulation with forward Malliavin derivative computation:

Derive stochastic maximum principle using Malliavin calculus instead of adjoint processes
For scalar case ($n = k = 1$), the variation becomes:
\[\delta_v J(u, x_0) = E\left[\int_0^T \left(\frac{a_u(s) - b_x(s)b_u(s)}{b(s)}\int_s^T L_x(t)D_s x_t dt + \frac{b_u(s)}{b(s)}\int_s^T L_{xx}(t)(D_s x_t)^2 dt + L_u(s)\right) v_s ds\right]\]
For vector case, the gradient is:
\[\nabla_u J(u, x_0)|_s = E\left[\left(\int_s^T \nabla_x L(t)^T \Gamma_{s,t} dt\right)\left(J_u a(s) - \sum_{l,l'} q_{l,l'} J_x b_l(s) J_u b_{l'}(s)\right) + \sum_{l=1}^d \left(\int_s^T D_s^l x_t^T \nabla_x^2 L(t) \Gamma_{s,t} dt\right) J_u b_l(s) + \nabla_u L(s)^T\right]\]
Compute stochastic flow $\Gamma_{s,t}$ via forward SDE:
\[d\Gamma_{r,t} = \left(\sum_{l=1}^d J_x b_l(x_t, u_t) dw_t^l + J_x a(x_t, u_t) dt\right) \Gamma_{r,t}\]
with $\Gamma_{r,r} = I$
Update control via gradient projection:
\[u^{i+1} \leftarrow \mathcal{P}(u^i + \lambda \nabla_u J(u^i, x_0))\]
Toy example application: For the Black-Scholes case, $a_x = u_\tau$ and $b_x = \sigma$, so the algorithm computes $D_s x_t = \sigma x_s \exp(\int_s^t [u_\tau - \frac{1}{2}\sigma^2] d\tau + \int_s^t \sigma dw_\tau)$ forward in time, avoiding backward BSDE simulation entirely.
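The forward computation in the toy example can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: it assumes a control that is piecewise constant on a uniform grid, uses the exact log-increments of geometric Brownian motion, and the function name `simulate_gbm_malliavin` is hypothetical. For this model the formula collapses to $D_s x_T = \sigma x_T$ for every $s \le T$, which gives a simple sanity check.

```python
import numpy as np

def simulate_gbm_malliavin(u, sigma, x0, T, rng):
    """Simulate dx = u x dt + sigma x dw on a uniform grid.

    Returns the path x (length n+1) and D[k], the Malliavin derivative
    D_s x_T evaluated at grid time s = k * dt.
    Assumption: the control u is frozen on each time step.
    """
    u = np.asarray(u, dtype=float)
    n = len(u)
    dt = T / n
    dw = rng.normal(0.0, np.sqrt(dt), size=n)
    # Log-increment of the state over each step.
    incr = (u - 0.5 * sigma**2) * dt + sigma * dw
    cum = np.concatenate(([0.0], np.cumsum(incr)))
    x = x0 * np.exp(cum)
    # D_s x_T = sigma * x_s * exp(sum of the increments between s and T).
    D = sigma * x * np.exp(cum[-1] - cum)
    return x, D
```

Only forward quantities (the path and its cumulative log-increments) are needed; no terminal condition is propagated backward.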

Novelty & Lineage

This extends Meyer-Brandis et al. (2012) who derived Malliavin-based SMP for scalar SOCPs. Key differences:

generalizes to vector SOCPs
provides convergent iterative algorithm, not just optimality conditions
concrete computational implementation

Prior gradient methods (Gong et al. 2017, Archibald et al. 2020) required backward BSDE simulation or approximations. The Malliavin approach for sensitivity analysis traces to Gobet & Munos, but this is the first convergent algorithm avoiding adjoint processes.

SIGNIFICANT

Proof Techniques

Main proof strategy uses variational analysis with Malliavin calculus integration-by-parts:

Perturbation analysis: For control perturbation $u + \epsilon v$, define variational process $\delta x_t = \lim_{\epsilon \to 0} \frac{1}{\epsilon}(x^{u+\epsilon v}_t - x^u_t)$ satisfying:
\[d\delta x_t = (J_x a(x_t, u_t)\delta x_t + J_u a(x_t, u_t)v_t)dt + \sum_l (J_x b_l(x_t, u_t)\delta x_t + J_u b_l(x_t, u_t)v_t)dw_t^l\]
Key technical insight - variation of constants solution: Express $\delta x_t$ using stochastic flows rather than adjoint processes:
\[\delta x_t = \int_0^t \Gamma_{s,t}(J_u a(s) - \sum_{l,l'} q_{l,l'} J_x b_l(s) J_u b_{l'}(s))v_s ds + \int_0^t \sum_l \Gamma_{s,t} J_u b_l(s) v_s dw_s^l\]
Integration-by-parts formula application: For vector case, use multivariate Malliavin integration-by-parts:
\[E\left[F(x_t)^T \sum_l \int_0^t g_s^l dw_s^l\right] = \sum_l \int_0^t E[D_s^l x_t^T J_x F(x_t)^T g_s^l] ds\]
Convergence analysis: Under Lipschitz continuity with constant $L$ and uniform monotonicity with rate $r$, choosing the step size $\lambda$ such that $0 < 1 - 2r\lambda + (1+2L)\lambda^2 < \delta^2$ ensures:
\[\|u^* - u^i\| \sim O(\Delta t)\]
as $i \to \infty$
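To see which step sizes the condition admits, note that the quadratic $1 - 2r\lambda + (1+2L)\lambda^2$ is below 1 exactly when $0 < \lambda < 2r/(1+2L)$, and is smallest at $\lambda = r/(1+2L)$. The sketch below just evaluates these quantities; it assumes $\delta \le 1$ acts as a contraction factor (the summary does not define $\delta$), and the helper names are hypothetical.

```python
import math

def contraction_factor(r, L, lam):
    """Value of 1 - 2 r lam + (1 + 2 L) lam^2 for step size lam."""
    return 1.0 - 2.0 * r * lam + (1.0 + 2.0 * L) * lam**2

def step_size_range(r, L):
    """Return (upper, best): the factor is < 1 for 0 < lam < upper,
    and is minimized at lam = best."""
    upper = 2.0 * r / (1.0 + 2.0 * L)
    best = r / (1.0 + 2.0 * L)
    return upper, best
```

At the minimizing step size the factor equals $1 - r^2/(1+2L)$, so it stays positive whenever $r^2 < 1 + 2L$.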

The crucial technical innovation is replacing backward adaptation requirements with forward Malliavin derivative computation.

Experiments & Validation

Experiments on scalar and vector SOCPs:

Scalar experiments:

Black-Scholes type: analytical solution available, all methods achieve similar performance, control error $E_c = 1.7 \times 10^{-5}$ for Mal-GPro vs $2.2 \times 10^{-5}$ for Ad-GPro
Nonlinear diffusion: $dx_t = u_t x_t dt + \sigma\sqrt{1+x_t^2} dw_t$, Mal-GPro matches Ad-GPro while Ad-SGD fails

Vector experiments:
3D controlled process: analytical benchmarks available, Mal-GPro achieves $6.5 \times 10^{-4}$ control error vs $8.2 \times 10^{-4}$ for Ad-GPro
Linear-Quadratic regulator (10D): compared against analytical Riccati solution, Mal-GPro achieves $8.7 \times 10^{-4}$ vs $3.1 \times 10^{-3}$ for Ad-SGD

Computational efficiency: Runtime per iteration - Mal-GPro: 0.3-0.8s, Ad-SGD: 0.1s, Ad-GPro: 11-27s. Mal-GPro balances accuracy and efficiency.

Baselines: Ad-GPro (neural network parameterized adjoint), Ad-SGD (single-sample backward approximation)

Limitations & Open Problems

Limitations:

Requires bounded partial derivatives for drift and diffusion functions - TECHNICAL (standard regularity for Malliavin calculus, likely improvable)
Assumes open convex control constraint set $U$ - NATURAL (standard in continuous optimization)
Square-integrability requirements for control processes - NATURAL (standard in stochastic control theory)
Forward simulator accuracy affects Malliavin derivative computation - TECHNICAL (inherent to numerical SDE methods)
Convergence rate $O(\Delta t)$ depends on discretization step size - TECHNICAL (could potentially be improved with higher-order schemes)
No analysis for problems with state constraints or jump processes - RESTRICTIVE (significant limitation for many applications)

Open problems:
Extension to SOCPs with state constraints while maintaining computational advantages
Analysis of method performance under model misspecification or when regularity conditions are violated

Second order necessary conditions for quantum stochastic optimal control problems

Authors: Penghui Wang, Shan Wang · Institution: Shandong University, Hebei Normal University · Category: math.OC

Establishes second-order necessary optimality conditions for quantum stochastic control problems driven by fermionic Brownian motion using variational methods and the parity operator.

Tags: quantum stochastic control second-order optimality fermion Brownian motion quantum optimal control variational methods quantum probability stochastic differential equations

arXiv · PDF

Problem Formulation

Motivation: Second-order necessary conditions are crucial for distinguishing true optimal solutions from candidates that merely satisfy first-order conditions in stochastic optimal control. This problem becomes particularly complex in quantum stochastic systems driven by fermionic noise, which model quantum optical systems.
Mathematical setup: Let $(Λ(H), C, m)$ be a quantum probability space where $Λ(H)$ is the anti-symmetric Fock space over $H = L^2(ℝ_+)$. The fermion Brownian motion is:
\[W(t) := Ψ(χ_{[0,t]}) = A^*(χ_{[0,t]}) + A(Jχ_{[0,t]})\]
Consider the quantum stochastic control system:
\[dx(t) = D(t,x(t),u(t))dt + F(t,x(t),u(t))dW(t) + dW(t)G(t,x(t),u(t))\] \[x(t_0) = x_0\]
with cost functional:
\[J(u(\cdot)) = \int_{t_0}^T L(t,x(t),u(t))dt + g(x(T))\]
Assumptions:
1. $D,F,G: [t_0,T] × L^2(C) × U → L^2(C)$ are adapted and Lipschitz continuous
2. $L,g$ are measurable and Lipschitz continuous
3. $D,F,G$ are twice Fréchet differentiable with bounded derivatives
4. $L,g$ are twice Fréchet differentiable with bounded derivatives
Toy example: When $U = ℝ$, $D(t,x,u) = ax + bu$, $F(t,x,u) = cx$, $G(t,x,u) = 0$, and $L(t,x,u) = \frac{1}{2}(x^2 + u^2)$, the system reduces to a linear-quadratic control problem in the quantum setting where the state evolution involves both classical drift and quantum diffusion terms.
Formal objective: Find the second-order necessary condition that any optimal control $\bar{u}(\cdot) ∈ U^β[t_0,T]$ must satisfy:
\[\bar{u}(\cdot) = \arg\inf_{u(\cdot) ∈ U^β[t_0,T]} J(u(\cdot))\]

Method

The method employs a variational approach analogous to classical stochastic control theory, adapted for quantum systems.

Construct first and second-order variational equations for perturbations $δu(\cdot) = u(\cdot) - \bar{u}(\cdot)$
Introduce the adjoint equations:
\[dy(t) = -\{D_x(t)^*y(t) + (F_x(t) + ΥG_x(t))^*Y(t) - L_x(t)\}dt + Y(t)dW(t)\] \[y(T) = -g_x(x(T))\]
Define the second-order adjoint equation:
\[dP(t) = -\{D_x(t)^*P(t) + P(t)D_x(t) + (F_x(t) + ΥG_x(t))^*Q(t)Υ + \text{higher order terms} + H_{xx}(t)\}dt + Q(t)dW(t)\]
Apply the parity operator $Υ$ to handle non-commutativity between quantum operators
Use Taylor expansion of the cost functional and variational equations to derive:
\[δx(\cdot) = εx_1(\cdot) + \frac{ε^2}{2}x_2(\cdot) + o(ε^2)\]
Apply Fermion Itô’s formula to establish the main second-order condition

Applied to toy example: For the linear-quadratic case, $x_1(t)$ satisfies the linear equation with control perturbation, $x_2(t)$ captures quadratic effects, and the second-order condition becomes a quadratic form involving the Riccati-type operator $P(t)$.

Novelty & Lineage

This work extends classical second-order necessary conditions from stochastic optimal control (Mammadov-Bashirov 1997, Zhang-Zhang 2015-2017) to the quantum stochastic setting. Prior work by Tang (2010) and Lü-Zhang-Zhang (2021) established such conditions for classical stochastic systems with various constraints.

The key novelty is adapting these techniques to quantum stochastic differential equations driven by fermionic Brownian motion, which requires:

Introduction of the parity operator Υ to handle non-commutativity
Use of relaxed transposition solutions for second-order adjoint equations
Quantum Itô calculus (Fermion Itô’s formula)

The authors’ previous work (Wang-Wang 2024) established first-order conditions (maximum principle) for these systems. This paper provides the natural second-order extension.

INCREMENTAL

Proof Techniques

The proof follows a multi-step variational approach:

Establish variational estimates using quantum stochastic integrals:
\[\|δx(\cdot)\|_{C_A([t_0,T];L^2(C))} \leq C\|δu\|_{L^2(t_0,T;H)}\]
Construct the asymptotic expansion using mean value theorem and second-order Taylor expansion:
\[J(u_ε(\cdot)) - J(\bar{u}(\cdot)) = ε \text{(first order)} + \frac{ε^2}{2} \text{(second order)} + o(ε^2)\]
Key technical lemma establishes the crucial estimate:
\[\|\delta x_\varepsilon(\cdot) - \varepsilon x_1(\cdot) - \frac{\varepsilon^2}{2}x_2(\cdot)\|_{C_A([t_0,T];L^2(C))} = o(\varepsilon^2)\]
Apply Fermion Itô’s formula to relate state and adjoint processes:
\[\langle y(t), x_1(t) \rangle = \int_{t_0}^t \{\langle y(s), D_u(s)\delta u(s) \rangle + \langle Y(s), (F_u(s) + ΥG_u(s))\delta u(s) \rangle\} ds\]
Use the relaxed transposition solution property for the second-order adjoint equation to eliminate second-order state terms and derive the final inequality
The Gronwall inequality is repeatedly applied to control the growth of variational processes

The key technical insight is using the parity operator Υ to transform the non-commutative quantum setting into a form where classical variational techniques can be adapted.

Experiments & Validation

Purely theoretical. Empirical validation would require:

Numerical implementation of quantum stochastic differential equation solvers
Comparison with first-order conditions on specific quantum control problems
Verification that the second-order conditions effectively distinguish optimal from suboptimal controls in quantum optical systems
Computational study of the relaxed transposition solutions for the adjoint equations

Limitations & Open Problems

Limitations:

Requires $u(\cdot) \in U^4[t_0,T]$ (fourth moment boundedness) - TECHNICAL (needed for variational estimates)
Control set $U$ must be closed and convex - NATURAL (standard in optimal control)
Coefficients must be twice Fréchet differentiable with bounded derivatives - TECHNICAL (standard smoothness for second-order conditions)
Uses relaxed transposition solutions rather than classical solutions for second-order adjoint equations - TECHNICAL (due to infinite-dimensional setting)
Limited to fermionic Brownian motion, not general quantum noise - RESTRICTIVE (excludes bosonic systems)

Open problems:
Extend to quantum systems driven by bosonic Brownian motion or general quantum Lévy processes
Develop pointwise (rather than integral) second-order maximum principle for quantum stochastic control

Multi-Domain Causal Empirical Bayes Under Linear Mixing

Authors: Bohan Wu, Julius von Kügelgen, David M. Blei · Institution: Columbia University, ETH Zürich · Category: stat.ML

Develops an empirical Bayes f-modeling algorithm using causally-structured score matching and Tweedie’s formula for finite-sample estimation of latent causal variables from multi-domain interventional data.

Tags: causal representation learning empirical Bayes score matching Tweedie formula structural causal models multi-domain learning interventional data simultaneous inference

arXiv · PDF

Problem Formulation

Motivation: Causal representation learning (CRL) aims to recover low-dimensional causal latent variables from high-dimensional observations, which is crucial for understanding causal mechanisms in complex systems. However, while identifiability conditions have been extensively studied, the finite-sample estimation problem remains less explored.
Mathematical setup: Consider $M$ domains indexed by $e \in [M]$, each characterized by an action label $a_e \in \{0,1\}^{d_Z}$ indicating intervention targets. In domain $e$, we observe i.i.d. samples $\{x_{ei}\}_{i=1}^{N_e}$ where $x_{ei} \in \mathbb{R}^{d_X}$ are drawn from unknown population distribution $p^*(x\lvert a_e)$.

Assumption 1: Causal empirical Bayes structure exists such that for all $a \in \mathcal{A}$:
\[p^*(x|a) = \int p^*(x|z)p^*(z|a)dz\]
Assumption 2: Linear Gaussian measurement model with latent $z_{ei} \sim p^*(z\lvert a_e)$ and
\[x_{ei} = A^*z_{ei} + \epsilon_{ei}, \quad \epsilon_{ei} \sim N(0,\sigma^2_* I_{d_X})\]
Assumption 3: Orthogonality condition $(A^*)^T A^* = (D^*)^2$ for diagonal $D^*$.

Assumption 4: Interventional priors arise from sparse interventions on known acyclic SCM with targets $I(a_e) = \{j : a_{ej} = 1\}$.
Toy example: When $d_Z = 2$, $M = 3$ domains with interventions $a_1 = (1,0)$, $a_2 = (0,1)$, $a_3 = (0,0)$, and causal graph $z_1 \to z_2$. The measurement model becomes $x_{ei} = A^*z_{ei} + \epsilon_{ei}$ where $A^* \in \mathbb{R}^{100 \times 2}$ has orthogonal columns. Domain 1 intervenes on $z_1$, domain 2 on $z_2$, domain 3 is observational.
Formal objective: Estimate the posterior distributions
\[p^*(z|x_{ei}, a_e) \propto p^*(z|a_e)p^*(x_{ei}|z)\]
for all latent causal variables $\{z_{ei}\}_{e \in [M], i \in [N_e]}$ along with measurement parameters $(A^*, \sigma^2_*)$.

Method

The method develops an EM f-modeling algorithm that exploits Tweedie’s formula for empirical Bayes estimation:

Transform observations to normal means model: Compute $y_{ei} = O^T x_{ei}$ where $A = OD$ under orthogonality assumption, yielding
\[y_{ei} = Dz_{ei} + \zeta_{ei}, \quad \zeta_{ei} \sim N(0,\sigma^2 I_{d_Z})\]
Estimate causal score function via structured score matching: For each component $j$, solve
\[\hat{s}_j \in \arg\min_{s_j} \sum_{e=1}^M \sum_{i=1}^{N_e} \left[s_j(y_{eij}, y_{ei,\text{pa}(j)}, a_{ej})^2 + 2\partial_j s_j(y_{eij}, y_{ei,\text{pa}(j)}, a_{ej})\right]\]
Update latent variables using damped Tweedie formula:
\[\hat{z}_{ei} = \hat{D}^{-1}(y_{ei} + \eta\hat{\sigma}^2\hat{s}(y_{ei}, a_e))\]
Update second moments via second-order Tweedie:
\[\widehat{z^2}_{eij} = \hat{z}_{eij}^2 + \hat{\sigma}^2\hat{D}_{jj}^{-2} + \hat{\sigma}^4\hat{D}_{jj}^{-2}\partial_j\hat{s}_j(y_{eij}, y_{ei,\text{pa}(j)}, a_{ej})\]
Update measurement parameters via MLE using SVD decomposition and analytical formulas.
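The first step, the projection to a normal means model, can be sketched as follows. This is an illustrative helper under the orthogonality assumption $A^T A = D^2$, with hypothetical names; it recovers $O$ and the diagonal of $D$ from the column norms of $A$ rather than from the SVD step of the summarized algorithm.

```python
import numpy as np

def to_normal_means(X, A):
    """Project observations X (N x d_X) using A = O D with A^T A diagonal.

    Returns Y = X O (rows are y_ei = O^T x_ei), the factor O with
    orthonormal columns, and the diagonal scales d of D.
    """
    d = np.linalg.norm(A, axis=0)   # column norms give the diagonal of D
    O = A / d                       # orthonormal columns when A^T A = D^2
    return X @ O, O, d
```

Because $O^T O = I$, isotropic noise $\epsilon \sim N(0, \sigma^2 I_{d_X})$ maps to $\zeta = O^T \epsilon \sim N(0, \sigma^2 I_{d_Z})$, which is what makes the projected model a normal means problem.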

Applied to toy example: With $z_1 \to z_2$ graph, score components are $s_1(y_1, a_1)$ and $s_2(y_1, y_2, a_2)$. The algorithm alternately estimates these sparse score functions and updates the $2 \times 2$ diagonal matrix $D$ and mixing matrix $O \in O_{100 \times 2}$.
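The two Tweedie updates can be checked in a one-dimensional case where the score is known in closed form. The sketch below assumes a Gaussian prior $z \sim N(\mu, \tau^2)$ and $y = dz + \zeta$ with $\zeta \sim N(0, \sigma^2)$, so the marginal of $y$ is $N(d\mu, d^2\tau^2 + \sigma^2)$; the Gaussian prior and the function names are illustrative assumptions, not part of the summarized method, which estimates the score nonparametrically.

```python
import numpy as np

def gaussian_score(y, mu, tau2, d, sigma2):
    """Score of the marginal N(d*mu, d^2*tau2 + sigma2) and its derivative."""
    v = d**2 * tau2 + sigma2
    return -(y - d * mu) / v, -1.0 / v

def tweedie_update(y, score, dscore, d, sigma2, eta=1.0):
    """First- and second-moment estimates of z given y.

    eta = 1 gives the undamped Tweedie formula, for which the
    second-moment expression is the exact posterior second moment.
    """
    z_hat = (y + eta * sigma2 * score) / d
    z2_hat = z_hat**2 + sigma2 / d**2 + sigma2**2 / d**2 * dscore
    return z_hat, z2_hat
```

With $\eta = 1$ the outputs agree with the conjugate Gaussian posterior: mean $\mu + d\tau^2 (y - d\mu)/(d^2\tau^2 + \sigma^2)$ and variance $\tau^2\sigma^2/(d^2\tau^2 + \sigma^2)$.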

Novelty & Lineage

This work extends empirical Bayes methodology to the multi-domain causal representation learning setting. Key novelty is connecting CRL to simultaneous inference via f-modeling rather than the typical g-modeling approaches used in iVAE (Khemakhem et al.), conditional VAEs, and other CRL methods like Brehmer et al. and von Kügelgen et al.

The causally-structured score matching approach is new, exploiting Markov factorization of the causal graph to enable decoupled estimation of score components. Prior CRL work focused primarily on identifiability (Ahuja et al., Squires et al., Buchholz et al.) rather than finite-sample estimation.

The orthogonal measurement model assumption connects to independent mechanism analysis (Gresele et al.) and principal component flows (Cunningham et al.), but the multi-domain interventional setting is novel.

SIGNIFICANT

Proof Techniques

The main technical contributions involve three key results:

Reduction to normal means model: Under orthogonality assumption $(A^*)^T A^* = (D^*)^2$, the linear measurement model transforms as
\[y_{ei} = O^T x_{ei} = Dz_{ei} + \zeta_{ei}\]
where $\zeta_{ei} \sim N(0,\sigma^2 I_{d_Z})$ enables direct application of Tweedie’s formula.
Causal score decomposition theorem (Theorem 5.1): Shows that the score matching objective decouples across components:
\[L_G(s) = \sum_{e=1}^M \sum_{j=1}^{d_Z} E_{y \sim f_{a_e}}\left[s_j(y_j, y_{\text{pa}(j)}, a_{ej})^2 + 2\partial_j s_j(y_j, y_{\text{pa}(j)}, a_{ej})\right]\]
Proof uses Hyvärinen-Dayan score matching identity:
\[L_G(s) = \sum_{e=1}^M E_{y \sim f_{a_e}}\left[\|s(y,a_e)\|_2^2 + 2\text{tr}(\nabla_y s(y,a_e))\right] + \text{const}\]
Sparse intervention approximation: Shows that in large-sample regime, the score function satisfies causal invariance constraint where $f_{a_e}(y_j\lvert y_{\text{pa}(j)})$ depends on environment only through local intervention indicator $a_{ej}$, justified by posterior concentration around empirical distribution of denoised estimates.
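For intuition on the score matching objective, the simplest instance is a single variable with a linear score model $s(y) = ay + b$, where the empirical objective $\frac{1}{N}\sum_i [s(y_i)^2 + 2s'(y_i)]$ has a closed-form minimizer. The sketch below covers only this unstructured special case (no parents, no intervention indicator), and the function names are hypothetical.

```python
import numpy as np

def fit_linear_score(y):
    """Minimize mean(s(y)^2 + 2 s'(y)) over linear scores s(y) = a*y + b.

    Setting the gradient to zero gives a = -1/var(y) (with the 1/N
    variance) and b = -a * mean(y), the score of a fitted Gaussian.
    """
    y = np.asarray(y, dtype=float)
    a = -1.0 / np.var(y)
    b = -a * np.mean(y)
    return a, b

def score_matching_loss(y, a, b):
    """Empirical score matching objective for s(y) = a*y + b."""
    y = np.asarray(y, dtype=float)
    return np.mean((a * y + b) ** 2 + 2.0 * a)
```

The decoupling result means each component $s_j$ can be fitted by a separate problem of this type, with the parent coordinates and $a_{ej}$ as additional inputs.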

The EM convergence analysis relies on standard variational inference lower bounds and the concavity of the Gaussian likelihood in the natural parameters.

Experiments & Validation

Synthetic data experiments on chain graph $z_1 \to z_2 \to z_3 \to z_4$ with $d_Z = 4$ latent variables and $d_X = 100$ observations. Structural equations use nonlinear mechanisms $z_j := \sum_{k \in \text{pa}(j)} w_{jk} g(z_k) + u_j$ with $g(z) = \tanh(3z) + (3z)^3$.

Interventions replace mechanisms with $z_j \sim N(10,1)$. Measurement model uses random orthonormal $A^* \in O_{100 \times 4}$ and noise $\sigma^2_* = 2$.

Key results across 40 runs:

True DAG: Relative MSE = 0.11, $\lVert A - A^* \rVert_F = 0.501$
No shrinkage (η=0): Relative MSE = 0.61, $\lVert A - A^* \rVert_F = 1.54$
Empty graph: Relative MSE = 0.422, $\lVert A - A^* \rVert_F = 1.12$
Pooled data: Relative MSE = 0.496, $\lVert A - A^* \rVert_F = 1.34$
PCA baseline: Relative MSE = 0.587, $\lVert A - A^* \rVert_F = 1.57$

Scaling shows $O(\sqrt{d_Z})$ growth in relative MSE and stabilization around $N_e \geq 200$ samples per environment.

Limitations & Open Problems

Orthogonality assumption $(A^*)^T A^* = (D^*)^2$ - TECHNICAL: Needed for reduction to normal means model but restrictive for many applications. Authors show relaxation to positive semi-definite case is possible.
Known causal graph and intervention targets - RESTRICTIVE: Significantly limits practical applicability since these are rarely known in real applications.
Linear measurement model - RESTRICTIVE: Many real-world observation processes are nonlinear, though authors note g-modeling extends more naturally to this case.
Binary intervention indicators $a_e \in \{0,1\}^{d_Z}$ - TECHNICAL: Could be extended to continuous intervention strengths or additive shifts as in single-cell literature.
Gaussian noise assumptions - NATURAL: Standard assumption that could potentially be relaxed using exponential family extensions.
Acyclic causal graph requirement - NATURAL: Standard assumption in causal inference, though cycles occur in some domains.

Open problems:
Develop score matching approaches that work with unknown causal graphs, potentially using the EB objective for simultaneous graph discovery and representation learning.
Extend f-modeling to nonlinear measurement models, possibly through neural score estimation or kernel methods that preserve the Tweedie formula structure.

Variational Kernel Design for Internal Noise: Gaussian Chaos Noise, Representation Compatibility, and Reliable Deep Learning

Authors: Ziran Liu · Institution: Fudan University, Shanghai Jiao Tong University · Category: cs.LG

This paper derives spatial noise mechanisms from maximum entropy principles and proves they preserve finite log-ratio geometry while hard masking becomes increasingly incompatible with coherent representations.

Tags: regularization deep learning theory gaussian processes maximum entropy calibration representation learning stochastic regularization

arXiv · PDF

Problem Formulation

Motivation: Internal noise mechanisms in deep networks like dropout are typically chosen heuristically rather than derived from first principles. When representations become coherent and encode positive evidence (e.g., attention weights, post-ReLU activations), hard binary masking can destroy semantic structure through discontinuous deletions.
Mathematical setup: Let $U = \{1,\ldots,H\} \times \{1,\ldots,W\}$ be a spatial grid and $h: U \to (0,\infty)$ be a positive feature map. A noise mechanism is specified by a triple $N = (F, K, T)$ where $F$ is a law family on latent fields $\psi \in \mathbb{R}^U$, $K$ is a correlation kernel on $U \times U$, and $T$ is an injection operator. The Dirichlet Laplacian on $U$ is defined as:
\[L_U \phi(x) = \sum_{y:\{x,y\} \in E} c_{xy}(\phi(x) - \bar{\phi}(y))\]
where $\bar{\phi}$ extends $\phi$ by zero on the boundary $B = \bar{U} \setminus U$. The Dirichlet energy is:
\[E(\phi) = \frac{1}{2}\langle \phi, L_U \phi \rangle = \frac{1}{2}\sum_{\{x,y\} \in E} c_{xy}(\bar{\phi}(x) - \bar{\phi}(y))^2\]
Assumptions:
1. The feature map $h$ is strictly positive on its support
2. The grid $U$ is connected with positive edge weights $c_{xy} > 0$
3. Dirichlet boundary conditions make $L_U \succ 0$
Toy example: When $U = \{(1,1), (1,2), (2,1), (2,2)\}$ is a $2 \times 2$ grid with unit weights, $L_U$ is a $4 \times 4$ matrix. For a constant field $h(x) = 1$ everywhere, binary dropout with probability $q = 0.5$ zeros each pixel independently, potentially creating discontinuous patterns.
Formal objective: Find the noise mechanism that maximizes entropy subject to constraints:
\[\sup_{p \in A(Q,\varepsilon)} h(p) = \sup_{p \in A(Q,\varepsilon)} -\int_{\mathbb{R}^U} p(\psi) \log p(\psi) d\psi\]

Method

The method consists of three stages:

Variational Design: Solve the constrained entropy maximization over centered log-fields $\psi$ with quadratic energy budget:
\[p^*_{Q,\varepsilon} = N(0, \Sigma_{Q,\varepsilon}), \quad \Sigma_{Q,\varepsilon} = \frac{2\varepsilon}{n}Q^{-1}\]
Canonical Realization: Apply Wick-normalized exponential to get exact positive mean-one gate:
\[\xi^{ex}_\gamma(x) = \exp(\gamma\psi(x) - \frac{\gamma^2}{2}C(x,x))\]
Practical Implementation: Use sample-wise mean-one normalization for optimization:
\[\xi^{sw}_\gamma(x) = \frac{\exp(\gamma\psi(x))}{\frac{1}{|U|}\sum_{y \in U}\exp(\gamma\psi(y))}\]
Applied to the $2 \times 2$ toy example: Sample $\psi \sim N(0, (βL_U)^{-1})$ where $β = \lvert U \rvert/(2\varepsilon)$, then compute the gate values using the sample-wise formula. The Green kernel $G_U = L_U^{-1}$ determines spatial correlations, making nearby pixels more likely to have similar gate values than independent dropout.
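The $2 \times 2$ example is small enough to write out. The sketch below assumes a 4-neighbourhood with unit weights and a one-pixel boundary ring on which the field is zero, so every pixel has two in-grid and two boundary neighbours; that boundary construction and the function names are assumptions for illustration.

```python
import numpy as np

def dirichlet_laplacian_2x2():
    """L_U for the 2x2 grid with unit edge weights and Dirichlet boundary."""
    nodes = [(1, 1), (1, 2), (2, 1), (2, 2)]
    L = np.zeros((4, 4))
    for a, (i, j) in enumerate(nodes):
        L[a, a] = 4.0  # degree includes the two boundary neighbours
        for b, (k, l) in enumerate(nodes):
            if abs(i - k) + abs(j - l) == 1:
                L[a, b] = -1.0  # in-grid neighbour
    return L

def sample_gate(L, eps, gamma, rng):
    """Draw psi ~ N(0, (beta L)^{-1}) with beta = |U| / (2 eps) and
    return psi together with the sample-wise mean-one gate."""
    n = L.shape[0]
    beta = n / (2.0 * eps)
    cov = np.linalg.inv(beta * L)
    cov = 0.5 * (cov + cov.T)  # remove round-off asymmetry
    psi = rng.multivariate_normal(np.zeros(n), cov)
    w = np.exp(gamma * psi)
    return psi, w / w.mean()
```

Under these assumptions $L_U$ has eigenvalues 2, 4, 4, 6, so it is positive definite, and every gate value is strictly positive, in contrast to the exact zeros produced by binary dropout.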

Novelty & Lineage

This work introduces Variational Kernel Design (VKD) as a compositional framework for deriving noise mechanisms from learning desiderata rather than heuristics. The connection to maximum entropy under quadratic constraints is classical, but using it as an operator-level design principle for training-time noise is novel.

Key extensions beyond prior work:

Dropout (Srivastava et al., 2014) uses i.i.d. binary masks
Spatial methods like DropBlock (Ghiasi et al., 2018) impose hard rectangular occlusions
This work derives the noise geometry from first principles via Green kernels

The representation-compatibility analysis providing exact Gaussian control of log-ratio deformations and margin-sensitive ranking is entirely new. Prior noise injection methods lack this mathematical characterization of their geometric effects on positive semantic representations.

SIGNIFICANT

Proof Techniques

The main proof strategy has three components:

Quadratic MaxEnt Solution: Use Lagrange multipliers for the constrained optimization. The key insight is that the optimizer must be Gaussian with precision matrix proportional to the constraint operator:
\[\Sigma^{-1}_{Q,\varepsilon} = \frac{n}{2\varepsilon}Q\]
The uniqueness follows from the entropy gap identity:
\[h(p^*) - h(p) = KL(p \| p^*) \geq 0\]
Wick Normalization Theory: For the exact gate, multi-point moments follow from joint Gaussianity of the log-field:
\[E\left[\prod_{r=1}^m \xi^{ex}_\gamma(x_r)\right] = \exp\left(\gamma^2 \sum_{1 \leq a < b \leq m} C(x_a, x_b)\right)\]
Sample-wise Gate Analysis: The key technical insight is that sample-wise normalization differs from unnormalized exponential only by a spatially constant term in log-domain:
\[\log \tilde{h}(x) = \log h(x) + \gamma\psi(x) - c(\psi)\]
This makes pairwise log-ratios exactly Gaussian:
\[\Delta^{sw}_{xy}(h) = \gamma(\psi(x) - \psi(y)) \sim N(0, \gamma^2 R_G(x,y))\]
where $R_G(x,y) = G_U(x,x) + G_U(y,y) - 2G_U(x,y) \geq 0$.
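The multi-point moment formula for the exact gate can be verified in two lines. The derivation below assumes, as in the definition of $\xi^{ex}_\gamma$, that $C$ is the covariance kernel of the centered Gaussian field $\psi$. Since $\sum_r \psi(x_r)$ is centered Gaussian with variance $\sum_{a,b} C(x_a, x_b)$,
\[E\left[\exp\left(\gamma \sum_{r=1}^m \psi(x_r)\right)\right] = \exp\left(\frac{\gamma^2}{2}\sum_{a,b=1}^m C(x_a, x_b)\right) = \exp\left(\frac{\gamma^2}{2}\sum_{a=1}^m C(x_a, x_a) + \gamma^2 \sum_{1 \leq a < b \leq m} C(x_a, x_b)\right)\]
Multiplying by the deterministic Wick factor $\prod_{r=1}^m \exp(-\frac{\gamma^2}{2}C(x_r, x_r))$ cancels the diagonal terms and leaves only the off-diagonal sum. The case $m = 1$ gives $E[\xi^{ex}_\gamma(x)] = 1$, which is the mean-one property.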

Experiments & Validation

Datasets: ImageNet, ImageNet-C (7 selected corruptions), Oxford-IIIT Pets
Architectures: ResNet-50, Swin-T
Baselines: Standard dropout, DropBlock, no regularization

Key Results:

ImageNet clean: GCh improves ECE from 4.1% to 2.8% (ResNet-50)
ImageNet-C shift: GCh improves both ECE (8.2% → 6.9%) and NLL while maintaining competitive accuracy
Late-stage injection: GCh maintains calibration benefits even when injected only in final layers, where hard masking degrades performance
Ablation studies confirm importance of spatial correlation, positivity, and injection depth

The experiments specifically test the theoretical prediction that GCh should excel in late-stage regimes where representations are coherent and encode positive evidence.

Limitations & Open Problems

RESTRICTIVE: Theory applies only to strictly positive feature maps, limiting applicability to post-ReLU activations, attention weights, or similar positive evidence representations
TECHNICAL: Dirichlet boundary conditions and rectangular grid assumption could likely be extended to other topologies and boundary treatments
NATURAL: Focus on spatial correlations rather than channel-wise dependencies is standard for spatial regularization methods
TECHNICAL: Sample-wise implementation breaks exact moment formulas of canonical Wick-normalized gate, though compatibility analysis still holds
RESTRICTIVE: Gaussian assumption on latent field excludes potentially useful non-Gaussian noise geometries

Open Problems:
- Extend VKD framework to non-Gaussian noise families (e.g., stable processes for heavy-tailed perturbations)
- Develop channel-wise or cross-layer correlation structures within the VKD framework

Online Learning for Supervisory Switching Control

Authors: Haoyuan Sun, Ali Jadbabaie · Institution: MIT · Category: math.OC

Achieves $\mathcal{O}(N \log N)$ sample complexity for supervisory switching control in partially-observed linear systems by using observability to safely explore potentially destabilizing controllers.

Tags: supervisory control switching systems online learning multi-armed bandits system identification adaptive control partial observability finite-time analysis

arXiv · PDF

Problem Formulation

Motivation: In many control applications like power systems and autonomous vehicles, no single controller works well across all operating conditions of an unknown plant. Supervisory switching control addresses this by selecting among candidate controllers, but classical estimator-based methods only guarantee asymptotic stability without quantitative finite-time bounds, while modern online learning methods require restrictive stability assumptions incompatible with testing potentially destabilizing controllers.

Mathematical setup: Consider a discrete-time partially-observed linear system with unknown true parameters $(C^*, A^*, B^*)$ belonging to a collection $\{(C_i, A_i, B_i)\}_{i=1}^N$. The system evolves as:

\[x_{t+1} = A^* x_t + B^* \bar{u}_t + w_t\] \[y_t = C^* x_t + \eta_t\]

where $x_t \in \mathbb{R}^{d_x}$, $\bar{u}_t \in \mathbb{R}^{d_u}$, $y_t \in \mathbb{R}^{d_y}$, with $x_0 = 0$, $w_t \sim \mathcal{N}(0, \sigma_w^2 I)$, $\eta_t \sim \mathcal{N}(0, \sigma_\eta^2 I)$. Each candidate model is paired with a controller, and applying controller $j$ to the true system creates closed-loop dynamics $M_{[i^*, j]} = (C_{[i^*, j]}, A_{[i^*, j]}, B_{[i^*, j]})$. The control input is $\bar{u}_t = u_t + K(p_t, \check{x}_t, y_t)$ where $u_t \sim \mathcal{N}(0, \sigma_u^2 I)$ provides excitation and $p_t \in [N]$ is the switching signal.

Assumptions:

For every $i$, the correctly matched closed-loop system $M_{[i,i]}$ is $\varepsilon_c$-strictly observable with index $\nu$: $\sigma_{\min}([C_{[i,i]} A_{[i,i]}^{\nu-1}; \ldots; C_{[i,i]} A_{[i,i]}; C_{[i,i]}]) \geq \varepsilon_c$
If $M_{[i,j]}$ is unstable, then $\rho(A_{[i,j]}) \geq 1 + \varepsilon_a$ for some $\varepsilon_a > 0$
The Markov parameters $G_{[i,j]} = [C_{[i,j]} A_{[i,j]}^{h-1} B_{[i,j]}, \ldots, C_{[i,j]} B_{[i,j]}]$ satisfy $\lVert G_{[i,j]} - G_{[i',j]} \rVert_{op} \geq \gamma$ for $i \neq i'$

Toy example: When $N=2$, $d_x = d_u = d_y = 1$, and we have two candidate controllers where controller 1 stabilizes model 1 but destabilizes model 2, while controller 2 does the reverse. The core difficulty is safely exploring both controllers on the unknown true system without causing divergence, while distinguishing which model-controller pair matches the truth using only partial observations.

Formal objective: Find the matching controller index $i^*$ and achieve finite $L_2$-gain:
\[\sum_{t=1}^T \|x_t\|^2 \leq C_0 \left(\sum_{t=1}^{\tau L} (\|u_t\|^2 + \|w_t\|^2)\right) + C_1 \left(\sum_{t=\tau L+1}^T (\|u_t\|^2 + \|w_t\|^2)\right)\]

Method

The algorithm operates in episodes of fixed length $\tau$, using two evaluation criteria combined via $S = \frac{1}{2}(S^{(1)} + S^{(2)})$.

Criterion 1 (Instability Detection):

Estimate initial state: $\hat{x}_1 = O_{[j,j]}^\dagger [y_\nu, \ldots, y_1]^\top$
Compute predicted trajectory: $\hat{y}_t = C_{[j,j]} A_{[j,j]}^{t-1} \hat{x}_1$
Return 0 if $\lVert y_\tau - \hat{y}_\tau \rVert > \sqrt{2d_y \Theta \log(2d_y/\delta)}$, else return 1
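The three steps of Criterion 1 can be sketched for a noise-free system. The code below is an illustration with hypothetical function names: it treats the detection threshold as a plain parameter instead of computing $\sqrt{2d_y \Theta \log(2d_y/\delta)}$, and it ignores process noise, observation noise, and the excitation input.

```python
import numpy as np

def observability_matrix(C, A, nu):
    """Stack [C A^(nu-1); ...; C A; C], matching the order [y_nu, ..., y_1]."""
    blocks = [C @ np.linalg.matrix_power(A, k) for k in range(nu - 1, -1, -1)]
    return np.vstack(blocks)

def estimate_initial_state(C, A, ys):
    """ys[t-1] holds y_t for t = 1..nu; returns the least-squares x_1."""
    O = observability_matrix(C, A, len(ys))
    stacked = np.concatenate(ys[::-1])  # [y_nu, ..., y_1]
    return np.linalg.pinv(O) @ stacked

def passes_instability_check(C, A, x1_hat, y_tau, tau, threshold):
    """Return 1 if the observed y_tau is within threshold of the model
    prediction C A^(tau-1) x1_hat, else 0."""
    y_pred = C @ np.linalg.matrix_power(A, tau - 1) @ x1_hat
    return 1 if np.linalg.norm(y_tau - y_pred) <= threshold else 0
```

When the candidate model matches the closed loop, the reconstruction is exact in this noise-free setting and the prediction error is zero; when the true closed loop is unstable, the error grows geometrically in $\tau$ and eventually crosses any fixed threshold.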

Criterion 2 (System Identification):
Compute shifted outputs: $\tilde{y}_t = y_t - C_{[j,j]} A_{[j,j]}^{t-1} \hat{x}_1 - G_{[j,j]} z_t$
For each critical direction $(u_k, v_k)$, estimate via OLS:
\[\hat{\Delta}_k = \left(\sum_{t=\nu+h+1}^\tau (u_k^\top \tilde{y}_t)(v_k^\top z_t)\right) \left(\sum_{t=\nu+h+1}^\tau (v_k^\top z_t)^2\right)^{-1}\]
Return 0 if any $\lvert \hat{\Delta}_k \rvert > \gamma$, else return 1
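The OLS estimate along a single critical direction can be sketched as below. This is an illustration under assumptions: the regressor `z`, the rank-one `mismatch` matrix, the noise level, and the value of $\gamma$ are all hypothetical, and $z_t$ is treated as a generic Gaussian vector.

```python
import numpy as np

def delta_hat(y_tilde, z, u_k, v_k, start):
    """OLS slope of u_k^T y_tilde_t on v_k^T z_t over rows start, start+1, ...

    With rows indexed t = 1, ..., tau, pass start = nu + h to begin at t = nu + h + 1.
    """
    a = y_tilde[start:] @ u_k   # projections u_k^T y_tilde_t
    b = z[start:] @ v_k         # projections v_k^T z_t
    return float(a @ b) / float(b @ b)

rng = np.random.default_rng(0)
tau, nu, h = 500, 2, 3
u_k = np.array([1.0, 0.0])
v_k = np.array([0.0, 1.0])
mismatch = 0.8 * np.outer(u_k, v_k)       # hypothetical Markov-parameter error
z = rng.standard_normal((tau, 2))
y_clean = z @ mismatch.T                   # shifted outputs driven only by the mismatch
y_noisy = y_clean + 0.1 * rng.standard_normal((tau, 2))

est_clean = delta_hat(y_clean, z, u_k, v_k, start=nu + h)
est_noisy = delta_hat(y_noisy, z, u_k, v_k, start=nu + h)
gamma = 0.5                                # illustrative separation level
score = 0 if abs(est_noisy) > gamma else 1
print(est_clean, est_noisy, score)
```

Because the estimate is a ratio of two scalar sums, its concentration depends on the projected quantities rather than on the ambient dimensions of $\tilde{y}_t$ and $z_t$.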

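UCB Selection: Choose controller $i_\ell = \arg\max_i \left[\bar{s}_i(q) + \sqrt{\frac{a_\ell}{Q_i(\ell-1)}}\right]$, where $\bar{s}_i(q)$ is the average score of controller $i$ and $Q_i(\ell)$ is the number of episodes in which controller $i$ has been used up to episode $\ell$.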
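The selection rule can be sketched in a few lines. The handling of untried controllers (try each once first) and the value of the exploration parameter `a_ell` are assumptions made for illustration; the paper's schedule for $a_\ell$ is not given in this summary.

```python
import numpy as np

def ucb_select(mean_scores, counts, a_ell):
    """Return the index maximizing mean score plus sqrt(a_ell / count)."""
    mean_scores = np.asarray(mean_scores, dtype=float)
    counts = np.asarray(counts, dtype=float)
    untried = np.flatnonzero(counts == 0)
    if untried.size > 0:
        return int(untried[0])  # try every controller once before using the bonus
    return int(np.argmax(mean_scores + np.sqrt(a_ell / counts)))

first = ucb_select([0.0, 0.0], [0, 0], a_ell=2.0)        # nothing tried yet
exploit = ucb_select([1.0, 0.0], [5, 5], a_ell=2.0)      # equal counts: higher mean wins
explore = ucb_select([0.9, 0.5], [100, 1], a_ell=2.0)    # rarely used arm gets a large bonus
print(first, exploit, explore)
```

The bonus shrinks as a controller accumulates episodes, so a controller whose average score stays below 1 is selected less and less often.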

Toy example application: With $N=2$ controllers, the algorithm first tries each controller once. If controller 1 is applied to a system that it destabilizes, Criterion 1 detects the explosive growth in $\lvert y_\tau - \hat{y}_\tau \rvert$ and returns 0. More generally, if a controller stabilizes the true system without matching it, Criterion 2 detects the mismatch via $\lvert \hat{\Delta}_1 \rvert > \gamma$ in the critical direction and returns 0. Only the matching controller achieves $S = 1$ consistently.

Novelty & Lineage

This work bridges classical supervisory control theory and modern online learning. Prior estimator-based supervisory control (Hespanha 2001, Morse 1996-1997) guarantees asymptotic stability but lacks finite-time bounds. Recent non-asymptotic system identification (Simchowitz et al. 2018, Oymak & Ozay 2019, Sarkar et al. 2021) requires stability assumptions. Multi-armed bandit approaches to control (Li et al. 2023, Kim & Lavaei 2025) need full state observation or stability certificates.

The key novelty is using system observability to decouple episode dependencies in partial observation settings, enabling safe exploration of potentially unstable controllers. The instability detection criterion leverages observability to reconstruct initial states and isolate explosive modes. The system identification criterion uses critical directions for dimension-free analysis. This achieves $\mathcal{O}(N \log N)$ sample complexity versus the authors’ prior $\mathcal{O}(\exp(N))$ result (Sun & Jadbabaie 2024).

SIGNIFICANT

Proof Techniques

The proof strategy combines four main components:

1. Observability-based state reconstruction: For the matching controller case, the key inequality is:

\[\|O_{[j,j]}^\dagger\|_{op} \leq 1/\varepsilon_c\]

This bounds the pseudo-inverse of the observability matrix, enabling accurate initial state estimation $\hat{x}_1$ from outputs. The estimation error satisfies:

\[x_1 - \hat{x}_1 \text{ is } \varepsilon_c^{-1}(\|\langle C_{[j,j]}, A_{[j,j]} \rangle\|_{H_\infty} \sigma_w^2 + \|\langle C_{[j,j]}, A_{[j,j]}, B_{[j,j]} \rangle\|_{H_\infty} \sigma_u^2 + \sigma_\eta^2)\text{-sub-Gaussian}\]
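A short derivation shows where both displays come from. It assumes (a standard decomposition, not spelled out in this summary) that the stacked first $\nu$ outputs can be written as $O_{[j,j]} x_1 + e$, where $e$ collects the excitation, process-noise and measurement-noise contributions. Since $\sigma_{\min}(O_{[j,j]}) \geq \varepsilon_c > 0$, the matrix $O_{[j,j]}$ has full column rank and $O_{[j,j]}^\dagger O_{[j,j]} = I$, so:

\[\|O_{[j,j]}^\dagger\|_{op} = \frac{1}{\sigma_{\min}(O_{[j,j]})} \leq \frac{1}{\varepsilon_c}, \qquad \hat{x}_1 - x_1 = O_{[j,j]}^\dagger (O_{[j,j]} x_1 + e) - x_1 = O_{[j,j]}^\dagger e, \qquad \|\hat{x}_1 - x_1\| \leq \frac{\|e\|}{\varepsilon_c}\]

The estimation error is therefore a linear image of the noise terms, scaled by at most $1/\varepsilon_c$, which is why it inherits their sub-Gaussian tails.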

2. Explosive mode detection: For destabilizing controllers, the explosive eigenvalue $\lambda_{[i^*,j]}$ with $\lvert \lambda_{[i^*,j]} \rvert \geq 1 + \varepsilon_a$ creates growth:

\[\|C_{[i^*,j]} A_{[i^*,j]}^{\tau-\nu-2} q_{i^*}\| \geq \varepsilon_c \varepsilon_a (1 + \varepsilon_a)^{\tau-2\nu-2}\]

where $q_{i^*}$ is the corresponding eigenvector. This dominates the residual $y_\tau - \hat{y}_\tau$ with probability $\geq 2/5$.
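To get a feel for how long an episode must be before the explosive mode is visible, the lower bound above can be evaluated directly. The sketch below searches for the smallest $\tau$ at which the bound exceeds a given detection threshold; the numeric values and helper names are hypothetical and do not reflect the paper's choice of $\tau$.

```python
def growth_lower_bound(tau, eps_c, eps_a, nu):
    # eps_c * eps_a * (1 + eps_a)^(tau - 2*nu - 2), as in the display above.
    return eps_c * eps_a * (1.0 + eps_a) ** (tau - 2 * nu - 2)

def min_episode_length(eps_c, eps_a, nu, threshold):
    """Smallest tau >= 2*nu + 2 whose growth lower bound exceeds threshold."""
    tau = 2 * nu + 2
    while growth_lower_bound(tau, eps_c, eps_a, nu) <= threshold:
        tau += 1
    return tau

tau_needed = min_episode_length(eps_c=0.5, eps_a=0.1, nu=2, threshold=5.0)
print(tau_needed)
```

Because the bound grows geometrically in $\tau$, the required episode length scales only logarithmically with the threshold, but it grows quickly as $\varepsilon_a$ approaches 0.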

3. UCB analysis with martingale concentration: The algorithm decouples episode dependencies by showing evaluation scores concentrate around their expectations independently of initial states. Using Azuma’s inequality for the martingale sequence of score differences:

\[\Pr\left(\frac{1}{n} \sum_{k=1}^n D_k \geq t\right) \leq 2\exp(-2nt^2)\]

where $D_k \in [0,1]$ are score differences. The key insight is that episodes can be analyzed independently due to the observability-based isolation of state effects.
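As a worked instance of this bound, set the right-hand side equal to a target failure probability $\delta$ and solve for $t$:

\[2\exp(-2nt^2) = \delta \iff t = \sqrt{\frac{\log(2/\delta)}{2n}}\]

For example, $n = 50$ and $\delta = 0.05$ give $t \approx 0.19$. The deviation shrinks at rate $1/\sqrt{n}$, the same shape as the exploration bonus $\sqrt{a_\ell / Q_i(\ell-1)}$ in the UCB rule.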

4. Self-normalized martingale bounds: For the system identification criterion, applying the self-normalized bound (Proposition 4):

\[\left\|\sum_{t=1}^T z_t r_t\right\|^2_{(V + V_0)^{-1}} \leq 2\sigma_r^2 \log\left(\delta^{-1} \det(V + V_0)^{1/2} \det(V_0)^{-1/2}\right)\]

enables dimension-free concentration of the OLS estimates along critical directions.
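A quick Monte Carlo sanity check of the self-normalized bound in the scalar case is sketched below. It uses the standard form of the bound, with $\det(V + V_0)^{1/2} \det(V_0)^{-1/2}$ inside the logarithm, and assumes independent Gaussian $z_t$ and $r_t$; all numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T, sigma_r, delta, V0 = 100, 1.0, 0.05, 1.0
trials = 2000
violations = 0
for _ in range(trials):
    z = rng.standard_normal(T)             # scalar regressors z_t
    r = sigma_r * rng.standard_normal(T)   # sigma_r-sub-Gaussian noise r_t
    V = float(z @ z)                       # V = sum_t z_t^2
    lhs = float(z @ r) ** 2 / (V + V0)     # squared norm weighted by (V + V0)^{-1}
    rhs = 2 * sigma_r**2 * np.log(np.sqrt((V + V0) / V0) / delta)
    violations += int(lhs > rhs)
violation_rate = violations / trials
print(violation_rate)
```

The observed violation rate should stay below $\delta$; in this Gaussian setting the bound is conservative, so the rate is typically far smaller.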

Experiments & Validation

Purely theoretical. The paper provides no empirical validation.

Empirical validation would require:

Simulation on linear systems with known ground truth to verify the $\mathcal{O}(N \log N)$ sample complexity
Comparison against classical estimator-based supervisory control and other adaptive control baselines
Testing on realistic control applications like power systems or autonomous vehicles mentioned in the motivation
Sensitivity analysis of the method to violations of the strict observability and eigenvalue gap assumptions
Computational cost analysis of the observability matrix pseudo-inverse computations required at each episode

Limitations & Open Problems

Limitations:

Strict observability assumption (every matched closed-loop system is $\varepsilon_c$-strictly observable) - TECHNICAL: Standard in system identification but may be violated if controllers introduce unobservable modes
Eigenvalue separation assumption (destabilizing controllers have spectral radius $\geq 1 + \varepsilon_a$) - TECHNICAL: Rules out marginally unstable controllers near the stability boundary
Markov parameter separation assumption ($\|G_{[i,j]} - G_{[i',j]}\|_{op} \geq \gamma$) - RESTRICTIVE: Requires candidate models to be sufficiently different, excluding similar models
Known candidate model collection - NATURAL: Standard assumption in supervisory control, though restrictive for model-free settings
Linear system assumption - RESTRICTIVE: Many practical systems are nonlinear
Gaussian noise assumptions - NATURAL: Standard in stochastic control theory
Episode-based switching rather than continuous adaptation - TECHNICAL: Could potentially be relaxed with more sophisticated analysis

Open Problems:
Extend to nonlinear systems while maintaining finite-time guarantees and safe exploration of destabilizing controllers
Develop model-free versions that learn the candidate controller set online rather than assuming a pre-specified collection