Stein Variational Gradient Descent (SVGD) for SWMM5

of functional gradient descent that minimizes the KL divergence.

Yes, but the fit depends on what you're trying to estimate. Here's where SVGD makes sense for SWMM5 and where it doesn't.

Good fit: SWMM5 calibration as Bayesian parameter inference. Treat the target distribution as the posterior over calibration parameters (Manning's n, subcatchment width, % imperviousness, RTK triplets, infiltration parameters, conduit roughness) given observed flow, depth, or volume. Run an ensemble of particles, where each particle is a full SWMM5 parameter vector, and push them toward the high-posterior region while keeping them spread out. The kernel repulsion term is the selling point: unlike a single least-squares calibration, you get a population of equally plausible parameter sets, which is exactly the equifinality problem stormwater modelers live with. Running every particle through the model then gives you uncertainty bands on predictions almost for free.

Mechanics you'd need:

Each SVGD iteration requires the gradient of the log-posterior with respect to the parameters. SWMM5 has no analytic gradient, so you either finite-difference it (d + 1 engine runs per gradient per particle with forward differences, where d is the number of parameters) or use a surrogate/emulator. With 20–50 particles and 10+ parameters this gets expensive fast. This is the real bottleneck, not the math.

Bound or transform the parameters (log for roughness, logit for fractions) so particles don't wander into physically invalid space where the engine fails to converge.

The kernel bandwidth (median heuristic) controls the diversity-vs-accuracy tradeoff.

Where it's weaker: SVGD assumes a differentiable, reasonably smooth target. SWMM5's dynamic-wave output can be locally non-smooth (regime switches, surcharge onset, weir/orifice transitions), which makes finite-difference gradients noisy and can stall particles. For low-dimensional problems, plain MCMC (DREAM, which the hydrology community already uses heavily) or even Latin Hypercube + GLUE may be more robust and is better understood by reviewers.

Practical recommendation: build it as a surrogate-accelerated SVGD. Train a fast emulator (Gaussian process or small neural net) on a design of SWMM5 runs, do SVGD on the differentiable emulator, then validate the resulting particle cloud against the real engine. That sidesteps the gradient cost and the non-smoothness, and it pairs naturally with the RL-for-SWMM work you've been doing.
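The finite-difference gradient and the log/logit transforms mentioned under the mechanics can be sketched roughly as follows. This is a minimal illustration, not a SWMM5 interface: toy_engine_log_post is a hypothetical stand-in for "run the engine, compare to observations, return the log-posterior", and the two parameters and their numbers are invented for the example. Because the gradient is taken in the transformed space, the sketch adds the log-Jacobian of the transform to the log-density.

```python
import math

calls = {"n": 0}  # counts stand-in "engine runs"

def toy_engine_log_post(theta):
    """Hypothetical stand-in for a SWMM5 run plus likelihood evaluation.
    Synthetic Gaussian bump around roughness 0.015 and fraction 0.4."""
    calls["n"] += 1
    n, frac = theta
    return -0.5 * (((n - 0.015) / 0.005) ** 2 + ((frac - 0.4) / 0.1) ** 2)

def to_unconstrained(theta):
    """Log transform for the positive roughness, logit for the (0, 1) fraction."""
    n, frac = theta
    return [math.log(n), math.log(frac / (1.0 - frac))]

def to_physical(z):
    """Inverse transform: exp for roughness, logistic for the fraction."""
    return [math.exp(z[0]), 1.0 / (1.0 + math.exp(-z[1]))]

def log_post_z(z):
    """Log-density in the unconstrained space: log p(theta(z)) + log|d theta / d z|."""
    n, frac = to_physical(z)
    log_jac = math.log(n) + math.log(frac) + math.log(1.0 - frac)
    return toy_engine_log_post((n, frac)) + log_jac

def fd_gradient(log_post, z, h=1e-4):
    """Forward-difference gradient: one base evaluation plus one per parameter."""
    base = log_post(z)
    grad = []
    for i in range(len(z)):
        z_step = list(z)
        z_step[i] += h
        grad.append((log_post(z_step) - base) / h)
    return grad

z0 = to_unconstrained((0.02, 0.5))
print(fd_gradient(log_post_z, z0), "engine runs:", calls["n"])
```

With two parameters, one gradient costs three evaluations of the stand-in engine, which is the d + 1 count in miniature. The step size h is a tuning choice; on a non-smooth engine response it would likely need to be larger than what works on this smooth toy surface.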
SVGD SWMM5 Calibration Explorer

Stein Variational Gradient Descent over a 2-parameter posterior: Manning’s n vs. subcatchment width. Particles transport toward the target; a kernel repulsion term preserves diversity — the equifinality story made visible.

Kernel bandwidth: 0 = median heuristic. Higher = stronger spread / more repulsion.

σ: higher σ → broader, flatter posterior → more uncertainty.

Coupling: couples n and width so many (n, width) pairs fit equally well, the classic non-identifiability in rainfall-runoff calibration.

Readouts: iteration, mean log-posterior, n (mean ± sd), width (mean ± sd).

[Figure: parameter space, showing the posterior and the particle cloud. x-axis: Manning’s n; y-axis: subcatchment width (ft). Legend: particle, high posterior, low posterior, true value.]

The update being applied each iteration

x_i ← x_i + ε φ(x_i),   φ(x_i) = (1/N) Σ_{j=1}^{N} [ k(x_j, x_i) ∇_{x_j} log p(x_j) + ∇_{x_j} k(x_j, x_i) ]

The first term attracts each particle toward high posterior density (the calibration gradient); the second is a repulsion from the RBF kernel that stops the cloud from collapsing onto a single mode. Together they define the update direction that most steeply decreases the KL divergence between the particle distribution and the target. Here log p is a synthetic SWMM5-style calibration surface; in practice you’d swap in a surrogate-accelerated SWMM5 likelihood.
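The update above can be written out in plain Python with no dependencies. This is a sketch under stated assumptions, not the explorer's own code: the target is a hypothetical correlated 2-D Gaussian in standardized units (standing in for the coupled n-width posterior), the kernel is k(x, y) = exp(−|x − y|² / h), the bandwidth uses one common form of the median heuristic, and the step size is fixed rather than adaptive.

```python
import math
import random

RHO = 0.8  # assumed coupling between the two standardized parameters

def grad_log_p(x):
    """Gradient of the log-density of a zero-mean Gaussian with unit variances and correlation RHO."""
    c = 1.0 / (1.0 - RHO ** 2)
    return [-c * (x[0] - RHO * x[1]), -c * (x[1] - RHO * x[0])]

def median_bandwidth(xs):
    """Median heuristic: h = (median squared pairwise distance) / log(N + 1). Needs N >= 2."""
    d2 = sorted(sum((a - b) ** 2 for a, b in zip(xi, xj))
                for i, xi in enumerate(xs) for xj in xs[i + 1:])
    return max(d2[len(d2) // 2] / math.log(len(xs) + 1), 1e-12)

def svgd_step(xs, grad_fn, eps):
    """One update x_i <- x_i + eps * phi(x_i) applied to every particle."""
    n, dim = len(xs), len(xs[0])
    h = median_bandwidth(xs)
    grads = [grad_fn(x) for x in xs]
    new_xs = []
    for xi in xs:
        phi = [0.0] * dim
        for xj, gj in zip(xs, grads):
            k = math.exp(-sum((a - b) ** 2 for a, b in zip(xj, xi)) / h)
            for d in range(dim):
                attraction = k * gj[d]                        # k(x_j, x_i) * grad log p(x_j)
                repulsion = -2.0 * (xj[d] - xi[d]) / h * k    # d/dx_j of k(x_j, x_i)
                phi[d] += attraction + repulsion
        new_xs.append([xi[d] + eps * phi[d] / n for d in range(dim)])
    return new_xs

def run_svgd(n_particles=20, n_iters=600, eps=0.1, seed=0):
    """Start from a tight off-center cloud and iterate the update."""
    rng = random.Random(seed)
    xs = [[rng.gauss(1.0, 0.5), rng.gauss(-1.0, 0.5)] for _ in range(n_particles)]
    for _ in range(n_iters):
        xs = svgd_step(xs, grad_log_p, eps)
    return xs
```

Calling run_svgd() returns the final particle positions as a list of [x0, x1] pairs. The repulsion term has the sign that pushes x_i away from x_j, which is what keeps the cloud spread along the correlated ridge instead of collapsing to the mode. For a real calibration, grad_log_p would be replaced by a finite-difference or surrogate gradient.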

Educational toy: the posterior is a synthetic 2-parameter surface, not a live engine run. Production use needs a finite-difference or surrogate gradient of the real SWMM5 likelihood.