of functional gradient descent that minimizes the KL divergence.
12:01 PM Claude responded: Yes, but the fit depends on what you're trying to estimate. Here's where SVGD makes sense for SWMM5 and where it doesn't.
Good fit: SWMM5 calibration as Bayesian parameter inference. Treat the target distribution as the posterior over calibration parameters (Manning's n, subcatchment width, % imperviousness, RTK triplets, infiltration params, conduit roughness) given observed flow/depth/volume. Run an ensemble of particles, where each particle is a full SWMM5 parameter vector, and push them toward the high-posterior region while keeping them spread out. The kernel repulsion term is the selling point: unlike a single least-squares calibration, you get a population of equally plausible parameter sets, which speaks directly to the equifinality problem stormwater modelers live with. Once the ensemble exists, uncertainty bands on predictions cost only one forward run per particle.
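To make the "uncertainty bands" point concrete, here is a minimal sketch of reading a percentile band off a particle ensemble. Everything in it is illustrative: `toy_hydrograph` is a made-up smooth function standing in for a SWMM5 run (it is not the engine or any real API), and the particle values are random draws rather than SVGD output.

```python
import numpy as np

def prediction_band(particles, simulate, lower=5.0, upper=95.0):
    """Run `simulate` once per particle; return per-time-step percentile bands."""
    runs = np.array([simulate(theta) for theta in particles])  # (n_particles, n_steps)
    lo, mid, hi = np.percentile(runs, [lower, 50.0, upper], axis=0)
    return lo, mid, hi

def toy_hydrograph(theta, t=np.linspace(0.0, 6.0, 61)):
    """Hypothetical stand-in for a SWMM5 run: a smooth rise-and-recession curve."""
    n, width = theta
    peak_time = 1.0 + 20.0 * n           # illustrative only: rougher -> later peak
    return (width / 100.0) * t * np.exp(-t / peak_time)

# Illustrative ensemble: columns are Manning's n and subcatchment width (ft).
rng = np.random.default_rng(0)
particles = np.column_stack([
    rng.normal(0.015, 0.002, size=30),
    rng.normal(500.0, 50.0, size=30),
])

lo, mid, hi = prediction_band(particles, toy_hydrograph)
```

With a real model, `simulate` would wrap one engine run per particle, so the band costs as many runs as there are particles.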
Mechanics you'd need:
Each SVGD iteration requires the gradient of log-posterior w.r.t. parameters. SWMM5 has no analytic gradient, so you either finite-difference it (d+1 engine runs per gradient per particle with forward differences, where d is the number of parameters) or use a surrogate/emulator. With 20–50 particles and 10+ parameters that is at least 220–550 engine runs per SVGD iteration, so it gets expensive fast. This is the real bottleneck, not the math.
Bound/transform parameters (log for roughness, logit for fractions) so particles don't wander into physically invalid space where the engine fails to converge.
The kernel bandwidth (median heuristic) controls the diversity-vs-accuracy tradeoff.
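The three mechanics above can be sketched in a few lines of NumPy. This is a sketch under assumptions, not a reference implementation: the two-parameter layout (log of roughness, logit of an impervious fraction) is chosen only for illustration, `log_post` is whatever callable wraps your engine or emulator, and the bandwidth formula is one common form of the median heuristic for the kernel k(x, y) = exp(-||x - y||^2 / (2 h^2)).

```python
import numpy as np

def logit(p):
    return np.log(p) - np.log1p(-p)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def to_physical(z):
    """Map unconstrained z = [log(roughness), logit(fraction)] to physical values."""
    return np.array([np.exp(z[0]), sigmoid(z[1])])

def fd_gradient(log_post, z, h=1e-4):
    """Forward-difference gradient: d + 1 calls to log_post for d parameters."""
    base = log_post(z)
    grad = np.empty_like(z)
    for k in range(z.size):
        step = np.zeros_like(z)
        step[k] = h
        grad[k] = (log_post(z + step) - base) / h
    return grad

def median_bandwidth(particles):
    """Median heuristic (one common variant): h^2 = median(sq. dist) / (2 log(N + 1))."""
    diffs = particles[:, None, :] - particles[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.sqrt(0.5 * np.median(sq_dists) / np.log(particles.shape[0] + 1))
```

Working in the unconstrained coordinates means every particle maps back to a valid roughness and fraction, whatever step SVGD takes. If the prior is specified in physical units, the change of variables adds a log-Jacobian term to the log-posterior, which this sketch leaves out.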
Where it's weaker: SVGD assumes a differentiable, reasonably smooth target. SWMM5's dynamic-wave output can be locally non-smooth (regime switches, surcharge onset, weir/orifice transitions), which makes finite-difference gradients noisy and can stall particles. For low-dimensional problems, plain MCMC (DREAM, which the hydrology community already uses heavily) or even Latin Hypercube + GLUE may be more robust, and both are better understood by reviewers.
Practical recommendation: build it as a surrogate-accelerated SVGD. Train a fast emulator (Gaussian process or small neural net) on a design of SWMM5 runs, do SVGD on the differentiable emulator, then validate the resulting particle cloud against the real engine. That sidesteps the gradient cost and the non-smoothness, and it pairs naturally with the RL-for-SWMM work you've been doing.
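As a rough illustration of the emulator idea, the sketch below fits a quadratic response surface to log-posterior values at a set of design points and returns an analytic gradient. Note the swap: a quadratic least-squares fit stands in for the Gaussian process or neural net mentioned above, purely to keep the example dependency-free. `toy_log_post` is a hypothetical stand-in for "run SWMM5 and score against observations".

```python
import numpy as np

def quad_features(Z):
    """Columns: 1, z_k, and z_k * z_l for k <= l."""
    Z = np.atleast_2d(Z)
    rows, cols = np.triu_indices(Z.shape[1])
    return np.hstack([np.ones((Z.shape[0], 1)), Z, Z[:, rows] * Z[:, cols]])

def fit_quadratic_emulator(Z_design, logp_design):
    """Least-squares quadratic fit; returns (emulator, gradient) callables."""
    d = Z_design.shape[1]
    coef, *_ = np.linalg.lstsq(quad_features(Z_design), logp_design, rcond=None)
    b = coef[1:1 + d]
    U = np.zeros((d, d))
    U[np.triu_indices(d)] = coef[1 + d:]
    H = U + U.T                      # gradient of z^T U z is (U + U^T) z

    def emulator(Z):
        return quad_features(Z) @ coef

    def gradient(Z):
        return b + np.atleast_2d(Z) @ H   # one gradient row per input row

    return emulator, gradient

def toy_log_post(z):
    """Hypothetical stand-in for an engine run plus likelihood evaluation."""
    return -0.5 * ((z[0] - 1.0) ** 2 + 4.0 * (z[1] + 0.5) ** 2)

rng = np.random.default_rng(1)
Z_design = rng.uniform(-3.0, 3.0, size=(40, 2))            # design of "engine runs"
logp_design = np.array([toy_log_post(z) for z in Z_design])
emulator, gradient = fit_quadratic_emulator(Z_design, logp_design)
```

The `gradient` callable takes a batch of particles and returns one score vector per particle, so each SVGD iteration needs no engine runs. A quadratic can only represent a single Gaussian-like mode; a ridge-shaped or multimodal posterior would need the more flexible emulators named above, and the final particle cloud still has to be checked against the real engine.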
SVGD SWMM5 Calibration Explorer
Watch a cloud of particles — each a candidate SWMM5 parameter set
(Manning’s n vs. subcatchment width) — transported by Stein Variational Gradient
Descent toward the calibration posterior, while a repulsion term keeps them spread so you can
see equifinality.
What you can do:
Play the SVGD iteration and watch particles climb the posterior
Tune the kernel bandwidth to trade accuracy against diversity
Add an equifinality ridge to see particles spread, not collapse
Read uncertainty bands straight off the particle cloud
Export the final particle ensemble as CSV
Stein Variational Gradient Descent over a 2-parameter posterior:
Manning’s n vs. subcatchment width. Particles transport toward the
target; a kernel repulsion term preserves diversity — the equifinality story made visible.
Kernel bandwidth: 0 = median heuristic. Higher = stronger spread / more repulsion.
σ: higher σ → broader, flatter posterior → more uncertainty.
Equifinality ridge: couples n and width so many (n, width) pairs fit equally well, the classic non-identifiability in rainfall-runoff calibration.
Readouts: iteration, mean log-post, n (mean ± sd), width (mean ± sd)
[Figure: Parameter space, posterior + particle cloud. x: Manning’s n; y: subcatchment width (ft). Legend: particle, high posterior, low posterior, true value.]
The update being applied each iteration
$$x_i \leftarrow x_i + \epsilon\,\phi(x_i), \qquad \phi(x_i) = \frac{1}{N}\sum_{j=1}^{N}\left[\,k(x_j, x_i)\,\nabla_{x_j}\log p(x_j) + \nabla_{x_j} k(x_j, x_i)\,\right]$$
First term is an attraction toward high posterior density (the calibration gradient);
the second is a repulsion from the RBF kernel that stops the cloud collapsing to one mode. Together they
minimize KL divergence to the target. Here log p is a synthetic SWMM5-style
calibration surface — in practice you’d swap in a surrogate-accelerated SWMM5 likelihood.
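For reference, the update can be written as a short NumPy sketch. Assumptions: an RBF kernel k(x, y) = exp(-||x - y||^2 / (2 h^2)), a median-heuristic bandwidth when none is given, a fixed step size, and a shifted unit Gaussian as a placeholder target. None of this is taken from the explorer's own code.

```python
import numpy as np

def svgd_step(X, grad_logp, eps=0.1, h=None):
    """One SVGD update on particles X of shape (N, d).

    grad_logp(X) must return an (N, d) array whose row j is grad log p(x_j).
    """
    N = X.shape[0]
    diffs = X[:, None, :] - X[None, :, :]            # diffs[j, i] = x_j - x_i
    sq_dists = np.sum(diffs ** 2, axis=-1)
    if h is None:                                    # median heuristic (one variant)
        h = np.sqrt(0.5 * np.median(sq_dists) / np.log(N + 1))
        h = max(h, 1e-8)                             # guard against a collapsed cloud
    K = np.exp(-sq_dists / (2.0 * h ** 2))           # K[j, i] = k(x_j, x_i)
    attract = K.T @ grad_logp(X)                     # sum_j k(x_j, x_i) grad log p(x_j)
    # grad_{x_j} k(x_j, x_i) = -k(x_j, x_i) (x_j - x_i) / h^2, summed over j
    repulse = -np.sum(K[:, :, None] * diffs, axis=0) / h ** 2
    return X + eps * (attract + repulse) / N

# Placeholder target: unit Gaussian centred at mu (not a SWMM5 posterior).
mu = np.array([2.0, -1.0])
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                         # initial particle cloud
for _ in range(1000):
    X = svgd_step(X, lambda P: -(P - mu), eps=0.1)
```

For a calibration run, `grad_logp` would be replaced by the gradient of the surrogate-accelerated SWMM5 log-posterior. The `repulse` term is what keeps the cloud from collapsing onto the mode.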