Optimizing Computational-Statistical Runtime for Wasserstein Distance Estimation
Abstract
Squared Wasserstein distance is a frequently used tool to measure discrepancy between probability distributions. This distance is typically computed between empirical measures of size $n$ from two underlying random samples. Unfortunately, even in lower dimensional Euclidean space problems $\left( d \in \{2,3\} \right)$, algorithms for Wasserstein distance computation with approximate or exact precision guarantees scale poorly in the runtime as a function of $n$ and the desired precision. In resp...
Description / Details
Squared Wasserstein distance is a frequently used tool to measure discrepancy between probability distributions. This distance is typically computed between empirical measures of size $n$ from two underlying random samples. Unfortunately, even in lower dimensional Euclidean space problems $\left( d \in \{2,3\} \right)$, algorithms for Wasserstein distance computation with approximate or exact precision guarantees scale poorly in the runtime as a function of $n$ and the desired precision. In response, we consider the computational-statistical runtime, where the goal is to estimate from samples the Wasserstein distance between potentially smooth measures up to -additive error in expectation with respect to the sampling; we allow computational cost for collecting a sample. Towards this, we develop a Sample-Sketch-Solve paradigm where we introduce a regular Cartesian grid sketch of the samples. We show that (especially under -Hölder smooth distributions) this can compress the data without increasing asymptotic error, and also regularizes the structure which enables faster exact algorithms. Ultimately, we approximate within error in time for Hölder smooth distributions on ; an optimal for when and nearly optimal as when .
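The "Sketch" step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `grid_sketch` and `max_displacement`, the parameter `cells_per_axis`, the unit-square support, and the uniform toy samples are all assumptions made for illustration, and the "Solve" step (an exact optimal transport solver run on the two weighted sketches) is omitted. The code snaps each sample to the center of its grid cell and aggregates the mass, so $n$ samples compress to at most `cells_per_axis ** d` weighted points.

```python
import math
import random
from collections import Counter


def cell_center(p, cells_per_axis):
    """Center of the regular grid cell of [0, 1]^d that contains point p."""
    # Clamp the index so a coordinate equal to 1.0 still maps to the last cell.
    return tuple(
        (min(int(x * cells_per_axis), cells_per_axis - 1) + 0.5) / cells_per_axis
        for x in p
    )


def grid_sketch(points, cells_per_axis):
    """Compress an empirical measure into {cell_center: weight} on a regular grid."""
    counts = Counter(cell_center(p, cells_per_axis) for p in points)
    n = len(points)
    return {center: c / n for center, c in counts.items()}


def max_displacement(points, cells_per_axis):
    """Largest Euclidean distance any sample moves when snapped to its cell center."""
    return max(math.dist(tuple(p), cell_center(p, cells_per_axis)) for p in points)


# Hypothetical toy data: 10,000 uniform samples in the unit square (d = 2).
rng = random.Random(0)
samples = [(rng.random(), rng.random()) for _ in range(10_000)]

m = 16  # cells per axis, so the cell side length is h = 1/16
sketch = grid_sketch(samples, m)
worst = max_displacement(samples, m)
```

Each sample moves by at most half a cell diagonal, $h\sqrt{d}/2$ with $h = 1/\texttt{cells\_per\_axis}$, so the 2-Wasserstein distance between the empirical measure and its sketch is bounded by the same quantity. This is only the elementary transport bound; the smoothness-dependent analysis in the paper is not reproduced here.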
Source: arXiv:2605.20122v1 - http://arxiv.org/abs/2605.20122v1 PDF: https://arxiv.org/pdf/2605.20122v1
May 20, 2026
Data Science
Statistics