The GLOSS algorithm¶
Overview¶
At round \(t\), a surrogate model \(\hat{f}_t\) is fit on the observed dataset \((X, y)\) and produces \(\mu(x)\) and (optionally) \(\sigma(x)\). Three streams then run in parallel on the same surrogate, each producing \(n_i\) candidates that together form the \(q = \sum_i n_i\) recommended batch.
Stream 1 — Global¶
Maximizes the upper-confidence-bound (UCB) acquisition
where \(s = +1\) for maximization (\(-1\) for minimization) and \(\kappa\) controls the exploration weight (default \(\kappa = 2\)). Picks are sorted by acquisition score and greedily accepted while enforcing a minimum pairwise distance \(r_{div}\), so UCB does not cluster all \(n_g\) picks on a single high-acquisition mode.
Stream 2 — Local¶
Refines inside a BallTree window around the current observed best \(x^{*}\). A naive implementation would scan every candidate in the window — \(\mathcal{O}(n)\) per round, becoming the bottleneck on \(n = 10^5\) pools.
The fix: restrict the scan to the top-\(K\) candidates by predicted \(\mu\), keeping the cost at \(\mathcal{O}(K)\) regardless of \(n\). The default \(K = 500\) preserves the recovered local optima relative to a full-pool scan, and gives roughly a 40× wall-clock speedup at \(n = 10^5\).
Stream 3 — Unexplored¶
Maximizes the geometric distance \(d(x, X_{\text{obs}})\) to the observed set, using either Euclidean (default) or Jaccard / Tanimoto distance (for binary fingerprints such as ECFP).
Important
This stream uses no surrogate signal. Its picks are determined entirely by the geometry of the search space and the locations of previously sampled points, making it robust to a poorly calibrated \(\hat{f}_t\).
This is the operational answer to the overfitting trap: it forces every round to deposit data in regions the surrogate has not yet seen, so the surrogate’s blind spots get filled even when its \(\mu/\sigma\) predictions are unreliable.
Why decomposition beats a single acquisition¶
A single acquisition function encodes one global trade-off between predicted mean and predicted uncertainty. When \(q\) points are picked greedily from such a function, they all satisfy the same trade-off and tend to cluster in the same high-acquisition region. In a multi-modal landscape this means a round of \(q\) evaluations contributes essentially no information outside the current basin.
GLOSS replaces this implicit single-criterion choice with an explicit budget split: by guaranteeing that some fraction of every batch satisfies a purely geometric criterion (Unexplored), the algorithm forces every round to contribute training data outside the basin the surrogate is currently fitting, regardless of what \(\hat{f}_t\) believes about that region.