Benchmarks

The benchmark suite compares GLOSS against four baselines on three chemistry datasets whose objective values come from three different sources: experimental yields, DFT-computed properties, and a virtual reaction surface.

Setup

Dataset	Pool size \(n\)	Value source	Notes
Buchwald–Hartwig	3,955	Experimental yields	One-hot reaction features
QM9 HOMO–LUMO gap	100,000	DFT properties	20 RDKit descriptors
Arrhenius-2D	10,000	Virtual reaction surface	Three peak basins, exposes secondary-peak trap

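Algorithms: GLOSS (4:2:2 ratio, \(q=8\)), UCB-BO (Bayesian optimization with an upper-confidence-bound acquisition; an ablation that isolates the multi-stream architecture from the acquisition choice), BO(EI) (Bayesian optimization with expected improvement), GA (genetic algorithm), and Random (random sampling).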

Protocol: All algorithms share an identical random-forest (RF) surrogate with 100 trees and initialize from 8 points drawn from the bottom 20% of each pool, so that no algorithm benefits from a lucky starting point. Each algorithm runs for 20 rounds on each of 5 seeds.
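The bottom-20% initialization can be sketched as follows. This is a minimal illustration, not the benchmark's own code: the function name `bottom_fraction_init` is hypothetical, and it assumes a maximization task (so "bottom" means lowest objective values) and uniform sampling without replacement.

```python
import random


def bottom_fraction_init(values, n_init=8, fraction=0.2, seed=0):
    """Return n_init distinct pool indices sampled from the lowest-valued part of the pool.

    Assumes a maximization task, so "bottom" means the smallest objective values.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if n_init > len(values):
        raise ValueError("pool is smaller than n_init")
    # Rank pool indices from worst to best objective value.
    ranked = sorted(range(len(values)), key=lambda i: values[i])
    # Keep the worst `fraction` of the pool, but never fewer than n_init candidates.
    cutoff = max(n_init, int(len(values) * fraction))
    rng = random.Random(seed)  # one generator per seed keeps runs reproducible
    return rng.sample(ranked[:cutoff], n_init)


# Toy pool of 100 values in shuffled order (hypothetical data, not a benchmark dataset).
pool_values = list(range(100))
random.Random(42).shuffle(pool_values)
init_idx = bottom_fraction_init(pool_values, seed=1)
print(sorted(pool_values[i] for i in init_idx))  # every value lies in 0..19
```

Because every algorithm receives the same indices for a given seed, differences in later rounds reflect the acquisition strategy and not the starting points.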

Headline result: QM9-100k

Algorithm	\(t_{95}\) (rounds)	Seeds reaching 95%
GLOSS (4:2:2)	7.2	5 / 5
UCB-BO	16.6	3 / 5
BO(EI)	18.4	2 / 5
GA	20.0	0 / 5
Random	20.0	0 / 5

→ GLOSS reaches the 95% threshold 2.31× faster than UCB-BO (16.6 / 7.2) and 2.56× faster than BO(EI) (18.4 / 7.2).
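The speedup figures are ratios of the \(t_{95}\) values in the table. The sketch below recomputes them and shows one plausible way to obtain \(t_{95}\) from a best-so-far trajectory. It rests on assumptions not stated here: that \(t_{95}\) is the first round at which the best observed value reaches 95% of the pool optimum, that the optimum is positive, and that runs which never reach the threshold are counted at the 20-round cap (which would be consistent with the 20.0 entries for GA and Random). The helper name `t95` is hypothetical.

```python
def t95(best_so_far, y_opt, threshold=0.95, cap=20):
    """First 1-indexed round whose best-so-far value reaches threshold * y_opt.

    Returns `cap` when the threshold is never reached (assumed convention).
    Assumes y_opt > 0 and a maximization task.
    """
    for round_idx, best in enumerate(best_so_far, start=1):
        if best >= threshold * y_opt:
            return min(round_idx, cap)
    return cap


# Mean t95 values copied from the QM9-100k table.
mean_t95 = {"GLOSS": 7.2, "UCB-BO": 16.6, "BO(EI)": 18.4}

# Speedup of GLOSS over each baseline = baseline t95 / GLOSS t95.
speedup = {
    name: round(rounds / mean_t95["GLOSS"], 2)
    for name, rounds in mean_t95.items()
    if name != "GLOSS"
}
print(speedup)  # {'UCB-BO': 2.31, 'BO(EI)': 2.56}
```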

Scaling: GLOSS’s advantage grows with pool size

Sweeping QM9 pool size from \(n=5{,}000\) to \(n=100{,}000\):

GLOSS: \(t_{95}\) stays between 3.4 and 7.6 rounds across the entire sweep, with no systematic trend in \(n\).
BO(EI): \(t_{95}\) rises from 6.8 to 18.4 rounds as \(n\) grows.
UCB-BO: \(t_{95}\) rises from 5.2 to 16.6 rounds as \(n\) grows.
Random: never reaches 95% at any pool size.

Note

A larger pool means the bottom-20% initialization sits, on average, farther from the global optimum, so the initial surrogate has high uncertainty over a larger fraction of the space. Without a dedicated exploration stream, BO concentrates each batch near whichever locally promising region its few initial points are closest to. GLOSS’s Unexplored stream keeps injecting geometrically diverse points whose acquisition is independent of the current surrogate.
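To make "geometrically diverse points whose acquisition is independent of the current surrogate" concrete, here is a minimal sketch of one surrogate-free selection rule: greedy farthest-point (maximin) sampling. This is an illustrative stand-in and not necessarily how GLOSS's Unexplored stream is implemented. The function name `farthest_point_batch` and the toy pool are hypothetical.

```python
import math


def farthest_point_batch(pool, evaluated, k):
    """Greedily pick up to k unevaluated pool indices by maximin distance.

    Each pick maximizes the distance to its nearest evaluated or already
    chosen point. Only feature geometry is used; no surrogate predictions.
    """
    if not evaluated:
        raise ValueError("need at least one evaluated point")
    # min_dist[i]: distance from pool[i] to the nearest evaluated/chosen point.
    min_dist = [min(math.dist(p, pool[j]) for j in evaluated) for p in pool]
    taken = set(evaluated)
    chosen = []
    for _ in range(k):
        candidates = [i for i in range(len(pool)) if i not in taken]
        if not candidates:
            break
        best = max(candidates, key=lambda i: min_dist[i])
        chosen.append(best)
        taken.add(best)
        # The new pick shrinks the nearest-neighbor distance of nearby points.
        min_dist = [min(d, math.dist(p, pool[best])) for d, p in zip(min_dist, pool)]
    return chosen


# Toy 1-D pool (hypothetical): index 0 is already evaluated.
toy_pool = [(0.0,), (0.1,), (0.2,), (0.9,), (1.0,)]
picks = farthest_point_batch(toy_pool, evaluated=[0], k=2)
print(picks)  # [4, 2]
```

In the toy run the first pick jumps to the far end of the pool and the second fills the largest remaining gap, regardless of what any model predicts there. That is the property the note attributes to the Unexplored stream.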

Trajectory analysis (Arrhenius-2D, seed 61)

The Arrhenius surface has a primary peak at \((0.44, 0.45)\) with \(y_{\mathrm{opt}}=18.35\) and a wider secondary peak at \((0.83, 0.66)\) reaching \(\approx 74\%\) of the global value.

Algorithm	Best value (% of optimum) @ round	Behavior
GLOSS	100% @ 11	Probes the secondary-peak region first, then locks onto the primary peak.
UCB-BO	63% @ 17	Trapped at the secondary peak from round 1 onward.
BO(EI)	75% @ 17	Same secondary-peak trap, marginally wider exploration.
GA	100% @ 19	Tournament selection wanders broadly; transitions basins around round 15.

Reproducing

git clone https://github.com/zbc0315/gloss.git
cd gloss
pip install -e ".[all]"
python -m benchmarks.bench_main --study all

Individual studies:

python -m benchmarks.bench_main --study main         # 3 datasets × 5 algorithms
python -m benchmarks.bench_main --study scaling      # QM9 n=5k → 100k
python -m benchmarks.bench_main --study complexity   # Arrhenius C1 → C5
python -m benchmarks.bench_main --study pilot        # quick smoke (1 seed, 10 rounds)