These tasks are examples, not hard-coded limits of CliffSearch. The evolutionary loop is task-agnostic:
to use CliffSearch on a custom problem, you supply a benchmark that plugs into the search runtime
and exposes the artifact contract plus the primary metric to optimize.
Reported losses are validation-set losses, averaged over the benchmark seeds and hyperparameter settings
defined for each task. For the evolved transformer hyper-connections and optimizer tasks, these benchmark
hyperparameters are fixed across nodes rather than tuned per node, so the absolute losses are typically
higher than what per-node retuning would achieve. When some benchmark runs fail, the aggregate fills in
the missing runs by worst-successful-loss imputation. The benchmark is meant to support fair relative
comparison across nodes, not to reproduce individually tuned final training runs.
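The aggregation described above can be sketched as follows. This is a minimal illustration, not CliffSearch's actual code: the function name `aggregate_loss`, the use of `None` (or a non-finite value) to mark a failed run, the choice to take the worst loss among the same node's successful runs, and the fallback to infinity when every run fails are all assumptions made for the example.

```python
import math
from statistics import mean
from typing import Optional, Sequence


def _succeeded(loss: Optional[float]) -> bool:
    # Assumption: a failed run is reported as None or as a non-finite loss.
    return loss is not None and math.isfinite(loss)


def aggregate_loss(run_losses: Sequence[Optional[float]]) -> float:
    """Mean validation loss with worst-successful-loss imputation.

    Each failed run is replaced by the largest (worst) loss observed
    among the successful runs before averaging.
    """
    successful = [loss for loss in run_losses if _succeeded(loss)]
    if not successful:
        # Assumption: with no successful run there is nothing to impute from,
        # so the node receives the worst possible score.
        return math.inf
    worst = max(successful)
    imputed = [loss if _succeeded(loss) else worst for loss in run_losses]
    return mean(imputed)


# Two successful runs and one failure: the failure is scored as 2.0.
print(aggregate_loss([1.0, 2.0, None]))
```

Under this scheme a node is never rewarded for failing: a missing run counts as that node's worst observed result rather than being dropped from the average.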
Transformer Hyper-Connection Discovery
Evolve custom manifold hyper-connection modules inside a fixed NanoGPT transformer training stack with
4 hyper-connections on the Shakespeare dataset, evaluated with 3 benchmark random seeds. The website
includes analyzed runs for both the theory+code artifact mode and the matched
code+design-intent artifact mode. In config names the code+design-intent mode appears as
code_only: the formal theory_content is empty, but
summary_md still carries the node's design and ideation principles.
Optimizer Discovery
Evolve EvoOptimizer(torch.optim.Optimizer) while holding fixed a regular NanoGPT training stack
on the Shakespeare dataset, again evaluated with 3 benchmark random seeds.
Native Optimizer Discovery
Example ablation task: evolve EvoOptimizer(torch.optim.Optimizer) on small native
classification benchmarks with linear and MLP models; the reported validation loss is averaged over
32 benchmark runs per node across 4 datasets, 2 seeds, and a 2x2 learning-rate/weight-decay grid.
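The count of 32 runs per node follows from the full cross product of the stated factors: 4 datasets × 2 seeds × (2 × 2) learning-rate/weight-decay settings. The sketch below enumerates such a grid; the dataset names, seed values, and hyperparameter values are placeholders chosen for illustration, not the benchmark's actual settings.

```python
from itertools import product

# Placeholder values: only the counts (4, 2, 2, 2) come from the task description.
datasets = ["dataset_a", "dataset_b", "dataset_c", "dataset_d"]
seeds = [0, 1]
learning_rates = [1e-3, 1e-2]
weight_decays = [0.0, 1e-2]

# One benchmark run per (dataset, seed, learning rate, weight decay) combination.
runs = list(product(datasets, seeds, learning_rates, weight_decays))

print(len(runs))  # 4 * 2 * 2 * 2 = 32
```

The reported validation loss for a node is then the average over these 32 runs.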