Hyperparameters
All hyperparameters are stored on-chain in hparams.move and are governance-updatable. No node restart is required — nodes read hyperparameters from the chain at the start of each window.
Parameter Reference
| Parameter | Default | Description |
|---|---|---|
| window_duration_ms | 600,000 | Window length in milliseconds (10 minutes) |
| put_window_open_ms | 480,000 | Gradient upload deadline within a window (8 minutes) |
| topk_compression | 32 | Number of top-k DCT coefficients transmitted per gradient |
| top_g | 15 | Number of top-ranked peers selected for aggregation |
| openskill_beta | 25/6 ≈ 4.17 | OpenSkill performance variance parameter |
| openskill_tau | 25/300 ≈ 0.083 | OpenSkill drift per window (skill decay) |
| emission_per_window | 70,000,000,000 | Tokens emitted per window in base units (= 70 VRAM at 9 decimals) |
| checkpoint_frequency | 100 | Windows between checkpoint anchoring |
| min_miner_stake | 1,000,000,000 | Minimum stake to register as miner (1 SUI = 1e9 MIST) |
| min_validator_stake | 10,000,000,000 | Minimum stake to register as validator (10 SUI) |
| validator_offset | 2 | Number of windows a new validator must wait before evaluating |
| gauntlet_gamma | 0.99 | Decay factor applied to gauntlet scores each window |
| sync_threshold | 3 | Minimum gradient size ratio for a peer to pass the sync fast-eval check |
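
For quick sanity checks of the unit conversions in the table, the defaults can be mirrored off-chain. The sketch below is illustrative only: the `HParams` class and `BASE_UNITS_PER_TOKEN` constant are hypothetical names, and the on-chain representation of fractional values such as openskill_beta (for example, fixed-point encoding) is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HParams:
    """Hypothetical off-chain mirror of the documented defaults."""
    window_duration_ms: int = 600_000
    put_window_open_ms: int = 480_000
    topk_compression: int = 32
    top_g: int = 15
    openskill_beta: float = 25 / 6
    openskill_tau: float = 25 / 300
    emission_per_window: int = 70_000_000_000
    checkpoint_frequency: int = 100
    min_miner_stake: int = 1_000_000_000
    min_validator_stake: int = 10_000_000_000
    validator_offset: int = 2
    gauntlet_gamma: float = 0.99
    sync_threshold: int = 3


# Both VRAM and SUI (MIST) use 9 decimals according to the table.
BASE_UNITS_PER_TOKEN = 10**9

hp = HParams()
print(hp.window_duration_ms / 60_000)                 # window length in minutes
print(hp.put_window_open_ms / 60_000)                 # upload deadline in minutes
print(hp.emission_per_window / BASE_UNITS_PER_TOKEN)  # VRAM emitted per window
print(hp.min_validator_stake / BASE_UNITS_PER_TOKEN)  # validator stake in SUI
```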
Understanding Key Parameters
window_duration_ms and put_window_open_ms
The window is split into two phases:
t=0ms                       t=480,000ms          t=600,000ms
│                           │                    │
├── training phase ─────────┤── score phase ─────┤
│ (miners train + upload)   │ (validators eval)  │
Miners must upload their gradient before put_window_open_ms. After that deadline, validators begin downloading and evaluating. This separation prevents validators from racing to evaluate while miners are still uploading.
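
The phase split can be sketched as a small helper. This is a minimal illustration, not the node's actual code: the function name `window_phase` is hypothetical, and how a node determines `window_start_ms` is left as an input because it is not described here.

```python
def window_phase(now_ms: int, window_start_ms: int,
                 window_duration_ms: int = 600_000,
                 put_window_open_ms: int = 480_000) -> str:
    """Return "training" or "score" for a timestamp inside one window."""
    offset = now_ms - window_start_ms
    if not 0 <= offset < window_duration_ms:
        raise ValueError("timestamp is outside this window")
    # Uploads must land strictly before the deadline, so the deadline
    # itself already belongs to the score phase.
    return "training" if offset < put_window_open_ms else "score"


print(window_phase(now_ms=100_000, window_start_ms=0))  # inside the upload period
print(window_phase(now_ms=480_000, window_start_ms=0))  # at the deadline
```

With the defaults, miners have 8 minutes to train and upload, and validators have the remaining 2 minutes to evaluate.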
topk_compression
Controls the compression ratio. With topk_compression = 32, only 32 DCT coefficients are transmitted per gradient tensor. Higher values mean better gradient quality but larger uploads.
The effective compression ratio depends on model size. For a 1B-parameter model, topk_compression = 32 provides extremely aggressive compression (~99.997% reduction); validators tolerate some approximation error because the loss delta evaluation captures the net effect.
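
As a rough worked example of the arithmetic, the reduction for a single tensor can be estimated from its element count. The sketch counts only transmitted coefficient values and ignores any index or metadata overhead; the function name and the example tensor sizes are assumptions for illustration.

```python
def compression_reduction(elements_per_tensor: int, topk: int = 32) -> float:
    """Fraction of coefficients dropped when keeping topk out of a tensor."""
    kept = min(topk, elements_per_tensor)  # tiny tensors are sent whole
    return 1.0 - kept / elements_per_tensor


# Example tensor sizes (assumed, not taken from a specific model).
for n in (4_096, 1_048_576):
    print(n, f"{compression_reduction(n):.4%}")
```

Under this simplified accounting, a tensor of about one million elements gives a reduction close to the ~99.997% figure quoted above, while small tensors are compressed far less aggressively.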
top_g
Each window, only the top top_g miners (by OpenSkill ordinal) contribute their gradients to the aggregated checkpoint. This prevents low-quality gradients from polluting the shared model state.
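
A minimal sketch of the selection step follows. It assumes the conventional OpenSkill ordinal, `mu - 3 * sigma`; the function names, the rating dictionary layout, and the tie-breaking behaviour (input order) are illustrative assumptions rather than the protocol's confirmed implementation.

```python
from typing import Dict, List, Tuple


def ordinal(mu: float, sigma: float, z: float = 3.0) -> float:
    """Conservative skill estimate: mean minus z standard deviations."""
    return mu - z * sigma


def select_top_g(ratings: Dict[str, Tuple[float, float]],
                 top_g: int = 15) -> List[str]:
    """Return the ids of the top_g miners ranked by ordinal, best first."""
    ranked = sorted(ratings, key=lambda uid: ordinal(*ratings[uid]),
                    reverse=True)
    return ranked[:top_g]


# (mu, sigma) per miner; "c" has the highest mean but is very uncertain.
ratings = {"a": (30.0, 2.0), "b": (28.0, 1.0), "c": (35.0, 8.0)}
print(select_top_g(ratings, top_g=2))  # ['b', 'a']
```

Ranking by ordinal rather than by mean favours miners whose skill is both high and well established, so a new miner with a large sigma does not immediately displace a consistent one.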
openskill_beta and openskill_tau
- beta controls how much a single window's result can move a miner's skill estimate. Higher beta → more uncertainty → slower convergence to stable ratings.
- tau controls how quickly ratings drift toward the prior when a miner is inactive. Higher tau → ratings decay faster → more emphasis on recent windows.
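
To make the effect of tau concrete, the sketch below uses the additive dynamics commonly found in OpenSkill-style systems, where uncertainty is inflated as `sigma' = sqrt(sigma^2 + tau^2)`. Whether nodes apply this once per window to inactive miners is an assumption here, and the function names are hypothetical.

```python
import math


def inflate_sigma(sigma: float, tau: float = 25 / 300,
                  windows: int = 1) -> float:
    """Uncertainty after `windows` inflation steps with no new results."""
    # Repeating sqrt(sigma^2 + tau^2) n times equals sqrt(sigma^2 + n * tau^2).
    return math.sqrt(sigma**2 + windows * tau**2)


def ordinal(mu: float, sigma: float) -> float:
    """Conventional OpenSkill ordinal (assumed): mu - 3 * sigma."""
    return mu - 3 * sigma


mu, sigma = 30.0, 1.0
for n in (0, 10, 100):
    s = inflate_sigma(sigma, windows=n)
    print(n, round(s, 3), round(ordinal(mu, s), 3))
```

Under these assumptions, a larger tau makes sigma grow faster between results, which lowers the ordinal of idle miners and lets fresh results move ratings more.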
checkpoint_frequency
Checkpoints are anchored on-chain every checkpoint_frequency windows. In between, miners load the most recent available checkpoint. Increasing this reduces on-chain storage costs but increases the amount of training state that may be lost if an aggregator fails.
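
The trade-off can be quantified with a few lines of arithmetic. The sketch assumes that anchoring happens on windows whose index is a multiple of checkpoint_frequency; the actual trigger condition is not stated here, and the helper names are hypothetical.

```python
def is_checkpoint_window(window: int, checkpoint_frequency: int = 100) -> bool:
    """True if this window index would anchor a checkpoint (assumed rule)."""
    return window % checkpoint_frequency == 0


def windows_at_risk(window: int, checkpoint_frequency: int = 100) -> int:
    """Windows of progress since the last anchored checkpoint."""
    return window % checkpoint_frequency


def checkpoint_interval_hours(checkpoint_frequency: int = 100,
                              window_duration_ms: int = 600_000) -> float:
    """Wall-clock time between two anchored checkpoints, in hours."""
    return checkpoint_frequency * window_duration_ms / 3_600_000


print(is_checkpoint_window(300))    # True
print(windows_at_risk(349))         # 49
print(checkpoint_interval_hours())  # roughly 16.7 hours with the defaults
```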
Governance
Hyperparameters are updated via the update_hparams entry function in hparams.move, which requires the HparamsAdminCap. In production, governance will route this through a multisig or DAO structure.
Changes take effect at the next window boundary — nodes read fresh hyperparameters at the start of each window loop.
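
The window-boundary behaviour can be illustrated with a toy simulation. Everything below is hypothetical scaffolding: `chain` stands in for on-chain state, and `fetch_hparams` and `run_window` are placeholder names, not the node's actual API.

```python
# Stand-in for the on-chain hyperparameter object (assumed shape).
chain = {"top_g": 15}


def fetch_hparams() -> dict:
    """Return a snapshot of the current on-chain values."""
    return dict(chain)


def run_window(window: int) -> int:
    """Read hyperparameters once at window start and use them throughout."""
    hp = fetch_hparams()
    # ... train or evaluate for the rest of this window using hp ...
    return hp["top_g"]


before = run_window(1)
chain["top_g"] = 20  # stands in for a governance update between windows
after = run_window(2)
print(before, after)  # 15 20
```

Because each window works from its own snapshot, an update that lands mid-window does not affect work already in progress.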