Hyperparameters

All hyperparameters are stored on-chain in `hparams.move` and can be updated through governance. Nodes read them from the chain at the start of each window, so a change never requires a node restart.

Parameter Reference

Parameter	Default	Description
`window_duration_ms`	600,000	Window length in milliseconds (10 minutes)
`put_window_open_ms`	480,000	Gradient upload deadline within a window (8 minutes)
`topk_compression`	32	Number of top-k DCT coefficients transmitted per gradient tensor
`top_g`	15	Number of top-ranked peers selected for aggregation
`openskill_beta`	25/6 ≈ 4.17	OpenSkill performance variance parameter
`openskill_tau`	25/300 ≈ 0.083	OpenSkill drift per window (skill decay)
`emission_per_window`	70,000,000,000	Tokens emitted per window in base units (= 70 VRAM at 9 decimals)
`checkpoint_frequency`	100	Windows between checkpoint anchoring
`min_miner_stake`	1,000,000,000	Minimum stake to register as miner (1 SUI = 1e9 MIST)
`min_validator_stake`	10,000,000,000	Minimum stake to register as validator (10 SUI)
`validator_offset`	2	Number of windows a new validator must wait before evaluating
`gauntlet_gamma`	0.99	Decay factor applied to gauntlet scores each window
`sync_threshold`	3	Minimum gradient size ratio for a peer to pass the sync fast-eval check
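
The unit conversions in the table can be checked with a few lines of arithmetic. The sketch below copies the defaults from the table; the helper names are hypothetical, and the daily emission figure is derived here under the assumption that windows run back to back with unchanged defaults.

```python
# Defaults copied from the parameter table.
WINDOW_DURATION_MS = 600_000
EMISSION_PER_WINDOW = 70_000_000_000   # base units of the emitted token
MIN_MINER_STAKE = 1_000_000_000        # MIST
MIN_VALIDATOR_STAKE = 10_000_000_000   # MIST

# Both the emitted token and SUI use 9 decimals (1 SUI = 1e9 MIST).
BASE_UNITS_PER_TOKEN = 10**9
MS_PER_DAY = 24 * 60 * 60 * 1000


def to_whole_tokens(base_units: int) -> float:
    """Convert an on-chain base-unit amount to whole tokens."""
    return base_units / BASE_UNITS_PER_TOKEN


def windows_per_day(window_duration_ms: int) -> float:
    """Number of windows in 24 hours, assuming no gaps between windows."""
    return MS_PER_DAY / window_duration_ms


emission_per_window = to_whole_tokens(EMISSION_PER_WINDOW)
daily_emission = emission_per_window * windows_per_day(WINDOW_DURATION_MS)

print(f"emission per window: {emission_per_window} tokens")
print(f"windows per day:     {windows_per_day(WINDOW_DURATION_MS)}")
print(f"derived daily total: {daily_emission} tokens")
print(f"miner stake:         {to_whole_tokens(MIN_MINER_STAKE)} SUI")
print(f"validator stake:     {to_whole_tokens(MIN_VALIDATOR_STAKE)} SUI")
```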

Understanding Key Parameters

`window_duration_ms` and `put_window_open_ms`

The window is split into two phases:

t=0ms                    t=480,000ms         t=600,000ms
  │                           │                    │
  ├── training phase ─────────┤── score phase ─────┤
  │   (miners train + upload) │  (validators eval) │

Miners must upload their gradients before `put_window_open_ms` has elapsed in the window. After that deadline, validators begin downloading and evaluating. This separation keeps validators from racing to evaluate while miners are still uploading.
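
The two-phase split can be expressed as a small helper. This is a minimal sketch: the function name and phase labels are hypothetical, and it assumes the deadline is exclusive, so an upload at exactly `put_window_open_ms` is already too late.

```python
def window_phase(
    elapsed_ms: int,
    window_duration_ms: int = 600_000,
    put_window_open_ms: int = 480_000,
) -> str:
    """Return the phase for a time offset measured from the window start."""
    if not 0 <= elapsed_ms < window_duration_ms:
        raise ValueError("elapsed_ms must fall inside the window")
    # Assumption: the upload deadline itself belongs to the score phase.
    return "training" if elapsed_ms < put_window_open_ms else "score"


print(window_phase(0))        # start of the window
print(window_phase(479_999))  # last millisecond in which uploads count
print(window_phase(480_000))  # validators start evaluating
```

With the defaults, miners get the first 8 minutes of each window and validators the remaining 2.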

`topk_compression`

Controls the compression ratio. With `topk_compression = 32`, only 32 DCT coefficients are transmitted per gradient tensor. Higher values preserve more of the gradient but produce larger uploads.

The effective compression ratio depends on model size. For a 1B-parameter model, `topk_compression = 32` provides extremely aggressive compression (~99.997% reduction). Validators tolerate some approximation error because the loss delta evaluation captures the net effect.
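
To make the ratio concrete, the sketch below selects the top-k coefficients of one tensor and estimates the overall reduction. Several things here are assumptions rather than facts from the protocol: that "top-k" means largest by absolute value, that the model has roughly 1,000 gradient tensors (an illustrative count under which 32 coefficients per tensor matches the ~99.997% figure), and that the cost of sending coefficient indices is ignored.

```python
import heapq
from typing import List, Sequence, Tuple


def topk_by_magnitude(coeffs: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Keep the k coefficients with the largest absolute value.

    Returns (index, value) pairs sorted by index so the receiver can
    place each value back into an otherwise zero coefficient vector.
    """
    keep = heapq.nlargest(k, range(len(coeffs)), key=lambda i: abs(coeffs[i]))
    return sorted((i, coeffs[i]) for i in keep)


def reduction(total_params: int, num_tensors: int, topk: int) -> float:
    """Fraction of values NOT transmitted (index overhead ignored)."""
    transmitted = num_tensors * topk
    return 1 - transmitted / total_params


print(topk_by_magnitude([0.1, -5.0, 2.0, 0.0, 3.0], k=2))

# Illustrative assumption: 1B parameters spread over ~1,000 tensors.
r = reduction(total_params=1_000_000_000, num_tensors=1_000, topk=32)
print(f"{r * 100:.4f}% reduction")
```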

`top_g`

Each window, only the `top_g` highest-ranked miners (by OpenSkill ordinal) contribute their gradients to the aggregated checkpoint. This prevents low-quality gradients from polluting the shared model state.
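
A minimal sketch of this selection follows. It assumes the conventional OpenSkill ordinal, `mu - 3 * sigma`, and breaks ties by miner ID for determinism; the `Rating` type, the function names, and the tie-break rule are hypothetical and not taken from the protocol.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rating:
    mu: float     # estimated skill
    sigma: float  # uncertainty of the estimate


def ordinal(rating: Rating, z: float = 3.0) -> float:
    """Conservative skill estimate: mean minus z standard deviations."""
    return rating.mu - z * rating.sigma


def select_top_g(ratings: Dict[str, Rating], top_g: int = 15) -> List[str]:
    """Return up to top_g miner IDs, highest ordinal first."""
    ranked = sorted(ratings, key=lambda m: (-ordinal(ratings[m]), m))
    return ranked[:top_g]


ratings = {
    "miner_a": Rating(mu=30.0, sigma=2.0),  # ordinal 24
    "miner_b": Rating(mu=28.0, sigma=1.0),  # ordinal 25
    "miner_c": Rating(mu=35.0, sigma=8.0),  # ordinal 11
}
print(select_top_g(ratings, top_g=2))
```

Because the ordinal penalizes uncertainty, `miner_c` is excluded in this example despite having the highest mean.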

`openskill_beta` and `openskill_tau`

`openskill_beta` controls how much a single window's result can move a miner's skill estimate. Higher beta → more uncertainty → slower convergence to stable ratings.
`openskill_tau` controls how quickly ratings drift toward the prior when a miner is inactive. Higher tau → ratings decay faster → more emphasis on recent windows.
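
Both defaults line up with the conventional OpenSkill prior of `mu = 25`, `sigma = 25/3`: beta is half the prior sigma and tau is the prior mu divided by 300. The sketch below checks that arithmetic and shows the usual way tau is applied in OpenSkill implementations, by adding `tau**2` to a rating's variance before an update. Whether this protocol applies tau in exactly that way, or also to inactive miners, is an assumption.

```python
import math

# Conventional OpenSkill prior (assumed, not stated in the parameter table).
PRIOR_MU = 25.0
PRIOR_SIGMA = PRIOR_MU / 3

OPENSKILL_BETA = 25 / 6    # default from the table
OPENSKILL_TAU = 25 / 300   # default from the table


def inflate_sigma(sigma: float, tau: float = OPENSKILL_TAU) -> float:
    """Add tau**2 to the variance so the rating never becomes fully rigid."""
    return math.sqrt(sigma**2 + tau**2)


print(math.isclose(OPENSKILL_BETA, PRIOR_SIGMA / 2))
print(math.isclose(OPENSKILL_TAU, PRIOR_MU / 300))

# Repeated inflation without any new results slowly widens uncertainty.
sigma = 1.0
for _ in range(100):
    sigma = inflate_sigma(sigma)
print(f"sigma after 100 windows without updates: {sigma:.4f}")
```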

`checkpoint_frequency`

Checkpoints are anchored on-chain every `checkpoint_frequency` windows. In between, miners load the most recent available checkpoint. Increasing this value reduces on-chain storage costs but increases the amount of training state that may be lost if an aggregator fails.
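
The trade-off can be illustrated with a short sketch. It assumes windows are numbered from 0 and that anchoring happens on exact multiples of `checkpoint_frequency`; both the numbering scheme and the function names are hypothetical.

```python
def is_checkpoint_window(window: int, checkpoint_frequency: int = 100) -> bool:
    """True if a checkpoint would be anchored at this window."""
    return window % checkpoint_frequency == 0


def latest_checkpoint_window(window: int, checkpoint_frequency: int = 100) -> int:
    """Most recent window at or before `window` with an anchored checkpoint."""
    return (window // checkpoint_frequency) * checkpoint_frequency


def windows_at_risk(window: int, checkpoint_frequency: int = 100) -> int:
    """Windows of training not yet covered by an anchored checkpoint."""
    return window - latest_checkpoint_window(window, checkpoint_frequency)


print(is_checkpoint_window(200))
print(latest_checkpoint_window(250))
print(windows_at_risk(299))        # worst case with the default frequency
print(windows_at_risk(299, 50))    # a lower frequency narrows the exposure
```

With the default of 100 and 10-minute windows, the worst case is 99 windows, or about 16.5 hours of training since the last anchor.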

Governance

Hyperparameters are updated via the `update_hparams` entry function in `hparams.move`, which requires the `HparamsAdminCap`. In production, governance will route this through a multisig or DAO structure.

Changes take effect at the next window boundary, because nodes read fresh hyperparameters at the start of each window loop.
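
The timing of an update can be simulated with an in-memory stand-in for the chain. Everything in this sketch is hypothetical: `FakeChain`, `read_hparams`, and `run_windows` are illustrative names, not part of any node or Sui API. It only demonstrates that a node holding a per-window snapshot picks up a change at the next boundary.

```python
from typing import Dict, List


class FakeChain:
    """In-memory stand-in for the on-chain hyperparameter object."""

    def __init__(self, hparams: Dict[str, int]) -> None:
        self._hparams = dict(hparams)

    def read_hparams(self) -> Dict[str, int]:
        # Return a copy so callers hold an independent snapshot.
        return dict(self._hparams)

    def update_hparams(self, **changes: int) -> None:
        self._hparams.update(changes)


def run_windows(
    chain: FakeChain,
    num_windows: int,
    updates_during: Dict[int, Dict[str, int]],
) -> List[int]:
    """Return the top_g value each window actually used.

    updates_during maps a window index to changes applied mid-window.
    """
    used = []
    for window in range(num_windows):
        hparams = chain.read_hparams()  # read once, at the window start
        if window in updates_during:
            chain.update_hparams(**updates_during[window])  # mid-window change
        used.append(hparams["top_g"])   # the window keeps its snapshot
    return used


chain = FakeChain({"top_g": 15})
print(run_windows(chain, 3, {0: {"top_g": 20}}))
```

The update lands during window 0, so window 0 still runs with `top_g = 15` and windows 1 and 2 run with 20.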
