
API Reference: optim

grilly.optim provides GPU-accelerated optimizers and learning rate schedulers.

Source: optim/


Base Class

| Class | Description |
| --- | --- |
| Optimizer | Base optimizer. Provides `zero_grad()`, `step()`, `state_dict()`, `load_state_dict()`. |
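The base class defines a PyTorch-style contract. As an illustrative, self-contained sketch of that interface (not grilly source; `ToySGD` and the parameter objects are stand-ins assumed to carry `.data` and `.grad` arrays):

```python
import numpy as np
from types import SimpleNamespace

class ToySGD:
    """Stand-in showing the Optimizer interface:
    zero_grad / step / state_dict / load_state_dict."""

    def __init__(self, params, lr):
        self.params = list(params)  # objects with .data and .grad arrays
        self.lr = lr

    def zero_grad(self):
        # Reset accumulated gradients before the next backward pass.
        for p in self.params:
            p.grad = np.zeros_like(p.data)

    def step(self):
        # Plain gradient-descent update: theta <- theta - lr * grad.
        for p in self.params:
            p.data -= self.lr * p.grad

    def state_dict(self):
        return {"lr": self.lr}

    def load_state_dict(self, state):
        self.lr = state["lr"]

p = SimpleNamespace(data=np.array([1.0, 2.0]), grad=np.array([0.5, 0.5]))
opt = ToySGD([p], lr=0.1)
opt.step()       # p.data is now [0.95, 1.95]
opt.zero_grad()  # p.grad is now [0.0, 0.0]
```

The real optimizers below follow this same call pattern; only the logic inside `step()` differs.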

Optimizers

| Class | Signature | Description |
| --- | --- | --- |
| Adam | `Adam(params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8)` | Adam optimizer. |
| AdamW | `AdamW(params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01)` | AdamW with decoupled weight decay. |
| SGD | `SGD(params, lr, momentum=0.0, nesterov=False)` | Stochastic gradient descent. |
| NLMS | `NLMS(params, lr=0.01, mu=0.1)` | Normalized least mean squares. |
| NaturalGradient | `NaturalGradient(params, lr=0.01)` | Natural gradient descent. |
| HypergradientAdamW | `HypergradientAdamW(params, lr=1e-3, hyper_lr=1e-7)` | AdamW with hypergradient LR tuning. |
| AutoHypergradientAdamW | `AutoHypergradientAdamW(params, lr=1e-3)` | OSGM-style auto LR with surprise signal. |
| AffectAdam | `AffectAdam(params, lr=1e-3)` | Adam with affect-modulated updates. |
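For reference, the update Adam computes is the standard rule from Kingma & Ba (2015). A self-contained NumPy sketch of a single step (function name and layout are illustrative, not grilly internals):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One textbook Adam update at timestep t (1-indexed)."""
    b1, b2 = betas
    m = b1 * m + (1 - b1) * grad       # first-moment (mean) EMA
    v = b2 * v + (1 - b2) * grad ** 2  # second-moment (uncentered variance) EMA
    m_hat = m / (1 - b1 ** t)          # bias correction for zero init
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# At t=1 the bias-corrected step is ~lr in the gradient direction:
theta, m, v = adam_step(np.array([1.0]), np.array([1.0]), 0.0, 0.0, t=1)
```

AdamW differs only in applying `weight_decay` directly to `theta` (decoupled from the gradient) rather than folding it into `grad`.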

Learning Rate Schedulers

All schedulers accept an optimizer and modify its learning rate via scheduler.step().

| Class | Signature | Description |
| --- | --- | --- |
| StepLR | `StepLR(optimizer, step_size, gamma=0.1)` | Decay LR by `gamma` every `step_size` epochs. |
| CosineAnnealingLR | `CosineAnnealingLR(optimizer, T_max, eta_min=0)` | Cosine annealing to `eta_min`. |
| ReduceLROnPlateau | `ReduceLROnPlateau(optimizer, patience=10, factor=0.1)` | Reduce LR when metric plateaus. Call `step(metric)`. |
| OneCycleLR | `OneCycleLR(optimizer, max_lr, total_steps)` | Super-convergence 1cycle policy. |
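As a concrete example of what a scheduler computes, here is the standard cosine-annealing formula in a minimal self-contained sketch (assuming the usual definition; this is not grilly's implementation, which applies the value to the wrapped optimizer's LR on each `step()`):

```python
import math

def cosine_annealing_lr(epoch, base_lr, T_max, eta_min=0.0):
    """LR at a given epoch under cosine annealing:
    eta_min + (base_lr - eta_min) * (1 + cos(pi * epoch / T_max)) / 2."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / T_max)) / 2

lr_start = cosine_annealing_lr(0, 1e-3, T_max=100)    # 1e-3 (full LR)
lr_mid   = cosine_annealing_lr(50, 1e-3, T_max=100)   # ~5e-4 (halfway)
lr_end   = cosine_annealing_lr(100, 1e-3, T_max=100)  # ~0.0 (eta_min)
```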

Usage Pattern

```python
import numpy as np

from grilly.optim import AdamW, CosineAnnealingLR

optimizer = AdamW(model.parameters(), lr=1e-3)
scheduler = CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    for x, y in loader:
        logits = model(x)
        loss = loss_fn(logits, y)
        grad = loss_fn.backward(np.ones_like(loss), logits, y)
        model.zero_grad()
        model.backward(grad)
        optimizer.step()
    scheduler.step()
```

GPU-Accelerated Updates

Adam and AdamW updates are implemented as Vulkan compute shaders (adam-update.glsl, adamw-update.glsl). When the C++ backend is available, parameter updates run on GPU without downloading gradients to CPU.