Grilly
Deep learning, well done.
GPU-accelerated neural network framework using Vulkan compute shaders. PyTorch-like API that runs on any GPU -- AMD, NVIDIA, Intel -- no CUDA dependency.
Why Grilly?
CUDA locks you into NVIDIA hardware. Grilly uses Vulkan compute shaders, which run on every major GPU vendor. Same training code, any GPU.
| | PyTorch (CUDA) | Grilly (Vulkan) |
|---|---|---|
| AMD GPUs | ROCm (Linux data center only) | Full support |
| NVIDIA GPUs | Full support | Full support |
| Intel Arc | No | Full support |
| Windows consumer GPUs | NVIDIA only | All vendors |
At a Glance
- 194 GLSL compute shaders compiled to SPIR-V
- 1,820 tests passing
- C++ backend (`grilly_core`) with pybind11 bindings -- zero CPU-GPU ping-pong
- PyTorch-like API: `nn.Module`, `nn.Linear`, `F.relu`, `AdamW`
- Autograd engine with `Variable`, reverse-mode autodiff, full operator overloading
- JIT compilation via `@grilly.jit` for fused GPU dispatch
- AMP with `autocast()` and `GradScaler`
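For readers new to the idea, here is a toy sketch of reverse-mode autodiff with operator overloading. It is plain Python and does not use Grilly; the `ToyVar` class is hypothetical and is not Grilly's `Variable`, whose actual interface is not shown here.

```python
class ToyVar:
    """Scalar node that remembers how it was computed (illustration only)."""

    def __init__(self, value, parents=()):
        self.value = float(value)
        self.grad = 0.0
        # Each entry is (parent_node, local derivative d(self)/d(parent)).
        self.parents = parents

    def __add__(self, other):
        other = other if isinstance(other, ToyVar) else ToyVar(other)
        return ToyVar(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, ToyVar) else ToyVar(other)
        return ToyVar(
            self.value * other.value,
            ((self, other.value), (other, self.value)),
        )

    # Addition and multiplication commute, so the reflected forms can reuse them.
    __radd__ = __add__
    __rmul__ = __mul__

    def backward(self):
        # Build a topological order, then push gradients from output to inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad  # chain rule


x = ToyVar(3.0)
y = ToyVar(4.0)
z = x * y + x * 2.0  # z = x*y + 2x
z.backward()
print(z.value, x.grad, y.grad)  # dz/dx = y + 2, dz/dy = x
```

A real engine applies the same recorded-graph idea to tensors and dispatches the per-op derivatives to the GPU rather than evaluating them in Python.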
Quick Install
pip install grilly
For GPU acceleration, build the C++ backend:
git clone https://github.com/grillcheese-ai/grilly.git
cd grilly
pip install -e ".[dev]"
cmake -B build -DPYBIND11_FINDPYTHON=ON
cmake --build build --config Release
See Installation for full details.
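After building, one quick way to see whether the compiled extension is visible to Python is to look it up by name. This is a sketch that assumes the backend is importable as a top-level module called `grilly_core`; the actual import path may differ, and `has_native_backend` is a hypothetical helper, not part of Grilly.

```python
import importlib.util


def has_native_backend(module_name: str = "grilly_core") -> bool:
    """Return True if a top-level module with this name can be located."""
    # find_spec returns None when no top-level module of that name is found.
    return importlib.util.find_spec(module_name) is not None


print("native backend found:", has_native_backend())
```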
Quick Example
import numpy as np
from grilly import nn
from grilly.optim import AdamW

# Two-layer MLP: 784 inputs -> 256 hidden units -> 10 classes
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
optimizer = AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Random batch of 32 samples with integer class labels
x = np.random.randn(32, 784).astype(np.float32)
targets = np.random.randint(0, 10, (32,))

# Forward pass
logits = model(x)
loss = loss_fn(logits, targets)

# Backward pass: gradient of the loss w.r.t. the logits, then through the model
grad = loss_fn.backward(np.ones_like(loss), logits, targets)
model.zero_grad()
model.backward(grad)

# Parameter update
optimizer.step()
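To make the explicit forward/backward flow concrete, the same training step can be written out by hand in NumPy. This is an illustrative sketch only: it runs on the CPU, does not call Grilly, uses plain SGD in place of AdamW for brevity, and the initialisation scale and learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameters of a 784 -> 256 -> 10 MLP (init scale 0.01 is an assumption).
params = {
    "W1": rng.standard_normal((784, 256)).astype(np.float32) * 0.01,
    "b1": np.zeros(256, dtype=np.float32),
    "W2": rng.standard_normal((256, 10)).astype(np.float32) * 0.01,
    "b2": np.zeros(10, dtype=np.float32),
}

x = rng.standard_normal((32, 784)).astype(np.float32)
targets = rng.integers(0, 10, size=32)


def forward(p, x):
    h_pre = x @ p["W1"] + p["b1"]
    h = np.maximum(h_pre, 0.0)  # ReLU
    logits = h @ p["W2"] + p["b2"]
    return h_pre, h, logits


def cross_entropy(logits, targets):
    # Numerically stable softmax, then mean negative log-likelihood.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(len(targets)), targets]).mean()
    return loss, probs


def backward(p, x, h_pre, h, probs, targets):
    n = len(targets)
    d_logits = probs.copy()
    d_logits[np.arange(n), targets] -= 1.0  # d(loss)/d(logits) for softmax + NLL
    d_logits /= n
    d_h = d_logits @ p["W2"].T
    d_h_pre = d_h * (h_pre > 0)  # ReLU passes gradient only where input > 0
    return {
        "W2": h.T @ d_logits,
        "b2": d_logits.sum(axis=0),
        "W1": x.T @ d_h_pre,
        "b1": d_h_pre.sum(axis=0),
    }


def sgd_step(p, grads, lr=0.1):
    # Plain gradient descent, standing in for AdamW.
    for name in p:
        p[name] -= lr * grads[name]


h_pre, h, logits = forward(params, x)
loss_before, probs = cross_entropy(logits, targets)
grads = backward(params, x, h_pre, h, probs, targets)
sgd_step(params, grads)
loss_after, _ = cross_entropy(forward(params, x)[2], targets)
print(f"loss before: {loss_before:.4f}, after one step: {loss_after:.4f}")
```

The stages line up with the calls in the example: the forward pass, the loss, the gradient of the loss with respect to the logits, propagation back through the layers, and the parameter update.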
Architecture
Python (nn.Module)  -->  C++ Bridge (grilly_core)  -->  Vulkan Compute Shaders
  nn/ modules              pybind11 bindings              194 SPIR-V shaders
  functional/ ops          VMA persistent mapping         AMD / NVIDIA / Intel
  optim/                   BufferPool allocation          No CUDA needed
grilly/
├── backend/ # Vulkan GPU dispatch, autograd core, JIT, AMP
├── cpp/ # C++ pybind11 extension (grilly_core)
├── nn/ # nn.Module layers, SNN, multimodal, LoRA, autograd
├── functional/ # Stateless F.* API (mirrors torch.nn.functional)
├── optim/ # Optimizers and LR schedulers
├── utils/ # DataLoader, VulkanTensor, HuggingFaceBridge
├── shaders/ # 194 GLSL compute shaders + compiled SPIR-V
├── experimental/ # VSA, MoE routing, temporal reasoning
└── tests/ # 1,820 tests
Ecosystem
| Package | Description |
|---|---|
| optimum-grilly | HuggingFace Optimum backend -- from_pretrained to Vulkan inference |
| CubeMind | Neuro-vector-symbolic reasoning powered by grilly |
License
MIT -- see LICENSE.