Grilly

Deep learning, well done.

GPU-accelerated neural network framework using Vulkan compute shaders. PyTorch-like API that runs on any GPU -- AMD, NVIDIA, Intel -- no CUDA dependency.


Why Grilly?

CUDA locks you into NVIDIA hardware. Grilly uses Vulkan compute shaders, which run on every major GPU vendor. Same training code, any GPU.

                         PyTorch (CUDA)                  Grilly (Vulkan)
AMD GPUs                 ROCm (Linux data center only)   Full support
NVIDIA GPUs              Full support                    Full support
Intel Arc                Not supported                   Full support
Windows consumer GPUs    NVIDIA only                     All vendors

At a Glance

  • 194 GLSL compute shaders compiled to SPIR-V
  • 1,820 tests passing
  • C++ backend (grilly_core) with pybind11 bindings -- zero CPU-GPU ping-pong
  • PyTorch-like API: nn.Module, nn.Linear, F.relu, AdamW
  • Autograd engine with Variable, reverse-mode autodiff, full operator overloading
  • JIT compilation via @grilly.jit for fused GPU dispatch
  • AMP with autocast() and GradScaler
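As a rough illustration of the reverse-mode design listed above, here is a minimal NumPy-only sketch of a Variable with operator overloading and a backward pass. This is a hypothetical simplification for intuition only; Grilly's actual autograd engine is GPU-backed and its internals may differ.

```python
import numpy as np

class Variable:
    """Minimal reverse-mode autodiff node (illustration only)."""
    def __init__(self, data, _parents=()):
        self.data = np.asarray(data, dtype=np.float32)
        self.grad = np.zeros_like(self.data)
        self._parents = _parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Variable(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Variable(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topological sort, then apply the chain rule in reverse order
        topo, seen = [], set()
        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                topo.append(v)
        visit(self)
        self.grad = np.ones_like(self.data)
        for v in reversed(topo):
            v._backward()

x = Variable(2.0)
y = Variable(3.0)
z = x * y + x          # dz/dx = y + 1 = 4, dz/dy = x = 2
z.backward()
print(x.grad, y.grad)  # 4.0 2.0
```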

Quick Install

pip install grilly

For GPU acceleration, build the C++ backend:

git clone https://github.com/grillcheese-ai/grilly.git
cd grilly
pip install -e ".[dev]"
cmake -B build -DPYBIND11_FINDPYTHON=ON
cmake --build build --config Release

See Installation for full details.

Quick Example

import numpy as np
from grilly import nn
from grilly.optim import AdamW

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

optimizer = AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = np.random.randn(32, 784).astype(np.float32)  # batch of 32 inputs
targets = np.random.randint(0, 10, (32,))        # integer class labels

logits = model(x)
loss = loss_fn(logits, targets)
# Gradient of the loss w.r.t. the logits, seeded with ones
grad = loss_fn.backward(np.ones_like(loss), logits, targets)

model.zero_grad()
model.backward(grad)  # propagate the gradient through the network
optimizer.step()      # AdamW parameter update
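The grad passed to model.backward above is the gradient of the loss with respect to the logits. For softmax cross-entropy that gradient has a simple closed form, (softmax - one_hot) / batch_size. A plain-NumPy reference, independent of Grilly (and assuming its CrossEntropyLoss uses the standard mean reduction):

```python
import numpy as np

def softmax_xent_grad(logits, targets):
    """d(mean cross-entropy)/d(logits) = (softmax - one_hot) / batch_size."""
    z = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)    # softmax probabilities
    p[np.arange(len(targets)), targets] -= 1.0              # subtract one-hot labels
    return p / len(targets)

logits = np.random.randn(32, 10).astype(np.float32)
targets = np.random.randint(0, 10, (32,))
grad = softmax_xent_grad(logits, targets)
print(grad.shape)  # (32, 10)
```

Each row of the result sums to zero, since softmax probabilities and the one-hot label both sum to one.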

Architecture

Python (nn.Module)  -->  C++ Bridge (grilly_core)  -->  Vulkan Compute Shaders
  nn/ modules              pybind11 bindings             194 SPIR-V shaders
  functional/ ops          VMA persistent mapping        AMD / NVIDIA / Intel
  optim/                   BufferPool allocation         No CUDA needed

grilly/
├── backend/        # Vulkan GPU dispatch, autograd core, JIT, AMP
├── cpp/            # C++ pybind11 extension (grilly_core)
├── nn/             # nn.Module layers, SNN, multimodal, LoRA, autograd
├── functional/     # Stateless F.* API (mirrors torch.nn.functional)
├── optim/          # Optimizers and LR schedulers
├── utils/          # DataLoader, VulkanTensor, HuggingFaceBridge
├── shaders/        # 194 GLSL compute shaders + compiled SPIR-V
├── experimental/   # VSA, MoE routing, temporal reasoning
└── tests/          # 1,820 tests

Ecosystem

Package          Description
optimum-grilly   HuggingFace Optimum backend -- from_pretrained to Vulkan inference
CubeMind         Neuro-vector-symbolic reasoning powered by grilly

License

MIT -- see LICENSE.