BitCore: Quantization-Aware Training Toolkit
Training and deploying ternary-quantized neural networks typically requires juggling two very different codepaths: a gradient-aware training path that simulates low-bit arithmetic, and a deployment path that actually runs it. BitCore bridges the two with a single BitLinear module that works as a drop-in replacement for nn.Linear—train with quantization-aware gradients, then flip to optimized inference with one call.
View the BitCore GitHub Repository
The Problem
Ternary quantization constrains neural network weights to {-1, 0, +1}, enabling massive compression and fast inference via bitwise operations. But standard PyTorch nn.Linear layers have no concept of ternary weights. Researchers typically implement custom forward passes with straight-through estimators for training, then manually rewrite inference logic for deployment—an error-prone process that makes it hard to iterate quickly and deploy reliably.
How BitCore Works
BitCore provides a unified BitLinear layer that handles both phases:
Training Mode (default) The layer stores full-precision weights internally but quantizes them on every forward pass using a chosen quantization scheme. Gradients flow through via straight-through estimators, so standard PyTorch optimizers (Adam, SGD, etc.) work out of the box.
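The straight-through pattern is easy to state in code. Below is a minimal sketch of the general technique, assuming a BitNet-style mean-absolute-value scale; BitCore's internal implementation may differ in detail:

```python
import torch

def ste_quantize(w: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Straight-through ternary quantization (generic sketch, not BitCore's
    exact internals): forward sees quantized weights, backward sees identity."""
    scale = w.abs().mean().clamp(min=eps)            # per-tensor scale
    w_q = (w / scale).round().clamp(-1, 1) * scale   # ternary {-1, 0, +1} times scale
    # (w_q - w).detach() carries no gradient, so dL/dw passes straight through
    # to the full-precision weights.
    return w + (w_q - w).detach()
```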
Deployment Mode (layer.deploy()) A single method call freezes the ternary weights into their packed representation and routes inference through BitOps for hardware-accelerated matrix multiplication. No code changes to the rest of your model or pipeline.
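Each ternary weight needs only two bits, so four weights fit in one byte. The actual packed layout is defined by BitOps and may differ, but a hypothetical packing routine illustrates the idea:

```python
import torch

def pack_ternary(w_ternary: torch.Tensor) -> torch.Tensor:
    """Pack a tensor of {-1, 0, +1} values into uint8, four weights per byte.
    Illustrative only; the real layout is whatever BitOps' kernels expect."""
    codes = w_ternary.flatten().to(torch.int64) + 1   # map {-1, 0, +1} -> {0, 1, 2}
    pad = (-codes.numel()) % 4                        # pad to a multiple of 4
    codes = torch.cat([codes, codes.new_zeros(pad)]).view(-1, 4)
    packed = codes[:, 0] | (codes[:, 1] << 2) | (codes[:, 2] << 4) | (codes[:, 3] << 6)
    return packed.to(torch.uint8)                     # 4x smaller than int8, 16x vs fp32
```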
Quantization Schemes
BitCore ships with three quantizers, selectable via a string argument:
| Quantizer | Key Idea |
|---|---|
| BitNet | Scales weights by their mean absolute value, then rounds to {-1, 0, +1} (default) |
| TWN (Ternary Weight Networks) | Applies an explicit zero threshold, producing ternary {-1, 0, +1} weights with a fitted scale |
| ParetoQ | Pareto-optimal quantization scheme balancing accuracy and compression |
```python
from bitcore import BitLinear

layer_bitnet = BitLinear(128, 64, quant_type="bitnet")
layer_twn = BitLinear(128, 64, quant_type="twn")
layer_paretoq = BitLinear(128, 64, quant_type="paretoq")
```
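For intuition on the TWN row above, here is the thresholding rule from the original Ternary Weight Networks paper (Li & Liu, 2016); BitCore's twn quantizer may differ in detail:

```python
import torch

def twn_quantize(w: torch.Tensor) -> torch.Tensor:
    """TWN rule: zero out small weights, fit one scale to the survivors."""
    delta = 0.7 * w.abs().mean()                 # zero threshold from the paper
    mask = w.abs() > delta                       # weights that stay nonzero
    alpha = w[mask].abs().mean() if mask.any() else w.new_tensor(0.0)
    return torch.sign(w) * mask * alpha          # values in {-alpha, 0, +alpha}
```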
Drop-In Model Conversion
Existing models with standard nn.Linear layers can be converted in place—no architectural changes required:
```python
from bitcore import BitLinear
import torch.nn as nn

# Convert a single layer
linear = nn.Linear(128, 64)
bitlinear = BitLinear.from_linear(linear, quant_type="bitnet")

# Convert an entire model recursively
def convert_to_bitlinear(model):
    for name, module in model.named_children():
        if isinstance(module, nn.Linear):
            setattr(model, name, BitLinear.from_linear(module))
        else:
            convert_to_bitlinear(module)
    return model
```
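For instance, a hypothetical small classifier converts in one call; shapes and the forward pass are unchanged:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model = convert_to_bitlinear(model)   # every nn.Linear is now a BitLinear

x = torch.randn(8, 784)
print(model(x).shape)  # torch.Size([8, 10])
```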
This is particularly useful for taking a pretrained full-precision model and fine-tuning it with ternary quantization via knowledge distillation—exactly the workflow used in TAT-VPR and TeTRA-VPR.
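Here is a hedged sketch of that distillation step, assuming model is a pretrained full-precision network and dataloader yields (inputs, labels); the temperature and learning rate are illustrative choices, not values from TAT-VPR or TeTRA-VPR:

```python
import copy
import torch
import torch.nn.functional as F
import torch.optim as optim

teacher = copy.deepcopy(model).eval()     # frozen full-precision teacher
student = convert_to_bitlinear(model)     # in-place conversion -> ternary student
optimizer = optim.Adam(student.parameters(), lr=1e-4)

T = 2.0  # distillation temperature (illustrative)
for x, _ in dataloader:
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    # KL divergence between temperature-softened teacher and student outputs
    loss = F.kl_div(
        F.log_softmax(s_logits / T, dim=-1),
        F.softmax(t_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```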
Training Example
BitLinear layers are fully compatible with standard PyTorch training loops:
```python
import torch
import torch.nn as nn
import torch.optim as optim
from bitcore import BitLinear

model = nn.Sequential(
    BitLinear(784, 256, quant_type="bitnet"),
    nn.ReLU(),
    BitLinear(256, 10, quant_type="bitnet"),
)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())

model.train()
for batch in dataloader:
    x, y = batch
    optimizer.zero_grad()
    output = model(x)
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()
```
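The loop assumes an existing dataloader; for a quick smoke test, a random stand-in with MNIST-like shapes works:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 512 random samples shaped like flattened 28x28 images, 10 classes
dataset = TensorDataset(torch.randn(512, 784), torch.randint(0, 10, (512,)))
dataloader = DataLoader(dataset, batch_size=64, shuffle=True)
```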
Deployment
Switching to deployment mode packs the ternary weights and routes computation through BitOps:
```python
model.eval()
for module in model.modules():
    if isinstance(module, BitLinear):
        module.deploy()

with torch.no_grad():
    output = model(x)  # Now uses optimized BitOps kernels
```
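Deployment should be numerically transparent, since the packed kernel computes the same ternary matmul the training path simulated. Starting from a not-yet-deployed model, one way to sanity-check this (the tolerance here is a guess, not a documented guarantee):

```python
with torch.no_grad():
    before = model(x)                  # pre-deploy: quantization simulated in float
for module in model.modules():
    if isinstance(module, BitLinear):
        module.deploy()
with torch.no_grad():
    after = model(x)                   # post-deploy: packed weights + BitOps kernels
assert torch.allclose(before, after, atol=1e-4)
```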
API Reference
BitLinear
```python
BitLinear(
    in_features: int,
    out_features: int,
    bias: bool = True,
    eps: float = 1e-6,
    quant_type: str = "bitnet",
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| in_features | int | — | Number of input features |
| out_features | int | — | Number of output features |
| bias | bool | True | Include a bias term |
| eps | float | 1e-6 | Epsilon for numerical stability |
| quant_type | str | "bitnet" | Quantization scheme: "bitnet", "twn", or "paretoq" |
Methods:
| Method | Description |
|---|---|
| forward(x) | Forward pass with quantization-aware gradients (training) or optimized inference (deployed) |
| deploy() | Switch to deployment mode: packs weights and enables BitOps acceleration |
| from_linear(linear, quant_type) | Class method to convert an existing nn.Linear to a BitLinear |
The BitOps Stack
BitCore sits in the middle of a three-layer stack for ternary neural networks:
- Research models (TAT-VPR, TeTRA-VPR) define the architectures and training recipes.
- BitCore provides the quantization-aware BitLinear layer for training and seamless deployment switching.
- BitOps supplies the low-level, hardware-optimized ternary matrix multiplication kernels that BitCore calls in deployment mode.
Together, this stack takes a ternary quantization idea from research prototype to efficient on-device inference with minimal friction.
Requirements
- Python 3.9+
- PyTorch 2.0.0+
- (Optional) BitOps for deployment-mode acceleration
License
MIT License
