BitCore: Quantization-Aware Training Toolkit

Training and deploying ternary-quantized neural networks typically requires juggling two very different codepaths: a gradient-aware training path that simulates low-bit arithmetic, and a deployment path that actually runs it. BitCore bridges the two with a single BitLinear module that works as a drop-in replacement for nn.Linear—train with quantization-aware gradients, then flip to optimized inference with one call.

View the BitCore GitHub Repository

The Problem

Ternary quantization constrains neural network weights to {-1, 0, +1}, enabling massive compression and fast inference via bitwise operations. But standard PyTorch nn.Linear layers have no concept of ternary weights. Researchers typically implement custom forward passes with straight-through estimators for training, then manually rewrite inference logic for deployment—an error-prone process that makes it hard to iterate quickly and deploy reliably.

How BitCore Works

BitCore provides a unified BitLinear layer that handles both phases:

  1. Training Mode (default). The layer stores full-precision weights internally but quantizes them on every forward pass using the chosen quantization scheme. Gradients flow through via a straight-through estimator, so standard PyTorch optimizers (Adam, SGD, etc.) work out of the box (see the sketch after this list).

  2. Deployment Mode (layer.deploy()). A single method call freezes the ternary weights into a packed representation and routes inference through BitOps for hardware-accelerated matrix multiplication. No code changes are needed in the rest of your model or pipeline.
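The straight-through estimator behind training mode can be summarized in a few lines. The sketch below is illustrative only; the function name ste_quantize and the absmean-style scaling are assumptions for exposition, not BitCore's internal API:

import torch

def ste_quantize(w: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Scale weights by their mean absolute value, then round and clip
    # to the ternary set {-1, 0, +1} (illustrative absmean scheme).
    scale = w.abs().mean().clamp(min=eps)
    w_q = (w / scale).round().clamp(-1, 1) * scale
    # Straight-through estimator: the forward pass uses the quantized
    # weights, while the backward pass treats quantization as identity,
    # so gradients reach the stored full-precision weights unchanged.
    return w + (w_q - w).detach()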

Quantization Schemes

BitCore ships with three quantizers, selectable via a string argument:

Quantizer                        Key Idea
BitNet (default)                 Scales weights by their mean absolute value, then rounds to {-1, +1}
TWN (Ternary Weight Networks)    Adds a zero threshold, producing true ternary {-1, 0, +1} weights with a learned scale
ParetoQ                          Pareto-optimal quantization scheme balancing accuracy and compression
from bitcore import BitLinear

layer_bitnet  = BitLinear(128, 64, quant_type="bitnet")
layer_twn     = BitLinear(128, 64, quant_type="twn")
layer_paretoq = BitLinear(128, 64, quant_type="paretoq")
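For intuition, the TWN scheme can be sketched as a threshold-then-scale step. The helper below follows the heuristic from the Ternary Weight Networks paper (a 0.7 * mean(|w|) threshold and an L2-optimal scale); BitCore's learned-scale implementation may differ in detail:

import torch

def twn_quantize(w: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Weights with magnitude below delta become 0; the rest keep their sign.
    delta = 0.7 * w.abs().mean()
    mask = (w.abs() > delta).float()
    ternary = torch.sign(w) * mask            # values in {-1, 0, +1}
    # Scale that minimizes the L2 reconstruction error over non-zero weights.
    scale = (w.abs() * mask).sum() / mask.sum().clamp(min=eps)
    return ternary * scale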

Drop-In Model Conversion

Existing models with standard nn.Linear layers can be converted in place—no architectural changes required:

from bitcore import BitLinear
import torch.nn as nn

# Convert a single layer
linear = nn.Linear(128, 64)
bitlinear = BitLinear.from_linear(linear, quant_type="bitnet")

# Convert an entire model recursively
def convert_to_bitlinear(model, quant_type="bitnet"):
    for name, module in model.named_children():
        if isinstance(module, nn.Linear):
            # Replace the child in place with an equivalent BitLinear
            setattr(model, name, BitLinear.from_linear(module, quant_type=quant_type))
        else:
            # Recurse into container modules (Sequential, custom blocks, ...)
            convert_to_bitlinear(module, quant_type)
    return model
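Continuing the snippet above, the helper walks arbitrarily nested containers, so a model with inner Sequential blocks converts in one call (the layer sizes here are only illustrative):

# Every nn.Linear below, including the one inside the nested block,
# is replaced by a BitLinear using the requested quantizer.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Sequential(nn.Linear(256, 64), nn.ReLU()),
    nn.Linear(64, 10),
)
model = convert_to_bitlinear(model, quant_type="twn")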

This is particularly useful for taking a pretrained full-precision model and fine-tuning it with ternary quantization via knowledge distillation—exactly the workflow used in TAT-VPR and TeTRA-VPR.
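A minimal sketch of that distillation loop, assuming a frozen full-precision teacher and a BitLinear student produced by the conversion above (the temperature, loss weighting, and function name distillation_step are illustrative choices, not BitCore defaults):

import torch
import torch.nn.functional as F

def distillation_step(student, teacher, x, y, optimizer, T=2.0, alpha=0.5):
    # The teacher stays frozen; only the quantization-aware student updates.
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    # Soft-label KL term at temperature T plus the usual hard-label loss.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(student_logits, y)
    loss = alpha * kd + (1 - alpha) * ce
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()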

Training Example

BitLinear layers are fully compatible with standard PyTorch training loops:

import torch
import torch.nn as nn
import torch.optim as optim
from bitcore import BitLinear

model = nn.Sequential(
    BitLinear(784, 256, quant_type="bitnet"),
    nn.ReLU(),
    BitLinear(256, 10, quant_type="bitnet"),
)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())

model.train()
for batch in dataloader:  # any DataLoader yielding (inputs, targets) batches
    x, y = batch
    optimizer.zero_grad()
    output = model(x)
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()

Deployment

Switching to deployment mode packs the ternary weights and routes computation through BitOps:

model.eval()
for module in model.modules():
    if isinstance(module, BitLinear):
        module.deploy()

with torch.no_grad():
    output = model(x)  # Now uses optimized BitOps kernels
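The packed representation is what makes deployment cheap: a ternary value needs only 2 bits, so four weights fit in one byte. BitOps defines the actual layout; the snippet below is a generic illustration of the packing idea only (pack_ternary is a hypothetical helper, not part of the BitCore API):

import torch

def pack_ternary(w_ternary: torch.Tensor) -> torch.Tensor:
    # Map {-1, 0, +1} -> {0, 1, 2}, then store four 2-bit codes per byte,
    # roughly a 16x reduction versus float32 storage.
    codes = w_ternary.flatten().to(torch.int64) + 1
    pad = (-codes.numel()) % 4                  # pad to a multiple of 4
    codes = torch.cat([codes, codes.new_zeros(pad)]).view(-1, 4)
    shifts = torch.tensor([0, 2, 4, 6])
    return (codes << shifts).sum(dim=1).to(torch.uint8)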

API Reference

BitLinear

BitLinear(
    in_features: int,
    out_features: int,
    bias: bool = True,
    eps: float = 1e-6,
    quant_type: str = "bitnet"
)
Parameter      Type     Default      Description
in_features    int      (required)   Number of input features
out_features   int      (required)   Number of output features
bias           bool     True         Include a bias term
eps            float    1e-6         Epsilon for numerical stability
quant_type     str      "bitnet"     Quantization scheme: "bitnet", "twn", or "paretoq"

Methods:

Method                             Description
forward(x)                         Forward pass with quantization-aware gradients (training) or optimized inference (deployed)
deploy()                           Switch to deployment mode: packs weights and enables BitOps acceleration
from_linear(linear, quant_type)    Class method to convert an existing nn.Linear into a BitLinear

The BitOps Stack

BitCore sits in the middle of a three-layer stack for ternary neural networks:

  • Research models (TAT-VPR, TeTRA-VPR) define the architectures and training recipes.
  • BitCore provides the quantization-aware BitLinear layer for training and seamless deployment switching.
  • BitOps supplies the low-level, hardware-optimized ternary matrix multiplication kernels that BitCore calls in deployment mode.

Together, this stack takes a ternary quantization idea from research prototype to efficient on-device inference with minimal friction.

Requirements

  • Python 3.9+
  • PyTorch 2.0.0+
  • (Optional) BitOps for deployment-mode acceleration

License

MIT License