HIGGS quantization of AR-DiT

Use HIGGS and QJL to quantize inference for NVIDIA’s LongLive autoregressive DiT to 4-bit precision. Quantize both weights and activations (including the KV cache), and benchmark the output quality of the quantized model against the unquantized baseline. Inference speed, throughput, memory savings, and choice of GPU are out of scope; focus on whether the quantized model still produces acceptable generations.
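Since speed and memory are out of scope, one way to measure quality is simulated ("fake") quantization: round each tensor to the 4-bit grid, immediately dequantize it, and run the model as usual. The sketch below shows that pattern in plain Python. It uses uniform round-to-nearest with per-group absmax scaling as a stand-in; it is not the HIGGS or QJL algorithm. The names `fake_quantize` and `relative_error` and the group size of 32 are illustrative assumptions, not part of the task.

```python
import math
import random


def fake_quantize(values, bits=4, group_size=32):
    """Quantize then dequantize `values` with per-group absmax scaling.

    Uniform symmetric round-to-nearest; a stand-in for illustration only,
    NOT the HIGGS or QJL algorithm.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, so integer levels are -7..7
    out = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        scale = max(abs(v) for v in group) / qmax
        if scale == 0.0:
            # An all-zero group is represented exactly.
            out.extend(0.0 for _ in group)
            continue
        for v in group:
            q = max(-qmax, min(qmax, round(v / scale)))  # integer code
            out.append(q * scale)  # dequantized value
    return out


def relative_error(reference, approx):
    """L2 norm of the error divided by the L2 norm of the reference."""
    num = math.sqrt(sum((r - a) ** 2 for r, a in zip(reference, approx)))
    den = math.sqrt(sum(r * r for r in reference))
    return num / den


random.seed(0)
# Hypothetical stand-in for one weight or activation tensor, flattened.
weights = [random.gauss(0.0, 1.0) for _ in range(1024)]
err4 = relative_error(weights, fake_quantize(weights, bits=4))
err8 = relative_error(weights, fake_quantize(weights, bits=8))
print(f"4-bit relative error: {err4:.4f}")
print(f"8-bit relative error: {err8:.4f}")
```

Tensor-level error like this is only a sanity check. The benchmark itself should compare generations from the quantized model with those from the unquantized baseline.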
