Implements TurboQuant (ICLR 2026, arXiv:2504.19874) KV cache compression directly inside a Transformers inference script. All algorithms are self-contained, with minimal dependencies.
- Uses https://huggingface.co/g023/Qwen3-1.77B-g023 as the demonstration model (place the model files in the Qwen3-BEST folder)
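
For orientation, the core idea behind rotation-based KV cache quantization can be sketched in a few lines of NumPy. This is a simplified illustration, not the code in this repository: the function names (random_rotation, quantize, dequantize), the 4-sigma clipping range, and the uniform quantizer are all assumptions made for the example. The paper describes scalar quantizers tuned to the post-rotation coordinate distribution plus a residual stage for inner products, and both are omitted here.

```python
import numpy as np

def random_rotation(dim, seed=0):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    # Flip column signs so the result does not depend on QR's sign convention.
    return q * np.sign(np.diag(r))

def _grid(dim, bits):
    # Hypothetical clipping range: after rotation, coordinates of a unit
    # vector have standard deviation of about 1/sqrt(dim); clip at 4 sigma.
    limit = 4.0 / np.sqrt(dim)
    step = 2.0 * limit / (2 ** bits)
    return limit, step

def quantize(x, rot, bits=4):
    """Normalize, rotate, then uniformly quantize each coordinate (bits <= 8)."""
    norms = np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), 1e-12)
    y = (x / norms) @ rot.T  # rotated unit vectors
    limit, step = _grid(x.shape[-1], bits)
    codes = np.clip(np.floor((y + limit) / step), 0, 2 ** bits - 1)
    return codes.astype(np.uint8), norms

def dequantize(codes, norms, rot, bits=4):
    """Map codes to bin centers, undo the rotation, restore the norms."""
    limit, step = _grid(codes.shape[-1], bits)
    y = (codes.astype(np.float64) + 0.5) * step - limit
    return (y @ rot) * norms

# Usage: compress 8 key vectors of head dimension 64 to 4 bits per coordinate.
rot = random_rotation(64)
keys = np.random.default_rng(1).standard_normal((8, 64))
codes, norms = quantize(keys, rot, bits=4)
recon = dequantize(codes, norms, rot, bits=4)
rel_err = np.linalg.norm(recon - keys) / np.linalg.norm(keys)
```

The rotation spreads each vector's energy evenly across coordinates, so a single fixed quantization grid works for every coordinate without per-channel calibration. Only the integer codes and one norm per vector need to be stored.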