Search | News by Netwrck

NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models

(arxiv.org) by chrsw | view | 0 comments

Ask HN: Does treating Inflation as a "Quantization Snap" resolve slow-roll?

by aplowe | view | 0 comments

A guide to model quantization in fine-tuning (and how to pick the right GGUF)

(siquick.com) by siquick | view | 0 comments

Quantization, LoRA, and the 8% Problem: Benchmarking Local LLMs for Production AI

(walsenburgtech.com) by cowartc | view | 0 comments

RAG 40x faster using binary quantization (2024)

(lightning.ai) by teleforce | view | 0 comments

KYB Engine at 3 Quantization Levels: Accuracy Held. Cost Dropped 6x

(walsenburgtech.com) by cowartc | view | 0 comments

Show HN: TurboQuant-WASM – Google's vector quantization in the browser

(github.com) by teamchong | view | 0 comments

Mixed Precision Quantization on mlx comes with TurboQuant implementation

(twitter.com) by jsilence | view | 1 comment

Salomi, a research repo on extreme low-bit transformer quantization

(github.com) by Edward9055 | view | 0 comments

Fujitsu One Compression (LLM Quantization)

(FujitsuResearch.github.io) by measurablefunc | view | 0 comments

Zero-Bit Quantization for Neural Network Weight Encoding (ZBQ/1.0)

(medium.com) by AdnanMasood | view | 0 comments

RaBitQ Binary Quantization 101

(elastic.co) by tamnd | view | 0 comments

TurboQuant: KV Cache Quantization to 3.5 Bits with Zero Accuracy Loss - ICLR 2026

(darshanfofadiya.com) by DARSHANFOFADIYA | view | 0 comments

TurboQuant: Online Vector Quantization with Near-Optimal Distortion Rate

(openreview.net) by tamnd | view | 0 comments

Quantization from the Ground Up

(ngrok.com) by samwho | view | 0 comments

FlashHead: Up to 40% Faster Multimodal Reasoning on Top of Quantization

(huggingface.co) by Embedl-Wilhelm | view | 1 comment

Show HN: Qwodel – An open-source unified pipeline for LLM quantization

by kinderasteroid | view | 0 comments

Emergent Quantization from a Dynamic Vacuum

(journals.aps.org) by Rover222 | view | 1 comment

Power-of-Two Quantization for Efficient FPGA-Based GRU Architectures

(mdpi.com) by PaulHoule | view | 0 comments

9x MobileNet V2 size reduction with quantization-aware training

(github.com) by gauravvij137 | view | 1 comment

Quantization-Aware Distillation

(ternarysearch.blogspot.com) by paladin314159 | view | 0 comments

Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery [pdf]

(research.nvidia.com) by gmays | view | 0 comments

LLM Quantization and NVFP4

(ternarysearch.blogspot.com) by paladin314159 | view | 0 comments

Nvidia: Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery [pdf]

(research.nvidia.com) by tosh | view | 0 comments

Removing branches from the hot path: A 60% speed-up for Product Quantization

(twitter.com) by bobvanluijt | view | 0 comments

Quantization and distillation effects on code LLMs

(arxiv.org) by nkko | view | 0 comments

SatQuant: Fix YOLOv8 quantization accuracy on satellite imagery (Edge TPU)

(github.com) by gulis-dev | view | 0 comments

Restructuring Vector Quantization with the Rotation Trick

(arxiv.org) by fzliu | view | 0 comments

Fractional quantization in insulators from Hall to Chern

(nature.com) by westurner | view | 0 comments