▲
10
NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models
(arxiv.org)
by chrsw |
view
|
0 comments
▲
2
Ask HN: Does treating Inflation as a "Quantization Snap" resolve slow-roll?
by aplowe |
view
|
0 comments
▲
1
A guide to model quantization in fine-tuning (and how to pick the right GGUF)
(siquick.com)
by siquick |
view
|
0 comments
▲
1
Quantization, LoRA, and the 8% Problem: Benchmarking Local LLMs for Production AI
(walsenburgtech.com)
by cowartc |
view
|
0 comments
▲
1
RAG 40x faster using binary quantization (2024)
(lightning.ai)
by teleforce |
view
|
0 comments
▲
1
KYB Engine at 3 Quantization Levels: Accuracy Held. Cost Dropped 6x
(walsenburgtech.com)
by cowartc |
view
|
0 comments
▲
1
Show HN: TurboQuant-WASM – Google's vector quantization in the browser
(github.com)
by teamchong |
view
|
0 comments
▲
1
Mixed Precision Quantization on mlx comes with TurboQuant implementation
(twitter.com)
by jsilence |
view
|
1 comment
▲
1
Salomi, a research repo on extreme low-bit transformer quantization
(github.com)
by Edward9055 |
view
|
0 comments
▲
1
Fujitsu One Compression (LLM Quantization)
(FujitsuResearch.github.io)
by measurablefunc |
view
|
0 comments
▲
1
Zero-Bit Quantization for Neural Network Weight Encoding (ZBQ/1.0)
(medium.com)
by AdnanMasood |
view
|
0 comments
▲
1
RaBitQ Binary Quantization 101
(elastic.co)
by tamnd |
view
|
0 comments
▲
1
TurboQuant: KV Cache Quantization to 3.5 Bits with Zero Accuracy Loss - ICLR 2026
(darshanfofadiya.com)
by DARSHANFOFADIYA |
view
|
0 comments
▲
1
TurboQuant: Online Vector Quantization with Near-Optimal Distortion Rate
(openreview.net)
by tamnd |
view
|
0 comments
▲
1
Quantization from the Ground Up
(ngrok.com)
by samwho |
view
|
0 comments
▲
1
FlashHead: Up to 40% Faster Multimodal Reasoning on Top of Quantization
(huggingface.co)
by Embedl-Wilhelm |
view
|
1 comment
▲
1
Show HN: Qwodel – An open-source unified pipeline for LLM quantization
by kinderasteroid |
view
|
0 comments
▲
1
Emergent Quantization from a Dynamic Vacuum
(journals.aps.org)
by Rover222 |
view
|
1 comment
▲
1
Power-of-Two Quantization for Efficient FPGA-Based GRU Architectures
(mdpi.com)
by PaulHoule |
view
|
0 comments
▲
1
Ask HN: Does treating Inflation as a "Quantization Snap" resolve slow-roll?
by aplowe |
view
|
0 comments
▲
1
9x MobileNet V2 size reduction with Quantization aware training
(github.com)
by gauravvij137 |
view
|
1 comment
▲
1
Quantization-Aware Distillation
(ternarysearch.blogspot.com)
by paladin314159 |
view
|
0 comments
▲
1
Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery [pdf]
(research.nvidia.com)
by gmays |
view
|
0 comments
▲
1
LLM Quantization and NVFP4
(ternarysearch.blogspot.com)
by paladin314159 |
view
|
0 comments
▲
1
Nvidia: Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery [pdf]
(research.nvidia.com)
by tosh |
view
|
0 comments
▲
1
Removing branches from the hot path: A 60% speed-up for Product Quantization
(twitter.com)
by bobvanluijt |
view
|
0 comments
▲
1
Quantization and distillation effects on code LLMs
(arxiv.org)
by nkko |
view
|
0 comments
▲
1
SatQuant: Fix YOLOv8 quantization accuracy on satellite imagery (Edge TPU)
(github.com)
by gulis-dev |
view
|
0 comments
▲
1
Restructuring Vector Quantization with the Rotation Trick
(arxiv.org)
by fzliu |
view
|
0 comments
▲
1
Fractional quantization in insulators from Hall to Chern
(nature.com)
by westurner |
view
|
0 comments