▲
10
NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models
(arxiv.org)
by chrsw |
view
|
0 comments
▲
2
Quantization for Neural Networks
(leimao.github.io)
by eigenBasis |
view
|
0 comments
▲
2
Ask HN: Does treating Inflation as a "Quantization Snap" resolve slow-roll?
by aplowe |
view
|
0 comments
▲
1
GGUF vs. GPTQ vs. AWQ: The Plain-English Guide to LLM Quantization
(vettedconsumer.com)
by ermantrout |
view
|
0 comments
▲
1
KVarN: Native vLLM KV-cache quantization back end by Huawei
(github.com)
by theanonymousone |
view
|
0 comments
▲
1
Spikes in LLMs Are Bias Vectors: Spike-Free Quantization
(arxiv.org)
by sbulaev |
view
|
0 comments
▲
1
Show HN: Glq LLM quantization using E8 lattice
(github.com)
by acd |
view
|
0 comments
▲
1
Emergent Quantization from a Dynamic Vacuum
(journals.aps.org)
by bookofjoe |
view
|
0 comments
▲
1
3.125-Bit LLM quantization bypassing tensor cores
(blog.djellalmohamedaniss.workers.dev)
by dmaniss |
view
|
0 comments
▲
1
Evaluation of Various MLX Quantizations
(github.com)
by d-_-b |
view
|
1 comment
▲
1
Scalar and Binary Quantization for Pgvector Vector Search and Storage (2024)
(jkatz05.com)
by eigenBasis |
view
|
0 comments
▲
1
Emergent Quantization from a Dynamic Vacuum
(journals.aps.org)
by davedx |
view
|
0 comments
▲
1
Quantization for Modern AI Systems (70-page free eBook)
(pawankjha.substack.com)
by pawanjha25 |
view
|
0 comments
▲
1
Advanced Quantization Algorithm for LLMs
(github.com)
by lastdong |
view
|
0 comments
▲
1
LLM Quantization
(huggingface.co)
by Anon84 |
view
|
0 comments
▲
1
A DuckDB extension for vector search indexes with pluggable quantization
(github.com)
by CheeseWang |
view
|
0 comments
▲
1
The Quantization Robustness of Diffusion Language Models in Coding Benchmarks
(arxiv.org)
by matt_d |
view
|
0 comments
▲
1
SAW-INT4: System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving
(arxiv.org)
by matt_d |
view
|
0 comments
▲
1
A guide to model quantization in fine-tuning (and how to pick the right GGUF)
(siquick.com)
by siquick |
view
|
0 comments
▲
1
Quantization, LoRA, and the 8% Problem: Benchmarking Local LLMs for Production AI
(walsenburgtech.com)
by cowartc |
view
|
0 comments
▲
1
RAG 40x faster using binary quantization (2024)
(lightning.ai)
by teleforce |
view
|
0 comments
▲
1
KYB Engine at 3 Quantization Levels: Accuracy Held. Cost Dropped 6x
(walsenburgtech.com)
by cowartc |
view
|
0 comments
▲
1
Show HN: TurboQuant-WASM – Google's vector quantization in the browser
(github.com)
by teamchong |
view
|
0 comments
▲
1
Mixed Precision Quantization on mlx comes with TurboQuant implementation
(twitter.com)
by jsilence |
view
|
1 comment
▲
1
Salomi, a research repo on extreme low-bit transformer quantization
(github.com)
by Edward9055 |
view
|
0 comments
▲
1
Fujitsu One Compression (LLM Quantization)
(FujitsuResearch.github.io)
by measurablefunc |
view
|
0 comments
▲
1
Zero-Bit Quantization for Neural Network Weight Encoding (ZBQ/1.0)
(medium.com)
by AdnanMasood |
view
|
0 comments
▲
1
RaBitQ Binary Quantization 101
(elastic.co)
by tamnd |
view
|
0 comments
▲
1
TurboQuant: KV Cache Quantization to 3.5 Bits with Zero Accuracy Loss - ICLR 2026
(darshanfofadiya.com)
by DARSHANFOFADIYA |
view
|
0 comments
▲
1
TurboQuant: Online Vector Quantization with Near-Optimal Distortion Rate
(openreview.net)
by tamnd |
view
|
0 comments
▲
1
Quantization from the Ground Up
(ngrok.com)
by samwho |
view
|
0 comments
▲
1
FlashHead: Up to 40% Faster Multimodal Reasoning on Top of Quantization
(huggingface.co)
by Embedl-Wilhelm |
view
|
1 comment
▲
1
Show HN: Qwodel – An open-source unified pipeline for LLM quantization
by kinderasteroid |
view
|
0 comments
▲
1
Emergent Quantization from a Dynamic Vacuum
(journals.aps.org)
by Rover222 |
view
|
1 comment
▲
1
Power-of-Two Quantization for Efficient FPGA-Based GRU Architectures
(mdpi.com)
by PaulHoule |
view
|
0 comments
▲
1
Ask HN: Does treating Inflation as a "Quantization Snap" resolve slow-roll?
by aplowe |
view
|
0 comments
▲
1
9x MobileNet V2 size reduction with Quantization aware training
(github.com)
by gauravvij137 |
view
|
1 comment
▲
1
Quantization-Aware Distillation
(ternarysearch.blogspot.com)
by paladin314159 |
view
|
0 comments
▲
1
Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery [pdf]
(research.nvidia.com)
by gmays |
view
|
0 comments
▲
1
LLM Quantization and NVFP4
(ternarysearch.blogspot.com)
by paladin314159 |
view
|
0 comments
▲
1
Nvidia: Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery [pdf]
(research.nvidia.com)
by tosh |
view
|
0 comments
▲
1
Removing branches from the hot path: A 60% speed-up for Product Quantization
(twitter.com)
by bobvanluijt |
view
|
0 comments
▲
1
Quantization and distillation effects on code LLMs
(arxiv.org)
by nkko |
view
|
0 comments
▲
1
SatQuant: Fix YOLOv8 quantization accuracy on satellite imagery (Edge TPU)
(github.com)
by gulis-dev |
view
|
0 comments
▲
1
Restructuring Vector Quantization with the Rotation Trick
(arxiv.org)
by fzliu |
view
|
0 comments
▲
1
Fractional quantization in insulators from Hall to Chern
(nature.com)
by westurner |
view
|
0 comments