News
NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models (arxiv.org)
10 points | by chrsw | 0 comments
Ask HN: Does treating Inflation as a "Quantization Snap" resolve slow-roll?
2 points | by aplowe | 0 comments
Power-of-Two Quantization for Efficient FPGA-Based GRU Architectures (mdpi.com)
1 point | by PaulHoule | 0 comments
9x MobileNet V2 size reduction with Quantization aware training (github.com)
1 point | by gauravvij137 | 1 comment
Quantization-Aware Distillation (ternarysearch.blogspot.com)
1 point | by paladin314159 | 0 comments
Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery [pdf] (research.nvidia.com)
1 point | by gmays | 0 comments
LLM Quantization and NVFP4 (ternarysearch.blogspot.com)
1 point | by paladin314159 | 0 comments
Nvidia: Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery [pdf] (research.nvidia.com)
1 point | by tosh | 0 comments
Removing branches from the hot path: A 60% speed-up for Product Quantization (twitter.com)
1 point | by bobvanluijt | 0 comments
Quantization and distillation effects on code LLMs (arxiv.org)
1 point | by nkko | 0 comments
SatQuant: Fix YOLOv8 quantization accuracy on satellite imagery (Edge TPU) (github.com)
1 point | by gulis-dev | 0 comments
Restructuring Vector Quantization with the Rotation Trick (arxiv.org)
1 point | by fzliu | 0 comments
Fractional quantization in insulators from Hall to Chern (nature.com)
1 point | by westurner | 0 comments