by Michele Laurelli
Quantization reduces the precision of weights and activations to cut memory use and computation cost.
It converts 32-bit floating-point values to 8-bit integers or lower, shrinking model size by 4x (for INT8) with minimal accuracy loss. It is essential for edge deployment.
Examples:
- INT8 quantization
- Mobile deployment
- 4-bit LLM quantization
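As a minimal sketch of the idea, the following shows symmetric per-tensor INT8 quantization with NumPy: a float32 tensor is mapped to int8 with a single scale factor, giving the 4x size reduction mentioned above. The function names here are illustrative, not from any particular library.

```python
import numpy as np

def quantize_int8(weights):
    # Symmetric quantization: map [-max|w|, +max|w|] onto [-127, 127]
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float values from the int8 codes
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage uses 1 byte per value vs. 4 bytes for float32 (4x smaller);
# the round-trip error per value is bounded by half the scale step.
```

Real toolchains (e.g. PyTorch or TensorFlow Lite quantization) add per-channel scales, zero-points for asymmetric ranges, and calibration over activation statistics, but the core scale-and-round step is the same.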