by Michele Laurelli
Quantization reduces the precision of weights and activations to cut memory use and computation cost.
It converts 32-bit floating-point values to 8-bit integers or lower, shrinking model size by 4x (for INT8) with minimal accuracy loss. It is essential for edge deployment.
Examples:
- INT8 quantization
- Mobile deployment
- 4-bit LLM quantization
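As a minimal sketch of the idea, the following shows symmetric per-tensor INT8 quantization with NumPy: a float32 tensor is mapped to int8 with a single scale factor, giving the 4x size reduction mentioned above. The function names here are illustrative, not from any particular library.

```python
import numpy as np

def quantize_int8(weights):
    # Symmetric quantization: map [-max|w|, +max|w|] onto [-127, 127]
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float values from the int8 codes
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage uses 1 byte per value vs. 4 bytes for float32 (4x smaller);
# the round-trip error per value is bounded by half the scale step.
```

Real toolchains (e.g. PyTorch or TensorFlow Lite quantization) add per-channel scales, zero-points for asymmetric ranges, and calibration over activation statistics, but the core scale-and-round step is the same.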