by Michele Laurelli
A large language model (LLM) is a neural network with billions of parameters, trained on massive text datasets to understand and generate human language.
LLMs such as GPT-4, Claude, and LLaMA are trained on diverse text drawn from across the internet. They exhibit emergent abilities such as reasoning, few-shot learning, and task generalization. Scale is a key factor: larger models generally perform better. Notable examples, followed by a short few-shot prompting sketch:
GPT-4, reported (though never confirmed by OpenAI) to have roughly 1.76 trillion parameters
LLaMA 2, Meta's openly licensed model family, for open-source and self-hosted applications
Claude, Anthropic's model family, for long-context understanding
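
As a rough illustration of few-shot learning, the sketch below assembles a few-shot prompt and sends it to a small openly available model via the Hugging Face transformers pipeline. GPT-2 stands in here purely for convenience (larger LLMs expose the same interface), and the example sentences are invented for illustration.

```python
from transformers import pipeline

# Few-shot prompt: a couple of worked examples, then the new query.
# The sentences below are invented purely for illustration.
prompt = (
    "Translate English to French.\n"
    "English: Hello. French: Bonjour.\n"
    "English: Thank you. French: Merci.\n"
    "English: Good night. French:"
)

# GPT-2 is a small stand-in; larger models follow the same interface.
generator = pipeline("text-generation", model="gpt2")
print(generator(prompt, max_new_tokens=5)[0]["generated_text"])
```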
GPT (Generative Pre-trained Transformer) is a family of large language models developed by OpenAI that use the transformer architecture for text generation.
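
To make "transformer architecture for text generation" concrete, here is a minimal sketch of autoregressive generation with an openly available GPT-family model (GPT-2) through Hugging Face transformers; it is illustrative, not OpenAI's own API.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Encode a prompt, then extend it token by token (greedy decoding).
inputs = tokenizer("The transformer architecture", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```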
The transformer is a neural network architecture based entirely on attention mechanisms, dispensing with recurrent and convolutional layers.
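
As a sketch of the core idea, the function below implements scaled dot-product attention, the basic building block of the transformer, in PyTorch; the tensor shapes are assumptions noted in the comments.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    # Similarity of every query to every key, scaled to stabilize the softmax.
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)  # each row sums to 1 over the keys
    return weights @ v                   # weighted average of the values

# Toy shapes: batch of 2, sequence length 4, head dimension 8.
q = k = v = torch.randn(2, 4, 8)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 4, 8])
```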
Fine-tuning is the process of adapting a pre-trained model to a specific task by continuing training on task-specific data.
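
A minimal fine-tuning sketch, assuming a sequence-classification task: the model name, the two training examples, and the hyperparameters below are illustrative placeholders, not a recipe from the text.

```python
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical task-specific data; replace with your own labeled examples.
train_texts = ["great product", "terrible service"]
train_labels = torch.tensor([1, 0])

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

batch = tokenizer(train_texts, padding=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=5e-5)

model.train()
for epoch in range(3):  # continue training on the task data
    outputs = model(**batch, labels=train_labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In practice the same pattern scales up: swap in a larger pre-trained model, a real dataset with batching, and a validation loop, while the core step of continuing gradient descent on task-specific data stays the same.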