Course · 5 modules · 8 lessons · 15 min

Advanced LLM Concepts

A second-volume tour of the techniques pushing large language models forward — advanced training, modern inference and serving, retrieval and embeddings, alignment, and adversarial robustness.

Advanced Training
- Byte-Latent Transformers (2 min): BLT replaces tokenization entirely; it operates directly on raw UTF-8 bytes, dynamically grouping them into variable-length patches based on local entropy (see the first sketch after this list).
- Catastrophic Forgetting (2 min): the abrupt loss of previously learned capability that happens when a neural network is trained on a new task or domain.
- Curriculum Learning (2 min): presents training examples in a meaningful order, usually easy to hard, instead of randomly; modern LLM data mixing is its most consequential form (a minimal sketch follows the list).
- Gradient Checkpointing (2 min): trades compute for memory by storing activations only at selected layers and recomputing the rest during backpropagation (shown in PyTorch below).
- Grokking (2 min): the eerie phenomenon where a model first memorizes its training data with poor validation accuracy, then, long after training loss has plateaued, suddenly generalizes.
- In-Context Learning (2 min): the surprising ability of large language models to acquire a new task from a handful of examples shown in the prompt, without any parameter updates.
- Model Collapse (2 min): the irreversible loss of distribution tails that occurs when AI models are trained, generation after generation, on data produced by other AI models.
- Multi-Token Prediction (1 min): trains the model to predict several future tokens at once through parallel prediction heads, producing richer representations and faster inference (see the final sketch below).
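
The BLT patching rule is easy to prototype. In the actual architecture, patch boundaries come from the next-byte entropy of a small trained byte-level language model; the sketch below substitutes a windowed empirical byte entropy as a stand-in, and the `window` and `threshold` values are illustrative choices, not taken from the paper.

```python
import math
from collections import Counter

def local_entropy(window: bytes) -> float:
    """Shannon entropy (bits) of the byte distribution in a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def entropy_patches(data: bytes, window: int = 8, threshold: float = 2.5) -> list[bytes]:
    """Group raw bytes into variable-length patches.

    A patch boundary is placed after byte i when the entropy of the
    trailing `window` bytes exceeds `threshold`: high local entropy
    means upcoming bytes are hard to predict, so the patcher cuts
    finer patches (and spends more compute) there.
    """
    patches, start = [], 0
    for i in range(window, len(data)):
        if local_entropy(data[i - window:i]) > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return [p for p in patches if p]

if __name__ == "__main__":
    text = b"aaaaaaaaaaaa Hello, entropy! \x00\xff aaaaaaaaaaaa"
    for p in entropy_patches(text):
        print(p)
```

The behavior to notice: runs of repeated bytes stay in one long patch, while diverse spans get cut finely.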
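For curriculum learning, the most concrete minimal version is ordering a corpus by a difficulty score and walking it easy-first before falling back to shuffling. Sequence length as the difficulty proxy and the `warmup_frac` schedule are assumptions for illustration; production LLM data mixing typically uses richer signals such as reference-model perplexity or source quality.

```python
import random

def curriculum_batches(examples, batch_size, difficulty=len, warmup_frac=0.5):
    """Yield batches easy-to-hard, then shuffled.

    `difficulty` maps an example to a sortable score (here: length, a
    crude but common proxy). During the warmup fraction of the data we
    walk examples in sorted order; afterwards we fall back to uniform
    shuffling so hard and easy examples stay mixed.
    """
    ordered = sorted(examples, key=difficulty)
    n_warmup = int(len(ordered) * warmup_frac)
    warmup, rest = ordered[:n_warmup], ordered[n_warmup:]
    random.shuffle(rest)
    schedule = warmup + rest
    for i in range(0, len(schedule), batch_size):
        yield schedule[i:i + batch_size]

if __name__ == "__main__":
    corpus = ["a cat", "the dog ran", "an unusually long and winding sentence", "hi"]
    for batch in curriculum_batches(corpus, batch_size=2):
        print(batch)
```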
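Gradient checkpointing needs almost no new machinery in PyTorch: wrapping a block's forward call in `torch.utils.checkpoint.checkpoint` drops its internal activations after the forward pass and recomputes them on demand during backward. The toy MLP below exists only to show the call site.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedMLP(nn.Module):
    """A stack of blocks whose inner activations are not stored.

    Each block runs normally in forward, but because the call is
    wrapped in `checkpoint(...)`, only the block's inputs are kept;
    intermediate activations are recomputed from those inputs during
    backpropagation, trading extra compute for a smaller memory peak.
    """
    def __init__(self, dim: int = 256, depth: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(depth)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            # use_reentrant=False is the recommended modern code path
            x = checkpoint(block, x, use_reentrant=False)
        return x

if __name__ == "__main__":
    model = CheckpointedMLP()
    x = torch.randn(4, 256, requires_grad=True)
    model(x).sum().backward()   # each block's forward is recomputed here
    print(x.grad.shape)
```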
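Multi-token prediction mainly changes the output side of the network: k heads share one trunk, and head i is trained to predict the token i+1 positions ahead via shifted targets. The sketch below is a simplified reading of that setup; the `nn.Embedding` trunk is a stand-in for a real transformer, and averaging the k losses is one plausible weighting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenHead(nn.Module):
    """k parallel heads: head i predicts the token (i+1) steps ahead."""
    def __init__(self, d_model: int, vocab: int, k: int = 4):
        super().__init__()
        self.k = k
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab) for _ in range(k))

    def loss(self, hidden: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        """hidden: (B, T, d_model) trunk states; tokens: (B, T) token ids.

        For head i, position t is aligned with target token t + i + 1,
        so one forward pass supervises k future tokens at once.
        """
        total = 0.0
        for i, head in enumerate(self.heads):
            shift = i + 1
            logits = head(hidden[:, :-shift])   # (B, T-shift, vocab)
            targets = tokens[:, shift:]         # (B, T-shift)
            total = total + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
            )
        return total / self.k

if __name__ == "__main__":
    B, T, d, V = 2, 16, 64, 1000
    trunk = nn.Embedding(V, d)          # stand-in for a transformer trunk
    tokens = torch.randint(0, V, (B, T))
    mtp = MultiTokenHead(d, V)
    print(mtp.loss(trunk(tokens), tokens).item())
```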