Course · 5 modules · 8 lessons · 15 min

Advanced LLM Concepts

A second-volume tour of the techniques pushing large language models forward — advanced training, modern inference and serving, retrieval and embeddings, alignment, and adversarial robustness.

Advanced Training
- Byte-Latent Transformers (2 min): BLT replaces tokenization entirely; it operates directly on raw UTF-8 bytes, dynamically grouping them into variable-length patches based on local entropy (see the first sketch after this list).
- Catastrophic Forgetting (2 min): the abrupt loss of previously learned capability that happens when a neural network is trained on a new task or domain.
- Curriculum Learning (2 min): presents training examples in a meaningful order, usually easy to hard, instead of randomly; modern LLM data mixing is its most consequential form (a minimal sketch follows the list).
- Gradient Checkpointing (2 min): trades compute for memory by storing activations only at selected layers and recomputing the rest during backpropagation (shown in PyTorch below).
- Grokking (2 min): the eerie phenomenon where a model first memorizes its training data with poor validation accuracy, then, long after training loss has plateaued, suddenly generalizes.
- In-Context Learning (2 min): the surprising ability of large language models to acquire a new task from a handful of examples shown in the prompt, without any parameter updates.
- Model Collapse (2 min): the irreversible loss of distribution tails that occurs when AI models are trained, generation after generation, on data produced by other AI models.
- Multi-Token Prediction (1 min): trains the model to predict several future tokens at once through parallel prediction heads, producing richer representations and faster inference (see the final sketch below).
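
The BLT patching rule is easy to prototype. In the actual architecture, patch boundaries come from the next-byte entropy of a small trained byte-level language model; the sketch below substitutes a windowed empirical byte entropy as a stand-in, and the `window` and `threshold` values are illustrative choices, not taken from the paper.

```python
import math
from collections import Counter

def local_entropy(window: bytes) -> float:
    """Shannon entropy (bits) of the byte distribution in a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def entropy_patches(data: bytes, window: int = 8, threshold: float = 2.5) -> list[bytes]:
    """Group raw bytes into variable-length patches.

    A patch boundary is placed after byte i when the entropy of the
    trailing `window` bytes exceeds `threshold`: high local entropy
    means upcoming bytes are hard to predict, so the patcher cuts
    finer patches (and spends more compute) there.
    """
    patches, start = [], 0
    for i in range(window, len(data)):
        if local_entropy(data[i - window:i]) > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return [p for p in patches if p]

if __name__ == "__main__":
    text = b"aaaaaaaaaaaa Hello, entropy! \x00\xff aaaaaaaaaaaa"
    for p in entropy_patches(text):
        print(p)
```

The behavior to notice: runs of repeated bytes stay in one long patch, while diverse spans get cut finely.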
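For curriculum learning, the most concrete minimal version is ordering a corpus by a difficulty score and walking it easy-first before falling back to shuffling. Sequence length as the difficulty proxy and the `warmup_frac` schedule are assumptions for illustration; production LLM data mixing typically uses richer signals such as reference-model perplexity or source quality.

```python
import random

def curriculum_batches(examples, batch_size, difficulty=len, warmup_frac=0.5):
    """Yield batches easy-to-hard, then shuffled.

    `difficulty` maps an example to a sortable score (here: length, a
    crude but common proxy). During the warmup fraction of the data we
    walk examples in sorted order; afterwards we fall back to uniform
    shuffling so hard and easy examples stay mixed.
    """
    ordered = sorted(examples, key=difficulty)
    n_warmup = int(len(ordered) * warmup_frac)
    warmup, rest = ordered[:n_warmup], ordered[n_warmup:]
    random.shuffle(rest)
    schedule = warmup + rest
    for i in range(0, len(schedule), batch_size):
        yield schedule[i:i + batch_size]

if __name__ == "__main__":
    corpus = ["a cat", "the dog ran", "an unusually long and winding sentence", "hi"]
    for batch in curriculum_batches(corpus, batch_size=2):
        print(batch)
```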
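Gradient checkpointing needs almost no new machinery in PyTorch: wrapping a block's forward call in `torch.utils.checkpoint.checkpoint` drops its internal activations after the forward pass and recomputes them on demand during backward. The toy MLP below exists only to show the call site.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedMLP(nn.Module):
    """A stack of blocks whose inner activations are not stored.

    Each block runs normally in forward, but because the call is
    wrapped in `checkpoint(...)`, only the block's inputs are kept;
    intermediate activations are recomputed from those inputs during
    backpropagation, trading extra compute for a smaller memory peak.
    """
    def __init__(self, dim: int = 256, depth: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(depth)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            # use_reentrant=False is the recommended modern code path
            x = checkpoint(block, x, use_reentrant=False)
        return x

if __name__ == "__main__":
    model = CheckpointedMLP()
    x = torch.randn(4, 256, requires_grad=True)
    model(x).sum().backward()   # each block's forward is recomputed here
    print(x.grad.shape)
```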
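Multi-token prediction mainly changes the output side of the network: k heads share one trunk, and head i is trained to predict the token i+1 positions ahead via shifted targets. The sketch below is a simplified reading of that setup; the `nn.Embedding` trunk is a stand-in for a real transformer, and averaging the k losses is one plausible weighting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenHead(nn.Module):
    """k parallel heads: head i predicts the token (i+1) steps ahead."""
    def __init__(self, d_model: int, vocab: int, k: int = 4):
        super().__init__()
        self.k = k
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab) for _ in range(k))

    def loss(self, hidden: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        """hidden: (B, T, d_model) trunk states; tokens: (B, T) token ids.

        For head i, position t is aligned with target token t + i + 1,
        so one forward pass supervises k future tokens at once.
        """
        total = 0.0
        for i, head in enumerate(self.heads):
            shift = i + 1
            logits = head(hidden[:, :-shift])   # (B, T-shift, vocab)
            targets = tokens[:, shift:]         # (B, T-shift)
            total = total + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
            )
        return total / self.k

if __name__ == "__main__":
    B, T, d, V = 2, 16, 64, 1000
    trunk = nn.Embedding(V, d)          # stand-in for a transformer trunk
    tokens = torch.randint(0, V, (B, T))
    mtp = MultiTokenHead(d, V)
    print(mtp.loss(trunk(tokens), tokens).item())
```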