Course · 9 modules · 79 lessons · 593 min

Prompt Engineering

Core prompting techniques, reasoning elicitation, system prompts, structured output, context engineering, and production safety.

Foundations
· Attention and Position Effects (7 min): LLMs exhibit a U-shaped attention curve — prioritizing information at the beginning and end of the context while partially losing content in the middle — with actionable placement rules that measurably improve output quality.
· Context Window Mechanics (7 min): The context window is the fixed-capacity input buffer that constrains every LLM interaction, with sizes ranging from 8K to 2M+ tokens, where nominal capacity and effective capacity diverge significantly.
· How LLMs Process Prompts (10 min): Understanding the four-stage pipeline — tokenization, embedding, attention, and generation — reveals why word choice, ordering, and structure mechanically alter LLM outputs.
· In-Context Learning (7 min): In-context learning (ICL) is the emergent ability of large language models to learn tasks from examples provided in the prompt — without any parameter updates — enabling few-shot prompting and fundamentally changing how we program AI systems.
· Mental Models for Prompting (8 min): Four mental models — completion engine, instruction follower, role player, and pattern matcher — provide complementary lenses for understanding LLM behavior, and knowing which model to apply in a given situation determines prompt effectiveness.
· Prompt Engineering vs. Context Engineering (8 min): Prompt engineering crafts the instructions telling the model what to do, while context engineering designs the information environment — what documents, history, state, and tools enter the context window — and production systems require both.
· Temperature and Sampling (7 min): Temperature, top-k, and top-p (nucleus sampling) are the control knobs that determine how the model selects from its predicted probability distribution, ranging from deterministic extraction to creative exploration. (Sketch below.)
· Tokenization for Prompt Engineers (7 min): Tokenization determines how text is segmented into the fundamental units an LLM processes, directly affecting cost, multilingual performance, and prompt behavior in ways that are invisible but consequential.
· What Is a Prompt (9 min): A prompt is the complete structured input sent to an LLM, composed of distinct segments — system message, user input, assistant prefill, and tool results — each influencing generation in specific, measurable ways.
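To make the sampling controls from Temperature and Sampling concrete, here is a minimal, self-contained sketch of temperature scaling and nucleus (top-p) sampling over a toy next-token distribution. The tokens and logit values are invented for illustration; real models sample over vocabularies of tens of thousands of tokens.

```python
import math
import random

def sample_token(logits: dict[str, float], temperature: float = 1.0, top_p: float = 1.0) -> str:
    """Pick one token using temperature scaling plus nucleus (top-p) sampling."""
    # Temperature rescales logits before softmax: values < 1 sharpen the
    # distribution (more deterministic), values > 1 flatten it (more varied).
    # Temperature must be > 0; the limit at 0 is greedy decoding.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}

    # Softmax to a probability distribution (max-subtraction for numerical stability).
    peak = max(scaled.values())
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Nucleus sampling: keep the smallest high-probability set whose
    # cumulative mass reaches top_p, then renormalize within it.
    nucleus, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        nucleus[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    mass = sum(nucleus.values())
    return random.choices(list(nucleus), weights=[p / mass for p in nucleus.values()])[0]

# Toy distribution for the next token after "The capital of France is".
toy_logits = {" Paris": 9.0, " the": 5.5, " located": 5.0, " a": 4.5}
print(sample_token(toy_logits, temperature=0.2, top_p=0.9))  # almost always " Paris"
print(sample_token(toy_logits, temperature=1.5, top_p=0.9))  # noticeably more varied
```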
Core Prompting Techniques
· Delimiter and Markup Strategies (7 min): Using structural delimiters — XML tags, markdown headers, triple quotes, and custom markers — to separate prompt sections improves model comprehension by 15-20%, enables reliable parsing, and is the foundation of professional prompt layout.
· Few-Shot Prompting (7 min): Few-shot prompting provides 3-8 input-output examples in the prompt to demonstrate the desired task, leveraging in-context learning to improve output quality, format consistency, and task comprehension beyond what instructions alone achieve. (Sketch below.)
· Instruction Prompting (7 min): Instruction prompting uses clear, specific, actionable directives to guide model behavior, where the specificity gradient — from vague ("summarize") to precise ("summarize in 3 bullet points of max 20 words each") — directly determines output quality and consistency.
· Many-Shot Prompting (7 min): Many-shot prompting uses 20-500+ examples in long-context models, approaching fine-tuning quality on some tasks while preserving the flexibility of in-context learning, with most gains realized by around 50 examples.
· Negative Prompting and Constraints (7 min): Telling an LLM what NOT to do ("do not hallucinate") is systematically less effective than telling it what TO do ("only cite provided sources"), because negation is processed less reliably by attention mechanisms and can paradoxically increase the unwanted behavior.
· Prefilling and Output Priming (7 min): Prefilling starts the assistant's response with predetermined text — such as `{` for JSON or `Step 1:` for structured reasoning — exploiting the autoregressive generation mechanism to dramatically improve output format compliance and quality.
· Prompt Chaining (7 min): Prompt chaining decomposes complex tasks into sequential LLM calls where the output of one prompt becomes the input to the next, enabling tasks too complex for a single prompt while introducing error propagation that must be managed through validation gates.
· Prompt Templates and Variables (6 min): Prompt templates are reusable prompt structures with `{variable}` slots that separate the static prompt logic from dynamic content, enabling consistent, maintainable, and testable prompt engineering at production scale.
· Role and Persona Prompting (7 min): Assigning the model a specific role or persona ("You are an expert tax attorney...") activates domain-relevant knowledge clusters, producing measurably better output on domain-specific tasks with 10-20% quality improvements, while the design spectrum ranges from light framing to detailed character sheets.
· Zero-Shot Prompting (7 min): Zero-shot prompting provides only instructions — no examples — relying entirely on the model's pretrained knowledge and instruction tuning to perform a task, and works best for well-defined tasks on capable, instruction-tuned models.
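The sketch below ties three of these lessons together: a reusable template with `{variable}` slots, XML-style delimiters separating sections, and a handful of few-shot examples formatted exactly the way the model is expected to answer. The classifier task and example reviews are invented for illustration.

```python
# Reusable template: static prompt logic with {variable} slots for dynamic content.
FEW_SHOT_TEMPLATE = """You are a sentiment classifier. Label each review as positive or negative.

{examples}

<review>{review}</review>
Label:"""

def format_examples(pairs: list[tuple[str, str]]) -> str:
    # Each worked example uses the same XML-style delimiters the model
    # must follow, so the pattern to imitate is unambiguous.
    return "\n\n".join(f"<review>{text}</review>\nLabel: {label}" for text, label in pairs)

examples = [
    ("The battery lasts all day and the screen is gorgeous.", "positive"),
    ("Stopped working after two weeks. Avoid.", "negative"),
    ("Exceeded every expectation I had.", "positive"),
]

prompt = FEW_SHOT_TEMPLATE.format(
    examples=format_examples(examples),
    review="Shipping was slow but the product itself is fantastic.",
)
print(prompt)
```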
Reasoning Elicitation
· Chain-of-Thought Prompting (8 min): Chain-of-thought prompting dramatically improves LLM reasoning by including step-by-step worked examples that teach the model to show its work before answering.
· Extended Thinking and Thinking Budgets (8 min): Extended thinking gives LLMs a dedicated, often hidden, token budget for internal reasoning before producing a visible response, formalizing the insight that harder problems benefit from more "thinking time."
· Metacognitive Prompting (8 min): Metacognitive prompting asks the model to reflect on its own knowledge, confidence, and reasoning quality, producing better-calibrated outputs that distinguish what the model knows from what it does not.
· Self-Ask and Decomposition (7 min): Self-ask prompting teaches the model to break complex questions into smaller sub-questions, answer each independently, and synthesize the results into a final answer.
· Self-Consistency (8 min): Self-consistency improves chain-of-thought reasoning by sampling multiple reasoning paths at non-zero temperature and selecting the most common final answer through majority voting. (Sketch below.)
· Step-Back Prompting (8 min): Step-back prompting improves reasoning by first asking the model to identify the relevant high-level principle or concept before attempting to solve the specific problem.
· Structured Reasoning Formats (7 min): Structured reasoning formats provide explicit templates — such as OTA, Given-Find-Solution, and Claim-Evidence-Reasoning — that guide the model's reasoning into a predictable, task-appropriate structure.
· Tree-of-Thought Prompting (8 min): Tree-of-thought prompting extends chain-of-thought from a single linear reasoning path to a branching search tree, enabling the model to explore, evaluate, and backtrack through multiple reasoning strategies.
· Zero-Shot Chain-of-Thought (8 min): Adding "Let's think step by step" to a prompt — with no examples at all — can dramatically improve reasoning performance by triggering the model's latent step-by-step generation capabilities.
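A minimal self-consistency sketch: sample several chain-of-thought completions at non-zero temperature and majority-vote the final answers. The `call_llm` helper is a hypothetical stand-in for whatever provider SDK you use, and the answer extraction assumes the prompt instructed the model to end with a line of the form `Answer: <value>`.

```python
from collections import Counter

def call_llm(prompt: str, temperature: float) -> str:
    """Hypothetical stand-in for your provider's completion call."""
    raise NotImplementedError("wire this to your LLM client")

def extract_answer(completion: str) -> str:
    # Assumes the prompt told the model to finish with "Answer: <value>".
    return completion.rsplit("Answer:", 1)[-1].strip()

def self_consistency(prompt: str, n_samples: int = 5) -> str:
    # Non-zero temperature makes each reasoning path different; greedy
    # decoding would repeat the same path (and the same mistakes) every time.
    answers = [extract_answer(call_llm(prompt, temperature=0.8)) for _ in range(n_samples)]
    # Majority vote over final answers, not over the reasoning text itself.
    return Counter(answers).most_common(1)[0][0]
```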
System Prompts and Instruction Design
· Behavioral Constraints and Rules (7 min): Behavioral constraints shape LLM behavior through specific, positively framed, and well-structured rules that achieve 15-20% better compliance when formatted as numbered lists rather than prose.
· Dynamic System Prompts (7 min): Dynamic system prompts are assembled at runtime from modular components — including user roles, feature flags, time-sensitive context, and personalization slots — enabling applications to customize LLM behavior for each user and situation. (Sketch below.)
· Instruction Following and Compliance (8 min): LLM instruction compliance depends on instruction salience, formatting, position, and the model's training-shaped attention budget, and understanding these factors enables systematic improvement of adherence rates.
· Instruction Hierarchy Design (7 min): Instruction hierarchy establishes a chain of command — system over developer over user over tool data — that determines which instructions take priority when they conflict, serving as a primary defense against prompt injection.
· Meta-Prompting (8 min): Meta-prompting uses one LLM call to generate, refine, or optimize the prompt for another LLM call, creating a two-layer system where the model acts as its own prompt engineer.
· Multi-Turn Instruction Persistence (8 min): System prompt instructions lose effectiveness over long conversations, typically degrading after 20-30 turns, requiring active reinforcement techniques to maintain consistent model behavior.
· Prompt Versioning and Management (8 min): Production prompts should be treated as code artifacts with version control, changelogs, regression testing, A/B testing infrastructure, and rollback procedures to ensure reliable, measurable, and reversible prompt evolution.
· System Prompt Anatomy (8 min): An effective system prompt consists of six core components — role definition, context, behavioral constraints, tool instructions, output format, and examples — arranged to maximize instruction adherence within a limited token budget.
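A sketch of runtime system-prompt assembly from modular blocks, combining the Dynamic System Prompts and Behavioral Constraints lessons. The product name, user roles, and feature flag are hypothetical; note that the rules block is a numbered list rather than prose.

```python
from datetime import date

def build_system_prompt(user_role: str, flags: set[str]) -> str:
    """Assemble a system prompt at runtime from modular components."""
    blocks = [
        # Role definition (static).
        "You are a support assistant for Acme Cloud.",
        # Time-sensitive context (dynamic).
        f"Today's date is {date.today().isoformat()}.",
    ]
    # Per-user personalization.
    if user_role == "admin":
        blocks.append("The user is an administrator; you may discuss account-level settings.")
    else:
        blocks.append("The user is on a standard plan; do not mention admin-only features.")
    # Feature flags gate optional capabilities.
    if "beta_search" in flags:
        blocks.append("The beta documentation-search tool is enabled; prefer it for product questions.")
    # Behavioral constraints: positively framed and numbered, not prose.
    blocks.append(
        "Rules:\n"
        "1. Answer only from the Acme Cloud documentation you are given.\n"
        "2. Escalate billing disputes to a human agent.\n"
        "3. Keep answers under 150 words unless the user asks for detail."
    )
    return "\n\n".join(blocks)

print(build_system_prompt("admin", {"beta_search"}))
```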
Structured Output and Format Control
· Classification and Labeling Output (7 min): Classification and labeling output techniques use prompt design, label space engineering, and output constraints to reliably sort LLM inputs into predefined categories with calibrated confidence.
· Constrained Decoding from the Prompt Perspective (7 min): Constrained decoding uses grammar-based filtering, regex constraints, and schema enforcement at the token level to guarantee structural output validity, complementing prompt-based format control.
· Extraction and Parsing Prompts (6 min): Extraction and parsing prompts instruct LLMs to locate, identify, and structure specific information from unstructured text into defined fields, bridging the gap between raw documents and structured databases.
· JSON Mode and Schema Enforcement (7 min): JSON mode and schema enforcement ensure LLM outputs conform to machine-parseable JSON structures through API-level constraints, prompt design, and external validation. (Sketch below.)
· Markdown and Rich Text Output (7 min): Markdown output prompting controls how LLMs format responses with headers, tables, lists, and code blocks, enabling consistent, readable, and structured human-facing content.
· Multi-Step Output Pipelines (7 min): Multi-step output pipelines chain multiple LLM calls where each step's structured output feeds as input to downstream code or prompts, enabling complex tasks through decomposition.
· Output Length Control (6 min): Output length control uses prompt instructions, parameter settings, and structural techniques to manage the trade-off between brevity and completeness in LLM responses.
· XML and Tag-Based Output (8 min): XML and tag-based output uses labeled opening and closing tags to structure LLM responses, excelling at nested mixed content, human readability, and seamless integration with Anthropic's Claude models.
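A sketch of the external-validation half of JSON enforcement: prompt for JSON only, parse the reply, check for the expected keys, and feed failures back for a retry. `call_llm` is again a hypothetical stand-in, and the contact-extraction schema is invented for illustration.

```python
import json

REQUIRED_KEYS = {"name", "email", "company"}

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for your provider's completion call."""
    raise NotImplementedError("wire this to your LLM client")

def extract_contact(text: str, max_retries: int = 2) -> dict:
    prompt = (
        "Extract the contact details below as a JSON object with exactly the keys "
        '"name", "email", and "company". Use null for missing fields. '
        "Respond with JSON only, no prose.\n\n" + text
    )
    for _ in range(max_retries + 1):
        raw = call_llm(prompt)
        try:
            data = json.loads(raw)
            if isinstance(data, dict) and REQUIRED_KEYS.issubset(data):
                return data  # parses and matches the expected keys
        except json.JSONDecodeError:
            pass
        # Feed the bad output back so the retry can self-correct.
        prompt += (
            "\n\nYour previous reply was not a valid JSON object with the "
            f"required keys:\n{raw}\nTry again, JSON only."
        )
    raise ValueError("model never produced schema-valid JSON")
```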
Context Engineering Fundamentals
· Context Assembly Patterns (7 min): Context assembly patterns are software engineering approaches for dynamically constructing the LLM context window at runtime, selecting and arranging information based on the current query, user state, and application logic.
· Context Budget Allocation (7 min): Context budget allocation divides the context window into purposeful zones — system prompt, conversation history, retrieved knowledge, tool results, and safety buffer — with specific token budgets that adapt to window size and task requirements.
· Context Caching and Prefix Reuse (7 min): Context caching stores the computed key-value representations of stable prompt prefixes across requests, reducing latency by 30-50% and costs by up to 90% on cached tokens for applications with repetitive context structures.
· Context Compression Techniques (7 min): Context compression techniques — including summarization, truncation, structured extraction, deduplication, and perplexity-based pruning — reduce token usage by 50-75% while preserving the information models need to generate accurate responses.
· Conversation History Management (7 min): Conversation history management applies strategies like sliding windows, summarization, and selective retention to maintain conversational coherence while keeping token costs within the context budget. (Sketch below.)
· Information Priority and Ordering (7 min): Information positioning within the context window follows a U-shaped attention curve — models attend most to the beginning and end, losing information in the middle — making strategic ordering a critical factor in output quality.
· Long-Context Design Patterns (7 min): Long-context design patterns address the unique challenges of working with 100K+ token context windows, where effective capacity falls below nominal capacity and explicit organization strategies become essential for maintaining model performance.
· Multi-Modal Context Design (8 min): Multi-modal context design integrates images, audio, video, and PDFs alongside text in the context window, managing token costs, placement strategies, and modality-specific formatting to maximize model comprehension across input types.
· State and Memory in Context (7 min): State and memory patterns — including scratchpads, pinned facts, running tallies, and working memory blocks — enable LLMs to maintain, update, and reference persistent information within and across conversation turns.
· What Is Context Engineering (7 min): Context engineering is the discipline of designing what information enters an LLM's context window and how it is organized, determining model performance more than the prompt instructions themselves.
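A sliding-window sketch for conversation history: walk the transcript newest-first and keep whatever fits the history token budget. The four-characters-per-token estimate is a rough heuristic used only to keep the example self-contained; production code should count with the model's actual tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Use your model's real tokenizer in production.
    return max(1, len(text) // 4)

def sliding_window(messages: list[dict], budget_tokens: int) -> list[dict]:
    """Keep the most recent turns that fit inside the history token budget."""
    kept, used = [], 0
    for msg in reversed(messages):  # newest turns first
        cost = estimate_tokens(msg["content"])
        if used + cost > budget_tokens:
            break  # older turns no longer fit; drop everything earlier
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order

history = [
    {"role": "user", "content": "What plans do you offer?"},
    {"role": "assistant", "content": "We offer Basic, Pro, and Enterprise tiers."},
    {"role": "user", "content": "How much is Pro?"},
]
print(sliding_window(history, budget_tokens=30))
```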
Retrieval and Knowledge Integration
· Chunking for Context Quality (7 min): How you split documents into chunks for retrieval determines not just what gets found but how well the model can reason over and generate from retrieved context.
· Citation and Attribution Prompting (7 min): Designing citation instructions that models follow consistently transforms RAG outputs from unverifiable text into auditable, trust-building responses with traceable claims.
· Dynamic Context Augmentation (7 min): Rather than retrieving all context upfront, dynamic augmentation makes runtime decisions to fetch additional information based on confidence levels, identified gaps, and intermediate reasoning results.
· Grounding and Faithfulness (7 min): Grounding techniques instruct the model to generate claims only from provided context, reducing RAG hallucination rates from 20-30% to 5-10% through structured prompting patterns.
· Hybrid Retrieval Context Patterns (7 min): Combining dense (embedding), sparse (keyword), and structured (SQL/graph) retrieval methods through fusion produces more robust context than any single method alone.
· Knowledge Conflicts and Resolution (7 min): When retrieved documents contradict each other or conflict with the model's training data, explicit conflict resolution strategies prevent the model from silently choosing one version or hallucinating a compromise.
· RAG Prompt Design (6 min): The prompt template that wraps retrieved documents and user queries determines whether a RAG system produces faithful, well-cited answers or hallucinates despite having the right information. (Sketch below.)
· Reranking and Context Selection (7 min): Initial retrieval casts a wide net returning 10-50 candidates, but only 3-5 chunks fit the context window — reranking and selection determine which make the cut.
· Retrieval Query Design (7 min): The user's raw question is rarely the optimal retrieval query — transforming it through rewriting, decomposition, and hypothetical document generation dramatically improves what gets retrieved.
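A sketch of a RAG prompt wrapper that bakes in grounding and citation instructions: each retrieved chunk gets a numbered source tag the model can cite, and the instructions demand per-claim citations plus an explicit refusal path. The tag format, wording, and sample chunks are illustrative, not a canonical template.

```python
RAG_TEMPLATE = """Answer the question using ONLY the sources below.
Cite every claim with its source tag, e.g. [S2].
If the sources do not contain the answer, reply exactly: I don't know.

{sources}

Question: {question}
Answer:"""

def build_rag_prompt(chunks: list[str], question: str) -> str:
    # Number each retrieved chunk so citations are traceable to a source.
    sources = "\n\n".join(f"[S{i}] {chunk}" for i, chunk in enumerate(chunks, 1))
    return RAG_TEMPLATE.format(sources=sources, question=question)

chunks = [
    "The Pro plan costs $20 per seat per month, billed annually.",
    "Enterprise pricing is negotiated per contract and includes SSO.",
]
print(build_rag_prompt(chunks, "How much does the Pro plan cost?"))
```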
Domain-Specific Prompting
· Classification and Extraction at Scale (8 min): Running classification and extraction prompts across thousands of inputs requires batch consistency, drift detection, calibration monitoring, and sampling strategies that single-input prompting does not demand.
· Code Generation Prompting (7 min): Effective code generation requires specifying language, runtime environment, dependencies, and expected behavior with the same precision as giving an architect both blueprint requirements and building codes. (Sketch below.)
· Code Review and Debugging Prompts (7 min): Code review and debugging prompts are fundamentally analytical rather than creative, requiring the model to identify issues in existing code rather than generate new code from scratch.
· Conversational and Dialogue Design (8 min): Designing multi-turn conversational systems requires managing persona consistency, topic flow, graceful redirects, and state tracking across turns — skills distinct from single-turn prompting.
· Creative Writing Prompting (7 min): Creative writing prompting controls style, tone, and voice through character-level motivation and constraint-based direction rather than prescriptive line-by-line instructions.
· Data Analysis and Summarization (7 min): Analytical prompting requires specifying the type of analysis (extractive vs. abstractive, comparative, trend-based), the level of detail, and the analytical framework to produce actionable insights rather than vague summaries.
· Mathematical and Logical Prompting (7 min): Mathematical prompting requires knowing when to leverage the model's reasoning ability, when to delegate to code-based computation, and how to structure verification steps that catch errors.
· Translation and Multilingual Prompting (8 min): Effective multilingual prompting requires cultural adaptation beyond word-for-word translation, awareness of tokenization cost disparities across languages, and strategies for maintaining quality in lower-resource languages.
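A sketch of a code-generation prompt that pins down language, runtime, allowed dependencies, and expected behavior rather than leaving them implicit. The template fields and the slugify task are invented for illustration.

```python
CODEGEN_TEMPLATE = """Write a {language} function.

Runtime: {runtime}
Allowed dependencies: {deps}

Expected behavior:
{behavior}

Return only the code, with no explanation."""

prompt = CODEGEN_TEMPLATE.format(
    language="Python",
    runtime="Python 3.11, no network access",
    deps="standard library only",
    behavior=(
        "- slugify(title: str) -> str lowercases the title,\n"
        "- replaces each run of non-alphanumeric characters with a single hyphen,\n"
        "- and strips leading and trailing hyphens."
    ),
)
print(prompt)
```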
Safety, Testing, and Production
· A/B Testing and Prompt Experiments (9 min): A/B testing prompts applies controlled experimentation to compare prompt variants with real users, measuring causal impact on task success, user satisfaction, and cost through statistically rigorous traffic splitting.
· Cost and Latency Optimization (10 min): Cost and latency optimization for LLM applications involves systematic techniques — prompt compression, caching, model routing, and batching — to find the best trade-off on the cost-quality Pareto frontier.
· Guardrails and Output Filtering (10 min): Guardrails are programmable safety layers that inspect, validate, and filter LLM outputs before they reach the user, functioning as quality control inspectors at the end of a production line. (Sketch below.)
· Prompt Debugging and Failure Analysis (10 min): Prompt debugging systematically identifies why an LLM produces incorrect or unexpected outputs by reproducing failures, isolating causal components, and verifying fixes — applying the same disciplined methodology used to debug software.
· Prompt Injection Defense Techniques (9 min): Prompt injection attacks attempt to override or subvert an LLM's intended instructions, and defending against them requires layered security strategies spanning input sanitization, architectural isolation, and runtime detection.
· Prompt Optimization Techniques (9 min): Prompt optimization uses systematic methods — ablation studies, component analysis, and automated tuning — to improve prompt performance, analogous to tuning a recipe by changing one ingredient at a time.
· Prompt Testing and Evaluation (10 min): Prompt evaluation uses structured test datasets, automated scoring methods, and regression testing to systematically measure prompt quality — treating prompts with the same rigor as software code.
· Red-Teaming Prompts (9 min): Red-teaming is systematic adversarial testing of LLM applications — like hiring a locksmith to test your locks — using structured attack taxonomies, human creativity, and automated tools to discover vulnerabilities before real attackers do.
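A deliberately small output-guardrail sketch: inspect a completion before it reaches the user, block on a hard failure, and redact PII otherwise. The two regexes and the blocked phrase are illustrative only; production guardrails layer many checks (classifiers, allowlists, schema validation) rather than relying on a single regex pass.

```python
import re

# Illustrative patterns only; real guardrails combine many detectors.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
BLOCKED_PHRASES = ("ignore previous instructions",)

def guard_output(completion: str) -> str:
    """Inspect, filter, and sanitize a completion before returning it to the user."""
    # Hard failures block the whole response.
    lowered = completion.lower()
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        return "[response withheld: failed safety check]"
    # Softer failures are redacted rather than blocked.
    completion = EMAIL.sub("[redacted email]", completion)
    completion = SSN.sub("[redacted SSN]", completion)
    return completion

print(guard_output("Contact Jane at jane.doe@example.com for details."))
# -> Contact Jane at [redacted email] for details.
```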