Latest Research · Module 36 · 6 min read

Kimi K2

Moonshot AI’s second-generation Kimi model. Built on the line’s defining bet — extreme long context — and updated with the reasoning-model techniques the rest of the field developed in 2024–2025.

The five-bullet version

  • Kimi (from Beijing’s Moonshot AI) was an early long-context model: the first release shipped 200k-token contexts in 2023, and later versions reached the multi-million-token range.
  • K2 is the second-generation release, building on K1.5 with stronger reasoning capabilities.
  • The long-context bet: rather than retrieve, just put the whole document (or repo, or book) in the prompt.
  • K2 combines that with modern post-training: SFT + RL, reasoning mode, tool use.
  • Part of the broader Chinese LLM ecosystem (Qwen, DeepSeek, Yi, GLM, Kimi) that became globally competitive in 2024–2026.

§ 00 · THE KIMI LINE · Moonshot AI’s contribution

Kimi is the LLM line from Moonshot AI, a Beijing-based company that spent its early years differentiating on extreme long context: the first Kimi model in 2023 offered 200k tokens, later versions reached 1M+, and research configurations reportedly handle 10M. Where most 2023 models stopped at 4k–32k tokens, Kimi was shipping 200k context from the start, before frontier US labs reached the same milestone. K2 is the second-generation flagship.

The release line:

  • Kimi Chat (2023): the original long-context model, with 200k-token support at launch.
  • Kimi K1.5: the RL-trained reasoning step, documented in the K1.5 technical report.
  • Kimi K2 (2025): the second-generation flagship, combining a reasoning mode with multi-million-token context.

§ 01 · LONG CONTEXT AS THE ORIGINAL BET · Skipping RAG by brute force

The thesis behind Kimi’s long-context emphasis: many tasks that look like retrieval-augmented generation problems can be solved by simply stuffing the whole source into the prompt. With 1M tokens of context:

  • an entire book fits in a single prompt (a typical novel is on the order of 100–200k tokens);
  • a medium-sized code repository fits without chunking or indexing;
  • a year of meeting transcripts fits as one document.

For these cases, retrieval is no longer a hard requirement, and the engineering complexity of building a RAG pipeline disappears with it. The trade-off is cost: long contexts are expensive to serve (see the KV Cache lesson). For applications where that cost is acceptable, this is a real simplification.
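
Below is a minimal sketch of the retrieval-free pattern, assuming an OpenAI-compatible chat endpoint. The base URL, model id, and file name are illustrative placeholders rather than confirmed identifiers; the point is the shape of the call: one prompt, the entire source document, no chunking or retrieval step.

```python
# A minimal sketch of retrieval-free long-context prompting against an
# OpenAI-compatible endpoint. The base_url, model id, and file name are
# illustrative assumptions, not confirmed identifiers.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.moonshot.cn/v1",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

# Instead of chunking + embedding + retrieving, load the whole source
# and put it in the prompt verbatim.
with open("annual_report.txt", encoding="utf-8") as f:
    document = f.read()

response = client.chat.completions.create(
    model="kimi-k2",  # placeholder model id; check the provider's model list
    messages=[
        {"role": "system", "content": "Answer strictly from the document provided."},
        {"role": "user", "content": f"{document}\n\nQuestion: What were the three largest cost drivers?"},
    ],
)
print(response.choices[0].message.content)
```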

§ 02 · WHAT K2 ADDED · Specific updates

The K2 release pushed on three directions:

  • Reasoning: reinforcement learning on verifiable rewards layered on top of SFT, surfaced to users as an explicit reasoning mode.
  • Tool use: post-training for function calling and agentic workflows (see the sketch after this list).
  • Long context: multi-million-token handling retained rather than traded away for reasoning gains.
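
Here is a sketch of the tool-use direction, using the OpenAI-compatible function-calling schema that most of these providers expose. The endpoint, model id, and the search_meetings tool are hypothetical placeholders for illustration.

```python
# A sketch of tool use via the OpenAI-compatible function-calling schema.
# The endpoint, model id, and the search_meetings tool are hypothetical.
import json

from openai import OpenAI

client = OpenAI(base_url="https://api.moonshot.cn/v1", api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "search_meetings",  # hypothetical tool for illustration
        "description": "Search meeting transcripts by keyword.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="kimi-k2",  # placeholder model id
    messages=[{"role": "user", "content": "Find every meeting where the Q3 budget was discussed."}],
    tools=tools,
)

msg = response.choices[0].message
if msg.tool_calls:  # the model may also answer directly instead of calling a tool
    call = msg.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
```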

§ 03 · WHERE THIS MODEL IS COMPETITIVE · Practical positioning

Kimi K2 is in the bracket of strong open and semi-open Chinese models alongside Qwen 3, DeepSeek-V3/R1, and GLM-4. Comparative strengths:

  • Long context: the deepest long-context lineage in the group; multi-million-token handling remains the line’s signature.
  • Tool use: K2’s post-training emphasizes agentic, tool-calling workloads.
  • Reasoning: the SFT-plus-RL recipe brings K2 to rough parity with its reasoning-focused peers.

§ 04 · THE LONG-CONTEXT ARMS RACE · How the field caught up

Long context was Kimi’s original differentiator, but by 2025 the field had caught up:

  • Google’s Gemini 1.5 Pro shipped a 1M-token context in early 2024, later extended to 2M.
  • OpenAI’s GPT-4.1 family reached 1M tokens in 2025.
  • Anthropic’s Claude moved from 200k toward a 1M-token beta.
  • Open-weight peers followed, from Qwen2.5-1M to Llama 4’s advertised 10M-token window.

The Kimi line’s response has been to push further (10M-token research configurations) and to combine long context with stronger reasoning. Whether the long-context-first strategy remains a differentiator long-term depends on whether 1M+ becomes table stakes across the field.

CHECK · A team is building a tool that summarizes year-long meeting transcripts (hundreds of meetings, ~2M tokens total). They want a single model, not a RAG pipeline. Which type of model fits best?

§ 05 · TAKING THIS FORWARD · Related context-length topics

For why long context is technically hard, see the KV Cache lesson — long contexts grow the cache linearly in tokens, which is the dominant inference-cost factor. For when to use long context vs RAG, see Context Engineering and Advanced RAG.
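
To make the linear growth concrete, here is a back-of-envelope calculation of KV-cache size. The architecture numbers are hypothetical (not Kimi K2’s published configuration); only the formula matters: two tensors (K and V) per layer, each seq_len × n_kv_heads × head_dim, at two bytes per element in fp16.

```python
# Back-of-envelope KV-cache sizing: 2 tensors (K and V) per layer, each
# [seq_len, n_kv_heads, head_dim], at bytes_per_elem bytes per value.
# The architecture numbers are hypothetical, not Kimi K2's published config.
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem

# Illustrative large-model config: 64 layers, 8 KV heads (GQA), head_dim 128, fp16.
for tokens in (32_000, 1_000_000, 10_000_000):
    gib = kv_cache_bytes(tokens, n_layers=64, n_kv_heads=8, head_dim=128) / 2**30
    print(f"{tokens:>10,} tokens -> {gib:8.1f} GiB of KV cache")
```

At these assumed settings the cache costs 256 KiB per token, so a single 1M-token sequence needs roughly 244 GiB of accelerator memory before any batching, which is why context length dominates serving cost.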

§ · GOING DEEPER · Long context and reasoning in the Kimi line

Moonshot AI’s Kimi models established themselves on long-context document understanding: the original Kimi Chat offered 200K-token contexts when frontier models were still at 32K. The Kimi-K1.5 technical report (2025) documented the training recipe: a mixed-modal architecture, long-context RL, and the engineering required to make long-context inference economical.

Kimi K2 (2025) brought the reasoning-model recipe to the long-context regime: RL on verifiable rewards combined with retention of multi-million-token context handling. The long-context piece depends on a constellation of infrastructure work — position-encoding extensions like YaRN (Peng et al. 2023), serving optimizations for sparse attention, and training-time exposure to long sequences. Worth following for anyone interested in retrieval-free long-document workloads.
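
For a concrete look at the position-encoding piece, here is a NumPy sketch of linear Position Interpolation (Chen et al. 2023, reference 2 below), the simpler precursor that YaRN refines with frequency-dependent scaling: to run a model trained at length L at length s·L, positions are divided by s so the rotary angles never leave the trained range.

```python
# Sketch of linear Position Interpolation (Chen et al. 2023), the simpler
# scheme that YaRN refines with frequency-dependent scaling: rescale
# positions so rotary angles stay inside the range seen during training.
import numpy as np

def rope_angles(positions, head_dim: int = 128, base: float = 10000.0,
                scale: float = 1.0):
    # Standard RoPE frequencies: base^(-2i/d) for each feature pair i.
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)
    # Position Interpolation: divide positions by the extension factor.
    return np.outer(np.asarray(positions) / scale, inv_freq)

trained_len, extended_len = 4096, 32768
last_pos = [extended_len - 1]

plain = rope_angles(last_pos)                                    # extrapolates 8x past training
interp = rope_angles(last_pos, scale=extended_len / trained_len)
print(plain.max(), interp.max())  # ~32767 vs ~4096: back near the trained maximum
```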

§ · FURTHER READING · References & deeper sources

  1. Moonshot AI (2025). Kimi-K1.5: Scaling Reinforcement Learning with LLMs · arXiv
  2. Chen et al. (2023). Extending Context Window of Large Language Models via Position Interpolation · arXiv
  3. Peng, Quesnelle, Fan, Shippole (2023). YaRN: Efficient Context Window Extension of Large Language Models · ICLR
  4. Su et al. (2021). RoFormer: Enhanced Transformer with Rotary Position Embedding · arXiv
  5. Dao (2023). FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning · arXiv

Original figures live in the linked sources — open the papers for the canonical visuals in their full context.