Skip to main content
9 – 17 UHR +49 8031 3508270 LUITPOLDSTR. 9, 83022 ROSENHEIM
DE / EN
LLM Moonshot AI China

Moonshot Kimi

Moonshot Kimi K2.6 - Open-source model with 256K context, 300-agent swarm orchestration and SOTA coding (SWE-Bench Pro 58.6%). On Cloudflare Workers AI. Only self-hosting is GDPR-compliant!

License Modified MIT
GDPR Hosting Available
Context 256K (standard), up to 2M (long-context evaluation) Tokens
Modality Text, Image, PDF, Video, Code → Text, Code

Versions

Overview of available model variants

ModelReleaseEUStrengthsWeaknessesStatus
Kimi-K2.6 Recommended
2026-04-20
Long-Horizon Coding & Agentic Workflows Agent Swarm up to 300 Sub-Agents / 4,000 coordinated steps SWE-Bench Pro 58.6% - beats GPT-5.4 (57.7%) and Claude Opus 4.6 (53.4%) Terminal-Bench 2.0: 66.7% (up from 50.8% on K2.5) 256K Context Window, native INT4 quantization Day-0 availability on Cloudflare Workers AI (April 2026) Open-weight under Modified MIT License
Very resource-intensive for full 256K context self-hosting Limited EU cloud options (no AWS Bedrock / Azure / Vertex)
Current
Kimi-K2.5
2026-01-27
Native Multimodal (Image, Video, PDF, Text) 256K Context (up to 2M in long-context eval) MoonViT Vision Encoder State-of-the-art Coding (LiveCodeBench 85.0%) MMLU-Pro 87.1% - Top-Tier Knowledge Agent Swarm up to 100 Sub-Agents / 1,500 Tool Calls 36M+ monthly active users On Cloudflare Workers AI (March 2026)
Very resource-intensive (8x H100 for 256K)
Current
Kimi-K2-Thinking
2025
Chain-of-Thought Reasoning 256K Context Window Agentic Workflows Best Open-Source Thinking Model
Resource-intensive
Current
Kimi-K2-Instruct
2025
General-Purpose Chat 128K Context Window Stable and reliable
Current
Kimi-K2-Base
2025
Foundation Model Fine-tuning possible
Requires Post-Training
Current

Use Cases

Typical applications for this model

Coding & Software Development
Agentic AI Workflows
Long Document Analysis
Multimodal Reasoning Tasks
Self-Hosted Enterprise Deployments
Research & Development
Visual Question Answering

Technical Details

API, features and capabilities

API & Availability
Availability Public
Latency (TTFT) ~1200ms
Features & Capabilities
Tool Use Function Calling Structured Output Vision Reasoning Mode File Upload
Training & Knowledge
Knowledge Cutoff 2025-12
Fine-Tuning Available (LoRA, Full Fine-Tuning, PEFT)
Language Support
Best Quality English, Chinese
Supported Multilingual
Best quality in English and Chinese

Hosting & Compliance

GDPR-compliant hosting options and licensing

GDPR-Compliant Hosting Options
Self-Hosted
Own EU Infrastructure
Recommended for GDPR compliance
License & Hosting
License Modified MIT
Security Filters Customizable
On-Premise

Benchmarks

Performance comparison with standardized tests

LiveCodeBench v6
85.0%
SWE-Bench Verified
76.8%
MMLU-Pro
87.1%
AIME 2025
96.1%
GPQA-Diamond
87.6%
MMMU-Pro
78.5%
MathVision
84.2%
OCRBench
92.3%
OmniDocBench
88.8%

Important Notice for Enterprises: innFactory AI Consulting from Rosenheim recommends Moonshot Kimi exclusively for self-hosted deployments. When using the API, data is processed in China - only self-hosting in your own infrastructure ensures GDPR compliance for the DACH region (Germany, Austria, Switzerland).

Latest Update: Kimi K2.6 (April 2026)

On April 20, 2026, Moonshot AI released Kimi K2.6 as a generally available model - eight days after the K2.6 Code Preview. The new generation significantly outperforms K2.5 on agentic coding and long-horizon workflows:

  • SWE-Bench Pro: 58.6% (vs. 50.7% for K2.5) - beats GPT-5.4 (57.7%) and Claude Opus 4.6 (53.4%)
  • Terminal-Bench 2.0: 66.7% (up from 50.8% for K2.5)
  • BrowseComp (Agent Swarm): 86.3% (up from 78.4% for K2.5)
  • Toolathlon: 50.0% (up from 27.8% for K2.5)
  • Agent Swarm scaling: up to 300 parallel sub-agents and 4,000 coordinated steps
  • Day-0 availability on Cloudflare Workers AI (@cf/moonshotai/kimi-k2.6)
  • 1T parameter MoE architecture, 32B active, 256K context, native INT4 quantization
  • Open-weight on Hugging Face (moonshotai/Kimi-K2.6)

K2.6 is therefore currently our recommended Kimi model for self-hosting and agentic coding pipelines. A successor Kimi K3 (rumored 2.5T+ parameters and 1M context) is announced for Q3 2026 but has not been released as of June 2026.

Key Strengths

Ultra-Long Context (256K Tokens)

Kimi K2.5 offers one of the longest context windows in the open-source space:

  • 256,000 tokens - approximately 640 text pages
  • Ideal for extensive document analysis
  • Enables complex multi-step reasoning over long conversations
  • Surpasses most available open-source LLMs

Mixture-of-Experts Architecture

Innovative MoE architecture for maximum efficiency:

  • 1 trillion parameters total
  • Only 32 billion parameters active per request
  • 384 experts, 8 activated per token
  • Multi-Head Latent Attention (MLA) for efficient processing
  • Stable training through MuonClip optimizer
  • MMLU-Pro: 87.1% - Competitive with leading proprietary models in general knowledge

Native Multimodality (K2.5)

Kimi K2.5 is fully multimodal-trained with outstanding benchmark results:

  • MoonViT Vision Encoder (400M parameters)
  • Processes images, videos, PDFs, and text natively
  • ~15 trillion mixed visual/text tokens in training
  • MMMU-Pro: 78.5% - State-of-the-art visual multimodal understanding
  • MathVision: 84.2% - Mathematics with visual integration
  • OCRBench: 92.3% - Best OCR performance of all tested models
  • OmniDocBench: 88.8% - Excellent document understanding
  • Code generation from UI designs possible

State-of-the-Art Coding

Outstanding performance in programming tasks (2026 results):

  • LiveCodeBench v6: 85.0% pass@1 - highest score among open-source models
  • SWE-Bench Verified: 76.8% success rate (agentic coding) - close to proprietary frontier models
  • Full-stack capabilities: Frontend, Backend, DevOps
  • Debugging and automatic code optimization
  • Precise function/tool calling for API integration
  • AIME 2025: 96.1% (Math Olympiad Level) - excellent mathematical problem-solving

Agentic Intelligence

Specialized in autonomous reasoning workflows:

  • Native tool-calling support
  • Multi-step task decomposition (“think → act → re-think”)
  • Agent Swarm System: Up to 100 parallel sub-agents for complex tasks
  • Nearly 100% precision in API-based tool calls
  • Kimi K2 Thinking: Transparent chain-of-thought processes
  • GPQA-Diamond: 87.6% - Graduate-level reasoning performance

Important Considerations

Privacy Considerations

Important for Enterprises: Moonshot AI is a Chinese company. When using the API, data is processed on servers in China. For sensitive enterprise data, we strongly recommend self-hosting.

  • Chinese Company: Server location China (with API usage)
  • GDPR: No explicit GDPR compliance with API usage
  • No EU Regions: Not available through AWS Bedrock, Azure AI, or Vertex AI
  • Recommendation: Self-hosting in your own EU infrastructure

Self-Hosting as Solution

The open-source model can be operated in your own infrastructure:

  • All data remains under your control
  • No dependency on external APIs
  • Full GDPR compliance possible
  • Hardware Requirements: 8x H100 GPUs for full 256K context (INT4)
  • Support for vLLM, SGLang, TensorRT-LLM deployment

Resource Requirements

Kimi is a resource-intensive model:

  • Full Context (256K): 8x NVIDIA H100 (INT4 quantization)
  • Standard Context (128K): 4-6x high-end GPUs
  • INT4, INT8, and FP16 quantization supported
  • For smaller deployments: Use shorter context windows

Technical Highlights

Model Variants

Moonshot AI offers various specializations:

  • Kimi-K2.6: Current generation, long-horizon coding, 300-agent swarm, 256K context
  • Kimi-K2.5: Multimodal, 256K context, predecessor of K2.6
  • Kimi-K2-Thinking: Reasoning-focused, transparent thought processes
  • Kimi-K2-Instruct: General-purpose chat, 128K context
  • Kimi-K2-Base: Foundation model for custom fine-tuning

Open-Source Licensing

Transparent open-source availability:

  • Modified MIT License - permissive usage
  • Complete model weights on Hugging Face
  • Active community on GitHub
  • Technical reports and benchmarks publicly available

Integration with CompanyGPT

If you wish to operate Kimi in your own infrastructure, we support you with integration into CompanyGPT - our GDPR-compliant chat platform for enterprises.

Availability

API Access:

  • Public API available via platform.moonshot.ai
  • OpenAI-compatible API format
  • Server location: China
  • Not recommended for sensitive EU data

Self-Hosting:

  • Model Weights: Hugging Face (moonshotai/Kimi-K2.6, moonshotai/Kimi-K2.5)
  • GitHub Repository: MoonshotAI/Kimi-K2
  • Deployment via vLLM, SGLang, TensorRT-LLM
  • New: Kimi K2.6 on Cloudflare Workers AI with Day-0 support (since April 20, 2026)
  • Kimi K2.5 on Cloudflare Workers AI (since March 19, 2026)
  • NVIDIA NIM integration available
  • Recommended for GDPR-compliant usage

Our Recommendation

Kimi K2.6 is technically impressive and offers state-of-the-art performance in agentic coding (SWE-Bench Pro 58.6%), long-horizon workflows, and multimodal tasks - currently beating even GPT-5.4 and Claude Opus 4.6 on SWE-Bench Pro. The ultra-long 256K context is unique in the open-source space.

For DACH enterprises, we recommend exclusively self-hosting in your own EU infrastructure. When using the API via platform.moonshot.ai, data is processed in China, which is GDPR-critical.

Best Choice: Self-hosted Kimi-K2.6 for agentic coding pipelines and multimodal applications or Kimi-K2-Thinking for complex reasoning tasks - each in your own infrastructure with full data control.

Alternative: For companies without their own GPU infrastructure, we recommend EU-hosted alternatives like Anthropic Claude or Google Gemini via corresponding EU regions.

Cost estimation for this model

For up-to-date token pricing, model variants and EU availability, see our sister project ai-prices.eu. It helps you compare and estimate the operational cost of leading AI models for your specific use case.

Compare prices on ai-prices.eu

ai-prices.eu is a project by innFactory AI Consulting GmbH and provides transparent cost estimates for leading AI models.

Consultation for this model?

We help you select and integrate the right AI model for your use case.