Important Notice for Enterprises: innFactory AI Consulting from Rosenheim recommends Moonshot Kimi exclusively for self-hosted deployments. When using the API, data is processed in China - only self-hosting in your own infrastructure ensures GDPR compliance for the DACH region (Germany, Austria, Switzerland).

Latest Update: Kimi K3 (July 2026)

On July 16, 2026, Moonshot AI unveiled Kimi K3 – at 2.8 trillion parameters the company’s largest model to date and more than twice the size of the previous flagship K2.6. The MoE architecture activates 16 of 896 experts per token, the context window grows to 1 million tokens, and image input is supported natively.

Artificial Analysis leaderboard: debut at #3 – behind Claude Fable 5 and GPT-5.6 Sol, ahead of Claude Opus 4.8 and GPT-5.5 High in many individual benchmarks
Leader on the Arena.ai frontend code arena
According to Moonshot’s own long-horizon evaluation, Elo 1547 (+732 vs. K2.6) – a vendor figure, to be interpreted with appropriate caution
~21% fewer output tokens than K2.6, currently only one reasoning level (“max”)
API pricing: ~$3.00 / 1M input, ~$15.00 / 1M output – well above K2.x prices

Important for EU deployments: The open weights are not yet available at release (announced by July 27, 2026). Until then, K3 runs exclusively via website and API with data processing in China – meaning there is currently no GDPR-compliant path for K3. For productive EU use, self-hosted Kimi K2.6 remains our recommendation until the K3 weights are published and the hardware requirements for self-hosting are clarified.

Kimi K2.7 Code (June 2026)

On June 12, 2026, Moonshot AI released Kimi K2.7 Code - a coding-focused successor to the flagship Kimi K2.6 (released April 20, 2026). As usual, the model is available as open weights under the Modified MIT License, with a clear focus on software development and agentic coding pipelines.

1T parameter MoE architecture, ~32B active, 384 experts, 256K context window
Forced thinking mode and OpenAI-compatible API
Kimi Code Bench v2: 21.8% - an improvement over K2.6
Pricing: ~$0.95 / 1M input tokens, ~$4.00 / 1M output tokens

Important benchmark caveat: The scores cited above come from Moonshot’s own evaluation suites. No independent third-party verification was available at the release date, so these numbers should be interpreted with appropriate caution.

For EU deployments, self-hosting on your own EU infrastructure via the open weights on Hugging Face remains the cleanest GDPR path. Managed clouds currently offer related models (AWS Bedrock: Kimi K2 Thinking / K2.5, EU via Frankfurt; Azure Foundry: Kimi K2.6 / K2.5 in Preview), but K2.7 Code itself is best run self-hosted for now. As the general flagship, Kimi K2.6 remains our recommendation; K2.7 Code primarily addresses coding use cases.

Kimi K2.6 (April 2026)

On April 20, 2026, Moonshot AI released Kimi K2.6 as a generally available model - eight days after the K2.6 Code Preview. The new generation significantly outperforms K2.5 on agentic coding and long-horizon workflows:

SWE-Bench Pro: 58.6% (vs. 50.7% for K2.5) - beats GPT-5.4 (57.7%) and Claude Opus 4.6 (53.4%)
Terminal-Bench 2.0: 66.7% (up from 50.8% for K2.5)
BrowseComp (Agent Swarm): 86.3% (up from 78.4% for K2.5)
Toolathlon: 50.0% (up from 27.8% for K2.5)
Agent Swarm scaling: up to 300 parallel sub-agents and 4,000 coordinated steps
Day-0 availability on Cloudflare Workers AI (@cf/moonshotai/kimi-k2.6)
1T parameter MoE architecture, 32B active, 256K context, native INT4 quantization
Open-weight on Hugging Face (moonshotai/Kimi-K2.6)

K2.6 is therefore currently our recommended Kimi model for self-hosting and agentic coding pipelines. The successor Kimi K3 was released on July 16, 2026 (see above) – as long as its open weights have not been published, K2.6 remains the first choice for EU operation.

Key Strengths

Ultra-Long Context (256K Tokens)

Kimi K2.5 offers one of the longest context windows in the open-source space:

256,000 tokens - approximately 640 text pages
Ideal for extensive document analysis
Enables complex multi-step reasoning over long conversations
Surpasses most available open-source LLMs

Mixture-of-Experts Architecture

Innovative MoE architecture for maximum efficiency:

1 trillion parameters total
Only 32 billion parameters active per request
384 experts, 8 activated per token
Multi-Head Latent Attention (MLA) for efficient processing
Stable training through MuonClip optimizer
MMLU-Pro: 87.1% - Competitive with leading proprietary models in general knowledge

Native Multimodality (K2.5)

Kimi K2.5 is fully multimodal-trained with outstanding benchmark results:

MoonViT Vision Encoder (400M parameters)
Processes images, videos, PDFs, and text natively
~15 trillion mixed visual/text tokens in training
MMMU-Pro: 78.5% - State-of-the-art visual multimodal understanding
MathVision: 84.2% - Mathematics with visual integration
OCRBench: 92.3% - Best OCR performance of all tested models
OmniDocBench: 88.8% - Excellent document understanding
Code generation from UI designs possible

State-of-the-Art Coding

Outstanding performance in programming tasks (2026 results):

LiveCodeBench v6: 85.0% pass@1 - highest score among open-source models
SWE-Bench Verified: 76.8% success rate (agentic coding) - close to proprietary frontier models
Full-stack capabilities: Frontend, Backend, DevOps
Debugging and automatic code optimization
Precise function/tool calling for API integration
AIME 2025: 96.1% (Math Olympiad Level) - excellent mathematical problem-solving

Agentic Intelligence

Specialized in autonomous reasoning workflows:

Native tool-calling support
Multi-step task decomposition (“think → act → re-think”)
Agent Swarm System: Up to 100 parallel sub-agents for complex tasks
Nearly 100% precision in API-based tool calls
Kimi K2 Thinking: Transparent chain-of-thought processes
GPQA-Diamond: 87.6% - Graduate-level reasoning performance

Important Considerations

Privacy Considerations

Important for Enterprises: Moonshot AI is a Chinese company. When using the API, data is processed on servers in China. For sensitive enterprise data, we strongly recommend self-hosting.

Chinese Company: Server location China (with API usage)
GDPR: No explicit GDPR compliance with API usage
No EU Regions: Not available through AWS Bedrock, Azure AI, or Vertex AI
Recommendation: Self-hosting in your own EU infrastructure

Self-Hosting as Solution

The open-source model can be operated in your own infrastructure:

All data remains under your control
No dependency on external APIs
Full GDPR compliance possible
Hardware Requirements: 8x H100 GPUs for full 256K context (INT4)
Support for vLLM, SGLang, TensorRT-LLM deployment

Resource Requirements

Kimi is a resource-intensive model:

Full Context (256K): 8x NVIDIA H100 (INT4 quantization)
Standard Context (128K): 4-6x high-end GPUs
INT4, INT8, and FP16 quantization supported
For smaller deployments: Use shorter context windows

Technical Highlights

Model Variants

Moonshot AI offers various specializations:

Kimi K3: New flagship (July 2026), 2.8T parameters, 1M context – open weights announced (by July 27, 2026)
Kimi-K2.6: Long-horizon coding, 300-agent swarm, 256K context – current self-hosting recommendation
Kimi-K2.5: Multimodal, 256K context, predecessor of K2.6
Kimi-K2-Thinking: Reasoning-focused, transparent thought processes
Kimi-K2-Instruct: General-purpose chat, 128K context
Kimi-K2-Base: Foundation model for custom fine-tuning

Open-Source Licensing

Transparent open-source availability:

Modified MIT License - permissive usage
Complete model weights on Hugging Face
Active community on GitHub
Technical reports and benchmarks publicly available

Integration with CompanyGPT

If you wish to operate Kimi in your own infrastructure, we support you with integration into CompanyGPT - our GDPR-compliant chat platform for enterprises.

Availability

API Access:

Public API available via platform.moonshot.ai
Kimi K3 available via website and API since July 16, 2026 (~$3.00 / 1M input, ~$15.00 / 1M output)
OpenAI-compatible API format
Server location: China
Not recommended for sensitive EU data

Self-Hosting:

Model Weights: Hugging Face (moonshotai/Kimi-K2.6, moonshotai/Kimi-K2.5)
GitHub Repository: MoonshotAI/Kimi-K2
Deployment via vLLM, SGLang, TensorRT-LLM
New: Kimi K2.6 on Cloudflare Workers AI with Day-0 support (since April 20, 2026)
Kimi K2.5 on Cloudflare Workers AI (since March 19, 2026)
NVIDIA NIM integration available
Recommended for GDPR-compliant usage

Our Recommendation

Kimi K3 marks a leap for open-weight models: #3 on the Artificial Analysis leaderboard, 1M context, and native multimodality. However, as long as the open weights are not published (announced by July 27, 2026), the only access is via the China-based API – making it unusable for DACH enterprises for now. Kimi K2.6 remains technically impressive with state-of-the-art performance in agentic coding (SWE-Bench Pro 58.6%), long-horizon workflows, and multimodal tasks.

For DACH enterprises, we recommend exclusively self-hosting in your own EU infrastructure. When using the API via platform.moonshot.ai, data is processed in China, which is GDPR-critical.

Best Choice: Self-hosted Kimi-K2.6 for agentic coding pipelines and multimodal applications or Kimi-K2-Thinking for complex reasoning tasks - each in your own infrastructure with full data control. Once the K3 weights are released, we will assess its self-hosting suitability (at 2.8T parameters, hardware requirements are likely to be substantial).

Alternative: For companies without their own GPU infrastructure, we recommend EU-hosted alternatives like Anthropic Claude or Google Gemini via corresponding EU regions.

Model	Release	Strengths	Weaknesses	Status
Kimi K3	2026-07-16	2.8T parameter MoE (16 of 896 experts active) - largest Kimi model to date 1M token context window, multimodal input (text + image) Artificial Analysis leaderboard: debut at #3 behind Claude Fable 5 and GPT-5.6 Sol Leader on the Arena.ai frontend code arena ~21% fewer output tokens than K2.6 (according to Moonshot) OpenAI-compatible API	Open weights not yet available at release (announced by July 27, 2026) API pricing well above K2.x: ~$3.00 / 1M input, ~$15.00 / 1M output Only one reasoning level (max), high time-to-first-token in early measurements	Current
Kimi K2.7 Code	2026-06-12	Coding-focused successor to the K2.6 flagship 1T parameter MoE, ~32B active, 384 experts, 256K context Forced thinking mode, OpenAI-compatible API Kimi Code Bench v2: 21.8% (improvement over K2.6) Open weights under Modified MIT License Low pricing: ~$0.95 / 1M input, ~$4.00 / 1M output tokens	Benchmarks come from Moonshot's own eval suites - no independent third-party verification at release K2.7 Code itself not yet on EU managed clouds (best run self-hosted)	Current
Kimi-K2.6 Recommended	2026-04-20	Long-Horizon Coding & Agentic Workflows Agent Swarm up to 300 Sub-Agents / 4,000 coordinated steps SWE-Bench Pro 58.6% - beats GPT-5.4 (57.7%) and Claude Opus 4.6 (53.4%) Terminal-Bench 2.0: 66.7% (up from 50.8% on K2.5) 256K Context Window, native INT4 quantization Day-0 availability on Cloudflare Workers AI (April 2026) Open-weight under Modified MIT License	Very resource-intensive for full 256K context self-hosting Limited EU cloud options (no AWS Bedrock / Azure / Vertex)	Current
Kimi-K2.5	2026-01-27	Native Multimodal (Image, Video, PDF, Text) 256K Context (up to 2M in long-context eval) MoonViT Vision Encoder State-of-the-art Coding (LiveCodeBench 85.0%) MMLU-Pro 87.1% - Top-Tier Knowledge Agent Swarm up to 100 Sub-Agents / 1,500 Tool Calls 36M+ monthly active users On Cloudflare Workers AI (March 2026)	Very resource-intensive (8x H100 for 256K)	Current
Kimi-K2-Thinking	2025	Chain-of-Thought Reasoning 256K Context Window Agentic Workflows Best Open-Source Thinking Model	Resource-intensive	Current
Kimi-K2-Instruct	2025	General-Purpose Chat 128K Context Window Stable and reliable	—	Current
Kimi-K2-Base	2025	Foundation Model Fine-tuning possible	Requires Post-Training	Current

Moonshot Kimi

Versions

Use Cases

Technical Details

Hosting & Compliance

Benchmarks