Versions

Overview of available model variants

Model	Release	Strengths	Weaknesses	Status
MiniMax-M3 Recommended	2026-06-01	1M-token context window Native multimodality (image and video understanding) MSA architecture (MiniMax Sparse Attention) – up to 20x lower per-token compute at 1M context 9x faster prefill, 15x faster decoding vs. predecessor 59.0% on SWE-Bench Pro (beats GPT-5.5 and Gemini 3.1 Pro) Open weights announced (Hugging Face, GitHub)	No native EU cloud availability Official API runs on Chinese infrastructure Open weights not yet final at launch (release within ~10 days announced)	Current
MiniMax-M2.7	2026-05-19	Sparse Mixture-of-Experts (MoE) architecture Optimised for agent teams and complex coding workflows Top performance on SWE-Pro, Terminal Bench 2 and MLE-Bench Lite Native multi-agent collaboration Open source (MIT licence)	Superseded by M3 No native EU cloud availability Official API runs on Chinese infrastructure	Deprecated
MiniMax-M2	October 2025	First generation of the agentic coding model Open source via Hugging Face and GitHub	Superseded by M2.7 and M3	Deprecated

Technical Details

API, features and capabilities

API & Availability

Availability Public

Features & Capabilities

Tool Use Function Calling Structured Output Vision Reasoning Mode Code Execution Web Browsing File Upload

Training & Knowledge

Knowledge Cutoff Early 2026

Fine-Tuning Available (Full Fine-tuning, LoRA)

Language Support

Best Quality English, Chinese

Supported Multilingual

Best quality in English and Chinese

As an AI consultancy from Rosenheim, we support companies across the DACH region (Germany, Austria, Switzerland) with GDPR-compliant integration of open-weight models such as MiniMax-M3. Through self-hosting in EU data centres, the model can be deployed for agentic workflows and long-context tasks in compliance with data protection regulations.

MiniMax-M3: Frontier Coding with 1M Context

MiniMax unveiled MiniMax-M3 on 1 June 2026 – according to the lab, the first open-weight model to combine frontier coding, a 1-million-token context window and native multimodality (image and video understanding) in a single model.

MSA Architecture (MiniMax Sparse Attention)

The new architecture design is the key to its efficiency on long contexts:

Up to 20x lower per-token compute at 1M context compared to the predecessor
More than 9x faster prefill, more than 15x faster decoding
Significantly lower inference cost for long-context workloads

Native Multimodality

M3 understands text, images and video in a unified model – ideal for agentic coding with UI screenshots, document processing and multimodal research.

Coding Performance

59.0% on SWE-Bench Pro – beats GPT-5.5 and Gemini 3.1 Pro, narrowly behind Claude Opus 4.7
Optimised for complex, multi-step software engineering tasks
Well-suited for autonomous agent harnesses

API Pricing

approx. $0.60 / 1M input tokens and $2.40 / 1M output tokens
Launch promotion: 50% off ($0.30 / $1.20)
Substantially cheaper than Claude Opus 4.7 or GPT-5.5 at comparable coding performance

Open Weights

MiniMax has announced that weights will be released on Hugging Face and GitHub within approximately 10 days of launch – enabling self-hosting on your own EU infrastructure.

Previous Generation: MiniMax-M2.7

MiniMax-M2.7 (released May 2026) remains relevant for pure coding-agent workflows without multimodality and with smaller context requirements. It offers:

Sparse Mixture-of-Experts (MoE) architecture
56.22% on SWE-Pro, 57.0% on Terminal Bench 2, 66.6% on MLE-Bench Lite
MIT licence, fully open source

For new projects, however, we recommend going directly with MiniMax-M3 thanks to its 1M context, multimodality and higher coding performance.

EU Deployment Options

Self-Hosting in EU Data Centres

For GDPR compliance, we offer support with:

Deployment on AWS EU regions (Frankfurt, Ireland)
Azure EU regions (West Europe, Germany)
Google Cloud EU regions (Frankfurt, Belgium)
Private cloud or on-premise in your own data centre

Hardware Requirements

MiniMax-M3 and M2.7 are available in various quantisations:

BF16: Full precision for research and benchmarks
FP8: Recommended for production deployment
INT4/INT8: Efficient quantisation for limited resources

Alternative API Access

For rapid prototyping without your own infrastructure:

MiniMax API (platform.minimaxi.com): Official access from MiniMax AI
Third-party providers: Together.ai, Fireworks AI, OpenRouter
Model weights: Hugging Face (MiniMaxAI) and GitHub

Note: Direct API usage occurs via Chinese infrastructure and is not GDPR-compliant without a data processing agreement and self-hosting.

Local or Cloud? Calculate the Cost

For agentic coding workloads, a precise cost calculation between API usage and self-hosting often pays off. The local-vs-cloud AI inference calculator from ai-prices.eu lets you compare hardware, electricity and operating costs against cloud API pricing and determine the break-even point for your workload.

Integration & Support

Our Recommendation

Self-hosting in EU data centres is the best option for GDPR-compliant usage of MiniMax-M3. We support you with:

Infrastructure planning and hardware sizing (especially for 1M-context workloads)
Deployment, quantisation and optimisation
Integration into existing agent frameworks
Compliant usage within CompanyGPT
Fine-tuning for specific use cases

For companies seeking a leading open-weight model with frontier coding, 1M context and native multimodality, MiniMax-M3 is an excellent choice – provided the appropriate infrastructure is available.

Cost estimation for this model

For up-to-date token pricing, model variants and EU availability, see our sister project ai-prices.eu. It helps you compare and estimate the operational cost of leading AI models for your specific use case.

Compare prices on ai-prices.eu

ai-prices.eu is a project by innFactory AI Consulting GmbH and provides transparent cost estimates for leading AI models.

MiniMax-M3

Versions

Use Cases

Technical Details

Hosting & Compliance

Benchmarks