MiniMax M3 Review 2026: 15.6x Faster Decoding

What is MiniMax M3?

MiniMax, the Shanghai-based AI lab backed by Tencent, Alibaba, and miHoYo (the studio behind Genshin Impact), has teased its next-generation model, M3. The company claims M3 decodes 15.6x faster and prefills 9.7x faster than M2 when processing 1M-token contexts [citation:4].

Technical Innovation: MiniMax Sparse Attention (MSA)

The secret sauce behind the M3 teaser is something MiniMax calls MiniMax Sparse Attention (MSA). It is built on a technique called GQA-driven dynamic block selection. Instead of having the model pay attention to every single piece of information in a massive context window, MSA intelligently picks which blocks of data actually matter for a given query. The result is dramatically less compute for roughly the same quality of output [citation:4].
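MiniMax has not published implementation details in the teaser, so the following is only a minimal sketch of what block-selected sparse attention can look like, not MiniMax's actual method. It assumes that "GQA" refers to grouped-query attention (several query heads sharing one key/value head), that each block of keys is summarized by its mean vector, and that the query heads in a group vote on a shared top-k set of blocks. The function names (block_means, select_blocks, sparse_attention), the scoring rule, and the toy data are all hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def block_means(keys, block_size):
    """Summarize each block of keys by its mean vector (an assumed, simple summary)."""
    means = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        dim = len(block[0])
        means.append([sum(k[d] for k in block) / len(block) for d in range(dim)])
    return means

def select_blocks(group_queries, keys, block_size, top_k):
    """Pick top_k key blocks for a group of query heads that share one KV head."""
    means = block_means(keys, block_size)
    # Assumption: sum block scores across the group so all its heads share one selection.
    scores = [sum(dot(q, m) for q in group_queries) for m in means]
    ranked = sorted(range(len(means)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])

def sparse_attention(query, keys, values, block_ids, block_size):
    """Softmax attention for one query, restricted to the selected blocks."""
    idx = [i for b in block_ids
           for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in idx]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, idx)) / total
            for d in range(dim)]

# Toy data (hypothetical): 8 keys in 4 blocks of 2, one scalar value per key.
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0],
        [-1.0, 0.0], [-1.0, 0.0], [0.0, -1.0], [0.0, -1.0]]
values = [[float(i)] for i in range(8)]
group_queries = [[1.0, 0.2], [0.8, 0.4]]  # two query heads sharing one KV head

selected = select_blocks(group_queries, keys, block_size=2, top_k=2)
out = sparse_attention(group_queries[0], keys, values, selected, block_size=2)
```

In this toy setup only half of the keys are ever scored at full resolution: the selection step touches one summary vector per block, and the attention step touches only the keys in the chosen blocks. That is the general source of the compute savings such schemes aim for; how M3 actually scores and selects blocks remains unknown.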

Key Performance Claims

15.6x faster decoding than M2 for 1M-token contexts
9.7x faster prefill than M2 for 1M-token contexts
Maintains output quality comparable to M2
1M-token context window support
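To put the two headline figures in perspective, here is a rough arithmetic sketch of how they would combine into end-to-end latency if both claims held exactly. The baseline timings are hypothetical placeholders chosen for easy arithmetic, not measured M2 numbers, and projected_latency is an illustrative helper rather than anything from MiniMax.

```python
def projected_latency(baseline_prefill_s, baseline_decode_s_per_token, output_tokens,
                      prefill_speedup=9.7, decode_speedup=15.6):
    """Project end-to-end latency if the claimed speedups held exactly."""
    prefill = baseline_prefill_s / prefill_speedup
    decode = baseline_decode_s_per_token * output_tokens / decode_speedup
    return prefill + decode

# Hypothetical baseline, NOT a measured M2 figure:
# 97 s to prefill a 1M-token prompt, 0.156 s per decoded token, 1000 output tokens.
baseline_total = 97.0 + 0.156 * 1000
projected_total = projected_latency(97.0, 0.156, 1000)
overall_speedup = baseline_total / projected_total
```

Whatever baseline is plugged in, the overall speedup lands between 9.7x and 15.6x, closer to 15.6x for decode-heavy workloads and closer to 9.7x for prompt-heavy ones. The 15.6x headline figure therefore describes the decoding phase, not a full request.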

Company Context

MiniMax listed on the Hong Kong Stock Exchange in January 2026. Its backers, Tencent, Alibaba, and miHoYo, represent a cross-section of China's tech and gaming elite [citation:4]. The company previously released the M2 model series (M2, M2.5, and M2.7), accompanied by detailed technical reports on its engineering innovations.

What We Don't Know Yet

Parameter count
Licensing terms
Release date
Full benchmark results

Pricing

Not yet announced.

Pros

Claimed 15.6x faster decoding and 9.7x faster prefill than M2 at 1M-token contexts
Claimed output quality comparable to M2
Backed by major investors
Built on proven M2 foundation
MSA is a novel technical approach

Cons

Teaser only — no release date
No parameter count yet
No independent benchmarks
China-based model (possible data governance concerns for some organizations)

Who Should Watch?

Worth watching for: AI researchers, efficiency-focused developers, and organizations that need fast long-context models.

Verdict

MiniMax M3 represents a potential breakthrough in efficient attention mechanisms. If MSA delivers on its promises, it could significantly reduce inference costs for long-context applications [citation:4].

Rating: N/A (Teaser) - Watch for official release.
