Microsoft unveils Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, setting a new benchmark in efficient small language models (SLMs) with advanced reasoning capabilities.
Microsoft Launches New Phi-4 Reasoning Models
Microsoft has officially launched a new generation of its small language models (SLMs) under the Phi-4 series, introducing Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning. This significant announcement ushers in what the company calls a “new era of AI”, aimed at making powerful reasoning capabilities more accessible, efficient, and scalable, even on low-resource devices.
These models underscore Microsoft’s growing strength in compact yet high-performing AI systems that compete with much larger models in reasoning tasks, problem-solving, and agentic applications.
A Brief Look Back: The Rise of Small Language Models
One year ago, Microsoft introduced its first major venture into small language models with the Phi-3 family. These models, launched via Azure AI Foundry, were born from Microsoft’s foundational research aimed at delivering high-performance AI tools optimized for efficiency.
The goal was clear: expand the range of AI models so that developers, enterprises, and edge systems alike could benefit from powerful AI—without the computational demands of giant models like GPT-4 or Claude Opus.
With the introduction of the Phi-4 Reasoning Models, Microsoft is not only continuing that mission but also redefining what is achievable in the small-model category.
Introducing the Phi-4 Reasoning Models
Microsoft’s new launch includes three variants:
- Phi-4-reasoning
- Phi-4-reasoning-plus
- Phi-4-mini-reasoning
These models represent a significant advancement in reasoning capabilities within a small parameter footprint. They are specifically engineered to handle complex, multi-step cognitive tasks traditionally reserved for large language models.
Let’s break down what makes each of these models unique.

Phi-4-Reasoning: Compact Yet Competitive
Phi-4-reasoning is a 14-billion parameter open-weight model trained using supervised fine-tuning (SFT) on highly curated reasoning datasets, including demonstrations generated by OpenAI’s o3-mini. This model is optimized for:
- Multi-step logical reasoning
- Mathematical problem-solving
- Algorithmic thinking
- Planning and coding tasks
Despite its modest size, Phi-4-reasoning delivers performance that matches or exceeds much larger models, including OpenAI’s o1-mini and DeepSeek-R1-Distill-Llama-70B. On the AIME 2025 test (used to qualify for the USA Math Olympiad), it outperforms even the 671-billion parameter DeepSeek-R1 model.
Available now on Azure AI Foundry and Hugging Face, this model is proof that size is no longer the sole determinant of AI capability.
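For developers who want to try the model right away, here is a minimal sketch using the Hugging Face transformers library. The repository id, chat template usage, and generation settings are assumptions based on typical Hugging Face workflows; check the official model card for the exact identifier and recommended parameters.

```python
# Minimal sketch: running Phi-4-reasoning via Hugging Face transformers.
# The repo id "microsoft/Phi-4-reasoning" is assumed; verify it on the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-reasoning"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 14B parameters; a capable GPU (or quantization) is needed
    device_map="auto",
)

messages = [
    {"role": "user", "content": "A train covers 120 km in 1.5 hours. What is its average speed? Reason step by step."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```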
Phi-4-Reasoning-Plus: More Tokens, More Power
Phi-4-reasoning-plus is an enhanced version of the base reasoning model. It adds a reinforcement learning (RL) stage and takes advantage of 1.5x more tokens during inference, allowing the model to reason more deeply and generate even more accurate responses.
Key highlights:
- Trained with an additional reinforcement learning (RL) stage
- Uses 1.5 times more tokens at inference time
- Delivers superior accuracy compared to both Phi-4-reasoning and larger competing models
Phi-4-reasoning-plus is particularly adept at high-stakes applications such as scientific research, philosophical deduction, and complex data analytics.
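To make the "1.5x more tokens" idea concrete, the sketch below simply gives the plus variant a larger generation budget than the base model would get, leaving room for a longer reasoning trace. The repository id and the specific budgets are illustrative assumptions, not official settings.

```python
# Illustrative sketch: a larger inference-time token budget for Phi-4-reasoning-plus.
# The repo id and budget values are assumptions for demonstration purposes only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-reasoning-plus"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

BASE_BUDGET = 4096                    # hypothetical budget for the base reasoning model
PLUS_BUDGET = int(BASE_BUDGET * 1.5)  # roughly 1.5x more tokens for the plus variant

messages = [{"role": "user", "content": "Prove that the sum of two odd integers is always even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=PLUS_BUDGET)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```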
Phi-4-Mini-Reasoning: Small but Smart
Phi-4-mini-reasoning is designed for resource-constrained environments, such as:
- Mobile devices
- Embedded systems
- Edge computing
- Educational applications
This compact transformer-based model is optimized specifically for mathematical reasoning. It was fine-tuned on synthetic data generated by DeepSeek-R1, comprising over one million diverse math problems ranging from middle school to Ph.D. level.
Ideal use cases include:
- On-device tutoring assistants
- Interactive learning platforms
- Offline AI-powered educational tools
Despite its small size, Phi-4-mini-reasoning delivers strong performance in constrained environments, providing step-by-step explanations and precise answers.
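As a rough illustration of the tutoring use case, the sketch below runs the model on CPU with a step-by-step tutoring prompt. The repository id and prompts are assumptions; real edge deployments would more likely use a quantized build (for example ONNX or GGUF) rather than full-precision weights.

```python
# Minimal on-device tutoring sketch for Phi-4-mini-reasoning running on CPU.
# The repo id "microsoft/Phi-4-mini-reasoning" and the prompts are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-mini-reasoning"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # loads on CPU by default

messages = [
    {"role": "system", "content": "You are a patient math tutor. Show every step, then state the final answer."},
    {"role": "user", "content": "Solve for x: 3x + 7 = 22"},
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```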
A Step Beyond Phi-4: Why These Models Matter
The Phi-4-reasoning models mark a significant leap in model efficiency and reasoning accuracy. Microsoft has proven that smaller models can outperform larger ones—provided they are trained with the right techniques and data.
Here’s what’s different:
| Feature | Phi-4-Reasoning | Phi-4-Reasoning-Plus | Phi-4-Mini-Reasoning |
|---|---|---|---|
| Parameters | 14B | 14B (with RL and inference-time token scaling) | Lightweight (undisclosed) |
| Optimization | Supervised fine-tuning | Reinforcement learning | Synthetic math data |
| Strengths | General reasoning, math | Accuracy, depth | On-device math tutoring |
| Deployment platforms | Azure, Hugging Face | Azure, Hugging Face | Mobile, edge devices |
Phi in Windows 11 and Copilot+ PCs
One of the most exciting integrations of Phi models is within Windows 11 and Copilot+ PCs. A new variant, called Phi Silica, has been optimized to run directly on NPUs (Neural Processing Units) in modern PCs.
Highlights of this integration:
- Preloaded in memory for ultra-fast token response
- OS-managed with low power consumption
- Integrated into native features like Click to Do
- Available in Outlook for offline Copilot summarization
- Available to developers as APIs for third-party applications
These integrations make Phi an everyday AI assistant, capable of operating locally without the need for persistent cloud connectivity.
Built with Responsible AI
Microsoft continues to emphasize responsibility and safety in all its AI endeavors. The Phi-4 family was developed following Microsoft’s six AI principles:
- Accountability
- Transparency
- Fairness
- Reliability & Safety
- Privacy & Security
- Inclusiveness
To uphold these standards, the models were trained using:
- Supervised Fine-Tuning (SFT)
- Direct Preference Optimization (DPO)
- Reinforcement Learning from Human Feedback (RLHF)
These methods ensure that the models are not only intelligent but also safe, unbiased, and ethical in their interactions. Microsoft has also released detailed model cards, highlighting known limitations, performance benchmarks, and usage guidelines.
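For context, Direct Preference Optimization trains the model directly on preference pairs without a separate reward model. The standard objective from the original DPO paper (Rafailov et al., 2023) is shown below as general background; it is not a description of Phi-4's exact training recipe.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim \mathcal{D}}
  \left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]
```

Here π_θ is the model being trained, π_ref is a frozen reference model, y_w and y_l are the preferred and rejected responses for prompt x, and β is a temperature-like scaling factor.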
Performance Benchmarks: The Evidence
Microsoft’s technical report on Phi-4 reasoning models provides extensive quantitative analysis. Highlights include:
- Outperforming DeepSeek-R1-Distill-Llama-70B in math, logic, and science
- Competing with DeepSeek-R1 (671B) in Ph.D.-level problem solving
- Near state-of-the-art performance on reasoning benchmarks like:
- GSM8K (grade school math)
- MATH (advanced math)
- AQUA-RAT (arithmetic reasoning)
- OpenBookQA (science comprehension)
These results show that Microsoft has not only shrunk the model size but also scaled up performance, setting a new benchmark in small model efficiency.
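For readers who want to sanity-check numbers like these on their own prompts, the sketch below shows the kind of exact-match scoring typically used for GSM8K-style benchmarks: extract the final number from the model's answer and compare it with the reference. It illustrates the metric only and is not Microsoft's evaluation harness.

```python
# Rough sketch of GSM8K-style exact-match scoring: compare the final number in the
# model's output with the reference answer. Illustrative only; not an official harness.
import re

def extract_final_number(text: str) -> str | None:
    """Return the last number appearing in a model's answer, if any."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return numbers[-1] if numbers else None

def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions whose final number matches the reference answer."""
    hits = sum(
        extract_final_number(pred) == ref.strip()
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)

# Toy example with a hand-written output; real runs would generate predictions with the model.
preds = ["The train covers 120 km in 1.5 hours, so its speed is 120 / 1.5 = 80 km/h. Answer: 80"]
refs = ["80"]
print(exact_match_accuracy(preds, refs))  # 1.0
```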
Availability and Access
All three models—Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning—are available now:
- Azure AI Foundry
- HuggingFace Model Hub
- Windows 11 Copilot+ PCs (coming soon)
Microsoft is encouraging developers, researchers, and businesses to start exploring the capabilities of these models across their applications.
The Future of Small Language Models
Microsoft’s continued investment in small, powerful, and efficient models signals a clear trend in AI development: smaller is smarter—when done right.
The Phi-4 reasoning models:
- Break the myth that bigger is always better
- Enable edge-first and offline-first AI applications
- Make AI more inclusive and accessible by lowering hardware requirements
- Create possibilities for safe and transparent reasoning AI in real-world tasks
Final Thoughts
With the introduction of the Phi-4 reasoning models, Microsoft is not just competing with frontier models like GPT-4 and Claude but redefining the playing field. By optimizing for speed, size, safety, and reasoning, the Phi-4 family sets a new gold standard in practical, scalable, and responsible AI.
Whether you’re a developer working on embedded systems, an educator seeking smart tutoring tools, or an enterprise looking to power intelligent assistants, the Phi-4 reasoning models provide the tools to build AI that is fast, lightweight, and powerful.