Microsoft releases three multimodal AI models to compete with OpenAI and Google
Microsoft AI released three foundational models on April 2: MAI-Transcribe-1 for speech-to-text across 25 languages, MAI-Voice-1 for audio generation, and MAI-Image-2 for video generation. The company positions these models as cheaper alternatives to Google and OpenAI offerings. Models are available on Microsoft Foundry with pricing starting at $0.36 per hour for transcription.
Microsoft AI announced the release of three foundational models on April 2, 2026: MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2. All three are now available on Microsoft Foundry, with the transcription and voice models also available in MAI Playground, a new large language model testing platform launched March 19.
The models were developed by Microsoft's MAI Superintelligence team, led by Microsoft AI CEO Mustafa Suleyman. The team was formed and announced in November 2025.
Model Capabilities and Performance
MAI-Transcribe-1 converts speech to text across 25 languages and is 2.5 times faster than Microsoft's Azure Fast offering, according to the company. Pricing starts at $0.36 per hour.
MAI-Voice-1 generates audio and can produce 60 seconds of output in one second. The model supports custom voice creation. Pricing begins at $22 per 1 million characters.
MAI-Image-2 is a video-generation model. Pricing starts at $5 per 1 million input tokens and $33 per 1 million output tokens.
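To put the listed rates in perspective, the following is a rough sketch of how a team might estimate spend from the starting prices quoted above. The function names and the sample workloads are illustrative assumptions, not part of any Microsoft SDK, and actual Foundry billing may include tiers, minimums, or regional differences that are not covered in this article.

```python
from decimal import Decimal

# Starting prices as quoted in the article (USD). Assumption: flat rates,
# with no volume tiers or minimum charges.
TRANSCRIBE_PER_HOUR = Decimal("0.36")           # MAI-Transcribe-1, per hour of audio
VOICE_PER_MILLION_CHARS = Decimal("22")         # MAI-Voice-1, per 1M characters
IMAGE_PER_MILLION_INPUT_TOKENS = Decimal("5")   # MAI-Image-2, per 1M input tokens
IMAGE_PER_MILLION_OUTPUT_TOKENS = Decimal("33") # MAI-Image-2, per 1M output tokens

MILLION = Decimal(1_000_000)


def transcription_cost(audio_hours):
    """Estimated cost of transcribing the given number of audio hours."""
    return TRANSCRIBE_PER_HOUR * Decimal(str(audio_hours))


def voice_cost(characters):
    """Estimated cost of synthesizing speech for the given character count."""
    return VOICE_PER_MILLION_CHARS * Decimal(characters) / MILLION


def image_cost(input_tokens, output_tokens):
    """Estimated MAI-Image-2 cost for the given input and output token counts."""
    input_part = IMAGE_PER_MILLION_INPUT_TOKENS * Decimal(input_tokens)
    output_part = IMAGE_PER_MILLION_OUTPUT_TOKENS * Decimal(output_tokens)
    return (input_part + output_part) / MILLION


if __name__ == "__main__":
    # Hypothetical monthly workload, chosen only for illustration.
    print("Transcription:", transcription_cost(100))
    print("Voice:", voice_cost(500_000))
    print("Image:", image_cost(200_000, 50_000))
```

At these rates, 100 hours of transcription comes to $36, 500,000 characters of generated speech to $11, and a MAI-Image-2 job with 200,000 input tokens and 50,000 output tokens to $2.65. Decimal arithmetic is used here so that the currency figures are not affected by floating-point rounding.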
Microsoft claims these models are cheaper than comparable offerings from Google and OpenAI, positioning cost as a primary competitive advantage in an increasingly crowded generative AI market.
Microsoft's Dual Strategy
The release reinforces Microsoft's strategy of building proprietary AI capabilities while maintaining its partnership with OpenAI. Microsoft has invested more than $13 billion in OpenAI through a multi-year agreement and integrates OpenAI models across its product portfolio.
According to Suleyman, a recent renegotiation of the Microsoft-OpenAI partnership enabled Microsoft to pursue independent superintelligence research. The company applies the same dual approach to semiconductors, both developing its own chips and purchasing from external suppliers.
"We're building Humanist AI," Suleyman wrote in a blog post. "We have a distinct view when creating our AI models — putting humans at the center, optimizing for how people actually communicate, training for practical use."
Suleyman told VentureBeat that additional models from Microsoft AI will launch soon on Foundry and integrate directly into Microsoft products.
What This Means
Microsoft's move signals confidence in its ability to develop competitive foundation models independently while preserving strategic partnerships. The pricing structure, particularly for the transcription and voice generation offerings, directly targets enterprises evaluating alternatives to established vendors. However, the company's continued reliance on OpenAI suggests that even with substantial internal AI capabilities, Microsoft views OpenAI's technology as complementary rather than redundant. The success of these models will depend on how quickly customers adopt them and on whether real-world performance matches the company's efficiency claims.