Multiverse Computing launches API portal for compressed AI models to reduce cloud dependence
Multiverse Computing, a Spanish startup, has launched a self-serve API portal giving developers direct access to compressed versions of models from OpenAI, Meta, DeepSeek, and Mistral AI. The move targets enterprises seeking to reduce cloud infrastructure dependence and lower compute costs through edge deployment. The company claims its HyperNova 60B 2602 model delivers faster responses at lower cost than the original OpenAI model it was derived from.
Multiverse Computing, a Spanish AI startup, is pushing compressed models into mainstream enterprise use with the launch of a self-serve API portal that gives developers direct access to smaller, optimized versions of models from OpenAI, Meta, DeepSeek, and Mistral AI.
The move addresses a growing concern in the AI industry: dependence on external compute infrastructure. With private company defaults reaching 9.2% — the highest rate in years — VC firm Lux Capital recently warned companies to formalize cloud compute commitments in writing rather than rely on handshake agreements. Multiverse's approach sidesteps this risk by enabling deployment directly on user devices and enterprise infrastructure.
CompactifAI App and Local Deployment
Multiverse simultaneously launched CompactifAI, a ChatGPT-like AI chat application that showcases the capabilities of its compressed models. The app embeds Gilda, a model small enough to run locally and offline on compatible devices. However, the system has practical limitations: older iPhone models lack sufficient RAM and storage, so the app instead routes those requests to cloud-based models through an API layer named Ash Nazg. When requests are routed to the cloud, the privacy advantage of local processing disappears.
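The fallback behavior described above can be sketched as a simple capability check. This is an illustrative model only: the function and field names, and the RAM threshold, are assumptions, not Multiverse's published logic.

```python
# Minimal sketch of a local-vs-cloud routing decision, as described in the
# article. All names and thresholds here are hypothetical.
from dataclasses import dataclass

MIN_RAM_GB = 8  # assumed minimum RAM for running the embedded model


@dataclass
class Device:
    ram_gb: float
    has_local_model: bool


def route_request(device: Device) -> str:
    """Return which backend serves the request: 'local' (offline, private)
    or 'cloud' (API fallback, no local-privacy guarantee)."""
    if device.has_local_model and device.ram_gb >= MIN_RAM_GB:
        return "local"
    return "cloud"


# An older phone without enough RAM falls back to the cloud API,
# which is exactly where the privacy advantage is lost.
older_phone = Device(ram_gb=4, has_local_model=False)
newer_phone = Device(ram_gb=12, has_local_model=True)
```

The key design point is that the privacy property is conditional: it holds only on the `local` path, so any capability shortfall silently changes the privacy posture of the request.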
The app currently has fewer than 5,000 downloads per month, suggesting it serves as a proof-of-concept rather than a primary revenue driver. The real target is enterprises.
Enterprise API Portal and Cost Reduction
The self-serve API portal, launching today, eliminates the need for AWS Marketplace intermediaries and provides real-time usage monitoring — a critical feature for cost-conscious enterprises. CEO Enrique Lizaso stated the portal "gives developers direct access to compressed models with the transparency and control needed to run them in production."
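For developers, "direct access" typically means an HTTP API keyed by model name. Multiverse has not published its endpoint or schema, so the following is a hypothetical sketch assuming an OpenAI-compatible chat-completions format, a common convention for model-serving APIs; the URL, model identifier, and field names are placeholders.

```python
# Hypothetical request builder for a compressed-model API.
# The endpoint URL and schema are assumptions, not Multiverse's documented API.
import json


def build_chat_request(model: str, user_message: str, api_key: str) -> dict:
    """Assemble the URL, headers, and JSON body for a chat-completion call."""
    return {
        "url": "https://api.example.com/v1/chat/completions",  # placeholder
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,  # e.g. a compressed-model identifier
            "messages": [{"role": "user", "content": user_message}],
        }),
    }


req = build_chat_request("hypernova-60b", "Summarize this document.", "sk-...")
```

Because the request shape mirrors the APIs developers already use, swapping a full-size hosted model for a compressed one can reduce to changing the `model` string and the base URL.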
The primary draw remains clear: lower compute costs. Smaller models also offer advantages for specific use cases, particularly agentic coding workflows where AI autonomously completes multistep programming tasks.
Model Performance Claims
Multiverse's latest compressed model, HyperNova 60B 2602, is built on gpt-oss-120b, an OpenAI model released with openly available weights. According to the company, HyperNova 60B 2602 delivers faster responses at lower cost than the original, though independent benchmarks confirming this claim are not yet available.
This positioning aligns with recent trends in the smaller-model space. Mistral AI this week launched Mistral Small 4, claiming simultaneous optimization for general chat, coding, agentic tasks, and reasoning. The narrowing gap between small and large models is driving enterprise adoption.
Customer Base and Funding
Multiverse already serves more than 100 global customers, including the Bank of Canada, Bosch, and Iberdrola. After raising $215 million in Series B funding last year, the company is reportedly raising €500 million (about $540 million) at a valuation exceeding €1.5 billion (about $1.6 billion).
The use cases justifying this valuation extend beyond cost optimization: embedded AI in drones, satellites, and connectivity-constrained environments represents a significant market opportunity that requires true edge deployment rather than cloud fallback.
What This Means
Multiverse's API launch signals that compressed models are moving from research artifacts to production-grade tools. For enterprises, the appeal is clear: lower costs, reduced cloud dependence, and privacy benefits for sensitive workloads. The company's existing customer roster and multi-billion-dollar valuation suggest the market is taking this category seriously. However, Multiverse's performance claims have not yet been confirmed by independent benchmarks, and that gap remains a key uncertainty.
Related Articles
Adobe Firefly now learns custom visual styles from user-uploaded images
Adobe is rolling out custom models for Firefly, allowing creators to train the generative model on 10-30 of their own images to generate new content matching their specific visual style. The feature costs 500 credits per training session and supports three methods: photography style, illustration style, and character consistency.
NVIDIA Nemotron 3 Super now available on Amazon Bedrock with 256K context window
NVIDIA Nemotron 3 Super, a hybrid Mixture of Experts model with 120B parameters and 12B active parameters, is now available as a fully managed model on Amazon Bedrock. The model supports up to 256K token context length and claims 5x higher throughput efficiency over the previous Nemotron Super and 2x higher accuracy on reasoning tasks.
Google AI Studio adds real-time multiplayer game coding with Gemini 3.1 Pro
Google has launched a vibe coding feature in Google AI Studio that converts natural language descriptions into working applications using Gemini 3.1 Pro. The platform now supports real-time multiplayer games and automatically configures databases, authentication, and third-party service integrations through an "Antigravity Agent."
Anthropic adds always-on channels to Claude Code, enabling async AI agent capabilities
Anthropic has added "channels" to Claude Code, enabling Claude to respond to incoming messages, webhooks, and notifications asynchronously without user intervention. The research preview supports Telegram and Discord with custom channel support, running through MCP servers with two-way communication.