
DeepSeek Releases V4-Pro-Base with 1.6 Trillion Parameters

TL;DR

DeepSeek has released DeepSeek-V4-Pro-Base, a 1.6 trillion parameter foundation model now available on Hugging Face. The base model uses BF16 precision for weights and includes support for F8_E4M3, I64, and F32 tensor types.

DeepSeek has released DeepSeek-V4-Pro-Base, a 1.6 trillion parameter foundation model now available on Hugging Face. The base model weights are distributed in BF16 precision format.

Technical Specifications

The model has 1.6 trillion parameters and supports four tensor types: BF16 (bfloat16, a 16-bit floating-point format), F8_E4M3 (8-bit floating point with 4 exponent bits and 3 mantissa bits), F32 (32-bit floating point), and I64 (64-bit integer). The model files are distributed in the Safetensors format.
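Because a Safetensors file begins with an 8-byte little-endian header length followed by a JSON header describing every tensor, the tensor types in a shard can be checked without loading any weights. The following is a minimal sketch using only the Python standard library; the function names are illustrative and do not come from DeepSeek or Hugging Face, and nothing here assumes how the V4-Pro-Base shards are named or organized.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header size as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # The next header_len bytes are a UTF-8 JSON object keyed by tensor name.
        return json.loads(f.read(header_len))


def count_dtypes(header):
    """Count tensors per dtype, skipping the optional __metadata__ entry."""
    return Counter(
        entry["dtype"]
        for name, entry in header.items()
        if name != "__metadata__"
    )
```

Calling `count_dtypes(read_safetensors_header(path))` on a locally downloaded shard would return a counter keyed by dtype strings such as `BF16` or `F32`, which is one way to see how the listed tensor types are distributed within a file.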

As a base model, DeepSeek-V4-Pro-Base is designed for fine-tuning rather than direct deployment. No inference providers have yet added support for hosting the model.
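A rough sense of why hosting a model of this size is demanding comes from simple arithmetic on the parameter count. The sketch below assumes, purely for illustration, that all 1.6 trillion parameters are stored at a single precision; the actual checkpoint mixes several tensor types, so its real on-disk size may differ, and fine-tuning needs additional memory for gradients and optimizer state on top of the weights.

```python
# Bytes per element for the tensor types named in the article.
BYTES_PER_ELEMENT = {"BF16": 2, "F8_E4M3": 1, "F32": 4, "I64": 8}

# Parameter count reported for DeepSeek-V4-Pro-Base.
PARAMS = 1_600_000_000_000


def weight_bytes(num_params, dtype):
    """Storage for num_params values if every one used the given dtype (an assumption)."""
    return num_params * BYTES_PER_ELEMENT[dtype]


for dtype in ("BF16", "F8_E4M3"):
    size = weight_bytes(PARAMS, dtype)
    print(f"{dtype}: {size / 1e12:.1f} TB ({size / 2**40:.2f} TiB)")
# BF16: 3.2 TB (2.91 TiB)
# F8_E4M3: 1.6 TB (1.46 TiB)
```

Under the uniform-BF16 assumption the weights alone come to about 3.2 TB, which gives an upper-end estimate for storage and accelerator memory before any activation or cache overhead.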

Availability

The model is part of a 4-item collection on Hugging Face that has received 225 interactions. DeepSeek has not disclosed context window size, benchmark scores, or pricing information at the time of release.

The model card on Hugging Face does not include training cutoff dates, architecture details, or performance metrics. Download statistics for the first month are not yet available.

What This Means

At 1.6 trillion parameters, DeepSeek-V4-Pro-Base is one of the largest openly available foundation models. The "Pro-Base" designation suggests an untuned variant intended for research and custom fine-tuning rather than production use. The absence of inference provider support and the limited documentation suggest an early-stage release aimed at researchers and developers who will build instruction-tuned or task-specific variants. By parameter count it sits alongside other frontier models, though performance comparisons cannot be made without published benchmarks.
