DeepSeek's surprisingly inexpensive AI model challenges industry giants. The Chinese startup claims to have trained its powerful DeepSeek V3 neural network for a mere $6 million, utilizing only 2048 GPUs, a stark contrast to competitors' significantly higher costs. This seemingly low figure, however, omits substantial expenses such as research, refinement, data processing, and infrastructure.
DeepSeek V3's innovative architecture contributes to its efficiency. Key technologies include Multi-token Prediction (MTP), which trains the model to predict several tokens ahead rather than one at a time; Mixture of Experts (MoE), which splits the network into 256 expert sub-networks and routes each token to only a small subset of them, keeping per-token compute low during training; and Multi-head Latent Attention (MLA), which compresses attention keys and values into a compact latent representation to reduce memory use while preserving accuracy.
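To make the MoE idea concrete, the sketch below shows one common way top-k expert routing can be implemented in PyTorch. It is an illustrative toy under assumed names and sizes (ToyMoE, the dimensions, and the gating details are placeholders for readability), not DeepSeek's actual code; the point it demonstrates is that only the few experts chosen per token run, so compute stays modest even though the total parameter count is large.

```python
# A minimal, hypothetical sketch of top-k expert routing in a Mixture-of-Experts layer.
# Toy sizes for readability; this is not DeepSeek V3's real configuration or code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, dim: int = 64, num_experts: int = 16, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router assigns each token a score for every expert.
        self.router = nn.Linear(dim, num_experts, bias=False)
        # Each expert is a small independent feed-forward block.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim)
        scores = self.router(x)                             # (tokens, experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)   # pick k experts per token
        weights = F.softmax(weights, dim=-1)                 # normalized gate weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e
                if mask.any():
                    # Only the selected experts run, so per-token compute stays small
                    # even though the total parameter count is large.
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(4, 64)
print(ToyMoE()(tokens).shape)  # torch.Size([4, 64])
```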
Contrary to DeepSeek's publicized figures, SemiAnalysis estimates the company commands a massive computational infrastructure of approximately 50,000 Nvidia Hopper GPUs, hardware valued at roughly $1.6 billion, with operational costs reaching $944 million. That scale of investment, coupled with researcher salaries reportedly exceeding $1.3 million annually, undercuts the headline $6 million training-cost claim.
DeepSeek's unusual structure gives it agility and control: the company is a subsidiary of the Chinese hedge fund High-Flyer, owns its own data centers, and operates independently of outside providers. This self-funded approach contrasts with competitors that rely on rented cloud compute. The company's total investment in AI development is estimated to exceed $500 million.
While DeepSeek's success showcases the potential of well-funded independent AI companies, its "budget-friendly" narrative is an oversimplification. The reality points to significant investment, technological breakthroughs, and a highly skilled team as the true drivers of its achievements. Even so, its spending remains far lower than competitors': a reported $5 million to train R1, against the roughly $100 million OpenAI is reported to have spent on GPT-4o. That cost gap remains DeepSeek's key differentiator.