A one-year-old artificial-intelligence (AI) startup born out of a Chinese hedge fund released a powerful AI model on January 20 that is challenging investors’ assumptions about the economics of building such systems.
R1, as the model is called, is an open-source, advanced-reasoning model—the kind that is designed to mimic the way humans think through problems. It was developed by DeepSeek, whose founder, Liang Wenfeng, reportedly accumulated 10,000 NVIDIA graphics processing units (GPUs) while at his quantitative hedge fund, which relied on machine-learning investment strategies. The kicker: DeepSeek says it spent less than US$6 million to train the model that was used as a base for R1—a fraction of the billions of dollars that Western companies such as OpenAI have spent on their foundation models. This detail stunned the market and walloped the share prices of large tech companies and other parts of the burgeoning AI industry on January 27.
The knee-jerk reactions are a reminder that we are still in the early stages of a potential AI revolution and that, however convinced some insiders and onlookers may appear, no one knows with certainty where the path will lead, let alone how many twists and turns the industry will encounter along the way. As more details become available, companies and investors will be able to better assess the broader implications of DeepSeek’s achievement, which may reveal that the initial market reaction was overdone in some cases. However, should DeepSeek’s claims that its methods lead to dramatic improvements in cost efficiency be substantiated, it may actually bode well for the adoption of AI tools over time.
Although the techniques employed by DeepSeek are not entirely new, the company’s execution is considered extraordinary for how sharply it reduced training and inference costs. The company reportedly not only used fewer and less-powerful chips but also trained its model in a matter of weeks. There is considerable debate regarding the accuracy of the roughly US$6 million training-cost figure that has been reported, as well as speculation that DeepSeek may have used rival models such as Meta Platforms’ Llama, Alibaba’s Qwen, and OpenAI’s GPT-4 in a process known as distillation, in which a smaller “student” model is trained to reproduce the outputs of a larger “teacher” model. If so, the true training cost of DeepSeek’s model may be understated.
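To make the distillation idea concrete, the sketch below shows the standard approach in rough form: a student model is trained to match a teacher model’s softened output distribution via a KL-divergence loss. This is a minimal, illustrative PyTorch example with stand-in linear models, a dummy batch, and an assumed temperature value; it is not a description of DeepSeek’s actual pipeline.

```python
import torch
import torch.nn.functional as F

temperature = 2.0  # softens both distributions; a common illustrative value

# Stand-in models: tiny linear layers over a 16-dimensional input and an 8-way output.
teacher = torch.nn.Linear(16, 8)   # plays the role of the larger "teacher" model
student = torch.nn.Linear(16, 8)   # the smaller "student" being distilled
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

inputs = torch.randn(4, 16)  # a dummy batch of 4 examples

with torch.no_grad():
    teacher_logits = teacher(inputs)   # teacher predictions serve as fixed targets
student_logits = student(inputs)

# The student is trained to match the teacher's softened probability distribution
# via KL divergence -- the core mechanism of knowledge distillation.
loss = F.kl_div(
    F.log_softmax(student_logits / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * (temperature ** 2)

loss.backward()
optimizer.step()
```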
There is the risk that more efficient AI models may reduce the growth rate of demand for computing power and, in turn, the pace of investment in AI infrastructure. But while investors seem to have equated improved efficiency with a diminished need for AI infrastructure, the history of technology shows that better performance and lower costs typically lead to wider adoption and faster growth over time. AI could follow the same trajectory.
Among the likely beneficiaries of cheaper, more efficient AI are the software providers building applications on top of these models. As more users discover the advantages of AI, the resulting demand for user-facing software should also benefit the cloud-services companies that provide the computing power and data storage. However, the commoditization of large language models may negatively affect the creators of these models, particularly those that have eschewed open-source technology.
Also, greater computing efficiency may lead to higher volumes of cloud data and thus the need for more data centers and electrical power over time. This would benefit infrastructure providers, so long as higher demand more than offsets price declines.
The field of AI is still evolving, and further advances in the technology and the methods for developing it are likely. Innovation is always disruptive, but on the whole, DeepSeek’s breakthrough would seem to be a good sign for the industry.