About two weeks ago we wrote about a new Chinese AI model that reportedly performed on a par with the leading US models such as OpenAI's o1 but had been produced for a fraction of the cost. It was produced by a company called DeepSeek. We noted that the big five AI companies (Google, Microsoft, Amazon, Oracle, Meta) are collectively spending about $220bn per annum on AI datacentres and plan to continue at this pace for at least two or three years. A lot of this spending is on Nvidia GPUs. The AI datacentres costing billions are needed to train, run and maintain ever larger LLMs.
We speculated at the time that if the DeepSeek model was as good as the early reports suggested, these investments could prove to be a mistake and demand for Nvidia chips could be much lower than forecast.
We noted that the Chinese have a long track record of commoditising products and producing them at the lowest cost. They have done this most recently with electric vehicles (EVs).
Since then, there have been many reports of tests confirming that the DeepSeek LLM's performance is very good; China may be just a few months behind the leading companies. In the last few days, there have been many articles on this subject, with some envisaging LLMs that could run on ordinary laptop computers. There are reports that many US companies have privately stated they are astonished by the progress the Chinese appear to have made.
AI is an important priority for the US. This week the newly returned President Trump announced Stargate, under which SoftBank and MGX will eventually invest up to $500bn in AI infrastructure for OpenAI.
At the end of last week, China announced a strategic plan to invest up to $139bn in AI infrastructure.
In mid-January, we sharply reduced our exposure to large cap US tech and now have 30% of our portfolio in cash. At the time, we were worried about the high valuations and sharply rising bond yields as well as the likely inflationary implications of the tariffs proposed by the Trump government.
At the US market open later today, we will sell our residual position in Nvidia. It has been a very profitable ride, but we are now checking out, at least for now.
The article below, from the Indian Express newspaper, gives some of the background to the DeepSeek story.
DeepSeek: Is this China’s ChatGPT moment and a wake-up call for the US?
Chinese AI lab DeepSeek released two new AI models this month. Their limited use of resources to achieve extraordinary results is making the world take notice
For years, the United States of America has been the undisputed leader in artificial intelligence, not least because it is home to big tech companies such as OpenAI, Anthropic, Google and Meta.
However, January 2025 has changed the game, with China threatening this dominance. The sense of urgency in the Trump administration is palpable. The shift in narrative began a few weeks ago, when Chinese AI lab DeepSeek unveiled its large language model DeepSeek-V3. The biggest takeaway here was that DeepSeek-V3 was built using a fraction of the cost required to assemble the frontier models of OpenAI, Meta, etc.
DeepSeek’s technological feat has surprised observers in Silicon Valley and beyond. The Chinese lab has done something monumental: it has introduced a powerful open-source AI model that rivals the best the US companies can offer. Given that AI companies typically require billions of dollars in investment to train their models, DeepSeek’s innovation is a masterclass in the optimal use of limited resources. It suggests that foresight matters as much as investment, and it shows how necessity can drive innovation in unexpected ways.
China’s emergence as a strong player in AI comes at a time when US export controls have restricted its access to the most advanced NVIDIA AI chips. These controls have also limited the scope for Chinese tech firms to compete with their bigger Western counterparts; consequently, many of these companies turned to downstream applications instead of building proprietary models. Advanced hardware is vital to building AI products and services, and DeepSeek’s breakthrough shows that the US restrictions may not have been as effective as intended.
Under these circumstances, DeepSeek’s rise is a story in itself. The Chinese AI company reportedly spent just $5.6 million to develop the DeepSeek-V3 model, surprisingly low compared with the sums pumped in by OpenAI, Google, and Microsoft. Sam Altman-led OpenAI reportedly spent a whopping $100 million to train its GPT-4 model. DeepSeek, by contrast, trained its breakout model using GPUs that were considered last-generation in the US. Regardless, the results achieved by DeepSeek rival those of much more expensive models such as GPT-4 and Meta’s Llama.
DeepSeek is based in Hangzhou, China, and is led by entrepreneur Liang Wenfeng. Wenfeng, who is also the co-founder of the quantitative hedge fund High-Flyer, has been working on AI projects for a long time. In 2021 he reportedly bought thousands of NVIDIA GPUs, which many viewed as just another quirk of a billionaire. In 2023, however, he launched DeepSeek with the aim of working on artificial general intelligence. In an interview with Chinese media, Wenfeng said his decision was motivated by scientific curiosity, not profit. Reportedly, when he set up DeepSeek, Wenfeng was not looking for experienced engineers; he wanted to work with aspirational PhD students from China’s premier universities, many of whom had published in top journals and won numerous awards. Wenfeng’s ethos and belief system are reflected in DeepSeek’s open-source approach, which has earned admiration from the global AI community.
Setting a new benchmark for innovation
Even as AI companies in the US were harnessing the power of advanced hardware like NVIDIA H100 GPUs, DeepSeek relied on the less powerful H800. This was only possible by deploying inventive techniques to maximise the efficiency of these weaker GPUs. Beyond the hardware, architectural choices such as multi-head latent attention (MLA) and Mixture-of-Experts (MoE) make DeepSeek's models cheaper to train, because they require less compute: an MoE model activates only a small subset of its parameters for each token, while MLA shrinks the memory footprint of attention.
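To see why Mixture-of-Experts saves compute, a minimal sketch helps. This is a toy illustration of the general MoE idea, not DeepSeek's actual implementation; the sizes, names and routing scheme here are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MoE layer: 8 small "expert" networks exist, but only the top-2
# (by router score) are run for each token. Per-token compute therefore
# scales with 2 experts, even though all 8 sets of weights are stored.
N_EXPERTS, TOP_K, D = 8, 2, 16

router = rng.normal(size=(D, N_EXPERTS))           # gating weights
experts = rng.normal(size=(N_EXPERTS, D, D)) / D   # one weight matrix per expert

def moe_forward(x):
    """Map a (D,) token embedding to a (D,) output, running only TOP_K experts."""
    logits = x @ router                     # score every expert for this token
    top = np.argsort(logits)[-TOP_K:]       # pick the TOP_K highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                # softmax over the chosen experts only
    # Only the selected experts' matrices are multiplied; the other
    # N_EXPERTS - TOP_K experts cost nothing for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=D)
out = moe_forward(token)
print(out.shape)
```

In a real model the experts are full feed-forward networks and the router is trained jointly with them, but the economics are the same: parameter count grows with the number of experts, while per-token compute grows only with the number of activated experts.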
DeepSeek-V3 has now surpassed bigger models like OpenAI’s GPT-4, Anthropic’s Claude 3.5 Sonnet, and Meta’s Llama 3.3 on various benchmarks, including coding, solving mathematical problems, and even spotting bugs in code. Even as the AI community was coming to grips with DeepSeek-V3, the lab released yet another model, the reasoning-focused DeepSeek-R1, last week. R1 has outperformed OpenAI’s latest o1 model on several benchmarks, including math, coding, and general knowledge.
DeepSeek is gaining global attention at a time when OpenAI is restructuring itself as a for-profit organisation. The Chinese AI lab has released its AI models as open source, a stark contrast to OpenAI, amplifying its global impact. Because the models are open source, developers have access to DeepSeek’s weights, allowing them to build on the model and even refine it with ease. This open-source approach could mean that Chinese AI technology eventually becomes embedded in the global tech ecosystem, something that so far only the US has achieved.
What is at stake on the global stage?
The runaway success of DeepSeek also raises concerns about the wider implications of China’s AI advancement. While its open-source nature allows for global collaboration, its development under Chinese state regulations could hinder its expansion.
Critics and experts have said that such AI systems are likely to reflect authoritarian views and censor dissent, which was a central concern in the debate around allowing ByteDance’s TikTok in the US. While largely impressed, some members of the AI community have questioned the $6 million price tag for building DeepSeek-V3. Additionally, many developers have pointed out that the model deflects questions about Taiwan and the Tiananmen Square incident.
Now, more than ever, there are questions about whether AI will reflect democratic values and openness, especially when it has been developed in nations led by authoritarian governments.
Why is the US rattled?
On his second day as President of the United States, Donald Trump announced the Stargate Project, a massive $500 billion initiative that brings together tech titans OpenAI, Oracle, and SoftBank. In his address, Trump explicitly said that the US intends to keep an edge over China. The Stargate Project aims to create state-of-the-art AI infrastructure in the US and over 100,000 American jobs. Trump stressed that he wants the US to be the world leader in AI. “This project ensures that the United States will remain the global leader in AI and technology, rather than letting competitors like China gain the edge,” Trump said.
The rushed announcement of the mighty Stargate Project indicates the desperation of the US to maintain its top position. Whether or not DeepSeek spurred any of these developments, the waves the Chinese lab’s AI models are creating in the AI and developer community worldwide are enough to send a message.
Moreover, China’s breakthrough with DeepSeek challenges the long-held notion that the US alone has been spearheading the AI wave, driven by big tech firms like Google, Anthropic, and OpenAI riding on massive investments and state-of-the-art infrastructure. The US’s undisputed AI leadership had suggested to the world that access to massive resources and cutting-edge hardware was essential to success. DeepSeek undermines the assumption that US-based AI companies hold an unassailable advantage over AI firms from other countries. Until last year, many had claimed that China’s AI advancements were years behind the US.