On May 20, after delivering a keynote at Computex 2025 in Taipei, Nvidia CEO Jensen Huang sat down for an exclusive interview with technology analyst Ben Thompson.
In the interview, Huang discussed the series of AI cooperation agreements Nvidia recently signed with Saudi Arabia and the UAE as well as the ban on H20 chip exports to China, and spoke frankly about his concerns with current US chip export-control policy, arguing that the strategy may ultimately weaken America's technological leadership, Nvidia's included.
Huang also elaborated on his view of the global economic landscape, arguing that AI may not only significantly boost global GDP growth but also help the United States ease its trade deficit to some extent.
Huang described the core advantage of Nvidia's full-stack solution: maximizing AI efficiency through deep integration of software and hardware. He explained that modular design gives customers more flexibility, letting them select system components according to their needs rather than buying everything as a package.
He also highlighted the key role of the Dynamo system in improving inference performance. Through this comprehensive approach, Nvidia has built an AI infrastructure platform that spans from chips to software and from training to inference.
01
AI itself constitutes a complete new industry
Powered by AI factories
Q: In past interviews, I could feel how much you wanted the world to understand the potential of GPUs. ChatGPT hadn't even been released then, and now the entire market seems to hang on your financial performance. I know you're in the quiet period before your earnings report, and I won't ask about financials. But I'd like to know: how does it feel to be pushed into such a position, to have become the focus of global technology attention?
Jensen Huang: To be honest, this hasn't affected me much emotionally, but I have always been very clear about one thing: throughout the constant reinvention of Nvidia, driving technological progress and leading the industry forward has always been the core mission of our work. We are determined to stay at the forefront, tackle the most challenging technical problems, and keep creating value for the entire ecosystem.
Today, Nvidia is no longer just a chip design company but one that provides a comprehensive computing platform centered on the data center. We have not only built a full-stack AI platform covering training and inference, but also, for the first time, achieved deep integration of software and hardware alongside modular decoupling, giving the broader ecosystem the flexibility and scalability to participate.
In this year's Computex keynote, I particularly emphasized that what we are building is not just the computer systems the technology industry requires, but infrastructure for the new industrial form of artificial intelligence. AI is not only a technological revolution but also a labor revolution: it significantly amplifies human capability for work, and in emerging fields such as robotics its impact will be even more profound.
More importantly, AI is not just a technological breakthrough; it is itself a huge, brand-new industrial system. This industry will be driven by the infrastructure we call the "AI factory", whose core is the data center built on super-large-scale computing power. We are only beginning to realize that the focus of the era is shifting: in the future, data centers will no longer be mere carriers of cloud computing but true AI factories, and their scale and importance will far exceed what we imagine today.
Q: Microsoft CEO Satya Nadella mentioned a token-processing figure on the latest earnings call, I believe from last quarter. Is that the kind of earnings detail you pay the most attention to?
Jensen Huang: In fact, the number of tokens actually generated far exceeds that figure. Microsoft's number covers only what they generate for third-party customers; the token volume they consume internally is much larger, and the figure also excludes the total generated by OpenAI. So from Microsoft's reported number alone, you can imagine how enormous token generation across the entire ecosystem really is.
02
"AI Diffusion Rules" may backfire on the United States
Q: You recently reached a series of AI cooperation agreements with Saudi Arabia and the UAE. From your perspective, why are these collaborations important? Why did you go in person? What do they mean to you?
Jensen Huang: They personally invited me to attend, and the trip was also to announce two quite large AI infrastructure construction plans: one in Saudi Arabia and the other in Abu Dhabi. The leaders of both countries have realized that they must take part in this AI revolution, and that their countries hold a unique strategic advantage: abundant energy resources.
However, these countries face labor shortfalls; their national development has long been constrained by labor and population size. The emergence of AI gives them a historic opportunity to transform from an energy economy into a "digital labor" and "robot labor" economy.
We participated in establishing a new company in Saudi Arabia called HUMAIN. Its goal is to step onto the world stage, build global AI factories, and attract international companies, including OpenAI, to participate (OpenAI representatives were also present). It is a very significant project.
Q: To some extent, doesn't this amount to a challenge to the AI Diffusion Rule? As I understand it, the rule is particularly strict with these countries: it caps chip export volumes, requires that the chips be controlled by US companies, and in some respects requires reliance on US-based manufacturing. You have spoken out against this rule far more forcefully than in the past. You used to stay out of government policy matters, and now Nvidia has become one of the core technology companies in the world. Have you been able to adapt quickly to this change of role?
Jensen Huang: It's not that I didn't want to participate; there was simply no need to in the past. For most of Nvidia's history, we have focused on developing technology, building the company, cultivating the industry ecosystem, and pushing forward through competition. We are always building supply chains and ecosystems, which is already vast and complex work.
But as soon as the AI Diffusion Rules were introduced, we immediately made our position clear. By now everyone can see it plainly: this policy is completely wrong. It is a fundamental strategic mistake for the United States. If the original intent of the AI Diffusion Rules was to secure US leadership in AI, they could instead backfire and cost us the advantage we started with.
AI is not merely a certain model or a certain layer of software; it is a complete technology stack. That is why, when people talk about Nvidia, they are talking not only about chips but about systems, infrastructure, AI factories, and even the entire deployment framework. AI is integrated across many layers, from the chip layer to the factory layer, infrastructure layer, model layer, and application layer, and every layer is crucial: true competitiveness comes from the complete stack.
If the United States wants to stay ahead in the global AI competition, it must lead at every layer. At a critical moment when competitors are catching up fast and accelerating their plans, choosing to limit the spread of our own technology around the world is undoubtedly shooting ourselves in the foot. In fact, we foresaw this result from the very beginning.
03
It is impossible to prevent China from participating in the AI revolution
DeepSeek is an outstanding example
Q: By "competitors", are you referring to other model developers?
Jensen Huang: China has performed very well in AI. About 50% of the world's AI researchers are Chinese. You cannot stop them from participating in this technological transformation, and you cannot stop them from advancing. Frankly, projects like DeepSeek are outstanding examples. Refusing to even acknowledge that is a form of self-deception, and I cannot accept it at all.
Q: Have the restrictions aimed at them stimulated their technological breakthroughs in certain areas, such as memory management and bandwidth efficiency?
Jensen Huang: Competition is the engine of progress. Companies need competition to push themselves, and so do countries. We certainly stimulated their technological advancement, no doubt about it.
But personally, I had foreseen that China would develop rapidly at every stage of AI. Huawei, for example, is a very powerful, world-class technology company. China's AI researchers and scientists are also world-class. If you have visited the offices of Anthropic, OpenAI, or DeepMind, you will have found plenty of top talent from China. None of this is surprising.
Moreover, the AI Diffusion Rules aim to limit other countries' access to US technology, a policy that was wrong from the start. What we really should be doing is accelerating the spread of American technology worldwide, while there is still time. If the goal is to keep the United States ahead of the world in AI, then this set of rules does exactly the opposite.
The AI Diffusion Rules also ignore the nature of the AI technology stack. The AI stack is like a computing platform: the stronger the platform and the broader its base, the more developers it attracts, the stronger the applications built on it, and the more valuable the platform becomes. Conversely, more developers mean a more prosperous ecosystem and a larger installed base, which in turn attracts still more developers. This positive feedback loop is crucial to any computing platform and is the fundamental reason for Nvidia's success today.
You can't say, "The United States doesn't need to compete in the Chinese market." It is home to half of the world's developers. From a computing-architecture and infrastructure perspective, that kind of decoupling is completely untenable. We should give American companies the opportunity to compete in the Chinese market: narrowing the trade deficit, generating tax revenue for the United States, growing industries, and providing employment. That benefits not only the United States but the healthy development of the global technology ecosystem.
If we choose to walk away and let China build a complete, prosperous local ecosystem with American companies entirely absent, then the United States will no longer dominate that new platform. AI technology is spreading rapidly around the world; if we do not actively compete, what ultimately spreads will be other people's technology and leadership.
Q: I strongly agree. In my view, the current policy logic of restricting chip sales while allowing the other side to obtain all the chipmaking equipment is putting the cart before the horse. We know very well that tracking chips is much harder than tracking equipment. There is a theory that in Washington, some semiconductor equipment makers have been entrenched for years and are skilled at lobbying, while Nvidia has relatively little influence there and so is at a disadvantage in the policy game. Is that theory valid? Do you find it especially difficult to make Washington understand your position?
Jensen Huang: Over the past few years we have indeed put a lot of effort into gradually establishing a presence in Washington. We have only a small team there; companies of our size usually field hundreds of people across public-relations and policy teams, and we have just a handful. But I want to say these people are outstanding. They are not only telling Nvidia's story but helping policymakers understand how chips work, how the AI ecosystem works, and what unintended chain reactions certain policies can trigger.
What we really want is for the United States to win the competition. Every company should hope its own country wins, and every country should hope its own companies win. That is not a misguided wish; it is a good thing. It is good when people win, good when they succeed, and good when they compete. If a country aspires to greatness, we should not be jealous; if a company aspires to excellence, I am not jealous. That motivation pushes everyone to keep moving forward and achieve more. I love seeing people who aspire to excellence.
There is no doubt that China is eager to become a strong country, and there is nothing wrong with that. They should pursue greatness. The AI scientists and researchers I know achieved what they have today because of that ambition. They are genuinely outstanding.
What we have to do is not trip others up but run faster ourselves. Nvidia got where it is today not because we received special treatment, but because we have kept running hard.
I think the mindset you mentioned, "protect yourself by restricting your opponent", will only make the other side stronger, because they are already formidable in their own right.
Q: The Trump administration has banned you from exporting H20 chips to China, and that chip was actually custom-designed around the previous administration's policy framework. Then you were told, "this is not OK." Now they are working on new restrictions. Do you think policymakers are finally realizing that the world is highly interconnected and that an action in one place ripples into another? Are they starting to see that complete decoupling is unrealistic, and that it may be time to return to a more pragmatic, managed approach? Are you optimistic, or braced for the worst?
Jensen Huang: The US president has a vision he wants to achieve. I support him and believe he will ultimately lead the United States to positive outcomes in a respectful way. He will compete hard while striving to find opportunities for cooperation. Of course, I am not in the White House and don't know their internal thinking, but that is my understanding.
As for the H20 ban: we had already cut the Hopper architecture down to the maximum extent possible; everything that could be cut was cut. We took a massive write-off for it, $5.5 billion as I recall. No company in history has ever written off that much inventory. So this additional H20 ban hits us extremely hard, at enormous cost: beyond the direct $5.5 billion loss, we also gave up roughly $15 billion in potential sales and about $3 billion in associated tax revenue.
You should know that annual demand for AI chips in the Chinese market is about $50 billion. Not 50 million, 50 billion dollars. For perspective, that is roughly the annual revenue of the entire Boeing company. Asking us to give up such a market costs us not only profit and revenue but the accompanying ecosystem building and global influence. That price cannot be ignored.
Q: If China eventually builds an alternative to CUDA, will it pose a long-term threat to Nvidia?
Jensen Huang: That's right. Anyone who naively believes that one more round of export controls, banning China from using H20 chips, can prevent their progress in AI is extremely ignorant.
04
AI will drive significant growth in global GDP
Q: When did you truly realize that Nvidia would become an "infrastructure company"?
Jensen Huang: If you look back at my past keynotes, you'll actually find that I started talking about much of what is happening today five years ago. Maybe I wasn't clear enough then, and my language wasn't as precise as it is now, but the direction we have been moving in has always been clear, consistent, and firm.
Q: So when you close every speech by talking about robots, is that actually a five-year preview we should pay close attention to? In other words, not a distant future but a reality within a few years?
Jensen Huang: That's right. I think it really is coming soon, within the next few years.
Something profound is happening across the industry. For the past 60 years we have been in the IT industry, an industry that provides tools and technologies to people. Now, for the first time, we are stepping outside the IT category: everything we used to sell was IT equipment, and now we are entering two new domains, manufacturing and operations.
In manufacturing, we are building robots, or using robotic systems to make other products; in operations, we are providing "digital employees". Global operating expenses plus capital expenditures total roughly $50 trillion, while the entire IT industry is only about $1 trillion. Thanks to AI, we are about to step from that $1 trillion industry into a market roughly 50 times its size.
I believe that although some traditional jobs will be displaced and will indeed disappear, a large number of brand-new jobs will emerge at the same time. Especially as the new form of "agents" takes hold, robotic systems may directly drive real expansion of global GDP.
The logic is actually simple: we face a labor shortage. US unemployment is near historic lows, and you see it everywhere: restaurants can't hire waiters and many factories can't hire workers. In that context, the idea of spending $100,000 a year to "hire" a robot will be accepted without hesitation, because it significantly improves a business's revenue and output capacity.
So my judgment is that over the next five to ten years we may see a substantial expansion of GDP and witness the birth of a brand-new industry, one whose core is producing digital results from systems that generate tokens. That is something the public is only now beginning to understand.
Q: The speeches you gave at Computex 2025 and at GTC last month were completely different in style. My reading is that GTC is aimed at hyperscale cloud providers, while Computex 2025 is aimed at the enterprise IT market. So is enterprise IT your current target?
Jensen Huang: You could say that: enterprise IT, plus agents and robots. The core carrier of enterprise IT is the agent; the core application in manufacturing is the robot. Why does this matter so much? Because it is the starting point of the future ecosystem.
05
Will Dynamo become the AI factory operating system?
Q: In your recent GTC talk, you discussed the limitations of traditional data centers and explained why Nvidia's solution is the more appropriate option. I read that partly as an objection to special-purpose chips (ASICs). On the one hand, you showed Nvidia's complete product roadmap, demonstrating a long-term, clear technical direction; on the other, you discussed the balance of latency and bandwidth and pointed out that GPUs, thanks to their programmability, can flexibly adapt to different types of AI workloads rather than being locked into a single task the way ASICs are. Those dedicated ASICs are built by some of the hyperscale cloud providers themselves. By contrast, Nvidia offers a universal, scalable solution better suited to a rapidly changing AI world.
Jensen Huang: Your understanding is right. I did convey those views, but my intention is not to oppose ASICs; it is to help people understand how the next generation of data centers should be designed. We have been thinking about that question for many years.
The key constraint is that energy in a data center is limited. So if you think of it as an AI factory, the first priority is making every watt of electricity produce as much computing throughput as possible, and the unit we measure that throughput in is tokens. You can produce extremely cheap tokens, such as free inference on open-source models, or high-quality, high-value tokens that users might pay $1,000 or even $10,000 a month for.
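Huang's "throughput per watt" framing can be made concrete with a toy calculation. All the numbers below are illustrative assumptions, not real Nvidia or hardware figures:

```python
# Toy model of the "AI factory" metric: tokens produced per unit of energy.
# Every number here is an illustrative assumption, not a real hardware figure.

def tokens_per_second(num_gpus: int, tokens_per_gpu_s: float) -> float:
    """Aggregate token throughput of a cluster."""
    return num_gpus * tokens_per_gpu_s

def tokens_per_joule(throughput_tok_s: float, power_watts: float) -> float:
    """The efficiency figure that matters when the power budget is fixed."""
    return throughput_tok_s / power_watts

# With a fixed 1 MW power budget, the only way to get more tokens out of
# the factory is to raise efficiency, not to draw more power.
POWER_BUDGET_W = 1_000_000

gen_a = {"tok_per_gpu_s": 500.0, "watts_per_gpu": 1000.0}   # assumed baseline
gen_b = {"tok_per_gpu_s": 2000.0, "watts_per_gpu": 1250.0}  # assumed successor

for name, g in [("gen_a", gen_a), ("gen_b", gen_b)]:
    gpus = int(POWER_BUDGET_W / g["watts_per_gpu"])
    tput = tokens_per_second(gpus, g["tok_per_gpu_s"])
    eff = tokens_per_joule(tput, POWER_BUDGET_W)
    print(f"{name}: {gpus} GPUs, {tput:,.0f} tok/s, {eff:.2f} tok/J")
```

Under these made-up numbers, the successor generation produces more than three times the tokens from the same megawatt, which is exactly the per-watt lens Huang applies to data-center design.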
Q: You also mentioned an "agent worth $100,000" in your speech.
Jensen Huang: Yes. Would I spend $100,000 a year to hire an AI assistant? Absolutely. We hire people at annual salaries of hundreds of thousands or even millions of dollars every day. Spending $100,000 to raise the productivity of an employee earning $500,000 a year is obviously worth it.
The key is that the tokens an AI factory produces vary in quality. You need both a huge volume of cheap tokens and high-value ones. If the chip or system you build can handle only one kind of token, it will sit idle most of the time, wasting computing resources. So the real question is: how do you design a platform that can deliver high-throughput free tokens and also handle high-quality tasks?
If your computing architecture is too fragmented, different task types migrate inefficiently between different chips. If you optimize only for high per-user token rates, overall throughput usually drops; if you chase raw throughput, interactivity suffers and the user experience degrades.
It's easy to optimize along the X-axis or the Y-axis; it is very hard to fill the entire two-dimensional space. And that is exactly what Nvidia is solving with the Blackwell architecture, the FP4 low-precision format, NVLink 72 high-speed interconnect, HBM high-bandwidth memory, and, at the core, the Dynamo disaggregated inference system.
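The two-axis tradeoff Huang describes can be sketched as a Pareto-frontier problem: per-user interactivity on one axis, aggregate throughput on the other. The configurations and numbers below are hypothetical, invented purely to illustrate the shape of the space:

```python
# Sketch of the interactivity-vs-throughput tradeoff: tokens/s per user (X)
# against aggregate tokens/s per megawatt (Y). All configs are hypothetical.

configs = [
    # (name, tok_s_per_user, tok_s_per_mw)
    ("batch_heavy",   20, 9_000_000),   # huge batches: great throughput, sluggish
    ("latency_tuned", 300,  800_000),   # tiny batches: snappy, wasteful
    ("balanced",      150, 5_000_000),  # "filling the 2D space" aims here
    ("dominated",     100, 3_000_000),  # strictly worse than "balanced"
]

def pareto_frontier(points):
    """Keep configs not dominated on both axes by some other config."""
    frontier = []
    for name, x, y in points:
        dominated = any(x2 >= x and y2 >= y and (x2, y2) != (x, y)
                        for _, x2, y2 in points)
        if not dominated:
            frontier.append(name)
    return frontier

print(pareto_frontier(configs))  # ['batch_heavy', 'latency_tuned', 'balanced']
```

A single-purpose chip tends to occupy one corner of this space; Huang's claim is that the Blackwell-plus-Dynamo combination is an attempt to push the whole frontier outward rather than pick a corner.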
Q: Is Dynamo the "data center operating system" you've spoken of?
Jensen Huang: You could say that. Its starting point is that inference in a large language model is not one uniform, constant process; it happens in stages and differs by task.
We break this process down into two main stages:
- Prefill stage: processing the context, the background work of "understanding who you are" and "what you care about";
- Decode stage: generating the actual tokens, which often involves complex computation such as chain-of-thought and retrieval augmentation (RAG).
The decode stage's demand for compute is highly dynamic: sometimes it needs few floating-point operations, sometimes a great many. Dynamo's significance is that it can automatically decompose, distribute, and schedule inference tasks onto the optimal resource nodes across the entire data center.
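The prefill/decode split can be illustrated with a toy scheduler. This is not Nvidia's actual Dynamo API; the class names, pool sizes, and routing rule are all invented for illustration:

```python
# Toy sketch of disaggregated inference in the spirit of what Huang describes:
# prefill and decode stages routed to separate, specialized worker pools.
# This is NOT the real Dynamo API; all names and numbers are invented.
from dataclasses import dataclass, field

@dataclass
class Request:
    req_id: int
    prompt_tokens: int   # work for the prefill (context-processing) stage
    output_tokens: int   # work for the decode (token-generation) stage

@dataclass
class Pool:
    """A group of workers specialized for one inference stage."""
    name: str
    workers: int
    queue: list = field(default_factory=list)

    def submit(self, req_id: int, cost: int):
        self.queue.append((req_id, cost))

    def total_cost(self) -> int:
        return sum(cost for _, cost in self.queue)

# Prefill is compute-bound; decode is memory-bandwidth-bound. Splitting them
# lets each stage run on hardware sized for its own bottleneck.
prefill_pool = Pool("prefill", workers=4)
decode_pool = Pool("decode", workers=12)

def schedule(req: Request):
    """Route each stage of a request to its specialized pool."""
    prefill_pool.submit(req.req_id, req.prompt_tokens)
    decode_pool.submit(req.req_id, req.output_tokens)

# A long-context request and a long-generation request stress different pools.
for r in [Request(1, prompt_tokens=8000, output_tokens=200),
          Request(2, prompt_tokens=500, output_tokens=4000)]:
    schedule(r)

print("prefill load:", prefill_pool.total_cost())  # 8500
print("decode load:", decode_pool.total_cost())    # 4200
```

The point of the sketch is the asymmetry: a long-context request loads the prefill pool, a long-generation request loads the decode pool, and a monolithic system would have to over-provision for both at once.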
Q: From an architectural point of view, is Dynamo the software system that treats the entire data center as a GPU?
Huang Renxun: That’s right. It is essentially the operating system of the AI factory.
Q: How do you view the future of reasoning models? Will they be more used in the agent workflow? Or is it mainly used to generate training data to help the model optimize itself?
Jensen Huang: I think it depends on cost. But in terms of the trend, inference models will become AI's default unit of computing. As hardware and software advance, reasoning will become surprisingly fast.
For example, the Grace Blackwell platform delivers 40 times the performance of the previous generation; the next generation improves by another 40 times; and the models themselves keep getting more efficient. So from today, it is entirely possible that inference speeds up by 100,000 times over the next five years.
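The 100,000x figure is a compounding claim, and the arithmetic can be written out explicitly. The hardware multiplier comes from the interview; the model-efficiency multiplier below is the assumed residual factor needed to reach the stated total:

```python
# The compounding arithmetic behind the "100,000x in five years" estimate.
# The model-efficiency multiplier is an assumed residual, not a quoted figure.

hw_gain_per_generation = 40    # Grace Blackwell vs. the prior generation
generations = 2                # two such hardware steps over ~5 years
model_efficiency_gain = 62.5   # assumed gains from algorithms and software

total_speedup = hw_gain_per_generation ** generations * model_efficiency_gain
print(f"combined speedup: {total_speedup:,.0f}x")  # 40 * 40 * 62.5 = 100,000x
```

The takeaway is that no single factor needs to deliver 100,000x; two hardware generations at 40x each leave only a ~60x gap for algorithmic and software gains to close.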
Today's AI systems already do a mountain of thinking in places you cannot see; they simply don't show you the "thinking". The result is a system of fast thinking: even tasks that once required deep, slow reasoning become extremely fast.
Q: You mentioned that building power infrastructure in the United States is full of difficulties, while in some Gulf states, China, and elsewhere, acquiring and building power is much faster. Are the problems Nvidia solves less urgent in those regions?
Jensen Huang: That's an interesting angle; I hadn't thought of it that way before. But in any country, the size of a data center is always limited, so performance per watt is always critical.
We can do simple math: for a 1 GW data center, the shell, power, land, and operations cost about $30 billion; add compute, storage, networking, and the rest and it is about $50 billion. If an inefficient system forces you to build two facilities for the same performance, the infrastructure cost alone balloons from $30 billion to $60 billion. You have to offset those extra costs with an extremely efficient architecture. In this world, even "free" compute is sometimes not cheap enough.
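The back-of-envelope economics above can be written out as a worked calculation, using the rough figures quoted in the interview:

```python
# Huang's data-center economics as a worked calculation, using the rough
# figures quoted in the interview (1 GW facility).

shell_cost = 30e9   # shell, power, land, and operations
full_cost = 50e9    # the above plus compute, storage, and networking

# If an inefficient architecture needs two facilities for the same total
# performance, the infrastructure portion alone doubles:
inefficient_shell_cost = 2 * shell_cost
extra_overhead = inefficient_shell_cost - shell_cost

print(f"extra infrastructure overhead: ${extra_overhead / 1e9:.0f}B")  # $30B
```

This is why "free" compute can still lose: $30 billion of duplicated shell, land, and power costs dwarfs any savings on the chips themselves.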
06
Nvidia’s full-stack strategy
Q: You have said many times, "I hope you (the customer) buy the full set of Nvidia products, but as long as you buy any part, I'm happy." That sounds very pragmatic, like the CEO of an enterprise software company. If a customer needs to build a complete AI factory, Nvidia's full-stack solution undoubtedly delivers the most benefit, as you say. But many customers don't need the full stack and buy only a portion. Once they start using one part of Nvidia, though, they usually keep using it. So strategically, covering those customers is also very valuable, right?
Jensen Huang: Serving customers that way is simply smart. If you look at Nvidia's go-to-market strategy, we have always built complete end-to-end solutions, because software and hardware must be tightly combined to maximize performance. At the same time, we can decouple software and hardware well, letting customers select just the components they need.
If a customer doesn't want to use our software, no problem. Our systems are designed flexibly enough that if a customer wants to swap out certain components, we can accommodate that.
The Grace Blackwell architecture is now deployed across different clouds around the world. Each cloud provider integrates against our standards but implements it in its own way, and we fit into their systems very smoothly.
This is the real advantage of Nvidia's business model, and it embodies our positioning as a computing-platform company. What we care about most is that customers use at least part of our technology stack: if they choose our computing stack, great; if they choose our networking stack (I value networking as much as computing), also great; if they choose both, even better!
I always believe Nvidia can build the best overall system. If I didn't believe we do it better, something would be wrong with us, and we would have to improve and regain that confidence.
Our company has 36,000 to 38,000 employees, all working together on one thing: building the world's leading accelerated-computing and AI-computing platform. So if a company of only 14 people could do better than us, that would be very painful for me, and we would have to redouble our efforts to catch up.
Q: But you also believe in the power of scale, and to maximize scale you must sell the product the way customers want to buy it.
Jensen Huang: Exactly, that's the key. We have our own preferences, but we serve customers the way they prefer.
07
Gaming: the many roles of GeForce
Q: Only about 10% of your GTC talk was about GeForce, but to us it is still very important. Is it important because everything you do runs on GPUs and everything scales from there? How do you explain the relationship between Nvidia and gaming?
Jensen Huang: What I can say is that without GeForce there would be no RTX PRO, no Omniverse, and none of the pixels we see could be rendered. Robots couldn't work without GeForce, and neither could Newton.
GeForce itself is not the core theme of GTC, since GTC focuses mainly on high-performance computing, enterprise, and AI, and we also have a dedicated game-developer conference. So at GTC, GeForce product launches are not the central focus the way other areas are, but everyone knows GeForce plays a crucial role in everything we do.
Q: Does that mean gamers may not fully realize that GeForce's role now goes far beyond a simple graphics-rendering engine?
Jensen Huang: That's true. We render only one of every ten pixels, which is a shocking number. Imagine I give you a jigsaw puzzle but only one piece in ten, never the remaining nine, and you must find a way to fill them in yourself.
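The "one pixel in ten" ratio can arise from multiplying independent AI techniques. The factor breakdown below is an illustrative assumption, not Nvidia's published pipeline numbers:

```python
# How "we render only one of every ten pixels" can fall out of stacked
# techniques. The specific factor breakdown is an illustrative assumption,
# not Nvidia's published pipeline.

upscale_factor = 4            # e.g. render 1080p, AI-upscale to 4K: 1/4 pixels
generated_per_rendered = 1.5  # assumed AI-generated frames per rendered frame

rendered_fraction = 1 / (upscale_factor * (1 + generated_per_rendered))
print(f"fraction of pixels actually rendered: {rendered_fraction:.2f}")  # 0.10
```

The multiplication is the point: spatial upscaling and frame generation compound, so modest per-technique ratios are enough to leave only a tenth of the final pixels traditionally rendered.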
Q: I'm trying to connect gaming to the other areas you just described. You said Nvidia rigorously separates different modules in its designs, with clear software management and true decoupling. That immediately reminds me of driver issues on Windows. Honestly, that capability is itself one of your core technical advantages.
Jensen Huang: Drivers are indeed very low-level technology, and the work involved is extremely complex. In fact, the abstraction of drivers was itself a revolutionary concept, and Microsoft played a key role in promoting that system. Without that abstraction layer, there would be no Windows ecosystem today. It is precisely by establishing an API abstraction layer that the hardware underneath can keep evolving without affecting the compatibility and stability of the software above.
Our drivers are now open source, but frankly I haven't seen many people really get involved. The reason is that whenever we launch a new GPU, almost everything we did in the old drivers has to be rewritten or replaced. Only a team with Nvidia's engineering capacity can keep driving that system forward; for most companies the task is nearly impossible.
But precisely because we can ship deeply optimized, dedicated drivers for every GPU generation, we can maintain a stable, powerful abstraction and isolation layer. Whether on CUDA or DirectX, developers can build on these platforms with confidence, without worrying about changes in the underlying hardware.
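The abstraction-layer idea Huang describes can be sketched in a few lines: applications target a stable interface while hardware generations change freely underneath. The class and method names below are invented for illustration, not any real driver API:

```python
# Minimal sketch of a driver abstraction layer: applications target a stable
# API while hardware generations change underneath. All names are invented
# for illustration and do not correspond to any real driver interface.
from abc import ABC, abstractmethod

class GraphicsAPI(ABC):
    """The stable contract applications build against (think DirectX/CUDA)."""
    @abstractmethod
    def draw(self, triangles: int) -> str: ...

class GenNDriver(GraphicsAPI):
    def draw(self, triangles: int) -> str:
        return f"gen-N: rasterized {triangles} triangles"

class GenNPlusOneDriver(GraphicsAPI):
    # A new GPU generation: completely different internals, same interface,
    # so applications written against GraphicsAPI keep working unchanged.
    def draw(self, triangles: int) -> str:
        return f"gen-N+1: mesh-shaded {triangles} triangles"

def application(api: GraphicsAPI) -> str:
    # Application code knows nothing about which hardware generation runs it.
    return api.draw(1000)

print(application(GenNDriver()))
print(application(GenNPlusOneDriver()))
```

The application function never changes between generations; only the driver behind the interface does, which is the isolation property Huang credits for the stability of the Windows and CUDA ecosystems.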
Author: Tencent Technology
Source: https://mp.weixin.qq.com/s/FsFcN8XjvGoGIa76OOv7ww
The copyright belongs to the author. For commercial reprints, please contact the author for authorization. For non-commercial reprints, please indicate the source.