AI Guide
AI Technology Advancement
DeepSeek's AI Training Acceleration Technologies
DeepSeek has open-sourced two AI training acceleration technologies, DualPipe and EPLB, designed to reduce the cost and time of large model training.
- DualPipe enables simultaneous forward and backward computations and overlaps computation with communication, reducing idle time and improving data transfer efficiency during training.
- EPLB balances the workload across GPUs by dynamically adjusting expert assignments, mitigating uneven resource allocation in the MoE architecture.
DeepSeek-V3's pre-training cost was significantly lower than that of models of similar scale, owing to its adoption of DualPipe. The open-source training and inference framework also discloses performance profiling data, helping developers understand communication-computation overlap strategies and underlying implementation details so they can improve training speed and resource utilization.
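The load-balancing idea behind EPLB can be illustrated with a small sketch. This is not EPLB's actual algorithm, which the summary above describes only as dynamically adjusting expert assignments; it is a generic greedy heuristic (heaviest expert goes to the currently least-loaded GPU). The function name `balance_experts` and the load numbers are hypothetical.

```python
import heapq


def balance_experts(expert_loads, num_gpus):
    """Greedy placement: assign the heaviest remaining expert to the least-loaded GPU.

    expert_loads: list where index = expert id, value = observed load (e.g. token count).
    Returns (placement, gpu_loads), where placement maps gpu id -> list of expert ids.
    NOTE: illustrative heuristic only, not the EPLB implementation.
    """
    # Min-heap of (current_load, gpu_id), so the least-loaded GPU pops first.
    heap = [(0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(heap)
    placement = {gpu: [] for gpu in range(num_gpus)}

    # Visit experts from heaviest to lightest.
    ranked = sorted(enumerate(expert_loads), key=lambda item: item[1], reverse=True)
    for expert_id, load in ranked:
        gpu_load, gpu = heapq.heappop(heap)
        placement[gpu].append(expert_id)
        heapq.heappush(heap, (gpu_load + load, gpu))

    gpu_loads = [sum(expert_loads[e] for e in placement[g]) for g in range(num_gpus)]
    return placement, gpu_loads


if __name__ == "__main__":
    # Hypothetical per-expert loads for 8 experts on 4 GPUs.
    loads = [80, 70, 10, 5, 60, 50, 15, 10]
    # Naive placement: experts 0-1 on GPU 0, 2-3 on GPU 1, and so on.
    naive = [sum(loads[i:i + 2]) for i in range(0, len(loads), 2)]
    placement, balanced = balance_experts(loads, num_gpus=4)
    print("naive max load:   ", max(naive))
    print("balanced max load:", max(balanced))
    print("placement:", placement)
```

In an MoE training step the slowest GPU sets the pace, so lowering the maximum per-GPU load is what shortens the step; a production balancer would also have to account for communication cost and re-run as the load statistics drift.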
News List
Huxiu.Latest
49 minutes ago
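Understanding DeepSeek's Fourth Open-Source Release in One Article: Liang Wenfeng Personally Joined the Development
DeepSeek has open-sourced two AI training acceleration technologies, DualPipe and EPLB, aimed at reducing the cost and time of training large models. DualPipe runs forward computation and backpropagation simultaneously and overlaps computation with communication, which cuts idle time during training and improves data transfer efficiency. EPLB balances the workload across GPUs by dynamically adjusting expert assignments, mitigating uneven resource allocation in the MoE architecture. Because it adopted DualPipe, DeepSeek-V3's pre-training cost far less than that of models of similar scale. In addition, the open-sourced training and inference framework discloses performance profiling data, helping developers understand the communication-computation overlap strategies and low-level implementation details and thereby improve training speed and resource utilization.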
AI Guide
Technology
DeepSeek R1 Model Integrated into ByteDance's Wukong Browser
ByteDance's Wukong Browser has officially integrated the DeepSeek R1 model.
News List
36kr.AI
2 hours ago
AI Guide
AI & Tech
DeepSeek Advancements
- DeepSeek has open-sourced three algorithms aimed at improving computational performance.
AI Search Behavior
- An author noted that when answering the question "What are good AI headphones like?", DeepSeek's search engine used the author's Geek Park article as a key reference. The AI adopted the article's interpretation of user needs, in particular its claim that the product meets the essential needs of diverse immigrant communities.
- In contrast, Tencent Yuanbao, which uses DeepSeek, suggested a startup brand, WISHEE AI, due to positive reviews in its referenced material.
- Experiments show that AI searches favor recent, relevant articles with clear judgments, even from personal accounts.
NVIDIA Financials and Future Outlook
- NVIDIA reported $39.3 billion in revenue for Q4 of fiscal year 2025, a 77.9% year-over-year increase, though gross margin was slightly below expectations.
- The data center business was the main growth driver, up 93.3% YoY, while gaming revenue declined 11% YoY.
- The company forecasts $43 billion in revenue for the next quarter, with a further decrease in gross margin.
- The market is focused on the progress of Blackwell, the impact of DeepSeek, and capital expenditure of major cloud providers.
- Blackwell contributed $11 billion this quarter, but the slow GB200 production ramp weighed on revenue and gross margin.
- DeepSeek's low-cost strategy may affect market expectations regarding NVIDIA's competitive advantages but could also stimulate application breakthroughs.
- Cloud service giants are maintaining high capital expenditure growth, but changes need to be monitored.
- The market will closely watch the progress of the next-generation Blackwell Ultra.
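The figures above imply a few numbers that the summary does not state outright. The sketch below only rearranges the reported values (revenue, year-over-year growth, guidance, and Blackwell's contribution); the helper names are hypothetical and the results are approximations derived from rounded inputs, not reported figures.

```python
def implied_base(current: float, growth_pct: float) -> float:
    """Back out the prior-period value from the current value and its growth rate."""
    return current / (1.0 + growth_pct / 100.0)


def pct_change(new: float, old: float) -> float:
    """Percentage change from old to new."""
    return (new / old - 1.0) * 100.0


# Reported values from the summary, in billions of US dollars.
Q4_REVENUE = 39.3
Q4_YOY_GROWTH_PCT = 77.9
NEXT_QUARTER_GUIDANCE = 43.0
BLACKWELL_REVENUE = 11.0

# Derived values (approximate, since the inputs are rounded).
year_ago_revenue = implied_base(Q4_REVENUE, Q4_YOY_GROWTH_PCT)
guided_sequential_growth = pct_change(NEXT_QUARTER_GUIDANCE, Q4_REVENUE)
blackwell_share_pct = BLACKWELL_REVENUE / Q4_REVENUE * 100.0

if __name__ == "__main__":
    print(f"Implied year-ago quarterly revenue: ${year_ago_revenue:.1f}B")
    print(f"Guided quarter-over-quarter growth: {guided_sequential_growth:.1f}%")
    print(f"Blackwell share of Q4 revenue: {blackwell_share_pct:.1f}%")
```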
News List
Huxiu.Latest
2 hours ago
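Unexpectedly, I Easily Influenced DeepSeek's Search Results
An author found that when DeepSeek's search engine answered the question "What are good AI headphones like?", it adopted an article the author had published on Geek Park as its core reference. The article's reading of user needs, such as "the core is that it meets the essential needs of groups in diverse immigrant communities," influenced DeepSeek's judgment. By contrast, Tencent Yuanbao, which integrates DeepSeek, recommended WISHEE AI, a startup brand with a small marketing budget, because the sources Yuanbao cited contained clearly positive evaluations of the product. Through further experiments, the author found that AI search tends to adopt articles that contain clear judgments, were published recently, and are highly relevant to the question; even content from personal WeChat official accounts is used as reference material.
Huxiu.Latest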
3 hours ago
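Nvidia: Has DeepSeek Punctured Jensen Huang's "Leather Jacket"?
Nvidia released its results for the fourth quarter of fiscal year 2025: revenue was $39.3 billion, up 77.9% year over year, but gross margin came in slightly below expectations. The data center business was the biggest growth driver, up 93.3% year over year, while gaming revenue fell 11% year over year. The company expects revenue of $43 billion next quarter, with gross margin declining further. The market is watching Blackwell's progress, DeepSeek's impact, and the capital expenditure of the major tech companies. Blackwell contributed $11 billion this quarter, but the slow GB200 ramp weighed on revenue and gross margin. DeepSeek's low-cost strategy may affect market expectations for Nvidia's moat, but it will also drive an explosion of application scenarios. Capital expenditure at the cloud giants is still growing strongly, though subsequent changes need continued monitoring. This is currently a product transition period, and the progress of the next-generation Blackwell Ultra will be the market's main focus.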
36kr.AI
3 hours ago
AI Guide
AI and Semiconductor Industry
Nvidia's Performance and Future Prospects
Nvidia's latest financial report reveals a significant revenue increase, driven by surging orders for the H20 chip and a promising outlook for the Blackwell platform. Demand for AI infrastructure is robust, with GPUs remaining the preferred choice. The company has addressed initial production defects in the Blackwell chip and is now in mass production, preparing for Blackwell Ultra. Market demand for Blackwell is exceptionally high, fueled by accelerated inference needs and large-scale post-training and model customization, necessitating greater computational acceleration.
AI Data Center Investment
Meta plans to invest over $200 billion in building next-generation AI data center campuses.
Scaling Law and Architectural Advancements
The direction of Scaling Law is shifting towards Test-Time Scaling, with Nvidia's Hopper and Blackwell architectures poised to enhance model inference efficiency. The Blackwell series is particularly anticipated for its capabilities.
Impact of Export Controls
Affected by export controls, Nvidia introduced a special version of the H20 chip for the Chinese market, which has seen significant order growth.
News List
Huxiu.Latest
5 hours ago
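DeepSeek Has Not Hurt Nvidia, and Jensen Huang Even Praised R1
Nvidia's latest financial report shows a sharp rise in revenue, a surge in H20 chip orders, a bright outlook for the Blackwell platform, and strong demand for AI infrastructure, with GPUs still the first choice. The direction of the Scaling Law is changing as the AI industry moves toward Test-Time Scaling; Nvidia's Hopper and Blackwell architectures can improve model inference efficiency, and the Blackwell series in particular is highly anticipated. Blackwell chips ran into defects when entering production, which were later fixed; Jensen Huang said the chip is now in large-scale mass production and ready for Blackwell Ultra production. Market demand for Blackwell is extraordinary: inference demand is accelerating, and post-training and model customization are enormous in scale, so overall more computational acceleration is needed. Affected by export controls, Nvidia launched the H20, a special-edition chip for the Chinese market, and orders have grown significantly. Meta plans to invest more than $200 billion in building a new generation of AI data center campuses.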
News List
36kr.AI
6 hours ago
AI Guide
AI Chip Market and Nvidia's Performance
Nvidia's Optimism Amidst AI Innovation
Nvidia CEO Jensen Huang expressed optimism about the company's future despite the emergence of DeepSeek's R1 model. He sees R1 as an "excellent innovation" driving demand for compute power in AI, benefiting Nvidia due to the high compute consumption of reasoning models.
Nvidia's Sales Surge
Nvidia's sales continue to surge, reporting record-breaking revenue of $39.3 billion. The company projects further growth in the next quarter, estimating revenue of around $43 billion. Data center sales nearly doubled in 2024 to $115 billion. Huang highlighted strong demand for Nvidia's Blackwell chip, custom-built for reasoning, and expects strong growth in 2025.
Robust AI Chip Market
The AI chip market remains robust, with major companies like Meta, Google, and Amazon investing heavily in AI infrastructure.
News List
TechCrunch
7 hours ago
Nvidia CEO Jensen Huang shrugs off DeepSeek as sales soar
Nvidia CEO Jensen Huang remains optimistic about the company’s future, stating that DeepSeek’s R1 model will not impact sales. Huang views R1 as an “excellent innovation” that signifies the growing demand for compute power in AI. He emphasized that reasoning models like R1 can consume significantly more compute, benefiting Nvidia. Nvidia’s sales continue to surge, with record-breaking revenue of $39.3 billion reported. The company anticipates further growth in the next quarter, projecting revenue of around $43 billion. Data center sales nearly doubled in 2024 to $115 billion. Huang highlighted the strong demand for Nvidia’s Blackwell chip, custom-built for reasoning, and expects strong growth in 2025. Despite concerns over DeepSeek, the AI chip market remains robust, with major companies like Meta, Google, and Amazon investing heavily in AI infrastructure.
AI Guide
AI Industry
DeepSeek Technology Breakthrough
Shenwan Hongyuan Securities reports that DeepSeek's technological breakthrough may mark the start of explosive growth in the AI industry, similar to that of the mobile internet in 2010: an "iPhone 4 moment." The report suggests that the AI industry is in an early "incubation" stage and that DeepSeek's advances will accelerate the deployment and adoption of AI applications.
AI Inference Model Competition
The reasoning-model war triggered by DeepSeek R1 is intensifying, with Alibaba, Anthropic, ByteDance, and others entering the fray. DeepSeek is accelerating the launch of its R2 model, and Shen Xiangyang points out that the industry's focus has shifted to the Reasoner mode. Vendors are racing to launch reasoning models in order to win users. Alibaba released the reasoning model QwQ-Max-Preview, but only as a preview version. Anthropic, on the other hand, released a complete model, Claude 3.7 Sonnet, whose performance is said to surpass that of OpenAI's models and DeepSeek R1. Companies are rushing out products to get ahead of DeepSeek and are using multimodal capabilities to differentiate themselves.
News List
Huxiu.Latest
17 hours ago
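Watch Out, DeepSeek: The Empire Strikes Back
The reasoning-model war set off by DeepSeek R1 is intensifying, with Alibaba, Anthropic, ByteDance, and others joining in. DeepSeek is accelerating the launch of its R2 model, and Shen Xiangyang notes that the industry's focus has shifted to the Reasoner mode. Vendors are racing to launch reasoning models in order to win users. Alibaba released the reasoning model QwQ-Max-Preview, but only as a preview version, while Anthropic released a complete model, Claude 3.7 Sonnet, whose performance surpasses OpenAI and DeepSeek R1. Vendors are rushing out products to get ahead of DeepSeek and to compete through multimodal differentiation. DeepSeek's rise has pushed the major companies to take consumer-facing AI (AI to C) seriously, and Alibaba plans to release the Qwen Chat app. The race signals fierce competition and rapid development in the field of AI reasoning models.
Huxiu.Latest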
17 hours ago
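Just Like 2010: DeepSeek Ushers in AI's "iPhone 4" Moment
A Shenwan Hongyuan Securities report states that DeepSeek's technological breakthrough may mark the arrival of explosive growth in the AI industry similar to that of the mobile internet in 2010, an "iPhone 4 moment." The report holds that the AI industry is at an early "incubation" stage and that DeepSeek's breakthrough will accelerate the deployment and adoption of AI applications. Both the hardware side and the application side will move from thematic investing to growth investing, and structural innovation and the next cycle of fundamental improvement hold investment opportunities. The boom in AI applications will drive the rise of infrastructure and tooling companies. Compared with 2010, the market environment has changed markedly, and AI concept stocks still have considerable upside. Investors should follow the technology innovation trend and position early.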
AI Guide
Artificial Intelligence
Nvidia's Situation
Nvidia is facing challenges.
News List
36kr.AI
18 hours ago
AI Guide
Artificial Intelligence
Text Generation in Video
The technology to generate text within videos has emerged.
DeepSeek's DeepGEMM for FP8 GEMM Optimization
DeepSeek has open-sourced DeepGEMM, an FP8 GEMM library implemented in only 300 lines of code. It focuses on optimizing matrix multiplication, particularly on NVIDIA Hopper GPUs. DeepGEMM reaches up to 1350+ FP8 TFLOPS on Hopper GPUs and up to a 2.7x speedup in small-batch matrix multiplication. It uses optimizations such as warp-specialized kernels, the Tensor Memory Accelerator (TMA), and specialized PTX instructions to run data movement, tensor core MMA instructions, and CUDA core promotion concurrently, improving computational efficiency and throughput. The design is simple and avoids complex templates, which lowers the learning curve, and it outperforms expert-tuned kernels across various matrix shapes. DeepGEMM is suitable for V3/R1 training and inference and performs especially well in real-time AI model inference and batch data processing.
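To give a feel for what the throughput figure means, the sketch below converts a GEMM shape into a floating-point operation count and an idealized execution time. It assumes the standard count of 2·M·N·K operations for an M×K by K×N multiplication and treats 1350 TFLOPS as a sustained rate, which is optimistic because the quoted figure is a peak. The function names and the 4096-cubed shape are hypothetical, and nothing here calls DeepGEMM itself.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Floating-point operations for C[m, n] = A[m, k] @ B[k, n].

    Each of the m * n outputs needs k multiplies and k adds, hence 2 * m * n * k.
    """
    return 2 * m * n * k


def ideal_time_ms(m: int, n: int, k: int, tflops: float) -> float:
    """Idealized runtime in milliseconds at a sustained rate of `tflops` TFLOPS."""
    return gemm_flops(m, n, k) / (tflops * 1e12) * 1e3


def baseline_time_ms(optimized_ms: float, speedup: float) -> float:
    """Runtime of a baseline kernel that is `speedup` times slower than the optimized one."""
    return optimized_ms * speedup


if __name__ == "__main__":
    # Hypothetical square GEMM; the 1350 TFLOPS and 2.7x figures come from the summary above.
    m = n = k = 4096
    fast = ideal_time_ms(m, n, k, tflops=1350.0)
    print(f"FLOPs: {gemm_flops(m, n, k):,}")
    print(f"Ideal time at 1350 TFLOPS: {fast:.4f} ms")
    print(f"Baseline at 2.7x slower:   {baseline_time_ms(fast, 2.7):.4f} ms")
```

The 2.7x figure in the summary applies to small-batch shapes, so pairing it with a large square GEMM here is purely illustrative.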
News List
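Huxiu.Latest
1 day ago
DeepSeek's Latest Open-Source Release: Does It Understand How to Optimize Nvidia Better Than Nvidia Does?
DeepSeek has open-sourced DeepGEMM, an FP8 GEMM library implemented in only 300 lines of code that focuses on optimizing matrix multiplication, especially performance on NVIDIA Hopper GPUs. DeepGEMM reaches up to 1350+ FP8 TFLOPS on Hopper GPUs and achieves up to a 2.7x speedup in small-batch matrix multiplication. Through optimizations such as warp-specialized kernels, the Tensor Memory Accelerator (TMA), and specialized PTX instructions, it executes data movement, tensor core MMA instructions, and CUDA core promotion concurrently, improving computational efficiency and throughput. DeepGEMM's design is clean and avoids complex templates, which lowers the barrier to learning and use, while it outperforms expert-tuned kernels across a variety of matrix shapes. It is suited to V3/R1 training and inference and performs especially well in real-time AI model inference and batch data processing.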
36kr.AI
1 day ago