AI Guide
Artificial Intelligence
AI Benchmark Debate: xAI vs. OpenAI
A debate has erupted over AI benchmarks after an OpenAI employee accused Elon Musk's xAI of publishing misleading benchmark results for its Grok 3 model. The controversy revolves around xAI's graph showcasing Grok 3's performance on the AIME 2025 math question set, where it seemingly outperformed OpenAI's o3-mini-high model. However, xAI's graph omitted o3-mini-high's score at "cons@64," a majority-vote method that significantly improves benchmark scores. xAI defends its methods, claiming OpenAI has presented similarly misleading charts in the past. Despite the debate, xAI is advertising Grok 3 as the "world's smartest AI."
News List
TechCrunch
12 hours ago
Did xAI lie about Grok 3’s benchmarks?
A debate has emerged regarding AI benchmarks, with an OpenAI employee accusing Elon Musk’s xAI of publishing misleading benchmark results for its Grok 3 model. The controversy centers on xAI’s graph showing Grok 3’s performance on the AIME 2025 math question set, where it appeared to outperform OpenAI’s o3-mini-high model. However, it was pointed out that xAI’s graph omitted o3-mini-high’s score at “cons@64,” a method that significantly boosts benchmark scores. While xAI defends its methods, arguing that OpenAI has also presented misleading charts in the past, critics have presented more comprehensive graphs. Despite the debate, xAI is advertising Grok 3 as the “world’s smartest AI.”
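The disputed metric is easier to see with a toy example. “cons@64” is generally read as consensus@64: sample 64 answers per problem and score the majority answer, which can lift a model’s benchmark number well above its single-attempt accuracy. A minimal sketch, where the answer strings and vote counts are invented for illustration:

```python
from collections import Counter

def consensus_answer(answers):
    """Majority-vote ('consensus') scoring: the most common answer
    among the sampled attempts becomes the model's final answer."""
    counts = Counter(answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# Hypothetical: 64 sampled answers to a single AIME problem.
samples = ["204"] * 40 + ["210"] * 15 + ["198"] * 9
final = consensus_answer(samples)
print(final)  # "204" -- the majority answer wins, even with only ~62% agreement
```

This is why omitting a competitor’s cons@64 score while showing your own best number can make a comparison chart misleading: the two models are being scored under different sampling budgets.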
AI Guide
AI Productivity Tools
1minAI Advanced Business Plan
The 1minAI Advanced Business Plan provides a collection of AI tools designed to enhance productivity and creativity. It supports tasks like writing, image creation, and audio/video processing, utilizing AI models such as GPT-4, Claude, and Gemini. Users gain access to AI assistants for problem-solving, blog post generation, image editing, PDF summarization, and media file processing. This platform consolidates tools, eliminating multiple subscriptions. A lifetime subscription is available for A$156, reduced from A$847.
News List
Mashable
13 hours ago
The ultimate AI toolkit is yours for life for just A$156
The 1minAI Advanced Business Plan offers a suite of AI tools designed to boost productivity and creativity. It assists with tasks like writing, image creation, and audio/video processing, utilizing AI models such as GPT-4, Claude, and Gemini. Users can access AI assistants for problem-solving, generate blog posts, edit images, summarize PDFs, and process media files. The platform consolidates various tools into one, eliminating the need for multiple subscriptions. A lifetime subscription is available for A$156, reduced from A$847, providing ongoing access to the continually updated suite of AI tools.

AI Guide
Technology & Policy
Potential Layoffs at NIST Impacting AI Safety
The National Institute of Standards and Technology (NIST) is planning potential layoffs of up to 500 staffers, which could significantly impact the AI Safety Institute (AISI) and Chips for America. These layoffs, primarily targeting probationary employees, have raised concerns about the future of AISI, especially after the director's departure in February. The institute's future appears uncertain, particularly with the current administration potentially repealing the executive order that established it. Experts warn that these cuts could severely undermine the government's ability to address critical AI safety concerns.
News List
TechCrunch
13 hours ago
US AI Safety Institute could face big cuts
The National Institute of Standards and Technology (NIST) is reportedly planning to lay off as many as 500 staffers, which could significantly impact the AI Safety Institute (AISI) and Chips for America. These layoffs, primarily targeting probationary employees, have raised concerns about the future of AISI, which was established last year to study AI risks and develop safety standards under President Biden’s executive order. The director of AISI already departed in February, and with the current administration potentially repealing the executive order, the institute’s future appears uncertain. Experts warn that these cuts would severely undermine the government’s ability to address critical AI safety concerns at a crucial time.
AI Guide
Artificial Intelligence Advancements
AI Browser Agents
- A new wave of AI-powered browser-use agents is emerging, including OpenAI's Operator and Convergence's Proxy. These agents autonomously navigate websites, retrieve information, and complete transactions. The open question is where the main developer and enterprise use cases lie, possibly in combination with tools like Deep Research.
Google DeepMind's Mixture-of-Depths (MoD) Transformers
- Google DeepMind is implementing Mixture-of-Depths (MoD) Transformers, which leverage conditional computation to determine when and how to expend computation. In MoD, a token can skip middle layers and later be updated via self-attention.
- This method sets a static compute budget by limiting the number of tokens in a sequence that can participate in a block’s computations, using a per-block router to emit a scalar weight for each token, which expresses the router’s preference for that token to participate in a block’s computations or to route around it.
- MoD transformers drag the baseline isoFLOP curve "down and to the right": the optimal MoD transformer achieves a lower loss than the optimal baseline while also having more parameters. A 220M-parameter MoD variant slightly outperforms the isoFLOP-optimal baseline of the same size (also 220M), and is upward of 60% faster to step during training.
- MoD transformers improve on isoFLOP-optimal baseline performance with models that use fewer FLOPs per forward pass. Learned routing mechanisms are sometimes non-causal, using information about the future to determine a given token’s routing decision.
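The routing mechanism in the bullets above can be sketched in a few lines. This is a simplified illustration, not DeepMind's implementation: the linear router, the toy block standing in for attention + MLP, and the sequence and model sizes are assumptions for the example; only the structure (static budget, per-block scalar router, top-k participation, identity bypass) follows the description.

```python
import numpy as np

rng = np.random.default_rng(0)

def mod_block(x, router_w, block_fn, capacity=0.125):
    """Mixture-of-Depths-style routing sketch for one block.

    Only the top-k tokens by router weight pass through `block_fn`;
    the rest route around the block unchanged (identity path).
    """
    seq_len, _ = x.shape
    k = max(1, int(capacity * seq_len))   # static compute budget per block
    weights = x @ router_w                # scalar router weight per token
    top_k = np.argsort(weights)[-k:]      # tokens selected to participate
    out = x.copy()                        # default: route around the block
    # In the real method the block output is scaled by the router weight,
    # which is what lets the router be trained; mirrored here for fidelity.
    out[top_k] = x[top_k] + weights[top_k, None] * block_fn(x[top_k])
    return out, top_k

seq_len, d_model = 16, 8
x = rng.standard_normal((seq_len, d_model))
router_w = rng.standard_normal(d_model)
toy_block = lambda h: np.tanh(h)          # stand-in for attention + MLP

out, routed = mod_block(x, router_w, toy_block, capacity=0.125)
print(len(routed))  # 2 of 16 tokens (12.5% capacity) go through the block
```

With capacity at 12.5%, 87.5% of tokens skip the block entirely, which is where the FLOP savings in the reported results come from.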
OCEAN: A Decentralized AI Assistant
- O.XYZ has launched OCEAN, a decentralized AI assistant that is reportedly faster than ChatGPT, powered by Cerebras CS-3 wafer-scale chips. The Cerebras CS-3 has 900,000 AI-optimized cores and four trillion transistors on a single chip, and scales from one billion to 24 trillion parameters without code changes, with 21PB/s of memory bandwidth. OCEAN adopts a dual approach, serving both individual consumers and enterprises with a voice interaction system and advanced AI agent capabilities.
News List
VentureBeat
17 hours ago
The rise of browser-use agents: Why Convergence’s Proxy is beating OpenAI’s Operator
A new wave of AI-powered browser-use agents is emerging, promising to transform how enterprises interact with the web. These agents, including OpenAI’s Operator and Convergence’s Proxy, can autonomously navigate websites, retrieve information, and even complete transactions – but early testing reveals significant gaps between promise and performance. While consumer examples like ordering pizza or buying game tickets have grabbed headlines, the open question is where the main developer and enterprise use cases lie. More likely, these agents will be used in combination with other tools like Deep Research, letting companies pair more sophisticated research with execution of tasks around the web.
HackerNoon
18 hours ago
O.XYZ Launches OCEAN – Cerebras-Powered AI Engine, 10x Faster Than ChatGPT
O.XYZ has launched OCEAN, a decentralized AI assistant that is reportedly faster than ChatGPT, powered by Cerebras CS-3 wafer-scale chips. OCEAN’s speed and real-time response capabilities are central to its value. The Cerebras CS-3 has 900,000 AI-optimized cores and four trillion transistors on a single chip, and can scale from one billion to 24 trillion parameters without code changes, with 21PB/s of memory bandwidth. OCEAN delivers a user experience that includes a voice interaction system and advanced AI agent capabilities, adopting a dual approach that serves both individual consumers and enterprises.

HackerNoon
22 hours ago
AI Models Are Learning to Prioritize Their Thoughts—And It’s Wildly Effective
Google DeepMind researchers have found that Mixture-of-Depths transformers empirically demonstrate that one can improve on isoFLOP-optimal baseline performance with models that use fewer FLOPs per forward pass. While MoD transformers require fewer FLOPs per forward pass, one cannot forego FLOPs indiscriminately. Learned routing mechanisms are sometimes non-causal; that is, information about the future is used to determine a given token’s routing decision.
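The non-causality point is easy to demonstrate: under top-k routing, whether an early token participates in a block can depend on the router weights of tokens that come after it, which is a problem for autoregressive sampling. A toy illustration, with invented router weights:

```python
import numpy as np

def top_k_mask(weights, k):
    """Boolean mask marking the k largest router weights in a sequence."""
    mask = np.zeros_like(weights, dtype=bool)
    mask[np.argsort(weights)[-k:]] = True
    return mask

# Router weights for a 6-token sequence; the top 2 tokens participate.
w = np.array([0.5, 0.1, 0.2, 0.9, 0.0, 0.0])
print(top_k_mask(w, 2))  # token 0 is in the top-k and participates

# Change only a FUTURE token's weight: token 0 falls out of the top-k,
# so its routing decision depended on information it should not yet have.
w2 = w.copy()
w2[5] = 0.8
print(top_k_mask(w2, 2))  # token 0 no longer participates
```

At training time the full sequence is available, so this is harmless; at sampling time the routing decision must be made without seeing future tokens, which is why the non-causality matters.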

HackerNoon
23 hours ago
What If AI Could Skip the Boring Parts? Google Researchers Just Made It Happen
Google DeepMind has found that MoD transformers drag the baseline isoFLOP curve “down and to the right”. The optimal MoD transformer achieves a lower loss than the optimal baseline and also has more parameters. A 220M-parameter MoD variant slightly outperforms the isoFLOP-optimal baseline (also 220M), but is upward of 60% faster to step during training. Aggressive capacity reduction worked best: performance improved gradually as capacity was reduced to 12.5% of the total sequence, corresponding to 87.5% of tokens routing around blocks, and degraded beyond that.

HackerNoon
23 hours ago
This Clever AI Hack Could Cut Processing Costs in Half
Google DeepMind’s Mixture-of-Depths Transformers work in three steps: set a static compute budget, lower than that of an equivalent vanilla transformer, by limiting the number of tokens in a sequence that can participate in a block’s computations; use a per-block router to emit a scalar weight for each token, expressing the router’s preference for that token to participate in the block’s computations or to route around it; and identify the top-k scalar weights to select the tokens that will participate.

HackerNoon
23 hours ago
New AI Method Lets Models Decide What to Think About
The transformer architecture has become the workhorse of a revolution in practical artificial intelligence, bringing unprecedented capabilities at the cost of expensive training runs and serving procedures. One promising approach is conditional computation, whereby learned mechanisms determine when and how to expend computation. In Google DeepMind’s MoD, unlike in early-exit methods, a token can skip middle layers and then be updated via self-attention with tokens that have gone through all the middle layers.
