Unveiling Grok 4: The Next Generation of AI

07/16/2025, 11:27 AM 6 Min

On July 10, Elon Musk’s xAI held a launch event for Grok 4, which Musk claims is the “smartest AI in the world.” He said that Grok 4 exceeds doctoral-level ability in every academic discipline and unveiled a “Heavy” version built on multi-agent collaboration: several AI agents work on a problem simultaneously and, after reasoning independently, compare and share insights to arrive at the best solution.

According to xAI, Grok 4’s training compute is 100 times that of Grok 2, two generations earlier. Training ran on the Colossus supercomputer, whose 200,000 GPUs xAI says provide ten times the computing power available to other models, and that power is dedicated to reinforcement learning. Musk also revealed that xAI is developing a seventh-generation foundational model aimed at significantly improving video comprehension, and plans to start training video generation models on more than 100,000 GB200 GPUs in the coming weeks. In addition, Musk plans to integrate Grok with Tesla’s Optimus robot so that the AI can interact with the real world and test its hypotheses. He believes this marks the beginning of an “intelligence explosion” era.

However, initial testing of Grok 4 across the internet revealed that its performance may not fully align with Musk’s claims. Let’s take a closer look at the key highlights from the launch event.

Breakthroughs and Comparisons

At the start of the event, Musk proclaimed Grok 4 the world’s smartest AI and drew a parallel between the evolution of artificial intelligence and human development, pointing out that AI can learn and understand far faster than humans. To illustrate, he highlighted Grok 4’s performance on standardized tests: if Grok 4 took the SAT, he said, it would score perfect marks every time, even without prior exposure to the questions. On graduate-level exams such as the GRE, Grok 4 reportedly achieves near-perfect scores across fields including the humanities, languages, mathematics, physics, and engineering.

Musk stressed that the model had not seen these questions before and said that Grok 4’s intelligence exceeds that of nearly all graduate students across disciplines, a remarkable feat. He argued that claims that AI cannot reason are outdated: models can now reason at a superhuman level, and this is just the beginning.

Insights into Training Mechanisms

To explain Grok 4’s impressive capabilities, xAI team members delved into the training mechanisms behind it. Musk highlighted that each iteration from Grok-2 to Grok-3 and now to Grok-4 has increased training volume by an order of magnitude. This means Grok-4’s training volume is a hundred times that of Grok-2, and the trend is expected to continue.
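The compounding here is simple arithmetic: two successive tenfold increases multiply into a hundredfold one. A minimal sketch follows. The function name `cumulative_scale` is purely illustrative, and the factor of 10 is the rough "order of magnitude" quoted at the event, not a measured figure.

```python
def cumulative_scale(per_generation_factor: float, generations: int) -> float:
    """Return the total multiplier after compounding a per-generation factor."""
    return per_generation_factor ** generations

# Grok-2 -> Grok-3 -> Grok-4 is two generational steps of roughly 10x each.
grok4_vs_grok2 = cumulative_scale(10, 2)
print(grok4_vs_grok2)  # 100
```

If the trend continued for one more generation, the same calculation would give a thousandfold increase over Grok-2.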

Tony, one of xAI’s engineers, pointed out that the key difference lies in the allocation of computational resources. He explained that during the transition from Grok-2 to Grok-3, most computing power focused on “pre-training,” aimed at building a knowledge-rich foundational model. The significant leap to Grok-4 involved a substantial investment in reinforcement learning, enhancing the model’s reasoning abilities.

Another core team member, Jimmy, reflected on this transition, likening Grok-2’s capabilities to those of a high school student by today’s standards. The team realized that by refining data carefully and optimizing its computational infrastructure and algorithms, it could push pre-training ten times further, producing a top-tier foundational model. That led to the creation of Colossus, which made possible the ambitious goal of using all 200,000 GPUs to scale up reinforcement learning.

To measure Grok 4’s intelligence, the team introduced an exceptionally challenging benchmark known as HLE (Humanity’s Last Exam), comprising 2,500 questions spanning mathematics, the natural sciences, engineering, and the humanities and social sciences. Each question is at the level of doctoral research. Musk noted that most models on the market achieved only single-digit accuracy at launch, and he estimated that the best human performance would barely reach 5%.

Groundbreaking Demonstrations

During the event, the team showcased Grok 4’s capabilities through real-time demonstrations. Engineer Eric presented Grok 4 Heavy solving a challenging HLE math problem, while another engineer, Abby, demonstrated Grok 4’s ability to use tools to generate a visualization of two colliding black holes. The model not only indicated its simplifications for visual clarity but also referenced a university textbook on gravitational wave models during its reasoning process.

Musk expressed excitement about the future possibilities, stating that by the end of next year, Grok could potentially discover new useful technologies and even new physics. He acknowledged that while the model sometimes lacks common sense and hasn’t yet invented new technologies, he views this as a matter of time.

The Future of AI

As Grok 4’s capabilities grow exponentially, the team faces new challenges, including a “data bottleneck,” the difficulty of finding sufficiently challenging reinforcement learning problems. Musk believes that the ultimate test of AI reasoning lies in real-world applications, where AI’s ability to invent new technologies, design products, and create medications will serve as the final judge of its capabilities.

The “test-time compute” approach behind Grok 4 Heavy aims to tackle these challenges by having multiple AI agents work on a problem independently and in parallel, then compare results and share insights to arrive at the best answer. Musk described this as a “study group” that increases the amount of computation spent at test time, that is, while the model is working out its answer.
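The “study group” idea can be sketched in a few lines. xAI has not published how Grok 4 Heavy actually combines its agents’ outputs, so the example below substitutes a plain majority vote for the comparison step. The names `study_group`, `agent_a`, `agent_b`, and `agent_c` are hypothetical, and the agents are hard-coded stand-ins rather than real model calls.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Agent = Callable[[str], str]

def study_group(question: str, agents: List[Agent]) -> str:
    """Run every agent on the question in parallel, then return the most common answer."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        answers = list(pool.map(lambda agent: agent(question), agents))
    # Assumption: a majority vote stands in for the "compare and share insights" step.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner

# Hypothetical stand-in agents; a real system would query a model here.
def agent_a(question: str) -> str:
    return "42"

def agent_b(question: str) -> str:
    return "42"

def agent_c(question: str) -> str:
    return "41"

print(study_group("What is 6 * 7?", [agent_a, agent_b, agent_c]))  # 42
```

Adding agents increases the computation spent per question, which is the trade-off the “study group” description points to: more compute at answer time in exchange for a better chance that the agreed answer is correct.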

As the event wrapped up, xAI outlined its roadmap. The team is working on a dedicated coding model that is expected to launch soon, and on addressing Grok 4’s current limitations in multimodal capabilities, particularly visual understanding. Musk envisions the first truly excellent AI video game debuting next year, with AI-generated television shows and films on the horizon.

Grok 4 is now live in three subscription tiers: a free basic version, SuperGrok at $30 per month, and SuperGrok Heavy at $300 per month. More products are planned for release in the coming months.

Conclusion

With Grok 4, Musk and xAI are pushing the boundaries of AI technology, aiming to revolutionize how we interact with intelligence. The launch event showcased not just the sophisticated capabilities of Grok 4 but also a vision for a future where AI plays an integral role in advancing human understanding and innovation. As we step into this new era of intelligence, the possibilities are indeed limitless.