Austin, February 23: Elon Musk's xAI recently released Grok 3, promising upgrades over its Grok 2 model. The latest version of Grok was launched in a live video on X, where Musk and xAI's team of engineers presented the new model's benchmarks and capabilities. The benchmarks shown suggested that the model was powerful and comfortably beat other leading AI products on the market.

However, despite the high-profile launch and users flocking to the new xAI model, a debate over Grok 3's benchmarks soon began. OpenAI's Head of Applied Research, Boris Power, accused the Grok team of misleading people into believing that Grok 3 was a better model than o3-mini. Power said that OpenAI's o3-mini outperformed the latest Grok 3 model in every evaluation.

Boris Power said, "Grok 3 is genuinely a decent model, but there is no need to oversell." Separately, a user on X alleged that Grok 3's reasoning was inherently that of an "o1-level model" and claimed that xAI was nine months behind OpenAI in capabilities. The user shared an AIME 2025 performance chart to highlight the gap.

xAI's Igor Babuschkin, meanwhile, called the allegations "completely wrong." He said, "We just used the same method you guys used," and shared the benchmark chart again, this time with the AIME 2024 results. However, according to a TechCrunch report, some have questioned the validity of AIME, including AIME 2025 and earlier editions, as a benchmark for a model's math capabilities.

An OpenAI employee also pointed out that xAI's chart did not include o3-mini-high's AIME 2025 score at cons@64 (consensus@64), which means "running a query through the model 64 times and marking it correct if the most common output is correct." Because a majority vote over many attempts tends to raise a model's score, leaving that figure out can tilt a comparison. The report noted that OpenAI has previously published similarly misleading benchmark charts, albeit ones comparing its own models.
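For readers unfamiliar with the metric, the cons@64 method described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not xAI's or OpenAI's actual evaluation code: `ask_model` is a hypothetical stand-in for any function that returns one sampled answer per call, and the toy model at the bottom is invented purely for demonstration.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Sequence, Tuple

# Hypothetical model interface (assumption): takes a question, returns one sampled answer.
AskModel = Callable[[str], str]


def majority_answer(answers: Sequence[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one answer")
    answer, _count = Counter(answers).most_common(1)[0]
    return answer


def consensus_at_k(ask_model: AskModel, question: str, correct: str, k: int = 64) -> bool:
    """cons@k: query the model k times and mark the question correct
    only if the most common answer matches the reference answer."""
    answers = [ask_model(question) for _ in range(k)]
    return majority_answer(answers) == correct


def benchmark_score(ask_model: AskModel, items: Sequence[Tuple[str, str]], k: int = 64) -> float:
    """Fraction of (question, reference answer) pairs scored correct under cons@k."""
    if not items:
        raise ValueError("need at least one benchmark item")
    hits = sum(consensus_at_k(ask_model, q, a, k) for q, a in items)
    return hits / len(items)


if __name__ == "__main__":
    # Toy stand-in model (made up for illustration): answers "42", "42", "7" in rotation,
    # so over 64 samples "42" appears 43 times and "7" appears 21 times.
    samples = cycle(["42", "42", "7"])
    toy_model: AskModel = lambda question: next(samples)
    print(consensus_at_k(toy_model, "What is 6 * 7?", "42"))  # True
```

The point of the sketch is that cons@k rewards a model whose answers are consistent across many attempts, which is why a cons@64 score can differ from a single-attempt score on the same benchmark.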

Meanwhile, another user said, "It's hilarious how some people see my plot as an attack on OpenAI and others as an attack on Grok, while in reality it's DeepSeek propaganda." The user added that Grok 3 looked good in the plot, while OpenAI's model trailed in the benchmarks.

(The above story first appeared on LatestLY on Feb 23, 2025 12:02 PM IST. For more news and updates on politics, world, sports, entertainment and lifestyle, log on to our website latestly.com).