March 6, 2025: Alibaba's Qwen has released a new reasoning model, QwQ 32B, which has 32 billion parameters and is positioned to rival DeepSeek R1. The Chinese AI company said that it investigated recipes for scaling RL (Reinforcement Learning) and achieved 'impressive' results building on Qwen2.5-32B. Alibaba Qwen said that the RL training improved math and coding performance, and that continued scaling of Reinforcement Learning could help a medium-sized model perform better against MoE models. The company said, "Qwen2.5-Plus + Thinking (QwQ) = QwQ-32B".

QwQ 32B Reasoning Model Released by Alibaba's Qwen

