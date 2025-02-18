China's DeepSeek has launched NSA, a hardware-aligned and natively trainable sparse attention mechanism designed for ultra-fast long-context training and inference. NSA combines a dynamic hierarchical sparse strategy with coarse-grained token compression and fine-grained token selection. The China-based AI company said NSA speeds up inference and reduces pre-training costs without compromising performance. NSA is also said to outperform Full Attention models on various benchmarks.

DeepSeek Launches NSA Mechanism for Faster Inference, Lower Training Costs

🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference! Core components of NSA: • Dynamic hierarchical sparse strategy • Coarse-grained token compression • Fine-grained token selection 💡 With… pic.twitter.com/zjXuBzzDCp — DeepSeek (@deepseek_ai) February 18, 2025

