🏢 DeepSeek-AI

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

16 February 2025·2722 words·13 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 DeepSeek-AI

NSA, a novel sparse attention mechanism, achieves efficient long-context modeling by combining algorithmic innovations with hardware-aligned optimizations, surpassing full attention models across vario…

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

22 January 2025·2866 words·14 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 DeepSeek-AI

DeepSeek-R1 significantly improves LLM reasoning by using reinforcement learning, achieving performance comparable to OpenAI’s top models while addressing previous challenges of poor readability and l…