🏢 DeepSeek-AI

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
·2722 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 DeepSeek-AI
NSA: a novel sparse attention mechanism achieves efficient long-context modeling by combining algorithmic innovations with hardware-aligned optimizations, surpassing full attention models across vario…
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
·2866 words·14 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 DeepSeek-AI
DeepSeek-R1 significantly improves LLM reasoning by using reinforcement learning, achieving performance comparable to OpenAI’s top models while addressing previous challenges of poor readability and l…