🏢 DeepSeek-AI
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
2722 words · 13 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 DeepSeek-AI
NSA, a novel sparse attention mechanism, achieves efficient long-context modeling by combining algorithmic innovations with hardware-aligned optimizations, surpassing full attention models across vario…
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
2866 words · 14 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 DeepSeek-AI
DeepSeek-R1 significantly improves LLM reasoning by using reinforcement learning, achieving performance comparable to OpenAI’s top models while addressing previous challenges of poor readability and l…