🏢 Shanghai AI Laboratory

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
·3416 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Shanghai AI Laboratory
Lumina-Image 2.0: a unified and efficient image generative framework that outperforms previous models with only 2.6B parameters.
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
·2431 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Shanghai AI Laboratory
LeX-Art: High-quality text-to-image generation via scalable data synthesis.
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
·2301 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory
A survey of efficient reasoning for large reasoning models, spanning language, multimodality, and beyond.
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?
·2895 words·14 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Reasoning 🏢 Shanghai AI Laboratory
The LEGO-Puzzles benchmark reveals that MLLMs still have critical deficiencies in multi-step spatial reasoning.
Aether: Geometric-Aware Unified World Modeling
·2472 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Shanghai AI Laboratory
AETHER: a unified framework enabling geometry-aware reasoning in world models, achieving zero-shot generalization from synthetic to real-world data.
CLS-RL: Image Classification with Rule-Based Reinforcement Learning
·2967 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision Image Classification 🏢 Shanghai AI Laboratory
CLS-RL: Rule-based RL tackles catastrophic forgetting in MLLM image classification, outperforming SFT with better generalization and efficiency.
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
·3804 words·18 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory
Linear-MoE: Integrates Linear Sequence Modeling with Mixture-of-Experts, achieving efficiency gains and competitive performance in large language models.
Lost in Literalism: How Supervised Training Shapes Translationese in LLMs
·3432 words·17 mins
AI Generated 🤗 Daily Papers Natural Language Processing Machine Translation 🏢 Shanghai AI Laboratory
LLMs show translationese due to supervised training biases. Polishing references and filtering unnatural instances can mitigate this issue.
Liger: Linearizing Large Language Models to Gated Recurrent Structures
·4096 words·20 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory
Liger: LLMs linearized to gated recurrent models, enabling efficient deployment via key matrix repurposing and LoRA fine-tuning.
MoM: Linear Sequence Modeling with Mixture-of-Memories
·2764 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory
MoM: Enhancing linear sequence modeling via mixture-of-memories for improved recall and reduced memory interference.
LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid
·2654 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory
LASP-2 rethinks sequence parallelism for linear attention, training 36.6% faster than Ring Attention and boosting efficiency on very long sequences.
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
·1736 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory
OREAL, a novel RL framework, achieves state-of-the-art mathematical reasoning in LLMs using only binary outcome rewards, demonstrating that a 7B model can match the performance of 32B models.
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning
·2687 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory
BoostStep enhances large language models’ mathematical abilities by refining single-step reasoning through a novel step-level in-context learning strategy, achieving significant improvements on various benchmarks.
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
·3509 words·17 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Shanghai AI Laboratory
Task Preference Optimization (TPO) significantly boosts multimodal large language models’ visual understanding by aligning them with fine-grained visual tasks via learnable task tokens, achieving a 14.6% improvement in multimodal performance.
Are Your LLMs Capable of Stable Reasoning?
·2140 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory
G-Pass@k and LiveMathBench: evaluating the stability of LLM reasoning.
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
·6546 words·31 mins
AI Generated 🤗 Daily Papers Computer Vision Document Parsing 🏢 Shanghai AI Laboratory
OmniDocBench, a novel benchmark, tackles limitations in current document parsing by introducing a diverse, high-quality dataset with comprehensive annotations, enabling fair multi-level evaluation of document parsing methods.
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
·3628 words·18 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Shanghai AI Laboratory
OS-Atlas, a new open-source toolkit and foundation action model, dramatically improves GUI agent performance through a massive dataset and innovative training methods, enabling superior generalization to unseen interfaces.