🏢 Shanghai AI Laboratory

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

27 March 2025·3416 words·17 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Shanghai AI Laboratory

Lumina-Image 2.0: A unified & efficient image generative framework, outperforming previous models with only 2.6B parameters.

LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis

27 March 2025·2431 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Shanghai AI Laboratory

LeX-Art: High-quality text-to-image generation via scalable data synthesis.

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

27 March 2025·2301 words·11 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory

A survey of methods for improving the efficiency of large reasoning models across language, multimodality, and beyond.

TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization

25 March 2025·3042 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Shanghai AI Laboratory

TokenHSI: Unified Transformer for Physical Human-Scene Interactions through Task Tokenization.

LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?

25 March 2025·2895 words·14 mins

AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Reasoning 🏢 Shanghai AI Laboratory

The LEGO-Puzzles benchmark shows that MLLMs still struggle with multi-step spatial reasoning, exposing critical deficiencies that future models need to address.

Aether: Geometric-Aware Unified World Modeling

24 March 2025·2472 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Shanghai AI Laboratory

AETHER: a unified framework enabling geometry-aware reasoning in world models, achieving zero-shot generalization from synthetic to real-world data.

CLS-RL: Image Classification with Rule-Based Reinforcement Learning

20 March 2025·2967 words·14 mins

AI Generated 🤗 Daily Papers Computer Vision Image Classification 🏢 Shanghai AI Laboratory

CLS-RL: Rule-based RL tackles catastrophic forgetting in MLLM image classification, outperforming SFT with better generalization and efficiency.

Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts

7 March 2025·3804 words·18 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory

Linear-MoE integrates linear sequence modeling with Mixture-of-Experts, achieving efficiency gains and competitive performance in large language models.

Lost in Literalism: How Supervised Training Shapes Translationese in LLMs

6 March 2025·3432 words·17 mins

AI Generated 🤗 Daily Papers Natural Language Processing Machine Translation 🏢 Shanghai AI Laboratory

LLMs show translationese due to supervised training biases. Polishing references and filtering unnatural instances can mitigate this issue.

Liger: Linearizing Large Language Models to Gated Recurrent Structures

3 March 2025·4096 words·20 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory

Liger linearizes pretrained LLMs into gated recurrent models by repurposing their key matrices and fine-tuning with LoRA, enabling efficient deployment.

MoM: Linear Sequence Modeling with Mixture-of-Memories

19 February 2025·2764 words·13 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory

MoM: Enhancing linear sequence modeling via mixture-of-memories for improved recall and reduced memory interference.

LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid

11 February 2025·2654 words·13 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory

LASP-2, a new sequence parallelism method for linear attention, trains 36.6% faster than Ring Attention, improving efficiency on very long sequences.

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

10 February 2025·1736 words·9 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory

OREAL, a novel RL framework, achieves state-of-the-art mathematical reasoning in LLMs using only binary outcome rewards, demonstrating that a 7B model can match the performance of 32B models.

BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning

6 January 2025·2687 words·13 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory

BoostStep enhances large language models’ mathematical abilities by refining single-step reasoning through a novel step-level in-context learning strategy, achieving significant improvements on variou…

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

26 December 2024·3509 words·17 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Shanghai AI Laboratory

Task Preference Optimization (TPO) significantly boosts multimodal large language models’ visual understanding by aligning them with fine-grained visual tasks via learnable task tokens, achieving 14.6…

Are Your LLMs Capable of Stable Reasoning?

17 December 2024·2140 words·11 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory

G-Pass@k & LiveMathBench: Evaluating the stability of LLM reasoning.

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

10 December 2024·6546 words·31 mins

AI Generated 🤗 Daily Papers Computer Vision Document Parsing 🏢 Shanghai AI Laboratory

OmniDocBench, a novel benchmark, tackles limitations in current document parsing by introducing a diverse, high-quality dataset with comprehensive annotations, enabling fair multi-level evaluation of …

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

30 October 2024·3628 words·18 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Shanghai AI Laboratory

OS-Atlas: A new open-source toolkit and model dramatically improves GUI agent performance by providing a massive dataset and innovative training methods, enabling superior generalization to unseen int…