
🏢 Shanghai AI Laboratory

BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning
·2687 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory
BoostStep enhances large language models’ mathematical abilities by refining single-step reasoning through a novel step-level in-context learning strategy, achieving significant improvements on variou…
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
·3509 words·17 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Shanghai AI Laboratory
Task Preference Optimization (TPO) significantly boosts multimodal large language models’ visual understanding by aligning them with fine-grained visual tasks via learnable task tokens, achieving 14.6…
Are Your LLMs Capable of Stable Reasoning?
·2140 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory
G-Pass@k & LiveMathBench: Evaluating the stability of LLMs.
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
·6546 words·31 mins
AI Generated 🤗 Daily Papers Computer Vision Document Parsing 🏢 Shanghai AI Laboratory
OmniDocBench, a novel benchmark, tackles limitations in current document parsing by introducing a diverse, high-quality dataset with comprehensive annotations, enabling fair multi-level evaluation of …
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
·3628 words·18 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Shanghai AI Laboratory
OS-Atlas: A new open-source toolkit and model dramatically improve GUI agent performance by providing a massive dataset and innovative training methods, enabling superior generalization to unseen int…