🏢 Shanghai AI Laboratory
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning
·2687 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Shanghai AI Laboratory
BoostStep enhances large language models’ mathematical abilities by refining single-step reasoning through a novel step-level in-context learning strategy, achieving significant improvements on variou…
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
·3509 words·17 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Shanghai AI Laboratory
Task Preference Optimization (TPO) significantly boosts multimodal large language models’ visual understanding by aligning them with fine-grained visual tasks via learnable task tokens, achieving 14.6…
Are Your LLMs Capable of Stable Reasoning?
·2140 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Shanghai AI Laboratory
G-Pass@k & LiveMathBench: Evaluating the stability of LLMs.
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
·6546 words·31 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Document Parsing
🏢 Shanghai AI Laboratory
OmniDocBench, a novel benchmark, tackles limitations in current document parsing by introducing a diverse, high-quality dataset with comprehensive annotations, enabling fair multi-level evaluation of …
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
·3628 words·18 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Shanghai AI Laboratory
OS-Atlas: A new open-source toolkit and model dramatically improves GUI agent performance by providing a massive dataset and innovative training methods, enabling superior generalization to unseen int…