Paper Reviews by AI
2024
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
·2508 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Peking University
ROBUSTFT tackles noisy data in LLM fine-tuning by using multi-expert noise detection and context-enhanced relabeling, significantly boosting model performance in noisy scenarios.
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
·5664 words·27 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
ReMoE makes Mixture-of-Experts fully differentiable by replacing discrete TopK routing with a ReLU router, improving scalability and performance.
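The change the title describes is swapping the non-differentiable TopK-softmax router for a ReLU gate, so routing weights stay differentiable while still zeroing out inactive experts. A minimal PyTorch-style sketch of that idea (module and tensor names are illustrative, not the paper's code):

```python
import torch
import torch.nn as nn

class ReLURouter(nn.Module):
    """Sketch of a ReLU router: each expert gets a non-negative,
    fully differentiable gate instead of a hard TopK selection."""

    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU zeroes out inactive experts (sparsity) while keeping
        # the nonzero gates differentiable end to end.
        return torch.relu(self.gate(x))  # (batch, seq, n_experts)

class ReLUMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        self.router = ReLURouter(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = self.router(x)                                      # (B, T, E)
        outputs = torch.stack([e(x) for e in self.experts], dim=-1)   # (B, T, D, E)
        return torch.einsum("btde,bte->btd", outputs, weights)
```

Because ReLU produces exact zeros, sparsity emerges without a hard selection step, which is what makes the routing trainable with plain backpropagation.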
Progressive Multimodal Reasoning via Active Retrieval
·3576 words·17 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Reasoning
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
AR-MCTS: a novel framework boosting multimodal large language model reasoning by actively retrieving key supporting evidence and using Monte Carlo Tree Search for improved path selection and verification.
Parallelized Autoregressive Visual Generation
·4274 words·21 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
Boosting autoregressive visual generation speed by 3.6-9.5x, this research introduces parallel processing while preserving model simplicity and generation quality.
Outcome-Refining Process Supervision for Code Generation
·2838 words·14 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Peking University
Boosting code generation accuracy, Outcome-Refining Process Supervision (ORPS) uses execution feedback and structured reasoning to refine code, achieving significant improvements across models and datasets.
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design
·2482 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Research
MixLLM achieves state-of-the-art LLM compression by using mixed-precision quantization between output features, improving accuracy and system efficiency.
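The summary's "mixed-precision quantization between output features" suggests giving the most salient output channels of a weight matrix more bits and the rest fewer. A hedged sketch of per-output-channel bit allocation (the salience metric, bit widths, and function names below are assumptions for illustration, not MixLLM's exact recipe):

```python
import torch

def allocate_bits(weight: torch.Tensor, act_scale: torch.Tensor,
                  high_bits: int = 8, low_bits: int = 4,
                  high_frac: float = 0.1) -> torch.Tensor:
    """Assign a bit-width per output channel (row of `weight`).

    Salience is approximated here by activation-weighted row magnitude;
    the paper's actual criterion may differ.
    """
    salience = (weight.abs() * act_scale).mean(dim=1)   # (out_features,)
    bits = torch.full((weight.shape[0],), low_bits)
    n_high = int(high_frac * weight.shape[0])
    bits[salience.topk(n_high).indices] = high_bits
    return bits

def fake_quantize_rows(weight: torch.Tensor, bits: torch.Tensor) -> torch.Tensor:
    """Symmetric per-row fake quantization at each row's assigned bit-width."""
    out = torch.empty_like(weight)
    for b in bits.unique():
        rows = bits == b
        qmax = 2 ** (int(b) - 1) - 1
        scale = weight[rows].abs().amax(dim=1, keepdim=True) / qmax
        out[rows] = (weight[rows] / scale).round().clamp(-qmax, qmax) * scale
    return out
```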
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
·2604 words·13 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Hong Kong University of Science and Technology
MegaPairs synthesizes 26M+ high-quality multimodal retrieval training examples, enabling state-of-the-art zero-shot performance and surpassing existing methods trained on 70x more data.
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps
·11623 words·55 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 TU Darmstadt
M-ALERT, a new multilingual benchmark, reveals significant safety inconsistencies across languages in top LLMs.
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
·2715 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
LeviTor: Revolutionizing image-to-video synthesis with intuitive 3D trajectory control, generating realistic videos from static images by abstracting object masks into depth-aware control points.
How to Synthesize Text Data without Model Collapse?
·5702 words·27 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
Token-level editing prevents language model collapse from synthetic data by theoretically bounding test error and empirically improving model performance.
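As the summary describes it, the key move is editing individual tokens of existing human text rather than generating fully synthetic documents. A minimal sketch of one plausible variant, resampling only tokens where the model is overconfident; the HuggingFace-style model/tokenizer interface, the confidence threshold, and the resampling rule are assumptions, not necessarily the paper's exact procedure:

```python
import torch

@torch.no_grad()
def token_level_edit(model, tokenizer, text: str,
                     conf_threshold: float = 0.99) -> str:
    """Replace only tokens the model assigns very high probability to,
    keeping the rest of the human-written text untouched."""
    ids = tokenizer(text, return_tensors="pt").input_ids      # (1, T)
    probs = model(ids).logits.softmax(dim=-1)                 # (1, T, V)

    edited = ids.clone()
    # In a causal LM, position t is predicted from logits at t - 1.
    for t in range(1, ids.shape[1]):
        p_next = probs[0, t - 1]
        if p_next[ids[0, t]] >= conf_threshold:
            edited[0, t] = torch.multinomial(p_next, 1).item()
    return tokenizer.decode(edited[0], skip_special_tokens=True)
```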
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
·3592 words·17 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Meta GenAI
CrossFlow directly evolves one modality into another using flow matching, achieving state-of-the-art results across a range of tasks.
Fietje: An open, efficient LLM for Dutch
·3094 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 KU Leuven
Fietje: an open-source, efficient Dutch language model outperforming larger models.
DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation
·2004 words·10 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Tencent PCG
DI-PCG uses a lightweight diffusion transformer to efficiently and accurately estimate parameters of procedural generators from images, enabling high-fidelity 3D asset creation.
Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
·3907 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Harvard University
Affordance-Aware Object Insertion uses a novel Mask-Aware Dual Diffusion model and the SAM-FB dataset to realistically place objects in scenes while respecting contextual relationships.
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
·3123 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 NVIDIA Research
AceMath achieves state-of-the-art results in mathematical reasoning by introducing highly effective instruction-tuned models and reward models.
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
·2677 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
AI agents are tested in a simulated company, revealing both their capability to automate tasks and their shortcomings with complex workflows and interfaces.
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
·4393 words·21 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 School of Artificial Intelligence, University of Chinese Academy of Sciences
RAG-RewardBench, the first benchmark for reward models in retrieval-augmented generation, reveals their limitations and the need for preference-aligned training.
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation
·4162 words·20 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Zhejiang University
Prompting a depth foundation model with low-cost LiDAR unlocks accurate 4K metric depth estimation.
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
·2716 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Surrey
Mix-LN combines Pre-LN and Post-LN so that deeper layers in LLMs contribute more effectively.
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer
·3553 words·17 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Tsinghua University
LLaVA-UHD v2 enhances MLLMs by integrating high-resolution visual details using a hierarchical window transformer.