Paper Reviews by AI
2025
Samba-asr state-of-the-art speech recognition leveraging structured state-space models
·1451 words·7 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Speech Recognition
🏢 SandLogic Technologies Pvt Ltd
Samba-ASR, a novel speech recognition model using Mamba architecture, surpasses existing transformer models in accuracy and efficiency, setting a new benchmark for future ASR research.
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
·3666 words·18 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Tsinghua University
MotionBench, a new benchmark, reveals that existing video models struggle with fine-grained motion understanding. To address this, the authors propose TE Fusion, a novel architecture that improves mo…
GeAR: Generation Augmented Retrieval
·1952 words·10 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Microsoft Research
GeAR, a new retrieval model, boosts accuracy by combining document retrieval with fine-grained information generation, leading to better understanding and improved localization.
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
·2565 words·13 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Chinese University of Hong Kong
Dispider: A novel system enabling real-time interaction with video LLMs via disentangled perception, decision, and reaction modules for efficient, accurate responses to streaming video.
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning
·2687 words·13 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Shanghai AI Laboratory
BoostStep enhances large language models’ mathematical abilities by refining single-step reasoning through a novel step-level in-context learning strategy, achieving significant improvements on variou…
ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use
·3646 words·18 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 ByteDance
ToolHop: New benchmark dataset rigorously evaluates LLMs’ multi-hop tool use, revealing significant challenges and variations across different LLM families.
Test-time Computing: from System-1 Thinking to System-2 Thinking
·658 words·4 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Soochow University
Unlocking LLM potential: This paper surveys test-time computing, showing how it boosts reasoning abilities by shifting from reactive System-1 to deliberate System-2 thinking, paving the way for more p…
Scaling Laws for Floating Point Quantization Training
·6363 words·30 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
New scaling laws for efficient floating-point quantization training in LLMs are presented, showing optimal bit allocation and critical data size.
GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking
·2867 words·14 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Multimedia Laboratory, the Chinese University of Hong Kong
GS-DiT: Generating high-quality videos with advanced 4D control through efficient dense 3D point tracking and pseudo 4D Gaussian fields.
DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
·2694 words·13 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 University of Science and Technology of China
DepthMaster tames diffusion models for faster, more accurate monocular depth estimation by aligning generative features with high-quality semantic features and adaptively balancing low and high-freque…
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
·1374 words·7 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 String
REINFORCE++, a novel RLHF algorithm, achieves superior training stability and computational efficiency compared to existing methods like PPO and GRPO, while maintaining comparable performance.
Personalized Graph-Based Retrieval for Large Language Models
·3633 words·18 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of California Santa Cruz
Personalized Graph-based Retrieval-Augmented Generation (PGraphRAG) significantly improves personalized text generation by leveraging user-centric knowledge graphs, especially in cold-start scenarios …
MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control
·3209 words·16 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Center for Machine Vision and Signal Analysis, Faculty of Information Technology and Electrical Engineering, University of Oulu
MagicFace achieves high-fidelity facial expression editing via AU control, preserving identity and background using a diffusion model and ID encoder, significantly outperforming existing methods.
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
·2577 words·13 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Tencent Youtu Lab
VITA-1.5 achieves near real-time vision and speech interaction by using a novel three-stage training method that progressively integrates speech data into an LLM, enabling fluent conversations.
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
·2983 words·15 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Reasoning
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
Virgo: A new multimodal slow-thinking system, significantly improves MLLM reasoning by fine-tuning with text-based long-form thought data, demonstrating comparable performance to commercial systems.
METAGENE-1: Metagenomic Foundation Model for Pandemic Monitoring
·3440 words·17 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Southern California
METAGENE-1, a 7-billion parameter language model, achieves state-of-the-art results in pathogen detection and genomic embedding by leveraging a massive wastewater metagenomic dataset.
Ingredients: Blending Custom Photos with Video Diffusion Transformers
·2689 words·13 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Kunlun Inc.
Ingredients: A new framework customizes videos by blending multiple photos with video diffusion transformers, enabling realistic and personalized video generation while maintaining consistent identity…
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation
·3400 words·16 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
AI Applications
Robotics
🏢 AgiBot
EnerVerse: A novel framework seamlessly integrates convolutional and attention mechanisms to generate embodied future spaces for enhanced robotic manipulation, mitigating data scarcity with a generati…
Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models
·3986 words·19 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Ant Group
AUTO-RT automates LLM vulnerability discovery by using reinforcement learning to optimize complex attack strategies, achieving faster detection and higher success rates than existing methods.
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control
·3152 words·15 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
VideoAnydoor: High-fidelity video object insertion with precise motion control, achieved via an end-to-end framework leveraging an ID extractor and a pixel warper for robust detail preservation and fi…