Paper Reviews by AI
2025
AgentRxiv: Towards Collaborative Autonomous Research
·1858 words·9 mins·
AI Generated
🤗 Daily Papers
AI Applications
Healthcare
🏢 Johns Hopkins University
AgentRxiv enables collaborative autonomous research via LLM agent preprint sharing, boosting performance and discovery.
RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation
·3176 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Pattern Recognition Center, WeChat AI, Tencent
RDTF: Efficient animated sticker generation via dual-mask training, outperforming parameter-efficient tuning under constrained resources.
When Words Outperform Vision: VLMs Can Self-Improve Via Text-Only Training For Human-Centered Decision Making
·1218 words·6 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Hong Kong Polytechnic University
VLMs self-improve with text-only training, outperforming vision for human-centered decisions, opening efficient enhancement avenues.
When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO
·1831 words·9 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Fudan University
Aligns diffusion models with diverging human preferences via minority-aware adaptive DPO.
V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms
·1371 words·7 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Politecnico di Torino
V-Seek accelerates LLM reasoning on open-hardware RISC-V platforms, achieving up to 3.0x speedup through optimized kernels and memory management.
TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting
·3002 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Alibaba Group
TaoAvatar: Lifelike talking avatars in AR, using 3D Gaussian Splatting for real-time rendering and high fidelity.
Strong Baseline: Multi-UAV Tracking via YOLOv12 with BoT-SORT-ReID
·1496 words·8 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Object Detection
🏢 University of Melbourne
Presents a strong baseline for multi-UAV tracking in thermal infrared video using YOLOv12 and BoT-SORT, achieving competitive results without complex enhancements.
PVChat: Personalized Video Chat with One-Shot Learning
·4971 words·24 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Nanyang Technological University
PVChat: Personalizes video understanding with one-shot learning, enabling identity-aware video comprehension.
Position: Interactive Generative Video as Next-Generation Game Engine
·1964 words·10 mins·
AI Generated
🤗 Daily Papers
AI Applications
Gaming
🏢 Hong Kong University of Science and Technology
Interactive Generative Video (IGV) can revolutionize game creation by using AI to generate endless, novel content for next-gen game engines.
Optimized Minimal 3D Gaussian Splatting
·3465 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Sungkyunkwan University
OMG: optimized minimal 3D Gaussian splatting, enabling fast and efficient rendering with minimal storage.
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
·3214 words·16 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Reasoning
🏢 University of California, Los Angeles
OpenVLThinker: Iteratively refining vision-language models for complex reasoning, bridging the gap to R1-style capabilities.
Modifying Large Language Model Post-Training for Diverse Creative Writing
·2548 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Text Generation
🏢 Midjourney
This paper introduces deviation-factored post-training methods to enhance diversity and quality in creative LLM writing.
MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization
·8765 words·42 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Xi'an Jiaotong University
MARS: Optimizing prompts with multi-agent collaboration and Socratic learning for better LLM performance!
MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving
·3857 words·19 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Reasoning
🏢 Xi'an Jiaotong University
MAPS combines Big Seven personality-based agents with Socratic guidance to improve multimodal scientific problem solving.
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
·4802 words·23 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
LEMMA: LLMs learn math via mistake analysis and correction, boosting performance without external critics.
Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image
·2762 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 University of Oxford
Motion blur, usually a problem, becomes a signal: this paper estimates camera motion from a single motion-blurred image, acting like an IMU.
Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation
·3836 words·19 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Department of Biomedical Engineering, Duke University
Gumbel-Softmax Flow Matching enables controllable biological sequence generation with straight-through guidance, scaling efficiently to high-dimensional simplices.
FFaceNeRF: Few-shot Face Editing in Neural Radiance Fields
·2851 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 KAIST, Visual Media Lab
FFaceNeRF: Enables few-shot face editing in NeRFs via geometry adapter & latent mixing, enhancing control & quality with limited training data.
ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering
·3338 words·16 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Renmin University of China
ETVA evaluates text-to-video alignment via fine-grained question generation and answering.
Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
·2987 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Zhejiang University
Zero-1-to-A: Animatable avatars from a single image using video diffusion, robust to spatial & temporal inconsistencies!