
Paper Reviews by AI

2025

AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning
·327 words·2 mins
AI Generated 🤗 Daily Papers Multimodal Learning Embodied AI 🏢 Menlo Research
AlphaSpace enables robotic actions via semantic tokenization and symbolic reasoning, enhancing spatial intelligence in LLMs.
Aether: Geometric-Aware Unified World Modeling
·2472 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Shanghai AI Laboratory
AETHER: a unified framework enabling geometry-aware reasoning in world models, achieving zero-shot generalization from synthetic to real-world data.
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
·3123 words·15 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences
Vision-R1: Improves LVLMs via vision-guided reinforcement learning, eliminating the need for human feedback and specialized reward models.
OmnimatteZero: Training-free Real-time Omnimatte with Pre-trained Video Diffusion Models
·2382 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 OriginAI, Tel-Aviv, Israel
OmnimatteZero: Real-time omnimatte using pre-trained video diffusion, no training needed!
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?
·3575 words·17 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 155mv Research Lab
LLMs falter on culturally adapted math problems, revealing a critical cultural bias.
AgentRxiv: Towards Collaborative Autonomous Research
·1858 words·9 mins
AI Generated 🤗 Daily Papers AI Applications Healthcare 🏢 Johns Hopkins University
AgentRxiv enables collaborative autonomous research via LLM agent preprint sharing, boosting performance and discovery.
RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation
·3176 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Pattern Recognition Center, WeChat AI, Tencent
RDTF: Efficient animated sticker generation via dual-mask training, outperforming parameter-efficient tuning under constrained resources.
When Words Outperform Vision: VLMs Can Self-Improve Via Text-Only Training For Human-Centered Decision Making
·1218 words·6 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Hong Kong Polytechnic University
VLMs can self-improve via text-only training, outperforming vision-based training on human-centered decision making and opening an efficient path to enhancement.
When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO
·1831 words·9 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Fudan University
Aligns diffusion models with minority-aware adaptive DPO to handle diverging human preferences.
V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms
·1371 words·7 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Politecnico di Torino
V-SEEK accelerates LLM reasoning on open-hardware RISC-V platforms, achieving up to 3.0x speedup through optimized kernels and memory management.
TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting
·3002 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Alibaba Group
TaoAvatar: Lifelike talking avatars in AR, using 3D Gaussian Splatting for real-time rendering and high fidelity.
Strong Baseline: Multi-UAV Tracking via YOLOv12 with BoT-SORT-ReID
·1496 words·8 mins
AI Generated 🤗 Daily Papers Computer Vision Object Detection 🏢 University of Melbourne
Presents a strong baseline for multi-UAV tracking in thermal infrared video using YOLOv12 and BoT-SORT, achieving competitive results without complex enhancements.
PVChat: Personalized Video Chat with One-Shot Learning
·4971 words·24 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Nanyang Technological University
PVChat: Personalize video understanding with one-shot learning, enabling identity-aware video comprehension.
Position: Interactive Generative Video as Next-Generation Game Engine
·1964 words·10 mins
AI Generated 🤗 Daily Papers AI Applications Gaming 🏢 Hong Kong University of Science and Technology
Interactive Generative Video (IGV) can revolutionize game creation by using AI to generate endless, novel content for next-gen game engines.
Optimized Minimal 3D Gaussian Splatting
·3465 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Sungkyunkwan University
OMG: optimized minimal 3D Gaussian splatting, enabling fast and efficient rendering with minimal storage.
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
·3214 words·16 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Reasoning 🏢 University of California, Los Angeles
OpenVLThinker: Iteratively refining vision-language models for complex reasoning, bridging the gap to R1-style capabilities.
Modifying Large Language Model Post-Training for Diverse Creative Writing
·2548 words·12 mins
AI Generated 🤗 Daily Papers Natural Language Processing Text Generation 🏢 Midjourney
This paper introduces post-training methods that factor in deviation to enhance both diversity and quality in creative LLM writing.
MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization
·8765 words·42 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Xi'an Jiaotong University
MARS: Optimizing prompts with multi-agent collaboration and Socratic learning for better LLM performance!
MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving
·3857 words·19 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Reasoning 🏢 Xi'an Jiaotong University
MAPS solves multimodal scientific problems better by combining multiple agents and Socratic learning.
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
·4802 words·23 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
LEMMA: LLMs learn math via mistake analysis and correction, boosting performance without external critics.