2025-03-31s

2025

Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
·3963 words·19 mins
AI Generated 🤗 Daily Papers Machine Learning Recommender Systems 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
ReaRec: Unleashing latent reasoning power for sequential recommendation through inference-time multi-step reasoning.
Segment Any Motion in Videos
·2413 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Image Segmentation 🏢 UC Berkeley
New method for moving object segmentation by combining long-range motion cues, semantic features, and SAM2, achieving state-of-the-art performance in challenging scenarios.
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation
·2259 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 KAIST
ORIGEN: First zero-shot 3D orientation grounding in text-to-image generation.
Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging
·2702 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Chinese University of Hong Kong, Shenzhen
Hi3DGen: High-fidelity 3D geometry generation from images via normal bridging.
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
·3814 words·18 mins
AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 ByteDance Seed
This paper enhances Reinforcement Learning from Human Feedback (RLHF) by tackling reward hacking and response diversity issues through improved data construction methods.
X$^{2}$-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction
·2612 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 the Chinese University of Hong Kong
X$^{2}$-Gaussian enables continuous-time 4D CT reconstruction via dynamic radiative Gaussian splatting and self-supervised respiratory motion learning.
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
·2163 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tsinghua University
SparseFlex: High-resolution, arbitrary-topology 3D shape modeling via a sparse isosurface representation and sectional voxel training.
ReFeed: Multi-dimensional Summarization Refinement with Reflective Reasoning on Feedback
·7449 words·35 mins
AI Generated 🤗 Daily Papers Natural Language Processing Text Summarization 🏢 Korea Advanced Institute of Science and Technology (KAIST)
ReFeed enhances multi-dimensional summarization by using reflective reasoning on feedback, mitigating trade-offs between dimensions and improving robustness.
Reconstructing Humans with a Biomechanically Accurate Skeleton
·2828 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Texas at Austin
HSMR: Reconstructing 3D humans with a biomechanically accurate skeleton model from a single image, enhancing pose realism.
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
·2301 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory
Survey on improving efficiency in large reasoning models across language, multimodality, and beyond.
PHYSICS: Benchmarking Foundation Models on University-Level Physics Problem Solving
·2247 words·11 mins
AI Generated 🤗 Daily Papers AI Applications Education 🏢 Yale University
PHYSICS: A new benchmark reveals foundation models struggle with university-level physics, highlighting needs for improved reasoning and knowledge integration.
Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics
·5958 words·28 mins
AI Generated 🤗 Daily Papers Multimodal Learning Audio-Visual Learning 🏢 Grad. School of AI, POSTECH
New metrics and representation enhance 3D talking head realism by focusing on perceptual lip synchronization.
Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency
·2359 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Huazhong University of Science and Technology
Free4D: Tuning-free 4D scene generation with spatial-temporal consistency.
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
·4361 words·21 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 King Abdullah University of Science and Technology
4D-Bench: The first benchmark for assessing MLLMs in 4D object understanding, revealing weak temporal understanding and the need for advancements.
MedAgent-Pro: Towards Multi-modal Evidence-based Medical Diagnosis via Reasoning Agentic Workflow
·1815 words·9 mins
AI Generated 🤗 Daily Papers AI Applications Healthcare 🏢 National University of Singapore
MedAgent-Pro: An evidence-based reasoning agentic system for reliable multi-modal medical diagnosis.
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning
·2043 words·10 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Reasoning 🏢 OPPO Research Institute
OThink-MR1 enhances MLLM reasoning via dynamic reinforcement learning, achieving remarkable cross-task generalization!