2025-03-31

Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation

28 March 2025·3963 words·19 mins

AI Generated 🤗 Daily Papers Machine Learning Recommender Systems 🏢 Gaoling School of Artificial Intelligence, Renmin University of China

ReaRec: Unleashing latent reasoning power for sequential recommendation through inference-time multi-step reasoning.

Segment Any Motion in Videos

28 March 2025·2413 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision Image Segmentation 🏢 UC Berkeley

A new method for moving object segmentation that combines long-range motion cues, semantic features, and SAM2, achieving state-of-the-art performance in challenging scenarios.

ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation

28 March 2025·2259 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 KAIST

ORIGEN: First zero-shot 3D orientation grounding in text-to-image generation.

Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging

28 March 2025·2702 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Chinese University of Hong Kong, Shenzhen

Hi3DGen: High-fidelity 3D geometry generation from images via normal bridging.

Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback

28 March 2025·3814 words·18 mins

AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 ByteDance Seed

This paper enhances Reinforcement Learning from Human Feedback (RLHF) by tackling reward hacking and response diversity issues through improved data construction methods.

X$^{2}$-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction

27 March 2025·2612 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 the Chinese University of Hong Kong

X$^{2}$-Gaussian enables continuous-time 4D CT reconstruction via dynamic radiative Gaussian splatting and self-supervised respiratory motion learning.

SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling

27 March 2025·2163 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tsinghua University

SparseFlex: High-resolution, arbitrary-topology 3D shape modeling via a sparse isosurface representation and sectional voxel training.

ReFeed: Multi-dimensional Summarization Refinement with Reflective Reasoning on Feedback

27 March 2025·7449 words·35 mins

AI Generated 🤗 Daily Papers Natural Language Processing Text Summarization 🏢 Korea Advanced Institute of Science and Technology (KAIST)

ReFeed enhances multi-dimensional summarization by using reflective reasoning on feedback, mitigating trade-offs between dimensions and improving robustness.

Reconstructing Humans with a Biomechanically Accurate Skeleton

27 March 2025·2828 words·14 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Texas at Austin

HSMR: Reconstructing 3D humans with a biomechanically accurate skeleton model from a single image, enhancing pose realism.

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

27 March 2025·2301 words·11 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory

Survey on improving efficiency in large reasoning models across language, multimodality, and beyond.

PHYSICS: Benchmarking Foundation Models on University-Level Physics Problem Solving

26 March 2025·2247 words·11 mins

AI Generated 🤗 Daily Papers AI Applications Education 🏢 Yale University

PHYSICS: A new benchmark reveals that foundation models struggle with university-level physics, highlighting the need for improved reasoning and knowledge integration.

Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics

26 March 2025·5958 words·28 mins

AI Generated 🤗 Daily Papers Multimodal Learning Audio-Visual Learning 🏢 Grad. School of AI, POSTECH

New metrics and representation enhance 3D talking head realism by focusing on perceptual lip synchronization.

Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency

26 March 2025·2359 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Huazhong University of Science and Technology

Free4D: Tuning-free 4D scene generation with spatial-temporal consistency.

4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding

22 March 2025·4361 words·21 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 King Abdullah University of Science and Technology

4D-Bench: The first benchmark for assessing MLLMs in 4D object understanding, revealing weak temporal understanding and the need for further advances.

MedAgent-Pro: Towards Multi-modal Evidence-based Medical Diagnosis via Reasoning Agentic Workflow

21 March 2025·1815 words·9 mins

AI Generated 🤗 Daily Papers AI Applications Healthcare 🏢 National University of Singapore

MedAgent-Pro: An evidence-based reasoning agentic system for reliable multi-modal medical diagnosis.

OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning

20 March 2025·2043 words·10 mins

AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Reasoning 🏢 OPPO Research Institute

OThink-MR1 enhances MLLM reasoning via dynamic reinforcement learning, achieving strong cross-task generalization.