2025-03-31
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
3963 words · 19 mins
AI Generated
🤗 Daily Papers
Machine Learning
Recommender Systems
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
ReaRec: Unleashing latent reasoning power for sequential recommendation through inference-time multi-step reasoning.
Segment Any Motion in Videos
2413 words · 12 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Segmentation
🏢 UC Berkeley
New method for moving object segmentation by combining long-range motion cues, semantic features, and SAM2, achieving state-of-the-art performance in challenging scenarios.
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation
2259 words · 11 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 KAIST
ORIGEN: The first zero-shot method for 3D orientation grounding in text-to-image generation.
Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging
2702 words · 13 mins
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Chinese University of Hong Kong, Shenzhen
Hi3DGen: High-fidelity 3D geometry generation from images via normal bridging.
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
3814 words · 18 mins
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 ByteDance Seed
This paper enhances Reinforcement Learning from Human Feedback (RLHF) by tackling reward hacking and response diversity issues through improved data construction methods.
X$^{2}$-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction
2612 words · 13 mins
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 The Chinese University of Hong Kong
X$^{2}$-Gaussian enables continuous-time 4D CT reconstruction via dynamic radiative Gaussian splatting and self-supervised respiratory motion learning.
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
2163 words · 11 mins
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Tsinghua University
SparseFlex: High-resolution, arbitrary-topology 3D shape modeling via a sparse isosurface representation and sectional voxel training.
ReFeed: Multi-dimensional Summarization Refinement with Reflective Reasoning on Feedback
7449 words · 35 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Text Summarization
🏢 Korea Advanced Institute of Science and Technology (KAIST)
ReFeed enhances multi-dimensional summarization by using reflective reasoning on feedback, mitigating trade-offs between dimensions and improving robustness.
Reconstructing Humans with a Biomechanically Accurate Skeleton
2828 words · 14 mins
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 University of Texas at Austin
HSMR: Reconstructing 3D humans with a biomechanically accurate skeleton model from a single image, enhancing pose realism.
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
2301 words · 11 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Shanghai AI Laboratory
Survey on improving efficiency in large reasoning models across language, multimodality, and beyond.
PHYSICS: Benchmarking Foundation Models on University-Level Physics Problem Solving
2247 words · 11 mins
AI Generated
🤗 Daily Papers
AI Applications
Education
🏢 Yale University
PHYSICS: A new benchmark reveals that foundation models struggle with university-level physics, highlighting the need for improved reasoning and knowledge integration.
Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics
5958 words · 28 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Audio-Visual Learning
🏢 Graduate School of AI, POSTECH
New metrics and representation enhance 3D talking head realism by focusing on perceptual lip synchronization.
Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency
2359 words · 12 mins
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Huazhong University of Science and Technology
Free4D: Tuning-free 4D scene generation with spatial-temporal consistency.
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
4361 words · 21 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 King Abdullah University of Science and Technology
4D-Bench: The first benchmark for assessing MLLMs on 4D object understanding, revealing weak temporal understanding and the need for further advances.
MedAgent-Pro: Towards Multi-modal Evidence-based Medical Diagnosis via Reasoning Agentic Workflow
1815 words · 9 mins
AI Generated
🤗 Daily Papers
AI Applications
Healthcare
🏢 National University of Singapore
MedAgent-Pro: An evidence-based reasoning agentic system for reliable multi-modal medical diagnosis.
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning
2043 words · 10 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Reasoning
🏢 OPPO Research Institute
OThink-MR1 enhances MLLM reasoning via dynamic reinforcement learning, achieving strong cross-task generalization.