2025-01-17

Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

16 January 2025·1945 words·10 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University

This survey examines Large Reasoning Models (LRMs), focusing on how reinforcement learning and prompting techniques improve LLMs’ reasoning capabilities.

SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces

16 January 2025·2347 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Yale University

SynthLight: A novel diffusion model relights portraits realistically by learning to re-render synthetic faces, generalizing remarkably well to real photographs.

Learnings from Scaling Visual Tokenizers for Reconstruction and Generation

16 January 2025·4248 words·20 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Meta

Scaling visual tokenizers dramatically improves image and video generation, achieving state-of-the-art results and outperforming existing methods with less computation by focusing on decoder scaling…

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

16 January 2025·5585 words·27 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 NYU

Boosting diffusion model performance at inference time, this research introduces a novel framework that goes beyond simply increasing denoising steps. By cleverly searching for better noise candidates…

FAST: Efficient Action Tokenization for Vision-Language-Action Models

16 January 2025·4290 words·21 mins

AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 UC Berkeley

FAST: A novel action tokenization method using discrete cosine transform drastically improves autoregressive vision-language-action models’ training and performance, enabling dexterous and high-freque…

Exploring the Inquiry-Diagnosis Relationship with Advanced Patient Simulators

16 January 2025·2252 words·11 mins

AI Generated 🤗 Daily Papers Natural Language Processing Dialogue Systems 🏢 Baichuan Inc.

AI-powered medical consultations often struggle with the inquiry phase. This paper presents a novel patient simulator trained on real interactions, revealing that effective inquiry significantly impac…

CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation

16 January 2025·3330 words·16 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Graphics AI Lab, NC Research

CaPa: Carve-n-Paint Synthesis generates hyper-realistic 4K textured meshes in under 30 seconds, setting a new standard for efficient 3D asset creation.

AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation

16 January 2025·2125 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Alibaba Tongyi Lab

AnyStory: A unified framework enables high-fidelity personalized image generation for single and multiple subjects, addressing subject fidelity challenges in existing methods.

RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation

15 January 2025·5724 words·27 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Princeton University

RLHS, a novel alignment algorithm, leverages simulated hindsight feedback to mitigate misalignment in RLHF, significantly improving AI’s alignment with human values and goals.

Do generative video models learn physical principles from watching videos?

14 January 2025·3121 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Google DeepMind

Generative video models struggle to understand physics despite producing visually realistic videos; Physics-IQ benchmark reveals this critical limitation, highlighting the need for improved physical r…