🏢 NYU
PILAF: Optimal Human Preference Sampling for Reward Modeling
·2374 words·12 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 NYU
PILAF improves LLM alignment with a novel response sampling strategy for human preference data, aligning reward modeling with the underlying value optimization.
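The sampling strategy is the paper's contribution; the pairs it collects then feed a standard Bradley-Terry reward-modeling objective. A minimal sketch of that objective follows, with toy reward scores standing in for a reward model's outputs on PILAF-sampled pairs.

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Standard Bradley-Terry reward-modeling objective: maximize the
    log-probability that the preferred response outscores the rejected one."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: reward scores for four preference pairs (stand-ins for a
# reward model's scores on chosen/rejected responses).
r_chosen = torch.tensor([1.2, 0.4, 0.9, 2.0])
r_rejected = torch.tensor([0.3, 0.5, -0.1, 1.1])
print(bradley_terry_loss(r_chosen, r_rejected))  # scalar loss to minimize
```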
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training
·3471 words·17 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 NYU
WILDCHAT-50M, the largest public chat dataset, refines LLM post-training and achieves superior SFT performance with fewer samples.
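For orientation, SFT on a chat corpus like this boils down to flattening each transcript into a single training string. The sketch below uses a generic chat schema for illustration; WILDCHAT-50M's actual field names may differ, and real pipelines would apply the target model's chat template.

```python
# One hand-written record in a generic {"role", "content"} chat schema
# (illustrative only, not necessarily WILDCHAT-50M's exact layout).
record = {
    "conversation": [
        {"role": "user", "content": "What is post-training?"},
        {"role": "assistant", "content": "Further training of a base LLM, e.g. SFT or RLHF."},
    ]
}

def to_sft_text(conversation):
    """Concatenate turns with simple role tags to form one SFT example."""
    return "".join(f"<|{turn['role']}|>\n{turn['content']}\n" for turn in conversation)

print(to_sft_text(record["conversation"]))
```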
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
·5585 words·27 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 NYU
Boosting diffusion model performance at inference time, this research introduces a novel framework that goes beyond simply increasing denoising steps. By cleverly searching for better noise candidates…
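The search idea admits a simple best-of-N baseline: draw several starting noises, run the sampler on each, and keep the output a verifier scores highest. A minimal sketch is below; `denoise` and `verifier` are stub placeholders for a real diffusion sampler and a quality scorer (e.g. an image-reward model).

```python
import torch

def best_of_n_noise_search(denoise, verifier, shape, n_candidates=8, generator=None):
    """Naive best-of-N search over initial noise: sample several seeds,
    denoise each, and keep the sample the verifier scores highest."""
    best_score, best_sample = float("-inf"), None
    for _ in range(n_candidates):
        noise = torch.randn(shape, generator=generator)
        sample = denoise(noise)          # full reverse-diffusion pass
        score = verifier(sample)         # scalar quality estimate
        if score > best_score:
            best_score, best_sample = score, sample
    return best_sample, best_score

# Toy usage with stand-in stubs for the sampler and scorer.
denoise = lambda z: z * 0.5
verifier = lambda x: -x.pow(2).mean().item()
sample, score = best_of_n_noise_search(denoise, verifier, (1, 3, 8, 8))
print(score)
```

The paper's framework goes beyond this baseline, but best-of-N over noise seeds already illustrates why inference-time compute can buy quality without extra denoising steps.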