
🏢 NYU

PILAF: Optimal Human Preference Sampling for Reward Modeling
·2374 words·12 mins·
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 NYU
PILAF improves LLM alignment with a novel response-sampling strategy for human preference labeling, one that aligns reward modeling with the underlying value optimization.
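For intuition, here is a minimal sketch of the preference-pair pipeline that such a sampling strategy plugs into. The `policy_sample` and `labeler` interfaces are hypothetical placeholders, and the naive i.i.d. draw shown is the baseline setup that an optimized strategy like PILAF's would replace, not PILAF itself:

```python
import torch.nn.functional as F

def bradley_terry_loss(r_chosen, r_rejected):
    # Standard Bradley-Terry objective for reward modeling:
    # maximize log sigmoid(r_chosen - r_rejected).
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def collect_preference_pair(policy_sample, prompt, labeler):
    # Naive baseline: draw two i.i.d. responses from the current policy
    # and ask the labeler (human or proxy) which one is preferred.
    y1, y2 = policy_sample(prompt), policy_sample(prompt)
    return (y1, y2) if labeler(prompt, y1, y2) else (y2, y1)
```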
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training
·3471 words·17 mins·
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 NYU
WILDCHAT-50M, the largest public chat dataset, refines LLM post-training and achieves superior SFT performance with fewer samples.
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
·5585 words·27 mins·
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 NYU
This research boosts diffusion model performance at inference time with a novel framework that goes beyond simply increasing denoising steps. By cleverly searching for better noise candidates…
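A minimal sketch of what such a noise-candidate search can look like, assuming simple best-of-N selection over initial seeds; `denoise` and `verifier` are placeholder interfaces standing in for a full denoising chain and a sample scorer, not the paper's actual components:

```python
import torch

def search_initial_noise(denoise, verifier, shape, n_candidates=8, seed=0):
    """Best-of-N search over initial noise for a diffusion sampler.

    `denoise` maps an initial noise tensor to a generated sample;
    `verifier` scores a sample (higher is better).
    """
    gen = torch.Generator().manual_seed(seed)
    best_sample, best_score = None, float("-inf")
    for _ in range(n_candidates):
        noise = torch.randn(shape, generator=gen)
        sample = denoise(noise)    # run the full denoising chain
        score = verifier(sample)   # e.g., an aesthetic or CLIP-based score
        if score > best_score:
            best_sample, best_score = sample, score
    return best_sample, best_score
```

The extra inference-time compute goes into evaluating `n_candidates` full denoising runs rather than into more denoising steps per run.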