🏢 NYU
PILAF: Optimal Human Preference Sampling for Reward Modeling
·2374 words·12 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 NYU
PILAF improves LLM alignment with a novel response sampling strategy for human preference data, aligning reward modeling with the underlying value optimization.
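The sampling strategy is the paper's contribution; the pairs it collects then feed a standard Bradley-Terry reward-modeling objective. A minimal sketch of that objective follows, with toy reward scores standing in for a reward model's outputs on PILAF-sampled pairs.

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Standard Bradley-Terry reward-modeling objective: maximize the
    log-probability that the preferred response outscores the rejected one."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: reward scores for four preference pairs (stand-ins for a
# reward model's scores on chosen/rejected responses).
r_chosen = torch.tensor([1.2, 0.4, 0.9, 2.0])
r_rejected = torch.tensor([0.3, 0.5, -0.1, 1.1])
print(bradley_terry_loss(r_chosen, r_rejected))  # scalar loss to minimize
```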
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training
·3471 words·17 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 NYU
WILDCHAT-50M, the largest public chat dataset, refines LLM post-training and achieves superior SFT performance with fewer samples.
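For orientation, SFT on a chat corpus like this boils down to flattening each transcript into a single training string. The sketch below uses a generic chat schema for illustration; WILDCHAT-50M's actual field names may differ, and real pipelines would apply the target model's chat template.

```python
# One hand-written record in a generic {"role", "content"} chat schema
# (illustrative only, not necessarily WILDCHAT-50M's exact layout).
record = {
    "conversation": [
        {"role": "user", "content": "What is post-training?"},
        {"role": "assistant", "content": "Further training of a base LLM, e.g. SFT or RLHF."},
    ]
}

def to_sft_text(conversation):
    """Concatenate turns with simple role tags to form one SFT example."""
    return "".join(f"<|{turn['role']}|>\n{turn['content']}\n" for turn in conversation)

print(to_sft_text(record["conversation"]))
```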
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
·5585 words·27 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 NYU
Boosting diffusion model performance at inference time, this research introduces a novel framework that goes beyond simply increasing denoising steps. By cleverly searching for better noise candidates…
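The search idea admits a simple best-of-N baseline: draw several starting noises, run the sampler on each, and keep the output a verifier scores highest. A minimal sketch is below; `denoise` and `verifier` are stub placeholders for a real diffusion sampler and a quality scorer (e.g. an image-reward model).

```python
import torch

def best_of_n_noise_search(denoise, verifier, shape, n_candidates=8, generator=None):
    """Naive best-of-N search over initial noise: sample several seeds,
    denoise each, and keep the sample the verifier scores highest."""
    best_score, best_sample = float("-inf"), None
    for _ in range(n_candidates):
        noise = torch.randn(shape, generator=generator)
        sample = denoise(noise)          # full reverse-diffusion pass
        score = verifier(sample)         # scalar quality estimate
        if score > best_score:
            best_score, best_sample = score, sample
    return best_sample, best_score

# Toy usage with stand-in stubs for the sampler and scorer.
denoise = lambda z: z * 0.5
verifier = lambda x: -x.pow(2).mean().item()
sample, score = best_of_n_noise_search(denoise, verifier, (1, 3, 8, 8))
print(score)
```

The paper's framework goes beyond this baseline, but best-of-N over noise seeds already illustrates why inference-time compute can buy quality without extra denoising steps.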