🏢 Artificial Intelligence Research Laboratory, Pennsylvania State University

Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment
2104 words · 10 mins
Natural Language Processing · Large Language Models
Cal-DPO calibrates the implicit rewards learned in contrastive preference optimization, improving the alignment of large language models with human preferences.