🏢 Artificial Intelligence Research Laboratory, Pennsylvania State University
Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment
Natural Language Processing
Large Language Models
Cal-DPO calibrates the implicit rewards learned in contrastive preference learning, substantially improving the alignment of large language models with human preferences.