🏢 School of Computer Science and Engineering, Sun Yat-Sen University

Weighted-Reward Preference Optimization for Implicit Model Fusion
·4595 words·22 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models
WRPO: Implicitly fuse LLMs, boosting performance without complex alignment or merging!