🏢 School of Computer Science and Engineering, Sun Yat-Sen University
Weighted-Reward Preference Optimization for Implicit Model Fusion
4595 words · 22 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
WRPO: Implicitly fuse LLMs, boosting performance without complex alignment or merging!