🏢 T-Tech

Analyze Feature Flow to Enhance Interpretation and Steering in Language Models
·5882 words·28 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 T-Tech
Researchers unveil a data-free method to visualize and control feature flow in LLMs, enhancing interpretability and enabling targeted model steering.
The Differences Between Direct Alignment Algorithms are a Blur
·3273 words·16 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 T-Tech
The differences between direct alignment algorithms are a blur: this paper shows that a simple SFT phase and a scaling parameter significantly improve alignment quality, regardless of the specific reward function used.