🏢 T-Tech

Analyze Feature Flow to Enhance Interpretation and Steering in Language Models

5 February 2025·5882 words·28 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 T-Tech

Researchers unveil a data-free method to visualize and control feature flow in LLMs, enhancing interpretability and enabling targeted model steering.

The Differences Between Direct Alignment Algorithms are a Blur

3 February 2025·3273 words·16 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 T-Tech

The differences between direct alignment algorithms are blurry: this paper shows that a simple SFT phase and a scaling parameter significantly improve alignment quality, regardless of the specific reward function used.