
🏢 Carnegie Mellon University

Fast Best-of-N Decoding via Speculative Rejection
·1456 words·7 mins
Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
Speculative Rejection: a novel algorithm that speeds up inference-time alignment of Large Language Models (LLMs) by 16-32x!
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
·4338 words·21 mins
AI Generated Multimodal Learning Multimodal Reasoning 🏢 Carnegie Mellon University
Emotion-LLaMA: A new multimodal large language model excels at emotion recognition and reasoning, outperforming existing models and leveraging a newly created dataset, MERR.
Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization
·1755 words·9 mins
Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
Adaptive Dense-to-sparse Constrained Optimization (ADC) efficiently jailbreaks LLMs by transforming discrete token optimization into a continuous process, achieving higher success rates than existing …
Efficient $\Phi$-Regret Minimization with Low-Degree Swap Deviations in Extensive-Form Games
·570 words·3 mins
AI Generated AI Theory Optimization 🏢 Carnegie Mellon University
New efficient algorithms minimize regret in extensive-form games by cleverly using low-degree swap deviations and a relaxed fixed-point concept, improving correlated equilibrium computation.
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
·4056 words·20 mins
Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
AI alignment beyond human supervision is achieved via easy-to-hard generalization: training reward models on easy tasks to effectively evaluate and improve generators on harder tasks, achieving superh…
Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models
·2027 words·10 mins
Multimodal Learning Vision-Language Models 🏢 Carnegie Mellon University
Dual Prototype Evolving (DPE) significantly boosts vision-language model generalization by cumulatively learning multi-modal prototypes from unlabeled test data, outperforming current state-of-the-art…
DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos
·2231 words·11 mins
Computer Vision Video Understanding 🏢 Carnegie Mellon University
DreamScene4D generates realistic 3D dynamic multi-object scenes from monocular videos via novel view synthesis, addressing limitations of existing methods with a novel decompose-recompose approach.
Doubly Hierarchical Geometric Representations for Strand-based Human Hairstyle Generation
·2527 words·12 mins
Computer Vision Image Generation 🏢 Carnegie Mellon University
Doubly hierarchical geometric representations enable realistic human hairstyle generation by separating low and high-frequency details in hair strands, resulting in high-quality, detailed virtual hair…
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
·1863 words·9 mins
Video Understanding 🏢 Carnegie Mellon University
Run-Length Tokenization (RLT) dramatically speeds up video transformer training and inference by efficiently removing redundant video tokens, matching baseline model performance with significant time …
Divergences between Language Models and Human Brains
·2519 words·12 mins
Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
Language models, unlike human brains, struggle with social/emotional intelligence and physical commonsense. Fine-tuning models on these aspects improves their accuracy in predicting human brain responses.
DiffusionPID: Interpreting Diffusion via Partial Information Decomposition
·5438 words·26 mins
Multimodal Learning Vision-Language Models 🏢 Carnegie Mellon University
DiffusionPID unveils the secrets of text-to-image diffusion models by decomposing text prompts into unique, redundant, and synergistic components, providing insights into how individual words and thei…
Data Distribution Valuation
·3717 words·18 mins
AI Theory Valuation 🏢 Carnegie Mellon University
This paper proposes a novel MMD-based method for data distribution valuation, enabling theoretically-principled comparison of data distributions from limited samples, outperforming existing methods in…
Data Attribution for Text-to-Image Models by Unlearning Synthesized Images
·4461 words·21 mins
AI Generated Computer Vision Image Generation 🏢 Carnegie Mellon University
Unlearning synthesized images efficiently reveals influential training data for text-to-image models, improving data attribution accuracy and facilitating better model understanding.
Convergence of $\log(1/\epsilon)$ for Gradient-Based Algorithms in Zero-Sum Games without the Condition Number: A Smoothed Analysis
·262 words·2 mins
AI Theory Optimization 🏢 Carnegie Mellon University
Gradient-based methods for solving large zero-sum games achieve polynomial smoothed complexity, demonstrating efficiency even in high-precision scenarios without condition number dependence.
Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning
·2598 words·13 mins
Self-Supervised Learning 🏢 Carnegie Mellon University
C-JEPA boosts self-supervised visual learning by integrating contrastive learning with a joint-embedding predictive architecture, enhancing stability and representation quality.
Communication Bounds for the Distributed Experts Problem
·2565 words·13 mins
AI Theory Optimization 🏢 Carnegie Mellon University
This paper presents communication-efficient protocols for the distributed experts problem, achieving near-optimal regret with theoretical and empirical validation.
Causal Temporal Representation Learning with Nonstationary Sparse Transition
·2158 words·11 mins
AI Theory Representation Learning 🏢 Carnegie Mellon University
CtrlNS: A novel framework for causal temporal representation learning tackles the challenge of nonstationary time series by leveraging sparse transition assumptions, achieving improved accuracy in ide…
Causal Inference in the Closed-Loop: Marginal Structural Models for Sequential Excursion Effects
·2206 words·11 mins
AI Theory Causality 🏢 Carnegie Mellon University
Researchers introduce a non-parametric causal inference framework to analyze closed-loop optogenetics designs, revealing previously hidden causal effects of neural circuit manipulations on behavior.
BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning
·2825 words·14 mins
Machine Learning Reinforcement Learning 🏢 Carnegie Mellon University
BECAUSE: a novel algorithm for generalizable offline model-based reinforcement learning that leverages bilinear causal representation to mitigate objective mismatch caused by confounders in offline da…
AutoMix: Automatically Mixing Language Models
·2953 words·14 mins
Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
AutoMix intelligently routes queries to different-sized LLMs based on a smaller model’s self-verification, minimizing cost while maintaining performance.