
🏢 Carnegie Mellon University

Fast Best-of-N Decoding via Speculative Rejection
·1456 words·7 mins
Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
Speculative Rejection: a novel algorithm that speeds up inference-time alignment of Large Language Models (LLMs) by 16-32x!
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
·4338 words·21 mins
AI Generated Multimodal Learning Multimodal Reasoning 🏢 Carnegie Mellon University
Emotion-LLaMA: A new multimodal large language model excels at emotion recognition and reasoning, outperforming existing models and leveraging a newly created dataset, MERR.
Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization
·1755 words·9 mins
Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
Adaptive Dense-to-sparse Constrained Optimization (ADC) efficiently jailbreaks LLMs by transforming discrete token optimization into a continuous process, achieving higher success rates than existing …
Efficient $\Phi$-Regret Minimization with Low-Degree Swap Deviations in Extensive-Form Games
·570 words·3 mins
AI Generated AI Theory Optimization 🏢 Carnegie Mellon University
New efficient algorithms minimize regret in extensive-form games by cleverly using low-degree swap deviations and a relaxed fixed-point concept, improving correlated equilibrium computation.
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
·4056 words·20 mins
Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
AI alignment beyond human supervision is achieved via easy-to-hard generalization: training reward models on easy tasks to effectively evaluate and improve generators on harder tasks, achieving superh…
Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models
·2027 words·10 mins
Multimodal Learning Vision-Language Models 🏢 Carnegie Mellon University
Dual Prototype Evolving (DPE) significantly boosts vision-language model generalization by cumulatively learning multi-modal prototypes from unlabeled test data, outperforming current state-of-the-art…
DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos
·2231 words·11 mins
Computer Vision Video Understanding 🏢 Carnegie Mellon University
DreamScene4D generates realistic 3D dynamic multi-object scenes from monocular videos via novel view synthesis, addressing limitations of existing methods with a novel decompose-recompose approach.
Doubly Hierarchical Geometric Representations for Strand-based Human Hairstyle Generation
·2527 words·12 mins
Computer Vision Image Generation 🏢 Carnegie Mellon University
Doubly hierarchical geometric representations enable realistic human hairstyle generation by separating low and high-frequency details in hair strands, resulting in high-quality, detailed virtual hair…
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
·1863 words·9 mins
Video Understanding 🏢 Carnegie Mellon University
Run-Length Tokenization (RLT) dramatically speeds up video transformer training and inference by efficiently removing redundant video tokens, matching baseline model performance with significant time …
Divergences between Language Models and Human Brains
·2519 words·12 mins
Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
Language models, unlike human brains, struggle with social/emotional intelligence and physical commonsense. Fine-tuning models on these aspects improves their accuracy in predicting human brain responses.
DiffusionPID: Interpreting Diffusion via Partial Information Decomposition
·5438 words·26 mins
Multimodal Learning Vision-Language Models 🏢 Carnegie Mellon University
DiffusionPID unveils the secrets of text-to-image diffusion models by decomposing text prompts into unique, redundant, and synergistic components, providing insights into how individual words and thei…
Data Distribution Valuation
·3717 words·18 mins
AI Theory Valuation 🏢 Carnegie Mellon University
This paper proposes a novel MMD-based method for data distribution valuation, enabling theoretically-principled comparison of data distributions from limited samples, outperforming existing methods in…
Data Attribution for Text-to-Image Models by Unlearning Synthesized Images
·4461 words·21 mins
AI Generated Computer Vision Image Generation 🏢 Carnegie Mellon University
Unlearning synthesized images efficiently reveals influential training data for text-to-image models, improving data attribution accuracy and facilitating better model understanding.
Convergence of $\log(1/\epsilon)$ for Gradient-Based Algorithms in Zero-Sum Games without the Condition Number: A Smoothed Analysis
·262 words·2 mins
AI Theory Optimization 🏢 Carnegie Mellon University
Gradient-based methods for solving large zero-sum games achieve polynomial smoothed complexity, demonstrating efficiency even in high-precision scenarios without condition number dependence.
Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning
·2598 words·13 mins
Self-Supervised Learning 🏢 Carnegie Mellon University
C-JEPA boosts self-supervised visual learning by integrating contrastive learning with a joint-embedding predictive architecture, enhancing stability and representation quality.
Communication Bounds for the Distributed Experts Problem
·2565 words·13 mins
AI Theory Optimization 🏢 Carnegie Mellon University
This paper presents communication-efficient protocols for the distributed experts problem, achieving near-optimal regret with theoretical and empirical validation.
Causal Temporal Representation Learning with Nonstationary Sparse Transition
·2158 words·11 mins
AI Theory Representation Learning 🏢 Carnegie Mellon University
CtrlNS: A novel framework for causal temporal representation learning tackles the challenge of nonstationary time series by leveraging sparse transition assumptions, achieving improved accuracy in ide…
Causal Inference in the Closed-Loop: Marginal Structural Models for Sequential Excursion Effects
·2206 words·11 mins
AI Theory Causality 🏢 Carnegie Mellon University
Researchers introduce a non-parametric causal inference framework to analyze closed-loop optogenetics designs, revealing previously hidden causal effects of neural circuit manipulations on behavior.
BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning
·2825 words·14 mins
Machine Learning Reinforcement Learning 🏢 Carnegie Mellon University
BECAUSE: a novel algorithm for generalizable offline model-based reinforcement learning that leverages bilinear causal representation to mitigate objective mismatch caused by confounders in offline da…
AutoMix: Automatically Mixing Language Models
·2953 words·14 mins
Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
AutoMix intelligently routes queries to different-sized LLMs based on a smaller model’s self-verification, minimizing cost while maintaining performance.