🏢 Carnegie Mellon University
VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought
·2266 words·11 mins·
Multimodal Learning
Vision-Language Models
🏢 Carnegie Mellon University
VLMs learn to generate their own memories by abstracting experiences from noisy demonstrations and human feedback, significantly boosting in-context learning performance.
Visual Data Diagnosis and Debiasing with Concept Graphs
·2767 words·13 mins·
Computer Vision
Image Classification
🏢 Carnegie Mellon University
CONBIAS tackles dataset bias by representing visual data as concept graphs, diagnosing imbalances via clique analysis, and debiasing through targeted data augmentation for improved model generalization.
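To make the clique-analysis idea concrete, here is a minimal, illustrative sketch (not the authors' code) of building a concept co-occurrence graph and flagging under-represented concept combinations; the toy annotations, the threshold, and the networkx-based clique enumeration are assumptions made purely for illustration.

```python
# Illustrative sketch of the concept-graph idea: build a co-occurrence graph over
# visual concepts, then inspect concept cliques whose combinations are rare.
from itertools import combinations
from collections import Counter
import networkx as nx

# Hypothetical per-image concept annotations.
image_concepts = [
    {"person", "bicycle", "road"},
    {"person", "car", "road"},
    {"person", "car", "road"},
    {"dog", "grass"},
]

# Count pairwise concept co-occurrences across the dataset.
pair_counts = Counter()
for concepts in image_concepts:
    for a, b in combinations(sorted(concepts), 2):
        pair_counts[(a, b)] += 1

# Build the weighted concept graph.
G = nx.Graph()
for (a, b), w in pair_counts.items():
    G.add_edge(a, b, weight=w)

# Flag cliques whose joint co-occurrence is rare; these concept combinations are
# candidates for targeted data augmentation.
threshold = 2
for clique in nx.enumerate_all_cliques(G):
    if len(clique) < 2:
        continue
    weight = min(G[a][b]["weight"] for a, b in combinations(clique, 2))
    if weight < threshold:
        print("under-represented concept combination:", clique, "co-occurrences:", weight)
```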
Understanding Hallucinations in Diffusion Models through Mode Interpolation
·2934 words·14 mins·
Computer Vision
Image Generation
🏢 Carnegie Mellon University
Diffusion models generate unrealistic images by smoothly interpolating between data modes; this paper identifies this ‘mode interpolation’ failure and proposes a metric to detect and reduce it.
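As a rough illustration of what "detecting samples that fall between modes" could look like, the sketch below applies a simple trajectory-variance heuristic to the denoiser's intermediate clean-sample predictions; this is an assumption-laden toy, not necessarily the paper's exact metric.

```python
# Toy detector for "between-mode" samples: if the denoiser's x0 predictions keep
# fluctuating late in sampling, the sampler may be hovering between data modes
# instead of committing to one.
import numpy as np

def trajectory_variance(x0_hat_history):
    """Average variance of the predicted clean sample over the last denoising steps.

    x0_hat_history: array-like of shape (num_steps, *sample_shape) holding the
    model's x0 predictions recorded during sampling.
    """
    late = np.asarray(x0_hat_history)[-10:]   # last 10 recorded predictions (arbitrary window)
    return float(late.var(axis=0).mean())     # mean per-element variance

# Usage: flag samples whose late-trajectory variance exceeds a threshold chosen
# on held-out, known-good samples.
# hallucinated = trajectory_variance(history) > threshold
```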
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
·2720 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
MOHAWK: Distilling Transformers’ quadratic knowledge into faster subquadratic SSMs, achieving state-of-the-art performance with <1% of training data!
Towards Understanding Extrapolation: a Causal Lens
·2076 words·10 mins·
Machine Learning
Transfer Learning
🏢 Carnegie Mellon University
This work unveils a causal lens on extrapolation, offering theoretical guarantees for accurate predictions on out-of-support data, even with limited target samples.
The Sample-Communication Complexity Trade-off in Federated Q-Learning
·1654 words·8 mins·
Reinforcement Learning
🏢 Carnegie Mellon University
Federated Q-learning achieves optimal sample & communication complexities simultaneously via Fed-DVR-Q, a novel algorithm.
The Importance of Online Data: Understanding Preference Fine-tuning via Coverage
·1878 words·9 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Hybrid Preference Optimization (HyPO) outperforms existing offline methods for fine-tuning LLMs by leveraging both offline and online data, achieving better performance and efficiency.
Test-Time Adaptation Induces Stronger Accuracy and Agreement-on-the-Line
·2874 words·14 mins·
Machine Learning
Few-Shot Learning
🏢 Carnegie Mellon University
Test-time adaptation strengthens the linear correlation between in- and out-of-distribution accuracy, enabling precise OOD performance prediction and hyperparameter optimization without labeled OOD data.
Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation
·2449 words·12 mins·
Multimodal Learning
Vision-Language Models
🏢 Carnegie Mellon University
Tactile DreamFusion: High-resolution tactile sensing enhances 3D generation, creating realistic geometric details previously unattainable.
Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale
·2845 words·14 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Synatra synthesizes high-quality digital agent training data from online tutorials and web pages, significantly improving agent performance on complex web-based tasks at a fraction of the cost of human demonstrations.
Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis
·2245 words·11 mins·
Computer Vision
3D Vision
🏢 Carnegie Mellon University
SparseAGS: High-fidelity 3D reconstruction & camera pose estimation from sparse views via generative synthesis.
Slight Corruption in Pre-training Data Makes Better Diffusion Models
·4250 words·20 mins·
Image Generation
🏢 Carnegie Mellon University
Slightly corrupting pre-training data significantly improves diffusion models’ image generation quality, diversity, and fidelity.
SIRIUS: Contextual Sparsity with Correction for Efficient LLMs
·5392 words·26 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
SIRIUS: A novel correction mechanism boosts the efficiency of contextually sparse LLMs for complex reasoning tasks, achieving significant latency reduction.
Sequoia: Scalable and Robust Speculative Decoding
·2372 words·12 mins·
Large Language Models
🏢 Carnegie Mellon University
SEQUOIA: A novel algorithm boosts Large Language Model (LLM) inference speed by up to 9.5x using a scalable and robust speculative decoding approach!
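For background on the technique this blurb names, here is a minimal sketch of vanilla speculative decoding (draft-then-verify); SEQUOIA's tree-structured, scalable variant is not shown, and the `next_token_probs` interface is a hypothetical stand-in for a real model API.

```python
# Minimal draft-then-verify loop: a cheap draft model proposes k tokens, and the
# large target model accepts or rejects them so the output distribution matches
# sampling from the target model alone.
import random

def speculative_decode_step(draft_model, target_model, tokens, k=4):
    proposed = list(tokens)
    draft_probs = []
    for _ in range(k):
        p = draft_model.next_token_probs(proposed)          # hypothetical API
        t = max(range(len(p)), key=lambda i: p[i])          # greedy draft for simplicity
        draft_probs.append(p)
        proposed.append(t)

    accepted = list(tokens)
    for i, tok in enumerate(proposed[len(tokens):]):
        q = target_model.next_token_probs(accepted)         # target distribution here
        p = draft_probs[i]
        # Accept the drafted token with probability min(1, q[tok] / p[tok]).
        if random.random() < min(1.0, q[tok] / max(p[tok], 1e-12)):
            accepted.append(tok)
        else:
            # On rejection, resample from the residual distribution and stop verifying.
            residual = [max(q_j - p_j, 0.0) for q_j, p_j in zip(q, p)]
            z = sum(residual) or 1.0
            r, acc = random.random() * z, 0.0
            for j, w in enumerate(residual):
                acc += w
                if r <= acc:
                    accepted.append(j)
                    break
            break
    return accepted
```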
Sample Complexity of Interventional Causal Representation Learning
·449 words·3 mins·
AI Theory
Representation Learning
🏢 Carnegie Mellon University
First finite-sample analysis of interventional causal representation learning shows that surprisingly few samples suffice for accurate graph and latent variable recovery.
S$^{2}$FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity
·1908 words·9 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
S2FT: Structured Sparse Fine-Tuning achieves state-of-the-art LLM fine-tuning performance, training efficiency, and inference scalability by selecting sparsely and computing densely.
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
·2612 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Leveraging model-generated synthetic data for LLM finetuning significantly improves efficiency when using both positive and strategically constructed negative examples, resulting in an eight-fold increase in efficiency.
Rethinking LLM Memorization through the Lens of Adversarial Compression
·2014 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Researchers propose Adversarial Compression Ratio (ACR) to assess LLM memorization, offering an adversarial, flexible, and computationally efficient method for monitoring data misuse and compliance.
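To unpack the ratio itself, here is a back-of-the-envelope sketch of the ACR idea: a target string counts as memorized when some adversarial prompt shorter than the string elicits it verbatim. Finding the shortest such prompt requires a discrete prompt-optimization search (not shown), and the token lists below are purely illustrative.

```python
# ACR = |target tokens| / |shortest eliciting prompt tokens|;
# a ratio above 1 suggests the model "compresses", i.e. memorizes, the target.

def adversarial_compression_ratio(target_tokens, shortest_prompt_tokens):
    return len(target_tokens) / max(len(shortest_prompt_tokens), 1)

# Example: a 120-token passage reproduced verbatim from a 15-token adversarial
# prompt gives ACR = 8.0.
print(adversarial_compression_ratio(list(range(120)), list(range(15))))  # 8.0
```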
Regret Minimization in Stackelberg Games with Side Information
·415 words·2 mins·
AI Applications
Security
🏢 Carnegie Mellon University
This research shows how to improve Stackelberg game strategies by considering side information, achieving no-regret learning in online settings with stochastic contexts or followers.
Recursive Introspection: Teaching Language Model Agents How to Self-Improve
·2681 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
RISE: Recursive Introspection teaches LLMs to iteratively improve their responses, enabling self-correction and enhanced performance on challenging reasoning tasks.