🏢 Carnegie Mellon University
VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought
·2266 words·11 mins·
Multimodal Learning
Vision-Language Models
🏢 Carnegie Mellon University
VLMs learn to generate their own memories by abstracting experiences from noisy demonstrations and human feedback, significantly boosting in-context learning performance.
Visual Data Diagnosis and Debiasing with Concept Graphs
·2767 words·13 mins·
Computer Vision
Image Classification
🏢 Carnegie Mellon University
CONBIAS tackles dataset bias by representing visual data as concept graphs, diagnosing imbalances via clique analysis, and debiasing through targeted data augmentation for improved model generalization.
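To make the clique-analysis idea concrete, here is a minimal, illustrative sketch (not the authors' code) of building a concept co-occurrence graph and flagging under-represented concept combinations; the toy annotations, the threshold, and the networkx-based clique enumeration are assumptions made purely for illustration.

```python
# Illustrative sketch of the concept-graph idea: build a co-occurrence graph over
# visual concepts, then inspect concept cliques whose combinations are rare.
from itertools import combinations
from collections import Counter
import networkx as nx

# Hypothetical per-image concept annotations.
image_concepts = [
    {"person", "bicycle", "road"},
    {"person", "car", "road"},
    {"person", "car", "road"},
    {"dog", "grass"},
]

# Count pairwise concept co-occurrences across the dataset.
pair_counts = Counter()
for concepts in image_concepts:
    for a, b in combinations(sorted(concepts), 2):
        pair_counts[(a, b)] += 1

# Build the weighted concept graph.
G = nx.Graph()
for (a, b), w in pair_counts.items():
    G.add_edge(a, b, weight=w)

# Flag cliques whose joint co-occurrence is rare; these concept combinations are
# candidates for targeted data augmentation.
threshold = 2
for clique in nx.enumerate_all_cliques(G):
    if len(clique) < 2:
        continue
    weight = min(G[a][b]["weight"] for a, b in combinations(clique, 2))
    if weight < threshold:
        print("under-represented concept combination:", clique, "co-occurrences:", weight)
```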
Understanding Hallucinations in Diffusion Models through Mode Interpolation
·2934 words·14 mins·
Computer Vision
Image Generation
🏢 Carnegie Mellon University
Diffusion models generate unrealistic images by smoothly interpolating between data modes; this paper identifies this ‘mode interpolation’ failure and proposes a metric to detect and reduce it.
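As a rough illustration of what "detecting samples that fall between modes" could look like, the sketch below applies a simple trajectory-variance heuristic to the denoiser's intermediate clean-sample predictions; this is an assumption-laden toy, not necessarily the paper's exact metric.

```python
# Toy detector for "between-mode" samples: if the denoiser's x0 predictions keep
# fluctuating late in sampling, the sampler may be hovering between data modes
# instead of committing to one.
import numpy as np

def trajectory_variance(x0_hat_history):
    """Average variance of the predicted clean sample over the last denoising steps.

    x0_hat_history: array-like of shape (num_steps, *sample_shape) holding the
    model's x0 predictions recorded during sampling.
    """
    late = np.asarray(x0_hat_history)[-10:]   # last 10 recorded predictions (arbitrary window)
    return float(late.var(axis=0).mean())     # mean per-element variance

# Usage: flag samples whose late-trajectory variance exceeds a threshold chosen
# on held-out, known-good samples.
# hallucinated = trajectory_variance(history) > threshold
```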
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
·2720 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
MOHAWK: Distilling Transformers’ quadratic knowledge into faster subquadratic SSMs, achieving state-of-the-art performance with <1% of training data!
Towards Understanding Extrapolation: a Causal Lens
·2076 words·10 mins·
Machine Learning
Transfer Learning
🏢 Carnegie Mellon University
This work unveils a causal lens on extrapolation, offering theoretical guarantees for accurate predictions on out-of-support data, even with limited target samples.
The Sample-Communication Complexity Trade-off in Federated Q-Learning
·1654 words·8 mins·
Reinforcement Learning
🏢 Carnegie Mellon University
Federated Q-learning achieves optimal sample & communication complexities simultaneously via Fed-DVR-Q, a novel algorithm.
The Importance of Online Data: Understanding Preference Fine-tuning via Coverage
·1878 words·9 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Hybrid Preference Optimization (HyPO) outperforms existing offline methods for fine-tuning LLMs by leveraging both offline and online data, achieving better performance and efficiency.
Test-Time Adaptation Induces Stronger Accuracy and Agreement-on-the-Line
·2874 words·14 mins·
Machine Learning
Few-Shot Learning
🏢 Carnegie Mellon University
Test-time adaptation strengthens the linear correlation between in- and out-of-distribution accuracy, enabling precise OOD performance prediction and hyperparameter optimization without labeled OOD data.
Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation
·2449 words·12 mins·
Multimodal Learning
Vision-Language Models
🏢 Carnegie Mellon University
Tactile DreamFusion: High-resolution tactile sensing enhances 3D generation, creating realistic geometric details previously unattainable.
Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale
·2845 words·14 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Synatra synthesizes high-quality digital agent training data from online tutorials and web pages, significantly improving agent performance on complex web-based tasks at a fraction of the cost of human demonstrations.
Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis
·2245 words·11 mins·
Computer Vision
3D Vision
🏢 Carnegie Mellon University
SparseAGS: High-fidelity 3D reconstruction & camera pose estimation from sparse views via generative synthesis.
Slight Corruption in Pre-training Data Makes Better Diffusion Models
·4250 words·20 mins·
Image Generation
🏢 Carnegie Mellon University
Slightly corrupting pre-training data significantly improves diffusion models’ image generation quality, diversity, and fidelity.
SIRIUS: Contextual Sparsity with Correction for Efficient LLMs
·5392 words·26 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
SIRIUS: A novel correction mechanism boosts the efficiency of contextually sparse LLMs for complex reasoning tasks, achieving significant latency reduction.
Sequoia: Scalable and Robust Speculative Decoding
·2372 words·12 mins·
Large Language Models
🏢 Carnegie Mellon University
SEQUOIA: A novel algorithm boosts Large Language Model (LLM) inference speed by up to 9.5x using a scalable and robust speculative decoding approach!
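For background on the technique this blurb names, here is a minimal sketch of vanilla speculative decoding (draft-then-verify); SEQUOIA's tree-structured, scalable variant is not shown, and the `next_token_probs` interface is a hypothetical stand-in for a real model API.

```python
# Minimal draft-then-verify loop: a cheap draft model proposes k tokens, and the
# large target model accepts or rejects them so the output distribution matches
# sampling from the target model alone.
import random

def speculative_decode_step(draft_model, target_model, tokens, k=4):
    proposed = list(tokens)
    draft_probs = []
    for _ in range(k):
        p = draft_model.next_token_probs(proposed)          # hypothetical API
        t = max(range(len(p)), key=lambda i: p[i])          # greedy draft for simplicity
        draft_probs.append(p)
        proposed.append(t)

    accepted = list(tokens)
    for i, tok in enumerate(proposed[len(tokens):]):
        q = target_model.next_token_probs(accepted)         # target distribution here
        p = draft_probs[i]
        # Accept the drafted token with probability min(1, q[tok] / p[tok]).
        if random.random() < min(1.0, q[tok] / max(p[tok], 1e-12)):
            accepted.append(tok)
        else:
            # On rejection, resample from the residual distribution and stop verifying.
            residual = [max(q_j - p_j, 0.0) for q_j, p_j in zip(q, p)]
            z = sum(residual) or 1.0
            r, acc = random.random() * z, 0.0
            for j, w in enumerate(residual):
                acc += w
                if r <= acc:
                    accepted.append(j)
                    break
            break
    return accepted
```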
Sample Complexity of Interventional Causal Representation Learning
·449 words·3 mins·
AI Theory
Representation Learning
🏢 Carnegie Mellon University
First finite-sample analysis of interventional causal representation learning shows that surprisingly few samples suffice for accurate graph and latent variable recovery.
S$^{2}$FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity
·1908 words·9 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
S2FT: Structured Sparse Fine-Tuning achieves state-of-the-art LLM fine-tuning performance, training efficiency, and inference scalability by selecting sparsely and computing densely.
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
·2612 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Leveraging model-generated synthetic data for LLM finetuning significantly improves efficiency when using both positive and strategically constructed negative examples, resulting in an eight-fold increase in efficiency.
Rethinking LLM Memorization through the Lens of Adversarial Compression
·2014 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Researchers propose Adversarial Compression Ratio (ACR) to assess LLM memorization, offering an adversarial, flexible, and computationally efficient method for monitoring data misuse and compliance.
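To unpack the ratio itself, here is a back-of-the-envelope sketch of the ACR idea: a target string counts as memorized when some adversarial prompt shorter than the string elicits it verbatim. Finding the shortest such prompt requires a discrete prompt-optimization search (not shown), and the token lists below are purely illustrative.

```python
# ACR = |target tokens| / |shortest eliciting prompt tokens|;
# a ratio above 1 suggests the model "compresses", i.e. memorizes, the target.

def adversarial_compression_ratio(target_tokens, shortest_prompt_tokens):
    return len(target_tokens) / max(len(shortest_prompt_tokens), 1)

# Example: a 120-token passage reproduced verbatim from a 15-token adversarial
# prompt gives ACR = 8.0.
print(adversarial_compression_ratio(list(range(120)), list(range(15))))  # 8.0
```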
Regret Minimization in Stackelberg Games with Side Information
·415 words·2 mins·
AI Applications
Security
🏢 Carnegie Mellon University
This research shows how to improve Stackelberg game strategies by considering side information, achieving no-regret learning in online settings with stochastic contexts or followers.
Recursive Introspection: Teaching Language Model Agents How to Self-Improve
·2681 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
RISE: Recursive Introspection teaches LLMs to iteratively improve their responses, enabling self-correction and enhanced performance on challenging reasoning tasks.