Reinforcement Learning
Multi-Agent Domain Calibration with a Handful of Offline Data
·2343 words·11 mins·
Machine Learning
Reinforcement Learning
🏢 National Key Laboratory of Novel Software Technology
Madoc: A novel multi-agent framework calibrates RL policies for new environments using limited offline data, achieving superior performance in various locomotion tasks.
Multi-Agent Coordination via Multi-Level Communication
·1851 words·9 mins·
Machine Learning
Reinforcement Learning
🏢 Peking University
SeqComm, a novel multi-level communication scheme, tackles multi-agent coordination by leveraging asynchronous decision-making and a two-phase communication process for improved efficiency and theoretical guarantees.
Model-free Low-Rank Reinforcement Learning via Leveraged Entry-wise Matrix Estimation
·1610 words·8 mins·
Machine Learning
Reinforcement Learning
🏢 KTH
LoRa-PI: a model-free RL algorithm learns and exploits low-rank MDP structures for order-optimal sample complexity, achieving ε-optimal policies with O(poly(A)) samples.
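As a rough illustration of the low-rank premise only (not the paper's leveraged entry-wise estimator), the sketch below projects a noisily estimated state-action value matrix onto a fixed rank via truncated SVD and reads off a greedy policy; the function names and setup are illustrative.

```python
import numpy as np

def low_rank_denoise(q_hat, rank):
    """Project a noisy (states x actions) value estimate onto the set of
    rank-`rank` matrices via truncated SVD, exploiting assumed low-rank
    structure of the underlying Q-matrix."""
    u, s, vt = np.linalg.svd(q_hat, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

def greedy_policy(q_matrix):
    """Greedy policy with respect to a (denoised) Q-matrix."""
    return np.argmax(q_matrix, axis=1)
```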
Mitigating Partial Observability in Decision Processes via the Lambda Discrepancy
·2495 words·12 mins·
Machine Learning
Reinforcement Learning
🏢 UC Berkeley
The λ-discrepancy, a new metric, precisely detects partial observability in sequential decision processes, and minimizing it significantly boosts reinforcement learning agent performance.
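A minimal sketch of one simplified way to probe this idea, assuming the discrepancy is measured as the gap between λ-returns computed with two different λ values (e.g. TD(0)-style vs. Monte-Carlo-style targets) on the same trajectories; the names below are illustrative, not from the paper.

```python
import numpy as np

def lambda_returns(rewards, values, gamma, lam):
    """TD(lambda) returns for one trajectory.

    rewards: r_0..r_{T-1}; values: V(s_0)..V(s_T), including the bootstrap value.
    """
    T = len(rewards)
    returns = np.zeros(T)
    g = values[T]  # bootstrap from the final state value
    for t in reversed(range(T)):
        # Recursive lambda-return:
        # G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1})
        g = rewards[t] + gamma * ((1.0 - lam) * values[t + 1] + lam * g)
        returns[t] = g
    return returns

def lambda_discrepancy(rewards, values, gamma, lam1=0.0, lam2=1.0):
    """Gap between value estimates under two different lambdas.

    Near zero when the state representation is Markov (fully observable);
    a large gap signals partial observability.
    """
    g1 = lambda_returns(rewards, values, gamma, lam1)
    g2 = lambda_returns(rewards, values, gamma, lam2)
    return np.mean((g1 - g2) ** 2)
```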
Mitigating Covariate Shift in Behavioral Cloning via Robust Stationary Distribution Correction
·2650 words·13 mins·
Machine Learning
Reinforcement Learning
🏢 KAIST
DrilDICE robustly tackles covariate shift in offline imitation learning by using a stationary distribution correction and a distributionally robust objective, significantly improving performance.
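A minimal sketch of the general idea, assuming the stationary-distribution correction ratios have already been estimated (DICE-style) and ignoring the paper's distributionally robust objective; all names here are illustrative.

```python
import torch
import torch.nn.functional as F

def weighted_bc_loss(policy_logits: torch.Tensor,
                     expert_actions: torch.Tensor,
                     correction_weights: torch.Tensor) -> torch.Tensor:
    """Behavioral-cloning loss reweighted per sample by estimated
    stationary-distribution correction ratios, so states that are
    over- or under-represented in the dataset are down- or up-weighted.

    policy_logits: (batch, n_actions) policy outputs on expert states.
    expert_actions: (batch,) expert action indices.
    correction_weights: (batch,) precomputed distribution-correction ratios.
    """
    nll = F.cross_entropy(policy_logits, expert_actions, reduction="none")
    return (correction_weights * nll).mean()
```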
Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning
·1835 words·9 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 Duke University
Minimax-optimal, computationally efficient algorithms are proposed for distributionally robust offline reinforcement learning, addressing challenges posed by function approximation and model uncertainty.
Mimicking To Dominate: Imitation Learning Strategies for Success in Multiagent Games
·1956 words·10 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 Singapore Management University
IMAX-PPO: A novel multi-agent RL algorithm leveraging imitation learning to predict opponent actions, achieving superior performance in complex games.
MetaCURL: Non-stationary Concave Utility Reinforcement Learning
·362 words·2 mins·
Machine Learning
Reinforcement Learning
🏢 Inria
MetaCURL: First algorithm for non-stationary Concave Utility Reinforcement Learning (CURL), achieving near-optimal dynamic regret by using a meta-algorithm and sleeping experts framework.
Meta-Reinforcement Learning with Universal Policy Adaptation: Provable Near-Optimality under All-task Optimum Comparator
·1665 words·8 mins·
Machine Learning
Reinforcement Learning
🏢 Pennsylvania State University
Provable near-optimality in meta-RL is achieved using a novel bilevel optimization framework and a universal policy adaptation algorithm.
Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement
·4081 words·20 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 Nanjing University
Meta-DT: Offline meta-RL masters unseen tasks via conditional sequence modeling and world model disentanglement, showcasing superior few-shot and zero-shot generalization.
Meta-Controller: Few-Shot Imitation of Unseen Embodiments and Tasks in Continuous Control
·3389 words·16 mins·
Machine Learning
Reinforcement Learning
🏢 School of Computing, KAIST
Meta-Controller: A novel few-shot behavior cloning framework enables robots to generalize to unseen embodiments and tasks using only a few reward-free demonstrations, showcasing superior few-shot generalization.
Measuring Mutual Policy Divergence for Multi-Agent Sequential Exploration
·2042 words·10 mins·
Machine Learning
Reinforcement Learning
🏢 Xi'an Jiaotong University
MADPO, a novel MARL framework, uses mutual policy divergence maximization with conditional Cauchy-Schwarz divergence to enhance exploration and agent heterogeneity in sequential updating, outperformin…
Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow
·2836 words·14 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 NVIDIA Corporation
MEow, a novel MaxEnt RL framework, achieves superior performance by unifying policy evaluation and improvement steps, enabling exact soft value function calculation without Monte Carlo approximation.
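For reference, the soft value in MaxEnt RL is V(s) = α·log Σ_a exp(Q(s,a)/α), which is only tractable exactly in special cases such as discrete actions; the minimal discrete-action sketch below shows that calculation (MEow's contribution is obtaining the analogous quantity exactly for continuous actions via an energy-based normalizing flow, which this sketch does not attempt).

```python
import numpy as np

def soft_value(q_values, alpha=1.0):
    """Exact soft value for discrete actions:
    V(s) = alpha * log sum_a exp(Q(s, a) / alpha)."""
    z = q_values / alpha
    m = np.max(z)  # max-subtraction for a numerically stable log-sum-exp
    return alpha * (m + np.log(np.sum(np.exp(z - m))))

def soft_policy(q_values, alpha=1.0):
    """MaxEnt (Boltzmann) policy: pi(a|s) proportional to exp(Q(s, a) / alpha)."""
    z = q_values / alpha
    z = z - np.max(z)
    p = np.exp(z)
    return p / p.sum()
```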
Maximum Entropy Inverse Reinforcement Learning of Diffusion Models with Energy-Based Models
·2261 words·11 mins·
Reinforcement Learning
🏢 Korea Institute for Advanced Study
Boosting diffusion model sample quality, especially with few steps, is achieved via a novel maximum entropy inverse reinforcement learning approach, jointly training the model and an energy-based model.
Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning
·3140 words·15 mins·
Machine Learning
Reinforcement Learning
🏢 MoE Key Lab of Artificial Intelligence
CoWorld: a novel model-based RL approach tackles offline visual RL challenges by using online simulators as testbeds, enabling flexible value estimation & mitigating overestimation bias for effective …
Maia-2: A Unified Model for Human-AI Alignment in Chess
·2577 words·13 mins·
Machine Learning
Reinforcement Learning
🏢 University of Toronto
Maia-2: A unified model for human-AI alignment in chess that coherently captures human play across skill levels, significantly improving AI-human alignment and paving the way for AI-guided teaching.
MADiff: Offline Multi-agent Learning with Diffusion Models
·2719 words·13 mins·
Machine Learning
Reinforcement Learning
🏢 Shanghai Jiao Tong University
MADIFF: Offline multi-agent learning uses attention-based diffusion models to achieve effective coordination and teammate modeling, outperforming existing methods.
Logarithmic Smoothing for Pessimistic Off-Policy Evaluation, Selection and Learning
·2182 words·11 mins·
Reinforcement Learning
🏢 Criteo AI Lab
Logarithmic Smoothing enhances pessimistic offline contextual bandit algorithms by providing tighter concentration bounds for improved policy evaluation, selection and learning.
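A hedged sketch of the flavor of estimator involved, assuming a per-sample transform of the form log(1 + λ·w·r)/λ applied to importance-weighted rewards: since log(1 + x) ≤ x, the estimate is biased downward (pessimistic) and large importance weights are damped, while the standard IPS estimate is recovered as λ → 0. The exact estimator and its concentration bounds are in the paper.

```python
import numpy as np

def ips_estimate(weights, rewards):
    """Standard inverse-propensity-scoring (IPS) policy value estimate."""
    return np.mean(weights * rewards)

def log_smoothed_estimate(weights, rewards, lam=0.1):
    """Logarithmically smoothed variant (assumed form):
    mean of log(1 + lam * w * r) / lam, a pessimistic, weight-damped
    counterpart of IPS that approaches IPS as lam -> 0."""
    return np.mean(np.log1p(lam * weights * rewards) / lam)
```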
Local Linearity: the Key for No-regret Reinforcement Learning in Continuous MDPs
·451 words·3 mins·
Machine Learning
Reinforcement Learning
🏢 Politecnico Di Milano
CINDERELLA: a new algorithm achieves state-of-the-art no-regret bounds for continuous RL problems by exploiting local linearity.
Local Anti-Concentration Class: Logarithmic Regret for Greedy Linear Contextual Bandit
·2759 words·13 mins·
Machine Learning
Reinforcement Learning
🏢 Columbia University
Greedy algorithms for linear contextual bandits achieve poly-logarithmic regret under the novel Local Anti-Concentration condition, expanding applicable distributions beyond Gaussians and uniforms.
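For context on what "greedy" means here: a minimal sketch of a greedy linear contextual bandit with per-arm ridge regression and no exploration bonus (a toy setup with illustrative names; the paper's contribution is characterizing the context distributions under which this exploration-free strategy already enjoys poly-logarithmic regret).

```python
import numpy as np

class GreedyLinearBandit:
    """Greedy linear contextual bandit: always pull the arm whose
    ridge-regression reward estimate is highest, with no exploration bonus."""

    def __init__(self, n_arms, dim, reg=1.0):
        self.A = [reg * np.eye(dim) for _ in range(n_arms)]  # X^T X + reg * I, per arm
        self.b = [np.zeros(dim) for _ in range(n_arms)]      # X^T y, per arm

    def select(self, context):
        # Greedy choice: argmax over estimated mean rewards theta_a^T x.
        estimates = [np.linalg.solve(A, b) @ context
                     for A, b in zip(self.A, self.b)]
        return int(np.argmax(estimates))

    def update(self, arm, context, reward):
        # Rank-one update of the chosen arm's ridge-regression statistics.
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```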