Reinforcement Learning
RA-PbRL: Provably Efficient Risk-Aware Preference-Based Reinforcement Learning
·1598 words·8 mins·
Machine Learning
Reinforcement Learning
🏢 UC San Diego
RA-PbRL introduces a provably efficient algorithm for risk-aware preference-based reinforcement learning, addressing the limitations of existing risk-neutral methods in applications demanding heighten…
QGFN: Controllable Greediness with Action Values
·3928 words·19 mins·
Machine Learning
Reinforcement Learning
🏢 Hong Kong University of Science and Technology
QGFN boosts Generative Flow Networks (GFNs) by cleverly combining their sampling policy with an action-value estimate, creating controllable and efficient generation of high-reward samples.
Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model
·4297 words·21 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 Hong Kong University of Science and Technology
Offline RL struggles with OOD action overestimation. QDQ tackles this by penalizing uncertain Q-values using a consistency model, enhancing offline RL performance.
Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation
·328 words·2 mins·
Machine Learning
Reinforcement Learning
🏢 National Key Laboratory for Novel Software Technology, Nanjing University
This paper presents novel RL algorithms using multinomial logit function approximation, achieving O(1) computation and storage while nearly closing the regret gap with linear methods.
Provably Efficient Interactive-Grounded Learning with Personalized Reward
·427 words·3 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 University of Iowa
Provably efficient algorithms are introduced for interaction-grounded learning (IGL) with context-dependent feedback, addressing the lack of theoretical guarantees in existing approaches for personali…
Provably and Practically Efficient Adversarial Imitation Learning with General Function Approximation
·2037 words·10 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 Polixir.ai
OPT-AIL: Provably efficient adversarial imitation learning with general function approximation, achieving polynomial sample and interaction complexity, outperforming existing deep AIL methods.
Provable Partially Observable Reinforcement Learning with Privileged Information
·452 words·3 mins·
Machine Learning
Reinforcement Learning
🏢 Yale University
This paper provides the first provable efficiency guarantees for practically-used RL algorithms leveraging privileged information, addressing limitations of previous empirical paradigms and opening ne…
Prospective Learning: Learning for a Dynamic Future
·2230 words·11 mins·
Machine Learning
Reinforcement Learning
🏢 Johns Hopkins University
Prospective Learning: a new framework enabling machines to learn effectively in dynamic environments where data distributions and goals shift over time.
Preference-based Pure Exploration
·357 words·2 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 University of Michigan
PreTS algorithm efficiently identifies the most preferred policy in bandit problems with vector-valued rewards, achieving asymptotically optimal sample complexity.
Preference Learning of Latent Decision Utilities with a Human-like Model of Preferential Choice
·2258 words·11 mins·
Machine Learning
Reinforcement Learning
🏢 Aalto University
Human-like choice modeling improves preference learning: a new tractable model, CRCS, significantly improves utility inference from human data, outperforming existing methods.
Preference Alignment with Flow Matching
·2735 words·13 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 KAIST AI
Preference Flow Matching (PFM) streamlines preference integration into pre-trained models using flow matching, overcoming fine-tuning limitations and enabling robust alignment with human preferences.
Pre-Trained Multi-Goal Transformers with Prompt Optimization for Efficient Online Adaptation
·2369 words·12 mins·
Machine Learning
Reinforcement Learning
🏢 Peking University
MGPO: Efficient online RL adaptation via prompt optimization of pre-trained multi-goal transformers.
Policy-shaped prediction: avoiding distractions in model-based reinforcement learning
·2695 words·13 mins·
Machine Learning
Reinforcement Learning
🏢 Stanford University
Policy-Shaped Prediction (PSP) improves model-based reinforcement learning by focusing world models on task-relevant information, significantly enhancing robustness against distracting stimuli.
Policy Optimization for Robust Average Reward MDPs
·314 words·2 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 University at Buffalo
First-order policy optimization for robust average-cost MDPs achieves linear convergence with increasing step size and O(1/ε) complexity with constant step size, solving a critical gap in existing res…
Policy Mirror Descent with Lookahead
·1918 words·10 mins·
Machine Learning
Reinforcement Learning
🏢 ETH Zurich
Boosting reinforcement learning, this paper introduces h-PMD, a novel algorithm enhancing policy mirror descent with lookahead for faster convergence and improved sample complexity.
Pessimistic Backward Policy for GFlowNets
·1889 words·9 mins·
Machine Learning
Reinforcement Learning
🏢 POSTECH
Pessimistic Backward Policy for GFlowNets (PBP-GFN) tackles GFlowNets’ tendency to under-exploit high-reward objects by maximizing observed backward flow, enhancing high-reward object discovery and ov…
Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
·2411 words·12 mins·
Reinforcement Learning
🏢 University of Washington
VPL: a novel multimodal RLHF method that personalizes AI by inferring user-specific latent preferences, enabling accurate reward modeling and improved policy alignment for diverse populations.
Periodic agent-state based Q-learning for POMDPs
·2014 words·10 mins·
Machine Learning
Reinforcement Learning
🏢 McGill University
PASQL, a novel periodic agent-state Q-learning algorithm, significantly improves reinforcement learning in partially observable environments by leveraging non-stationary periodic policies to overcome …
PEAC: Unsupervised Pre-training for Cross-Embodiment Reinforcement Learning
·3037 words·15 mins·
Machine Learning
Reinforcement Learning
🏢 Tsinghua University
PEAC: a novel unsupervised pre-training method significantly improves cross-embodiment generalization in reinforcement learning, enabling faster adaptation to diverse robots and tasks.
Parseval Regularization for Continual Reinforcement Learning
·2345 words·12 mins·
Machine Learning
Reinforcement Learning
🏢 McGill University
Boost continual reinforcement learning with Parseval regularization: maintaining orthogonal weight matrices preserves optimization, significantly improving RL agent training across diverse tasks.