🏢 National Key Laboratory for Novel Software Technology, Nanjing University, China
Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs
·308 words·2 mins·
Machine Learning
Reinforcement Learning
🏢 National Key Laboratory for Novel Software Technology, Nanjing University, China
Near-optimal dynamic regret is achieved for adversarial linear mixture MDPs with unknown transitions, bridging occupancy-measure and policy-based methods for superior performance.
KALM: Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts
·3188 words·15 mins·
Machine Learning
Reinforcement Learning
🏢 National Key Laboratory for Novel Software Technology, Nanjing University, China
KALM: Knowledgeable agents learn complex tasks from LLMs via offline RL using imaginary rollouts, significantly outperforming baselines.
Handling Learnwares from Heterogeneous Feature Spaces with Explicit Label Exploitation
·2061 words·10 mins·
Machine Learning
Transfer Learning
🏢 National Key Laboratory for Novel Software Technology, Nanjing University, China
This paper enhances learnware dock systems by using model outputs to improve heterogeneous learnware management, enabling effective task handling even without perfectly matched models.
Gradient-Variation Online Learning under Generalized Smoothness
·271 words·2 mins·
AI Generated
AI Theory
Optimization
🏢 National Key Laboratory for Novel Software Technology, Nanjing University, China
This paper presents a novel optimistic mirror descent algorithm achieving optimal gradient-variation regret under generalized smoothness, applicable across convex, strongly convex functions, and fast-…
Avoiding Undesired Future with Minimal Cost in Non-Stationary Environments
·2100 words·10 mins·
Machine Learning
Reinforcement Learning
🏢 National Key Laboratory for Novel Software Technology, Nanjing University, China
AUF-MICNS: A novel sequential method that efficiently solves the Avoiding Undesired Future (AUF) problem in non-stationary environments by dynamically updating influence relations while minimizing action costs.