🏢 National Key Laboratory for Novel Software Technology, Nanjing University, China

Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs

26 September 2024·308 words·2 mins

Machine Learning Reinforcement Learning 🏢 National Key Laboratory for Novel Software Technology, Nanjing University, China

Near-optimal dynamic regret is achieved for adversarial linear mixture MDPs with unknown transitions by bridging occupancy-measure-based and policy-based methods.

KALM: Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts

26 September 2024·3188 words·15 mins

Machine Learning Reinforcement Learning 🏢 National Key Laboratory for Novel Software Technology, Nanjing University, China

KALM trains knowledgeable agents for complex tasks via offline RL on imaginary rollouts generated by an LLM, significantly outperforming baselines.

Handling Learnwares from Heterogeneous Feature Spaces with Explicit Label Exploitation

26 September 2024·2061 words·10 mins

Machine Learning Transfer Learning 🏢 National Key Laboratory for Novel Software Technology, Nanjing University, China

This paper enhances learnware dock systems by using model outputs to improve heterogeneous learnware management, enabling effective task handling even without perfectly matched models.

Gradient-Variation Online Learning under Generalized Smoothness

26 September 2024·271 words·2 mins

AI Generated AI Theory Optimization 🏢 National Key Laboratory for Novel Software Technology, Nanjing University, China

This paper presents a novel optimistic mirror descent algorithm achieving optimal gradient-variation regret under generalized smoothness, applicable across convex, strongly convex functions, and fast-…

Avoiding Undesired Future with Minimal Cost in Non-Stationary Environments

26 September 2024·2100 words·10 mins

Machine Learning Reinforcement Learning 🏢 National Key Laboratory for Novel Software Technology, Nanjing University, China

AUF-MICNS is a sequential method for the avoiding-undesired-future problem: it dynamically updates influence relations in non-stationary environments while minimizing action costs.