🏢 Churney ApS

A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays

26 September 2024·484 words·3 mins

Machine Learning · Reinforcement Learning

New best-of-both-worlds bandit algorithm tolerates arbitrary excessive delays, overcoming limitations of prior work that required prior knowledge of maximal delay and suffered linear regret dependence…