🏢 Churney ApS

A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays
·484 words·3 mins
Machine Learning · Reinforcement Learning · 🏢 Churney ApS
New best-of-both-worlds bandit algorithm tolerates arbitrary excessive delays, overcoming limitations of prior work that required prior knowledge of maximal delay and suffered linear regret dependence…