🏢 Churney ApS
A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays
·484 words·3 mins·
Machine Learning
Reinforcement Learning
A new best-of-both-worlds bandit algorithm tolerates arbitrarily excessive delays, overcoming limitations of prior work that required prior knowledge of the maximal delay and suffered linear regret dependence…