🏢 Criteo AI Lab
Logarithmic Smoothing for Pessimistic Off-Policy Evaluation, Selection and Learning
Reinforcement Learning
Logarithmic Smoothing enhances pessimistic offline contextual bandit algorithms by providing tighter concentration bounds for improved policy evaluation, selection and learning.