Bandit Theory Symposium
On March 10, 2025, there will be a symposium on recent developments in the mathematical theory of bandit algorithms at the University of Amsterdam.
The speakers will be Jack Mayo (University of Amsterdam), Wouter Koolen (CWI and University of Twente), Julia Olkhovskaya (Delft University of Technology), and Dirk van der Hoeven (Leiden University).
Update: The schedule has been revised because Tor Lattimore could not make it.
Schedule
14.30-14.45  Welcome Coffee
14.45-15.15  Jack Mayo: Improved Algorithms for Adversarial Linear Contextual Bandits via Reduction

Abstract: We provide a new set of results for the problem of adversarial linear contextual bandits, using a novel reduction from the adjacent problem of adversarial linear bandits, in both the independently parametrized (IP) formulation of Neu & Olkhovskaya (2020) and the jointly parametrized (JP) setting of Liu et al. (2023). We show that, given access to a misspecification-robust linear bandit algorithm, we can achieve \(O(dK\sqrt{T})\) regret in IP and \(O(d\sqrt{T})\) regret in JP without access to a simulator. Given simulator access and a misspecification-robust linear bandit algorithm enjoying small-loss guarantees, we provide a new polynomial-time algorithm achieving \(O(dK\sqrt{L^{\star}_{T}})\) in IP and \(O(d\sqrt{L^{\star}_{T}})\) in JP without additional assumptions on the context distribution \(\mathcal{D}\), where \(L^{\star}_{T}\) is the cumulative loss of the best policy.
15.15-15.45  Wouter Koolen: Can a biological neuron do linear regression?

Abstract: We investigate a simple model of a biological neuron from the perspective of bandit optimisation. In this simple model, a neuron updates its weights based on reward. We ask whether, and how well, this scheme actually optimises reward. To study this question, we analyse the scheme as a two-point zeroth-order optimisation method and evaluate it on the one-neuron task of linear regression. We derive upper bounds for the scheme, present lower bounds for the problem, and contrast the results with first-order methods. We conclude by reflecting on the consequences for neuroscience.
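The two-point zeroth-order idea in this abstract can be illustrated with a small sketch. This is a generic illustration with assumed parameters, not the scheme analysed in the talk: to minimise a loss without gradient access, perturb the weights along a random unit direction \(u\), query the loss at \(w + \delta u\) and \(w - \delta u\), and step along \(u\) in proportion to the difference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear-regression instance (dimensions and data are assumptions).
d = 5
w_star = rng.normal(size=d)      # unknown target weights
X = rng.normal(size=(200, d))    # fixed design matrix
y = X @ w_star                   # noiseless responses

def loss(w):
    """Squared loss; the optimiser only ever observes its scalar value."""
    return np.mean((X @ w - y) ** 2)

# Two-point zeroth-order optimisation: estimate the gradient from two
# function evaluations along a random direction u on the unit sphere,
#   g = d * (f(w + delta*u) - f(w - delta*u)) / (2*delta) * u,
# which is an unbiased gradient estimate when f is quadratic
# (since E[u u^T] = I/d for u uniform on the sphere).
w = np.zeros(d)
delta, eta = 1e-2, 1e-2
for _ in range(5000):
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)
    g = d * (loss(w + delta * u) - loss(w - delta * u)) / (2 * delta) * u
    w -= eta * g

print(np.linalg.norm(w - w_star))  # distance to the target shrinks toward 0
```

Note the contrast with first-order methods mentioned in the abstract: a gradient step would use \(\nabla f(w)\) directly, whereas this scheme pays an extra dimension-dependent variance for using only two function values per round.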
Coffee Break
16.00-16.30  Julia Olkhovskaya: Sparse Nonparametric Contextual Bandits

Abstract: We introduce and analyse a new class of contextual bandit problems, called sparse nonparametric contextual bandits, in which the expected reward of each context-action pair is a sparse linear combination of features belonging to an infinite set of candidate features. We study two notions of sparsity, for which we provide nearly matching upper and lower bounds.
16.30-17.00  Dirk van der Hoeven: A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs

Abstract: We derive a new analysis of Follow The Regularized Leader (FTRL) for online learning with delayed bandit feedback. By separating the cost of delayed feedback from that of bandit feedback, our analysis allows us to obtain new results in four important settings. We derive the first optimal (up to logarithmic factors) regret bounds for combinatorial semi-bandits with delay and for adversarial Markov Decision Processes with delay (with both known and unknown transition functions). Furthermore, we use our analysis to develop an efficient algorithm for linear bandits with delay achieving near-optimal regret bounds. In order to derive these results, we show that FTRL remains stable across multiple rounds under mild assumptions on the regularizer.
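As background for the FTRL analysis mentioned in this abstract, here is a minimal sketch of Follow The Regularized Leader with a negative-entropy regularizer in the simplest full-information experts setting, where it coincides with exponential weights. The losses and parameters below are illustrative assumptions; this is not the delayed-feedback algorithm from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)
K, T, eta = 3, 5000, 0.05              # arms, rounds, learning rate (assumed values)
mean_loss = np.array([0.2, 0.5, 0.8])  # arm 0 is best in expectation

cum_loss = np.zeros(K)  # cumulative loss vector L_{t-1}
alg_loss = 0.0
for t in range(T):
    # FTRL: p_t = argmin_p  <p, L_{t-1}> + (1/eta) * sum_i p_i log p_i,
    # whose closed form on the probability simplex is exponential weights.
    p = np.exp(-eta * (cum_loss - cum_loss.min()))  # shift by min for stability
    p /= p.sum()
    losses = rng.uniform(0, 2 * mean_loss)  # full-information feedback each round
    alg_loss += p @ losses
    cum_loss += losses

regret = alg_loss - cum_loss.min()
print(regret)  # stays far below T, consistent with the O(log(K)/eta + eta*T) shape
```

The stability property the abstract refers to concerns how little \(p_t\) moves between rounds; in this full-information sketch, consecutive distributions differ only through one additional loss vector scaled by \(\eta\).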
Reception
Registration
Attendance is free, but registration is highly appreciated.
Directions
The symposium will take place in the main University of Amsterdam building at Science Park in Amsterdam:
Room A1.04 at Science Park 904
See also Google Maps.
When you enter through the main entrance, the room is one floor up on your left.
Organizers
Tim van Erven (tim@timvanerven.nl)
Jack Mayo (jack@symmetric.ai)