
A Modern Introduction to Online Learning - Ch 1

These are notes on the text A Modern Introduction to Online Learning by Francesco Orabona, available on arXiv.

Chapter 1 - What is Online Learning?

The core idea of Online Learning in this context is presented through a repeated game framework where the goal is to minimize Regret.

The Basic Game (Example):

  1. For rounds \(t = 1, \dots, T\):
    • An adversary chooses a secret number \(y_t \in [0, 1]\).
    • You choose a number \(x_t \in [0, 1]\).
    • The adversary reveals \(y_t\), and you pay the squared loss \((x_t - y_t)^2\).
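
To make the protocol concrete, here is a minimal simulation of this game loop; a sketch, not from the text, where the `learner` and `adversary` callables are hypothetical placeholders.

```python
import random

def play_game(learner, adversary, T):
    """Simulate the repeated guessing game for T rounds; return the learner's total loss."""
    total_loss = 0.0
    history = []  # revealed numbers y_1, ..., y_{t-1}
    for t in range(1, T + 1):
        y_t = adversary(t, history)      # adversary commits to a secret y_t in [0, 1]
        x_t = learner(t, history)        # learner predicts x_t in [0, 1] without seeing y_t
        total_loss += (x_t - y_t) ** 2   # the squared loss is revealed and paid
        history.append(y_t)
    return total_loss

# Example usage: a learner that always plays 0.5, against uniformly random numbers.
loss = play_game(lambda t, h: 0.5, lambda t, h: random.random(), T=1000)
```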

Goal: Minimize Regret

Initially, if we assume the \(y_t\) are i.i.d. from some distribution with variance \(\sigma^2\), the best strategy is to always predict the mean of that distribution, incurring an expected total loss of \(\sigma^2 T\). A natural goal is to minimize the excess loss compared to this optimal (but unknown) strategy.
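
To see why predicting the mean is optimal in the i.i.d. case, expand the expected per-round loss (a standard bias-variance step, written out here for completeness):

\[\mathbb{E}\big[(x - y_t)^2\big] = (x - \mu)^2 + \sigma^2, \qquad \mu := \mathbb{E}[y_t],\]

which is minimized at \(x = \mu\), where it equals \(\sigma^2\); summing over \(T\) rounds gives the expected total loss \(\sigma^2 T\).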

Removing the stochastic assumption, we consider an arbitrary sequence \(y_t\). The performance metric becomes the Regret: how much worse our total loss is compared to the best single fixed action chosen in hindsight.

Definition - Regret (for the example game)

The regret after \(T\) rounds is defined as:

\[\text{Regret}_T := \sum_{t=1}^T (x_t - y_t)^2 - \min_{x \in [0, 1]} \sum_{t=1}^T (x - y_t)^2\]

An algorithm is considered successful (“wins the game”) if \(\text{Regret}_T\) grows sublinearly in \(T\) (i.e., \(\lim_{T\to\infty} \frac{\text{Regret}_T}{T} = 0\)).
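
For the example game the hindsight minimizer has a closed form: the empirical mean of the \(y_t\), which already lies in \([0, 1]\). A small sketch of computing the regret of a sequence of predictions, using that observation:

```python
def regret(xs, ys):
    """Regret under squared loss: total loss of predictions xs on targets ys,
    minus the loss of the best fixed prediction in hindsight."""
    algo_loss = sum((x - y) ** 2 for x, y in zip(xs, ys))
    best_fixed = sum(ys) / len(ys)  # minimizer of sum_t (x - y_t)^2; lies in [0, 1] since each y_t does
    best_loss = sum((best_fixed - y) ** 2 for y in ys)
    return algo_loss - best_loss
```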

General Online Learning Framework:

  • At each round \(t\), the algorithm chooses an action \(x_t\) from a feasible set \(V \subseteq \mathbb{R}^d\).
  • It incurs a loss \(\ell_t(x_t)\), where the loss function \(\ell_t\) can be chosen arbitrarily (adversarially) at each round.
  • The goal is to minimize the regret compared to the best fixed competitor \(u \in V\) in hindsight.
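
One way to picture this protocol in code is as a predict/update interface; this is only an illustrative sketch (the class and method names are made up, not from the text):

```python
from abc import ABC, abstractmethod
from typing import Callable, Sequence

class OnlineLearner(ABC):
    """One round: call predict() to commit to x_t, then update() with the revealed loss."""

    @abstractmethod
    def predict(self) -> Sequence[float]:
        """Return an action x_t in the feasible set V (a point in R^d)."""

    @abstractmethod
    def update(self, loss_fn: Callable[[Sequence[float]], float]) -> None:
        """Observe the loss function ell_t chosen (possibly adversarially) for this round."""
```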

Definition - Regret (General)

For a sequence of loss functions \(\ell_1, \dots, \ell_T\) and algorithm predictions \(x_1, \dots, x_T\), the regret with respect to a competitor \(u \in V\) is:

\[\text{Regret}_T(u) := \sum_{t=1}^T \ell_t(x_t) - \sum_{t=1}^T \ell_t(u)\]

The online algorithm does not know \(u\) or the future losses when making its prediction \(x_t\).
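
The same quantity in code, for arbitrary loss functions (a generic sketch; `losses` is assumed to be the list of revealed \(\ell_t\) as Python callables):

```python
def regret_vs(losses, xs, u):
    """Regret of predictions xs against a fixed competitor u,
    given the sequence of revealed loss functions."""
    return sum(l(x) for l, x in zip(losses, xs)) - sum(l(u) for l in losses)
```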

Adversarial Nature: The sequence of losses \(\ell_t\) can be chosen by an adversary, potentially based on the algorithm’s past actions. This rules out standard statistical assumptions (such as i.i.d. data) in the analysis.

A Strategy: Follow-the-Leader (FTL)

A natural strategy is to choose the action at time \(t\) that would have been optimal for the past rounds \(1, \dots, t-1\).

  • In the example game: The best action in hindsight after \(T\) rounds is \(x^*_T = \frac{1}{T} \sum_{t=1}^T y_t\).
  • FTL Strategy: \(x_t = x^*_{t-1} = \frac{1}{t-1} \sum_{i=1}^{t-1} y_i\) (for \(t > 1\)).
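
A minimal FTL implementation for this game, as a sketch; the choice \(x_1 = 0\) for the first round is an arbitrary convention here, since there is no past data at \(t = 1\).

```python
class FTLGuesser:
    """Follow-the-Leader for the guessing game: play the mean of the y's seen so far."""

    def __init__(self, x1=0.0):
        self.x1 = x1        # arbitrary first prediction (no past data at t = 1)
        self.sum_y = 0.0
        self.t = 0          # number of rounds observed so far

    def predict(self):
        return self.x1 if self.t == 0 else self.sum_y / self.t

    def update(self, y_t):
        self.sum_y += y_t
        self.t += 1
```

Equivalently, in the `play_game` sketch above, this strategy is just `lambda t, h: sum(h) / len(h) if h else 0.0`.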

Analysis of FTL for the Guessing Game:

  • Lemma 1.2 (Hannan’s Lemma): Let \(x^*_t = \arg\min_{x \in V} \sum_{i=1}^t \ell_i(x)\). Then for any sequence of loss functions \(\ell_t\): \(\sum_{t=1}^T \ell_t(x^*_t) \le \sum_{t=1}^T \ell_t(x^*_T)\) (Playing adaptively based on past data is no worse than playing the single best action in hindsight).
  • Theorem 1.3: For the number guessing game (\(\ell_t(x)=(x-y_t)^2\), \(y_t \in [0,1]\)), the FTL strategy (\(x_t = x^*_{t-1}\)) achieves \(\text{Regret}_T \le 4 + 4 \ln T\). This is sublinear in \(T\), so FTL “wins” this specific game.
  • Proof Idea: Use Lemma 1.2 to bound the regret against the sequence \(x^*_1, \dots, x^*_T\). Then bound the difference between consecutive optimal actions \(|x^*_{t-1} - x^*_t|\). Show this difference decreases quickly enough (like \(O(1/t)\)), so the sum is bounded by \(O(\ln T)\).
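
The key quantitative step, written out (a compressed sketch of the argument, not the full proof): since \(x^*_t\) is a running mean,

\[x^*_t = \frac{1}{t}\sum_{i=1}^t y_i \quad\Longrightarrow\quad x^*_t - x^*_{t-1} = \frac{y_t - x^*_{t-1}}{t}, \quad\text{so}\quad |x^*_t - x^*_{t-1}| \le \frac{1}{t},\]

because both \(y_t\) and \(x^*_{t-1}\) lie in \([0, 1]\). The per-round cost of playing \(x^*_{t-1}\) instead of \(x^*_t\) under squared loss is therefore \(O(1/t)\), and \(\sum_{t=1}^T 1/t = O(\ln T)\).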

Key Takeaways from FTL Example:

  • The FTL strategy for this specific game is parameter-free (no learning rates etc. to tune).
  • It is computationally efficient (only requires maintaining a running average).
  • It does not use gradients.
  • Online learning is distinct from statistical learning; concepts like overfitting don’t directly apply.

History Bits:

  • Savage (1951): Introduced the concept of “loss” (regret) in statistical decision problems, building on Wald (1950).
  • Hannan (1957): Designed the first randomized algorithm for repeated games with vanishing average regret; proved Lemma 1.2.