A Modern Introduction to Online Learning - Ch 1
These are notes on the text A Modern Introduction to Online Learning by Francesco Orabona, available on arXiv.
Chapter 1 - What is Online Learning?
The core idea of Online Learning in this context is presented through a repeated game framework where the goal is to minimize Regret.
The Basic Game (Example):
- For rounds \(t = 1, \dots, T\):
- An adversary chooses a secret number \(y_t \in [0, 1]\).
- You choose a number \(x_t \in [0, 1]\).
- The adversary reveals \(y_t\), and you pay the squared loss \((x_t - y_t)^2\).
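To make the protocol concrete, here is a minimal simulation sketch in Python (my own illustration, not from the book); the `play_game`, `learner`, and `adversary` names and the uniform adversary are assumptions for the example:

```python
import random

def play_game(T, learner, adversary):
    """Simulate the repeated guessing game and return the learner's total loss."""
    total_loss = 0.0
    history = []                        # revealed numbers y_1, ..., y_{t-1}
    for t in range(1, T + 1):
        y_t = adversary(t, history)     # adversary commits to a secret y_t in [0, 1]
        x_t = learner(t, history)       # learner predicts without seeing y_t
        total_loss += (x_t - y_t) ** 2  # squared loss is paid, then y_t is revealed
        history.append(y_t)
    return total_loss

# Example run: i.i.d. uniform adversary vs. a learner that always predicts 0.5.
loss = play_game(
    T=1000,
    learner=lambda t, hist: 0.5,
    adversary=lambda t, hist: random.random(),
)
print(loss)  # about T * sigma^2 = 1000 / 12 for the uniform distribution
```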
Goal: Minimize Regret
Initially, if we assume the \(y_t\) are i.i.d. from some distribution with variance \(\sigma^2\), the best strategy is to always predict the mean of the distribution, incurring an expected total loss of \(\sigma^2 T\). A natural goal is to minimize the excess loss compared to this optimal (but unknown) strategy.
Removing the stochastic assumption, we consider an arbitrary sequence \(y_t\). The performance metric becomes the Regret: how much worse our total loss is compared to the best single fixed action chosen in hindsight.
Definition - Regret (for the example game)
The regret after \(T\) rounds is defined as:
\[\text{Regret}_T := \sum_{t=1}^T (x_t - y_t)^2 - \min_{x \in [0, 1]} \sum_{t=1}^T (x - y_t)^2\]
An algorithm is considered successful (“wins the game”) if \(\text{Regret}_T\) grows sublinearly in \(T\) (i.e., \(\lim_{T\to\infty} \frac{\text{Regret}_T}{T} = 0\)).
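Since each \(y_t \in [0, 1]\), the minimizer of \(\sum_t (x - y_t)^2\) over \([0, 1]\) is simply the empirical mean of the \(y_t\), so the regret for this game can be computed directly. A small sketch (the function name is mine):

```python
def squared_loss_regret(xs, ys):
    """Regret of predictions xs against the sequence ys under the squared loss.

    The best fixed action in hindsight is the empirical mean of the y_t,
    which lies in [0, 1] automatically because every y_t does.
    """
    best_x = sum(ys) / len(ys)
    alg_loss = sum((x - y) ** 2 for x, y in zip(xs, ys))
    best_loss = sum((best_x - y) ** 2 for y in ys)
    return alg_loss - best_loss
```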
General Online Learning Framework:
- At each round \(t\), the algorithm chooses an action \(x_t\) from a feasible set \(V \subseteq \mathbb{R}^d\).
- It incurs a loss \(\ell_t(x_t)\), where the loss function \(\ell_t\) can be chosen arbitrarily (adversarially) at each round.
- The goal is to minimize the regret compared to the best fixed competitor \(u \in V\) in hindsight.
Definition - Regret (General)
For a sequence of loss functions \(\ell_1, \dots, \ell_T\) and algorithm predictions \(x_1, \dots, x_T\), the regret with respect to a competitor \(u \in V\) is:
\[\text{Regret}_T(u) := \sum_{t=1}^T \ell_t(x_t) - \sum_{t=1}^T \ell_t(u)\]
The online algorithm does not know \(u\) or the future losses when making its prediction \(x_t\).
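The general protocol is easy to write down in code; the sketch below (the names and the one-dimensional action type are my simplifications) runs the loop and evaluates the regret against a fixed competitor \(u\):

```python
from typing import Callable, List, Sequence

def regret_against(
    algorithm: Callable[[List[Callable[[float], float]]], float],
    losses: Sequence[Callable[[float], float]],
    u: float,
) -> float:
    """Run the online protocol on the given losses and return Regret_T(u).

    `algorithm` maps the list of past loss functions to the next action x_t;
    it never sees the current loss ell_t or the competitor u before acting.
    """
    past: List[Callable[[float], float]] = []
    total = 0.0
    for ell_t in losses:
        x_t = algorithm(past)
        total += ell_t(x_t) - ell_t(u)
        past.append(ell_t)          # ell_t is revealed only after the action
    return total
```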
Adversarial Nature: The sequence of losses \(\ell_t\) can be chosen by an adversary, potentially based on the algorithm’s past actions. This rules out standard statistical assumptions (like i.i.d. data) in the analysis.
A Strategy: Follow-the-Leader (FTL)
A natural strategy is to choose the action at time \(t\) that would have been optimal for the past rounds \(1, \dots, t-1\).
- In the example game: The best action in hindsight after \(T\) rounds is \(x^*_T = \frac{1}{T} \sum_{t=1}^T y_t\).
- FTL Strategy: \(x_t = x^*_{t-1} = \frac{1}{t-1} \sum_{i=1}^{t-1} y_i\) (for \(t > 1\)).
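In code, FTL for this game is just a running average. A minimal sketch (the choice \(x_1 = 0\) is mine; the first prediction is arbitrary since there is no past data):

```python
def ftl_predictions(ys):
    """Follow-the-Leader predictions for the guessing game.

    x_t is the mean of y_1, ..., y_{t-1}; the prediction for round t is
    computed before y_t is used, mirroring the online protocol.
    """
    xs, running_sum = [], 0.0
    for t, y in enumerate(ys, start=1):
        xs.append(running_sum / (t - 1) if t > 1 else 0.0)
        running_sum += y
    return xs
```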
Analysis of FTL for the Guessing Game:
- Lemma 1.2 (Hannan’s Lemma): Let \(x^*_t = \arg\min_{x \in V} \sum_{i=1}^t \ell_i(x)\). Then for any sequence of loss functions \(\ell_t\): \(\sum_{t=1}^T \ell_t(x^*_t) \le \sum_{t=1}^T \ell_t(x^*_T)\) (Playing adaptively based on past data is no worse than playing the single best action in hindsight).
- Theorem 1.3: For the number guessing game (\(\ell_t(x)=(x-y_t)^2, y_t \in [0,1]\)), the FTL strategy (\(x_t = x^*_{t-1}\)) achieves \(\text{Regret}_T \le 4 + 4 \ln T\). This is sublinear in \(T\), so FTL “wins” this specific game.
Proof Idea: Use Lemma 1.2 to bound the regret against the sequence \(x^*_1, \dots, x^*_T\). Then bound the difference between consecutive optimal actions \(|x^*_{t-1} - x^*_t|\). Show this difference decreases quickly enough (like \(O(1/t)\)), so that the sum over rounds is bounded by \(O(\ln T)\).
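As a quick sanity check (my own experiment, not from the book), one can run FTL on a random sequence and compare the realized regret with the \(4 + 4 \ln T\) bound of Theorem 1.3, reusing the `ftl_predictions` and `squared_loss_regret` helpers sketched above:

```python
import math
import random

T = 10_000
ys = [random.random() for _ in range(T)]    # any sequence in [0, 1] would do
xs = ftl_predictions(ys)                    # FTL sketch from above
regret_T = squared_loss_regret(xs, ys)      # regret sketch from above
print(regret_T, "<=", 4 + 4 * math.log(T))  # realized regret vs. the Theorem 1.3 bound
```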
Key Takeaways from FTL Example:
- The FTL strategy for this specific game is parameter-free (no learning rates or other parameters to tune).
- It is computationally efficient (only requires maintaining a running average).
- It does not use gradients.
- Online learning is distinct from statistical learning; concepts like overfitting don’t directly apply.
History Bits:
- Savage (1951): Introduced the concept of “loss” (regret) in statistical decision problems based on Wald (1950).
- Hannan (1957): Designed the first randomized algorithm for repeated games with vanishing average regret; proved Lemma 1.2.