Why Harmonic Numbers Keep Appearing in Algorithm Analysis

Python · Concrete Mathematics

Why Harmonic Numbers Keep Appearing in Algorithm Analysis

Some math ideas look small on paper and then show up everywhere in programming. Harmonic numbers are one of those ideas. They grow slowly, they explain the behavior of several algorithms, and they are easy to explore with a few lines of Python.

What a harmonic number is

The n-th harmonic number is the sum of the reciprocals of the first n positive integers:

$$H_n = \sum_{k=1}^{n} \frac{1}{k} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}$$

In plain English, you start with 1, then add one-half, then one-third, then one-fourth, and keep going.

n Expanded form Exact value Decimal
1 \(1\) \(1\) 1.000000
2 \(1 + \frac{1}{2}\) \(\frac{3}{2}\) 1.500000
3 \(1 + \frac{1}{2} + \frac{1}{3}\) \(\frac{11}{6}\) 1.833333
4 \(1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4}\) \(\frac{25}{12}\) 2.083333
5 \(1 + \frac{1}{2} + \cdots + \frac{1}{5}\) \(\frac{137}{60}\) 2.283333

These values keep getting larger, but much more slowly than you might expect.

Computing one harmonic number in Python

The direct translation into Python is simple:

def harmonic_number(n: int) -> float:
    """Return H_n as a floating-point number."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sum(1 / k for k in range(1, n + 1))

This version is good for learning and for many everyday uses. It also includes a small safety check so the function does not quietly accept invalid input like 0 or negative numbers.

Accuracy note: This function returns a float, which means the result is approximate. That is normal in Python when you divide integers with /.

Getting exact answers with fractions

If you want the exact mathematical value instead of a decimal approximation, Python’s standard library can help.

from fractions import Fraction


def harmonic_number_exact(n: int) -> Fraction:
    """Return H_n exactly as a fraction."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sum((Fraction(1, k) for k in range(1, n + 1)), start=Fraction(0, 1))

Now harmonic_number_exact(5) returns Fraction(137, 60), which is the exact value shown in the table above.

This is a nice pattern for math-heavy code: use float when you want speed and simple output, and use Fraction when exact arithmetic matters more.

Building the whole sequence efficiently

If you want every value from \(H_1\) through \(H_n\), do not recompute each one from scratch. Keep a running total instead.

def harmonic_sequence(n: int) -> list[float]:
    """Return [H_1, H_2, ..., H_n]."""
    if n < 1:
        raise ValueError("n must be at least 1")

    total = 0.0
    values = []

    for k in range(1, n + 1):
        total += 1 / k
        values.append(total)

    return values

This is faster and easier to scale because each new value is built from the previous one.

How harmonic numbers grow

One of the most important facts about harmonic numbers is that they grow roughly like the natural logarithm:

$$H_n \approx \ln(n) + \gamma$$

Here, \(\gamma\) is the Euler–Mascheroni constant, approximately \(0.57721\).

import math

gamma = 0.5772156649

print(f"{'n':>6} {'H_n':>10} {'ln(n)':>10} {'H_n - ln(n)':>13}")
print("-" * 45)

for n in [1, 2, 5, 10, 100, 1000, 10000]:
    h = harmonic_number(n)
    print(f"{n:>6} {h:>10.6f} {math.log(n):>10.6f} {h - math.log(n):>13.6f}")

As n gets larger, the gap between H_n and ln(n) moves closer to \(\gamma\).

This is an approximation, not an exact identity. It becomes more useful as n grows.

A real example: the coupon collector problem

Imagine a company puts one of n different collectibles in each box, chosen at random. How many boxes do you expect to buy before you collect all of them?

The expected value is:

$$n \cdot H_n$$
def expected_boxes(n: int) -> float:
    """Return the expected number of boxes needed to collect all n items."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n * harmonic_number(n)

If there are 52 equally likely collectibles, the expected number of boxes is about 236.

Modeling note: This assumes each collectible is equally likely and each box is independent. Real promotions may not behave that neatly.

Why programmers keep seeing harmonic numbers

Harmonic numbers appear in algorithm analysis because many average-case costs look like this:

$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}$$

That pattern shows up in the expected behavior of several algorithms and data structures. The big takeaway is not just that harmonic numbers exist, but that they grow slowly — roughly like \(\log n\). That slow growth is one reason many randomized algorithms remain practical even as input size increases.

Related reading: Understanding String Formatting in Python

Comments