Using Python to Explore Harmonic Numbers
Harmonic numbers are one of those mathematical ideas that look almost too simple to be interesting — and then keep showing up everywhere. They appear in the analysis of sorting algorithms, in probability theory, and in the study of data structures like hash tables. This post builds them up from scratch and implements them in Python, with worked examples at each step.
What Is a Harmonic Number?
The \(n\)-th harmonic number \(H_n\) is the sum of the reciprocals of the first \(n\) positive integers:
$$H_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} = \sum_{k=1}^{n} \frac{1}{k}$$In plain English: take 1, add a half, add a third, add a quarter, and so on up to \(\frac{1}{n}\). Here are the first few values as exact fractions, so you can see the pattern clearly:
| \(n\) | Expanded | Exact fraction | Decimal |
|---|---|---|---|
| \(H_1\) | \(1\) | \(1\) | 1.000000 |
| \(H_2\) | \(1 + \frac{1}{2}\) | \(\frac{3}{2}\) | 1.500000 |
| \(H_3\) | \(1 + \frac{1}{2} + \frac{1}{3}\) | \(\frac{11}{6}\) | 1.833333 |
| \(H_4\) | \(1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4}\) | \(\frac{25}{12}\) | 2.083333 |
| \(H_5\) | \(1 + \frac{1}{2} + \cdots + \frac{1}{5}\) | \(\frac{137}{60}\) | 2.283333 |
Notice that \(H_n\) keeps growing as \(n\) increases, but it grows very slowly — we'll come back to exactly how slowly in a moment.
Computing a Single Harmonic Number in Python
The definition translates almost directly into Python. We want to add up \(\frac{1}{k}\) for every integer \(k\) from 1 to \(n\):
def harmonic_number(n: int) -> float:
"""Return the nth harmonic number H_n = 1 + 1/2 + 1/3 + ... + 1/n."""
return sum(1 / k for k in range(1, n + 1))
Let's unpack every part:
range(1, n + 1) — Python's range(a, b) goes from
\(a\) up to but not including \(b\). So we write n + 1 to make sure \(n\) itself
is included. Without the + 1, we would compute \(H_{n-1}\) by mistake.
1 / k for k in range(...) — this is a generator expression.
It produces the values \(\frac{1}{1}, \frac{1}{2}, \frac{1}{3}, \ldots, \frac{1}{n}\)
one at a time without storing them all in memory at once.
sum(...) — adds up everything the generator produces.
Let's try it out:
for n in range(1, 6):
print(f"H({n}) = {harmonic_number(n):.6f}")
H(1) = 1.000000
H(2) = 1.500000
H(3) = 1.833333
H(4) = 2.083333
H(5) = 2.283333
These match the exact fractions in the table above. ✓
Generating the Whole Sequence at Once
Sometimes you want all the harmonic numbers up to \(H_n\) in one go — for example, to plot them or compare them to another sequence. We can compute the whole sequence efficiently by keeping a running total, adding one new term at each step:
def harmonic_sequence(n: int) -> list[float]:
"""Return [H_1, H_2, ..., H_n] as a list."""
total = 0.0
sequence: list[float] = []
for k in range(1, n + 1):
total += 1 / k
sequence.append(total)
return sequence
The key idea here is accumulation. Instead of recomputing each \(H_k\) from
scratch — which would repeat all the additions — we keep a running total and just
add the next term \(\frac{1}{k}\) at each step. This is faster and avoids floating-point
rounding errors building up from repeated full recalculations.
seq = harmonic_sequence(5)
for i, value in enumerate(seq, start=1):
print(f"H({i}) = {value:.6f}")
H(1) = 1.000000
H(2) = 1.500000
H(3) = 1.833333
H(4) = 2.083333
H(5) = 2.283333
enumerate(seq, start=1) — enumerate pairs each value
in the list with its index. The start=1 argument makes the index begin at 1 instead
of the default 0, so we get (1, H_1), (2, H_2), ... rather than
(0, H_1), (1, H_2), ....
How Fast Do Harmonic Numbers Grow?
Here is one of the most important facts about harmonic numbers. The sum keeps growing as \(n\) increases, but it grows very slowly — roughly at the same rate as the natural logarithm \(\ln(n)\). More precisely:
$$H_n \approx \ln(n) + \gamma$$where \(\gamma \approx 0.5772\) is a famous mathematical constant called the Euler–Mascheroni constant. The gap \(H_n - \ln(n)\) doesn't grow or shrink — it gradually settles toward \(\gamma\) as \(n\) gets large.
Let's verify this in Python:
import math
gamma = 0.5772156649 # Euler-Mascheroni constant
print(f"{'n':>6} {'H_n':>10} {'ln(n)':>10} {'H_n - ln(n)':>13}")
print("-" * 46)
for n in [1, 2, 5, 10, 100, 1000, 10000]:
h = harmonic_number(n)
ln_n = math.log(n) if n > 1 else 0.0
print(f"{n:>6} {h:>10.6f} {ln_n:>10.6f} {h - ln_n:>13.6f}")
print(f"\nEuler-Mascheroni constant γ ≈ {gamma}")
n H_n ln(n) H_n - ln(n)
----------------------------------------------
1 1.000000 0.000000 1.000000
2 1.500000 0.693147 0.806853
5 2.283333 1.609438 0.673895
10 2.928968 2.302585 0.626383
100 5.187378 4.605170 0.582207
1000 7.485471 6.907755 0.577716
10000 9.787606 9.210340 0.577266
Euler-Mascheroni constant γ ≈ 0.5772156649
Watch the rightmost column. As \(n\) grows, \(H_n - \ln(n)\) gets closer and closer to \(\gamma \approx 0.5772\). It never reaches it exactly — it just creeps toward it forever. This is why the approximation \(H_n \approx \ln(n) + \gamma\) gets better and better for large \(n\).
A Real Example: The Coupon Collector Problem
Here is a classic problem where harmonic numbers give a direct, practical answer.
Imagine a breakfast cereal company puts one of \(n\) different collectible toys in each box, chosen at random. How many boxes do you expect to buy before you have collected all \(n\) toys?
The answer is exactly \(n \cdot H_n\):
$$\text{Expected boxes} = n \cdot H_n = n \cdot \sum_{k=1}^{n} \frac{1}{k}$$The reasoning: when you have already collected \(k-1\) distinct toys, the probability that the next box gives you a new one is \(\frac{n-(k-1)}{n}\). The expected number of boxes to get that new toy is the reciprocal of that probability, \(\frac{n}{n-k+1}\). Summing over all \(n\) toys gives \(n \cdot H_n\).
def expected_boxes(n: int) -> float:
"""Expected number of boxes to collect all n toys."""
return n * harmonic_number(n)
print(f"{'n toys':>8} {'Expected boxes':>16}")
print("-" * 28)
for n in [5, 10, 20, 52]:
print(f"{n:>8} {expected_boxes(n):>16.2f}")
n toys Expected boxes
----------------------------
5 11.42
10 29.29
20 71.95
52 235.98
So if there are 52 different toys (one for each card in a deck), you'd need to buy about 236 boxes on average to complete the set. That's the power of a harmonic number formula — one line of code gives you a precise expected value that would be very tedious to work out by hand.
Why Programmers Care About Harmonic Numbers
The connection \(H_n \approx \ln(n) + \gamma\) is why harmonic numbers appear so often in algorithm analysis. Any time an algorithm's cost involves adding up fractions like \(\frac{1}{n} + \frac{1}{n-1} + \cdots + \frac{1}{1}\), the total is \(H_n\), and you immediately know the growth rate is roughly \(\ln(n)\) — which is very slow, and very good.
Some examples from computer science where this pattern shows up: the expected number of comparisons in randomized quicksort, the expected number of probes in a linear-probing hash table, and the analysis of the hiring problem in introductory algorithms courses. In every case, the harmonic number is the answer, and its logarithmic growth is why these algorithms perform so well in practice.
If you are interested in Python fundamentals that help with mathematical programming like this, you might enjoy: Understanding String Formatting in Python.
Comments