Why Harmonic Numbers Keep Appearing in Algorithm Analysis
Some math ideas look small on paper and then show up everywhere in programming. Harmonic numbers are one of those ideas. They grow slowly, they explain the behavior of several algorithms, and they are easy to explore with a few lines of Python.
What a harmonic number is
The n-th harmonic number is the sum of the reciprocals of the first n positive integers:
In plain English, you start with 1, then add one-half, then one-third, then one-fourth, and keep going.
| n | Expanded form | Exact value | Decimal |
|---|---|---|---|
| 1 | \(1\) | \(1\) | 1.000000 |
| 2 | \(1 + \frac{1}{2}\) | \(\frac{3}{2}\) | 1.500000 |
| 3 | \(1 + \frac{1}{2} + \frac{1}{3}\) | \(\frac{11}{6}\) | 1.833333 |
| 4 | \(1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4}\) | \(\frac{25}{12}\) | 2.083333 |
| 5 | \(1 + \frac{1}{2} + \cdots + \frac{1}{5}\) | \(\frac{137}{60}\) | 2.283333 |
These values keep getting larger, but much more slowly than you might expect.
Computing one harmonic number in Python
The direct translation into Python is simple:
def harmonic_number(n: int) -> float:
"""Return H_n as a floating-point number."""
if n < 1:
raise ValueError("n must be at least 1")
return sum(1 / k for k in range(1, n + 1))
This version is good for learning and for many everyday uses. It also includes a small safety check so the function does not quietly accept invalid input like 0 or negative numbers.
float, which means the result is approximate. That is normal in Python when you divide integers with /.
Getting exact answers with fractions
If you want the exact mathematical value instead of a decimal approximation, Python’s standard library can help.
from fractions import Fraction
def harmonic_number_exact(n: int) -> Fraction:
"""Return H_n exactly as a fraction."""
if n < 1:
raise ValueError("n must be at least 1")
return sum((Fraction(1, k) for k in range(1, n + 1)), start=Fraction(0, 1))
Now harmonic_number_exact(5) returns Fraction(137, 60), which is the exact value shown in the table above.
float when you want speed and simple output, and use Fraction when exact arithmetic matters more.
Building the whole sequence efficiently
If you want every value from \(H_1\) through \(H_n\), do not recompute each one from scratch. Keep a running total instead.
def harmonic_sequence(n: int) -> list[float]:
"""Return [H_1, H_2, ..., H_n]."""
if n < 1:
raise ValueError("n must be at least 1")
total = 0.0
values = []
for k in range(1, n + 1):
total += 1 / k
values.append(total)
return values
This is faster and easier to scale because each new value is built from the previous one.
How harmonic numbers grow
One of the most important facts about harmonic numbers is that they grow roughly like the natural logarithm:
$$H_n \approx \ln(n) + \gamma$$Here, \(\gamma\) is the Euler–Mascheroni constant, approximately \(0.57721\).
import math
gamma = 0.5772156649
print(f"{'n':>6} {'H_n':>10} {'ln(n)':>10} {'H_n - ln(n)':>13}")
print("-" * 45)
for n in [1, 2, 5, 10, 100, 1000, 10000]:
h = harmonic_number(n)
print(f"{n:>6} {h:>10.6f} {math.log(n):>10.6f} {h - math.log(n):>13.6f}")
As n gets larger, the gap between H_n and ln(n) moves closer to \(\gamma\).
n grows.
A real example: the coupon collector problem
Imagine a company puts one of n different collectibles in each box, chosen at random. How many boxes do you expect to buy before you collect all of them?
The expected value is:
$$n \cdot H_n$$def expected_boxes(n: int) -> float:
"""Return the expected number of boxes needed to collect all n items."""
if n < 1:
raise ValueError("n must be at least 1")
return n * harmonic_number(n)
If there are 52 equally likely collectibles, the expected number of boxes is about 236.
Why programmers keep seeing harmonic numbers
Harmonic numbers appear in algorithm analysis because many average-case costs look like this:
$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}$$That pattern shows up in the expected behavior of several algorithms and data structures. The big takeaway is not just that harmonic numbers exist, but that they grow slowly — roughly like \(\log n\). That slow growth is one reason many randomized algorithms remain practical even as input size increases.
Related reading: Understanding String Formatting in Python
Comments