Using Python to Explore Harmonic Numbers

Harmonic numbers are one of those mathematical ideas that look almost too simple to be interesting — and then keep showing up everywhere. They appear in the analysis of sorting algorithms, in probability theory, and in the study of data structures like hash tables. This post builds them up from scratch and implements them in Python, with worked examples at each step.


What Is a Harmonic Number?

The \(n\)-th harmonic number \(H_n\) is the sum of the reciprocals of the first \(n\) positive integers:

$$H_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} = \sum_{k=1}^{n} \frac{1}{k}$$

In plain English: take 1, add a half, add a third, add a quarter, and so on up to \(\frac{1}{n}\). Here are the first few values as exact fractions, so you can see the pattern clearly:

| \(n\) | Expanded | Exact fraction | Decimal |
|---|---|---|---|
| \(H_1\) | \(1\) | \(1\) | 1.000000 |
| \(H_2\) | \(1 + \frac{1}{2}\) | \(\frac{3}{2}\) | 1.500000 |
| \(H_3\) | \(1 + \frac{1}{2} + \frac{1}{3}\) | \(\frac{11}{6}\) | 1.833333 |
| \(H_4\) | \(1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4}\) | \(\frac{25}{12}\) | 2.083333 |
| \(H_5\) | \(1 + \frac{1}{2} + \cdots + \frac{1}{5}\) | \(\frac{137}{60}\) | 2.283333 |

Notice that \(H_n\) keeps growing as \(n\) increases, but it grows very slowly — we'll come back to exactly how slowly in a moment.
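The exact fractions in the table can be reproduced with Python's standard-library `fractions` module, which does exact rational arithmetic with no floating-point rounding. A quick sketch:

```python
from fractions import Fraction

def harmonic_fraction(n: int) -> Fraction:
    """Return H_n as an exact fraction, with no floating-point rounding."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

for n in range(1, 6):
    h = harmonic_fraction(n)
    print(f"H({n}) = {h} = {float(h):.6f}")
```

Running this prints the same fractions as the table, e.g. `H(5) = 137/60 = 2.283333`.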


Computing a Single Harmonic Number in Python

The definition translates almost directly into Python. We want to add up \(\frac{1}{k}\) for every integer \(k\) from 1 to \(n\):

def harmonic_number(n: int) -> float:
    """Return the nth harmonic number H_n = 1 + 1/2 + 1/3 + ... + 1/n."""
    return sum(1 / k for k in range(1, n + 1))

Let's unpack every part:

range(1, n + 1) — Python's range(a, b) goes from \(a\) up to but not including \(b\). So we write n + 1 to make sure \(n\) itself is included. Without the + 1, we would compute \(H_{n-1}\) by mistake.

1 / k for k in range(...) — this is a generator expression. It produces the values \(\frac{1}{1}, \frac{1}{2}, \frac{1}{3}, \ldots, \frac{1}{n}\) one at a time without storing them all in memory at once.

sum(...) — adds up everything the generator produces.

Let's try it out:

for n in range(1, 6):
    print(f"H({n}) = {harmonic_number(n):.6f}")
H(1) = 1.000000
H(2) = 1.500000
H(3) = 1.833333
H(4) = 2.083333
H(5) = 2.283333

These match the exact fractions in the table above. ✓


Generating the Whole Sequence at Once

Sometimes you want all the harmonic numbers up to \(H_n\) in one go — for example, to plot them or compare them to another sequence. We can compute the whole sequence efficiently by keeping a running total, adding one new term at each step:

def harmonic_sequence(n: int) -> list[float]:
    """Return [H_1, H_2, ..., H_n] as a list."""
    total = 0.0
    sequence: list[float] = []
    for k in range(1, n + 1):
        total += 1 / k
        sequence.append(total)
    return sequence

The key idea here is accumulation. Recomputing each \(H_k\) from scratch would repeat all the earlier additions, roughly \(n^2/2\) operations in total; keeping a running total and adding just the next term \(\frac{1}{k}\) at each step brings that down to \(n\) additions.

seq = harmonic_sequence(5)
for i, value in enumerate(seq, start=1):
    print(f"H({i}) = {value:.6f}")
H(1) = 1.000000
H(2) = 1.500000
H(3) = 1.833333
H(4) = 2.083333
H(5) = 2.283333

enumerate(seq, start=1) pairs each value in the list with its index. The start=1 argument makes the index begin at 1 instead of the default 0, so we get (1, H_1), (2, H_2), ... rather than (0, H_1), (1, H_2), ....
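As a sanity check, the accumulated sequence should agree with recomputing each value independently. A small self-contained sketch (redefining both helpers from above so it runs standalone):

```python
def harmonic_number(n: int) -> float:
    """H_n computed from scratch each time."""
    return sum(1 / k for k in range(1, n + 1))

def harmonic_sequence(n: int) -> list[float]:
    """[H_1, ..., H_n] via a running total."""
    total = 0.0
    sequence: list[float] = []
    for k in range(1, n + 1):
        total += 1 / k
        sequence.append(total)
    return sequence

seq = harmonic_sequence(50)
# Both approaches add the terms in the same order, so the results agree.
assert all(abs(seq[i] - harmonic_number(i + 1)) < 1e-12 for i in range(50))
print("accumulated and from-scratch values agree")
```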


How Fast Do Harmonic Numbers Grow?

Here is one of the most important facts about harmonic numbers. The sum keeps growing as \(n\) increases, but it grows very slowly — roughly at the same rate as the natural logarithm \(\ln(n)\). More precisely:

$$H_n \approx \ln(n) + \gamma$$

where \(\gamma \approx 0.5772\) is a famous mathematical constant called the Euler–Mascheroni constant. The gap \(H_n - \ln(n)\) shrinks steadily as \(n\) grows, settling toward \(\gamma\) in the limit.

Let's verify this in Python:

import math

gamma = 0.5772156649  # Euler-Mascheroni constant

print(f"{'n':>6}  {'H_n':>10}  {'ln(n)':>10}  {'H_n - ln(n)':>13}")
print("-" * 46)
for n in [1, 2, 5, 10, 100, 1000, 10000]:
    h = harmonic_number(n)
    ln_n = math.log(n)  # math.log(1) is 0.0, so no special case is needed
    print(f"{n:>6}  {h:>10.6f}  {ln_n:>10.6f}  {h - ln_n:>13.6f}")

print(f"\nEuler-Mascheroni constant γ ≈ {gamma}")
     n         H_n       ln(n)    H_n - ln(n)
----------------------------------------------
     1    1.000000    0.000000       1.000000
     2    1.500000    0.693147       0.806853
     5    2.283333    1.609438       0.673895
    10    2.928968    2.302585       0.626383
   100    5.187378    4.605170       0.582207
  1000    7.485471    6.907755       0.577716
 10000    9.787606    9.210340       0.577266

Euler-Mascheroni constant γ ≈ 0.5772156649

Watch the rightmost column. As \(n\) grows, \(H_n - \ln(n)\) gets closer and closer to \(\gamma \approx 0.5772\). It never reaches it exactly — it just creeps toward it forever. This is why the approximation \(H_n \approx \ln(n) + \gamma\) gets better and better for large \(n\).
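One practical payoff: for very large \(n\) you can skip the loop entirely and use the approximation directly. The error of \(\ln(n) + \gamma\) is known to be about \(\frac{1}{2n}\), which this sketch checks against the direct sum at \(n = 10^6\):

```python
import math

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def harmonic_approx(n: int) -> float:
    """Approximate H_n as ln(n) + gamma; the error is roughly 1/(2n)."""
    return math.log(n) + GAMMA

n = 10**6
exact = sum(1 / k for k in range(1, n + 1))  # direct summation
approx = harmonic_approx(n)
print(f"direct sum:    {exact:.8f}")
print(f"approximation: {approx:.8f}")
print(f"difference:    {exact - approx:.2e}")  # about 5e-07, i.e. roughly 1/(2n)
```

For \(n = 10^6\) the two values agree to six decimal places, and the approximation takes constant time instead of a million additions.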


A Real Example: The Coupon Collector Problem

Here is a classic problem where harmonic numbers give a direct, practical answer.

Imagine a breakfast cereal company puts one of \(n\) different collectible toys in each box, chosen at random. How many boxes do you expect to buy before you have collected all \(n\) toys?

The answer is exactly \(n \cdot H_n\):

$$\text{Expected boxes} = n \cdot H_n = n \cdot \sum_{k=1}^{n} \frac{1}{k}$$

The reasoning: when you have already collected \(k-1\) distinct toys, the probability that the next box gives you a new one is \(\frac{n-(k-1)}{n}\). The expected number of boxes to get that new toy is the reciprocal of that probability, \(\frac{n}{n-k+1}\). Summing over all \(n\) toys gives \(n \cdot H_n\).
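The last algebra step, that \(\sum_{k=1}^{n} \frac{n}{n-k+1} = n \cdot H_n\), holds because as \(k\) runs from 1 to \(n\), the denominator \(n-k+1\) runs over the same values \(n, n-1, \ldots, 1\) in reverse. A quick numerical check of that identity:

```python
n = 10
# Sum of the expected waits for each new toy, exactly as in the derivation.
per_toy = sum(n / (n - k + 1) for k in range(1, n + 1))
h_n = sum(1 / k for k in range(1, n + 1))  # H_n directly
print(per_toy, n * h_n)  # both ~29.2897
```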

def expected_boxes(n: int) -> float:
    """Expected number of boxes to collect all n toys."""
    return n * harmonic_number(n)

print(f"{'n toys':>8}  {'Expected boxes':>16}")
print("-" * 28)
for n in [5, 10, 20, 52]:
    print(f"{n:>8}  {expected_boxes(n):>16.2f}")
  n toys    Expected boxes
----------------------------
       5             11.42
      10             29.29
      20             71.95
      52            235.98

So if there are 52 different toys (one for each card in a deck), you'd need to buy about 236 boxes on average to complete the set. That's the power of a harmonic number formula — one line of code gives you a precise expected value that would be very tedious to work out by hand.
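If you want to convince yourself the formula is right, a small Monte Carlo simulation of the cereal-box process is easy to write. This is a sketch; the trial count and seed are arbitrary choices:

```python
import random

def boxes_needed(n: int, rng: random.Random) -> int:
    """Simulate buying boxes until all n toy types have been seen."""
    seen: set[int] = set()
    boxes = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))  # each box holds a uniformly random toy
        boxes += 1
    return boxes

rng = random.Random(42)
n, trials = 10, 5000
average = sum(boxes_needed(n, rng) for _ in range(trials)) / trials
expected = n * sum(1 / k for k in range(1, n + 1))  # n * H_n
print(f"simulated average: {average:.2f}")
print(f"n * H_n:           {expected:.2f}")  # 29.29
```

With a few thousand trials the simulated average lands close to \(n \cdot H_n \approx 29.29\), matching the table above.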


Why Programmers Care About Harmonic Numbers

The connection \(H_n \approx \ln(n) + \gamma\) is why harmonic numbers appear so often in algorithm analysis. Any time an algorithm's cost involves adding up fractions like \(\frac{1}{n} + \frac{1}{n-1} + \cdots + \frac{1}{1}\), the total is \(H_n\), and you immediately know the growth rate is roughly \(\ln(n)\) — which is very slow, and very good.

Some examples from computer science where this pattern shows up: the expected number of comparisons in randomized quicksort, the expected number of probes in a linear-probing hash table, and the analysis of the hiring problem in introductory algorithms courses. In every case, the harmonic number is the answer, and its logarithmic growth is why these algorithms perform so well in practice.
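To make the quicksort connection concrete, here is a sketch of a random-pivot quicksort that counts comparisons. The classic result (stated here without derivation) is that the expected count on \(n\) distinct keys is exactly \(2(n+1)H_n - 4n\), and a seeded simulation lands close to it:

```python
import random

def quicksort_comparisons(items: list[int], rng: random.Random) -> int:
    """Random-pivot quicksort on distinct keys, returning the comparison count."""
    if len(items) <= 1:
        return 0
    pivot = items[rng.randrange(len(items))]
    less = [x for x in items if x < pivot]
    greater = [x for x in items if x > pivot]
    # len(items) - 1 comparisons against the pivot, then recurse on both sides.
    return (len(items) - 1
            + quicksort_comparisons(less, rng)
            + quicksort_comparisons(greater, rng))

rng = random.Random(0)
n, trials = 100, 300
data = list(range(n))
mean = sum(quicksort_comparisons(data, rng) for _ in range(trials)) / trials
h_n = sum(1 / k for k in range(1, n + 1))
expected = 2 * (n + 1) * h_n - 4 * n  # classic expected-comparison formula
print(f"simulated mean comparisons: {mean:.1f}")
print(f"2(n+1)H_n - 4n:             {expected:.1f}")
```

Note how the harmonic number sits right inside the closed-form answer, which is why the expected running time is \(O(n \log n)\).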

If you are interested in Python fundamentals that help with mathematical programming like this, you might enjoy: Understanding String Formatting in Python.
