Singleton Sessions, Retries, and Rate Limits in Python Requests
One reusable session per base URL can centralize headers, adapters, and connection behavior.
Retries help after temporary HTTP failures such as 429 and 503.
Rate limiting is preventive. It controls pacing before the server needs to push back.
Most beginner API code works the same way at first. You import requests, send a
request, inspect the response, and move on. That is enough to get data back. It is not enough
to build something calm, repeatable, and reliable. The moment a script makes repeated calls,
hits temporary failures, or bumps into rate limits, the easy version starts to feel incomplete.
This article is not a general Singleton explanation. It is a practical one. The question here is what happens when one shared session object becomes useful inside real API code, and how retries and rate limiting fit beside that decision.
The First Version Usually Works
import requests
response = requests.get("https://api.example.com/resource", timeout=10)
print(response.status_code)
print(response.text)
There is nothing wrong with this. It sends a request and returns a response. The problem begins when the same base URL is being called repeatedly and the code still behaves as though each call is a completely separate event.
Why a Session Changes the Shape of the Code
A session gives repeated requests one shared home. That means headers, cookies, and connection behavior can live in one place instead of being rebuilt over and over.
import requests
session = requests.Session()
response = session.get("https://api.example.com/resource", timeout=10)
Once you are reusing a session, the next question becomes obvious. If several parts of the same application talk to the same API, should each part quietly create its own session, or should that base URL reuse one shared session object?
A Singleton-Style Session Manager
Singleton matters here because one base URL often benefits from one shared session policy.
import requests
from requests.adapters import HTTPAdapter
class SessionManager:
    # One manager instance (and therefore one session) per base URL.
    _instances = {}

    def __new__(cls, base_url):
        if base_url not in cls._instances:
            instance = super().__new__(cls)
            instance.session = requests.Session()
            # Retry transient failures at the transport level.
            instance.session.mount("https://", HTTPAdapter(max_retries=3))
            cls._instances[base_url] = instance
        return cls._instances[base_url]
This is not Singleton in the abstract. It is Singleton in service of a specific job: one shared session per base URL.
base_url = "https://api.example.com"
first_manager = SessionManager(base_url)
second_manager = SessionManager(base_url)
print(first_manager is second_manager)
print(first_manager.session is second_manager.session)
The important point is not the pattern name by itself. The important point is that one reusable session object can now carry connection rules for every request aimed at that same API base.
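Because every caller receives the same session, defaults such as headers only need to be set once. Here is a minimal sketch; the header values are placeholder assumptions, not part of any real API:

```python
import requests

# One shared session; headers set here apply to every request it sends.
session = requests.Session()
session.headers.update({
    "Accept": "application/json",
    "User-Agent": "example-client/1.0",  # placeholder identifier
})

# Later calls inherit these defaults automatically:
# response = session.get("https://api.example.com/resource", timeout=10)
```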
Retries Solve a Different Problem
A shared session helps with reuse. It does not by itself solve temporary HTTP failures.
Responses like 429 or 503 often call for patience rather than
immediate failure.
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
retry = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[429, 503],
)
adapter = HTTPAdapter(max_retries=retry)
The point of back-off is to avoid hammering the same endpoint in a tight loop. Each retry waits longer than the last.
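The exact schedule is computed inside urllib3, but the growth pattern can be sketched in plain Python. Note that urllib3 versions differ on whether the very first retry sleeps at all, so treat these numbers as the general shape rather than exact values:

```python
def backoff_delays(backoff_factor, retries):
    # Exponential back-off: each delay doubles the previous one,
    # following the shape backoff_factor * (2 ** attempt).
    return [backoff_factor * (2 ** attempt) for attempt in range(retries)]

print(backoff_delays(1, 3))  # [1, 2, 4]
```

With backoff_factor=1 and three retries, the waits grow rather than repeat, which is exactly what keeps a struggling endpoint from being hammered.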
Mounting Retry Behavior Onto the Session
base_url = "https://api.example.com"
session = SessionManager(base_url).session
retry = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[429, 503],
)
adapter = HTTPAdapter(max_retries=retry)
session.mount("https://", adapter)
Now the retry policy lives with the shared session instead of being scattered around individual requests.
That is one of the biggest advantages of reusing a session object. Cross-cutting behavior becomes easier to define once and apply consistently.
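Response hooks are one concrete example. requests lets you register a hook function once on a session, and it then runs for every response that session receives. A small sketch, with a hypothetical logging function:

```python
import requests

def log_status(response, *args, **kwargs):
    # Runs for every response that passes through this session.
    print(response.status_code, response.url)

session = requests.Session()
session.hooks["response"].append(log_status)
```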
Rate Limiting Is More Preventive Than Retry Logic
Retries answer the question, “What should happen after a temporary failure?” Rate limiting answers the earlier question, “How fast should requests be sent in the first place?”
import time
class RateLimiter:
    def __init__(self, requests_per_minute):
        self.requests_per_minute = requests_per_minute
        self.interval = 60 / requests_per_minute  # minimum seconds between requests
        self.last_request_time = 0

    def wait(self):
        # Sleep just long enough to keep calls at least one interval apart.
        current_time = time.time()
        time_since_last_request = current_time - self.last_request_time
        if time_since_last_request < self.interval:
            time_to_wait = self.interval - time_since_last_request
            time.sleep(time_to_wait)
        self.last_request_time = time.time()
This is a different responsibility. Instead of recovering after the server pushes back, it slows the client down so that fewer pushbacks happen in the first place.
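The pacing math is easy to check without real sleeping. This variant of the limiter accepts injectable clock and sleep functions, an assumption added here purely for testability, not part of the original class:

```python
class TestableRateLimiter:
    def __init__(self, requests_per_minute, clock, sleep):
        self.interval = 60 / requests_per_minute
        self.last_request_time = 0
        self.clock = clock  # injected time source
        self.sleep = sleep  # injected sleep function

    def wait(self):
        elapsed = self.clock() - self.last_request_time
        if elapsed < self.interval:
            self.sleep(self.interval - elapsed)
        self.last_request_time = self.clock()


# Simulate with a fake clock that only advances while "sleeping".
now = [100.0]
sleeps = []

def fake_clock():
    return now[0]

def fake_sleep(seconds):
    sleeps.append(seconds)
    now[0] += seconds

limiter = TestableRateLimiter(60, fake_clock, fake_sleep)
limiter.wait()  # first call: plenty of time has "passed", no sleep
limiter.wait()  # second call: zero time has passed, sleeps one full interval
print(sleeps)   # [1.0]
```

At 60 requests per minute the interval is one second, so the back-to-back second call has to wait exactly that long.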
Putting the Pieces Together
rate_limiter = RateLimiter(requests_per_minute=60)
session = SessionManager("https://api.example.com").session
for _ in range(100):
    rate_limiter.wait()
    response = session.get(
        "https://api.example.com/resource",
        timeout=10,
    )
    print(response.status_code)
This is the practical shape you want to notice. The session owns connection behavior. The retry policy owns temporary HTTP recovery. The rate limiter owns pacing. One concern per component.
What a Beginner Should Keep
The important lesson is not the pattern name. It is responsibility. A shared session can make API code cleaner when one base URL really does deserve one reusable interaction object. Retries and rate limiting solve different problems beside that choice, and together they make the client behave much more reliably.
Frequently Asked Questions
These are the practical questions beginners usually have when shared sessions, retries, and rate limits first start to come together.
Why use a requests.Session() instead of plain requests.get() calls?
A session gives repeated requests one shared home for headers, cookies, adapters, and connection behavior.
What is the point of the Singleton-style session manager here?
It makes one shared session available per base URL so different parts of the same application can reuse the same session policy.
Do retries and sessions solve the same problem?
No. Sessions help with reuse and shared behavior. Retries help recover from temporary HTTP failures.
Why retry on 429 or 503?
Because those responses often indicate temporary conditions where a short wait and retry may succeed.
What does backoff_factor do?
It increases the wait time between retries so the client does not hit the same endpoint again in an aggressive tight loop.
How is rate limiting different from retries?
Rate limiting is preventive. It controls how quickly requests are sent so the client is less likely to hit server-side limits in the first place.
Should all three ideas live in one big class?
Usually no. It is cleaner when the session, retry behavior, and rate limiter each keep a distinct responsibility.
What is the simplest takeaway from this whole setup?
Reuse the session, recover thoughtfully from temporary failures, and pace requests before the server forces you to slow down.
Further Reading
If you want the broader “when does Singleton help?” discussion next, read When the Singleton Pattern Actually Helps.
If you want implementation details next, read Comparing Two Singleton Implementations in Python.
If you want a lower-level URL-handling companion, pair this with your base URL parsing post.