When putting together a high-performance web serving framework, a number of different software and network components typically have to be considered. For starters, there are CDNs, load balancing, reverse proxies, the application web server, and backend components like the database. In this post, we’ll start by delving into our app server software – how TellApart handles a dynamic request with extremely low latency.
Most of our incoming traffic is made up of a large number of concurrent, short-lived requests, each of which must complete in tens of milliseconds. Our initial design was based on a well-worn server configuration that performed reasonably well right out of the box: the venerable Apache web server, with mod_wsgi for running Python application code.
When we first fired up the web server and unleashed some live traffic, we noticed something curious in the performance data. Here’s how this configuration fared at handling a representative I/O-bound request.
Whoa! What’s up, 99th percentile?! In Apache/mod_wsgi, each request is handled in its own system thread. For instance, if three requests need to be handled concurrently, the OS is responsible for switching between them. So what’s the problem?
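To make the thread-per-request model concrete, here is a minimal sketch using only the standard library. It is not mod_wsgi's code: `handle_request` and `serve_concurrently` are hypothetical names, the sleep stands in for a backend call, and a real server would typically reuse threads from a pool rather than create one per request.

```python
import threading
import time


def handle_request(request_id, results):
    """Hypothetical handler: simulate an I/O wait, then record a response."""
    time.sleep(0.01)  # stand-in for a database or network call
    results[request_id] = f"response-{request_id}"


def serve_concurrently(request_ids):
    """Run each request in its own OS thread and wait for all of them."""
    results = {}
    threads = [
        threading.Thread(target=handle_request, args=(rid, results))
        for rid in request_ids
    ]
    for t in threads:
        t.start()  # from here on, the OS decides which thread runs when
    for t in threads:
        t.join()   # wait for every request to finish
    return results


responses = serve_concurrently([1, 2, 3])
```

The application code never decides when one request pauses and another resumes; the OS scheduler and the interpreter make that choice.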
It turns out that in CPython, multithreading is subject to the limitations of the global interpreter lock (GIL), which lets only one thread execute Python bytecode at a time. Request-handling threads occasionally enter “GIL battles” with other request-handling threads, burning CPU cycles and slowing everything down. David Beazley sets the record straight in a great series of presentations. Most Python web servers will have no trouble handling dozens of requests per second, but thread-based servers will strain when attempting to handle hundreds or thousands.
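A rough way to see the effect yourself is to time a CPU-bound loop run twice in a row against the same loop run in two threads. This is an illustrative sketch, not a benchmark of our server: the function names and the loop size are assumptions, and the numbers will vary by machine and Python version. On a standard CPython build with the GIL, the threaded run generally does not finish meaningfully faster than the sequential one.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; the running thread holds the GIL."""
    while n > 0:
        n -= 1


def time_sequential(n):
    """Seconds taken to run the loop twice, one after the other."""
    start = time.perf_counter()
    count_down(n)
    count_down(n)
    return time.perf_counter() - start


def time_threaded(n):
    """Seconds taken to run the loop in two threads at once."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(2)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000  # arbitrary size chosen for illustration
    print(f"sequential:  {time_sequential(n):.3f}s")
    print(f"two threads: {time_threaded(n):.3f}s")
```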
We evaluated several alternatives that did not depend on threads for concurrency and eventually chose to replace Apache/mod_wsgi with a custom web server built around the excellent Gevent coroutine networking library. We call this server TAFE (“taffy”), the TellApart front end. Gevent handles each request using a lightweight thread-like structure called a greenlet (essentially, a coroutine). Unlike threads, greenlets must cooperatively yield control to other greenlets, which frees the system from erratic GIL issues at the cost of a bit of increased development complexity.
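The idea of cooperative yielding can be sketched without any third-party packages. The example below uses plain Python generators as stand-ins for greenlets; it is not Gevent's API and not TAFE's code, and `handle_request` and `run_all` are hypothetical names. Each `yield` marks a point where a handler would wait on I/O and hand control back to the scheduler.

```python
from collections import deque


def handle_request(request_id, log):
    """Hypothetical handler with two explicit yield points."""
    log.append(f"request {request_id}: start")
    yield  # e.g. waiting on a socket; let other requests run
    log.append(f"request {request_id}: got backend reply")
    yield
    log.append(f"request {request_id}: done")


def run_all(tasks):
    """Round-robin scheduler: resume each task until its next yield."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # this request has finished
        ready.append(task)  # not finished yet; queue it again


log = []
run_all([handle_request(i, log) for i in (1, 2)])
```

All of this runs in a single OS thread, so control changes hands only at the yield points and the two requests interleave in a predictable order.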
Here’s how TAFE performed on the same workload:
Much better. So, was it smooth sailing from here? Well, TAFE is definitely a big improvement, but giving up on system threads for concurrency means spending more time discovering code that doesn’t cooperatively yield. We wrote a Gevent Request Profiler to help us do just that. More on that next time.
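To give a flavor of the problem, here is a toy sketch of how code that fails to yield might be spotted. It is not our Gevent Request Profiler and does not use Gevent; it reuses the generator model as a stand-in, and every name and threshold in it is hypothetical. The scheduler times each stretch of work between yields and records the handlers that hold on to control for too long.

```python
import time
from collections import deque


def run_with_watchdog(tasks, max_step_seconds):
    """Resume (name, generator) tasks round-robin and report slow steps."""
    slow_steps = []
    ready = deque(tasks)
    while ready:
        name, task = ready.popleft()
        start = time.perf_counter()
        try:
            next(task)
            finished = False
        except StopIteration:
            finished = True
        elapsed = time.perf_counter() - start
        if elapsed > max_step_seconds:
            slow_steps.append(name)  # this step starved every other task
        if not finished:
            ready.append((name, task))
    return slow_steps


def polite_handler():
    for _ in range(3):
        yield  # hands control back promptly


def hog_handler():
    time.sleep(0.1)  # blocking call that never yields while it waits
    yield


offenders = run_with_watchdog(
    [("polite", polite_handler()), ("hog", hog_handler())],
    max_step_seconds=0.03,
)
```

While `hog_handler` sleeps, no other request makes progress, which is exactly the kind of stall that is easy to introduce and hard to notice without tooling.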
Mark Ayzenshtat is TellApart’s CTO.

