First look at servlet and FastCGI performance

As a primarily C++ developer who has also done a lot of web-related development (PHP, JSP, Java Servlets, etc.), one of the nagging questions I have had for years was the possible performance gain by moving away from interpreted languages towards native code for (server-based) web applications.

After the demise of the mainframe-and-terminal setup in the 1980s, the World Wide Web (WWW, or simply ‘web’) has been making a gradual return to it, with web-based applications on servers (‘mainframes’) serving content to web-browser-using clients (‘terminals’). As part of this, most processing power had to be located on the servers, with little required on the client side, until the advent of fancy UIs built in resource-heavy JavaScript on the client.

Even today, however, most of the processing is still done on the servers, with single servers serving thousands of clients per day, hour, or even minute. It’s clear that even saving a second per single client request on the server side can mean big savings. In light of this it is curious that most server-side processing is done in either interpreted languages via CGI or related mechanisms (Perl, PHP, ColdFusion, JavaScript, etc.) or bytecode-based languages (C#, Java, VB.NET), instead of highly optimised native code.

While I will not go too deeply into the performance differences between those different implementations in this article, I think that most reading this will at least be familiar with the performance delta between the first two groups mentioned. Interpreted languages in general tend to lag behind the pack on sheer performance metrics, due to the complexity of parsing a text-based source file, creating bytecode out of that and running this with the language’s runtime.

In this light, the more interesting comparison in my eyes is that between the last two groups: bytecode-based and native code. To make that comparison fair, I will first have to completely understand how, for example, Java servlets are implemented and run by a servlet container such as Tomcat, so that an equivalent can be built in native code.

As a start, I have however set up a range of examples which I then benchmarked using ApacheBench. The first example uses the ‘Hello World’ servlet example which is provided with Apache Tomcat 8.x. The second uses a basic responder C++ application connected using FastCGI to a Lighttpd server. The third and final example uses C++/Qt to implement a custom QTcpServer instance which does HTTP parsing and responds to queries using a basic REST-based API.
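To give an idea of what "HTTP parsing and a basic REST-based API" involves at its simplest, here is a minimal sketch in standard C++17. It is not the code used for the benchmark: the names RequestLine, parseRequestLine and handleRequest, the response body, and the choice to answer only GET /greeting/ are assumptions made for illustration, and all socket handling is left out.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical type for illustration; not taken from the benchmarked code.
struct RequestLine {
    std::string method;
    std::string target;
    std::string version;
};

// Parse the first line of an HTTP/1.x request, e.g. "GET /greeting/ HTTP/1.1".
// Returns std::nullopt unless the line has exactly three tokens and the
// third one starts with "HTTP/".
std::optional<RequestLine> parseRequestLine(const std::string& raw) {
    // Only the text before the first line break is the request line.
    std::string line = raw.substr(0, raw.find('\n'));
    if (!line.empty() && line.back() == '\r') line.pop_back();

    std::istringstream in(line);
    RequestLine r;
    std::string extra;
    if (!(in >> r.method >> r.target >> r.version) || (in >> extra)) {
        return std::nullopt;
    }
    if (r.version.rfind("HTTP/", 0) != 0) return std::nullopt;
    return r;
}

// Build a complete response for a one-entry route table.
std::string handleRequest(const std::string& raw) {
    const auto req = parseRequestLine(raw);
    std::string status = "200 OK";
    std::string body;
    if (!req) {
        status = "400 Bad Request";
        body = "bad request\n";
    } else if (req->method == "GET" && req->target == "/greeting/") {
        body = "Hello World!\n";
    } else {
        status = "404 Not Found";
        body = "not found\n";
    }
    return "HTTP/1.1 " + status + "\r\n"
           "Content-Type: text/plain\r\n"
           "Content-Length: " + std::to_string(body.size()) + "\r\n"
           "Connection: close\r\n\r\n" + body;
}
```

In a real server the string passed to handleRequest would be whatever was read from the accepted client socket, and the returned string would be written back to that socket before closing it.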

The host system is an Intel 6700K-based x86-64 system with 32 GB of RAM, running Windows 7 x64 Ultimate. The servlet example is used as-is, without modification to the distribution from Apache. The FastCGI C++ example is compiled using Mingw64 (GCC 5.3) with -O1. The Qt-based example is compiled using Mingw (GCC 4.9) from within Qt Creator in debug mode.

All ApacheBench tests are run with 1,000 requests and a concurrency of 1, since no scaling will be tested until the scaling of servlets and their containers is better understood.

Next, the results:

1. Java servlet

Server Software:        Apache-Coyote/1.1
Server Hostname:
Server Port:            8080

Document Path:          /examples/servlets/servlet/HelloWorldExample
Document Length:        400 bytes

Concurrency Level:      1
Time taken for tests:   0.230 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      562000 bytes
HTML transferred:       400000 bytes
Requests per second:    4347.83 [#/sec] (mean)
Time per request:       0.230 [ms] (mean)
Time per request:       0.230 [ms] (mean, across all concurrent requests)
Transfer rate:          2386.21 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.8      0      10
Processing:     0    0   1.1      0      10
Waiting:        0    0   0.8      0      10
Total:          0    0   1.4      0      10

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      0
  80%      0
  90%      0
  95%      0
  98%      0
  99%     10
 100%     10 (longest request)

2. FastCGI

Server Software:        LightTPD/1.4.35-1-IPv6
Server Hostname:
Server Port:            80

Document Path:          /cerflet/
Document Length:        146 bytes

Concurrency Level:      1
Time taken for tests:   26.531 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      307000 bytes
HTML transferred:       146000 bytes
Requests per second:    37.69 [#/sec] (mean)
Time per request:       26.531 [ms] (mean)
Time per request:       26.531 [ms] (mean, across all concurrent requests)
Transfer rate:          11.30 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0   27  11.0     30      50
Waiting:        0   26  11.0     30      40
Total:          0   27  11.0     30      50

Percentage of the requests served within a certain time (ms)
  50%     30
  66%     30
  75%     30
  80%     40
  90%     40
  95%     40
  98%     40
  99%     40
 100%     50 (longest request)

3. C++/Qt

Server Software:
Server Hostname:
Server Port:            8010

Document Path:          /greeting/
Document Length:        50 bytes

Concurrency Level:      1
Time taken for tests:   0.240 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      109000 bytes
HTML transferred:       50000 bytes
Requests per second:    4166.67 [#/sec] (mean)
Time per request:       0.240 [ms] (mean)
Time per request:       0.240 [ms] (mean, across all concurrent requests)
Transfer rate:          443.52 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0      10
Processing:     0    0   1.2      0      10
Waiting:        0    0   0.9      0      10
Total:          0    0   1.3      0      10

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      0
  80%      0
  90%      0
  95%      0
  98%      0
  99%     10
 100%     10 (longest request)


It should be noted here that to make the FastCGI example work, the original approach using the fcgi_stdio.h header as suggested by the FastCGI documentation had to be abandoned, and instead the fcgiapp.h header and its functions were used. With the former approach the response times would get slower with each run; with the latter approach they remain constant.

The FastCGI application ended up looking like this:

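#include "include/fcgiapp.h"
#include <cstdlib>

int count;
FCGX_Request request;

void initialize() {
	// The FCGX library has to be initialised before requests are set up.
	FCGX_Init();
	int sock = FCGX_OpenSocket(":9000", 5000);
	if (sock < 0) {
		// fail: handle.
		exit(EXIT_FAILURE);
	}
	FCGX_InitRequest(&request, sock, 0);
	count = 0;
}

int main() {
	/* Initialization. */
	initialize();

	/* Response loop. */
	while (FCGX_Accept_r(&request) == 0) {
		// The two format arguments were missing from the listing; they are
		// filled in here following the stock FastCGI 'Hello' example.
		const char* host = FCGX_GetParam("SERVER_NAME", request.envp);
		FCGX_FPrintF(request.out, "Content-type: text/html\r\n"
		   "\r\n"
		   "<title>FastCGI Hello! (C, fcgi_stdio library)</title>"
		   "<h1>FastCGI Hello! (C, fcgi_stdio library)</h1>"
		   "Request number %d running on host <i>%s</i>\n",
		   ++count, host ? host : "unknown");
		FCGX_Finish_r(&request);
	}
	return 0;
}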

Compared to the baseline values from the Tomcat servlet benchmark, the results from the FastCGI benchmark are downright disappointing, with a mean of about 27 ms per request and a median of 30 ms. The servlet instance needed less than 1 ms for most requests and 10 ms at most. Despite attempts to optimise the FastCGI example, it appears that significant bottlenecks remain. Whether these are in the Lighttpd server, its mod_fastcgi module, or the FastCGI library is hard to say at this point.
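For context on what the FastCGI library and the web server module have to do for every response, the sketch below frames a response body as FastCGI STDOUT records, following the record layout in the FastCGI 1.0 specification (an 8-byte header, the content, then optional padding). It does not measure anything and is not taken from the FastCGI library's source; makeHeader and wrapStdout are hypothetical names, and the stream-terminating empty record and the END_REQUEST record are left out for brevity.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Constants from the FastCGI 1.0 specification.
constexpr std::uint8_t kFcgiVersion1 = 1;
constexpr std::uint8_t kFcgiStdout = 6;      // FCGI_STDOUT record type
constexpr std::size_t kMaxContent = 65535;   // contentLength is a 16-bit field

// Build the fixed 8-byte record header; multi-byte fields are big-endian.
std::array<std::uint8_t, 8> makeHeader(std::uint8_t type, std::uint16_t requestId,
                                       std::uint16_t contentLength,
                                       std::uint8_t paddingLength) {
    return {kFcgiVersion1, type,
            static_cast<std::uint8_t>(requestId >> 8),
            static_cast<std::uint8_t>(requestId & 0xFF),
            static_cast<std::uint8_t>(contentLength >> 8),
            static_cast<std::uint8_t>(contentLength & 0xFF),
            paddingLength, 0};
}

// Split a payload into STDOUT records of at most 65535 content bytes each.
// Padding to a multiple of 8 bytes is optional in the specification; it is
// applied here only to show where it goes.
std::vector<std::uint8_t> wrapStdout(std::uint16_t requestId, const std::string& payload) {
    std::vector<std::uint8_t> out;
    for (std::size_t pos = 0; pos < payload.size(); pos += kMaxContent) {
        const std::size_t len = std::min(kMaxContent, payload.size() - pos);
        const std::size_t pad = (8 - len % 8) % 8;
        const auto header = makeHeader(kFcgiStdout, requestId,
                                       static_cast<std::uint16_t>(len),
                                       static_cast<std::uint8_t>(pad));
        out.insert(out.end(), header.begin(), header.end());
        out.insert(out.end(), payload.begin() + pos, payload.begin() + pos + len);
        out.insert(out.end(), pad, 0);
    }
    return out;
}
```

The web server performs the reverse step: it reads each header, extracts the content bytes, skips the padding, and forwards the content to the HTTP client.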

For the C++/Qt example one can unabashedly say that even with the hacked-together code which was used, this unoptimised code ran on par with the highly optimised production code of the Tomcat server and its servlet API. It should be noted that although this example used the Qt networking classes, it didn't use Qt code for the actual socket communication beyond accepting the client connection.

Due to known issues with the QTcpSocket class on Windows, a custom drop-in class was used instead, which interfaces with the Winsock2 (ws2_32) DLL directly using the standard Berkeley socket API. This class has been used with other projects before and is relatively stable at this point. How this class compares performance-wise with the QTcpSocket class is at this point unknown.

Summarising, it seems at this point at the very least plausible that native code can outperform bytecode for web applications. More research has to be done into scaling methods and performance characteristics of applications more complex than a simple 'Hello World' as well.


  1. sevengraff
    November 1, 2016 at 4:55 PM

    I’m blown away at how slow the Lighttpd & FastCGI app is. Something very fishy about that, but it looks like you did the straightforward approach. If you find the issue, I hope you make a follow-up post.

    And while I am sure that native code servers could out-perform other servers, I think the developer ecosystem and tooling outweighs the benefits of going with a C++ approach.

    • November 1, 2016 at 5:52 PM

      My guess is that it has to do with the conversion to the FCGI format and back again in the FCGI library and Lighttpd FCGI module. I have honestly no idea how optimised this is, or not.

      As for your other point, I did a follow-up article already comparing Java Servlets with a simple C++-based ‘servlet’ system using the POCO libraries. That one showed a similar boost in performance for the C++ app, while benefiting from the mature HTTP, JSON, XML and further support in POCO. If I had to implement a web app today, I would probably pick that approach.

      My main issue with Java-based solutions such as Spring is both their use of Inversion of Control (IoC) and the horrible complexity of the development environment (Java-based IDEs, JREs, Gradle or Maven, etc.). Every time I was involved in such a project, it was hard not to compare it to the simplicity of a C++-based environment. Which is not to say that such a C++ environment cannot be overcomplicated, but you probably know what I mean 🙂