Cerflet: like Servlets, but with more C++

A few months ago I wrote about the research I had been doing on multiple ways to write server-based web applications, using Java Servlets, FastCGI/C++ and Qt/C++. While this showed that C++-based applications tend to be faster than Java-based ones, it only looked at single-threaded, sequential requests.

While looking at ways to get proper concurrent performance out of a Servlet-like C++ implementation I decided to look again at the POCO C++ Libraries [1] and found that its HTTP server implementation uses a thread pool of worker threads, which lets it scale well across many concurrent requests.

After spending a few hours putting a basic wrapper library together, I wrote the following ‘Hello World’ example code to demonstrate a basic HTTP Cerflet:

#include <httpcerflet.h>

#include <iostream>
#include <string>

using namespace std;


class HelloRequestHandler : public HTTPRequestHandler {
public:
	void handleRequest(HTTPServerRequest& request, HTTPServerResponse& response) {
		Application& app = Application::instance();
		app.logger().information("Request from " + request.clientAddress().toString());
		
		response.setChunkedTransferEncoding(false);
		response.setContentType("text/html");
		
		std::ostream& ostr = response.send();
		ostr << "<!DOCTYPE html><html><head><title>Hello World</title></head>";
		ostr << "<body><p>Hello World!</p></body></html>";
	}
};


int main(int argc, char** argv) {
	// 0. Initialise: create Cerflet instance and set routing.
	HttpCerflet cf;
	RoutingMap map;
	map["/"] = &createInstance<HelloRequestHandler>;
	cf.routingMap(map);
	
	// 1. Start the server	
	return cf.run(argc, argv);
}

In the main() function we create a new HttpCerflet instance and a new RoutingMap. The latter contains the routes we wish to map to a handler, which in this case is the HelloRequestHandler. As the handler we store the address of the function template createInstance<>(), instantiated with our custom handler class as the template argument.

When a new request matches one of the keys of the RoutingMap, the mapped function creates a new instance of the specified handler, which is then handed to a waiting worker thread.
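To make the idea behind this mapping a bit more tangible, here is a minimal, stand-alone sketch of such a routing table. It is not Cerflet's actual implementation: the Handler base class, the dispatch() function and the exact shape of RoutingMap are assumptions made for this illustration, with Handler standing in for POCO's HTTPRequestHandler so that the code compiles without POCO.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for POCO's HTTPRequestHandler, so the sketch compiles on its own.
struct Handler {
	virtual ~Handler() {}
	virtual std::string handle() = 0;
};

// Factory function type: every call returns a freshly allocated handler.
typedef Handler* (*HandlerFactory)();

// One factory function is generated per handler class.
template <typename T>
Handler* createInstance() { return new T; }

// Maps a request path to the factory for its handler (assumed shape of RoutingMap).
typedef std::map<std::string, HandlerFactory> RoutingMap;

struct HelloHandler : Handler {
	std::string handle() { return "Hello World!"; }
};

// Looks up the path and returns a new handler, or a null pointer if no route matches.
std::unique_ptr<Handler> dispatch(const RoutingMap& routes, const std::string& path) {
	RoutingMap::const_iterator it = routes.find(path);
	if (it == routes.end()) { return std::unique_ptr<Handler>(); }
	return std::unique_ptr<Handler>(it->second());
}
```

After registering a route with routes["/"] = &createInstance<HelloHandler>; every call to dispatch(routes, "/") yields its own handler object, so handlers running on different worker threads do not share state.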

The handler class itself derives from the HTTPRequestHandler class, which is a standard POCO Net class, reimplementing its handleRequest() method. This shows that Cerflet complements POCO rather than abstracting it away. The main goal of Cerflet is to hide some of the complexities and boilerplate of POCO’s HTTP server, allowing one to focus on writing the actual business logic.

Benchmarks:

As for performance, an ApacheBench benchmark was run with a concurrency of 5, for a total of 100,000 requests.

1. Java Servlet

Server Software:        Apache-Coyote/1.1
Server Hostname:        127.0.0.1
Server Port:            8080

Document Path:          /examples/servlets/servlet/HelloWorldExample
Document Length:        400 bytes

Concurrency Level:      5
Time taken for tests:   7.697 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      56200000 bytes
HTML transferred:       40000000 bytes
Requests per second:    12992.07 [#/sec] (mean)
Time per request:       0.385 [ms] (mean)
Time per request:       0.077 [ms] (mean, across all concurrent requests)
Transfer rate:          7130.42 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       1
Processing:     0    0   0.5      0      14
Waiting:        0    0   0.4      0      14
Total:          0    0   0.5      0      14

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      1
  75%      1
  80%      1
  90%      1
  95%      1
  98%      1
  99%      1
 100%     14 (longest request)

2. Cerflet

Server Software:
Server Hostname:        127.0.0.1
Server Port:            9980

Document Path:          /
Document Length:        99 bytes

Concurrency Level:      5
Time taken for tests:   7.220 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      19900000 bytes
HTML transferred:       9900000 bytes
Requests per second:    13850.42 [#/sec] (mean)
Time per request:       0.361 [ms] (mean)
Time per request:       0.072 [ms] (mean, across all concurrent requests)
Transfer rate:          2691.63 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       1
Processing:     0    0   0.5      0      10
Waiting:        0    0   0.4      0      10
Total:          0    0   0.5      0      10

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      1
  80%      1
  90%      1
  95%      1
  98%      1
  99%      1
 100%     10 (longest request)

Notes:

In this benchmark, Cerflet is about 7% faster than the equivalent Tomcat-based Hello World example. Note that Cerflet also logs every request to the console, which slows it down somewhat, while Tomcat does not. Cerflet’s Hello World example was compiled using the -Og optimisation setting with 32-bit GCC 5.3 (on Windows, MSYS2). Version 1.6 of the POCO libraries was used, as obtained via MSYS2’s Pacman package manager.

For Tomcat the binary distribution for 8.0.30 as obtained via the official Apache site was used, with the server manually started using the provided startup.bat script. Both servers were run on a Windows 7 Ultimate x64 platform (Intel i7 6700K, 32 GB DDR4) with ApacheBench using the loopback device.

Discussion:

Without compensating for all differences between the two examples used and other potential differences, it is fair to say at this point that both Servlets and Cerflets are roughly equivalent in terms of performance for a simple Hello World example. Likely Cerflets are slightly faster (5-10%) with more gains to be obtained via compiler optimisations (-O3).

The type of operations which would be performed further in the business logic likely will have the most influence on the overall performance between these two platforms. Cerflets do however show that C++-based server-side web applications are more than just a viable option, backed by a mature programming language (C++) and cross-platform libraries (POCO).

Cerflets as they exist today are reminiscent of Spring Boot Java applications, which also feature a built-in HTTP server, thus not relying on a Servlet container (e.g. Tomcat). The advantage of Cerflets is however that they only depend on the POCO libraries (if not linked fully statically), and are not dependent on a central runtime (JVM). This significantly eases deployment.

The Cerflet project’s Github page [2] can be accessed to try the example used here for oneself, or to use the HTTP Cerflet implementation in a new project. Both feedback and contributions are welcome.

Maya

[1] http://pocoproject.org/
[2] https://github.com/MayaPosch/Cerflet

Categories: C++, Cerflet, HTTP

First look at servlet and FastCGI performance

January 5, 2016

As a primarily C++ developer who has also done a lot of web-related development (PHP, JSP, Java Servlets, etc.), one of the nagging questions I have had for years was the possible performance gain by moving away from interpreted languages towards native code for (server-based) web applications.

After the demise of the mainframe and terminals setup in the 1980s, the World Wide Web (WWW, or simply ‘web’), has been making a gradual return to this setup again, by having web-based applications based on servers (‘mainframes’) serve content to web-browser-using clients (‘terminals’). As part of this most processing power had to be located on the servers, with little processing power required on the client-side, until the advent of making fancy UIs in resource-heavy JavaScript on the client.

Even today, however, most of the processing is still done on the servers, with single servers serving thousands of clients per day, hour, or even minute. It’s clear that even saving a second per singular client-request on the server-side can mean big savings. In light of this it is however curious that most server-side processing is done in either interpreted languages via CGI or related (Perl, PHP, ColdFusion, JavaScript, etc.), or bytecode-based languages (C#, Java, VB.NET), instead of going for highly optimised native code.

While I will not go too deeply into the performance differences between those different implementations in this article, I think that most reading this will at least be familiar with the performance delta between the first two groups mentioned. Interpreted languages in general tend to lag behind the pack on sheer performance metrics, due to the complexity of parsing a text-based source file, creating bytecode out of that and running this with the language’s runtime.

In this light, the more interesting comparison in my eyes is therefore that between the last two groups: bytecode-based and native code. To make this a fair comparison, I will first have to completely understand how for example Java servlets are implemented and run by a servlet container such as Tomcat, so that the native code version can do the equivalent work.

As a start, I have however set up a range of examples which I then benchmarked using ApacheBench. The first example uses the ‘Hello World’ servlet example which is provided with Apache Tomcat 8.x. The second uses a basic responder C++ application connected using FastCGI to a Lighttpd server. The third and final example uses C++/Qt to implement a custom QTcpServer instance which does HTTP parsing and responds to queries using a basic REST-based API.

The host system is an Intel 6700K-based x86-64 system, with 32 GB of RAM and running Windows 7 x64 Ultimate. The servlet example is used as-is, without modification to the distribution from Apache. The FastCGI C++ example is compiled using Mingw64 (GCC 5.3) with -O1. The Qt-based example is compiled using Mingw (GCC 4.9) from within Qt Creator in debug mode.

All ApacheBench tests are run with 1,000 requests and a concurrency of 1, since no scaling will be tested until the scaling of servlets and their containers is better understood.

Next, the results:

1. Java servlet

Server Software:        Apache-Coyote/1.1
Server Hostname:        127.0.0.1
Server Port:            8080

Document Path:          /examples/servlets/servlet/HelloWorldExample
Document Length:        400 bytes

Concurrency Level:      1
Time taken for tests:   0.230 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      562000 bytes
HTML transferred:       400000 bytes
Requests per second:    4347.83 [#/sec] (mean)
Time per request:       0.230 [ms] (mean)
Time per request:       0.230 [ms] (mean, across all concurrent requests)
Transfer rate:          2386.21 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.8      0      10
Processing:     0    0   1.1      0      10
Waiting:        0    0   0.8      0      10
Total:          0    0   1.4      0      10

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      0
  80%      0
  90%      0
  95%      0
  98%      0
  99%     10
 100%     10 (longest request)

2. FastCGI

Server Software:        LightTPD/1.4.35-1-IPv6
Server Hostname:        127.0.0.1
Server Port:            80

Document Path:          /cerflet/
Document Length:        146 bytes

Concurrency Level:      1
Time taken for tests:   26.531 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      307000 bytes
HTML transferred:       146000 bytes
Requests per second:    37.69 [#/sec] (mean)
Time per request:       26.531 [ms] (mean)
Time per request:       26.531 [ms] (mean, across all concurrent requests)
Transfer rate:          11.30 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0   27  11.0     30      50
Waiting:        0   26  11.0     30      40
Total:          0   27  11.0     30      50

Percentage of the requests served within a certain time (ms)
  50%     30
  66%     30
  75%     30
  80%     40
  90%     40
  95%     40
  98%     40
  99%     40
 100%     50 (longest request)

3. C++/Qt

Server Software:
Server Hostname:        127.0.0.1
Server Port:            8010

Document Path:          /greeting/
Document Length:        50 bytes

Concurrency Level:      1
Time taken for tests:   0.240 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      109000 bytes
HTML transferred:       50000 bytes
Requests per second:    4166.67 [#/sec] (mean)
Time per request:       0.240 [ms] (mean)
Time per request:       0.240 [ms] (mean, across all concurrent requests)
Transfer rate:          443.52 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0      10
Processing:     0    0   1.2      0      10
Waiting:        0    0   0.9      0      10
Total:          0    0   1.3      0      10

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      0
  80%      0
  90%      0
  95%      0
  98%      0
  99%     10
 100%     10 (longest request)

Discussion:

It should be noted here that to make the FastCGI example work, the original approach using the fcgi_stdio.h header as suggested by the FastCGI documentation had to be abandoned, and instead the fcgiapp.h header and its functions were used. With the former approach the response times would get slower with each run, while with the latter approach they remain constant.

The FastCGI application ended up looking like this:

#include "include/fcgiapp.h"
#include <cstdlib>

int count;
FCGX_Request request;

void initialize() {
	FCGX_Init();
	int sock = FCGX_OpenSocket(":9000", 5000);
	if (sock < 0) {
		// Could not open the listening socket: nothing to serve, so bail out.
		std::exit(EXIT_FAILURE);
	}
	
	FCGX_InitRequest(&request, sock, 0);
	count = 0;
}

int main() {
	/* Initialization. */
	initialize();

	/* Response loop. */
	while (FCGX_Accept_r(&request) == 0) {
		FCGX_FPrintF(request.out, "Content-type: text/html\r\n"
		   "\r\n"
		   "<title>FastCGI Hello! (C, fcgi_stdio library)</title>"
		   "<h1>FastCGI Hello! (C, fcgi_stdio library)</h1>"
		   "Request number %d running on host <i>s</i>\n",
			++count);
			
		FCGX_Finish_r(&request);
	}
	  
	return 0;
}

Compared to the baseline values from the Tomcat servlet benchmark, the results from the FastCGI benchmark are downright disappointing, with each request taking roughly 30 ms or longer. The servlet instance needed less than 1 ms for most requests, and 10 ms at most. Despite attempts to optimise the FastCGI example, it appears that significant bottlenecks remain. Whether these are in the Lighttpd server, its mod_fastcgi server module, or the FastCGI library is hard to say at this point.

For the C++/Qt example one can unabashedly say that even with the hacked-together code which was used, this unoptimised code ran on par with the highly optimised production code of the Tomcat server and its servlet API. It should be noted that although this example used the Qt networking classes, it didn't use Qt code for the actual socket communication beyond accepting the client connection.

Due to known issues with the QTcpSocket class on Windows, instead a custom, drop-in class was used which interfaces with the Winsock2 (ws2_32) DLL directly using the standard Berkeley socket API. This class has been used with other projects before and is relatively stable at this point. How this class compares performance-wise with the QTcpSocket class is at this point unknown.

Summarising, it seems at this point at the very least plausible that native code can outperform bytecode for web applications. More research has to be done into scaling methods and performance characteristics of applications more complex than a simple 'Hello World' as well.

Maya

A look at asm.js and the future with WebAssembly

December 30, 2015

Earlier this year the WebAssembly [1][2] project was announced, describing itself as “a new, portable, size- and load-time-efficient format suitable for compilation to the web”. It’s a W3C community group project, headed by representatives of all major browser developers. It follows similar efforts by Mozilla (asm.js) and Google (NaCl, Native Client) to create a bytecode format to run on browsers. This would complement the existing JavaScript runtimes, while adding much desired features like real multi-threading, local variables and a massive speed boost.

Since the late 90s I have used web-based technologies in both a hobby and professional fashion, observing how a mostly static web began to move towards adding as much scripting as possible to any page, first using Visual Basic Script, JavaScript, ActiveX and Java, then basically just JavaScript. This half-baked language grew from a quick addition by Netscape to keep up with the competition into the center point of the modern web, being forced into roles it was never meant or designed for.

Since that time JavaScript runtimes have become modern wonders of JIT VM implementations, minimising parsing times while dealing with JavaScript’s idiosyncrasies in such a way to maximise performance. There is however no denying that having a text-based scripting language is much slower than starting off with native code, or bytecode for that matter. This is the reasoning which underlies these efforts by Google and Mozilla to respectively use native code and JavaScript as assembly language.

While Google’s NaCl effort is a fairly straightforward implementation which allows a limited set of the native platform’s code (x86/x86-64, ARM or MIPS) to be executed in a sandboxed environment, Mozilla’s asm.js efforts aimed to use JavaScript as an intermediate (almost bytecode) language, providing a mapping from C/C++ code to a subset of JavaScript. The benefit of the asm.js approach is that it runs in virtually every modern browser, while NaCl requires extensive browser support.

With all of this in mind I decided to look at asm.js from the mindset of an experienced C/C++ developer to see just how far one can push this system.

The first hit by harsh reality comes when you realise that there is only a relatively small sub-set of C/C++ code which can be compiled for an asm.js target. Things which do not work include threads, checks for endianness, and low-level features such as longjmp [3]. Beyond this, 64-bit integers are to be avoided since JavaScript doesn’t have a native type for this. Things get more unsettling when one considers the API limitations [4]:

  • No access to a non-sandboxed filesystem (except when running inside node.js with the right configuration).
  • No main loop. JavaScript uses cooperative multi-tasking, so a script cannot run indefinitely or wait for input.
  • Networking is limited to Websocket-only unless you bridge with the JavaScript side.
  • Function pointer handling is… hairy at best, broken at worst. [5]
  • Everything is compiled into one massive JavaScript file, including libc and other system libraries.

These restrictions stem from JavaScript and its runtime. JavaScript is a single-threaded, prototype-based language with no real concept of scoped variables and the like.

To circumvent the issue of not being able to have a main loop, the Emscripten toolchain [6] requires one to register a function to be called to simulate a loop, with the parameters allowing one to specify how often it should be called per second and whether the call should simulate an infinite loop:

void emscripten_set_main_loop(em_callback_func func, int fps, int simulate_infinite_loop)

Furthermore, one can tweak the behaviour of this simulated main loop with further commands, getting something which somewhat approaches a main loop as one would have in most applications [7]. If one intends to use the same code across multiple targets including asm.js, it rapidly becomes clear that one has to liberally use preprocessor statements to check for the Emscripten (emcc/em++) compiler, as detailed in the Emscripten documentation:

int main() {
//...
#ifdef __EMSCRIPTEN__
  emscripten_set_main_loop(one_iter, 60, 1);
#else
  while (1) {
    one_iter();
    // Delay to keep frame rate constant (using SDL)
    SDL_Delay(time_to_next_frame());
  }
#endif
}

// The "main loop" function.
void one_iter() {
  // process input
  // render to screen
}

This is a rather minor adjustment, one may say. Depending on the application one wishes to port to asm.js it might be the worst of concessions one has to make, but the real struggle comes when one wants to do something beyond merely using the built-in LibSDL support to draw to the screen and handle local events.

Emscripten comes with a range of APIs which match or approximate standard desktop APIs, such as its audio support (via HTML5 audio) or networking (Berkeley socket API, with limitations). The latter is easily the most restricted, as one of the things which one does not get with asm.js and thus Emscripten is access to raw sockets, including standard TCP/UDP sockets. Instead Websocket is all one gets.

What this means is that the sockets one creates can only be non-blocking, ‘TCP’-style sockets. All data sent and received by the asm.js application is further encapsulated by the Websocket protocol, meaning that any server or client one tries to communicate with also has to speak the Websocket protocol. While WebRTC support has been added to the project in the past, this implementation is currently non-functional due to issues. In short, networking with asm.js is quite crippled compared to what one would be used to on most other platforms.

A related limitation concerns blocking calls, such as when one uses functions like sleep() in C. In plain JavaScript an approximation of this would merely block the JavaScript runtime [8]. An experimental feature in Emscripten allows one to approximate the intended behaviour, enabled by passing the ‘-s ASYNCIFY=1’ flag to the emcc/em++ compiler. This feature is crucial to make UI-oriented software such as ncurses work correctly.
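As a rough sketch of how one might wrap such a pause so that the same code builds for both native and asm.js targets: the pauseMs() name is made up for this example, and the Emscripten branch assumes that emscripten_sleep() from emscripten.h is available in the toolchain version used and that Asyncify is enabled at build time. I have only outlined this, not tested it in a browser.

```cpp
#include <chrono>
#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#else
#include <thread>
#endif

// Pauses the caller for roughly `ms` milliseconds.
void pauseMs(unsigned int ms) {
#ifdef __EMSCRIPTEN__
	// Assumption: built with Asyncify enabled, so this yields to the
	// browser's event loop instead of blocking the JavaScript runtime.
	emscripten_sleep(ms);
#else
	// Native targets can simply block the calling thread.
	std::this_thread::sleep_for(std::chrono::milliseconds(ms));
#endif
}
```

The calling code stays identical on both targets, which keeps the preprocessor checks confined to a single helper.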

Originally I had set out to write a simple terminal application in asm.js, with remote server communication and ncurses functionality. After poking at and experimenting with various approaches I found myself rather disappointed in the limitations of the asm.js platform. I realise that entire games and their engines have been ported to asm.js, and I do not say that it is impossible to do so. Merely that it is a lot more work than one would assume at first glance. Sadly it’s not so much about merely switching compile targets from native to asm.js, but rather tweaking one’s code to work with this eccentric and highly limited target platform.

After this run-in with asm.js, I figured that I might as well look at the state of WebAssembly (WASM). Despite being announced half a year ago, the Minimum Viable Product (MVP) goal is not close to being reached, with the actual bytecode format for WASM and related design features still in flux. The most recent status update I found was on a Mozilla blog, from earlier this month [9].

Even at this early stage, one can already tell that it is far more like Google’s NaCl than asm.js. One very welcome, upcoming feature is that of having threading support [10], as well as real exception handling capabilities. At this point the WASM project is still hampered by having to use a polyfill approach, which simulates the WASM runtime capabilities in JavaScript, giving it the same limitations as asm.js.

In summary, while I can see promise in WASM, I feel that asm.js at the very least is quite overhyped. It’s so limited and specialised that beyond porting games to run inside a browser using WebGL and LibSDL, it’s hard to think of suitable use-cases. It’s therefore no surprise that most of the projects I stumbled across during my research which actually made it into production are exactly such games.

How long will it take for WASM to reach MVP and post-MVP status? I don’t know. It’s practically a volunteer project at this point, meaning no deadlines or promises. Just the allusions to it possibly becoming something in between asm.js and NaCl. Almost native code, but not quite native.

I’m more interested at this point to see what other approaches to create the new web-based apps of the (near) future have been dreamed up by people so far. Something which does not involve abusing a flawed scripting language’s runtime to do unspeakable things.

Maya

[1] https://github.com/WebAssembly
[2] https://en.wikipedia.org/wiki/WebAssembly
[3] https://kripken.github.io/emscripten-site/docs/porting/guidelines/portability_guidelines.html
[4] https://kripken.github.io/emscripten-site/docs/porting/guidelines/api_limitations.html
[5] https://kripken.github.io/emscripten-site/docs/porting/guidelines/function_pointer_issues.html
[6] https://kripken.github.io/emscripten-site/index.html
[7] https://kripken.github.io/emscripten-site/docs/porting/emscripten-runtime-environment.html#browser-main-loop
[8] https://github.com/kripken/emscripten/wiki/Asyncify
[9] https://hacks.mozilla.org/2015/12/compiling-to-webassembly-its-happening/
[10] https://github.com/WebAssembly/design/blob/master/PostMVP.md#threads

My new game development book got published

October 10, 2015

Some people may have noticed a drop in published content on this blog for a while. Part of it was due to working on a new book for Packt Publishing, titled ‘Mastering AndEngine Game Development’, which was finalised last month with its publication. For those interested, it can be purchased both at the Packt store [1] and at Amazon [2].

What this book is, is an in-depth look at how to go from ‘making a basic mobile game’ using a game engine such as AndEngine [3], to making a truly advanced (mobile) game using 3D assets in a 2D game with OpenGL ES, dynamic and static lighting, frame-based and skeletal-based animation, anti-aliasing, GLSL shaders, 3D sound and advanced sound effects using OpenAL & OpenSL, and much more. While it’s aimed at extending AndEngine-based games, it’s written in a generic enough manner that it should be useful for those using other game engines, on Android or other platforms.

So far this is my first published book, but it probably won’t be my last. In the meantime I will try to step up the publication of content on this blog again, both with programming and electronics-related postings. Please stay tuned🙂

Maya

[1] https://www.packtpub.com/game-development/mastering-andengine-game-development
[2] http://www.amazon.com/Mastering-AndEngine-Game-Development-Posch/dp/1783981148/
[3] http://www.andengine.org/

Creating a Websocket server with Websocket++

September 16, 2015

Recently I had to add a Websocket server to a C++ project. Some research showed that the options here aren’t too many. There are a few C-based options, and one can of course pick the Websocket module from the POCO libraries [1] if one desires a C++ approach. Since this particular project is written in C++ I much preferred a purely C++ solution, preferably stand-alone. Ultimately I picked the (creatively named) Websocket++ library [2], also referred to as Websocketpp. Main arguments here were as mentioned being an object-oriented C++ solution, without significant dependencies, as well as ease of implementation thanks to it being a header-only library.

Websocket++ is a fairly modular library, making heavy use of templating to assemble various configurations, end points and similar into one coherent whole. As the basis for the transport module one can for example pick from iostreams (very slow) and ASIO. For the latter one can pick between Boost ASIO and stand-alone ASIO. There is also the option of using no C++11 features and using the Boost alternatives instead. Since this project involved a number of compile targets, not all of which featured a C++11-capable compiler, the final configuration involved a Boost dependency using its ASIO and system library, as well as various other header-only dependencies.

After starting the actual integration of the library into my project, I did however find out that the quality of the documentation is very… sub-optimal. The documentation is split between the GitHub site and the author’s own site, with most of this documentation being completely and utterly outdated. Only after significant amounts of trial and error did I manage to get a fully working implementation. To save others the trouble, I would like to hereby present a (simplified and altered) version of my implementation. I hope it will be useful.

Let’s move on to the header file of our implementation:

#include "websocketpp/server.hpp"
#include "websocketpp/config/asio_no_tls.hpp"

With these two includes we pick from the Websocket++ server role and make available the ASIO configuration without TLS feature, meaning no encrypted connections.

class WebsocketServer {
public:
	static bool init();
	static void run();
	static void stop();

	static bool sendClose(string id);
	static bool sendData(string id, string data);
		
private:
    static bool getWebsocket(const string &id, websocketpp::connection_hdl &hdl);
	
	static websocketpp::server<websocketpp::config::asio> server;
	static pthread_rwlock_t websocketsLock;
	static map<string, websocketpp::connection_hdl> websockets;
	static LogStream ls;
	static ostream os;
	
	// callbacks
	static bool on_validate(websocketpp::connection_hdl hdl);
	static void on_fail(websocketpp::connection_hdl hdl);
	static void on_close(websocketpp::connection_hdl hdl);
};

Our class consists solely of static members. This will allow us to use the Websocket functionality from multiple classes. Websocket++ is thread-safe, so all we have to worry about is concurrent access from multiple threads to our own data structures and variables.
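To illustrate what guarding our own data structures can look like, here is a small stand-alone sketch of a lookup table for connection handles. It is not the code from my project: the ConnectionRegistry class is made up for this example, it uses a plain std::mutex (C++11) rather than the pthread read-write lock declared above, and connection_hdl is redefined locally as a weak pointer, which is how I understand Websocket++ to define it, so that the sketch compiles without the library.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Assumption: stand-in for websocketpp::connection_hdl (a weak pointer to the connection).
typedef std::weak_ptr<void> connection_hdl;

// Hypothetical helper class: every access to the map happens under the mutex.
class ConnectionRegistry {
public:
	void add(const std::string& id, connection_hdl hdl) {
		std::lock_guard<std::mutex> lock(mutex_);
		connections_[id] = hdl;
	}

	// Returns true if an entry with this id existed and was removed.
	bool remove(const std::string& id) {
		std::lock_guard<std::mutex> lock(mutex_);
		return connections_.erase(id) > 0;
	}

	// Copies the handle out while the lock is held; returns false for an unknown id.
	bool get(const std::string& id, connection_hdl& hdl) const {
		std::lock_guard<std::mutex> lock(mutex_);
		std::map<std::string, connection_hdl>::const_iterator it = connections_.find(id);
		if (it == connections_.end()) { return false; }
		hdl = it->second;
		return true;
	}

private:
	mutable std::mutex mutex_;
	std::map<std::string, connection_hdl> connections_;
};
```

Copying the handle out under the lock means the caller never holds an iterator into the map, so another thread is free to add or remove connections in the meantime.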

Moving on to the implementation, we first see the usual namespace merging and static initialisations:

// namespace merging
using websocketpp::connection_hdl;

// static initialisations
websocketpp::server<websocketpp::config::asio> WebsocketServer::server;
map<string, connection_hdl> WebsocketServer::websockets;
pthread_rwlock_t WebsocketServer::websocketsLock = PTHREAD_RWLOCK_INITIALIZER;
LogStream WebsocketServer::ls;
ostream WebsocketServer::os(&ls);

Next is initialising the library and the server instance:

bool WebsocketServer::init() {
	// Initialising WebsocketServer.
	server.init_asio();

When using the ASIO transport option, we call its init method here.

	// Set custom logger (ostream-based).
	server.get_alog().set_ostream(&os);
	server.get_elog().set_ostream(&os);

We may want to redirect the logging output to our own logging method. Websocket++’s basic logger allows us to set an ostream alternative for the standard std::cout and std::cerr. We will look at this in more detail later on.

	// Register the message handlers.
	server.set_validate_handler(&WebsocketServer::on_validate);
	server.set_fail_handler(&WebsocketServer::on_fail);
	server.set_close_handler(&WebsocketServer::on_close);

Next we set the message handlers. These are all callback methods we will define in a moment.

	// Listen on port.
	int port = 8082;
	try {
		server.listen(port);
	} catch(websocketpp::exception const &e) {
		// Websocket exception on listen. Get char string via e.what().
		return false;
	}

With all the configuration done, we can start listening using the transport framework. This is done with the listen() call on the server object. This method can throw an exception, so we surround it with a try/catch block.

	// Starting Websocket accept.
	websocketpp::lib::error_code ec;
	server.start_accept(ec);
	if (ec) {
		// Can log an error message with the contents of ec.message() here.
		return false;
	}
	
	return true;
}

Finally we start accepting connections. We just need to start the server proper now, which is done in the following function:

void WebsocketServer::run() {
	try {
		server.run();
	} catch(websocketpp::exception const &e) {
		// Websocket exception. Get message via e.what().
	}
}

Again, this is a method which isn’t exception-free, so we have to surround it with a try/catch block. The other thing to note here is that run() blocks: when we shut down the server at some point, we have to wait for this call to return before we, for example, terminate the thread it runs on.

void WebsocketServer::stop() {
	// Stopping the Websocket listener and closing outstanding connections.
	websocketpp::lib::error_code ec;
	server.stop_listening(ec);
	if (ec) {
		// Failed to stop listening. Log reason using ec.message().
		return;
	}
	
	// Close all existing websocket connections.
	string data = "Terminating connection...";
	pthread_rwlock_rdlock(&websocketsLock); // guard the map while iterating.
	map<string, connection_hdl>::iterator it;
	for (it = websockets.begin(); it != websockets.end(); ++it) {
		websocketpp::lib::error_code closeEc; // separate from the outer 'ec'.
		server.close(it->second, websocketpp::close::status::normal, data, closeEc); // send close message.
		if (closeEc) { // we got an error
			// Error closing websocket. Log reason using closeEc.message().
		}
	}
	pthread_rwlock_unlock(&websocketsLock);
	
	// Stop the endpoint.
	server.stop();
}

Shutting down the Websocket server is fairly obvious: first we stop listening. This means we will no longer accept new connections. Next we go through all of the websocket connections we still have and close every single one of them. Finally we call stop() on the server object. This isn’t strictly necessary, but it will ensure that the transport backend is completely shut down and any remaining connections forcefully terminated.

Let’s move on to actually accepting new connections. For this we can use a number of handlers [3], including open and validate. I picked the validate handler, since it allows one to filter incoming connections and reject any which do not authenticate properly:

bool WebsocketServer::on_validate(connection_hdl hdl) {
	websocketpp::server<websocketpp::config::asio>::connection_ptr con = server.get_con_from_hdl(hdl);
	websocketpp::uri_ptr uri = con->get_uri();
	string query = uri->get_query(); // returns empty string if no query string set.
	string id;
	if (!query.empty()) {
		// Split the query parameter string here, if desired.
		// For this example we simply use the whole query string as the ID.
		id = query;
	}
	else {
		// Reject if no query parameter provided, for example.
		return false;
	}
	
	if (pthread_rwlock_wrlock(&websocketsLock) != 0) {
		// Failed to write-lock websocketsLock. Reject rather than touch the map unlocked.
		return false;
	}
	
	websockets.insert(std::pair<string, connection_hdl>(id, hdl));
	if (pthread_rwlock_unlock(&websocketsLock) != 0) {
		// Failed to unlock websocketsLock.
	}

	return true;
}

This code shows how to obtain the connection behind a connection handle from Websocket++ and to extract the URI including its query parameter string from it.

Here we assume that the connection client has to provide a string-based ID, though one can also use another identifier, based on the implementation. We use pthread-based locking around the websockets map to ensure no concurrent access takes place on this data structure and insert the new websocket handle with its id as key.
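The on_validate() code leaves the splitting of the query string open. As a minimal sketch of one way to do it, the helper below pulls a single named value out of a string such as "id=abc123&token=xyz". The name extractQueryParam() is made up for this illustration and is not part of Websocket++, and the helper performs no percent-decoding, so it only suits simple IDs.

```cpp
#include <string>

// Hypothetical helper (not part of Websocket++): returns the value belonging
// to 'key' in a query string like "id=abc123&token=xyz", or an empty string
// if the key is not present. No percent-decoding is performed.
std::string extractQueryParam(const std::string& query, const std::string& key) {
	std::string::size_type pos = 0;
	while (pos <= query.size()) {
		// Find the end of the current "name=value" pair.
		std::string::size_type end = query.find('&', pos);
		if (end == std::string::npos) { end = query.size(); }

		// Split the pair at the first '=' and compare the name with the key.
		std::string pair = query.substr(pos, end - pos);
		std::string::size_type eq = pair.find('=');
		if (eq != std::string::npos && pair.compare(0, eq, key) == 0) {
			return pair.substr(eq + 1);
		}

		pos = end + 1;
	}

	return std::string();
}
```

Inside on_validate() one could then write something like string id = extractQueryParam(query, "id"); and reject the connection if the result is empty.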

We may also wish to implement the fail() and close() handlers:

void WebsocketServer::on_fail(connection_hdl hdl) {
	websocketpp::server<websocketpp::config::asio>::connection_ptr con = server.get_con_from_hdl(hdl);
	websocketpp::lib::error_code ec = con->get_ec();
	// Websocket connection attempt by client failed. Log reason using ec.message().
}

void WebsocketServer::on_close(connection_hdl hdl) {
	// Websocket connection closed.
}

For the fail handler, we can obtain the connection as before, and extract the error code object to learn the reason behind the failure.

The close handler should generally be fairly boring, but it can be informative to have the confirmation in a log or such of a successfully closed connection.

Moving on, we just have to look at how to send data to such a socket.

bool WebsocketServer::sendData(string id, string data) {
	connection_hdl hdl;
	if (!getWebsocket(id, hdl)) {
		// Sending to non-existing websocket failed.
		return false;
	}
	
	websocketpp::lib::error_code ec;
	server.send(hdl, data, websocketpp::frame::opcode::text, ec); // send text message.
	if (ec) { // we got an error
		// Error sending on websocket. Log reason using ec.message().
		return false;
	}
	
	return true;
}

This function obtains the appropriate connection handle based upon the ID, then proceeds to write the provided data to this connection. The getWebsocket() method is a trivial STL map-based find and iteration effort and isn’t further documented here. Do not forget to lock the map while performing said find and iterator actions on it.
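For completeness, here is a rough sketch of what such a getWebsocket() could look like, including the locking. So that it compiles without Websocket++, it uses std::weak_ptr<void> as a stand-in for connection_hdl (which, as far as I know, is what the library typedefs it to) and free-standing variables instead of the static class members. The ReadLock helper is an invention for this sketch; it merely ensures the lock is released on every return path.

```cpp
#include <pthread.h>

#include <map>
#include <memory>
#include <string>

// Stand-in for websocketpp::connection_hdl, so the sketch builds on its own.
typedef std::weak_ptr<void> connection_hdl;

// Stand-ins for the static members of WebsocketServer.
static std::map<std::string, connection_hdl> websockets;
static pthread_rwlock_t websocketsLock = PTHREAD_RWLOCK_INITIALIZER;

// Hypothetical RAII helper: takes a read lock on construction and releases
// it again when the object goes out of scope.
class ReadLock {
public:
	explicit ReadLock(pthread_rwlock_t& l)
		: lock(l), locked(pthread_rwlock_rdlock(&l) == 0) {}
	~ReadLock() { if (locked) { pthread_rwlock_unlock(&lock); } }
	bool ok() const { return locked; }

private:
	ReadLock(const ReadLock&);            // not copyable
	ReadLock& operator=(const ReadLock&); // not assignable
	pthread_rwlock_t& lock;
	bool locked;
};

// Looks up the handle stored under 'id'. Returns false if the lock could not
// be taken or if no such ID exists; 'hdl' is only written on success.
bool getWebsocket(const std::string& id, connection_hdl& hdl) {
	ReadLock guard(websocketsLock);
	if (!guard.ok()) { return false; }

	std::map<std::string, connection_hdl>::const_iterator it = websockets.find(id);
	if (it == websockets.end()) { return false; }

	hdl = it->second; // copy the handle while still holding the lock.
	return true;
}
```

A read lock suffices here since the lookup does not modify the map, and it allows several threads to send at the same time. Anything that inserts or erases entries needs the write lock instead.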

Lastly, how to close a socket:

bool WebsocketServer::sendClose(string id) {
	connection_hdl hdl;
	if (!getWebsocket(id, hdl)) {
		// Closing non-existing websocket failed.
		return false;
	}
	
	string data = "Terminating connection...";
	websocketpp::lib::error_code ec;
	server.close(hdl, websocketpp::close::status::normal, data, ec); // send close message.
	if (ec) { // we got an error
		// Error closing websocket. Log reason using ec.message().
		return false;
	}
	
	// Remove websocket from the map. Erasing modifies the map, so take the write lock.
	pthread_rwlock_wrlock(&websocketsLock);
	websockets.erase(id);
	pthread_rwlock_unlock(&websocketsLock);
	
	return true;
}

Here we again obtain the proper connection handle, only this time we use the ‘close’ method instead of ‘send’. We can send a close reason using a string, or just send an empty string.

Finally the ID is erased from the websockets map and the now invalid connection handle with it.

With this we have everything we need for the Websocket server, except for one thing: the redirecting of the logging output from Websocket++. We saw earlier that we use the set_ostream() method on the logging interfaces. In the class declaration we saw this mysterious ‘LogStream’ type and an ostream, and again in the static initialisations.

What happens here is that this LogStream class is a custom implementation of std::streambuf, assigned to an std::ostream object which then replaces the standard outputs for Websocket++’s logging. For the actual streambuf implementation, one would use something like this:

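class LogStream : public streambuf {
private:
	string buffer;
	
protected:
	int overflow(int ch) override {
		if (ch == traits_type::eof()) {
			// Nothing to write. Return a value other than eof() to signal success.
			return traits_type::not_eof(ch);
		}
		
		buffer.push_back((char) ch);
		if (ch == '\n') {
			// End of line, write to logging output and clear buffer.
			
			buffer.clear();
		}
		
		return ch; // Return traits_type::eof() on failure instead.
	}
};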

We just override the virtual overflow() method in the streambuf class. Since we never set up a put area, the default (empty) buffer overflows for every character written to the streambuf, and thus our overflow() method is called for each character.

Using a string as buffer, we capture each received character and check whether it is a newline character or not. If it is we have a complete line which we can then write to whatever logging functionality we use in our project. After this we empty the buffer string and continue with the new line.
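To make the line-capturing behaviour concrete, here is a self-contained variant which one can try without Websocket++. It is only a sketch: the class name LineLogStream and its public lines vector are made up for this example, with the vector standing in for whatever logging call a real project would make. Unlike the version above, it drops the newline character instead of storing it.

```cpp
#include <ostream>
#include <streambuf>
#include <string>
#include <vector>

// Line-buffering streambuf. Each completed line is appended to 'lines',
// which stands in for a call into a real logging framework.
class LineLogStream : public std::streambuf {
public:
	std::vector<std::string> lines;

protected:
	int_type overflow(int_type ch) override {
		if (traits_type::eq_int_type(ch, traits_type::eof())) {
			// Nothing to store; report success with a non-eof value.
			return traits_type::not_eof(ch);
		}

		if (ch == '\n') {
			// End of line: hand the line over and start a new one.
			lines.push_back(buffer);
			buffer.clear();
		}
		else {
			buffer.push_back(traits_type::to_char_type(ch));
		}

		return ch;
	}

private:
	std::string buffer;
};
```

Wrapping an instance in an std::ostream (std::ostream os(&ls);) and writing to os fills ls.lines with one entry per newline-terminated line. Text which has not yet been terminated by a newline stays in the internal buffer.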

In conclusion, I must say that despite the effort it cost me to get a working integration of Websocket++ in my project, I do think it was worth it. Technically it is a well-designed library with a lot of cool features and, thanks to its template-based nature, it is easy to expand and configure to fit different purposes. Its main weakness is simply the outdated, lacking and occasionally wrong documentation and examples. Hopefully this article will fix at least part of that problem 🙂

Maya

[1] http://pocoproject.org/
[2] https://github.com/zaphoyd/websocketpp
[3] http://www.zaphoyd.com/websocketpp/manual/reference/handler-list

2014 in review

January 2, 2015

The WordPress.com stats helper monkeys prepared a 2014 annual report for this blog.

Here's an excerpt:

The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 70,000 times in 2014. If it were an exhibit at the Louvre Museum, it would take about 3 days for that many people to see it.

Power Supply Design Part 1: Unregulated Linear Supplies

December 28, 2014

I recently stumbled over a particularly interesting specimen in the family of cheap unregulated power supplies, also lovingly referred to as ‘wallwarts’. Here is the unit in all its prestigious glory:

[Photo: the wallwart power supply]

The label seems to claim it’s been certified, but lists no manufacturer or other useful info beyond the useless model number. Inside we find the following:

[Photos: the inside of the adapter]

What we have here is pretty much the most basic unregulated power supply one can construct, though the bleeder resistor was technically not required. Such luxury. In diagram form we get the following circuit:

[Diagram: psu-linear-unregulated-unsafe]

We see the transformer, four diodes (1N4001 or better) forming a bridge rectifier (two extra diodes are cheaper than a center-tapped transformer), the smoothing cap (1,000 µF, 16 V) and bleeder resistor (100 Ohm, 1/2 W?). 230 VAC goes straight into the transformer and is stepped down to the desired voltage.
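As a quick sanity check on the component values above, we can work out how fast the bleeder resistor empties the smoothing cap and how much power it burns while doing so. The output voltage V of this particular unit is not stated, so it is left as a variable here, and the 1/2 W rating is the guess from the parts list:

```latex
% Discharge time constant of the smoothing cap through the bleeder resistor
\tau = R C = 100\,\Omega \times 1000\,\mu\mathrm{F} = 0.1\,\mathrm{s}

% After five time constants (0.5 s) less than 1% of the voltage remains
\frac{V(5\tau)}{V_0} = e^{-5} \approx 0.0067

% Continuous dissipation in the bleeder at output voltage V
P = \frac{V^2}{R}

% Highest voltage at which a 1/2 W, 100 Ohm resistor stays within its rating
V_{\max} = \sqrt{P_{\max} R} = \sqrt{0.5\,\mathrm{W} \times 100\,\Omega} \approx 7.1\,\mathrm{V}
```

In other words, the cap is discharged within about half a second after unplugging, but if the unloaded output sits much above 7 V, a 1/2 W resistor of this value would be running beyond its rating.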

Now, let’s talk safety. While this circuit will work fine when nothing goes wrong, it is a good idea to consider the two most likely scenarios a circuit like this may encounter in the real world. The first is that of a surge, say from a nearby lightning strike, or an internal short-circuit. The second is when the connected device short-circuits, or its output connector or wires short out. The first scenario results in a massive surge into the adapter, the second will pull more and more power through the circuit until something fails.

With this circuit, a surge or internal short will be passed on through the adapter, into the output and into the connected device. This forms a major electrocution and fire risk. Beyond the circuit failing and cutting off power that way, there are no safety features for this scenario. The same is true for the excessive power draw scenario. Here it’ll keep drawing power until something in the circuit blows up, catches fire or both.

While a transformer in theory electrically isolates a circuit, it has a so-called breakdown voltage at which current will pass straight from the primary into the secondary winding(s), causing a short. During a surge scenario this is likely to happen, depending on the quality of the insulating tape between the windings. One should always consider the scenario where a short forms inside a transformer or related components.

So how to protect against this scenario? There are multiple ways to go about it, but the easiest and cheapest one has to be the humble fuse:

[Diagram: psu-linear-unregulated-fuse]

When the current becomes too high, whether due to a surge or a short, the fuse will melt or trip depending on the type of fuse used. One can use self-resetting thermal fuses (PTC ‘polyfuses’) if one wants it to be easy to reset: once cooled down they will automatically reset. Regular glass fuses are even cheaper, though probably not as desirable in a closed, maintenance-free unit like a wallwart. There are more options than fuses, of course. One can also look at MOVs, crowbar (zener plus SCR) and clamp (zener plus transistor) overvoltage protection.

At any rate the message should be clear: unregulated linear power supplies are easy and cheap, but one should not skimp on the safeties.

Maya
