Archive for the ‘Projects’ Category

NymphRPC: my take on remote procedure call libraries

January 29, 2018 1 comment

Recently I open-sourced NymphRPC [1], which is a Remote Procedure Call (RPC) library I have been working on for the past months. In this article I would like to explain exactly why I felt it to be necessary to unleash yet another RPC library onto the world.

Over the past years I have had to deal quite a lot with a particular RPC library (Etch [2][3]) due to a commercial project. Etch is now defunct, but it spent some time as an Apache project after Cisco open-sourced it in 2011, and it was picked up by BMW as an internal protocol for their infotainment systems [4].

During the course of this aforementioned commercial project it quickly became clear to me that the Etch C library I was using had lots of issues, including stability problems and general usability flaws (such as calling abort() with no recovery option whenever any internal assert failed). As the project progressed, I found myself faced with the choice to either debug this existing library or reimplement it.

At this point the C-based library was around 45,000 lines of code (LoC), with countless concurrency-related and other issues which made Valgrind spit out very long log files, and which proved to be virtually impossible to diagnose and fix. Many attempts resulted in the runtime becoming more unstable in other places.

Thus it was that I made the decision to reimplement the Etch protocol from scratch in C++. Even though there was already an official C++ runtime, it was still in Beta and it too suffered from stability issues. After dismissing it as an option, this led me to the next problem: the undocumented Etch protocol. Beyond a high-level overview of runtime concepts, there was virtually no documentation for Etch or its protocol.


Fast-forward a few months and I had reverse-engineered the Etch protocol using countless Wireshark traces and implemented it in a lightweight C++-based runtime of around 2,000 LoC. Foregoing the ‘official’ runtime architecture, I had elected to model a basic message serialisation/deserialisation flow architecture instead. Another big change was foregoing any domain-specific language (DSL), as used by Etch, to define the available methods.

The latter choice was primarily to avoid the complexity that comes with a DSL and compiler architecture, which has to generate functioning code that then has to be compiled into the project in question. In the case of a medium-sized Etch-based project, this auto-generated code ended up adding another 15,000 LoC to the project. With my runtime, functions are instead defined in code and registered with the runtime on start-up.

In the end this new runtime I wrote performed much better (faster, lower RAM usage) than the original runtime, but it left me wondering whether there was a better RPC library out there. Projects I looked at included Apache Thrift [5] and Google Protocol Buffers [6].

Sadly, both are quite similar to Etch in that they follow the same path of a DSL (an interface definition language) plus auto-generated client/server code. Using them is still fairly involved and cumbersome. Probably rpclib [7] comes closest, but it's still very new and has made a number of design choices which do not appeal to me, including the lack of any parameter validation for methods being called.


Design choices I made in NymphRPC include an extremely compact binary protocol (about 2x more compact than the Etch protocol) that still allows for a wide range of types. I also added dynamic callbacks (settable by the client). To save one the trouble of defining each RPC method in both the client and the server, the client instead downloads the server's API upon connecting to it.

At this point NymphRPC is being used for my file revision control project, FLR [8], as the communication fabric between clients and server.

Performance and future

Even though the network code in NymphRPC is pretty robust (it is essentially the same code that currently runs on thousands of customer systems around the world, courtesy of the commercial project that originally inspired NymphRPC's development), the library itself is still very much under development.

The primary focus during the development of NymphRPC so far has been on features and stability. The next steps will be to expand upon those features, especially more robust validation and easier exception handling, and to optimise the existing code.

Over the coming time I'll be benchmarking [9] NymphRPC to see how it compares to other libraries, and optimise any bottlenecks that show up. Until then, anyone who wishes to play around with NymphRPC (and FLR) and provide feedback is most welcome 🙂



Categories: C++, Networking, NymphRPC, Protocols, RPC Tags: , ,

Cerflet: like Servlets, but with more C++

May 19, 2016 2 comments

A few months ago I wrote about the research I had been doing on multiple ways to write server-based web applications, using Java Servlets, FastCGI/C++ and Qt/C++. While this showed that C++-based applications tend to be faster than Java-based ones, it only looked at single-threaded, sequential requests.

While looking at ways to get proper concurrent performance out of a Servlet-like C++ implementation, I decided to take another look at the POCO C++ Libraries [1] and found that their HTTP server implementation uses a proper pool of worker threads, giving excellent scaling across many concurrent requests.

After spending a few hours putting a basic wrapper library together, I wrote the following ‘Hello World’ example code to demonstrate a basic HTTP Cerflet:

#include <httpcerflet.h>

#include <iostream>
#include <string>

using namespace std;

class HelloRequestHandler :	public HTTPRequestHandler {
public:
	void handleRequest(HTTPServerRequest& request, HTTPServerResponse& response) {
		Application& app = Application::instance();
		app.logger().information("Request from " + request.clientAddress().toString());
		std::ostream& ostr = response.send();
		ostr << "<!DOCTYPE html><html><head><title>Hello World</title></head>";
		ostr << "<body><p>Hello World!</p></body></html>";
	}
};

int main(int argc, char** argv) {
	// 0. Initialise: create Cerflet instance and set routing.
	HttpCerflet cf;
	RoutingMap map;
	map["/"] = &createInstance<HelloRequestHandler>;
	cf.routingMap(&map);
	// 1. Start the server.
	return cf.run(argc, argv);
}

In the main() function we create a new HttpCerflet instance and a new RoutingMap. The latter contains the routes we wish to map to a handler, in this case the HelloRequestHandler. For the handler we take the address of the template function createInstance<>(), with our custom handler class as the template argument.

What this mapping does is that when a new request matches one of the keys of the RoutingMap, a fresh instance of the specified handler is created and pushed onto a waiting worker thread.

The handler class itself derives from the HTTPRequestHandler class, a standard POCO Net class, reimplementing its handleRequest() method. This shows that Cerflet is more of a complement to POCO than an abstraction over it. The main goal of Cerflet is to hide some of the complexities and boilerplate of POCO's HTTP server, allowing one to focus on writing the actual business logic.


As for performance, an ApacheBench benchmark was run with a concurrency of 5, for a total of 100,000 requests.

1. Java Servlet

Server Software:        Apache-Coyote/1.1
Server Hostname:
Server Port:            8080

Document Path:          /examples/servlets/servlet/HelloWorldExample
Document Length:        400 bytes

Concurrency Level:      5
Time taken for tests:   7.697 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      56200000 bytes
HTML transferred:       40000000 bytes
Requests per second:    12992.07 [#/sec] (mean)
Time per request:       0.385 [ms] (mean)
Time per request:       0.077 [ms] (mean, across all concurrent requests)
Transfer rate:          7130.42 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       1
Processing:     0    0   0.5      0      14
Waiting:        0    0   0.4      0      14
Total:          0    0   0.5      0      14

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      1
  75%      1
  80%      1
  90%      1
  95%      1
  98%      1
  99%      1
 100%     14 (longest request)

2. Cerflet

Server Software:
Server Hostname:
Server Port:            9980

Document Path:          /
Document Length:        99 bytes

Concurrency Level:      5
Time taken for tests:   7.220 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      19900000 bytes
HTML transferred:       9900000 bytes
Requests per second:    13850.42 [#/sec] (mean)
Time per request:       0.361 [ms] (mean)
Time per request:       0.072 [ms] (mean, across all concurrent requests)
Transfer rate:          2691.63 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       1
Processing:     0    0   0.5      0      10
Waiting:        0    0   0.4      0      10
Total:          0    0   0.5      0      10

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      1
  80%      1
  90%      1
  95%      1
  98%      1
  99%      1
 100%     10 (longest request)


In this benchmark, Cerflet is about 7% faster than the equivalent Tomcat-based Hello World example. Note that Cerflet also logs each request to the console, slowing it down somewhat, while Tomcat does not. Cerflet's Hello World example was compiled with the -Og optimisation setting using 32-bit GCC 5.3 (on Windows, MSYS2). Version 1.6 of the POCO libraries was used, as obtained via MSYS2's Pacman package manager.

For Tomcat the binary distribution for 8.0.30 as obtained via the official Apache site was used, with the server manually started using the provided startup.bat script. Both servers were run on a Windows 7 Ultimate x64 platform (Intel i7 6700K, 32 GB DDR4) with ApacheBench using the loopback device.


Without compensating for all differences between the two examples and other potential variables, it is fair to say at this point that Servlets and Cerflets are roughly equivalent in performance for a simple Hello World example. Cerflets are likely slightly faster (5-10%), with more gains to be had via compiler optimisations (-O3).

The type of operations which would be performed further in the business logic likely will have the most influence on the overall performance between these two platforms. Cerflets do however show that C++-based server-side web applications are more than just a viable option, backed by a mature programming language (C++) and cross-platform libraries (POCO).

Cerflets as they exist today are reminiscent of Spring Boot Java applications, which also feature a built-in HTTP server and thus do not rely on a Servlet container (e.g. Tomcat). The advantage of Cerflets, however, is that they depend only on the POCO libraries (unless linked fully statically) and do not require a central runtime (JVM). This significantly eases deployment.

The example used here can be found on the Cerflet project's Github page [2], along with the HTTP Cerflet implementation for use in new projects. Both feedback and contributions are welcome.



Categories: C++, Cerflet, HTTP Tags: , , , ,

New Project: NGS CPU Architecture

August 30, 2014 Leave a comment

For those looking at the scarcity of posts on this blog and wondering what in the world happened to me, I can offer the following explanation: personal (health) issues, as well as writing a book for Packt Publishing on AndEngine game development, have taken up most of my time recently. Unfortunately I haven't had much opportunity to write on this blog for that reason. Fortunately, however, I have not been sitting completely idle and have begun a new project which at least some may find interesting.

The project is a custom CPU architecture I have been wanting to develop for a while now. ‘Great’, I can hear some of you think, ‘Another CPU architecture, why would we need another one?!’ The short version is that this is a pretty experimental architecture, exploring features and designs not commonly used in any mainstream CPU architectures. Consider it a bit of a research project, one aimed at developing a CPU architecture which may be useful for HPC (high-performance computing) as well as general-purpose computing.

The project’s name is ‘Nyanko Grid-scaling System’, or NGS for short. Currently I’m working on the first prototype – a simplified 16-bit version of NGS – featuring only a single ALU. This prototype is referred to as ‘NGS-16’. Even then it has many of the essential features which I think make this into such an interesting project, including:

– unclocked design: all components work without a central clock or pipeline directing them.
– task scheduler: integrating the functionality of the software-based scheduler of an OS.
– virtual memory management: virtual memory management done in hardware.
– driver management: drivers for hardware devices are either in hardware, or directly communicate with the CPU.

Essentially this means that there's no software-based operating system (OS) as such. A shell will be required to do the actual interfacing with human beings and to instruct the NGS task scheduler to launch new processes, but no OS in the traditional sense. While this also means that existing operating systems cannot be ported to the NGS architecture in any realistic fashion, it does not mean that applications cannot be compiled for it. After porting a C/C++ toolchain (GCC or LLVM) to NGS, the average C/C++-based application would only be some library-wrangling and a recompile away from functioning.

Moving back to the present, I'm writing NGS-16 in VHDL, with the Lattice MachXO2-7000 [1] as the target FPGA. The basic structure has been laid out (components, top entities, signals), with just the architecture implementations and debugging/simulation left to finish. While this prototype takes the usual short-cuts (leaving out unneeded components, etc.) to ease development, it should nevertheless be a useful representation of what the NGS architecture can do.

The FPGA board I’ll be using is actually produced by a friend, who called it the FleaFPGA following the name of his company: Fleasystems [2]. As you can see on the FleaFPGA page [3], it offers quite a reasonable amount of I/O, including VGA, USB (host), PS/2, audio and an I/O header. The idea is to use as much of this hardware as possible with the initial range of prototypes. I also have another FPGA board (Digilent Nexys 2, Spartan 3E-based), which offers similar specifications (LEs and I/O). Depending on how things work out I may also run NGS-16 on that board. Ultimately I may want to build my own FPGA board aimed specifically at running NGS.

Over the coming months I’ll be blogging about my progress with this NGS-16 prototype and beyond, so stay tuned 🙂



Categories: NGS, VHDL Tags: , , , , ,

Introducing Nt With the Qt-based NNetworkSocket Class

August 17, 2013 Leave a comment

While recently working on a project involving C++, Qt and networking over TCP sockets, I came across a nasty surprise: blocking socket functions do not work on Windows with QTcpSocket [1]. Not working as in fundamentally broken, with a bug report [1] dating back to early 2012, when Qt 4.8 was new. After a quick look through the relevant Qt source code (the native socket engine and kin) to determine whether it was something I could reasonably fix myself, I concluded that this was not a realistic option due to the amount of work involved.

Instead I found myself drawn to the option of implementing a QObject-based class wrapping the native sockets on Windows (Winsock2) and elsewhere (POSIX/Berkeley). Having used these native sockets before, I knew how easy it would be to write such a wrapper class for my needs: TCP sockets, blocking functions and client-side functionality, all of which take only a little effort to implement. Any further features can be added on an as-needed basis. The result is the NNetworkSocket class, which I put on Github today [2] in a slightly expanded version.

As I suspect that it won't be the last add-on/drop-in/something-else class I'll be writing to complement the Qt framework, I decided to come up with the oh-so-creative name of ‘Nt’ for the project. Much like ‘Qt’, its pronunciation is obvious and silly: ‘Qt’ as ‘cute’ and ‘Nt’ as ‘neat’. Feel free to have a look at the code I put online, including the sample application (samples/NNetworkSocket_sample). Documentation will follow at some point once the class has matured some more.

As usual, feel free to provide feedback, patches and donations 🙂



Implementing A Cookiejar for QtWebKit; QNetworkCookieJar Analysis

February 24, 2012 2 comments

As some of you may know already, I am working on the WildFox browser project which while it initially was going to fork the Mozilla code is now building a browser on top of QtWebKit. See the previous blog post for details on this decision. The WildFox project page and source can be found at

One of the features I recently implemented was an advanced cookiejar for storing HTTP cookies. Why was this necessary, you may ask? Qt does provide a cookiejar implementation in QNetworkCookieJar, but even aside from its inability to save any cookies to disk, a quick look at its source code shows the following issues: it is limited to 50 cookies, which is less than the 300 required by the HTTP cookie standard (RFC 2965) [1], and it uses a basic QList to store the cookies, which forces a linear search through every cookie to find those for a specific URL, or to find duplicates when storing new ones.

In other words, the default implementation is unsuitable for any web browser. One thing it does do right, however, is the way it verifies domains. Due to the design of internet Top Level Domains (TLDs) it is impossible to algorithmically determine whether an internet URL is valid, or specifies a proper TLD.

The need to verify the domain becomes clear when one imagines someone setting a cookie for the domain .com, which would then be a cookie valid for every website ending in the TLD .com. Obviously this can't be allowed, and the obvious approach would be to disallow single-dot domains (.com, .org, .net). This doesn't work for domains like, however, and disallowing two-dot domains would break the former, single-dot type. There are further variations still, such as URLs in the US where the public suffix can entail style domains. Clearly the only way to do this verification is with a look-up table, which can be found in the form of Mozilla's Public Suffix List [2].

What we need for a better QtWebKit cookiejar thus entails the following:

  • the ability to store cookies to disk.
  • storing at least 300 cookies.
  • quick look-ups of cookies based on their domain.

For this we recycle the existing functionality in Qt required to do the public suffix verification. The relevant files in Qt 4.8.0 are:

  • src/corelib/io/qtldurl.cpp
  • src/qurltlds_p.h

The former contains some basic routines to obtain the public suffix, which we will expand upon; the latter contains the Public Suffix List processed into a more accessible format. The latter we'll use almost as-is, with just the Qt namespace sections removed. The former has a major omission which we'll fix. The functions we keep from qtldurl.cpp are, in renamed form:

  • containsTLDEntry(const QString &entry)
  • isEffectiveTLD(const QString &domain)
  • topLevelDomain(const QString &domain)

We add the following function:

QString getPublicDomain(const QString &domain) {
    QStringList sections = domain.toLower().split(QLatin1Char('.'), QString::SkipEmptyParts);
    if (sections.isEmpty())
        return QString();

    QString tld = "";
    for (int i = sections.count() - 1; i >= 0; --i) {
        tld.prepend(QLatin1Char('.') + sections.at(i));
        if (!isEffectiveTLD(tld.right(tld.size() - 1))) {
            // No longer a pure public suffix: we have just prepended the first
            // non-public label, so this is the registrable domain.
            return tld;
        }
    }

    return tld;
}
This allows us to obtain the public suffix plus the first non-public label. For example, “www.raywenderlich.com” would be reduced to “raywenderlich.com”. It differs from topLevelDomain() in that the latter returns just the public suffix, e.g. “.org” in the previous example, which is not desirable for our use.

With the domain verification taken care of, we move on to the next stage: the data structure and storage method. To store cookies on disk we elect to use an SQLite database, as this is efficient in terms of storage density, prevents disk fragmentation, and allows SQL-based look-ups instead of the filesystem-based ones common in older browsers. QtSQL comes with an SQLite driver. Do be sure to use the current version of Qt (4.8), as SQLite recently introduced journaling and the Qt 4.7.x libraries still ship the old SQLite client.

For the in-memory data structure we use a QMultiMap. The rationale is key-based look-up on the cookie domain: by taking the URL we're seeking matching cookies for and obtaining its top domain (“raywenderlich.com”), we can find any cookie in our data structure using this top domain as the key. This means we can search a large number of cookies in logarithmic (O(log N)) time for a domain match, a major improvement on the linear (O(N)) search of the default QList.

The link between the in-memory and on-disk storage is accomplished by the following rules:

  • All new cookies and cookie updates are stored in both in-memory and on-disk, except for session cookies, which are stored only in-memory.
  • Stored cookies are read into memory per-domain and on-demand.

In addition to this I have implemented a cookie manager dialogue which allows one to look through and manage (delete) stored cookies. Expired cookies are automatically deleted the first time they are fetched from the database, or before they're stored. Blocking third-party cookies is also very easy, with a comparison between the top domain and the cookie's intended domain:

QString baseDomain = getPublicDomain(url.host());
if (skip3rd && (baseDomain != getPublicDomain(cookie.domain()))) {
    // Third-party cookie: its base domain differs from that of the request.
    continue;
}
With this we have a relatively efficient cookie storage and retrieval mechanism, with the ability to manage the stored cookies. It can store an unlimited number of cookies and should remain efficient even with over 10,000 cookies, thanks to the logarithmic search of the QMultiMap.

Essential features still missing in the WildFox browser at this point are bookmarks and sessions. The next article on WildFox should be about the Chrome extension support I'm currently implementing, which will bring XMarks bookmark synchronisation support as a direct result, as well as the bookmarks feature itself. Stay tuned.



Categories: HTTP, programming, Projects, Qt, WildFox Tags: , , ,

Surviving The Mozilla Build System, A Brief Guide

November 26, 2011 Leave a comment

Last year I did a couple of interviews for The Register and other sites regarding my WildFox project, which in essence had the goal of adding h.264 video support to Firefox using a GStreamer or FFmpeg backend. Due to circumstances I didn't manage to do significant work on this project until a few months ago, when I finally began the real modifications to the Firefox source.

As this article isn't about the quality of the Mozilla source code, or the lack thereof, I won't dwell on it too long. Suffice it to say that I found many instances of NIH (‘Not Invented Here’) syndrome, including in the networking, smart pointer and threading sections. As my goal was to add Libav (the recent fork of FFmpeg) support to the media backend of Firefox, I became intimately familiar with these APIs as I discovered just how much of the code could be ripped out without causing adverse effects, and that the smart pointers do not work with anything but nsISupports-derived types.

Anyway, the build system… at first glance the Mozilla build system seems to use Makefiles, that is until you notice the .in extension indicating that they're autoconf templates. Or autobreak, as it is lovingly called by a large section of the internet. After much trial and error I discovered that after putting my new Libav decoder & reader into /content/media/libav and the Libav includes into /media/libav of the source tree, creating a single Makefile with EXPORTS and an individual Makefile for each library of Libav (libavformat, libavcodec, etc.), I still had to edit a host of files to make it all work:




After following the hints in the Mozilla documentation [1] I discovered the relevant files in /toolkit, and things finally began to work. Until I hit a few snags, like having to add a compiler flag to CXXFLAGS with no way to specify this in the Makefiles that didn't get ignored for some reason. Libav is a C99 project and requires -D__STDC_CONSTANT_MACROS to be added to CXXFLAGS [2] to make it play nice with a C++ project. In the end I put this flag directly into the CXXFLAGS definition in the root of the source tree. Ugly, but it works.

At this point everything builds; the only thing I'm still stuck on is how to add Libav's LIB files to the linker flags. As usual, the methods I have found do not work, and even adding them directly didn't seem to do the trick.

To be quite honest, I'm ready to give up on improving the Mozilla source. The changes required to bring it up to speed with proper project standards are just too daunting and severe to be handled by me alone. As a fun comparison I started the WildFox-Mimic project a few days ago, to investigate what it would take to create a browser which looks and feels like Firefox but uses Qt and the QtWebKit engine. The result is a modern, up-to-date browser with an HTML5 video/audio backend using Qt's Phonon, which wraps around the OS's media framework, be it DirectShow, GStreamer or something else. In other words, it's perfect. QtWebKit can also use the same NPAPI plugins Firefox uses, so Flash support is available out of the box. The Personas themes and Jetpack add-ons can also be supported.

The result with WildFox-Mimic would be a browser with a codebase a fraction of the size of Firefox's, with most of the development and maintenance performed by the Qt and QtWebKit maintainers. This is the direction I think I'll be heading in.



Categories: Projects, Qt, WildFox

Sharing Data Even More Universally; UDS Open Sourced

November 22, 2011 Leave a comment

A while ago (September 27th of this year, to be precise) I announced [1] the Universal Data Share (UDS) project: a small utility with which people can share files with others without the need for a server or complicated software installs.

Since that time I have been testing the application with some friends and added a lot of bug fixes as well as some features, including:

  • Chunked downloading, allowing larger files to be transferred without the application falling over.
  • Downloads tab, with progress indicators and download folder.
  • Filesizes of the remote files reported in the client.
  • Much improved protocol.

The project page has been online for a while now [2]. You can find the most up-to-date binaries, the GitHub link and the protocol details there. Yes, that's a GitHub link, as in the source being freely available. While I haven't attached a formal license to it yet, I'd much appreciate it if people didn't use significant parts of the source without attribution, and sent me any bug fixes or feature additions they may produce.

Also important to note is that the software is now essentially Beta-level quality, in that it is feature-complete and relatively stable. Large-scale testing still has to be performed, naturally, and that's where you people come into play 🙂



Categories: Projects, UDS Tags: