Archive

Archive for the ‘WildFox’ Category

Implementing A Cookiejar for QtWebKit; QNetworkCookieJar Analysis

February 24, 2012 2 comments

As some of you may know already, I am working on the WildFox browser project which while it initially was going to fork the Mozilla code is now building a browser on top of QtWebKit. See the previous blog post for details on this decision. The WildFox project page and source can be found at www.mayaposch.com/wildfox.php.

One of the features I recently implemented was an advanced cookiejar for storing HTTP cookies. Why was this necessary, you may ask? QtWebKit does implement a cookiejar in QNetworkCookieJar, but even aside from the inability to save any of the cookies to disk, a quick look at its source code shows the following issues: a limit of 50 cookies, which is less than the 300 required by the current standard for HTTP cookies (RFC 2617) [1]. It also uses a basic QList to store the cookies, which requires it to search in linear time through every cookie to find ones for a specific URL and duplicate cookies when storing them.

In other words, the default implementation is unsuitable for any web browser. One thing it does do right, however, is the way it verifies domains. Due to the design of internet Top Level Domains (TLDs) it is impossible to algorithmically determine whether an internet URL is valid, or specifies a proper TLD.

The need to verify the domain is made clear when one imagines someone setting a cookie for the domain .com, which would then be a cookie valid for every website ending with the TLD .com. Obviously this can’t be allowed and the obvious approach would be to disallow single dot domains (.com, .org, .net). This doesn’t work for domains like .co.uk, however. Disallowing two dot domains would cause issues with the former, single dot type. Further there are more variations on this, such as URLs in the US where the public suffix can entail city.state.us style domains. Clearly the only way to do this verification is to use a look-up table. This can be found in Mozilla’s public suffix list [2].

What we need for a better QtWebKit cookiejar thus entails the following:

  • the ability to store cookies to disk.
  • storing at least 300 cookies.
  • quick look-ups of cookies based on their domain.

For this we recycle the existing functionality in Qt required to do the public suffix verification. The relevant files in Qt 4.8.0 are:

  • src/corelib/io/qtldurl.cpp
  • src/qurltlds_p.h

The former contains some basic routines to obtain the public suffix which we will expand upon and the latter contains the Public Suffix list processed in a more accessible format. The latter we’ll use almost as-is, with just the Qt namespace sections removed. The former has a major omission we’ll add. The functions we’ll keep from qtldurl.cpp are in renamed form:

  • containsTLDEntry(const QString &entry)
  • isEffectiveTLD(const QString &domain)
  • topLevelDomain(const QString &domain)

We add the following function:

QString getPublicDomain(const QString &domain) {
    QStringList sections = domain.toLower().split(QLatin1Char('.'), QString::SkipEmptyParts);
    if (sections.isEmpty())
        return QString();

    QString tld = "";
    for (int i = sections.count() - 1; i >= 0; --i) {
        tld.prepend(QLatin1Char('.') + sections.at(i));
        if (!isEffectiveTLD(tld.right(tld.size() - 1))) {
             return tld;
        }
    }

    return tld;
}

This allows us to obtain the public suffix plus the initial non-public (TLD) domain. For example, “http://www.slashdot.org” would be reduced to “.slashdot.org”. It is different from topLevelDomain() in that the latter would return just the public suffix, e.g. “.org” in the previous example, which is not desirable for our use.

With the domain verification taken care of, we move on to the next stage, which involves the data structure and storage method. To store cookies on disk we elect to use an SQLite database, as this is both efficient in terms of storage density, but also prevents disk fragmentation and allows for SQL-based look-ups instead of filesystem-based ones, as used to be common with older browsers. QtSQL comes with an SQLite driver. Do be sure to use the current version of Qt (4.8) as recently SQLite introduced journaling and the Qt 4.7.x libraries still use the old SQLite client.

For the in-memory data structure we use a QMultiMap. The rationale behind this is the key-based look-up based on the cookie domain. By taking the URL we’re seeking matching cookies for and obtaining its top domain (“.slashdot.org”) we can find any cookie in our data structure using this top domain as the key. This means we can search a large number of cookies in logarithmic (O(log N)) time for a match on the domain, a major improvement on the linear (O(N)) search of the default QList.

The link between the in-memory and on-disk storage is accomplished by the following rules:

  • All new cookies and cookie updates are stored in both in-memory and on-disk, except for session cookies, which are stored only in-memory.
  • Stored cookies are read into memory per-domain and on-demand.

In addition to this I have implemented a cookie manager dialogue which allows one to look through and manage (delete) stored cookies. Expired cookies are automatically deleted the first time they are fetched from the database or before they’re stored. Blocking 3rd-party cookies is also very easy, with a comparison between the top domain and the cookie’s intended domain:

QString baseDomain = getPublicDomain(url.host());
if (skip3rd && (baseDomain != getPublicDomain(cookie.domain()))) {
     continue;
}

With this we got a relatively efficient cookie storage and retrieval mechanism with the ability to manage the set cookies. It can store an unlimited number of cookies and should remain efficient with look-ups even with over 10,000 cookies thanks to the logarithmic search of the QMultiMap.

Essential features still missing in the WildFox browser at this point are bookmarks and sessions. The next article on WildFox should be about the Chrome extension support I’m currently implementing, with as direct result XMarks bookmark synchronization support as well as the bookmarks feature. Stay tuned.

Maya

[1] http://www.ietf.org/rfc/rfc2617.txt
[2] http://publicsuffix.org/

Advertisements
Categories: HTTP, programming, Projects, Qt, WildFox Tags: , , ,

Surviving The Mozilla Build System, A Brief Guide

November 26, 2011 Leave a comment

Last year I did a couple of interviews for The Register and other sites regarding my WildFox project which in essence had the goal to add h.264 video support to Firefox using the GStreamer backend or FFmpeg. Due to circumstances I didn’t manage to do significant work on this project until a few months ago when I finally began the real modifications to the Firefox source.

As this article isn’t about the quality of the Mozilla source code, or the lack thereof, I won’t dwell on it too long. Suffice it to say that I found a lot of instances of NIH (‘Not Invented Here’) syndrome including the networking, smart pointer and threading sections. As my goal was to add Libav (recent fork of FFmpeg) support to the media backend of Firefox I became intimately familiar with these APIs as I discovered just how much of the code would be ripped out without causing adverse effects, and that the smart pointers do not work with anything but NSISupports-derived types.

Anyway, the build system… at first glance the Mozilla build system seems to use Makefiles, that is until you notice the .in extension indicating that they’re autoconfig templates. Or autobreak as lovingly called by a large section of the internet. After much trial and error I discovered that after putting my new Libav decoder & reader into /content/media/libav and the Libav includes into /media/libav of the source tree, creating a single Makefile with EXPORTS and an individual Makefile for each library of Libav (libavformat, libavcodec, etc.), I still had to edit a host of files to make it all work:

/layout/build
    Makefile.in

/toolkit
toolkit-makefiles.sh
    toolkit-tiers.mk

/content/media
    Makefile.in

After following the hints in the Mozilla documentation [1] I first discovered the files in /toolkit, and things finally began to work. Until I hit a few snags, like having to add a compiler flag to CXXFLAGS but there being no way to specify this in the Makefiles which didn’t get ignored for some reason. Libav is a C99 project and requires -D__STDC_CONSTANT_MACROS to be added to CXXFLAGS [2] to make it play nice with a C++ project. In the end I put this flag directly into the CXXFLAGS definition in the top of configure.in in the root folder. Ugly but it works.

At this point everything builds, the only thing I’m still stuck on is how to add the Libav’s LIB files to the linker flags. As usual the methods I have found do not work and even adding it to configure.in didn’t seem to do the trick.

To be quite honest I’m ready to give up on improving the Mozilla source. The changes required to bring it up to speed with proper project standards are just too daunting and severe to be handled just by me. As a fun comparison I started the WildFox-Mimic project a few days ago to investigate what it’d take to create a browser which looks and feels like Firefox, but uses Qt and the QtWebKit engine. The result is a modern, up to date browser with an HTML 5 video/audio backend which uses Qt’s Phonon which wraps around the OS’s media framework, be it DirectShow, GStreamer or something else. In other words it’s perfect. QtWebKit can also use the same NPAPI plugins Firefox uses, so Flash support is available out of the box. The Persona themes and JetPack add-ons can also be supported.

The result with WildFox-Mimic would be a browser with a codebase a fraction the size of the Firefox one, with most of the development and maintenance performed by the Qt and QtWebKit maintainers. This is the direction I think I’ll be heading towards.

Maya

[1] https://developer.mozilla.org/en/Adding_Files_to_the_Build
[2] http://libav.org/faq.html#I_0027m-using-libavutil-from-within-my-C_002b_002b-application-but-the-compiler-complains-about-_0027UINT64_005fC_0027-was-not-declared-in-this-scope

Categories: Projects, Qt, WildFox