Archive

Archive for February, 2012

Implementing A Cookiejar for QtWebKit; QNetworkCookieJar Analysis

February 24, 2012 2 comments

As some of you may know already, I am working on the WildFox browser project which while it initially was going to fork the Mozilla code is now building a browser on top of QtWebKit. See the previous blog post for details on this decision. The WildFox project page and source can be found at www.mayaposch.com/wildfox.php.

One of the features I recently implemented was an advanced cookiejar for storing HTTP cookies. Why was this necessary, you may ask? QtWebKit does implement a cookiejar in QNetworkCookieJar, but even aside from the inability to save any of the cookies to disk, a quick look at its source code shows the following issues: a limit of 50 cookies, which is less than the 300 required by the current standard for HTTP cookies (RFC 2617) [1]. It also uses a basic QList to store the cookies, which requires it to search in linear time through every cookie to find ones for a specific URL and duplicate cookies when storing them.

In other words, the default implementation is unsuitable for any web browser. One thing it does do right, however, is the way it verifies domains. Due to the design of internet Top Level Domains (TLDs) it is impossible to algorithmically determine whether an internet URL is valid, or specifies a proper TLD.

The need to verify the domain is made clear when one imagines someone setting a cookie for the domain .com, which would then be a cookie valid for every website ending with the TLD .com. Obviously this can’t be allowed and the obvious approach would be to disallow single dot domains (.com, .org, .net). This doesn’t work for domains like .co.uk, however. Disallowing two dot domains would cause issues with the former, single dot type. Further there are more variations on this, such as URLs in the US where the public suffix can entail city.state.us style domains. Clearly the only way to do this verification is to use a look-up table. This can be found in Mozilla’s public suffix list [2].

What we need for a better QtWebKit cookiejar thus entails the following:

  • the ability to store cookies to disk.
  • storing at least 300 cookies.
  • quick look-ups of cookies based on their domain.

For this we recycle the existing functionality in Qt required to do the public suffix verification. The relevant files in Qt 4.8.0 are:

  • src/corelib/io/qtldurl.cpp
  • src/qurltlds_p.h

The former contains some basic routines to obtain the public suffix which we will expand upon and the latter contains the Public Suffix list processed in a more accessible format. The latter we’ll use almost as-is, with just the Qt namespace sections removed. The former has a major omission we’ll add. The functions we’ll keep from qtldurl.cpp are in renamed form:

  • containsTLDEntry(const QString &entry)
  • isEffectiveTLD(const QString &domain)
  • topLevelDomain(const QString &domain)

We add the following function:

QString getPublicDomain(const QString &domain) {
    QStringList sections = domain.toLower().split(QLatin1Char('.'), QString::SkipEmptyParts);
    if (sections.isEmpty())
        return QString();

    QString tld = "";
    for (int i = sections.count() - 1; i >= 0; --i) {
        tld.prepend(QLatin1Char('.') + sections.at(i));
        if (!isEffectiveTLD(tld.right(tld.size() - 1))) {
             return tld;
        }
    }

    return tld;
}

This allows us to obtain the public suffix plus the initial non-public (TLD) domain. For example, “http://www.slashdot.org” would be reduced to “.slashdot.org”. It is different from topLevelDomain() in that the latter would return just the public suffix, e.g. “.org” in the previous example, which is not desirable for our use.

With the domain verification taken care of, we move on to the next stage, which involves the data structure and storage method. To store cookies on disk we elect to use an SQLite database, as this is both efficient in terms of storage density, but also prevents disk fragmentation and allows for SQL-based look-ups instead of filesystem-based ones, as used to be common with older browsers. QtSQL comes with an SQLite driver. Do be sure to use the current version of Qt (4.8) as recently SQLite introduced journaling and the Qt 4.7.x libraries still use the old SQLite client.

For the in-memory data structure we use a QMultiMap. The rationale behind this is the key-based look-up based on the cookie domain. By taking the URL we’re seeking matching cookies for and obtaining its top domain (“.slashdot.org”) we can find any cookie in our data structure using this top domain as the key. This means we can search a large number of cookies in logarithmic (O(log N)) time for a match on the domain, a major improvement on the linear (O(N)) search of the default QList.

The link between the in-memory and on-disk storage is accomplished by the following rules:

  • All new cookies and cookie updates are stored in both in-memory and on-disk, except for session cookies, which are stored only in-memory.
  • Stored cookies are read into memory per-domain and on-demand.

In addition to this I have implemented a cookie manager dialogue which allows one to look through and manage (delete) stored cookies. Expired cookies are automatically deleted the first time they are fetched from the database or before they’re stored. Blocking 3rd-party cookies is also very easy, with a comparison between the top domain and the cookie’s intended domain:

QString baseDomain = getPublicDomain(url.host());
if (skip3rd && (baseDomain != getPublicDomain(cookie.domain()))) {
     continue;
}

With this we got a relatively efficient cookie storage and retrieval mechanism with the ability to manage the set cookies. It can store an unlimited number of cookies and should remain efficient with look-ups even with over 10,000 cookies thanks to the logarithmic search of the QMultiMap.

Essential features still missing in the WildFox browser at this point are bookmarks and sessions. The next article on WildFox should be about the Chrome extension support I’m currently implementing, with as direct result XMarks bookmark synchronization support as well as the bookmarks feature. Stay tuned.

Maya

[1] http://www.ietf.org/rfc/rfc2617.txt
[2] http://publicsuffix.org/

Categories: HTTP, programming, Projects, Qt, WildFox Tags: , , ,