20 Apr 2009, 2:32 p.m.

A First Look at the New WURFL API for PHP

About a month ago, the New WURFL API for PHP was officially released. While the code had been available in one form or another for some time, the official release coincided nicely with the early stages of a new project at work, so it seemed like an appropriate time to have a look at the API and see if it was something we wanted to use.

By way of a refresher, WURFL is a "Device Description Repository" - a huge open-source XML-based database of information regarding mobile handsets and their capabilities. I've discussed WURFL in the past, for example here. Prior to this release, the only practical method of querying WURFL in real time from PHP was via a library named Tera-WURFL, which I blogged about here. In fact, both WURFL and Tera-WURFL were covered in an article I wrote for php|architect magazine last year.

We've generally been very happy with Tera-WURFL, but it's always worth considering one's options, so what follows is an overview of my experiences with, and first impressions of the New WURFL API.

Installation

Installing the New WURFL API was fairly straightforward, but sadly there is a dependency on PEAR::Log. This is a disappointing decision which brings very little gain. Logging isn't a hard thing to do, and using PEAR::Log in turn creates a dependency on the PEAR installer. In practice this means maintaining a PEAR installation on each of your Web servers, which may well not suit everyone.
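For application code that would rather degrade than die when an optional library such as PEAR::Log is missing, a check along the following lines is one option. This is a minimal sketch and not part of the WURFL API: the function name loadOptionalLibrary() is my own invention, and the only things it assumes about PEAR::Log are that its entry file is Log.php and its main class is Log.

```php
<?php

/**
 * Try to load an optional library from the include path.
 * Hypothetical helper, not part of the WURFL API or of PEAR.
 * Returns true only if the file was found and defines the expected class.
 */
function loadOptionalLibrary($file, $class)
{
    if (stream_resolve_include_path($file) === false) {
        return false; // not installed: let the caller degrade gracefully
    }

    require_once $file;

    return class_exists($class, false);
}

// PEAR::Log ships its main class, Log, in Log.php.
$hasLogging = loadOptionalLibrary('Log.php', 'Log');
```

The caller can then skip logging altogether when $hasLogging is false, rather than failing on a bare require_once.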

Otherwise the installation is fairly simple, little more than a case of downloading the archive from the project's SourceForge pages and unzipping the library somewhere accessible on a Web server.

There's no requirement to set up a MySQL database, as there is with Tera-WURFL, since the WURFL API builds a large cache file on disk on its first run, rather than in a database. That process is fairly slow, but it need only happen once.

Using the New WURFL API from PHP

There isn't a great deal of difference between using Tera-WURFL and the WURFL API in your actual PHP code. Initial configuration aside, here's how to connect to the new WURFL API:


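<?php

// Path to the API's XML configuration file
$wurflconfig  = '/path/to/wurfl-config.xml';
$wurflmanager = WURFL_WURFLManagerProvider::getWURFLManager($wurflconfig);
// Pass in all the request headers, not just the user-agent string
$device       = $wurflmanager->getDeviceForHttpRequest($_SERVER);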

And now a connection to Tera-WURFL:


<?php

require_once '/path/to/tera_wurfl/tera_wurfl.php';
$device = new Tera_Wurfl();
// Look the device up by its user-agent string alone
$device->getDeviceCapabilitiesFromAgent($_SERVER['HTTP_USER_AGENT']);

In either case, $device is now a large array containing a vast amount of information about the handset. Since the data for both libraries is ultimately sourced from WURFL, there isn't really anything to choose between them. The only noticeable difference is that Tera-WURFL's data is arranged into a multi-dimensional array which closely mirrors the structure of WURFL's capabilities, while the New WURFL API places all of its information into a single one-dimensional array. If pressed, I'd probably prefer the former.
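To make the difference in shape concrete, here is a small sketch using hand-written arrays. The capability and group names are real WURFL ones, but the arrays and their values are made up for illustration and are not actual output from either library.

```php
<?php

// Hypothetical, hand-written data: NOT real output from either library.

// Nested shape, grouped the way WURFL groups its capabilities
// (the shape described above for Tera-WURFL).
$nested = array(
    'product_info' => array('brand_name' => 'LG', 'model_name' => 'KC910'),
    'display'      => array('resolution_width' => 240),
);

// Flat shape (the shape described above for the New WURFL API).
$flat = array(
    'brand_name'       => 'LG',
    'model_name'       => 'KC910',
    'resolution_width' => 240,
);

// Nested lookup: the caller must know which group a capability lives in.
$widthFromNested = $nested['display']['resolution_width'];

// Flat lookup: the capability name alone is enough.
$widthFromFlat = $flat['resolution_width'];
```

The flat form is slightly less typing, but the nested form keeps related capabilities together, which is why I lean towards it.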

In that Tera-WURFL example code, the library requires us to pass $_SERVER['HTTP_USER_AGENT'] to its getDeviceCapabilitiesFromAgent() method. This is, of course, the HTTP User-Agent header, on the basis of which Tera-WURFL queries the WURFL database for device details. You may have spotted that in the New WURFL API example, we pass in $_SERVER, which contains all the HTTP request headers. This hints at an interesting feature which the New WURFL API promises.

Hidden User Agents

In the mobile world we have to put up with a lot of nonsense that those "normal" Web guys are blissfully unaware of. We absolutely need to know what handset a visitor is using, both to tailor pages to devices and because we cannot sell mobile content to a user unless we know that the handset in question supports that content.

This means that it's crucial that we receive the user-agent string. Sadly, some operators employ "transcoding" software on their Web gateway servers which strips out the HTTP User-Agent request header and replaces it with some generic string. We can't do much about that other than raise awareness (see "Vodafone UK is abusing its position").

However, some slightly better-behaved proxies and transcoders simply "move" the real User-Agent header to a supplementary header. For example Opera Mini identifies itself as, well, Opera Mini, but provides the real user-agent string in the X-OperaMini-Phone-UA header. There are plenty of other alternative headers in use in the wild, and hunting for the real user-agent string is fairly painful. Happily WURFL itself knows, in many cases, where to look for the real user-agent string. Thus, a claim made for the New WURFL API is as follows:

Among other things, the new API will go out of its way to recognise Novarra and other transcoders (including a fair share of funny clients which advertise themselves as fully-fledged browsers), and make sure that the device is recognised for what it is (not for what the transcoder would like you to believe it is)

Given the fuss involved in doing it manually, this was a hugely appealing feature for us. Sadly, it is clear from the source code that this feature is a no-show, at least for the PHP API. If $_SERVER['HTTP_USER_AGENT'] is set to anything, then that is what is deemed to be the user-agent string, regardless of what else is there. Only if that is not set does the API fall through to a few edge cases, none of which are Opera Mini, Novarra or, in fact, any of the transcoders that have been causing us problems. I can only assume that this is a feature of the Java API which has not made it into PHP yet.
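For comparison, the manual approach looks something like the sketch below. It is a rough illustration, not WURFL's own logic: the function name findRealUserAgent() is hypothetical, and apart from X-OperaMini-Phone-UA the header list is my assumption about what is worth checking, so it would need extending for real traffic.

```php
<?php

/**
 * Return the most trustworthy user-agent string in a $_SERVER-style array,
 * or null if there is none. Hypothetical helper; the header list is an
 * illustrative assumption, not an exhaustive or authoritative one.
 */
function findRealUserAgent(array $server)
{
    // Supplementary headers are checked before the ordinary one.
    $candidates = array(
        'HTTP_X_OPERAMINI_PHONE_UA', // X-OperaMini-Phone-UA, sent by Opera Mini
        'HTTP_X_DEVICE_USER_AGENT',  // X-Device-User-Agent, assumed for some transcoders
        'HTTP_USER_AGENT',           // the ordinary header, used as a last resort
    );

    foreach ($candidates as $key) {
        if (isset($server[$key]) && trim($server[$key]) !== '') {
            return trim($server[$key]);
        }
    }

    return null;
}
```

Calling findRealUserAgent($_SERVER) and handing the result to whichever library is in use means the supplementary header wins whenever it is present.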

In fact, my own experience of testing the API with the Opera Mini on my new LG KC910 was that the library threw an uncaught exception resulting in a fatal error. This is either a very poor way to handle unknown devices or a major bug. Tera-WURFL at least falls back gracefully to a generic device.
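Until that is fixed, calling code can at least protect itself. The following is a minimal sketch of a wrapper that degrades to a generic profile when a lookup throws. The function name and the idea of passing the lookup in as a callable are mine, not part of either library.

```php
<?php

/**
 * Run a device lookup and fall back to a generic profile if it throws.
 * Hypothetical helper, not part of the WURFL API or Tera-WURFL.
 *
 * $lookup  is any callable taking a $_SERVER-style array.
 * $generic is whatever the caller wants to use for unknown devices.
 */
function lookupDeviceOrGeneric($lookup, array $server, array $generic)
{
    try {
        return call_user_func($lookup, $server);
    } catch (Exception $e) {
        // Record the failure, then degrade rather than die with a fatal error.
        error_log('Device lookup failed: ' . $e->getMessage());
        return $generic;
    }
}
```

With the New WURFL API, the callable would be array($wurflmanager, 'getDeviceForHttpRequest'), so that an exception from the library no longer takes the whole page down.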

Swap-in-ability

A further question we wanted to answer was how easy it would be to migrate existing code (including libraries such as Wall4PHP) to the new API. It looks as though this would not be easy, as the New WURFL API data object is structured quite differently to a Tera-WURFL object. In fact, the "official" WURFL API's data structure differs from that of WURFL itself more than the "unofficial" one does.
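One way to bridge the gap would be a thin adapter that regroups a flat capability array into a nested one. The sketch below is hypothetical and untested against either library: it assumes the caller can supply a map from capability name to group name, which would have to be derived from WURFL itself.

```php
<?php

/**
 * Regroup a flat capability array into a nested, grouped one.
 * Hypothetical adapter, not part of either library.
 *
 * $groupMap maps capability name => group name. Capabilities missing
 * from the map are collected under 'ungrouped'.
 */
function regroupCapabilities(array $flat, array $groupMap)
{
    $nested = array();

    foreach ($flat as $name => $value) {
        $group = isset($groupMap[$name]) ? $groupMap[$name] : 'ungrouped';
        $nested[$group][$name] = $value;
    }

    return $nested;
}
```

Even with something like this in place, any code that relies on the rest of the Tera-WURFL object would still need attention, so it reduces the migration work rather than removing it.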

Interestingly, there has been a bit of noise on the mailing list about porting Wall4PHP to the new API, but not much action yet.

Updating Device Data

The data in WURFL changes on roughly a monthly basis as new handsets are released and more data regarding older ones is discovered, so it's important to take an update fairly regularly. I've found no documentation at all on how one would import the latest device data to the New WURFL API, and remove cached data derived from the previous version, other than by doing it manually. There appears to be no update script like the one Tera-WURFL provides. That's not the end of the world, but it would simply be yet another problem to solve ourselves.
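Doing it manually might look something like the sketch below. It is built on an assumption I have not verified, namely that the API rebuilds its cache from the XML file whenever the cache directory is empty, and all of the paths are supplied by the caller. The function name is hypothetical.

```php
<?php

/**
 * Install a new WURFL XML file and empty the cache directory.
 * Hypothetical helper; assumes (unverified) that an empty cache
 * directory causes the cache to be rebuilt on the next run.
 * Returns the number of cache files removed.
 */
function refreshDeviceData($newXml, $liveXml, $cacheDir)
{
    if (!is_readable($newXml)) {
        throw new RuntimeException("Cannot read $newXml");
    }
    if (!copy($newXml, $liveXml)) {
        throw new RuntimeException("Cannot write $liveXml");
    }

    // Collect the paths first, so nothing is deleted mid-iteration.
    $paths = array();
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($cacheDir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->isFile()) {
            $paths[] = $file->getPathname();
        }
    }

    $removed = 0;
    foreach ($paths as $path) {
        if (unlink($path)) {
            $removed++;
        }
    }

    return $removed;
}
```

Subdirectories are left in place and only the files inside them are removed. Since the rebuild is slow, this is something to run out of hours rather than on a live request.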

Performance

I didn't look into performance, partly because I think it's easy to make a decision based on the above points, and partly because it would be hard to get a fair figure for the New API without a lot of work.

This is because the performance of the new API will be highly dependent on which caching method is employed. The default is file-based, which would be slow, but you can swap in memory-based caching, which would be fast but temporary (and potentially memory-intensive). Tera-WURFL is extremely fast, and its performance has never been an issue.

Conclusions

As it happens, we decided not to migrate away from Tera-WURFL for the time being, as there simply does not seem to be any reason to choose the New WURFL API over Tera-WURFL, even for brand new "greenfield" projects where no migration would be required. We'll keep an eye on the New WURFL API all the same: it's a new project and under active development, so we'll probably look at it again in a year or so's time.

Posted by Simon at 01:53:00 PM
21 Apr 2009, 10:54 a.m.

Ciaran McNulty

An interesting article! I guess the move to the API expecting all of $_SERVER leaves the door open to the improvements you're after.

I agree that it seems strange that they'd rely on users installing PEAR::Log separately. A better solution might be one of the following:

a) Just include the relevant components in the distribution - PEAR::Log is opensourced under the MIT licence.

b) Provide a PEAR channel for installing the package, I think you could then include the dependency on PEAR::Log and let the package manager deal with it.

c) Let the application detect if PEAR::Log is installed, and fail gracefully if not.

Any of which would be preferable to the current solution.

23 Jul 2009, 9:52 a.m.

redwan

Nice article. I think I'll stick to Tera WURFL. However, last week when I updated Tera WURFL using the wurfl.xml I downloaded from SourceForge, I discovered that my Mozilla Firefox browser was being detected as a wireless device and an IE 6 user-agent was detected as a BlackBerry handset. I was shocked, and I quickly reversed it by installing the older version of Tera WURFL I was using earlier... any insights into this?

28 Jul 2009, 1:55 p.m.

Simon [ADMIN]

That seems to be quite common behaviour, redwan. WURFL by itself is geared up specifically to identify mobile browsers, so it'll err on that side.

There is, however, a web browsers patch available from the WURFL site. I believe it comes bundled with Tera-WURFL, but the included data may well be outdated by now, and the patch probably isn't enabled by default anyway.

27 Oct 2009, 9 p.m.

Steve Kamerman

Hey guys, you should check out the new Tera-WURFL 2.0 - I released RC4 last night and it's very good! It includes the same logic as the new Java WURFL API (maybe the same as the PHP API as well) but it is almost completely backwards compatible with the previous versions of Tera-WURFL. Check it out on www.Tera-WURFL.com

5 Nov 2009, 7:07 p.m.

Simon [ADMIN]

Thanks for dropping by, Steve. That's pretty exciting news, and I'll be certain to check out Tera-WURFL 2 as soon as I get a free moment.