Friday, January 21, 2011

Usage share of newsreaders, update

A few months ago, I logged the usage share of various newsreaders for roughly the month of August. Since then, I have run updated tests, alternating monthly between logging by users and by messages, which gives me statistics for September through November. Later months are not available because the news server I used has since gone away (without notifying its users!), and I did not want to point this script at the new one, since comparability would be lost.

One of the uses I had for the original statistics collection was to argue why NNTP support in Thunderbird still matters. During an IRC discussion, it was pointed out that August is a poor month for logging, since there is a tradition of using that month for vacation. Pulling up the data for October, the last month for which I have these figures, shows that approximately 720,000 messages were posted that month, which suggests that August does indeed understate typical volume.

Have the statistics changed much? Google Groups and Thunderbird are both within 0.2% (absolute) of the scores I calculated last time (44.02% and 12.3%, respectively). Further down the line, things change: Outlook Express had 8.98%, followed by Forte Agent at 8.86%; Live Mail had 2.83% and MT-NewsWatcher 2.51%. The tail is also longer than before, at 20.52%.

As my new server has a longer retention time, I no longer wish to use the same script as before. My next goal is to log every header of every message posted this year, so that I can collect more information without having to enumerate everything I need in advance, particularly information useful in determining the use of mail-to-news gateways and in identifying the spamminess of messages. I have lots of ideas for possible analyses of the data, but first I want usable data.

Monday, January 17, 2011

News URI handling

I have built a version of Thunderbird that should fix news URI handling issues, obtainable from this site. This is a patch queue based off of trunk just after the 3.3a2 release branch, so it should have all of the features of 3.3a2 if not the version number.

In particular, both news and nntp links of any type should work, including news URIs without a server. If any of those links do not work, please tell me, including the circumstances in which they didn't work (e.g., where did you click it, was Thunderbird open, were you subscribed to the group or not, etc.). There is also a chance that this could regress other handling in the news code or OS command-line handling, so if you see such regressions, please tell me as well.

Thanks in advance for your testing!

Saturday, January 15, 2011

The Great Codec War

At the Second Battle of Chrome, WebM seems to have scored a surprise victory against H.264, when Google announced that it was dropping support for H.264 in <video>. Well, maybe it was only a Pyrrhic victory. Reactions differ a lot, but I think many of them miss the mark.

I've seen some people (I'm looking at you, Ars) claim that this will help kill HTML 5 video. This claim seems to me to be bogus: HTML 5 video effectively died years ago when no one could agree on a codec. Of the several video sites I use, the only one to support HTML 5 is YouTube; everyone else uses Flash. Since it's already dead, you can hardly kill it by switching codecs. This just shifts the balance more in favor of WebM. And as for H.264 working nearly everywhere, AppleInsider, your chart is just Blatant Lies.

Another thing that people do is compare video codecs to image codecs, most particularly GIF. But H.264 is not GIF: GIF became wildly popular before Unisys appeared to realize that it infringed their LZW patent. It is also unclear, looking back a decade after the fact, whether Unisys targeted only encoders or decoders as well (lzw.info implies both, but Mozilla had GIF code long before the patent expired, and I don't see any information about LZW licensing). H.264, however, clearly mentions licensing for the decoder. Furthermore, it was relatively easy to make a high-quality image codec without stepping on patents (which we now call PNG); the video codec patent thicket is dense enough to make that effectively impossible.

While on the topic, I've also seen a few statements pointing out that H.264 is an ISO standard and WebM is not. Since people love to compare H.264 and GIF, I will point out that GIF is not an ISO standard, nor an RFC, ITU, W3C, or IEEE document (the W3C does host a copy on its website, but it appears not to be reachable via internal links). The commentary about "open standards" typically means "I can implement it by reading this/these specification(s) without paying a fee to anybody," not "there exists an officially approved, freely available standard" (incidentally, ISO standards generally are NOT freely available).

"But what about Flash?" most people say, both on the issue of its H.264 support (although it will be supporting WebM as well) and on the seeming inconsistency with a commitment to openness. The answer to that is two simple words: "legacy content." Flash works for everybody but a few prissy control freaks, and so much stuff—more than just video, in fact—uses it that not supporting it is impractical. Remember, half the web's users do not have HTML 5-capable browsers.

All of that said, where things go in the future is very much an open question. I see several possible directions:

  1. The U.S. declares software patents invalid. Mr. O'Callahan can tell you one scenario that could cause this. It's actually not implausible: the Supreme Court in Bilski seemed mildly skeptical of expansive patentability claims, and a relatively clean set of software patent claims would probably allow them to make a coherent "narrow" ruling on software patents in general. And, though the U.S. is not the world, an anti-software-patent U.S. ruling would probably lead to nullification of software patents worldwide.
  2. MPEG-LA changes their minds and allows royalty-free decoding (not encoding) of H.264. This solution is fairly implausible, unless MPEG-LA desperately decides to try this gambit to stop H.264 from becoming obsolete. The circumstances which would lead them to do this would probably be on the back of a steep descent in H.264 popularity, so the actual value of this outcome would be minor.
  3. Apple caves in and allows either Flash or WebM on iOS. With alternative browsers on mobile allowing these options, that means only about 17% of the mobile market has no support for video other than H.264. Depending on the success of other OSs, this may force Apple to support one of the two alternatives to allow video to work on iOS. I don't know how plausible this is, but seeing as how Android is both newer than iOS and more popular, a long-term decline in Apple's fortunes is not unreasonable.
  4. The world continues as it does today, with no single solution supporting everybody. Not ideal, but it is the path of least resistance. Unfortunately, it's also probably the most likely.

Monday, January 10, 2011

Developing new account types, Part 4: Displaying messages

This series of blog posts discusses the creation of a new account type implemented in JavaScript. Over the course of these blogs, I use the development of my Web Forums extension to explain the necessary actions in creating new account types.

In the previous blog post, I showed how to implement the folder update. Our next step is to display the messages themselves. As of this post, I will refer less frequently to my JSExtended-based framework (Kent James's SkinkGlue is a less powerful variant; a final version will likely be a hybrid of the two technologies); it will be slowly phased out over the rest of this series of blog posts.

URLs and pseudo-URLs

As mentioned earlier, messages have several representations. Earlier, we used the message key and the message header as our representations; now we will use two more forms: message URIs and necko URLs [1]. The message URI is more or less a serialization of the folder-and-key unique identifier. It does not have any of the other properties of a "regular" URL (hence the "pseudo-URL" title); most importantly, it is not (necessarily) something that can be run with necko. To convert message URIs to necko URLs, you need to use the message service.
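
For illustration, the conversion looks roughly like this from JavaScript. This is a sketch, not code from the extension: someMsgHdr stands in for any nsIMsgDBHdr you already have, the usual Cc/Ci shorthands are assumed, and the capitalized GetUrlForUri spelling follows the interface's IDL.

// Sketch: turning a message URI (a pseudo-URL) into a runnable necko URL.
// someMsgHdr is a stand-in for any nsIMsgDBHdr you already have.
let messenger = Cc["@mozilla.org/messenger;1"]
                  .createInstance(Ci.nsIMessenger);
let msgURI = someMsgHdr.folder.getUriForMsg(someMsgHdr); // the pseudo-URL
let msgService = messenger.messageServiceFromURI(msgURI);

// XPCOM out parameters become object wrappers in JavaScript.
let neckoURL = {};
msgService.GetUrlForUri(msgURI, neckoURL, null /* no nsIMsgWindow */);
// neckoURL.value is now an nsIURI that necko can actually run.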

Because message URIs require an extra step to convert to necko URLs, most of the message service uses the message URI instead of the URL (any time you see a raw string, or a variable named messageURI or, most of the time, URI, it is this pseudo-URL that is meant). Displaying messages involves a call to the aptly named DisplayMessage. Unfortunately, it's not quite so aptly named in that it can also effectively mean "fetch the contents of this message to a stream," but I will discuss this later.

This is where the bad news starts. First off, mailnews is a bit lazy when it comes to out parameters. Technically, XPCOM requires that you pass in pointers to all out parameters to receive the values; a lot of the calls to DisplayMessage don't pass this value because they ignore it anyway. Second, one of the key calls needed in DisplayMessage turns out to be a [noscript] method on an internal Gecko object. What this means is that you can't actually implement the message service in JavaScript.

There is good news, however. Many of the methods in nsIMsgMessageService are actually variants of "fetch the contents of this message"; indeed, the standard implementations typically funnel these methods into a FetchMessage call. My solution is to reduce all of this to a single method that you have to implement, with your choice of two ways to run it. Owing to implementation design artifacts, I've done it both ways and can show you both.

Body channel

The first way is to stream the body as a channel. This is probably not the preferred method. Telling us that you did this is simple:

wfService.prototype = {
  getMessageContents: function (aMsgHdr, aMsgWindow, aCallback) {
    let task = new LoadMessageTask(aMsgHdr);
    aCallback.deliverMessageBodyAsChannel(task, "text/html");
  }
};

Seriously, that's the full code to say that you have a channel. The channel itself implements nsIChannel, but only a few of its methods are used: asyncOpen (we never synchronously open), isPending, cancel, suspend, and resume. The primary purpose of the channel is just to funnel the input stream of the body (not the message headers; those will be written based on the message header). The channel implementation is moderately simple:

function LoadMessageTask(hdr) {
  this._hdr = hdr;
  this._uri = hdr.folder.getUriForMsg(hdr);
  this._server = hdr.folder.server;
}
LoadMessageTask.prototype = {
  runTask: function (protocol) {
    this._listener.onStartRequest(this, this._channelCtxt);
    this._pipe = Cc["@mozilla.org/pipe;1"].createInstance(Ci.nsIPipe);
    this._pipe.init(false, false, 4096, 0, null);
    /* load url */
  },
  onUrlLoaded: function (document) {
    let body = /* body */;
    this._pipe.outputStream.write(body, body.length);
    this._listener.onDataAvailable(this, this._channelCtxt,
      this._pipe.inputStream, 0, this._pipe.inputStream.available());
  },
  onTaskCompleted: function (protocol) {
    this._listener.onStopRequest(this, this._channelCtxt, Cr.NS_OK);
  },
  QueryInterface: XPCOMUtils.generateQI([Ci.nsIChannel, Ci.nsIRequest]),
  asyncOpen: function (listener, context) {
    if (this._listener)
      throw Cr.NS_ERROR_ALREADY_OPENED;
    this._listener = listener;
    this._channelCtxt = context;
    // Fire off the task!
    this._server.wrappedJSObject.runTask(this);
  }
};

There are some things to note. First, this code can synchronously call back onStartRequest from runTask, which is a necko no-no. However, our magic glue channel gracefully handles this (by posting the call to asyncOpen in another event). Loading the input stream is done with a pipe here, and I'm doing a quick-and-easy implementation that does not take into account potential internationalization issues. I also haven't bothered to implement the other methods I should here, mostly because this code is primarily an artifact of an earlier approach, whose only purpose now is demonstrating channel-based loading.
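
For completeness, here is a sketch of what those missing nsIRequest members might look like. None of this is in the real code; the _done and _status slots are hypothetical additions (onTaskCompleted would also need to set _done).

// Hypothetical nsIRequest fill-ins for LoadMessageTask; not in the real code.
LoadMessageTask.prototype.isPending = function () {
  return !this._done; // assumes onTaskCompleted sets this._done = true
};
LoadMessageTask.prototype.cancel = function (aStatus) {
  this._status = aStatus;
  this._done = true;
  if (this._pipe) // the pipe only exists once runTask has started
    this._pipe.outputStream.close();
};
LoadMessageTask.prototype.suspend = function () {
  throw Cr.NS_ERROR_NOT_IMPLEMENTED;
};
LoadMessageTask.prototype.resume = function () {
  throw Cr.NS_ERROR_NOT_IMPLEMENTED;
};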

Body input streams

The second method of implementation is just to give us the message body as an input stream:

getMessageContents: function (aMsgHdr, aMsgWindow, aCallback) {
  let pipe = Cc["@mozilla.org/pipe;1"].createInstance(Ci.nsIPipe);
  pipe.init(false, false, 4096, 0, null);
  aCallback.deliverMessageBodyAsStream(pipe.inputStream, "text/html");
  aMsgHdr.folder.server.wrappedJSObject.runTask(
    new LoadMessageTask(aMsgHdr, pipe.outputStream));
}

function LoadMessageTask(hdr, outstream) {
  this._hdr = hdr;
  this._outputStream = outstream;
}
LoadMessageTask.prototype = {
  runTask: function (protocol) {
    protocol.loadUrl(/* url */, protocol._oneShot);
  },
  onUrlLoaded: function (document) {
    let body = /* body */;
    this._outputStream.write(body, body.length);
  },
  onTaskCompleted: function (protocol) {
    this._outputStream.close();
  }
};

Here, the basic approach is still the same: we open up a pipe, stuff our body in one end, and give the other end to the stream code. However, we don't need to do the other work that comes with loading the URI, which streamlines the code greatly. We can also pass the callback method an underlying request that will take care of stopping the network load, etc., for us if we so choose, but the argument is optional, as the sketch below shows.
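
For instance, the body of getMessageContents above might become the following, assuming the task object also implements nsIRequest (as the channel version did):

// Variant sketch: hand the helper an underlying request (here, the task
// itself) as the optional third argument, so it can track load state.
let task = new LoadMessageTask(aMsgHdr, pipe.outputStream);
aCallback.deliverMessageBodyAsStream(pipe.inputStream, "text/html", task);
aMsgHdr.folder.server.wrappedJSObject.runTask(task);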

More implementation

Naturally, you have to add some more contract implementations to get all of the services to work right. The following is a sample of my chrome.manifest as it stands:

component {207a7d55-ec83-4181-a8e7-c0b3128db70b} components/wfFolder.js
component {6387e3a1-72d4-464a-b6b0-8bc817d2bbbc} components/wfServer.js
component {74347a0c-6ccf-4b7a-a429-edd208288c55} components/wfService.js
contract @mozilla.org/nsMsgDatabase/msgDB-webforum {e8b6b6ca-cc12-46c7-9a2c-a0855c311e07}
contract @mozilla.org/rdf/resource-factory;1?name=webforum {207a7d55-ec83-4181-a8e7-c0b3128db70b}
contract @mozilla.org/messenger/server;1?type=webforum {6387e3a1-72d4-464a-b6b0-8bc817d2bbbc}
contract @mozilla.org/messenger/protocol/info;1?type=webforum {74347a0c-6ccf-4b7a-a429-edd208288c55}
contract @mozilla.org/messenger/backend;1?type=webforum {74347a0c-6ccf-4b7a-a429-edd208288c55}
contract @mozilla.org/messenger/messageservice;1?type=webforum-message {7e3d2918-d073-4c98-9ec7-f419a05c29de}

The first and last CIDs, as you'll notice, were not implemented by me (well, kind of). The first is the CID of nsMsgDatabase that I've exposed in one of my comm-central patches; the latter is the CID of my extension's message service implementation. Also of importance is that I included a second contract ID for my service implementation; this is for my new interface msgIAccountBackend, which is the source of the getMessageContents method I implemented earlier, and which you also need to implement to get this to work.

Finally, you need to generate the message URI properly. Fortunately, this just requires you to implement one method:

wfFolder.prototype = {
  get baseMessageURI() {
    if (!this._inner["#mBaseMessageURI"])
      this._inner["#mBaseMessageURI"] = "webforum-message" +
        this._inner["#mURI"].substring("webforum".length);
    return this._inner["#mBaseMessageURI"];
  }
};
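
To see what this produces, here is a quick illustration with hypothetical names. As I understand it, the core folder code builds individual message URIs by appending "#" plus the message key to baseMessageURI:

// Illustration only; "news.example.com" and the key 42 are made up.
let folderURI = "webforum://news.example.com/board";        // #mURI
let base = "webforum-message" +
           folderURI.substring("webforum".length);          // baseMessageURI
let messageURI = base + "#" + 42;
// => "webforum-message://news.example.com/board#42"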

Under the hood

For those who wish to know more about what is actually going on, I am going to describe the full loading process, from the moment you click on the header to the time you see the output.

Clicking on the header (after some Gecko code that I'll elide) leads you to nsMsgDBView::SelectionChanged. This code is kicked back to the front-end via nsIMsgWindow::commandUpdater's summarizeSelection method. For Thunderbird, this is the method that handles clearing some updates and also decides whether or not to show the message summary (which is "yes if there is a collapsed thread, this pref is set, and this is not a news folder" [2]). Summarization is a topic I'll handle later.

In the case of a regular message, the result of the loading is to display the message. The message URI is constructed, and then passed to nsMessenger::OpenURL, which calls either nsIMsgMessageService::DisplayMessage or nsIWebNavigation::LoadURI, depending on whether or not it can find the message service. The message service converts its URI to the necko URL and then passes that—since it's passed in with the docshell as a consumer—to LoadURI with slightly different flags. And thus begins the real message loading.

Loading URLs through the docshell is somewhat complicated, but it boils down to creating the channel and opening it via AsyncOpen. When the channel is opened (OnStartRequest is called), it tries to find someone who can display it, based on the content type. It turns out that there is a display handler in the core mailnews code that can display message/rfc822 messages, which it does by converting the text into text/html (via libmime) and using the standard HTML display widget. I'm going to largely treat libmime as a black box; it processes text as OnDataAvailable is called and spits out HTML via a mixture of OnDataAvailable and callbacks to the header sink of the channel's URL (or of that URL's message window).

The special extension message service implementation goes a few steps further. By managing the display and channel code itself, it allows new implementors to not worry so much about some of the particular requirements during the loading process. Its AsyncOpen method is guaranteed to not run OnStartRequest synchronously, and also properly manages the load groups and content type manipulation. Furthermore, the channel manually synthesizes the full RFC 822 envelope (the code inspired by some compose code), and ensures that the nsIStreamListener methods are called with the proper request parameter (the original loaded channel must be the request passed).

Alternative implementation

It is still possible to do this without using the helper implementation. The first thing to do is to implement the network handler, for which you'll definitely need a protocol implementation, and probably a channel and URL as well. A URL that does not implement nsIMsgMailNewsUrl and nsIMsgMessageUrl is likely to run into problems with some parts of the code. You can possibly get by without a message service for now, but I suspect it is necessary for some other portions of the code. To get the message header display right, you need a message/rfc822 content type (which gets changed to text/html, so it has to be settable!).

A possible alternate implementation would be to send a straight text/html channel for the body and then manually call the methods on the header sink, i.e., bypass libmime altogether. A word of caution about this approach is that libmime can output different things based on the query parameters in the URL, and I don't know which of those outputs are used or not.

Next steps

Now that we have message display working, we pretty much have a working implementation of the process of getting new messages and displaying them. There are several ways I can go from here, but for now, I'll make part 5 deal with the account manager. Other parts that I am planning to do soon include dealing with subscription, filters, and other such code.

Notes

  1. If you are not aware, "necko" refers to the networking portion of the Mozilla codebase. The terms "URI" and "URL" also have standard meanings, but for the purposes of this guide, they mean different things. I will try to keep them distinct, but I have a tendency to naturally prefer "URI" most of the time, so I may slip up.
  2. Unfortunately, a lot of the front-end code has taken it upon itself to hardcode checks for certain implementations to enable/disable features. Hopefully, as Kent James and I progress on this work, these barriers can be reduced.

Friday, January 7, 2011

Random code coverage statistics

Test coverage happened to come up in an IRC channel I frequent today, so I thought up some probably useless statistics on code coverage, and applied them to Thunderbird.

Since I lost the original lcov files, I reculled them from the output HTML data. Of the 112,024 lines instrumented, only 65,012 were actually run (which matches the summary output, so I'm good). It turns out that, in total, there were a whopping 168,563,629 line executions in the test run, or an average of 1,504.89 hits per line. Counting only the lines that were hit, the average is 2,592.81 executions per line.
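
For the curious, here is roughly how such averages could be recomputed. This is a reconstruction, not the script I actually used (which scraped the HTML): it assumes a standard lcov tracefile with DA:<line>,<hits> records and runs under Node.js.

// Sketch: recompute per-line hit averages from an lcov .info tracefile.
// Not the original (lost) script; assumes Node.js and a file "tb.info".
var fs = require("fs");

var total = 0, hit = 0, executions = 0;
fs.readFileSync("tb.info", "utf8").split("\n").forEach(function (line) {
  if (line.indexOf("DA:") == 0) {          // DA:<line number>,<hit count>
    var hits = parseInt(line.substring(3).split(",")[1], 10);
    total++;
    executions += hits;
    if (hits > 0)
      hit++;
  }
});

console.log("Lines instrumented: " + total + "; executed: " + hit);
console.log("Hits per line: " + (executions / total).toFixed(2));
console.log("Hits per executed line: " + (executions / hit).toFixed(2));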

Given the general paucity of tests in comm-central, why is this number so high? Well, it turns out that some functions in libmime run a lot. The most-executed line was mimebuf.cpp's line 215, which ran no fewer than 3,223,519 times, followed closely by line 224 at a still-impressive 3,222,934. The most-executed lines outside of libmime (which swept the top 5) were nsMsgLineBuffer.cpp's lines 140-150, at 1,828,695. I think we can safely say that those lines are well covered.

The numbers for functions seem similarly skewed: 10,951 functions, 6,443 of them actually run. Between these, there were a total of 26,588,690 function calls, or 2,427.96 or 4,126.76 calls per function (depending on which count you use). The highest-flying functions are both in libmime (comi18n.cpp, to be specific): NextChar_UTF8 and utf8_nextchar, with 1,782,545 calls each. Outside of libmime (which again sweeps the top 5) is nsMsgFolderCache::GetEnv, with a total of 450,400 function calls. I think we can safely say that said function is bug-free.

Wednesday, January 5, 2011

Predicted 2011 Mozilla work

Another year, another time to predict what work I want to get to this year. And, of course, another chance to fail to do that work.

News submodule

This year, I am going to give myself a goal of bringing the total number of open bugs in the MailNews Core: Networking: NNTP component to below 100, or, at the very least, below 104 to make it the least buggy of the mailnews core protocol implementations. I've laid out for myself a map of all bugs in the component, so I know what needs to be worked on. My current work on news URIs by itself should get me almost there: I have patches awaiting review that fix bugs 37465, 108297, 226890, 403242, 498321, and 617287, as well as patches in my queue that fix bugs 108970, 110841, and 224335, with patches for bugs 80972, 108107, 108877, 133793, 167991, 327885, 411568, and 530193 likely to come. In other words, that's supporting no-authority news URLs.

Outside of that, I can easily pick up a few small bugs along the way to get that number down. What's likely not going to be fixed by me is the venerable bug 43278, or any expired-article issues; in other words, any set of bugs that would require as much effort as the no-authority bug. I'm not quite leaving out the authentication-related bugs, but those are on the unlikelier side of the "maybe" pile.

Code coverage

My attempts to get decent JavaScript code coverage appear to have been thwarted once again (though I'm not giving up hope). If a few tweaks don't get my current approach working, I'll probably return to a simpler instrumentation-based approach (blech). I would still like to see at least the C++ code coverage analysis run on a more regular basis (at least weekly), so we can get a good timeline of code coverage through the ages. Building year-old versions of Mozilla in retrospect is not fun, especially given the annoyance of mozmill versioning.

Unfinished things

Once I get no-authority news URIs into the tree, I would like to return to new account types. Actually, the work I did has proven very helpful in removing the next roadblock (since I gained intimate knowledge of how URIs are actually run behind the scenes, as well as how necko works with them). On the downside, my attempts to get JS to extend C++ classes appear to be getting less stable the more I work with them, so I'll probably abandon that approach and instead turn to another: writing an extension layer that makes it simpler to implement all of the functionality without having to get down and dirty. I feel justified in saying that the less you look like an email server, the happier you'll be not having to deal with the messy glue implementations.

Now, that said, I may still continue working in the vein of the blog posts, since it is still a nice documentation of how things go on just below the hood. In any case, the two things I most want to hide in the implementation are the database and the URL, half of which has already been discussed.

Some people have asked that I continue my guides to pork. However, I have become more persuaded that the current MIME implementation needs to be tossed away and rewritten from scratch, which reduces my primary motivation for learning it. Furthermore, pork (at least as built on elsa) is pretty much considered abandoned, although there are intentions to rebuild the tool on clang, now that it has a decent C++ parser.

I have been told by Mark Banner that he intends to get a new roadmap for the address book up in the near future. To the extent that it does not conflict with other goals I have, I will probably do some implementation under that. I may also decide to attempt again to work on address book integration with Linux desktops, given some experience I've had over the past year.