Monday, December 8, 2008

A new address book public repo

Veteran readers may recall my old posting of the address book rewrite repository. My original idea did not turn out so well: I had about 5 heads at one point because of review comments and simultaneous patches. Not to mention that the history (though fun to look at) was a pain to follow.

I therefore created a brand new version of the repo. In the new one, I'm structuring things a bit differently. Rather than trying to keep one branch going, I'm going to have multiple named branches.

The first one is default, the comm-central import. The tip of the default branch is the tip of comm-central as of when I imported it.

One set of branches is the experimental let's-add-a-new-address-book-type branches. I already have a prototype SQL AB development branch, sqlab. When I start experimenting with Evolution support, there will be an evolutionab branch. And so on and so forth. If you have custom AB types that you want to publish, email me a bundle and you'll get your very own named branch in this repo (for as long as it exists).

Another set of branches is for the bugs I'll be working on. These branches will tangle around each other as I fix one bug or another, since some bugs depend on other bugs. I'll try to keep a last-idl tag up to date such that any changes on a branch after the last branch point or this tag (whichever comes later) do not change any IDL files.
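The guarantee the last-idl tag is meant to give can be stated as a small check. This is a hypothetical sketch, not a script that exists in the repo: it assumes history is available as (revision number, touched files) pairs, and that "later" simply means the higher revision number.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical model of history: (revision number, paths touched by it).
Changeset = Tuple[int, Sequence[str]]


def idl_stable_since(changesets: Iterable[Changeset],
                     branch_point: int, last_idl: int) -> bool:
    """True if no changeset after the later of the branch point and the
    last-idl tag touches an .idl file."""
    start = max(branch_point, last_idl)  # whichever comes later
    return not any(
        path.endswith(".idl")
        for rev, files in changesets
        if rev > start
        for path in files
    )
```

In practice the file lists would come from Mercurial's history; the point is only that consumers can trust interfaces to be frozen past that marker.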

Friday, December 5, 2008

Debugging palmsync

Also known as "An Exercise in Frustration and Self-Torture." As part of preliminary work for mailing list sanity, I need to make some changes to palmsync. It's not like I haven't changed this code before, but the work I need to do here is more than a simple s/a/b/g command. So, time to fire up the ol' Windows partition and test.

Pain point number one: getting the necessary stuff to work. First you need the Palm CDK. After getting that (complete with throw-away email account), you need something to test the other end of palmsync: the PDA. Back at home, I have a monstrously old PDA whose functioning I would not guarantee. Fortunately, there's a nice Palm emulator which I can install.

Now I just needed to Hotsync. Another non-trivial task, as my computer does not have two serial ports between which I could slap a cable (it doesn't even have one serial port…). After figuring that part out comes another problem: palmsync failed to register. "No problem," I say, "I'll just reinstall it." And it still fails. Turns out that the necessary registry key used %ProgramFiles%. Don't you just love the registry?

That's all set up, so I click the sync button and… it fails. That wouldn't seem to be a problem, as I've already had to diagnose many errors to get to this point. Except I seem to have hit the critical unsupported limit:

  • I'm using a custom-built version of a program (that's really not saying much, though)
  • Said program is built with a technically unsupported compiler (VC 7.1)…
  • …which is unsupported on my OS (Windows Vista)…
  • …using an old SDK (Windows 2000)…
  • …and it's a 32-bit build on a 64-bit machine.
  • The emulator officially supports neither Vista…
  • …nor a 64-bit machine.
  • Hotsync doesn't officially support the 64-bit machine either.

Debugging this proved to be a pain. The conduit logs didn't seem to function. Trying to get the VS debugger's nose in there proved a futile task until I did a world rebuild. Which takes 5.5 hours on my system (blame ext3). I finally got the debugger in, only to find a barrier thanks to COM. Some more work managed to shove the debugger in there and finally allowed me to pinpoint the error. Sorry, errors. Suffice to say that it's a mess in there.

So how did I get palmsync under the debugger? First off, let me describe my VS setup: I have mailnews as a project in VS (an NMake project, to be precise), set up such that clicking the build button actually builds in the mailnews directory (thanks to some custom hacking around msys). Based on some external documentation for debugging Palm conduits, I created a new configuration (called "Palmsync," imaginative little me) whose command ran the hotsync program with -ic as its sole argument. Running that under the debugger the normal way in VS allowed me to break on Palm's side of the conduit. On the Mozilla side, debugging is done by firing up an instance of TB under the debugger, initiating the hotsync, and breaking and debugging as normal.

Modulo the fact that palmsync is in reality extremely fragile and apparently too buggy to support my nearly-empty testing profile, it works now. Time to fix some more old bugs solely to be able to get to the point where I can do the preliminary work for mailing list sanity.

Friday, November 28, 2008

ABC (meme)

Everyone's doing this, it seems.

A
ActiveSync
B
Bug 71728
C
CSS 2.1 spec
D
Pork (on MDC)
E
Hiragana (Wikipedia)
F
FailBlog
G
comp.lang.java.programmer
H
Hunter × Hunter (manga)
I
I can has cheezburger?
J
Java Language Specification, Third Edition
K
OpenJDK
L
I can has cheezburger?
M
[ School resource ]
N
[ School resource ]
O
Bleach (manga)
P
Planet Mozilla
Q
Questionable Content (webcomic)
R
[ Anime video site ]
S
/. (Slashdot)
T
[ School resource ]
U
BOFH (web... column)
V
VGCats (webcomic)
W
Worse Than Failure
X
XKCD (webcomic)
Y
YouTube (specifically, part of a walkthrough for Golden Sun)
Z
Zero Punctuation (web video columnist)

A public service announcement

If you are a programmer who is writing code that will be released to the world as open source, this announcement is for you.

If your code will be seen by the world at large, one of your first tasks should be to write documentation. Document all functions as soon as you write them (before is also helpful). Provide samples showing how to use your code as soon as you finish a module (or earlier, if possible). Do not wait until your 5.0 release. Do not wait until your 1.0 release. Do not even wait until your 0.5 release. Do it as you write your code. The sooner, the better.

Users of your code will thank you profusely if you manage to provide comprehensive documentation along with the actual binaries of reference builds. DO NOT make them have to scour source code or bug developers in IRC channels to figure out a simple task like "get all cards from an address book" or "how do I write a synchronization conduit?"

Wednesday, November 5, 2008

Why I bemoan Tuesday

On Tuesday, November 4, the United States held its presidential and congressional elections. As I am sure most of my readers know by now, the outcome was to elect the Democratic nominee, Obama, as well as to expand the Democratic majorities in the House and Senate (although probably not to a filibuster-proof level). To many, this news was greeted with elation; I do not count myself as one of those people. Let me explain why.

First and foremost, I dislike the results because the United States missed out on an opportune time to make up for its polarization. A Republican presidential victory would have left the government divided, a prospect I feel would be ideal. In lieu of a true multiparty system, we have two major parties who will, almost by definition, tend to stake out opposite sides of issues and, furthermore, stake them out at opposite ends of the spectrum. Giving the entire government solely to one party (and, moreover, to one with majorities strong enough to evade some mild dissenters) would have the effect of hindering debate.

During the Constitutional Convention, one issue that the drafters considered was the tyranny of the majority. One of the Federalist papers, No. 10, dealt with this topic, mostly by the argument that a larger region would have more diverse parties. However, the two-party system (among other factors) tends to dilute the power of size; another mechanism should be present.

This second mechanism is the tortuous process of law creation. A common criticism of Congress is that it is slow to act. Yet why should lack of speed of action be a bad thing? If one wishes to expedite a bill, one has to cut something out. Almost all of the time it takes to pass a bill is spent on debate. So improving Congress's reaction time would mean that one has to debate less and therefore rely on the bill being correct as it is. We don't regret any rushed actions, do we?

A divided government would force moderation: the Democrats would not have enough votes to override a Republican veto, so a bill would have to be amenable to both parties (and therefore moderate in effect) to pass. With a Democratic president, extreme positions could show up in laws more easily without general discourse. The large gains in Congress give the Democrats more ability to cut off dissent; thankfully, though, the Senate looks to not be filibuster-proof.

I have discussed enough on the theory of divided government; I also support the Republican candidate on several issues. On social issues I sometimes agree with McCain and sometimes don't, but I do not base my decision on these. Chiefly this is because I think the president has no real power to enforce these opinions on a national level, or even to get one of these opinions enacted into law.

The primary issues I concern myself with are more economic in nature. I am an ardent supporter of free trade and no economic subsidies (especially agricultural ones). In terms of policies such as energy, I opine that the government should not favor one alternative over another in terms of, say, research funding, as history has shown the government to be bad at picking winners.

A more generalized position I hold is that I am not in favor of socializing certain fields. One such field is health care; the issue I have here is not one of universal versus non-universal health care but one of privatized versus socialized health care. I am skeptical of claims that socialized (as opposed to privatized) health care would control costs or make health care more efficient. All studies I have seen compare across countries without accounting for factors that would vary between countries.

Another problem with socialization is that government needs restraint. Historically, governments have shown much delight in stealing from funds to pay for general costs, such as Argentina's pension system. Indeed, the United States is quite guilty of this, as our budgetary deficits are partially ameliorated by taking funds from the Social Security Trust.

The final issue I am concerned about is that of regulation. The recent economic and financial crisis has renewed a call for increased regulation. Lack of regulation played little part in the causes of this crisis. In fact, the entire mess started with the mortgage industry, which is the most regulated financial industry. Similarly, calls for caps on executive wages and windfall taxes have increased, both of which are known to discourage innovation. Note how many major oil producers with nationalized oil companies (like Venezuela) are facing problems maintaining output.

On all of these issues, Obama's policies clash with my views. He has called for renegotiating NAFTA, creating a single-payer health system, and enacting windfall taxes, and has, naturally, called McCain out for supporting deregulation. At least McCain is candid that the economy is his weak point.

And, most importantly of all, he would likely be able to rush his ideas to completion with less discussion. Oh well, we won't get another National Recovery Administration… I hope.

Tuesday, October 28, 2008

Fakeserver

As of 8:18:19 PDT, comm-central now has a mostly-complete (but definitely usable) implementation of an IMAP fakeserver. This means we now have at least partial implementations of all 4 major mailnews protocols: POP, NNTP, IMAP, and SMTP.

Saturday, October 25, 2008

Synchronization

Since my original plan for mailing list sanity was upset by an underestimation of the insanity, I looked into fixing it in the various components. One place where this happened was in palm sync code. This code is so perilous I wouldn't be surprised to find that it broke somewhere along the line. As happens a lot, though, I got a little sidetracked. I tried to sit down and actually figure out what the heck goes on in there.

Like every other time I forayed into this code, I started thinking about how to improve it. What this code most needs is to be generic; there are bugs for adding support for SyncML, ActiveSync, OpenSync, BlackBerries, and even N-Gage devices. There are a few more bugs not citing a specific protocol or device; surprisingly, though, there are no bugs on supporting Microsoft's or Apple's synchronization APIs, or iSync. One might also consider something like Google Contacts to be a synchronization architecture.

In short, for Thunderbird and SeaMonkey to support most devices, they need to bridge at least six different APIs. So how can this be done? Fortunately, there are basic steps common to all of them: registration, getting the data from the handheld and locally, resolving conflicts, and pushing changes to each side. The primary differences I could find (aside from exact API semantics) were that some APIs requested changes from the applications (calling back when there were conflicts) while others just dumped handles to everything on the application and let the application figure it out.
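The shared "get data from both sides, resolve conflicts, push" step can be illustrated with a small, hypothetical reconciliation routine. It is not modeled on any real sync API: it assumes each side reports records keyed by ID with a "modified since last sync" flag, it ignores deletions, and it resolves conflicts with a caller-supplied policy (by default, keep the local copy).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Record:
    value: str
    modified: bool  # changed on this side since the last sync


Resolver = Callable[[str, str], str]


def prefer_local(local: str, remote: str) -> str:
    return local


def reconcile(local: Dict[str, Record], remote: Dict[str, Record],
              resolve: Resolver = prefer_local) -> Tuple[Dict[str, str], List[str]]:
    """Merge two record sets; return the merged values and conflicting IDs."""
    merged: Dict[str, str] = {}
    conflicts: List[str] = []
    for key in sorted(local.keys() | remote.keys()):
        mine, theirs = local.get(key), remote.get(key)
        if mine is None or theirs is None:
            # Present on only one side: copy it across (deletions not modeled).
            merged[key] = (mine or theirs).value
        elif mine.modified and theirs.modified and mine.value != theirs.value:
            # Both sides edited the record differently: a true conflict.
            conflicts.append(key)
            merged[key] = resolve(mine.value, theirs.value)
        elif theirs.modified:
            merged[key] = theirs.value
        else:
            merged[key] = mine.value
    return merged, conflicts
```

APIs that call back on conflicts would supply the `resolve` step themselves; APIs that dump everything on the application would need the whole loop.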

How could these functions be abstracted from an API point of view? Here's a rough interface:

interface mozIExternalSynchronizer : nsISupports {
   /** The name that one would see in a dialog to enable/disable. */
   readonly attribute nsAUTF8String displayName;
   /** True if the OS sync manager will grab the changes. */
   readonly attribute boolean wantsChanges;
   /** Register this with the appropriate OS sync manager. */
   void enable();
   /** Unregister this from the appropriate OS sync manager. */
   void disable();
   /** Push a set of changes. (Called only if wantsChanges is true) */
   void pushChanges(in nsISimpleEnumerator changes);
   /** Get the records from the other end. (Called only if wantsChanges is false) */
   nsISimpleEnumerator getOtherRecords();
};
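To make the two modes concrete, here is a rough Python mirror of the interface sketch above, with two made-up backends. Everything here is hypothetical (it is not XPCOM or any shipping API); it only shows how a caller would branch on wantsChanges.

```python
from typing import Iterable, List


class ExternalSynchronizer:
    """Hypothetical Python mirror of the mozIExternalSynchronizer sketch."""

    display_name: str = ""
    wants_changes: bool = False

    def enable(self) -> None:
        """Register with the OS sync manager (no-op in this sketch)."""

    def disable(self) -> None:
        """Unregister from the OS sync manager (no-op in this sketch)."""

    def push_changes(self, changes: Iterable[str]) -> None:
        raise NotImplementedError  # only needed when wants_changes is True

    def get_other_records(self) -> List[str]:
        raise NotImplementedError  # only needed when wants_changes is False


def run_sync(sync: ExternalSynchronizer, local_changes: List[str]) -> List[str]:
    """Run one sync in whichever mode the backend prefers.

    Returns the remote records the application must reconcile itself;
    the list is empty when the OS sync manager takes the changes instead.
    """
    if sync.wants_changes:
        sync.push_changes(local_changes)
        return []
    return sync.get_other_records()


class PushBackend(ExternalSynchronizer):
    """Hypothetical backend whose sync manager resolves conflicts itself."""
    display_name = "Push-style device"
    wants_changes = True

    def __init__(self) -> None:
        self.received: List[str] = []

    def push_changes(self, changes: Iterable[str]) -> None:
        self.received.extend(changes)


class PullBackend(ExternalSynchronizer):
    """Hypothetical backend that hands every record to the application."""
    display_name = "Pull-style device"
    wants_changes = False

    def get_other_records(self) -> List[str]:
        return ["remote-1", "remote-2"]
```

Keeping the mode as a flag rather than two interfaces lets the application hold one list of synchronizers and drive them all through the same entry point.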

Hmm... if we can mostly abstract away the main sync manager stuff, why not abstract the other part? After all, if we're already doing work to synchronize the address book, it's not a large jump to synchronize calendars as well. And who knows what extension developers may be interested in synchronizing? The managers already abstract over various types; we would only need to provide a common glue over this.

Such a feature would need interfaces for defining what could be synchronized (and UI for doing those separately). I don't have these planned out (I need to look more at the APIs, an endeavor I do not have sufficient time for yet). One thing is certain, though: with these changes, Mozilla could add "cross-platform synchronization manager" to its list of features. Indeed, though my main impetus comes from mailnews's current palm sync feature, it does not need to be specific to Thunderbird and SeaMonkey.

Friday, October 24, 2008

Mailing list sanity

With IMAP fakeserver awaiting review, kill-rdf mostly out of my hands (well, except for the subscribe dialog, which needs something I didn't have this morning), my relatively big news change blocking on news connection sanity, and that sanity now just needing some final touchups, I decided to look around for some more things to start on. I therefore turned back to the address book rewrite, which I haven't touched for about a month. What better way to fill up one's time than by tackling the holy grail of the rewrite, mailing list sanity?

Why do I personally consider this one task to be so important? It's because mailing lists, as they stand, are obstinate little things. They are both cards and directories. As two distinct objects, naturally. Yet they are also not quite cards and not quite directories. That they don't fully implement the scope of either makes them unwieldy to work with: one has to invoke isMailList tests in several places. There are approximately 75 such checks in mailnews/addrbook/src alone.

The sheer amount of dependent code puts me off trying to do a one-patch-to-fix-it-all, so I devised some partial steps:

  1. Make an nsIAbMailingList class…
  2. …and temporarily inherit it from nsIAbCard. This will allow me to remove the isMailList from nsIAbCard (in favor of an instanceof-like check) and move mailListURI to the new interface, thereby simplifying nsIAbCard to almost its final state (copy and equals need more work).
  3. Tweak the directory/collection interfaces so that they return nsIAbItem in key places instead of nsIAbCard. This still needs more spec'ing, but the basic framework is in place.
  4. Move nsIAbMailingList to inheriting from nsIAbCollection. Key milestone here in that mailing lists can no longer be viewed as cards.
  5. Flesh out nsIAbCollection. Doesn't need to be in this order, but this is around the latest feasible time.
  6. Shunt nsIAbDirectory mailing-list-specific stuff to nsIAbMailingList. At this time, mailing lists will also no longer be nsIAbDirectory objects.
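The heart of steps 2 and 4 is replacing a boolean flag with the type system. A hypothetical Python analogue (the class names only echo the XPIDL interfaces; nothing here is Mozilla code) shows how an instanceof-style check keeps working while the mailing list moves from under the card hierarchy to under the collection hierarchy:

```python
from typing import Iterable, List, Optional


class AbItem:
    """Hypothetical base for anything a directory can list."""


class AbCard(AbItem):
    def __init__(self, display_name: str) -> None:
        self.display_name = display_name


class AbCollection(AbItem):
    def __init__(self, items: Optional[Iterable[AbItem]] = None) -> None:
        self.items: List[AbItem] = list(items or [])


# Step 2 (transitional): still a card, but mailListURI lives only here,
# so plain cards no longer carry an isMailList flag.
class AbMailingListStep2(AbCard):
    def __init__(self, display_name: str, mail_list_uri: str) -> None:
        super().__init__(display_name)
        self.mail_list_uri = mail_list_uri


# Step 4: the mailing list is a collection and no longer a card at all.
class AbMailingListStep4(AbCollection):
    def __init__(self, mail_list_uri: str,
                 items: Optional[Iterable[AbItem]] = None) -> None:
        super().__init__(items)
        self.mail_list_uri = mail_list_uri


MAILING_LIST_TYPES = (AbMailingListStep2, AbMailingListStep4)


def kind(item: AbItem) -> str:
    # The instanceof-style check that stands in for card.isMailList.
    if isinstance(item, MAILING_LIST_TYPES):
        return "mailing list"
    if isinstance(item, AbCard):
        return "card"
    return "collection"
```

Callers written against `kind` (or a direct type check) survive the step 2 to step 4 move unchanged; callers that assumed "mailing list implies card" are exactly the ones that break.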

Steps 1 and 2 are really more like Step 1a and Step 1b. If you're following along on the road map, these correspond roughly to items 3a (the first two steps), 3b (the last one), and an extended version of 2 (the third and fourth ones). Nice and simple: the changes come in smallish, relatively atomic batches.

Oh wait. Outlook cards and OS X cards could also be mailing lists. The palmsync nsAbIPCCard (which inherits from nsAbCardProperty as well) also cares about being a mailing list. So to complete the first step in the hackery (making the mailing list cards extensions of regular cards), I have to fix generation of mailing list cards in four different places, three of which I can't build on my normal Linux setup and one of which I can't touch on my Windows build (which I touch but rarely). Not to mention that each of these is probably complex enough to require a few patches of its own. At least fixing these issues will fix a few ancillary ones as well…

In other news, I have a plan to do some Linux address book integration post-TB 3. This is in no way related to any aggravation I may or may not have stemming from using, as my primary development system, the one with the least OS-specific code. I swear… :-)

Tuesday, October 14, 2008

I can has pony?

Not too long ago, I was working on killing RDF in the subscribe dialog. When doing a stress test (a server containing over 180 thousand newsgroups), I noticed that the nsStringStats output at the end had gone up to somewhere in the realm of 180 thousand strings. It couldn't be coincidence, so I ran some leak logs.

Debugging leaks is never fun. Debugging a leak where the object is stored in a global holding pen for almost the entire session is aggravating, as you need to match the references across practically the entire tree. The worst part, however, was that the leaking reference came from JS. After a very long, arduous job of matching up the global references, I narrowed down the suspect to this call (190 characters of whitespace happily elided):

js_Invoke
 XPC_WN_GetterSetter(JSContext*, JSObject*, unsigned int, long*, long*)
  XPCWrappedNative::GetAttribute(XPCCallContext&)
   .L1287
    NS_InvokeByIndex_P
     nsMsgDBFolder::GetServer(nsIMsgIncomingServer**)
      nsCOMPtr::swap(nsIMsgIncomingServer*&)

I just need to look for the JS code that calls nsIMsgFolder.server. There's only… a bit more than 100 of those. Furthermore, the leak didn't seem to stay, so I chalked it up to some sort of GC thing and moved on. (It actually was a valid leak in that the server was left with a reference to a JS object referring to the server again... but I fixed that after I realized what was going on.)

In any case, it did leave me with an extreme sense of frustration at debugging the leak. Seeing all those js_Invokes lining the stack was the worst part, as the information is so tantalizingly close, yet just out of reach. So I naturally reiterated an opinion I've had for a while: "Why not just integrate JS stacks into the stack trace?" The problem with that is that most tools have different ways of walking the stack…

But wait! There is one place in Mozilla code where there's a common way to walk the stack: nsStackWalk. Therefore, it might be possible that I could coerce that code a little to replace each js_Invoke with the information for the JS function it's actually calling. And, success! With some caveats:

  • It only works on the one platform I have access to (x86 gcc Linux).
  • It relies on being able to tell that I'm calling js_Invoke during the stack trace, i.e., the symbols must be in the same binary.
  • It doesn't do function names, only filenames and line numbers (better than nothing!).
  • It makes xpcom/base depend on js (but not xpconnect!).
  • It relies on js_Invoke's first parameter being the JSContext *, and on the cdecl calling convention (i.e., the parameter is the one right above the return address).
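Once the walker can recover a JSContext * for each js_Invoke frame, the substitution itself is simple. A minimal sketch of just that step, in Python for illustration: the (symbol, context) frame tuples and the `js_location` lookup are hypothetical stand-ins for reading the JSContext * off the native stack and asking the JS engine for the current script file and line.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical frame: (native symbol name, opaque context handle or None).
Frame = Tuple[str, object]


def annotate_stack(frames: Sequence[Frame],
                   js_location: Callable[[object], Optional[str]]) -> List[str]:
    """Decorate each js_Invoke frame with the JS file:line it is running."""
    out: List[str] = []
    for symbol, ctx in frames:
        if symbol == "js_Invoke":
            loc = js_location(ctx)  # e.g. "file.js:42", or None if unknown
            out.append(f"js_Invoke [{loc}]" if loc else symbol)
        else:
            out.append(symbol)
    return out
```

Falling back to the bare symbol when no location is known keeps the trace no worse than before, which matches the "better than nothing" spirit of the hack.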

It's not perfect, not even review-ready, but it's usable.

If you don't get the title, it's parodying the relatively popular "I can has cheezburger?" internet meme. Without any pictures, though.

Friday, September 26, 2008

I swear I'm not bored

Are you tired of using ancient, inflexible, incomprehensible subscription dialogs? Don't you just wish there were an easier way to get that folder subscribed? If you answered either "yes," "no," or "what the hell are you smoking?" to any of those questions, then Deradifablatia™ (Corticol Mendiothrone) may be for you!

Deradifablatia™ is the new system for using subscribe dialogs. It comes recommended by 4 out of 5 medical students (er, doctors) and is the preferred method for subscription. Don't be fooled by cheap imitators, ONLY Deradifablatia™ has the extra subscribing power necessary for all your subscription needs!

But wait, there's more! If you act quickly, we'll throw in a free bottle of Bachlorablatia™ (Rectical Mendiothrone) for all your space-saving needs. A $946 value, FREE! Yes, that's right, FREE!

But wait! The next 10 callers will also receive our complimentary Forchoablatia™ (Utensal Mendiothrone). So you can get a lifetime supply of Deradifablatia™, a bottle of Bachlorablatia™, and a bottle of Forchoablatia™, a $3768 value for only 5 easy installments of $84!

Warning: Deradifablatia™ may cause drowsiness, dyslexia, paranoia, death, or insanity.

Wednesday, September 3, 2008

The identity crisis

Identities in Thunderbird leave something to be desired. The basic idea (insofar as I can tell) is that an identity represents some key composition information: your name and signature, who should be automatically BCC'd, where the sent message should be placed, and, finally, which SMTP server you should use. Let's look at an example of ideal usage.

I use my Thunderbird profile in several different ways. First, there is my Mozilla stuff; I would like to collect all my Mozilla-related messages, etc. into one place. Second, there is my Usenet persona, which is different. Then there is school, and finally other personal relations. So I should (ideally) have four identities, each representing how I want to appear. When I write a message, I should be able to select which identity I want to use.

Now how does it work right now? I have 9 identities: each new account I create (4 email, 5 news) creates its own identity. Deleting them more or less requires me to go into my prefs.js (or about:config) and prune them. Oops. The compose window right now is probably confusing (I have four email addresses selectable but 9 entries in a drop-down saying "From"... hmm...), but then again, compose needs its own basket of love. Finally, simultaneously posting to a newsgroup and emailing is... hairy, to say the least. The normal case is at best confusing or redundant, while the abnormal case is infuriating.

There's a third piece that needs consideration with identities: account type extensibility. RSS accounts in the future may want to allow you to comment on posts, or possibly even write new posts yourself. Similarly, other account types will probably want some way to let users write in a generic, HTML manner, which is what composer is good at. The difference here is in the means of transmission.

From a user standpoint, all of this can be done in easy steps. Account creation wouldn't create a new identity for each account, but would allow you to pick an existing one or create your own. For extensible account types, all that has to be done is to mix in your own compose stuff, present the necessary UI (if any), and provide the necessary backend hooks. Posting to, e.g., both news and email is as simple as adding a registered newsgroup header (i.e., Newsgroups) and a registered email header (e.g., To, CC, or BCC) in the compose window.
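One way to read "registered headers" is as a table from header name to transport, which account types can extend. A hypothetical sketch (neither the registry nor `register_header` exists in Thunderbird): compose looks at which registered headers are filled in and sends through every matching transport.

```python
from typing import Dict, Set

# Hypothetical registry: header name (lowercased) -> transport that handles it.
HEADER_TRANSPORTS: Dict[str, str] = {
    "newsgroups": "nntp",
    "to": "smtp",
    "cc": "smtp",
    "bcc": "smtp",
}


def register_header(name: str, transport: str) -> None:
    """Let an extension account type claim a compose header."""
    HEADER_TRANSPORTS[name.lower()] = transport


def transports_for(headers: Dict[str, str]) -> Set[str]:
    """Transports needed for a message, based on non-empty registered headers."""
    return {
        HEADER_TRANSPORTS[name.lower()]
        for name, value in headers.items()
        if value.strip() and name.lower() in HEADER_TRANSPORTS
    }
```

With this shape, the mixed news-and-mail case stops being special: it is just a message whose headers map to two transports.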

Thursday, August 21, 2008

Rewriting and mork

The Great Address Book Rewrite is progressing along. Now that cardForEmail and cardForProperty are both done (note: I am including as "done" what is merely awaiting r/sr), casual users of nsIAbMDBDirectory and nsIAddrDatabase can be refactored to use generic directories.

Casual use of the directory interface is completely eradicated (outside its valid uses), excluding its not-so-invalid use by the database and its inclusion for some constants, which I'm not sure is still valid. Databases, on the other hand, are still very much alive (I intend to kill off the one remaining use in nsMsgCompose shortly).
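What "use generic directories" buys callers can be sketched in a few lines: code that only needs "find me the card for this address" talks to a directory-level lookup and never touches the database. The class below is hypothetical and only loosely echoes cardForEmail/cardForProperty; cards are plain dicts, the property names PrimaryEmail and SecondEmail are the ones address book cards use, and email matching is case-insensitive purely as a choice of this sketch.

```python
from typing import Dict, Iterable, List, Optional

Card = Dict[str, str]  # property name -> value


class Directory:
    """Hypothetical generic directory: callers need only these lookups."""

    def __init__(self, cards: Iterable[Card]) -> None:
        self.cards: List[Card] = list(cards)

    def card_for_property(self, name: str, value: str,
                          case_sensitive: bool = True) -> Optional[Card]:
        for card in self.cards:
            candidate = card.get(name)
            if candidate is None:
                continue
            if candidate == value or (
                    not case_sensitive and candidate.lower() == value.lower()):
                return card
        return None

    def card_for_email(self, email: str) -> Optional[Card]:
        # Check both address fields; this sketch ignores case throughout.
        for prop in ("PrimaryEmail", "SecondEmail"):
            card = self.card_for_property(prop, email, case_sensitive=False)
            if card is not None:
                return card
        return None
```

Any backend (MDB, LDAP, OS-native, or a future SQL store) that answers these two calls is enough for such callers.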

Databases are used, of course, in the MDB directory code. But LDAP replicates only to a database. That's probably not hard to fix. Palmsync uses the database quite liberally. That's annoying because palmsync doesn't compile by default, I don't use it enough to be able to test for regressions (I have a PocketPC, not a Palm), and I develop primarily on Linux, so my best shot is most likely to try cross-compiling it.

Last but not least is the granddaddy of database usage, import. It in fact uses nsIAddrDatabase more often than addrbook itself, and even that statistic is lopsided, since nsLDIFService in addrbook is the LDIF import, and addrbook includes the interface for constants a lot and also defines it. And import consists of four different address book importers, all of which have to be changed at the same time. My previous aborted attempt to eradicate the database from nsLDIFService alone took hours, required careful removal of hacks, and was probably not tested. The kill-database-in-import patch will likely come to over 100 KiB.

In any case, I now believe that the setup is sufficiently abstracted to permit me to start work on an SQL-backed address book to replace mork, with the goal of finishing it by TB 3.0.

Saturday, August 16, 2008

The Great Address Book Rewrite, now on your machine!

Since committing the changes to the nsIAbCard portion of the refactoring, the Great Address Book Rewrite has been progressing faster. I pushed the changes to bug 449618 and am writing tests for bug 450194, while also putting finishing touches on bug 450197. As these changes come in, bits and pieces of address book code will bitrot.

So what does an extension developer who wants to keep up with trunk builds do? Use my handy address book rewriting branch. If your extension works on the tip of the tree or its various branches, it will likely work on the trunk when the patch is committed.

The repo is also world-writable (as long as you have access to hg.mozilla.org), which means that anyone else with in-progress Address Book rewriting patches can push there too. Don't worry about creating new heads, I'll sort all of that out when I see changes to the repo.

Finally, I would like to point people to the tentative roadmap to the rewrite.

Thursday, August 7, 2008

The Great Address Book Rewrite

The nsIAbCard portion of the Great Address Book Rewrite landed not long ago. This patch represents the first major overhaul of the address book interfaces in a while. Specifically, it removes the nsIAbMDBCard interface, adds the nsIAbItem interface, and completely overhauls the nsIAbCard interface from a flat list of properties into essentially a spruced-up hashtable.

These changes, 295.33 KiB of them for a -U 3 patch, replace 2,732 lines of code with 1,871 lines across a total of 52 files, some of which are OS-specific, and represent the culmination of over 6 months of editing and revising, a timespan that also included a change of computers. I've changed the internals of nsAbCardProperty at least three times, and fixed two bugs in a heretofore unused mork class. The APIs I worked off of have changed several times, partially as a result of my experiences with the work, and partially otherwise. Finally, I have my chance to bitrot other people's WIPs instead of finding that any address book change bitrots the patch I have (or another in my queue).

The good news: nsIAbCard is now effectively stable to use in extensions or other core code not yet pushed. The only caveats are these:

  • getProperty* has had some late-breaking API discussions, related to how to handle properties that don't exist. It is possible, but not likely, that semantics of existing methods would be changed.
  • copy is not the best thought-out method and will change in the future.
  • equals has some potential problems, such as the fact that its use in some circumstances may violate the reflexive property.
  • UUIDs were stripped from the patch pending future design decisions.
  • The two pre-existing properties will exist only as long as mailing lists are insane.
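"A spruced-up hashtable" means a card is a bag of named properties rather than a fixed list of attributes. A hypothetical Python sketch of that shape (loosely in the spirit of the getProperty* family; method names and the default-value behavior are assumptions of this sketch, not the settled XPIDL semantics, which the caveats above note are still under discussion):

```python
from typing import Dict, List


class Card:
    """Hypothetical property-bag card; not the real nsIAbCard."""

    def __init__(self) -> None:
        self._props: Dict[str, str] = {}

    def set_property(self, name: str, value: str) -> None:
        self._props[name] = value

    def get_property(self, name: str, default: str) -> str:
        # A missing property yields the caller's default instead of an error.
        return self._props.get(name, default)

    def get_property_as_int(self, name: str, default: int) -> int:
        # Missing or non-numeric values fall back to the default.
        value = self._props.get(name)
        if value is None:
            return default
        try:
            return int(value)
        except ValueError:
            return default

    def delete_property(self, name: str) -> None:
        self._props.pop(name, None)

    def properties(self) -> List[str]:
        return sorted(self._props)
```

New fields then need no interface change at all, which is much of why the interface can be called effectively stable.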

Will this patch create regressions? Probably one or two minor ones; changing an entire internal model, a heavily-used API, and implementations across several files, some of which are OS-specific, and others which are app-specific makes a regression likely. But I have taken effort to find problems, combing through the little import/export I can test with Linux, as well as using all the features I could. grep has proved invaluable, as well as a few tests of Dehydra to find callees.

What's next from here? The next logical step is the creation of nsIAbCollection and related refactoring. However, async APIs and error conditions are still in flux for that. Not all of the OS-specific directories are fully squared away with the new model, but I have limited testability in those areas. UI unforking is a place where we could win big, especially as it saves me a few places to look for usages. Import and palmsync may also want some more love. Naturally, there are also the spin-off bugs, but many of those are not immediately fixable.

For me, at least, I now have some time to work on other bugs while I wait. There are 12 other bugs in my queue, ranging from an IMAP fakeserver to an aggravating news bug to a brand new module (O.K., total rewrite of an old module), 6 of which are more or less ready to be committed. Oh well, life goes on...

Tuesday, August 5, 2008

I want a pony #2

Okay, continuing my search for obsolete members. One feature that I'm sure everybody would love is a JS static analyzer. And while we're on the topic of JS, how much longer until Mozilla gets ES 4.0 and statically typed JS?

Maybe this isn't so much a pony as an entire ranch, though...

I want a pony

As I was walking through nsIMsgIncomingServer in search of unused attributes, I noticed that some attributes were added in a revision and their usages removed later. But the checkin comments don't give me any clear indication of when it was removed and what replaced it. Anyone fancy binary searching hundreds of checkins just to find one property?

A simple solution to this is a reverse annotate feature: a script or similar that, given a revision number, will look through the subsequent revisions and find out which version (if any) first changed a given line. Bonus points for ignoring whitespace changes. I could probably hack out such a feature myself for hg, but I know so little about CVS that it's hopeless for me to try there. And since most of the important history of mailnews is only in CVS, an hg-only tool would be pointless, at least for the next, oh, 5 years.
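The core of such a reverse annotate is small if each revision of the file can be exported as a full snapshot, whatever the VCS. A hypothetical sketch using Python's difflib: follow one line forward through consecutive snapshots, mapping its position through the unchanged blocks, and report the first revision where no unchanged block covers it. Whitespace is normalized before comparing, so pure re-indentation does not count as a change.

```python
import difflib
from typing import List, Optional, Sequence


def _norm(line: str) -> str:
    # Collapse whitespace runs so re-indentation is not treated as a change.
    return " ".join(line.split())


def first_change(revisions: Sequence[Sequence[str]],
                 start: int, lineno: int) -> Optional[int]:
    """Index of the first revision after `start` that modifies or deletes
    line `lineno` (0-based) of revisions[start], or None if it survives."""
    pos = lineno
    for rev in range(start + 1, len(revisions)):
        old: List[str] = [_norm(l) for l in revisions[rev - 1]]
        new: List[str] = [_norm(l) for l in revisions[rev]]
        matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
        for i, j, size in matcher.get_matching_blocks():
            if i <= pos < i + size:
                pos = j + (pos - i)  # line is unchanged; track its new position
                break
        else:
            return rev  # no unchanged block covers the line
    return None
```

Because SequenceMatcher does not always find a minimal diff, heavily duplicated lines can be tracked imperfectly; for hunting down when an attribute's last use disappeared, that is usually good enough.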

And before anyone complains about me wanting a pony, let me tell you this: my cousin actually got a pony for Christmas last year (after asking about it for the last, oh, 17+ years).

Monday, August 4, 2008

Thanks, dbaron

Earlier today, bz r+'d bug 366791, and dbaron then checked it in. For anyone building TB, the assertion it fixes accounted for about 95% of the assertions and 60+% of the output when running TB normally (being such a long assertion doesn't help). With so few false positives left, breaking on assertions is now feasible. The only spammy thing remaining is the "recurring into frame construction" warning.

So, in short:

Thanks dbaron!

Wednesday, July 16, 2008

Profiling made visual

When you've got a performance regression resulting from a major patch, pinpointing where you can save time can be annoying. For me, on Linux, the only decent tool is jprof. And I didn't get far in jprof before tripping over a bug in its code that made reliable testing infeasible (ternary operators are wonderful things). After fixing that, I turned to the output.

The output is basic and, in general, not helpful for deep inspection. Okay, so I know that I'm spinning in this function in particular. But, at a higher level, which functions am I really spinning hard in?

In one trace, it's obvious that malloc, JS, card creation, and case conversion are being nice and expensive. All four of those are more or less unavoidable. Where else am I wasting time? It's hard to tell, since many of the top functions produced by both the flat and hierarchical views are taken up by irrelevant subfunctions of these. Enter graphviz.

Graphviz is a wonderful library I discovered about a year ago. It takes a file that looks like the code below and turns it into a drawn graph.

digraph G {
  A -> B;
  A -> C;
  B -> D;
  C -> D;
}
[Image: Graphviz output]

The output gets better as you tickle it more and more. But its flexibility is not why I love it. I love it because it is so simple that one can easily write a sed or awk script to generate the graph. In the following three commands (that could just as easily be one command, but I'm not that cruel), I took the ugly jprof output and formatted it into an easy-to-read graph:

jcranmer@quetzalcoatl /src/tree2/mozilla $ cat tmp3.html | sed -e '/index/,/<\/pre>/!d' -e '/<A href="#[0-9]*">/s/^.* \(.*[0-9]\) \(.*\)<\/A>$/c|\2|\1/' -e '/<a name/s#^.* \(.*[0-9]\)</a> <b>\(.*\)</b>$#f|\2|\1#' -e 's/<hr>/e|--/' -e '/|/!d' -e 's/|\(.*\)(\(.*\))|/|\1|/' -e 's/|.*::\(.*\)|/|\1|/' | awk -F'|' 'BEGIN { skip = 0; print "digraph G {" } $1 == "c" { if (skip == 0) { count[$2] = $3; } } $1 == "f" { for (fn in count) { print "\"" fn "\"->\"" $2 "\" [label=" count[fn] "];"; delete count[fn] } skip = 1; print "\"" $2 "\" [sum=" $3 "];" } $1 == "e" { skip = 0 } END { print "}" }' > ~/full.dot
jcranmer@quetzalcoatl ~ $ cat full.dot | gvpr 'BEG_G { $O = graph($.name, "D") } E  { if ($.tail.sum > 200 && $.tail.sum < 1000) { copy($O, $); } }' > full2.dot
jcranmer@quetzalcoatl ~ $ dot -Tpng -o full2.png full2.dot

Now that I've most likely burned your eyes out by using a sed, an awk, and a gvpr (something like awk, but for graphviz) script all from the command line, I feel the need to explain what it's doing. The sed script, in order, grabs only the hierarchical portion of the jprof output, turns the lines into simple fragments separated by pipe characters so that awk can read them more easily, and then scrubs the demangled C++ names down to simple function names (although not perfectly). The awk script then compiles the information into a dot file mapping the call graph and annotating the nodes with probe frequencies. Next, gvpr scrubs out all nodes with more than 1000 probes or fewer than 200 probes. Finally, dot gets hold of it and makes a nice PNG.
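For anyone who would rather not decode the one-liners above, here is a rough Python equivalent of the awk and gvpr steps. It assumes the sed stage has already produced pipe-separated records in the shape the awk script consumes: `c|caller|count` for each caller listed above a function, `f|function|samples` for the function itself, and `e|--` at each separator. The function name `records_to_dot` is hypothetical, and the filter (keep edges whose caller has between 200 and 1000 samples, exclusive) mirrors the gvpr condition; treat it as an approximation rather than a drop-in replacement.

```python
from typing import Dict, Iterable, List, Tuple


def records_to_dot(lines: Iterable[str], lo: int = 200, hi: int = 1000) -> str:
    """Build a call graph from c/f/e records, keeping only edges whose
    caller has lo < samples < hi (the gvpr filter)."""
    edges: List[Tuple[str, str, int]] = []
    samples: Dict[str, int] = {}
    pending: Dict[str, int] = {}  # callers seen before the current 'f' line
    after_func = False            # like awk's `skip`: ignore callees after 'f'
    for line in lines:
        kind, name, value = (line.rstrip("\n").split("|") + ["", ""])[:3]
        if kind == "c" and not after_func:
            pending[name] = int(value)
        elif kind == "f":
            for caller, count in pending.items():
                edges.append((caller, name, count))
            pending.clear()
            samples[name] = int(value)
            after_func = True
        elif kind == "e":
            after_func = False
    out = ["digraph G {"]
    for caller, callee, count in edges:
        # Sample totals are only known once every block has been read.
        if lo < samples.get(caller, 0) < hi:
            out.append(f'  "{caller}" -> "{callee}" [label={count}];')
    out.append("}")
    return "\n".join(out)
```

Filtering after the whole file is read matters: a caller's own sample total comes from its own block, which may appear later in the jprof output than the edges that mention it.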

And the PNG is informative. Although the graph is enormous, the information leaps out immediately. Floating high up are five expensive functions, the fifth of which I had never noticed: XPCThrower::ThrowBadResult. Hmm... I quickly threw up a graph of the pre-patch results, and confirmed that it wasn't in the top slots there. Doing some basic math, this one function and the calls stemming from it produce about 60% of the current regression, assuming that I'm reading the numbers right. Who said throwing exceptions was cheap?

Anyways, my visual approach to profiling isn't complete. The graph is in plain black and white, where I should be using colors and line thickness to represent the expensiveness of operations. I might also play around with tickling the data to highlight the exact functions where regressions occur, something that I could easily do with gvpr if I had two dot graphs of the translated output. And my output filtering isn't perfect by any means. But all that comes for free in my envisioned perfect profiling extension. Oh well, at least I have something to point to for neat data.
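The color-and-thickness idea, combined with comparing two runs, might look something like the hypothetical helper below (written in Python rather than gvpr). Given per-function sample counts from a pre-patch and a post-patch profile, it emits one dot node per function whose pen width grows with post-patch cost, drawn red if the function got more expensive and green if it got cheaper. `color` and `penwidth` are standard Graphviz attributes; the logarithmic scaling rule is an arbitrary choice for illustration.

```python
import math
from typing import Dict


def regression_dot(before: Dict[str, int], after: Dict[str, int]) -> str:
    """One dot node per function: thicker outline for more post-patch
    samples, red if the count grew, green if it shrank."""
    lines = ["digraph G {"]
    for fn in sorted(set(before) | set(after)):
        old, new = before.get(fn, 0), after.get(fn, 0)
        color = "red" if new > old else "green" if new < old else "black"
        # Log scaling keeps a 10000-sample node from being absurdly thick.
        width = 1 + math.log10(1 + new)
        lines.append(f'  "{fn}" [color={color}, penwidth={width:.1f}, '
                     f'label="{fn}\\n{old} -> {new}"];')
    lines.append("}")
    return "\n".join(lines)
```

Feeding the node sample sums from two translated dot files into `before` and `after` would give a single picture where regressions stand out by color.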

Thursday, July 10, 2008

2900- UNCO bugs

As of a few minutes ago, Thunderbird's UNCO bug count dipped below 2900: it stands at 2892 right now (likely to change within a few minutes). For those of you not satisfied with only one product's bugs, the sum of all mailnews UNCO bugs (TB bugs + mailnews portions of MAS and Core) is 3236, with Core contributing 227 and MAS the other 117. Keep triaging!

Wednesday, July 9, 2008

Mork and SQLite

As I was doing some performance testing for a bit of demorkification, I noted that what I was demorkifying was one of the few places where mork works well. The API makes it very clearly an EAV model, which is a model that makes mork happy. On a higher level, it would be hard to position it in a way where the queries only affect one row (or column of a row in mork).

Mork is a model that likes EAV and hates queries. That means keys need to be meaningful, since that's the only query you can do without looking at everything. Meaningful keys create problems, though.

SQLite is a model that loves queries but hates EAV and other models which don't rely on batching. That means that to get good performance, you should really be using larger batch queries. The biggest implication is that querying suddenly drops down to a lower level of index. If you pick the queries and schemas right, SQL can knock searches down from O(n) to O(lg n), and forgo object creation costs as a benefit.
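To make the EAV-versus-batching point concrete, here is a small sketch using Python's built-in sqlite3 module. The `props(folder, name, value)` table and the property names are assumptions for illustration, not the schema from the actual patch. The first lookup function issues one query per attribute (the access pattern mork is comfortable with); the second fetches a whole entity in one query, which is the kind of batching SQLite rewards. The index lets either query avoid a full table scan.

```python
import sqlite3
from typing import Dict, List


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # Hypothetical EAV layout: one row per (folder, property) pair.
    db.execute("CREATE TABLE props (folder TEXT, name TEXT, value TEXT)")
    # With this index, lookups by folder are a B-tree search, not a full scan.
    db.execute("CREATE INDEX props_folder ON props (folder, name)")
    return db


def get_one_by_one(db: sqlite3.Connection, folder: str,
                   names: List[str]) -> Dict[str, str]:
    # EAV-style access: a separate query for every attribute.
    out: Dict[str, str] = {}
    for name in names:
        row = db.execute("SELECT value FROM props WHERE folder = ? AND name = ?",
                         (folder, name)).fetchone()
        if row is not None:
            out[name] = row[0]
    return out


def get_batched(db: sqlite3.Connection, folder: str) -> Dict[str, str]:
    # Batched access: one query returns every property of the folder.
    rows = db.execute("SELECT name, value FROM props WHERE folder = ?", (folder,))
    return dict(rows.fetchall())


if __name__ == "__main__":
    db = make_db()
    with db:  # commits the whole batch of inserts as one transaction
        db.executemany("INSERT INTO props VALUES (?, ?, ?)",
                       [("Inbox", "totalMsgs", "12"), ("Inbox", "unreadMsgs", "3"),
                        ("Trash", "totalMsgs", "0")])
    print(get_batched(db, "Inbox"))  # {'totalMsgs': '12', 'unreadMsgs': '3'}
```

Both functions return the same data; the difference is the number of round trips through the SQL engine, which is where per-query overhead adds up in an EAV access pattern.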

But back to the patch at hand. Basic testing shows that, having optimized what I have as far as I can without changing some core implementation details, it regresses badly in relative terms on two infrequent operations (creating and removing files, i.e., adding/deleting folders). On actually retrieving the cache elements, it performs the same as mork (under 1ms average for both on a release build). Surprisingly, mork hates commits, consistently taking longer without a compression and averaging 15.36ms for 50 commits, while SQLite takes 2.41ms on average for the same 50 commits. The cost, though, is in get/set of the actual EAV pairs: while SQLite keeps every operation under 35ms for 5000 get/sets, mork can do the same in 10ms per operation. But mork's subsequent speed comes at a 100ms startup price (for 50 folders; heavier users will pay even more).

Mork's fast queries come at the price of a slow startup: it loads everything into memory. SQLite could do the same thing, which would improve the get/set times but hurt both startup time and memory usage. Mork also makes committing (which apparently happens more often than actually setting a property) more expensive, especially if you've neglected to compress recently; SQLite shows no loss if you neglect a compression, and (for something this small and static) no strong benefit from vacuuming. I couldn't find any services to measure memory easily, so I'm relying on assumptions there.

Thursday, July 3, 2008

Bugs and trends

As anyone who hangs out on IRC, subscribes to Mozilla's newsgroups, or reads key mailco blogs on a regular basis knows by now, Thunderbird has dipped under 3,000 UNCO bugs. The key chart on the matter now shows quite clearly that we have beaten that number, for the first time in about a year. But this one metric does not reveal all.

First on my list of things to point out are the trends. The UNCO count peaked at a bit over 3500 earlier this year. Extrapolating is not the easiest thing here, but it seems that the overarching trend before the recent freefall would have pushed the UNCO count as high as 3700. Meanwhile, one might fret at the uptick in NEW bugs. However, the overarching trend may have been changing before the bugdays (very hard to tell); worst-case, it's only 200 above where the trend line falls. So 700 bugs have been "confirmed" in some sense at the expense of 200 of them staying valid. Impressive work.

Second, Thunderbird bugs don't represent the entire picture. Combining the bug counts from the Core mailnews components (Mailnews: * and Networking {IMAP, POP, News, SMTP}) with the mailnews side of SeaMonkey, the aggregate UNCO count stands at 3268 as of this writing (which is in the middle of a bugday, albeit a rather tame one). All open bugs come out to 10913.

Some components are "healthier" than others. I would consider a component healthy if the number of UNCO bugs is low with respect to NEW bugs, and the NEW bug count is low with respect to all bugs. For core components at least, I would put both thresholds at about 10%. The component I follow most closely (Networking: News) satisfies the first metric but not the second (UNCO/NEW/all: 7/143/897). I've not worked out numbers for the Thunderbird side, but a 30%/10% threshold does not seem unreasonable. Surprisingly enough, Thunderbird does have a NEW count of about 10% of all bugs (2187/20403); the only component not close to this is RSS (106/658).
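Both health checks are simple ratios, so they are easy to sketch. The function names below are hypothetical; the default 10% limits are the thresholds suggested above for core components, and `unco_limit=0.30` gives the looser Thunderbird variant.

```python
from typing import Tuple


def health_ratios(unco: int, new: int, total: int) -> Tuple[float, float]:
    """Return (UNCO/NEW, NEW/all) for one component."""
    unco_ratio = unco / new if new else (float("inf") if unco else 0.0)
    new_ratio = new / total if total else 0.0
    return unco_ratio, new_ratio


def is_healthy(unco: int, new: int, total: int,
               unco_limit: float = 0.10, new_limit: float = 0.10) -> Tuple[bool, bool]:
    # Each flag says whether one of the two thresholds is met.
    unco_ratio, new_ratio = health_ratios(unco, new, total)
    return unco_ratio <= unco_limit, new_ratio <= new_limit
```

For Networking: News, `is_healthy(7, 143, 897)` returns `(True, False)`: 7/143 is about 4.9%, but 143/897 is about 15.9%. By the same arithmetic, Thunderbird as a whole sits at 2187/20403, about 10.7% NEW, and RSS at 106/658, about 16.1%.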

Time to jump back into bugday and nurse a sick component back to health!

Sunday, June 22, 2008

Threading in mailnews

I was cleaning up my WHATWG mailing list folder (a task which mostly involves looking at the subject of a message and deciding whether or not I cared to keep this piece of correspondence) when I started thinking about how threading interacts with it. If you haven't subscribed to this mailing list (which I doubt most readers have), the main WHATWG author (Hixie) writes a message which is a reply to several messages at once.

A brief aside, if I may: for completely unrelated reasons (responding to a new bug which turned out to be a dupe, stupid me for checking validity for duplication), I was perusing RFC 2822, specifically the In-Reply-To field (§ 3.6.4). Interestingly enough, the case of a message having multiple parents is quite well-defined in the spec (and Hixie violates the spec on this point). I did a brief check of the code on this point, and the code will handle the theoretically correct case fine (using In-Reply-To in lieu of References, which is not quite correct, but works for the purposes of threading).
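As a sketch of what "multiple parents" means in practice, here is a hypothetical Python helper that reads every parent Message-ID out of In-Reply-To, falling back to the last References entry (the immediate parent) when In-Reply-To is absent. The regular expression is a simplification: it ignores RFC 2822 comments and obsolete msg-id forms.

```python
import email
import re
from email.message import Message
from typing import List

# Simplified msg-id matcher: "<id-left@id-right>", no comments or obsolete syntax.
_MSG_ID = re.compile(r"<([^<>\s]+)>")


def msg_ids(value: str) -> List[str]:
    return _MSG_ID.findall(value or "")


def parent_ids(msg: Message) -> List[str]:
    """Message-IDs of every parent of `msg`.

    In-Reply-To may list several parents; if it is absent, fall back to the
    last References entry, which names the immediate parent.
    """
    in_reply_to = msg_ids(msg.get("In-Reply-To", ""))
    if in_reply_to:
        return in_reply_to
    return msg_ids(msg.get("References", ""))[-1:]


if __name__ == "__main__":
    raw = ("Message-ID: <c@example.org>\n"
           "In-Reply-To: <a@example.org> <b@example.org>\n"
           "References: <a@example.org> <b@example.org>\n\nbody\n")
    print(parent_ids(email.message_from_string(raw)))
    # ['a@example.org', 'b@example.org']
```

A message that lists two parents in In-Reply-To yields two parent IDs, which is exactly the case a single-parent thread tree has no good way to represent.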

Anyways, the thing that struck me the most was that I often cared more about Hixie's catch-all reply than about the earlier message to which the reply had been attached. In essence, I wished for the ability to reroot threads. I thought a little more, and listed other threading enhancements I wanted. But there already is a mammoth chart of threaded view issues; see bug 236849 for a sublist of many of these.

At the core of threading, one can distinguish several levels of threading. The most basic is none at all; this is represented by turning threaded view off. Second is relying on subject: with this method, one can only tell that two messages are related, but not which is a reply to the other. Third is typical threading, relying on In-Reply-To and References, which works well. Fourth is what I like to think of as über-threading: parsing the message text to determine the quoted replies and using that to determine the parent of a message. Fifth, and highest, is the ability to redefine threading as the user sees fit. Note that most of these are orthogonal, so one can combine the inner three to determine a thread's parent.
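A rough sketch of combining the second and third levels when picking a parent, with messages modeled as plain dicts carrying hypothetical `subject` and `references` keys, plus prebuilt lookup tables. Real References-based threading (jwz's algorithm, for instance) does much more, including dummy containers and loop checks; this only shows the fallback order. The fourth level, parsing quoted text, is not attempted.

```python
import re
from typing import Dict, List, Optional

# Matches runs of prefixes such as "Re: ", "RE[2]: ", "Fwd: ", "Fw: ".
_PREFIXES = re.compile(r"^\s*(?:(?:re|fwd?)\s*(?:\[\d+\])?\s*:\s*)+", re.IGNORECASE)


def base_subject(subject: str) -> str:
    """Level 2 key: the subject with reply/forward prefixes removed."""
    return _PREFIXES.sub("", subject).strip().lower()


def find_parent(msg: dict, by_id: Dict[str, dict],
                by_subject: Dict[str, dict]) -> Optional[dict]:
    # Level 3: the nearest known ancestor listed in References wins.
    refs: List[str] = msg.get("references", [])
    for ref in reversed(refs):
        if ref in by_id:
            return by_id[ref]
    # Level 2: otherwise, guess from the subject alone.
    guess = by_subject.get(base_subject(msg.get("subject", "")))
    return guess if guess is not msg else None
```

A reply whose References point at a message we never received still lands in the right thread through the subject fallback, though (as noted above) the subject alone cannot say which message it actually answers.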

The utility of redefining your own threading is hard to overstate. How many of you have received email where people blithely hit "Reply to All" to start a new topic, while others in the same thread legitimately use reply features? I myself have one thread like that comprising 20 different real threads. Other times you hit cases where someone is on a borked client (*cough*Yahoo!*cough*), or someone changes the thread subject, or there is a confluence of mailing lists, forwarding, and replies (four threads where one is warranted, again in my inbox).

There are touchier areas with respect to threading. For example, the notion of subthreads is powerful (there are RFEs to implement practically every "Apply xxx to whole thread" as also an "Apply xxx to subthread"), but it is a pain in the backend, not least because some other bugs induce loops into the thread hierarchy there. Similarly, the question of what to do with multiple parenting (both how to represent it and how to generate it) can be touchy on the UX end. A final thorn I would like to specifically direct your attention to is the idea of dummy thread headers, as referred to in jwz's algorithm, the seminal work on the matter (ignore his anti-NS 4 rant, however; he lives in the glory days of NS 2).

On the other hand, don't expect me to implement any of these improvements soon, nor anyone else for that matter. I merely wanted to express my opinions as Thunderbird drivers debate UI on a higher level, with what seems to be a tendency towards ignoring some of the finer aspects of good message threading. Ah well....

Wednesday, June 11, 2008

Documentation in Mozilla

Having worked in the depths of poorly documented, just plain undocumented, or, worse yet, misdocumented code, I have started taking some initiative on documenting code. Working with db48x, we have improved some of Mozilla's documentation framework (run with make documentation). I'm still polishing the fine edges of bug 433206, but what's in there should be sufficient to make spiffy documentation. The other important component of the fix comes from doxygen bug 535379, a simple fix that handles Mozilla's IDLs better.

There's still more to go. There should probably be an official documentation guide for mozilla or at least the components. Someone patching up SVG and dot in doxygen would be helpful, especially the annoying URI mistake.

But the important part is how to document code. At the moment, the class list is provided as a table five entries wide and several hundred lines long, containing every IDL file and all exported headers. Wondering how to do some IO foo, but don't know where to look? Right now, your only choice is to go through this entire list, guessing at names that would produce the right magic. Ideally, the documentation would be split into separate modules that make searching easier. Before I make a commitment, though, I need to investigate how namespaces interact with doxygen for best results.

So, the important question is basic documentation. Doxygen's manual is a good starting point, but I'll go over the basics here. Documentation is signified by, alternatively, /**, /*!, ///, or //!; appending a < to any of these (/**<, /*!<, ///<, //!<) gives post-documentation, which documents the member that precedes the comment. A comment consists of a brief description (one sentence, ended by a period), followed by potentially several paragraphs of almost-HTML code (not all HTML tags are supported). Interspersed, though typically at the end, are doxygen tags, denoted by your preference of @ or \ (the majority of code uses @, just to warn you).

To describe all the tags would be arduous and pointless. Common ones are exception (for the nsresult values), note, param, and return; see should probably be more common as well. Links to other documentation can be generated by providing the fully-qualified member name, e.g., nsMsgFolderFlags::Directory. Code can be further grouped by using the name tag and @{...@}. The latter signifies a group; one can also use it to distribute a single comment across multiple members.

More advanced documentation features that might be helpful: lists (you can use HTML tags, or -, -#, and indentation, to represent unordered lists, ordered lists, and nested lists, respectively). Formulas can be specified in LaTeX format if you really need them. Message sequence charts and generic dot diagrams can also be generated, in addition to the ones doxygen generates for you. But the documentation pages can never be better than the sources from which they are derived...

Saturday, May 31, 2008

Updates

So, as May draws to a close, I have some updates on my mozilla work. First off is the pain I have been enduring as I work on listarchive. As of right now, I can create an account through the account wizard, and have it show up in the folder list without anything blowing up (which is not trivial). It turns out that the folder was not being initialized through RDF because I created it with a direct createInstance instead of retrieving it from the RDF service. With a few more functions, I could probably avoid the RDF service, but until jminta finishes his war on RDF, it's just as well to use RDF and have things work magically. Of course, listarchive still requires some modification of sources to work, and don't even think about asking me to port it to Thunderbird 2; it's bad enough on trunk!

In addition, I have put some quality time in with my documentation of creating new account types. A lot more exists in my local work than exists on said page in part because I want to have working prototypes before committing to a guide, and in part because I need to make sure I'm not relying on deep-level unreliable code before posting.

Part 3 of my work concerns documentation. If you rely heavily on mxr, you may have noticed that over the past week or so, some idl files got some nifty diagrams. These diagrams are created by doxygen (db48x is so nicely hosting them). You can generate these diagrams and doxygen documentation by typing make documentation from the root makefile. It is helpful to include this update to mozilla, and this update to doxygen. The mozilla update squelches some problems and changes some options to make documentation look better. The doxygen update makes doxygen handle mozilla's IDL files better. If you want more tips on documentation, this site is a good starting place. (Writing some mozilla documentation guides is creeping up on my todo list as well.)

Various other small things: I've been working on the address book rewrite some more. I have some updates to fakeserver related to writing tests for old news subscription bugs, plus fixes for bugs in code that was never tested. I'm also updating morkreader and doing some perf analyses of de-morking panacea.dat. I've even found time to start maintaining an account manager rewrite proposal.

Break time's over, here we go!

Sunday, May 11, 2008

Visual Studio and Mozilla

Visual Studio and Mozilla, sitting in a tree, K-I-S-S-I-N-G… Okay, maybe they're not kissing, but I'm getting them to like each other a whole lot more. I have a quick-and-dirty script that creates the filter list for the .vcproj from the mozilla directory. It appears, however, that mozilla has something on the order of 23000 .cpp, .c, .idl, and .h files, which makes loading the class viewer a bit slow. It also takes quite a bit of time to load the project in the first place.

Right now, my script only loads .cpp, .c, .idl, and .h files into the project, and does so dumbly. Version 1.0 will probably read the Makefile.in files to determine which directories we shouldn't look into, based on a simple heuristic of GNU_CXX being undefined and OS_ARCH equalling WINNT; I may make some broad-based assumptions about platforms as well (e.g., don't include MOZ_THUNDERBIRD code unless we're building Thunderbird). It will also prune "empty" directories; in this regard, perhaps generating the folder tree from the output would be easier.
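Here is a sketch of the "generate the folder tree from the output" approach, in Python rather than whatever the real script uses: walk the tree, keep only the four source extensions, and build the filter hierarchy from the surviving paths, so empty directories never appear in the first place. The helper names are hypothetical, and the `<Filter Name=...>`/`<File RelativePath=...>` elements follow the Visual Studio 2005/2008 .vcproj layout as I understand it; compare against a project file Visual Studio writes itself before relying on it. The Makefile.in heuristics are not attempted.

```python
import os
from typing import Dict, Iterable, Iterator, List
from xml.sax.saxutils import quoteattr

SOURCE_EXTS = {".cpp", ".c", ".idl", ".h"}


def collect(root: str) -> Iterator[str]:
    """Yield every file under `root` as a path relative to `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.relpath(os.path.join(dirpath, name), root)


def build_tree(paths: Iterable[str]) -> Dict:
    """Nest source files by directory. Folders are only created on the way
    to a source file, so 'empty' directories are pruned automatically."""
    tree: Dict = {"dirs": {}, "files": []}
    for path in paths:
        if os.path.splitext(path)[1] not in SOURCE_EXTS:
            continue
        node = tree
        *dirs, _name = path.replace("\\", "/").split("/")
        for d in dirs:
            node = node["dirs"].setdefault(d, {"dirs": {}, "files": []})
        node["files"].append(path)
    return tree


def emit_filters(node: Dict, depth: int = 0) -> List[str]:
    """Render the tree as nested .vcproj <Filter>/<File> elements."""
    pad = "\t" * depth
    out: List[str] = []
    for name in sorted(node["dirs"]):
        out.append(f"{pad}<Filter Name={quoteattr(name)}>")
        out.extend(emit_filters(node["dirs"][name], depth + 1))
        out.append(f"{pad}</Filter>")
    for path in sorted(node["files"]):
        out.append(f"{pad}<File RelativePath={quoteattr(path)} />")
    return out
```

Something like `print("\n".join(emit_filters(build_tree(collect("mozilla")))))` would print the nested filters for a checkout in `./mozilla`, ready to paste inside the project's `<Files>` element.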

Still to do is to set up options such that clicking "Build" or "Debug" does The Right Thing™. I'd also like to investigate some performance problems, but that's a long way off. Eventually, I'd like to hook up the project file to a customized add-in that adds mercurial checkout support and adds/removes files as they are added/removed in the hg repo (but it won't run those commands: I'm assuming people are using shared-source trees here).

Announcing Start of Listarchive!

Last night, I started work on the first of my triad of extensions detailed in this earlier post, listarchive. So I can't come up with a good name. The most up-to-date information will be available at this web page, but I will summarize some key points here.

listarchive is an extension that first and foremost aims to provide sane access to mailing list archives as if they were a simple folder in Thunderbird (and probably Seamonkey as well). From a certain user request, it is likely I will touch on other aspects of mailing lists as well in this extension, e.g., the desired "Reply to List" feature. In addition, however, I will also be using my experience with developing this extension to write a guide to developing extensions involving more complex operations in Thunderbird.

The extension, however, is still very much in the design phase. I would very much welcome any feedback sent to Pidgeot18+listarchive@gmail.com (the "+listarchive" component helps me triage replies). Specific pieces of information I am looking for at this point:

  • Mailing list archive URIs; I ask in advance that you not send any more mailman URIs (unless they happen to be in a language other than English), as I have several already. I am most interested in accumulating a diverse supply of mailing list implementations and international versions to best determine what role internationalization plays in list archives.
  • Ways in which you think a mailing list archive should tie into regular email. For example, I believe it should translate a mailing list URI reference (e.g., http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-May/014646.html) into a link to open the message up as a TB email message.
  • Any questions or suggestions for UI (this comes more into play once I have some prototypes working).
  • Any other question you may have (related to listarchive). I'm quite willing to answer them.
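As a sketch of the URI-translation idea, the hypothetical helper below recognizes Mailman pipermail archive URLs like the whatwg one above and pulls out the list name, archive period, and message number, which an extension could then map onto a message in a virtual folder. The path layout is inferred from that single example URL, so treat it as an assumption rather than a description of every Mailman installation.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class ArchiveRef(NamedTuple):
    host: str
    list_name: str
    period: str   # archive volume, e.g. "2008-May"
    number: int   # message number within that volume


# Assumed pipermail layout: /pipermail/<list>/<period>/<number>.html
_PIPERMAIL_PATH = re.compile(r"^/pipermail/([^/]+)/([^/]+)/(\d+)\.html$")


def parse_pipermail(url: str) -> Optional[ArchiveRef]:
    parts = urlparse(url)
    match = _PIPERMAIL_PATH.match(parts.path)
    if match is None:
        return None  # not a pipermail message URL; leave the link alone
    list_name, period, number = match.groups()
    return ArchiveRef(parts.netloc, list_name, period, int(number))
```

For the example URL, this yields host `lists.whatwg.org`, list `whatwg-whatwg.org`, period `2008-May`, and message 14646. Other archive software would need its own pattern, which is exactly why a diverse set of sample URIs is useful.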

Thanks in advance for any comments you send me. I'm also hoping that this will work out well (no reason it shouldn't).

Monday, May 5, 2008

An Update on Life

So I am quite happy today—then again, I'm always happy, but I'm happy for a specific reason today. Over the weekend, I finally got my new laptop. Now, it's not about to start running Crysis anytime soon, but it's quite good for a mid-range laptop, especially now that I'm determined to do things right.

Step one was to install Linux; I use Debian more for historical reasons than anything else, but it works. This time, I didn't manage to accidentally corrupt the Windows NTFS partition and force a reinstallation, nor did I try to netinstall over a wireless connection. However, wireless has proved as daunting a task as it did the last time I tried. After much finessing and hunting everywhere for information, the irksome driver finally, and rather randomly, started working under ndiswrapper today. It still requires a lot of manual intervention, but at least I know how to get it to work.

Also fixed today were a few irksome problems. My resolution is now back up at a comfortable 1280x800 instead of an annoyingly fat 1024x768. The mouse is now configured to give me a middle click when I press the left and right buttons together, a boon for the touchpad. Finally, the most irksome of my problems was the inability to use <Alt>-# to switch screens in irssi; even that's been fixed.

I don't think I did too badly on those two exams I took today that I should have been studying for instead of setting up my laptop, so that's reassuring. Mozilla's building fine right now (I'm waiting for libgklayout.so to link so I can laugh in its face!); scp'ing my data from my old machine seems to be progressing smoothly.

That said, I've still got things to do. I'm planning on finally getting around to requesting CVS privs today. fluxbox still demands some configuration, and lying down on the carpet for lack of desk space is beginning to aggravate my elbows. Hey, libgklayout.so finished building, and I didn't even notice it. There's also the Vista side of things to shore up. So far, the only good thing about Vista that I can find is that it comes with better games than XP. I haven't tested the games I've actually paid for yet, though...

Sunday, April 27, 2008

Software Bloat

Although a belated response, this is an attempt to respond to some of the ideas presented in this newsgroup message, a few other ideas hinted at in subsequent postings, and some not publicly espoused.

One of the great culprits of modern computer society is the idea of software bloat: that an application has grown too large to be effective. Microsoft Windows and Office are regularly accused of this sin, but even Thunderbird and Firefox are accused; I'm sure that one could dredge up opinions as to why the Linux kernel or X is too bloated. But are these terms fairly applied?

An adage holds that 80% of users use only 20% of the features, but the problem is that the 20% in use is different for each user. I personally don't use mail-merge in office applications at all, but I tend to heavily use the advanced outlining options, for example. Does this mean that the mail-merge feature should be stripped? What about macros, evil for most of the public but invaluable for those who heavily rely on them?

I recall reading Slashdot's response to the awesomebar in Firefox. The average response consisted of yelling about why it was such a bad idea to include it, most likely by those who had never used it. I had the same response at first, but quickly found it invaluable: I can go to a specific bugzilla bug by typing the bug number, or head to RFC 3977 by typing just that. And for that site where I remember the title but not the freaking URL, well, that explains itself. In fact, I now get annoyed when I realize that I am not using the awesomebar, because the regular URL bar is so cumbersome. So is the awesomebar bloat? To non-users, yes, but to users, no.

A more specific example to me is Thunderbird. The aforementioned post consisted of a rant (I'm not going to glorify it) as to why Thunderbird should stick to email and only email, no RSS, no calendar, no NNTP, no address book, no... half of it. Would it be better? The RSS component probably detracts from Thunderbird by being half-implemented (I will slyly add in that news has had even worse problems, and few call for that to be stripped out unless prompted), but would one complain if it was well-implemented? I think not. And there are strong reasons, too.

But there is no clear line to draw when excluding features. News is a worthwhile component to include in an email reader: it is quite close to IMAP in many regards. If one includes news support, why not include uuencode? It's useful for multinational stuff or alt.binaries. Why not then yEnc? Combine-and-decode? X-Face? Advanced message scoring? Feature XYZ? At some point, someone can call it all "bloat", but where is the line you need to draw that makes it not bloat?

What's the alternative to bloat? Look at the ultimate example of bloat, Windows; the non-bloated alternative is Linux (Mac is somewhere in between). You have several varieties of Vista, but hundreds of Linux distributions: Debian, Ubuntu, Gentoo, Slackware, OpenSUSE, Red Hat, Fedora Core, etc. Your desktop? The big ones are GNOME or KDE, but FVWM, Fluxbox, Blackbox, XFCE, and many more exist; cross-interoperability, especially between GNOME and KDE, is not close to what Windows provides. In place of a broken one-size-fits-all, we have a multitude of alternatives which often do a poor job of talking to each other.

The answer, many claim, is in the idea of pluggability. But, here too, the line is very blurry. At what point do you say that extension XYZ should be included into the core? This is what happened with RSS: it first existed as an extension and was later integrated. Although not well-versed in Firefox history, I would be willing to posit that extensions there too have similarly become incorporated. The net result is that more plugins are included into the core until someone, once again, screams about the bloat, forks, and creates a leaner version, which then becomes just as bloated, ad infinitum.

So, what can one do about bloat? In my opinion, the true, final answer is to let the user decide for him or herself what features are needed. Being able to easily compile the source code with options to turn off feature X goes a long way to this. Increasing modularity helps, and, perhaps most importantly, the biggest single help would be to have true, open, universally-supported standards, not just for online communication, but through interprocess and interplatform communications in all regards (i.e., standards closer to an ODF specification than to XML).

Sunday, April 6, 2008

The Great Addressbook Rewrite

As I write this, bug 413260's first part is almost complete. At a diff of -2639/+1747 (47 files changed), the first portion is quite large. That said, substantive changes take place in only two files, accounting for over half the patch. A small portion is given over to the removal of nsIAbMDBCard, while the rest of the patch centers on merely changing the access points. Its size is mostly due to the large number of places it is accessed, not because the feature set is complex.

My original plan for implementation called for three large patches, with errata in various other patches. The nsIAbCard work is relatively self-contained and atomic, which is how I came up with the plan. However, the startling results from attempting nsIAbDirectory refactoring make this idea infeasible.

My goal was to remove as much of nsIAbMDBDirectory as possible, which would require invalidating nsIAddrDatabase. That requires major changes to import and palmsync code. Throw in the difficulty of going from a file to an address book directory or a database to a directory and you get some hair-raising complexity. Then add in the fact that mailing lists are, well, black magic, and this process becomes especially complex. And large: my latest WIP touches some 68 files with a diff of a mere -693/+836.

What causes the sheer magnitude of difference in complexity? A card is quite lightweight: it is essentially a map of properties with a bit of extra stuff. There is therefore only one implementation of a card, with five sub-implementations that refine it somewhat. A directory is the opposite: a heavyweight object with 4-6 (depending on how you count them) different implementations, sharing little meat among them. Mailing lists are black magic: they pop into existence when you need them, and creating one requires some non-intuitive steps.

So, what's my new plan? Well, lesson #1 is to stave off nsIAddrDatabase-gutting until after mailing lists are done. The next steps will be more atomic: the creation of nsIAbCollection and the involved refactoring come first (probably split up into two or three patches), followed by mailing list sanity with nsIAbGroup (again, maybe a few patches itself). Only afterwards will I remove nsIAddrDatabase from import and palmsync (probably in two batches). Trivial removal of nsIAbMDBDirectory (i.e., where it is used for cardForEmail) will probably be in one of the first directory patches.

Of course, some of the more ambitious changes will need tests, but it looks like hwaara is being nice and writing tests for import (see bug 421050).

Monday, March 31, 2008

Pluggability

Pluggability is an area where there is often still much to explore. A fair amount of the mailnews work involves this area. The address book rewrite (bug 413260) is designed primarily to refactor the interfaces to make adding the SQLite backend (bug 382876) easier: more code assumes mork than is healthy. My current WIPs have limited this code to portions within addrbook, palmsync, and parts of import (i.e., not much at all). Not only is the SQLite backend going to benefit, but LDAP will in part too (although I have not implemented enough of its methods), and probably Outlook and OS X directories as well.

Another large area where mailnews could use a pluggable interface is storage. In my last post, I discussed my thoughts on storage. To clarify one point: I am not against pluggable storage APIs or even implementations of most of those presented; it is just my opinion that mbox should probably be the default in lieu of a better choice.

A third area is the account manager. As far as I can tell, the only thing everyone can agree upon is that it does not work well as it stands. Try creating a new account type and you'll see what I mean. Killing RDF will help in part, but the current consensus is to hold off on a rewrite in this area until after Thunderbird 3.

To keep this post short, I'll omit details in the other areas where we could use pluggability. The view pane could use it at least in part (see this site for more information). Supporting synchronization with mobile devices (several bugs exist here) would need one, if we want to go beyond palmsync.

Thursday, March 27, 2008

Mail storage

There are few things in the world that are universally agreed upon. Mail storage is not one of them: many people say that mbox is a poor format and would rather have some other form of mail storage. The suggestions I've seen include maildir (qmail-style or other), database storage, creating a false filesystem, or using IMAP and shunting the storage problem off to somebody else. Most of these have their own problems.

So why is mbox such a bad format? Supposedly, it doesn't scale. An mbox measuring a few GB causes problems simply because it's a large file. There's also the tricky problem of deleting: an in-place one-byte change is cheap, and so is appending, but midfile deletion or insertion is expensive, because everything after that point has to be rewritten.

In contrast, people sing the praises of maildir: by using one file per message, deletion is cheap. But there are hidden costs. Stat-ing a directory to find new or deleted messages is relatively expensive. Also, modern filesystems attach metadata to each file. 1KB of metadata is not noticeable on a 1GB file, but 1KB of metadata for each of 50,000 files is 50MB, which can be noticeable.
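A back-of-the-envelope cost model for the two trade-offs above, as a sketch under simplifying assumptions (in-place deletion with no free-space tracking, and a fixed metadata cost per file). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Deleting message `index` from an mbox in place means shifting every byte
// that follows it, so the cost is the total size of the messages after it.
// Appending, by contrast, touches only the end of the file.
std::uint64_t mboxBytesRewrittenOnDelete(const std::vector<std::uint64_t>& sizes,
                                         std::size_t index) {
  assert(index < sizes.size());
  auto next = sizes.begin() + static_cast<std::ptrdiff_t>(index) + 1;
  return std::accumulate(next, sizes.end(), std::uint64_t{0});
}

// With one file per message, deleting is cheap, but every file carries a
// fixed amount of filesystem metadata.
std::uint64_t maildirMetadataBytes(std::uint64_t messageCount,
                                   std::uint64_t metadataBytesPerFile) {
  return messageCount * metadataBytesPerFile;
}
```

With the figures above, maildirMetadataBytes(50000, 1000) is 50,000,000 bytes, roughly 50MB, while deleting the first of four messages of 100, 200, 300, and 400 bytes from an mbox means rewriting the 900 bytes after it.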

Using databases for mail storage? Yes, people have suggested it (bug 361087), and one even has the gall to request it as blocking-thunderbird3. (Point of order: I would probably reject maildir as blocking, and even for pluggable storage APIs I would only go so far as to say wanted.) The basic reason cited for doing so is that "databases... are very stable and robust." Note, however, that mboxes are older, more stable, and more robust in theory and probably in practice too. And scalability? Exactly the same problems as mbox, only slightly exacerbated (there will probably be more indexes).

The second-to-last option (a false filesystem) has problems of its own. From the comments I read, it would appear to force Mozilla to carry along another lib*** implementation that I suspect is ill-tested. I also suspect that no one has tried (at least very hard) to port it to Windows, and that it has the same scalability flaws (although, to be fair, the argument for it is that "individual mail storage is [not] the job of the MUA anymore").

So where are we? The primary argument against mbox is that it scales poorly. Yet all of the other suggested replacements suffer the same problems, manifested in different ways. Echoing Churchill's comment on democracy, mbox is the worst mail storage format except for all the others. It actually has a lot going for it: it's simple and universal, more than the others can claim.

If you really want to fix scalability, there are two options. Option 1: don't keep gigabytes of mail. I may accumulate 100 MB of mail in a year (half of it spam, actually), but I clean my mail out at least yearly to prune outdated conversations. Option 2: keep your folders small. Mailing list archives typically start a new archive each month by default, which keeps any one archive from getting large.

Wednesday, March 19, 2008

A blizzard of updates, part 2

Yesterday ended the second day of blizzard updates. Today I'm attending a hardware setup party, so I won't have a full day of blizzards. What I did do yesterday:

More on nsIAbCard
I now have a patch once again requesting review in this area. nsAbOSXCard drove me mad when I was going back over my changes. Something about aMember.Equals(aValue) didn't seem like it made sense; it turns out that it wasn't quite right, because the meaning of aMember had been changed ever so slightly.
nsIAbDirectory
I started looking into replacing more of nsIAddrDatabase in import, and immediately backed off. Crazy stuff happens there, so I'll need a few hours to work on that without distraction. I've also hooked up nsIAbCollection and am slowly starting to make it work right. Keyword: slowly.
nsNNTPProtocol
Getting the list of newsgroups is somewhat confusing, as it launches some callbacks to avoid hanging the UI thread. Life will be so much better when protocols go async.
libmime
I didn't spend as much time on this as I expected to, but the initial forays look promising. For code with its own custom C++-like object system, the implementation is amazingly simple, so an automated rewrite should go smoothly. A naming scheme is rigidly enforced for the class/object hierarchy, and connecting the function implementations to the defined virtual function pointers looks trivial. Finding the inheritance list is simple, especially because it doesn't use multiple inheritance. The hardest part, though, is constructors. They may need to be converted manually.
MorkReader
My biggest announcement is the one I'm saving until the end. I've started work on MorkReader.cpp, after spending a few hours going through the morkThumb, morkParser, and morkBuilder code to see how the most complete implementation of mork sees a file. Did you know that db/mork doesn't actually fully implement the mork specification? Also, groups.google.com appears to have spotty records of old netscape.mozilla.public.mail-news postings (I need the ones from late 1998/early 1999), and news.mozilla.org only goes back to 2003. Fun times...
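For readers who haven't looked at libmime, here is a heavily simplified, hypothetical illustration of why an automated rewrite looks feasible: a C-style class is a struct of function pointers plus a superclass pointer, which maps almost one-to-one onto C++ virtual functions and a base class. None of the names below (ToyObjectClass, kLeafClass, and so on) are libmime's actual declarations.

```cpp
#include <cassert>
#include <string>

// C-style "object system": each class is a table of function pointers
// plus a pointer to its (single) superclass.
struct ToyObjectClass;
struct ToyObject {
  const ToyObjectClass* clazz;
};
struct ToyObjectClass {
  const char* className;
  const ToyObjectClass* superclass;  // single inheritance only
  std::string (*describe)(const ToyObject*);
};

static std::string baseDescribe(const ToyObject*) { return "base"; }
static const ToyObjectClass kBaseClass = {"Base", nullptr, baseDescribe};

// "Calling super" means calling through the parent class's table.
static std::string leafDescribe(const ToyObject* obj) {
  return "leaf<" + kBaseClass.describe(obj) + ">";
}
static const ToyObjectClass kLeafClass = {"Leaf", &kBaseClass, leafDescribe};

// The mechanical C++ translation: the function-pointer slot becomes a
// virtual function, and the superclass pointer becomes a base class.
class Base {
public:
  virtual ~Base() = default;
  virtual std::string describe() const { return "base"; }
};
class Leaf : public Base {
public:
  std::string describe() const override {
    return "leaf<" + Base::describe() + ">";
  }
};
```

Constructors are deliberately left out of this sketch; they are the part that does not map this mechanically.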

And for today? A hardware party, followed by some more work on MorkReader. Hopefully with ample help from David Bienvenu.

Monday, March 17, 2008

A blizzard of updates, part 1

So it's the conclusion of the first day of my fun-filled week, and I've managed to have 3 updates today, as well as another 2 over the weekend. Here's the list:

nsIAbCard sanity
(bug 413260, pt. 1) Okay, this was done over the weekend, but still. Building on my previous patch, I converted some property strings to UTF-16 instead of their original UTF-8, and did a bit more general cleanup in the vicinity.
nsIAbDirectory sanity
(bug 413260, pt. 2) Once again, over the weekend. I finished yanking MDB-specific stuff out of everything outside the extended address book trio (addrbook, import, extensions/palmsync). In addition, I put some time into cleaning up the interface to the new code.
bug 400331
On Sunday, I finally got around to opening this back up. My list of preapproved changes has grown to include authentication code and newsrc. In my personal builds, it no longer warns on gcc 4.2, and I'm slowly improving things on gcc 4.3 (I won't even try until I get PRInt32 converted to nsresult). Having received permission from David Bienvenu to change the code significantly, even to the point where cvsblame becomes unhelpful, I've started moving functions around to make it easier to understand how things work. Finally, I'm trimming the size of the class (only trivially right now), and I'm slowly dismantling the horror that is SendFirstNNTPCommand. It will be a long time before I get to the higher-level logical structure.
bug 11054
Now that all the other code tripping me up has been finished, I returned to this for the first time in at least a month. I quickly discovered that Neil had added a bit of code that cascaded into über-failure and heavy database mangling (horrible for news code). While getting things working again (which took a few hours), I discovered that a reasonable assertion becomes hard to sort out quickly once a trivial optimization is applied (not recursing into the children of ignored messages, which means we can't find out how many still need to be corrected), and that the UI code is causing later operations to break in even more annoying ways. I'll have to look into fixing that breakage; I have an extremely easy way to reproduce the error.
bug 418551
(demork in panacea.dat) Mark Banner pushed his profile-directory creation changes partway through yesterday, allowing me to unbitrot this bug and get it working. Note that the patch that is posted needs one change to compile: change the .equals to a .Equals. I blame Java.
Bug triage.
Today I set up an Outlook/Outlook Express parity bug (bug 423488) based on some rather simple queries. Needless to say, the list is a bit long for my tastes (and I was pretty conservative about adding stuff to it!). I've also started getting updates on news bugs, just as I now get updates on mailnews database bugs. Expect a mass QA-reassign shortly!
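As a sketch of what "dismantling" a monolithic command function can look like (this is not the actual nsNNTPProtocol code), each case moves into its own small function and the top-level function only dispatches. RequestKind, Request, and the builder functions are hypothetical; only the NNTP command names LIST, GROUP, and ARTICLE are real.

```cpp
#include <cassert>
#include <string>

// Hypothetical request kinds; the real protocol state is far richer.
enum class RequestKind { ListGroups, SelectGroup, FetchArticle };

struct Request {
  RequestKind kind;
  std::string argument;  // group name or message-id, depending on kind
};

// Each case gets its own small, documentable function.
static std::string buildListCommand(const Request&) { return "LIST\r\n"; }
static std::string buildGroupCommand(const Request& r) {
  return "GROUP " + r.argument + "\r\n";
}
static std::string buildArticleCommand(const Request& r) {
  return "ARTICLE " + r.argument + "\r\n";
}

// The top-level function now only decides which builder to call.
std::string buildFirstCommand(const Request& r) {
  switch (r.kind) {
    case RequestKind::ListGroups:   return buildListCommand(r);
    case RequestKind::SelectGroup:  return buildGroupCommand(r);
    case RequestKind::FetchArticle: return buildArticleCommand(r);
  }
  return {};  // not reached for valid RequestKind values
}
```

Splitting things this way also makes it easier to document "what happens" per case, which is one of the goals listed for the rewrite.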

Long list for day one, isn't it? Tomorrow, I hope to look into automating a rewrite of libmime, push out new changes to bug 413260 given Mark's updated interfaces, and start writing the new morkreader I need for bugs 382876 and 11050.

One more thing: I'm starting to compile feedback on news server implementations for some basic NNTP planning. I've heard back from Giganews, which says that RFC 3977 is not on their list right now and that they support LIST OVERVIEW.FMT for the list of XHDR headers. INN is open source, so I can see what it supports. I'm still waiting to hear from Tornado/Typhoon/whatever the heck the right news server is. I've picked these server implementations because they represent the three news servers I use: news.mozilla.org, news.aioe.org, and news.verizon.net, respectively.
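Since LIST OVERVIEW.FMT came up, here is a minimal sketch of parsing its response body, assuming the format described in RFC 3977: one field per line (header names such as Subject:, metadata items such as :bytes), an optional full suffix on headers whose names are included in the overview data, and a terminating line containing a single dot. The status line is assumed to have been consumed already; dot-unstuffing and case-insensitive matching of "full" are omitted for brevity. OverviewField and parseOverviewFmt are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct OverviewField {
  std::string name;   // e.g. "Subject:" or ":bytes"
  bool full = false;  // true for entries such as "Xref:full"
};

// Parses the lines after the "215" status line, stopping at the "." line.
std::vector<OverviewField> parseOverviewFmt(const std::string& body) {
  const std::string suffix = ":full";
  std::vector<OverviewField> fields;
  std::istringstream in(body);
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // CRLF
    if (line == ".") break;
    if (line.empty()) continue;
    OverviewField field;
    if (line.size() > suffix.size() &&
        line.compare(line.size() - suffix.size(), suffix.size(), suffix) == 0) {
      // Keep the header name including its colon, e.g. "Xref:".
      field.name = line.substr(0, line.size() - suffix.size() + 1);
      field.full = true;
    } else {
      field.name = line;
    }
    fields.push_back(field);
  }
  return fields;
}
```

The resulting list tells a client which headers are already present in overview data and which ones it would have to fetch separately.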

A fun-filled week

Having a nice long week without needing to do much else, I'll be putting in some quality time on Mozilla this week. Of my seven or so tasks, I hope to post at least updated patches for all of them. No, I am not committing suicide.

Bug 413260
This is the address book rewrite, one of the to-be-core features of TB 3. Hopefully, part 1 (the nsIAbCard rewrite) will be committed by the end of the week and part 2 (the nsIAbDirectory rewrite) will be in review. The third part (mailing list sanification) should be posted, at least in part. I'm not going to work on the hypothetical part 4: implementing some of the functions for LDAP.
Demorkification
De-morking panacea.dat in favor of msgFolderCache.sqlite will hopefully be complete and committed this week as well. After that, I'm going to start work on creating a better mork reader (nsMorkReader is insufficient to handle a .mab or .msf file), which blocks completion of bugs 382876 and 11050 (address book and message database, respectively).
Fakeserver implementation
I hope to put more flesh on fakeserver this week, since I should finally have time to figure out how to set up an account, which is what's blocking my work.
Libmime rewrite
With any luck, I should have some time to write some Dehydra and Elsa scripts that will process libmime and infect it with the C++ virus. This should finally let people approach libmime, hack on it, and bring it into the 21st century.
nsNNTPProtocol
To say I'm going over it with a fine-tooth comb is an understatement. I've expanded the scope of the rewrite to include whitespace updates, removal of accumulated cruft, function reordering for logical coherency, breaking up SendFirstNNTPCommand for clarity, documentation of what happens, identifying places where code should be updated, and shrinking the size of the class. I do not want to know what sizeof(nsNNTPProtocol) is right now; it's that large.
NNTP/Usenet wins
A ROT-13 implementation and LIST PRETTYNAMES/LIST NEWSGROUPS support are two low-risk, just-needs-UI wins. Filter-after-the-fact is a medium-risk win. Spam detection and combine-and-decode (or other multipart handling) are high-risk, high-value wins (imagine the elation of alt.binaries users or sci.math users). Bug 176238 is instructive for the full list.
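ROT-13 really is a low-risk win on the backend side. A minimal C++ sketch fits in a few lines, leaving the UI as the real work; it assumes only ASCII letters are rotated, which is an assumption about how a real implementation would treat other characters.

```cpp
#include <cassert>
#include <string>

// ROT-13: rotate ASCII letters by 13 places, leaving everything else alone.
std::string rot13(std::string text) {
  for (char& c : text) {
    if (c >= 'a' && c <= 'z')
      c = static_cast<char>('a' + (c - 'a' + 13) % 26);
    else if (c >= 'A' && c <= 'Z')
      c = static_cast<char>('A' + (c - 'A' + 13) % 26);
  }
  return text;
}
```

Because 13 is half of 26, rot13(rot13(s)) == s, so the same function both encodes and decodes.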

Time for less yapping and more coding!

Wednesday, February 20, 2008

Mork is evil, but...

I decided today to finally start poking around the Mork reader code that was introduced to import history data, since the plan is to use it to migrate from mork to SQLite. I had read enough earlier to know that I would have to look into having morkreader handle multiple tables, but what I saw was just astounding. You see, morkreader is essentially a 580-line hack.

Don't get me wrong, I have nothing against hacks. My patch for fixing searching in base64-encoded messages was quite hacky as well. But a few things justify the hack. First, the legacy code needed some pretty severe refactoring to handle the recursive nature of MIME. Second, the improperly-handled cases should be rather rare in nature: the simplest way to generate a case is to forward-as-attached a message with a base64-encoded attachment. (I concede: a third reason is that I wrote it, but that pales in importance to the other two, I swear...) Morkreader, however, did not need to work around crufty legacy APIs, nor are its improperly-handled components uncommon.

The first assumption that morkreader makes is that there is only one table (mailnews loves having several tables). Adding support for multiple tables would not be too difficult if the parser weren't already broken in other ways. The second assumption is that no line is longer than 80 characters (which should be safe) and that no line is continued more than once (i.e., no logical line exceeds 160 characters)... which is complete BS, as anyone who has ever subscribed to a mailing list can recognize (think of the References: header). Finally, the code will not handle aborted changesets properly, though how common those are I cannot determine. (Does mailnews code ever use the ! change type?)
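A parser doesn't need a maximum line length or a maximum number of continuations at all. The sketch below joins physical lines into logical lines for any number of continuations, assuming (as in the mork files described here) that a backslash at the end of a line marks a continuation. joinContinuedLines is a hypothetical helper, and it ignores the case of an escaped backslash at the end of a value, which a real parser would have to handle.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Joins physical lines into logical lines. A trailing backslash means
// "continued on the next line"; there is no limit on how many times.
std::vector<std::string> joinContinuedLines(const std::string& text) {
  std::vector<std::string> logical;
  std::istringstream in(text);
  std::string physical;
  std::string current;
  while (std::getline(in, physical)) {
    if (!physical.empty() && physical.back() == '\r') physical.pop_back();
    const bool continues = !physical.empty() && physical.back() == '\\';
    if (continues) physical.pop_back();  // drop the backslash itself
    current += physical;
    if (!continues) {
      logical.push_back(current);
      current.clear();
    }
  }
  if (!current.empty()) logical.push_back(current);  // ended mid-continuation
  return logical;
}
```

A long References: value wrapped across many physical lines simply keeps appending to the same logical line.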

So, to assess the accuracy of morkreader, I read the mork specification. Cross-referencing it with some of my own mork files (a FF 2 history.dat, an abook.mab, and an inbox .msf file), I discovered that the specification itself is inaccurate. Again, there are two failings here: (atomScope=c) should be (a=c), and the spec implies that -[...] is the proper way to remove a row, whereas [-...] is the actual method. A subtle statement in the spec also says that a + can be omitted from changesets.
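The row-removal syntax is easy to get wrong, so here is a small sketch of classifying a row by the character after its opening bracket, following the observations above: [-...] removes a row, and an ordinary add may omit the +. The handling of ! is a placeholder assumption, since its semantics aren't covered here, and the row contents in the tests are dummies rather than real mork ids. RowChange and classifyRow are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Kinds of row changes distinguished by this sketch.
enum class RowChange { Add, Cut, Bang, NotARow };

// Classifies a row token such as "[-1]" by the character after '['.
// Removal is "[-...]"; a leading "-[" is not treated as a row at all.
// An ordinary add may omit the '+'. What '!' means is not modeled here.
RowChange classifyRow(const std::string& token) {
  if (token.size() < 2 || token[0] != '[') return RowChange::NotARow;
  switch (token[1]) {
    case '-': return RowChange::Cut;
    case '!': return RowChange::Bang;
    default:  return RowChange::Add;  // explicit '+' or '+' omitted
  }
}
```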

I had to fix mork to get my first patch for bug 413260 working; it looks like I'll have to fix morkreader as well. Oh well, magic will happen if Friday is as predicted...

Sunday, February 17, 2008

More on rewrites

No one can deny that mailnews needs some rewrites pretty badly. The address book is getting an overhaul right now. Message databases are planned to get a second overhaul soon, and thoughts are starting to fly around about an account manager rewrite as well. RSS is getting one too. Compose, MIME, and news code all need rewrites. Obviously, most are going to miss TB 3. The address book looks set to make it; ditto with kill-RDF; RSS will also probably slide in. Everything else gets to wait for TB 4, or even later.

As I have mentioned before, I am in the midst of rewriting the address book. The ultimate goal is to replace mork with mozStorage, but the interfaces are a large barrier to doing that. So bug 382876 is blocked by bug 413260. No sane person would put all of the changes into one patch, though; it's just too many. I therefore expect bug 413260 to have three or four patches, each fixing up one part of the story. And these are not going to be small by any stretch of the imagination: the first part alone is -2000/+1000 lines of code, and all that does is modify nsIAbCard.

The second and third parts of bug 413260 involve two more interfaces. The second part will implement the new nsIAbDirectory, which involves cleaning up usages of nsIAbMDBDirectory and nsIAddrDatabase; I expect that to end up with at least 1000 lines of changes. The third part is to clean up the mailing list mess; this change is, in my opinion, the most important change of the interface setup. Finally comes the maybe-fourth part: implementing the refactored changes in the LDAP code.

After getting three or four large patches for bug 413260 committed comes the large patch for bug 382876, which needs some modifications to morkreader as well (which I hopefully won't have to write!). Finishing that allows me to start on message databases. It looks as if some of the ideas surrounding bug 11050 won't be touched until TB 4, simply to avoid overloading people with so many rewrites in such a short time.

Finally comes the rest of the slew of rewrites. jminta is so kindly doing kill-RDF. Other people are working on the RSS changes; I haven't used RSS in my trunk builds yet, so I can't evaluate any changes since 2.0, nor will I likely do so for some time. Rewriting news code is a nice distraction when I'm frustrated at other code; however, I am waiting for permission to really axe large chunks of it before doing serious work on it. Compose and MIME get no love at the moment. And the account manager has to wait for agreement before it gets its rewrite: the most people can agree on at this point is that "it needs to change." And so life continues...

Monday, February 11, 2008

Anatomy of a Refactoring

The first part of bug 413260, refactoring nsIAbCard, is finally starting the review process, freeing me up to start on part two, nsIAbMDBDirectory. The goal here is to remove this heavily-used interface. For those of you who are new to large-scale refactorings, here is a simple step-by-step guide.

  1. Pray that only a little JavaScript is involved. As much as people fall in love with JavaScript, I greatly prefer C++. Cases like this prove why: C++ complains at compile time when something is wrong, whereas in JavaScript these simple problems are deferred until actual execution. Sometimes these problems crop up in the most out-of-reach places: one usage of nsIAbCard, unfound by grep, exists in msgHdrViewOverlay.js, one of the last places one would expect to find address book usages.
  2. Fire up grep and find usage characteristics. In the case of nsIAbMDBDirectory, I see that it is used outside of addrbook for two reasons: cardForEmail and getting the database. The former is simple to deal with via a minor refactoring; the latter requires more in-depth analysis.
  3. Mark stuff as deprecated and compile. Note that gcc 4.3 or better is required to catch the most common case (nsCOMPtr stuff) and that, as of right now, my XPIDL/nscore.h patch is needed to mark IDL files as being deprecated. If you're using gcc 4.3, -Wno-conversion is highly recommended.
  4. Ensure that the tests use the new stuff. Tests are the simplest JavaScript to handle, primarily because everything is used. They also alert you to broken migration.
  5. Remove the deprecated stuff and change the remaining JS. Don't forget to test the crap out of it. This step will by far take the longest. Stuff can crop up in weird places; JavaScript analysis is on my list of things to do, but I'm looking at IDL/C++/JS+Mozilla+ctags+vim automagic first.
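Step 3 relies on the compiler to find callers. As an illustration using standard C++ rather than the XPIDL patch mentioned above, the C++14 [[deprecated]] attribute produces the same effect: keep the old method for the transition and let every remaining call site produce a warning. AbDirectorySketch and its methods are hypothetical names, and the string it returns is a dummy card identifier.

```cpp
#include <cassert>
#include <string>

class AbDirectorySketch {
public:
  // New entry point that callers should migrate to.
  std::string cardForEmail(const std::string& email) const {
    return "card:" + email;  // dummy identifier for the sketch
  }

  // Old entry point kept during the transition; each remaining caller
  // gets a compile-time warning pointing at the call site.
  [[deprecated("use cardForEmail() instead")]]
  std::string getCardFromEmail(const std::string& email) const {
    return cardForEmail(email);
  }
};
```

Any remaining call to getCardFromEmail() still compiles, but GCC and Clang report it via -Wdeprecated-declarations, which is enabled by default; once the warnings are gone, the method can be deleted.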

Tuesday, February 5, 2008

Politics and civility

There is one rule in particular I try to keep: to read everything I receive fully and carefully. With three email addresses (comprising a half-dozen mailing lists) that I regularly check, two daily newspapers, one weekly news magazine, eleven newsgroups of varying daily post rates, and too many RSS feeds to count anymore, this is a rule I break more often than I would like, despite spending well over an hour each day reading. The frequency of my blog posting is proof enough of this: I would like to post once every two days, a goal I have already given up on.

To make a long story short, this cutting back of in-depth reading has impacted one of the blogs I read, the Fact Checker for the Washington Post. I mostly skim the article and focus more on the comments these days. And similar to how I transitioned my reading of this blog over its lifetime, the comments have transitioned. Crucially, they have gotten worse as time continues.

It used to be that the comments were thoughtful and pointed out some of the factual errors. Now the comments have turned nasty, with obvious political slants coming out. In the most recent posting (discussing the Republican candidates' repositioning on major issues), the first comment was a strongly anti-Republican one that didn't really relate to the article. The fourth was another slamming comment, again irrelevant. The same goes for the 8th, 11th, 19th, 20th, 21st, and around a third of the comments in general. How many of the rest were the thoughtful, reasoned responses I saw at the beginning? A handful, although many of those were in response to the fringe comments posted earlier.

After shaking my head at this, I turn to one of my newsgroups, sci.math. Recently, a poster going by the alias JSH posted some new material. This poster is not particularly well-liked in the newsgroup because of his reputation for doing shoddy mathematics, inflating claims, and ignoring objections. I am not sufficiently well-versed in the relevant fields to judge the correctness of his mathematics (it looks suspect to me, but that doesn't count for much), but I do know that his refusal to attempt to factor an RSA number with his factoring algorithm casts suspicion on its correctness, and that he also did not reply to some of my requests for clarification.

With respect to this poster, I once looked forward to his posts, not because I was fascinated by the mathematics, but because they usually came with some measure of debate. I found the posts on his return disappointing for much the same reason I was irritated at the comments on the earlier blog: these debates had grown uninteresting. JSH was pontificating without responding, and other people just vehemently skewered him without remorse, as if their entire lives revolved around insulting him as much as possible.

Which brings me, albeit in a roundabout manner (an endemic problem of mine), to my point. It seems that the world at large has grown unable to speak civilly. I have always tried to keep my postings as civil as possible, but it seems that in many replies I look at, the poster made no such attempt. The most egregious violation of civility is in the political arena. Take a group of Democrats with only moderately-held beliefs and a group of Republicans with similarly moderately-held beliefs, and the resulting confrontation will shortly become a physical one without outside intervention. It seems to me that something about politics today has driven people to untenable extremes and is in part the cause of the lack of compromise in today's political world. I just can't see something like the Compromise of 1850 (which staved the American Civil War off for a decade) happening today...

Saturday, February 2, 2008

Changes to come

For the past few weeks, my main work has involved the address book rewrite. And a long job it has been: the patch touches over 30 files with a diff measuring some 1800 or so lines removed and about 1000 or so added (total savings seems to be in the 700s). And blimey, I've only modified one interface (to be fair, it is the most-used interface...). Still, it isn't finished: import and palmsync almost certainly break with this patch; LDAP may as well. Finally, it was recently bitrotted by another patch (given the scope of the changes, bitrot was likely to begin with).

My work was, however, sped along by another change (this one already in the pipeline). I added a deprecated attribute to XPIDL that lets me mark a function as deprecated, rebuild, and see who uses that function. It is not reflected into JS, though (making JS usages as annoying as ever to work with). gcc 4.2 has a problem that makes this useless for virtual functions (essentially making it worthless for XPIDL); gcc 4.3 fixes this, but it is considerably noisier in warnings and doesn't like linking with gcc 4.2 code. Go figure.

Change #3 is still in my conception pipeline. This one is to make "make alltags" a tad more correct. My idea is to pipe only dist/include and dist/idl into ctags (separately, though). The problem here is that XPIDL methods are typically declared through macros such as NS_DECL_NSIABITEM, which makes the tags useless when I need to find the definition of a function. Then there is the other problem: I more often want to go to definitions than declarations. Ramping up my configuration magic in vim may be the way to go here.
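Why the tags come out useless, in miniature: when a header declares methods through a macro, the class body contains only the macro name, and the method names appear only inside the macro definition. DECL_TOY_ITEM and ToyItem below are hypothetical stand-ins for NS_DECL_NSIABITEM and a real implementation class.

```cpp
#include <cassert>
#include <string>

// Stand-in for an XPIDL-generated declaration macro: the method names
// exist only inside this macro body.
#define DECL_TOY_ITEM                  \
  std::string GetDisplayName() const;  \
  void SetDisplayName(const std::string& aName);

class ToyItem {
public:
  DECL_TOY_ITEM  // a tags tool reading the class body sees only this name
private:
  std::string mDisplayName;
};

// The definitions, which are usually what one wants to jump to, live in
// the implementation file rather than the header.
std::string ToyItem::GetDisplayName() const { return mDisplayName; }
void ToyItem::SetDisplayName(const std::string& aName) { mDisplayName = aName; }
```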

Those are all things I have worked on so far. Now comes the stuff I plan to work on. The first on my list (not necessarily the first I will work on) goes back to the account manager. As anyone who has been reading mozilla.dev.apps.thunderbird recently should have discovered by now, the account manager is the source of a fair number of complaints. Between the use of RDF and some confusing UI (especially with regard to RSS), it desperately needs an overhaul. Another problem in the account manager is somewhat harder to see: it is the server manager as well. Using a server without creating an account is impossible, and even with an account, it is difficult.

The second item in this class involves filters. Recently, I came across Sieve and decided to look into it. In short, it is a specification for a mail filtering language. Since it is defined in a series of RFCs (with some draft RFCs covering interaction with mail servers), it would probably be supported elsewhere. The idea is to use Sieve as the filtering backend, which may fix some problems and would definitely open up a few new questions.

Several more things that I have already mentioned weigh in on my pontification list. I've started collecting a list of mailing lists for my webscrape idea, and I have two forums lined up for testing purposes. Continuing work on redesigning my blog is a given. I've discussed de-morkification in my recent posts, and my work on news filter overhauls is still stymied until bug 16913 goes through. Ah well, they'll go through in time...