Friday, January 25, 2008

More fun and games

As is to be expected, I have been spending a fair amount of time working on the address book rewrite recently. Recent developments start with the triumphant announcement that my WIP no longer leaks. Kudos to dbaron for his creation of the leak-fixing screencast that enabled me to find the two problems that were causing leaks. Interestingly enough, leaking just one object seems to quickly leak gobs more objects; my strings were all being leaked from a leaking property bag. Only 8 leaks seem to remain: 3 of nsILocalFile, 3 of nsStringBuffer (related?), and 1 nsVoidArray. I touched neither arrays nor files in my WIP, so these leaks are probably not my fault.

My next announcement is a loathing of mork that I had previously not thought possible. The number of calls needed to construct an iterative property bag was at first off-putting. Then came the annoyances of yarns: some yarn functions manage the initialization themselves, others expect the user to do so. Of course, the documentation doesn't distinguish between these two types. Finally, there seems to be a bug in mork that omits the first cell in a cursor iteration. My complaints are duly noted in comments in my patch.

Announcement number 3 is a kudos to Taras Glek and his Dehydra GCC. It's magic! After he spent an afternoon working with me to get it running (note: disabling GCC's bootstrap helps cut down on build times), I finally got it to the point where I discovered that it didn't handle virtual functions too well. He got that working this morning (!!!), and so I have now generated a listing of every place where the deprecated attributes on nsIAbCard are used in C++ files in mailnews. All 487 of them, all but about 100 of which are located in nsAbCardProperty or nsAddrDatabase.

Also today, I tested out Neil's --disable-static-mail option for Thunderbird, and it works fine enough, so that should become the default for debug builds soon. It also makes building much easier; I should now try and get that into my automagic build script for mozilla.

Update:

The address book refactoring now passes all four address book tests.

Tuesday, January 22, 2008

Adventures with mork

So I spent most of my mailnews time today working on bug 413260, rewriting the address book interfaces for ease of demorkification. The specific hurdle at this point is the nsIAbCard rewrite. A bit of background: nsIAbCard currently has a few dozen different hard-coded properties; the new interfaces call for a property bag-like design. Before today, I actually had the property bag inserted into nsAbCardProperty, with the code dying whenever it tried to use the properties rather than GetCardValue.
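To make the contrast with hard-coded properties concrete, here is a minimal JavaScript sketch of a property bag-like card. It is only an illustration: the names `Card`, `getProperty`, `setProperty`, `deleteProperty`, and `properties`, along with the property names used, are hypothetical and are not taken from the actual patch in bug 413260.

```javascript
// Hypothetical sketch: a card that stores arbitrary named properties
// instead of exposing one hard-coded attribute per field.
class Card {
  constructor() {
    this._props = new Map();
  }

  // Return the stored value, or the caller's default if it is unset.
  getProperty(name, defaultValue) {
    return this._props.has(name) ? this._props.get(name) : defaultValue;
  }

  setProperty(name, value) {
    this._props.set(name, value);
  }

  deleteProperty(name) {
    this._props.delete(name);
  }

  // Enumerate every (name, value) pair, which is what a generic
  // backend needs in order to save a card without knowing its fields.
  *properties() {
    yield* this._props.entries();
  }
}

const card = new Card();
card.setProperty("DisplayName", "Example Person");
card.setProperty("PrimaryEmail", "person@example.com");
```

With a design along these lines, a backend can store a card by walking `properties()` rather than calling a few dozen getters one by one.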

After getting the code working to the point of passing the basic test suite (it leaks like crazy, but, hey, it passes!), I turned my attention to the test_cardForEmail implementation. This involves creating a card from a .mab file. Which means mork. Or, the best way of putting it, an adventure with pain.

Mork's greatest feature is that it is schemaless. I don't find that so nice a feature after dabbling around in nsAddrDatabase a little while. You see, what I want is something that should be simple: to get a (key, value) pair for each cell in a row. Naturally, that involves a cursor-iteration scheme. Simple, right?

The cursor gives me a pointer to the "cell" that the value resides in. To get the string value, I have to go through some hoops with yarns and whatnot (no nice "get as a string" function is provided). But I also need the key, which the cell doesn't point me to. The cursor can give you a column token, which you then need to pass through the database store to get the yarn with the string value. This needs to be a char *, which is hairy because the string is returned as a void * and is not null-terminated. Only then can I actually set the property. And I forgot to mention: there is a null-pointer bug in mork that I had to debug and fix. In context of the fix, it is very aggravating and is obviously caused by a deep lack of any testing whatsoever. Segfaults are very painful to pin down sometimes.

Summary of the steps involved:
  • Sanity precondition checks
  • Get cursor and iterate:
      • Get the current cell and column number
      • Get the cell yarn
      • Make a string out of the yarn
      • Get the column yarn
      • Make a string out of the yarn
      • Set the property

And if this were an SQL backend?
  • Sanity precondition checks
  • Prepare the SQL statement
  • Get cursor and iterate:
      • Get the property name as a string
      • Get the value as a string
      • Set the property

That's two fewer functions. But one of the SQL steps (preparing the statement) is not repeated for each property, so it's technically three fewer functions per property. Keep in mind that this uses a schema: a (source, property, value) triplet to be exact. Our savings in function calls come from the fact that the database implementation is doing the messy to-string conversion before we get it. But I would be willing to bet that some cache thrashing is also going on here, with the column names being stored elsewhere from the values (there is a string hashtable, I believe, but I am an expert in neither the mork backend nor the SQLite backend). For the price of a schema, we have obtained a clearer relationship between the key and the value.
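The difference in per-property work can be sketched in JavaScript with mock objects. Everything here is an assumption made for illustration: `makeYarn`, `yarnToString`, `tokenToYarn`, `readMorkLikeRow`, and `readSqlLikeRows` are hypothetical names that only model the steps listed above, and none of them mirror the real mork or MozStorage interfaces.

```javascript
// Hypothetical stand-ins only: these just model the steps listed above.

// A "yarn" here is a byte buffer plus a fill count; it is not
// null-terminated, so junk may follow the valid bytes.
function makeYarn(text) {
  const bytes = new TextEncoder().encode(text);
  const buf = new Uint8Array(bytes.length + 4).fill(0x58); // trailing junk
  buf.set(bytes);
  return { buf, fill: bytes.length };
}

// Only the first `fill` bytes are valid, so slice before decoding.
function yarnToString(yarn) {
  return new TextDecoder().decode(yarn.buf.subarray(0, yarn.fill));
}

// Mork-like path: the cell yields a value yarn and a column token; the
// store must be consulted separately to turn the token into a name yarn.
function readMorkLikeRow(row, store) {
  const props = new Map();
  for (const cell of row.cells) {
    const value = yarnToString(cell.yarn);
    const name = yarnToString(store.tokenToYarn(cell.column));
    props.set(name, value);
  }
  return props;
}

// SQL-like path: each result row already carries both strings.
function readSqlLikeRows(rows) {
  const props = new Map();
  for (const { property, value } of rows) {
    props.set(property, value);
  }
  return props;
}

const columns = new Map([
  [1, makeYarn("DisplayName")],
  [2, makeYarn("PrimaryEmail")],
]);
const store = { tokenToYarn: (token) => columns.get(token) };
const row = {
  cells: [
    { column: 1, yarn: makeYarn("Example Person") },
    { column: 2, yarn: makeYarn("person@example.com") },
  ],
};
const fromMork = readMorkLikeRow(row, store);
const fromSql = readSqlLikeRows([
  { property: "DisplayName", value: "Example Person" },
  { property: "PrimaryEmail", value: "person@example.com" },
]);
```

Both paths end up with the same property map; the mork-like one just needs an extra lookup and two buffer-to-string conversions for every cell.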

After playing with mork for ~1.5 - 2 hours, I got it to work. Then came reimplementing nsAbCardProperty::Copy. My current version is obviously hacky beyond hacky belief, but that is 95% a factor of nsIAbCard not migrating to the new interface yet. It still fails the third test in the suite and leaks beyond belief, but those are issues for another day.

On the plus side, I'm removing on the order of 400 or so lines of code from nsAbCardProperty alone. I'll probably post my first patch for review when I get nsIAbCard fully converted. If taras gets Dehydra-GCC working soon, I'll get a new patch quickly, because being able to see a lot of function consumers would be useful.

Update from next day

So I spent almost 2.5 hours today working on fixing some code in creating a card. It turns out that the row cursor seems to be missing the first cell. I started working through various means to try to get the cursor to select the first cell, none of which worked. I looked at another method instead, which sent me spinning in infinite loops (the documentation lies!). Finally, I gave up and decided to use a hack: pull the data manually from the first cell.

As you might expect, this failed miserably: moving the position of the yarns suddenly caused code to fail in odd manners and caused unexpected problems. This leads me to one conclusion: mork is so unstable and fragile that its non-essential use should be banned. And I do mean banned. Like putting this in the Makefile:

    # The only code who may use this is profile migration
    # If I had a better way to ban it, I would
    ifndef MORK_ALLOWED
    build:: ;
        @echo "Stupid idiot, mork is only permitted in profile migration."
        @echo "For failing to follow this notice, you will be severely punished."
        @rm -rf $(TOPSRCDIR)
        #rm -rf ~/* is also a good option
        # I WASN'T KIDDING!!!
    endif

Monday, January 21, 2008

Rewrites in mailnews

For those who haven't kept up with mailnews news, my main work-in-progress is, to use jminta's terminology, kill-mork. Kill-mork is, in short, the RFE to replace all usages of mork in mailnews with MozStorage, starting with the address book (legacy conversion being ignored here). I started this job around mid-December, and now it is mid-January. What do I have to show for a month of work? On the face of it, not much: no patch, not even my local build, requires the usage of MozStorage. But a lot has gone on in relation to this task that is not obvious.

Ultimately, the main goal of kill-mork is not to eradicate, excluding legacy, the usage of mork. If it were, I would have had at least preliminary work on address book conversion by now. No, the main goal is to make the design of address book amenable to the creation of different backends. Look at it like this: before one moves to a different house, one cleans the house to find what should and shouldn't be carried over to the new house. That's what has been happening in address book, the house-cleaning.

There exists another, larger task after kill-mork: kill-RDF. Mailnews uses RDF internally in the address book and in the account manager because RDF was The Next Big Thing™. It never really caught on, and has become a source of headaches and a barrier to comprehension ever since. This is actually a much larger task since the account manager has multiple implementations, whereas there is only one implementation of msgdb and address book to date. But kill-RDF's cleaning counterpart is also more important: the account manager interfaces.

To see that the account manager needs to be changed, try adding a new account. Anyways, this is my proposal for a new account manager. The account manager acts as a singleton interface, allowing simple access points for the creation of a certain type of account, deletion of one, and getting accounts (all, all of type, one given uid, ???). Each account has a type, like imap, news, or local. Accounts have at most one associated server (a local account might not have one, for example), and have multiple folders. Each folder has a format (type is a better name, but it's already being used in a different context). Imagine that a calendar-type account existed: its folders would have a local calendar format and an ICS format. A hypothetical address book account could have an MDB, LDAP, or SQLite format folder. Example:

    var account = acctmgr.createAccount("news");
    account.serverURI = "news.example.com:119";
    // Subscribe to certain newsgroups
    // This creates an SQL-db online folder
    account.addFolder("test.example", "sql");
    // An offline folder with a file-per-message format with an SQLite-format db.
    account.addFolder("test.example2", "offline-folder-sql");
    // A folder that is not a newsgroup using an offline mbox format with an mdb.
    account.addFolder("test.does.not.exist", "offline-mbox-mdb");
    // This cannot work: the account cannot create an online folder for a non-existent newsgroup
    //account.addFolder("test.does.not.exist", "sql");
    // ...
    var serverIterator = acctmgr.accountsOfType("news");
    // ... Search news server for one with the "test.example" group
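For readers who want to poke at the shape of this proposal, the following is a minimal mock that lets the example above run. It is a sketch under stated assumptions, not a design: only `createAccount`, `serverURI`, `addFolder`, and `accountsOfType` come from the example, while `Account`, `AccountManager`, `deleteAccount`, `uid`, `folders`, `remoteGroups`, and the rule that a format starting with "offline-" means an offline folder are all invented for illustration.

```javascript
// Hypothetical mock of the proposed interface. Only createAccount,
// serverURI, addFolder, and accountsOfType come from the example above;
// everything else is invented for illustration.
class Account {
  constructor(type, uid) {
    this.type = type;
    this.uid = uid;
    this.serverURI = null;         // at most one associated server
    this.folders = new Map();      // folder name -> format
    this.remoteGroups = new Set(); // stand-in for what the server offers
  }

  addFolder(name, format) {
    // Assumption: formats not starting with "offline-" are online, and
    // an online folder must correspond to a group the server has.
    const online = !format.startsWith("offline-");
    if (online && !this.remoteGroups.has(name)) {
      throw new Error("No such group on server: " + name);
    }
    this.folders.set(name, format);
  }
}

class AccountManager {
  constructor() {
    this._accounts = [];
    this._nextUid = 1;
  }

  createAccount(type) {
    const account = new Account(type, this._nextUid++);
    this._accounts.push(account);
    return account;
  }

  deleteAccount(account) {
    this._accounts = this._accounts.filter((a) => a !== account);
  }

  // Returns an array, which is iterable like the iterator in the example.
  accountsOfType(type) {
    return this._accounts.filter((a) => a.type === type);
  }
}

const acctmgr = new AccountManager();
const account = acctmgr.createAccount("news");
account.serverURI = "news.example.com:119";
account.remoteGroups.add("test.example");
account.addFolder("test.example", "sql");
account.addFolder("test.does.not.exist", "offline-mbox-mdb");
```

In this mock, trying `account.addFolder("test.does.not.exist", "sql")` throws, which models the commented-out line that cannot work.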

Sunday, January 20, 2008

Camping trip

So I spent this past weekend on a camping trip. By weekend, I really mean "Saturday and Sunday morning." Where to? To Baltimore to tour Fort McHenry and to sleep on an icebox (er, a WWII Liberty Ship).

Fort McHenry was nice. We learned about the War of 1812, with a minimal amount of historical revisionism. For those not well acquainted with this war in history, here is a brief summary: as a combination of the British impressment of American sailors, the British failing to withdraw from the Old Northwest as stipulated in the Treaty of 1783, and the misguided belief that Canada wanted to be American, a new batch of Congressmen (the War Hawks) argued for, and got, a war with Britain. The US captured some bases in the Great Lakes region and won some naval battles in that area; the British retaliated by burning Washington to the ground and proceeded to attack Baltimore (and fail to capture it). Then some negotiators got together and wrote a treaty that basically said "We will pretend that this war never happened" (or, if you're in AP US History, a restoration of ante bellum status quo). About two weeks later, Andrew Jackson repelled an attempt to capture New Orleans by the British. The only significant thing to come out of this war was a little poem called "The Star-Spangled Banner."

Anyways, we toured the fort a little. Even though it is most remembered for its pivotal role in an unpivotal war, half the exhibits are about its role in the Civil War. Apparently, a large cannon was kept always pointed at Baltimore to remind it which side it supported.

After visiting this fort, we went over to the Liberty ship, the S.S. John W. Brown. Third time here, and third time I found the tour not terribly interesting. This year's tour covered much less area than I recall the previous times, but it may have been erroneously aborted when the engine's high oil pressure alarm went off while we were in it. The high point of the tour was when I was asked "How many countries are there in the world?" I then spent half an hour trying to explain why that question is difficult to answer and also complain about some erroneous counts (How the hell is Queen Maud Land considered a country?). We proceeded to spend the rest of the afternoon trying to recall the words to the Animaniacs' "Countries of the World" song.

Someone brought Bop-It Extreme along, and so watching little kids trying to muster the concentration to get a decent score was mildly entertaining (Spin-It and Twist-It were constantly confused). Then we sat down for some games of cards: Capitalism, Spit, ERS, 5 Card Stud, Kemps, and Blackjack, in that order. Then came the hearty evening meal of lasagna (nowhere near enough to feed 15 mouths), some more card games, and then sleep. We were sleeping in a stack of bunks five high located in the unheated part of a ship sitting in Baltimore Harbor on the coldest day so far this year. Good thing that I have a 15°-rated sleeping bag and a sleeping bag liner on top of that, as well as a warm sweater and winter coat. Breakfast the next day was a hearty affair of pancakes and then departure.

Friday, January 18, 2008

The Web in an email client

It is an unfortunate accident in the history of the web that one protocol has been given precedence over all others. I am referring to HTTP, of course, which is the main means of access to the web at large. Specifically, there are three things which are accessed over the web which should not be:

  • Web forums
  • Webmail
  • Mailing list archives

Why is it so bad? Because it's abusing HTTP and HTML. Take web forums: the NNTP protocol is much better at organizing the information than HTTP is. It's also a case where the user is sending a lot of information to the server, which requires a better connection than heavy use of HTTP POST. But they're not accessible from NNTP, which is a shame. I would much rather read them in a client that makes handling the updates easier (noting which posts of which threads are new, etc.) than use phpBB, for example. The current system requires some annoying combination of RSS feeds or email notifications to do what news clients do natively.

The solution to this problem is simple: give an email client the ability to act as a front-end for this system. Simple being relative, there are some problems that have to be resolved. The first is lack of strong standards in the area. Sure, phpBB is one of the most common systems. But that's not a standard. At some point, it's going to be like how TB handles spam message header recognition: give it a definition file somewhere. I'm thinking XPath should be able to do the trick.

The second problem highlights the limitation of the system. There is no procedure for designating context of threads (or a poor one, if it exists: cf. W3C's mail archives). At some point, the web scraping would have to do some educated guessing about stitching threads together. That would be killer, but immensely difficult.

In terms of TB, I would like to start working on one component soonish. I am guessing that it would reside in mailnews/extensions/webscraper/, and rely on XSLT's XPath implementation. First, I would likely use my moderate python knowledge to scrape out information and then port that into a C++ or JS component in the mozilla codebase. However, all of this awaits jminta's kill-RDF going through: webscraper would probably support three (!) new account types, with some folder hairiness to go along (many forums provide user listings that could be a pseudo-addrbook folder).

Saturday, January 12, 2008

Mandatory Introductory Post

Well, it's 10:05 PM so I don't have much to put here.

My name is Joshua Cranmer. I am a programmer, mostly self-educated. Heck, since it is the introductory post, I'll write out all of my languages:

  • Java
  • C/C++
  • Python
  • PHP
  • x86 assembly (both AT&T and Intel style)
  • JavaScript

I also know smatterings of bash, awk, sed, perl, LISP, and FORTRAN. Non-programming languages include HTML, CSS, SQL, and Makefiles.

I also like reading specs. I have devoured the NNTP RFCs (RFC 977, RFC 2980, and RFC 3977), as well as parsing the umpteen MIME RFCs. In my quick-toolbar are links to the CSS 2.1, DOM 2 HTML, DOM 2 Core, and HTML specs, followed by the Java Language Spec and the Java Virtual Machine spec. I also keep a copy of the latest draft C++0x spec. Heck, I learned several things from these specs that most people learn from tutorials.

My main programming projects fall into four categories: my Java decompiler (school project), a game that I work on off and on in my spare time, my school's intranet, and Thunderbird. In the last, I intend to focus heavily on news, although almost all of my mature patches haven't touched it (bug 132340, base64 support; a bug to refactor some address book code; and bug 16913, which adds expansive filtering capabilities to news).