Friday, January 18, 2008

The Web in an email client

It is an unfortunate accident in the history of the web that one protocol has been given precedence over all others. I am referring to HTTP, of course, which is the main way content is accessed on the web at large. Specifically, there are three things that are accessed over the web which should not be:

  • Web forums
  • Webmail
  • Mailing list archives

Why is it so bad? Because it's abusing HTTP and HTML. Take web forums: the NNTP protocol is much better at organizing the information than HTTP is. It's also a case where the user is sending a lot of information to the server, which calls for something better than heavy use of HTTP POST. But forums are not accessible over NNTP, which is a shame. I would much rather read them in a client that makes handling the updates easier (noting which posts of which threads are new, etc.) than use phpBB, for example. The current system requires some annoying combination of RSS feeds or email notifications to do what news clients do natively.

The solution to this problem is simple: give an email client the ability to act as a front-end for this system. Simple being relative, there are some problems that have to be resolved. The first is the lack of strong standards in the area. Sure, phpBB is one of the most common systems. But that's not a standard. At some point, it's going to be like how TB handles spam message header recognition: give it a definition file somewhere. I'm thinking XPath should be able to do the trick.
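As a rough illustration of the definition-file idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the DEFINITION mapping, the class names in the sample markup, and the scrape_posts helper are all hypothetical and do not describe phpBB's real pages or anything in TB. It also assumes the page is well-formed XML, which real forum HTML often is not, and it only uses the limited XPath subset that Python's standard library supports.

```python
import xml.etree.ElementTree as ET

# Hypothetical definition "file" for one forum engine: each field maps to an
# XPath expression. The class names are invented for this sketch and are not
# phpBB's actual markup.
DEFINITION = {
    "post": ".//div[@class='post']",
    "author": ".//span[@class='author']",
    "subject": ".//h3",
    "body": ".//div[@class='content']",
}

# Made-up, well-formed sample page standing in for a real forum thread.
SAMPLE_PAGE = """
<html><body>
  <div class="post" id="p1">
    <h3>Hello</h3><span class="author">alice</span>
    <div class="content">First post.</div>
  </div>
  <div class="post" id="p2">
    <h3>Re: Hello</h3><span class="author">bob</span>
    <div class="content">A reply.</div>
  </div>
</body></html>
"""


def scrape_posts(page, definition):
    """Extract one dict per post, using the XPath expressions in definition."""
    root = ET.fromstring(page.strip())
    posts = []
    for node in root.findall(definition["post"]):
        posts.append({
            "id": node.get("id"),
            "author": node.findtext(definition["author"]),
            "subject": node.findtext(definition["subject"]),
            "body": node.findtext(definition["body"]),
        })
    return posts


if __name__ == "__main__":
    for post in scrape_posts(SAMPLE_PAGE, DEFINITION):
        print(post)
```

Supporting a different forum engine would then, in principle, mean swapping in a different DEFINITION rather than writing new scraping code.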

The second problem highlights the limitation of the system. There is no procedure for designating context of threads (or a poor one, if it exists: cf. W3C's mail archives). At some point, the web scraping would have to do some educated guessing about stitching threads together. That would be killer, but immensely difficult.
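To make the "educated guessing" concrete, here is one very crude heuristic sketched in Python: strip reply prefixes from subjects and attach each post to the most recent earlier post with the same normalized subject. This is only an assumption about what such guessing might look like, not a worked-out design. The function names and the sample data are hypothetical, and a real implementation would need far more signals (quoted text, timestamps, authors).

```python
import re

# Matches one or more leading "Re:" / "Fw:" / "Fwd:" prefixes, case-insensitively.
_REPLY_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Drop reply/forward prefixes and case so related subjects compare equal."""
    return _REPLY_PREFIX.sub("", subject).strip().lower()


def guess_parents(posts):
    """Guess a parent for each post.

    posts is a list of dicts with 'id' and 'subject' keys, in chronological
    order. Returns {post id: guessed parent id, or None for a thread root}.
    """
    last_seen = {}  # normalized subject -> id of the latest post with it
    parents = {}
    for post in posts:
        key = normalize_subject(post["subject"])
        parents[post["id"]] = last_seen.get(key)
        last_seen[key] = post["id"]
    return parents


if __name__ == "__main__":
    # Made-up sample data for illustration.
    sample = [
        {"id": "p1", "subject": "Hello"},
        {"id": "p2", "subject": "Re: Hello"},
        {"id": "p3", "subject": "Other"},
        {"id": "p4", "subject": "RE: re: hello"},
    ]
    print(guess_parents(sample))
```

Note that this produces a flat chain per subject rather than a true reply tree, which is exactly the kind of limitation that makes the problem hard.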

In terms of TB, I would like to start working on one component soonish. I am guessing that it would reside in mailnews/extensions/webscraper/, and rely on XSLT's XPath implementation. First, I would likely use my moderate Python knowledge to scrape out information and then port that into a C++ or JS component in the Mozilla codebase. However, all of this awaits jminta's kill-RDF going through: webscraper would probably support three (!) new account types, with some folder hairiness to go along (many forums provide user listings that could be a pseudo-addrbook folder).
