James Fallows asks:
"For all of Gmail really to be available and searchable offline, the entire cache of old messages would obviously have to be stored on your own hard disk. That's now a maximum 7+ gigs per regular Gmail account. More if you've bought extra storage. Do I really want to have all of that on my laptop -- which is the main place where offline access matters? From a couple of Gmail accounts? And Google's "Gears" system of offline sync, already in use with Google Docs, seems to create a separate cache for each browser you use it with. So you could wind up with one 7GB cache for Firefox, and one for Chrome, and... Will there be a way to choose how far back you'd like the sync to run?"
Excellent questions all. So, per my investigations:
- yes, a separate cache per browser. Hm.
- no way to choose how far back to go. For one small account of 128M, Gmail says it will sync messages from up to 5 years back. Larger accounts will presumably not go back as far. I can't find any information on whether older emails will eventually be cleared out of the offline cache or left there.
- On the hard drive, the 128M that Gmail says I'm using becomes 137M.
The Google Gears FAQ tells us where the data is stored for IE and Firefox but, oddly enough, not for Chrome. For Chrome, attachments are stored as plain files, for example at:
C:\Documents and Settings\uname\Local Settings\Application Data\Google\Chrome\User Data\Default\Plugin Data\Google Gears\mail.google.com\https_443\GoogleMail[4]#localserver
The messages themselves are in SQLite databases in the parent directory:
C:\Documents and Settings\uname\Local Settings\Application Data\Google\Chrome\User Data\Default\Plugin Data\Google Gears\mail.google.com\https_443
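To repeat the size comparison on another machine, something like the following Python sketch would total up the files under the Gears directory. It's only an illustration: the function name is mine, and you'd pass in the directory for your own profile (the `uname` part of the paths above will differ).

```python
import os


def dir_size_bytes(root):
    """Return the total size in bytes of every regular file under root.

    Walks subdirectories recursively (os.walk does not follow symlinked
    directories by default) and skips file symlinks so nothing is counted twice.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Dividing the result by 2**20 gives megabytes in the sense Windows Explorer reports them. For my account, 137M on disk against the 128M Gmail reports works out to 137/128 ≈ 1.07, i.e. roughly 7% overhead for the offline copy.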
The data appears to be binary, so it isn't readable without Gears or an SQLite tool. The files don't have .sqlite extensions, but that's what they are. Open them with any of the handy SQLite tools, for example SQLite Administrator, which can export tables as CSV, HTML, and so on.
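If you'd rather script it, here is a minimal Python sketch (standard library only; the function names are my own) that checks whether a file is SQLite regardless of its extension, by looking for the 16-byte "SQLite format 3" header every SQLite 3 database starts with, and then lists its tables. It opens the database read-only so Gears' copy isn't modified; the file names contain `@` and `#`, so the path is percent-encoded into a `file:` URI rather than pasted in raw.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

# Every SQLite 3 database file begins with these 16 bytes.
SQLITE_MAGIC = b"SQLite format 3\x00"


def looks_like_sqlite(path):
    """True if the file starts with the SQLite 3 header, whatever its extension."""
    with open(path, "rb") as f:
        return f.read(16) == SQLITE_MAGIC


def list_tables(path):
    """Return the table names in the database, opened read-only.

    as_uri() percent-encodes characters such as '#' and '@', which would
    otherwise be misread inside a file: URI.
    """
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as con:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

If Chrome is running, Gears may hold a lock on the files, so it's probably safer to close the browser or work on a copied database.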
In my install, a file named accountname@gmail.com-GoogleMail#database is the database. The table MessagesFT_content holds the message contents. Interestingly, all the email bodies are stored as HTML.
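The export SQLite Administrator does can also be sketched in a few lines of Python. I haven't documented the column layout of MessagesFT_content, so this sketch reads the column names from the database rather than assuming them. My guess from the "FT_content" suffix is that it's the plain storage table behind a full-text-search index, which would explain why ordinary SQLite tools can read it; treat that as an assumption. The `html_to_text` helper is a crude, hypothetical tag stripper for the HTML bodies, not anything Gears provides.

```python
import csv
import sqlite3
from contextlib import closing
from html.parser import HTMLParser
from pathlib import Path


def export_table_csv(db_path, table, csv_path):
    """Write every row of `table` to a CSV file with a header row.

    Opens the database read-only; column names come from the query itself.
    Returns the number of rows written. BLOB values are written as their
    Python repr, so binary columns will not round-trip cleanly.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    quoted = '"' + table.replace('"', '""') + '"'  # safely quote the identifier
    with closing(sqlite3.connect(uri, uri=True)) as con:
        cur = con.execute(f"SELECT * FROM {quoted}")
        header = [col[0] for col in cur.description]
        count = 0
        with open(csv_path, "w", newline="", encoding="utf-8") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            for row in cur:
                writer.writerow(row)
                count += 1
    return count


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags (entities are decoded by default)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Crudely strip tags from an HTML body and collapse whitespace.

    Note: text inside <script> or <style> elements is kept as well.
    """
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```

For example, `export_table_csv(db, "MessagesFT_content", "messages.csv")`, where `db` is the path to the accountname@gmail.com-GoogleMail#database file, should produce roughly what a GUI export would, and `html_to_text` can then be applied to whichever column turns out to hold the bodies.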
Conclusions:
1. it's still a proprietary format, although SQLite utilities can be used to extract the useful information.
2. the lack of control over synchronization means it's not a good backup solution.
3. I still need to test actual offline operation: is the search as good as it is online?