GMail Backup with getmail
I could just pull my mail via IMAP to a backup machine, but first I investigated what other options folks were using. The common method is to set up a mail client (Outlook, Outlook Express, Thunderbird, etc.) and just have it sync the mailbox. Some folks were using POP to retrieve the mail, but the protocol doesn't really lend itself to this, and Google's implementation is problematic. So I went with my initial thought of IMAP. I didn't want a full-blown client doing this, but rather something I could run from a script. The obvious answer was fetchmail, the classic command-line mail retriever, but I came across another option that I hadn't used before: getmail. From the getmail website:
- "getmail is a mail retriever designed to allow you to get your mail from one or more mail accounts on various mail servers to your local machine for reading with a minimum of fuss. getmail is designed to be secure, flexible, reliable, and easy-to-use. getmail is designed to replace other mail retrievers such as fetchmail."
The author, Charles Cazabon, and others are critical of the security and overall design of fetchmail, and getmail is designed to address these issues. getmail is also written in Python and is available as a package on most Unix-like systems. It runs on Linux and on Windows (Cygwin includes it as a standard package as well). After installing getmail you will need to create a ~/.getmail directory with a getmailrc file spelling out what mail you would like retrieved and how. The file needs a 'retriever' section defining the mailbox and mail server, a 'destination' section denoting where you would like the mail stored, and an 'options' section with general parameters. Sample getmailrc file:
    [retriever]
    type = SimpleIMAPSSLRetriever
    server = imap.gmail.com
    username = jdoe@datalinkcontrol.com
    password = p@ssw0rd
    mailboxes = ("[Gmail]/All Mail",)
    port = 993

    [destination]
    type = Maildir
    path = ~/mailbackup/datalinkcontrol/

    [options]
    received = false
    delivered_to = false
    read_all = false
    verbose = 0
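The Maildir named in the destination section has to exist before the first run, including its cur, new and tmp subdirectories. A minimal sketch for preparing it with Python's standard library follows; the helper name make_maildir is my own, chosen only for illustration.

```python
import os


def make_maildir(path):
    """Create `path` with the cur, new and tmp subdirectories of a Maildir."""
    # Expand a leading "~" so paths written as in the rc file work here too.
    root = os.path.expanduser(path)
    for sub in ("cur", "new", "tmp"):
        # exist_ok=True makes the call safe to repeat on an existing Maildir.
        os.makedirs(os.path.join(root, sub), exist_ok=True)
    return root
```

Calling make_maildir("~/mailbackup/datalinkcontrol/") once per account would prepare the path used in the sample above.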
The mailboxes option perhaps needs a little clarification. When using IMAP you need to define which folders you would like retrieved. You can list multiple folders, but I just wanted a bulk backup of all of my mail, and handily GMail has a folder called 'All Mail'. Note that this folder shows up to IMAP clients as '[Gmail]/All Mail'.
In the destination section getmail has several options, including passing the mail along to another MTA or application for further processing. The two options for storage are the maildir and mbox formats. The primary visible difference is that maildir stores each message as a separate file in a directory, while mbox stores all of the messages in a single file. The maildir format is very straightforward, while the mbox format has several variants as well as file-locking issues. Because of the volume of mail I was going to back up, I chose maildir. The path needs to exist already, and it must have three subdirectories (cur, new and tmp); they will not be created for you.
Finally, the options I defined: 'received' and 'delivered_to' set to false stop getmail from adding any headers to the messages as they are downloaded, since I wanted the mail as is. 'read_all = false' tells getmail to retrieve only new messages rather than re-reading messages it has already pulled down, and 'verbose = 0' suppresses the status update for each message retrieved.
I saved this file as 'datalinkcontrol' in the .getmail directory, and I created one such file for each mail account I wanted to back up at Google. Now to use getmail...
getmail -r datalinkcontrol
That's it. It diligently went to work, and a bit later it had retrieved a little over 34,000 messages from the mailbox. On each subsequent run only new messages are retrieved, so it is quite fast. Add an entry to cron or scheduled tasks for each mailbox and you are done. If you ever need to upload the messages or transfer them to another account, many email clients and scripts can easily handle the maildir format. A variation you might consider is creating a filter that labels the messages by date range (e.g. Mail 2010, Mail 2011, etc.) and then specifying those labels on the mailboxes line in the rc file.
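Because each message in a maildir is a plain file, a finished backup is easy to inspect from a script. As a rough sketch, Python's standard mailbox module can count the stored messages and peek at a few subjects. The function names here are illustrative only, and the order in which subjects come back is whatever order the module happens to iterate in.

```python
import mailbox
import os


def count_messages(path):
    """Return the number of messages stored in the Maildir at `path`."""
    root = os.path.expanduser(path)
    # create=False raises an error instead of silently creating an empty Maildir.
    box = mailbox.Maildir(root, create=False)
    try:
        return len(box)
    finally:
        box.close()


def list_subjects(path, limit=10):
    """Return the Subject headers of up to `limit` messages."""
    root = os.path.expanduser(path)
    box = mailbox.Maildir(root, create=False)
    try:
        subjects = []
        for message in box:
            if len(subjects) >= limit:
                break
            # Messages without a Subject header yield an empty string.
            subjects.append(str(message.get("Subject", "")))
        return subjects
    finally:
        box.close()
```

For the sample rc file above, count_messages("~/mailbackup/datalinkcontrol/") gives a quick check that a run actually stored something.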
Posted at 03:14PM feb 28, 2012 by Boris in Generalno
Amazon cloud storage gateway
With Storage Gateway (SG), now in beta, Amazon offers a storage service built on a software-based appliance that lets companies continue to store data in private clouds while backing it up to AWS' Simple Storage Service (S3) in the background.
This means companies will be running AWS code behind their own firewalls, which should strengthen the perception of AWS as an enterprise player and not only a public cloud service provider.
Once the AWS Storage Gateway’s software appliance is installed on a local host, you can mount Storage Gateway volumes to on-premises application servers as iSCSI devices. Data written to these volumes is maintained on the on-premises storage hardware while being asynchronously backed up to AWS, where it is stored in Amazon S3 in the form of Amazon EBS (Elastic Block Store) snapshots.
Posted at 04:19PM feb 01, 2012 by Boris in Generalno








