Deduplication for VM Storage Environments

I’ve been working with a co-worker who is performing a Tivoli Storage Manager migration, moving from a strictly tape-based backup scheme to a disk-to-tape scheme. As part of the new scheme, he has configured TSM’s deduplication feature on the disk-based storage pool. For those who are not familiar with deduplication, this is a nifty technology that identifies duplicate chunks of data, stores only one copy of each, and replaces the remaining duplicates with pointers to that stored copy. This reduces the amount of data that needs to be kept on disk. Depending on the data type, this can have a huge effect on the amount of storage needed.
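
To make the idea concrete, here is a rough Python sketch of block-level deduplication. It is only an illustration and not how TSM implements it: the fixed 4 KB block size, the SHA-256 fingerprint, and the `DedupStore` class are all my own assumptions, and real products often use variable-size chunks and on-disk indexes.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products vary


class DedupStore:
    """Toy in-memory store (hypothetical) that keeps each unique block once."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # fingerprint -> block bytes, stored a single time
        self.files = {}   # file name -> list of fingerprints (the "pointers")

    def write(self, name, data):
        refs = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            fingerprint = hashlib.sha256(block).digest()
            # Keep the block only if this fingerprint has not been seen yet.
            self.blocks.setdefault(fingerprint, block)
            refs.append(fingerprint)
        self.files[name] = refs

    def read(self, name):
        # Rebuild the file by following its pointers back to stored blocks.
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def stored_bytes(self):
        return sum(len(block) for block in self.blocks.values())


store = DedupStore()
image = b"A" * 8192 + b"B" * 4096  # two identical "A" blocks plus one "B" block
store.write("vm1.img", image)
store.write("vm2.img", image)      # a second, identical image
print(len(image) * 2, "logical bytes,", store.stored_bytes(), "bytes stored")
```

Two identical 12 KB images end up costing only two unique blocks of storage, while both can still be read back in full.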

This got me thinking: what if we could deduplicate the VMDK storage for our VMware environment? We have about 100 VMs operating in this environment, with a mix of Linux and Windows guests, so there must be a huge amount of duplicate data stored here. Some expensive enterprise-grade iSCSI arrays can do deduplication on the array itself, but these come with a hefty price tag. I wonder if we could use open-source software to “roll our own” deduplicated VM storage, combining deduplication software like lessfs or SDFS/opendedup (or a filesystem like ZFS that has dedup built in) with iSCSI target software like iscsitarget.
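
Before building anything, it would be worth measuring how much duplication is really there. The sketch below is one hedged way to estimate it: it reads a set of disk image files in fixed-size blocks and counts how many blocks are unique. The `dedup_estimate` function and the 4 KB block size are my own assumptions, and the result is only a rough guide, since lessfs, SDFS, and ZFS each chunk and hash data in their own way.

```python
import hashlib


def dedup_estimate(paths, block_size=4096):
    """Return (total_blocks, unique_blocks) across all the given files.

    Hypothetical helper: it keeps every fingerprint in memory, so it is
    meant for sampling a few images rather than scanning a whole datastore.
    """
    seen = set()
    total = 0
    for path in paths:
        with open(path, "rb") as image:
            while True:
                block = image.read(block_size)
                if not block:
                    break
                total += 1
                seen.add(hashlib.sha256(block).digest())
    return total, len(seen)


def dedup_ratio(total, unique):
    # Logical blocks divided by blocks that would actually be stored.
    return total / unique if unique else 0.0
```

Calling `dedup_estimate` on a list of copied image files and passing the result to `dedup_ratio` gives a ballpark figure; a ratio of 4.0, for example, would mean roughly a quarter of the raw capacity is needed.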

