Thursday, October 25, 2012

Parallelism in disk readout?

Suppose you have a high performance RAID array; just for laughs say RAID6+1. The read speed is significantly faster than the read speed for a single disk, and the mirroring helps preserve the data from loss when one drive dies. Seems cool, though pricy; and people use them all the time.

But how easy is it to back up? Don't ask. Unless you get the full bandwidth it takes a long time to back up 80TB or so, and even with full bandwidth it takes a while.

Suppose your chassis was designed for a particular RAID configuration (instead of being customer-selectable). Instead of Nx2 drives, let it be Nx3: rather than a single mirror you have two, one of which is read/write and the other of which is "write only" as far as the main controller is concerned.

However, each of the drives in that third "write-only" row has its own independent controller. (For the moment forget about spares. They add some complexity but nothing overwhelming.) Each controller is linked to a multiplexer that manages 2 or 3 drives and communicates over a fibre-channel connection.

Now suppose you have a second chassis with N drives, each of which has its own controller similarly linked to a multiplexer. A bundle of fibre-channel cables connects one chassis to the other.

  1. The chassis is told that a backup is at hand, and senses the presence and health of the other chassis.
  2. The chassis pauses all pending reads and writes.
  3. When all active reads and writes have completed, the controller takes a snapshot, and releases the pending reads and writes.
  4. The controller releases all drives in the third (write-only) row and instructs the sub-controllers to begin copying.
  5. Each sub-controller communicates with its companion in the backup chassis, and writes the blocks on its disk to the other.
  6. When the last copy is done, the sub-controllers relinquish their disks back to the main controller.
  7. When the main controller has full control again, it pauses pending reads and writes again.
  8. When all active reads and writes have completed, the controller takes another snapshot, and releases the pending reads and writes.
  9. The controller uses the incremental change between snapshots to update the disks in the third row.
  10. When the update is complete, another pause and snapshot are required to take into account the changes that happened while the update was happening.
  11. A few iterations may be needed until the incremental is small enough that the system may be acceptably frozen long enough to make the final update.
  12. The three rows are now again in sync, and the backup chassis holds a consistent backup of the system. It can be removed and stored, or whatever is desired.
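
To see why steps 7 through 11 should settle after only "a few iterations", here is a rough sketch of the catch-up loop. It is a toy model, not anything a real controller does: the function name, the constant write rate, and all the numbers in the usage note are my own assumptions for illustration. Each pass has to replay the writes that piled up during the previous pass, so the delta shrinks geometrically as long as the third row can be updated faster than new writes arrive.

```python
def catch_up_passes(initial_delta_gb, write_rate_gb_h, sync_rate_gb_h,
                    max_freeze_gb, max_passes=50):
    """Return the delta size (GB) seen at the start of each catch-up pass.

    Hypothetical model: writes arrive at a constant write_rate_gb_h while
    the third row is updated at sync_rate_gb_h. The loop stops once the
    delta is small enough to apply while the system is frozen.
    """
    if write_rate_gb_h >= sync_rate_gb_h:
        raise ValueError("deltas never shrink if writes arrive as fast as they sync")
    deltas = [initial_delta_gb]
    while deltas[-1] > max_freeze_gb and len(deltas) < max_passes:
        hours_to_apply = deltas[-1] / sync_rate_gb_h
        # Writes that landed while this delta was being applied form the next delta.
        deltas.append(write_rate_gb_h * hours_to_apply)
    return deltas
```

For example, with assumed figures of 120GB of changes accumulated during the copy, 10GB/hour of new writes, and a 250GB/hour update rate, `catch_up_passes(120, 10, 250, max_freeze_gb=1)` goes from 120GB to about 4.8GB to about 0.2GB, so the third pass is the one done with the system frozen.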

Restoration uses the backup chassis as a source and a high-performance one as a target.

How long will such a backup take? Every sub-controller copies its own disk at the same time, so the whole array takes only as long as one disk: a 3TB disk being read at 250GB/hour will take about 12 hours. That's better than a full backup any other way I know of. (I'm not counting synthetic full backups. If you lose one of the "deltas" you're up the creek.)
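
The arithmetic is simple enough to check in a few lines. This is only a back-of-the-envelope sketch: it takes the 3TB and 250GB/hour figures at face value, uses decimal units (1TB = 1000GB), and compares against a hypothetical single stream pushing the whole 80TB through at that same rate. The function name is mine.

```python
def copy_hours(capacity_gb, rate_gb_per_hour):
    """Hours needed to copy capacity_gb at a sustained rate_gb_per_hour."""
    return capacity_gb / rate_gb_per_hour

# Parallel case: each sub-controller copies one 3TB disk, all at once,
# so the elapsed time is that of a single disk no matter how large N is.
per_disk_hours = copy_hours(3000, 250)

# Assumed comparison: the full 80TB pushed through one 250GB/hour stream.
single_stream_hours = copy_hours(80_000, 250)

print(f"parallel, one disk per controller: {per_disk_hours:.0f} hours")
print(f"single stream for 80TB:            {single_stream_hours:.0f} hours")
```

The point of the design is that the first number does not grow as drives are added, while the second one does.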

I shudder to think how much this would cost. But I wish we had a few at work.
