Parallelism in disk readout?
But how easy is it to back up? Don't ask. Unless you get the full bandwidth it takes a long time to back up 80TB or so, and even with full bandwidth it takes a while.
Suppose your chassis was designed for a particular RAID configuration (instead of being customer-selectable). Make it Nx3 instead of Nx2: instead of a single mirror you have two, one of which is read/write and the other of which is "write-only" as far as the main controller is concerned.
However, each of the drives in that third "write-only" row has its own independent controller. (For the moment forget about spares. They add some complexity but nothing overwhelming.) Each controller is linked to a multiplexer that manages 2 or 3 drives and communicates over a fibre-channel connection.
Now suppose you have a second chassis with N drives, each of which has its own controller similarly linked to a multiplexer. A bundle of fibre-channel cables connects one chassis to the other.
- The chassis is told that a backup is at hand, and senses the presence and health of the other chassis
- The chassis pauses all pending reads and writes
- When all active reads and writes have completed, the controller takes a snapshot, and releases the pending reads and writes
- The controller releases all drives in the third (write-only) row and instructs the sub-controllers to begin copying
- Each sub-controller communicates with its companion in the backup chassis, and writes the blocks on its disk to the companion's disk.
- When the last copy is done, the sub-controllers relinquish their disks back to the main controller.
- When the main controller has full control again, it pauses pending reads and writes again.
- When all active reads and writes have completed, the controller takes another snapshot, and releases the pending reads and writes
- The controller uses the incremental change between snapshots to update the disks in the third row.
- When the update is complete, another pause and snapshot are required to take into account the changes that happened while the update was running
- A few iterations may be needed until the incremental is small enough that the system can acceptably be frozen long enough to make the final update.
- The three rows are now again in sync, and the backup chassis holds a consistent backup of the system. It can be removed and stored, or whatever is desired.
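To get a feel for how many of those catch-up iterations you might need, here is a rough Python sketch. Everything in it is an assumption for illustration: the function name catch_up_passes, the write rate, the update rate and the freeze threshold are all made up, and it pretends both rates are constant. It only models the shrinking incremental, not the controllers themselves.

```python
def catch_up_passes(initial_delta_gb, write_rate_gb_per_h,
                    update_rate_gb_per_h, freeze_limit_gb, max_passes=50):
    """Model the snapshot/update loop for the third row.

    Returns the size (in GB) of the incremental at the start of each pass.
    The last entry is the incremental small enough to apply while frozen.
    All parameters are hypothetical; rates are assumed constant.
    """
    if update_rate_gb_per_h <= write_rate_gb_per_h:
        # If new writes arrive as fast as we can apply them, the
        # incremental never shrinks and the loop would never converge.
        raise ValueError("update rate must exceed the incoming write rate")

    deltas = [initial_delta_gb]
    while deltas[-1] > freeze_limit_gb and len(deltas) < max_passes:
        # Time needed to apply the current incremental to the third row.
        hours = deltas[-1] / update_rate_gb_per_h
        # Writes that pile up on the live rows while that update runs
        # become the next incremental.
        deltas.append(hours * write_rate_gb_per_h)
    return deltas


if __name__ == "__main__":
    # Hypothetical numbers: 120GB changed during the copy, 10GB/hour of
    # new writes, 250GB/hour update rate, freeze when under 1GB remains.
    for n, gb in enumerate(catch_up_passes(120.0, 10.0, 250.0, 1.0)):
        print(f"pass {n}: {gb:.3f} GB to apply")
```

With those made-up numbers the incremental drops below the threshold after two catch-up passes, which matches the "few iterations" intuition. The shrink factor per pass is just the ratio of write rate to update rate, so a busy system with a slow third row would need many more passes.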
Restoration uses the backup chassis as a source and a high-performance one as a target.
How long will such a backup take? Every disk is copied in parallel, so the total time is just the time to read one disk. A 3TB disk being read at 250GB/hour will take about 12 hours. That's better than a full backup any other way I know of. (I'm not counting synthetic full backups. If you lose one of the "deltas" you're up the creek.)
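The arithmetic is simple enough to check in a few lines of Python. This is only a back-of-the-envelope sketch: the function names are mine, 1TB is taken as 1000GB, and the single-stream comparison assumes (hypothetically) that a conventional backup of the whole 80TB would be funneled through one 250GB/hour pipe.

```python
def parallel_backup_hours(disk_tb, rate_gb_per_hour):
    """Time for the parallel scheme: every disk is copied at once,
    so only the size of a single disk matters."""
    return disk_tb * 1000 / rate_gb_per_hour


def single_stream_backup_hours(total_tb, rate_gb_per_hour):
    """Time if the whole array goes through one stream (an assumed
    baseline for comparison, not a measurement)."""
    return total_tb * 1000 / rate_gb_per_hour


if __name__ == "__main__":
    print(parallel_backup_hours(3, 250))        # one 3TB disk at 250GB/hour
    print(single_stream_backup_hours(80, 250))  # 80TB through one stream
```

The point of the comparison is that the parallel time does not grow with the number of disks, while the single-stream time grows with the total capacity.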
I shudder to think how much this would cost. But I wish we had a few at work.