How can you replace a device in a Linux Software RAID?

The state of the active Linux software RAID devices can be viewed by running:

cat /proc/mdstat

Software RAID in Linux is implemented by the multiple devices (MD) driver. MD devices can be managed via the mdadm utility. Read the man page for more details on usage.


When an error is returned to the operating system on a member device of an MD device, the MD driver marks that member device as failed. An example of a failed device is:

# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
Event: 11
md4 : active raid1 sdb1[1] sda1[0](F)
      513984 blocks [2/1] [_U]
unused devices: <none>



In the output above, the (F) indicates that the device /dev/sda1 is in a failed state.

The array member status [_U] shows which member devices are up. In this case only the second member (the device numbered [1], sdb1) is up; the underscore in the first position marks the missing member. If you are confused, simply look for the (F).
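Looking for the (F) marker can also be scripted. The following is a minimal sketch, assuming the /proc/mdstat layout shown above; the function name failed_members is hypothetical and is not part of mdadm.

```shell
# Print "array device" for every member flagged (F) in mdstat-formatted
# text read from stdin. failed_members is an illustrative name only.
failed_members() {
    awk '$1 ~ /^md/ && $2 == ":" {
        # Member devices follow the array state and RAID level fields.
        for (i = 4; i <= NF; i++) {
            if ($i ~ /\(F\)/) {
                dev = $i
                sub(/\[.*/, "", dev)    # "sda1[0](F)" becomes "sda1"
                print $1, dev
            }
        }
    }'
}
```

On a live system it would be run as: failed_members < /proc/mdstat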

Disk Removal

The procedure is pretty similar for ATA, SATA, SAS and SCSI systems, but SCSI, SAS & SATA let you hot add/remove drives. This step only deals with hot removal.

  1. Determine the device name of the failed disk. It is absolutely imperative that you do not stuff this step up.

  2. Stop smartd, as a running smartd can prevent us from pulling the device from the RAID array. Then confirm that it is no longer running:

    service smartd stop || /etc/init.d/smartd stop
    ps -C smartd

  3. Gather the list of all MD devices that contain the failed disk:

    grep FAILED /proc/mdstat

    Where FAILED is the disk that has failed, e.g. sda. Note that /proc/mdstat lists members without the /dev/ prefix.

  4. Remove the drive from all MD devices. For each partition on the failed drive, mark it as failed (-f) and then remove it from its array (-r):

    mdadm -f /dev/mdX /dev/FAILED_PARTX
    mdadm -r /dev/mdX /dev/FAILED_PARTX

  5. For hotswap SCSI/SATA/SAS, tell Linux to remove the device:
    1. Firstly, get the drive id:

      # cat /proc/scsi/scsi
      Host: scsi0 Channel: 00 Id: 00 Lun: 00
        Vendor: FUJITSU  Model: MAW3073NC        Rev: 0104
        Type:   Direct-Access                    ANSI SCSI revision: 03
      Host: scsi0 Channel: 00 Id: 01 Lun: 00
        Vendor: FUJITSU  Model: MAW3073NC        Rev: 0104
        Type:   Direct-Access                    ANSI SCSI revision: 03

    2. In this case, we want to remove the first drive. The four numbers are the Host, Channel, Id and Lun values from the listing above:

      echo "scsi remove-single-device" 0 0 0 0 > /proc/scsi/scsi

  6. Now remove the failed disk from the machine.
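Steps 3 and 4 above can be combined into a dry run that prints the mdadm commands for every array member on the failed disk, so they can be reviewed before anything is executed. This is a sketch only: removal_commands is a hypothetical helper, and it assumes sdX-style names where partitions are the disk name followed by digits.

```shell
# Read mdstat-formatted text on stdin and print (not run) the fail/remove
# commands for each member that lives on the given disk, e.g. "sda".
removal_commands() {
    disk=$1
    awk -v disk="$disk" '$1 ~ /^md/ && $2 == ":" {
        for (i = 4; i <= NF; i++) {
            dev = $i
            sub(/\[.*/, "", dev)                 # strip "[0](F)" suffix
            rest = substr(dev, length(disk) + 1) # partition number, if any
            if (index(dev, disk) == 1 && rest ~ /^[0-9]*$/) {
                printf "mdadm -f /dev/%s /dev/%s\n", $1, dev
                printf "mdadm -r /dev/%s /dev/%s\n", $1, dev
            }
        }
    }'
}
```

For example, removal_commands sda < /proc/mdstat prints one -f and one -r line per affected array. Only once the list looks right would you run the printed commands.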

Disk Addition

At this stage you should have grub installed on the remaining hard disks that contain /boot.

  1. Insert the new hard disk into the server. It should be of equal or greater capacity, and of equal or greater spindle speed (RPM).
  2. Now you need to re-detect the hard disk on the SCSI channel you removed one from earlier:

    echo "scsi add-single-device" 0 0 0 0 > /proc/scsi/scsi

  3. Confirm you can see the new hard disk when you run cat /proc/scsi/scsi
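The four numbers passed to remove-single-device and add-single-device can be read straight out of /proc/scsi/scsi rather than worked out by eye. A small sketch, assuming the listing format shown earlier; scsi_addresses is a hypothetical name.

```shell
# Turn /proc/scsi/scsi style text on stdin into
# "host channel id lun | vendor line" rows, one per device.
scsi_addresses() {
    awk '$1 == "Host:" {
        host = $2
        sub(/^scsi/, "", host)          # "scsi0" becomes "0"
        # Adding 0 turns "00" and "01" into plain numbers.
        addr = host " " ($4 + 0) " " ($6 + 0) " " ($8 + 0)
    }
    $1 == "Vendor:" {
        line = $0
        sub(/^[ \t]+/, "", line)        # drop leading indentation
        print addr " | " line
    }'
}
```

Run as scsi_addresses < /proc/scsi/scsi, the part before the bar is what goes after "scsi remove-single-device" or "scsi add-single-device".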

Partition Table Setup & RAID Re-Sync

Now that you have the replacement drive installed in the machine, you want to set up the partition table on the disk so you can begin a RAID re-sync.

The easiest way to copy the partition table from one disk to another is to use sfdisk.

  1. To copy the partition table from /dev/sdb to /dev/sda you would run

    sfdisk -d /dev/sdb | sfdisk /dev/sda

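  2. Add each partition on the new disk back to its array (for example, mdadm -a /dev/mdX /dev/sda1), then check that the array is re-syncing. MD will intelligently queue the partitions so the drives aren’t hammered by several parallel reads/writes.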

    cat /proc/mdstat
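While the re-sync runs, /proc/mdstat gains a progress line under each array being rebuilt, and queued arrays are shown as delayed. The sketch below condenses that into one line per array. It assumes the usual "recovery = N%" and "resync=DELAYED" wording; resync_progress is a hypothetical name.

```shell
# Summarise rebuild state from mdstat-formatted text on stdin.
resync_progress() {
    awk '
        # Remember which array the following status lines belong to.
        $1 ~ /^md/ && $2 == ":" { array = $1 }
        # Progress line: "[=>...]  recovery =  8.5% (...) finish=..."
        ($2 == "recovery" || $2 == "resync") && $3 == "=" { print array, $2, $4 }
        # Arrays waiting their turn behind another rebuild.
        /resync=DELAYED/ { print array, "delayed" }
    '
}
```

A simple way to follow the rebuild is to re-run resync_progress < /proc/mdstat every so often until it prints nothing.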

Clean Up

At this stage you should have a machine suffering from heavy IO load as MD re-syncs the RAID array. If the load is having too much of an impact on your business operations, you can slow down the rate of syncing.

  • Sync Speed:

    1. To check the maximum rate of the RAID re-sync on the md1 device (reported in KiB per second), run:

      cat /sys/block/md1/md/sync_speed_max

    2. To change this value, echo a new number into the same file:

      echo "50000" > /sys/block/md1/md/sync_speed_max
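The two sync speed steps can be wrapped in a small function that checks its input and reports the previous value before overwriting it. This is a sketch, not a standard tool: set_sync_speed_max is a hypothetical name, and the optional third argument exists only so the function can be tried against a scratch directory instead of /sys/block.

```shell
# Usage: set_sync_speed_max ARRAY SPEED [ROOT]
# ROOT defaults to /sys/block; SPEED must be a whole number.
set_sync_speed_max() {
    array=$1
    speed=$2
    root=${3:-/sys/block}
    file=$root/$array/md/sync_speed_max

    # Reject empty or non-numeric speeds before touching the file.
    case $speed in
        ''|*[!0-9]*) echo "speed must be a whole number" >&2; return 1 ;;
    esac
    [ -w "$file" ] || { echo "cannot write $file" >&2; return 1; }

    echo "previous: $(cat "$file")"
    echo "$speed" > "$file"
}
```

For example, set_sync_speed_max md1 50000 has the same effect as the echo command above, and must likewise be run as root.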


  • smartd:

    1. Now that the drive has been swapped, start smartd again:

      service smartd start || /etc/init.d/smartd start