Use the rpm -ifv command to install the tw_cli package. Once it is installed, the state of the controller (c0) and its drives can be displayed with:
# tw_cli /c0 show
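If you are not sure which controller number your card has been given, the top-level show command lists every controller tw_cli can see (this article assumes it is c0 throughout):
# tw_cli show
Running the show command against /c0 gives output similar to the following: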
Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy
------------------------------------------------------------------------------
u0 RAID-1 OK - - - 931.312 RiW ON
u1 RAID-5 DEGRADED - - 64K 11175.8 RiW ON
u2 RAID-5 INOPERABLE - - 64K 11175.8 RiW ON
VPort Status Unit Size Type Phy Encl-Slot Model
------------------------------------------------------------------------------
p0 OK u0 931.51 GB SAS 0 - SEAGATE ST31000424SS
p1 OK u0 931.51 GB SAS 1 - SEAGATE ST31000424SS
p2 OK u1 2.73 TB SATA 2 - Hitachi HDS723030AL
p3 OK u1 2.73 TB SATA 3 - Hitachi HDS723030AL
p4 SMART-FAILURE u1 2.73 TB SATA 4 - Hitachi HDS723030AL
p5 SMART-FAILURE u1 2.73 TB SATA 5 - Hitachi HDS723030AL
p6 OK u1 2.73 TB SATA 6 - Hitachi HDS723030AL
p7 OK u1 2.73 TB SATA 7 - Hitachi HDS723030AL
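Before touching anything, it is worth looking at the SMART data of the two complaining drives yourself. smartctl can talk to drives behind the controller with its -d 3ware,N option, where N is the port number (p4 and p5 above); the device node (/dev/twa0 here) depends on which 3ware driver your kernel uses, so adjust it to match your system:
# smartctl -a -d 3ware,4 /dev/twa0
# smartctl -a -d 3ware,5 /dev/twa0
The output also includes each drive's serial number, which is handy later for making sure you pull the right physical disks.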
Removing Damaged Disks from the RAID Array
# tw_cli /c0/p4 remove
Exporting port /c0/p4 …
Done
On the p4 (degraded) disk this worked like a charm. However, when I tried to do the same on the p5 disk by running:
tw_cli /c0/p5 remove
I got the following error:
Remove operation invalid for unit
Then I realized that the disk had been moved into its own unit, u2 (the INOPERABLE unit in the listing above). However, retrying:
tw_cli /c0/p5 remove
still gave the same error.
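Before deleting anything it does no harm to double-check what the inoperable unit actually contains. Querying the unit directly is a standard tw_cli operation, although the exact output format depends on the controller and firmware:
# tw_cli /c0/u2 show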
The only way to get rid of a unit like this is to delete it:
# tw_cli maint deleteunit c0 u2
Deleting unit c0/u2 … Done.
Now, rerunning the show command:
# tw_cli /c0 show
reveals the following about the p5 drive:
p5 OK - 2.73 TB SATA 5 - Hitachi HDS723030AL
Now it can be added back to controller c0 as a hot spare:
# tw_cli /c0 add type=spare disk=5
Creating new unit on controller /c0 … Done. The new unit is /c0/u2.
OK, it looks like not much has happened: we have just deleted unit u2 and then added the drive straight back in again as a spare. However, whereas a complete unit can only be deleted and not removed, a spare drive can be removed from anywhere.
So, if you now run
tw_cli /c0/p5 remove
you get the message:
Exporting port /c0/p5 … Done
and the drive has been removed cleanly. Both the p4 (degraded) and the p5 (inoperable) drives can now be taken out of the system. Simply pull out the drive bays, remove the screws holding the old drives in place, fit the new drives, and slot the bays back into the machine.
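Depending on the controller and driver, the newly inserted drives may be detected automatically. If they do not show up in the next show command, the controller can be told to rescan its ports; rescan is a standard tw_cli subcommand, though whether you need it will vary from system to system:
# tw_cli /c0 rescan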
Re-building the RAID with the New Drives
# tw_cli /c0 show
In the output you should see that both new drives have been recognized (they are listed with a status of OK but no unit assigned), but they are not yet part of the array. To add the drives to the array, put them in as hot spares:
# tw_cli /c0 add type=spare disk=4:5
(this adds both drives at the same time; if you are only replacing one drive, specify just that drive's port number).
The controller will now add both drives as spares, then swap one of them into the degraded array and start rebuilding it.
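If the rebuild does not start on its own, it can be kicked off manually. This is a standard tw_cli operation, but the unit number below is the one from this example, and on some firmware versions you may also need to name the drive to use with disk=:
# tw_cli /c0/u1 start rebuild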
If you issue a show command, you will see output similar to the following:
# tw_cli /c0 show
Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy
------------------------------------------------------------------------------
u0 RAID-1 OK - - - 931.312 RiW ON
u1 RAID-5 DEGRADED - - 64K 11175.8 RiW ON
VPort Status Unit Size Type Phy Encl-Slot Model
------------------------------------------------------------------------------
p0 OK u0 931.51 GB SAS 0 - SEAGATE ST31000424SS
p1 OK u0 931.51 GB SAS 1 - SEAGATE ST31000424SS
p2 OK u1 2.73 TB SATA 2 - Hitachi HDS723030AL
p3 OK u1 2.73 TB SATA 3 - Hitachi HDS723030AL
p4 OK u1 2.73 TB SATA 4 - Hitachi HDS723030AL
p5 Spare u1 2.73 TB SATA 5 - Hitachi HDS723030AL
p6 OK u1 2.73 TB SATA 6 - Hitachi HDS723030AL
p7 OK u1 2.73 TB SATA 7 - Hitachi HDS723030AL
If you want to monitor the rebuild process, issue the following command:
watch /sbin/tw_cli /c0 show
This re-runs the show command every 2 seconds, continually updating the display. Exit by pressing [Ctrl]-[c].
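If you only want an occasional one-off check rather than a continuously updating screen, you can also query just the affected unit; the %RCmpl column shows how far the rebuild has got (the exact layout varies by firmware, so treat this as a sketch):
# tw_cli /c0/u1 show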
Whilst the array is re-building (and this can take a while for a large array), do not use the server for anything else. Indeed, it's probably best to do this either over the weekend or at the end of the working day, so the server has uninterrupted time to re-build the array.
If you are paranoid, restart the server in single-user mode. If you are using LILO as the boot loader, press [Ctrl]-[x] at the LILO boot prompt to exit the graphical screen. At the boot: prompt, type:
linux single
If you are using GRUB as the boot loader, use the following steps to boot into single-user mode:
1. If you have a GRUB password configured, type p and enter the password.
2. Select Red Hat Enterprise Linux with the version of the kernel that you wish to boot and type a to append the line.
3. Go to the end of the line and type single as a separate word (press the [Spacebar] and then type single). Press [Enter] to exit edit mode.
4. Back at the GRUB screen, type b to boot into single-user mode.
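Once the rebuild has finished, you can bring the server back up to its normal runlevel. Assuming a sysvinit-based system such as the Red Hat releases mentioned above, with runlevel 3 as the usual non-graphical default, that would be:
# telinit 3
Alternatively, simply reboot the machine.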