Testing simple ZFS failure on FreeBSD

ZFS is one of the most robust and fault-tolerant filesystems around. It can also be fairly tricky to set up and maintain, so I’m writing at least one blog post based on my experiences setting up my NAS. I definitely made some mistakes when I configured my storage, but it’s served me well for a while now. Unfortunately, other parts of the motherboard I selected have been a bit temperamental; see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213877 for details. I’m hoping that writing this will make me feel a bit more confident about my upcoming drive replacement. One of the issues I have is that there are only 4 storage drives, so increasing the total storage available means replacing all of the drives, one by one.

An aside: VirtualBox is frustrating as hell

While writing this blog post, I added and removed a fair few drives, as well as scrapping the VM configuration completely and starting again. For the benefit of future searchers, if you hit an error like:

UUID {c56e7d80-8abc-44db-a27c-669a8c98162c} of the medium 'F:\FreeBSD Test\rootfs.vdi' does not match the value {bad50d40-beeb-40a4-9e03-3a98bb85fe5e} stored in the media registry ('C:\Users\live\.VirtualBox\VirtualBox.xml').

when starting your virtual machine, have a read of https://forums.virtualbox.org/viewtopic.php?p=368077&sid=6ea45b253eb27bb8cf202f1857f3a2ce#p368077. Most of the rest of the thread is irrelevant.

The “bad” UUID wasn’t actually stored in the path shown, but in the .vbox file – the definition of the VM. Close VirtualBox’s interface, update the UUID and away we go again. Of course, if you’ve got important data in your VM, you may want to clone and reattach instead of editing files or modifying the drive UUID itself.
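
To see at a glance which UUID is which, you can pull both out of the error text. This is a minimal sketch, assuming a grep that supports -o (both the GNU and FreeBSD ones do); extract_uuids is just a name I made up for illustration.

```sh
#!/bin/sh
# Print every {uuid} found in the text on stdin, one per line.
# extract_uuids is a hypothetical helper, not part of VirtualBox.
extract_uuids() {
    grep -oE '\{[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\}'
}

# Usage: paste the error message into a file and run
#   extract_uuids < error.txt
# The first UUID is the one found in the disk image, the second is the
# one the VM definition expects.
```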

Virtual Machine Setup

I’m using VirtualBox 5.1.8 on a Windows 10 host. Newer versions should be fine, and the host OS shouldn’t matter.
To install FreeBSD, I’m using FreeBSD-11.0-RELEASE-amd64-disc1.iso, although most other recent releases should work.

Create a VM that looks something like this, but don’t add any disks yet.
[Screenshot: one of the initial VM creation screens]

To speed things up, I’m creating the disks with the vbox-img command instead of via the GUI:

PS F:\FreeBSD Test> vbox-img.exe createbase --filename rootfs.vdi --size 10240000000
PS F:\FreeBSD Test> vbox-img.exe createbase --filename disk1.vdi --size 10240000000
PS F:\FreeBSD Test> vbox-img.exe createbase --filename disk2.vdi --size 10240000000
PS F:\FreeBSD Test> vbox-img.exe createbase --filename disk3.vdi --size 10240000000
PS F:\FreeBSD Test> vbox-img.exe createbase --filename disk4.vdi --size 10240000000
PS F:\FreeBSD Test> vbox-img.exe createbase --filename disk5.vdi --size 10240000000

You will actually need the full 50GB free by the end of this post.
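
As a rough cross-check on the space requirement, here is the upper bound if every image were fully allocated. It assumes the six images created above at 10240000000 bytes each. If vbox-img created them as dynamically allocated images (an assumption, I haven't verified the default), they only grow as data is written, so real usage can be lower.

```sh
#!/bin/sh
# Upper bound on host disk space for a set of equally sized images.
# total_image_bytes is a hypothetical helper for illustration.
total_image_bytes() {
    # $1 = number of images, $2 = size of each image in bytes
    echo $(( $1 * $2 ))
}

# rootfs.vdi plus disk1.vdi to disk5.vdi
total_image_bytes 6 10240000000   # prints 61440000000
```

That is a little over 61 GB in decimal units.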

[Screenshot: disk setup in VirtualBox]

Set up the disks so they look like this in VirtualBox, with hotplugging enabled for every drive except the root FS.
The only other thing I needed to change was adding an IDE controller, as VirtualBox didn’t want to boot from a SATA-attached CD drive for unknown reasons.

[Screenshot: adding an IDE controller so VirtualBox will boot]

For posterity, here are the other VM settings I used, although I’m not sure if they’re needed.

[Screenshot: other VM settings, part 1]

[Screenshot: other VM settings, part 2]

FreeBSD Setup

I won’t go through installing FreeBSD here, except for the following:
* Install the system to the first drive shown, ada0. The defaults should be fine.
[Screenshot: FreeBSD install]
* Install an SSH server
* You may want to change to Bridged Networking in VirtualBox – I couldn’t SSH in using the VirtualBox network
* Log in as root on first boot and add a user through the console as FreeBSD sets PermitRootLogin no for sshd by default
* I install sudo and nano to make my life easier

At this point you should be able to sudo su to root and see the following setup:

root@vm-freebsd:/usr/home/voltagex # camcontrol devlist
<VBOX CD-ROM 1.0>                  at scbus0 target 0 lun 0 (cd0,pass0)
<VBOX HARDDISK 1.0>                at scbus2 target 0 lun 0 (ada0,pass1)
<VBOX HARDDISK 1.0>                at scbus3 target 0 lun 0 (ada1,pass2)
<VBOX HARDDISK 1.0>                at scbus4 target 0 lun 0 (ada2,pass3)
<VBOX HARDDISK 1.0>                at scbus5 target 0 lun 0 (ada3,pass4)
<VBOX HARDDISK 1.0>                at scbus6 target 0 lun 0 (ada4,pass5)

I have based my ZFS configuration on http://jrs-s.net/2015/02/06/zfs-you-should-use-mirror-vdevs-not-raidz/ – my NAS only has 4+1 disk bays, so I’ve matched that configuration here. Apparently it’s better to have 6 devices, but I can’t yet bend space and time, nor afford to import another SuperMicro chassis to Australia.

For this test, we’re going to create two mirrors and ignore the cache.

root@vm-freebsd:~ # zpool create tank mirror ada1 ada2 mirror ada3 ada4

root@vm-freebsd:~ # zpool status
  pool: tank
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    tank        ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        ada1    ONLINE       0     0     0
        ada2    ONLINE       0     0     0
      mirror-1  ONLINE       0     0     0
        ada3    ONLINE       0     0     0
        ada4    ONLINE       0     0     0

errors: No known data errors
root@vm-freebsd:~ # zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank    70K  18.4G    19K  /tank

Good enough for our testing purposes.
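
For anyone wondering where the 18.4G comes from: each mirror contributes roughly the size of one of its disks, and the pool stripes across the mirrors. A back-of-the-envelope sketch, assuming the 10240000000-byte disks from earlier (mirror_pool_gib is a made-up helper name):

```sh
#!/bin/sh
# Rough usable capacity, in GiB, of a pool built from mirror vdevs:
# each mirror contributes the size of one member disk.
mirror_pool_gib() {
    # $1 = number of mirror vdevs, $2 = size of each disk in bytes
    awk -v n="$1" -v b="$2" 'BEGIN { printf "%.2f\n", n * b / (1024 * 1024 * 1024) }'
}

mirror_pool_gib 2 10240000000   # prints 19.07
```

The difference between that figure and the 18.4G reported by zfs list is presumably ZFS's own overhead (labels, metadata and reserved space).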

Now, create a dataset.

root@vm-freebsd:~ # zfs create tank/data
root@vm-freebsd:~ # cd /tank/data/

Let’s create a 4GB file of random data to represent the important data installed on our NAS – the hardest part of this was working out what syntax to use for arithmetic. This is csh, we’re not in ~~Kansas~~ bash any more.

root@vm-freebsd:/tank/data # dd if=/dev/random of=file1.dat bs=`expr 1024 \* 1024` count=4000
4000+0 records in
4000+0 records out
4194304000 bytes transferred in 93.065192 secs (45068451 bytes/sec)
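
For comparison, here is the same arithmetic in plain /bin/sh rather than csh. It's only a sketch: the function names are mine, and the point is that expr needs the asterisk escaped from the shell, while sh's own arithmetic expansion does not.

```sh
#!/bin/sh
# Two ways to compute the dd block size, plus the expected total.
# All three function names are hypothetical helpers for illustration.
bs_with_expr() {
    expr 1024 \* 1024          # external expr; the * must be escaped
}

bs_with_arith() {
    echo $((1024 * 1024))      # POSIX sh arithmetic expansion (not csh)
}

dd_total_bytes() {
    # $1 = block size in bytes, $2 = count
    echo $(( $1 * $2 ))
}

bs_with_expr                   # prints 1048576
bs_with_arith                  # prints 1048576
dd_total_bytes 1048576 4000    # prints 4194304000
```

The last number matches the byte count dd reported. FreeBSD's dd also accepts size suffixes, so bs=1m would avoid the arithmetic entirely.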

After all of that, it's time to get to the point of this article – what happens when things go wrong?

In the virtual machine settings, go to Storage and remove the fourth disk. Annoyingly you can remove a disk (because we turned on hot-plugging) but not add one while the VM is turned on.

Most server/NAS boards are going to support hot plugging, even if the idea of ripping out a spinning drive worries me a little bit.

Now that we've removed a drive, what does our pool look like?

root@vm-freebsd:/tank/data # zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: none requested
config:

    NAME                      STATE     READ WRITE CKSUM
    tank                      DEGRADED     0     0     0
      mirror-0                ONLINE       0     0     0
        ada1                  ONLINE       0     0     0
        ada2                  ONLINE       0     0     0
      mirror-1                DEGRADED     0     0     0
        ada3                  ONLINE       0     0     0
        12153649206614195490  REMOVED      0     0     0  was /dev/ada4

errors: No known data errors

Oh no, drive failure! Lucky we have smartmontools running and smtpd set up to email a real email address*, so I was able to catch the problem (that I just caused).

* I wonder how many users actually have this set up. I still log on and check mutt when I remember.
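
If you don't have alerting set up, even a small check run from cron is better than nothing. Here's a minimal sketch, assuming /bin/sh and the zpool status layout shown above. The function names and the mail line in the comment are my own illustration, not something from my NAS.

```sh
#!/bin/sh
# Read `zpool status` text on stdin and print the value of every
# pool-level "state:" line (device rows don't start with "state:").
pool_states() {
    awk '$1 == "state:" { print $2 }'
}

# Succeed only if at least one pool was seen and all of them are ONLINE.
all_online() {
    states=$(pool_states)
    [ -n "$states" ] || return 1
    for s in $states; do
        [ "$s" = "ONLINE" ] || return 1
    done
    return 0
}

# Hypothetical cron usage:
#   zpool status | all_online || zpool status | mail -s "ZFS pool problem" root
```

zpool status -x is also worth knowing about: it only reports pools that have problems.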

If you “plug” the disk back in via the same Storage settings, you’ll need to run zpool online tank ada4 and then you’ll be able to see:

root@vm-freebsd:/tank/data # zpool status
  pool: tank
 state: ONLINE
  scan: resilvered 19.5K in 0h0m with 0 errors on Sun Nov  6 22:54:01 2016
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0

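In a real server, it’d take a lot longer to replace the disk and “resilver” the array. In this configuration, if you lose the other disk in the same mirror (ada3 here) while the resilver is happening, you lose the whole pool, because ZFS stripes data across both mirrors. Losing a disk from the other mirror would only degrade the pool further.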

Let’s try a slightly more realistic situation. If a drive has failed, you’d plug in a new one (disk5.vdi in my example) and replace old with new. I thought FreeBSD would assign a new device name, but it took the old ada4, leading to a slightly confusing zpool replace command. Afterwards, the pool looked like this:

root@vm-freebsd:/tank/data # zpool status
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Nov  6 23:00:53 2016
        2.23G scanned out of 3.91G at 127M/s, 0h0m to go
        1.11G resilvered, 57.16% done
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        DEGRADED     0     0     0
          mirror-0                  ONLINE       0     0     0
            ada1                    ONLINE       0     0     0
            ada2                    ONLINE       0     0     0
          mirror-1                  DEGRADED     0     0     0
            ada3                    ONLINE       0     0     0
            replacing-1             OFFLINE      0     0     0
              12153649206614195490  OFFLINE      0     0     0  was /dev/ada4/old
              ada4                  ONLINE       0     0     0  (resilvering)
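
The exact replace command isn't reproduced in this post. As a sketch only, assuming the usual zpool replace pool old-device new-device form, you can take the name or GUID of the missing device straight from the status output and build the command from that. Both function names below are made up, and replace_command only prints the command rather than running it.

```sh
#!/bin/sh
# Read `zpool status` text on stdin and print the name (or GUID) of every
# device whose state is REMOVED, UNAVAIL or FAULTED.
missing_devices() {
    awk '$2 == "REMOVED" || $2 == "UNAVAIL" || $2 == "FAULTED" { print $1 }'
}

# Print (do not run) a replace command.
# $1 = pool, $2 = old device name or GUID, $3 = new device
replace_command() {
    echo "zpool replace $1 $2 $3"
}

# Hypothetical usage:
#   old=$(zpool status tank | missing_devices)
#   replace_command tank "$old" ada4
```

Referring to the old disk by its GUID sidesteps the confusion of the new disk turning up under the same ada4 name.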

To make the resilver take a bit longer for my next test, I’ve created a couple more 4GB files. Then I’m going to cause a kernel panic.

root@vm-freebsd:/tank/data # sysctl debug.kdb.panic=1

Yes, if you run this as root it will actually take your system down. Do(n’t) try this at home.

After rebooting and logging in again via SSH, has my pool survived? Aside from one slightly scary message when I first ran zpool status (1 byte a second?!), it seems to be running fine.

root@vm-freebsd:~ # zpool status
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Nov  6 23:13:54 2016
        1.60G scanned out of 11.7G at 1/s, (scan is slow, no estimated time)
        818M resilvered, 13.64% done
config:

        NAME                       STATE     READ WRITE CKSUM
        tank                       DEGRADED     0     0     0
          mirror-0                 ONLINE       0     0     0
            ada1                   ONLINE       0     0     0
            ada2                   ONLINE       0     0     0
          mirror-1                 DEGRADED     0     0     0
            ada3                   ONLINE       0     0     0
            replacing-1            DEGRADED     0     0     0
              6482039156816553123  OFFLINE      0     0     0  was /dev/ada4/old
              ada4                 ONLINE       0     0     0

errors: No known data errors
root@vm-freebsd:/usr/home/voltagex # zpool status
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Nov  6 23:13:54 2016
        2.70G scanned out of 11.7G at 93.5M/s, 0h1m to go
        1.34G resilvered, 22.99% done
config:

        NAME                       STATE     READ WRITE CKSUM
        tank                       DEGRADED     0     0     0
          mirror-0                 ONLINE       0     0     0
            ada1                   ONLINE       0     0     0
            ada2                   ONLINE       0     0     0
          mirror-1                 DEGRADED     0     0     0
            ada3                   ONLINE       0     0     0
            replacing-1            DEGRADED     0     0     0
              6482039156816553123  OFFLINE      0     0     0  was /dev/ada4/old
              ada4                 ONLINE       0     0     0  (resilvering)
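
Rather than re-running zpool status by hand and squinting at it, the progress figure can be pulled out with a line of awk. This is a sketch that assumes the "N resilvered, P% done" line format shown in the output above. Other ZFS versions may word it differently, and resilver_percent is a name I invented.

```sh
#!/bin/sh
# Read `zpool status` text on stdin and print the resilver percentage.
# Matches lines such as: "1.34G resilvered, 22.99% done"
resilver_percent() {
    awk '/resilvered,/ && /% done/ { sub(/%/, "", $3); print $3 }'
}

# Hypothetical usage, polling until the resilver finishes:
#   while zpool status tank | grep -q 'resilver in progress'; do
#       zpool status tank | resilver_percent
#       sleep 10
#   done
```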

Okay, what about my data?

root@vm-freebsd:~ # ls /tank/data
ls: /tank/data: No such file or directory

Uhh…

root@vm-freebsd:~ # zfs mount -a
root@vm-freebsd:~ # ls /tank/data/
file1.dat file2.dat file3.dat

Phew, looks good.
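
A likely explanation for the missing directory, though not one confirmed here: on FreeBSD, datasets are mounted at boot by the zfs rc script, which only runs when zfs_enable is set in /etc/rc.conf. This VM was installed with the defaults and the pool was created by hand afterwards, so that line may never have been added. The fragment below is that assumption written out:

```sh
# /etc/rc.conf
# Assumption: this line was missing, which would explain why tank/data
# was not mounted after the reboot.
zfs_enable="YES"
```

Running sysrc zfs_enable=YES as root makes the same change from the command line.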

I’d like to find out if there are any other failure modes I should know about (could I “remove” the cache from my NAS by re-purposing it as the boot drive?), but at least I know I’m not going to lose my data if my system decides to reboot halfway through resilvering, and that removing and replacing a drive isn’t really that scary.
