Converting RAID1 to RAID10 online

Schema of a RAID10 array (CC BY JaviMZN)

I have a (now old) HP microserver with 4 HDDs. I installed Ubuntu 14.04 (then in beta) on it on a quiet Sunday in February 2014. It is now running Ubuntu 16.04 and still works perfectly. However, I’m not sure what I was thinking on that Sunday more than 3 years ago. I had partitioned the 4 HDDs identically, each with a partition for /boot, one for swap and a last one for a BTRFS volume (with subvolumes to separate / from other spaces like /var or /home). My idea was to have the 4 /boot partitions in RAID10 and the 4 swap partitions in RAID0. I realised today that I had only used 2 partitions for /boot, configured in RAID1, and only 3 partitions for swap in RAID0.

This caused a recurrent problem. Each /boot partition is 256MB, so instead of 512MB (RAID10 with 4 devices) I ended up with only 256MB (RAID1). That is not much, especially if you install the Ubuntu HWE (Hardware Enablement) kernels: unattended-upgrades quickly starts failing to install security updates because there is no space left on /boot, etc. It was becoming high maintenance, and with 4 kids to attend to I had to remedy that quickly.
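The capacity arithmetic behind those numbers can be sketched in a few lines of shell. This is only an illustration: `raid_usable_mb` is a hypothetical helper name, and it assumes equally sized members and, for RAID10, the default layout with 2 copies of each block.

```shell
# Usable capacity of an MD array built from equally sized members.
# Arguments: RAID level, number of devices, size of one member in MB.
# Hypothetical helper; RAID10 is assumed to keep 2 copies of each block.
raid_usable_mb() {
    level=$1 devices=$2 size_mb=$3
    case "$level" in
        0)  echo $(( devices * size_mb )) ;;      # striping: all space is usable
        1)  echo "$size_mb" ;;                    # mirroring: one member's worth
        10) echo $(( devices * size_mb / 2 )) ;;  # striped mirrors with 2 copies
        *)  echo "unsupported level: $level" >&2; return 1 ;;
    esac
}
```

With my partitions, `raid_usable_mb 1 2 256` gives 256 and `raid_usable_mb 10 4 256` gives 512.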

But here is the magic with Linux: I did an online reshape from RAID1 to RAID10 (via RAID0) and an online resize of /boot (ext4). In 15 minutes I went from a problematic 256MB /boot to a low-maintenance 512MB one without rebooting!

Here is how I did it. It will only work if you have mdadm 3.3+ (it could work with 3.2.1+, but I have not tested that) and a recent kernel (I had 4.10, but it should also work with the 4.4 shipped with Ubuntu 16.04 and probably with older kernels). Note that you should back up, test your backup and know how to recover your /boot (or whatever partition you are trying to change).

Increasing the size of a RAID0 array (for swap)

First, this is how I fixed the RAID0 for the swap (no backup necessary, but you should make sure that you have enough free memory to release the swap). The current RAID0 is called md0 and is composed of sda3, sdb3 and sdc3. The partition sdd3 is missing.

$ sudo mdadm --grow /dev/md0 --raid-devices=4 --add /dev/sdd3
mdadm: level of /dev/md0 changed to raid4
mdadm: added /dev/sdd3
mdadm: Need to backup 6144K of critical section..
$ cat /proc/mdstat
md0 : active raid4 sdd3[4] sdc3[2] sda3[0] sdb3[1]
      17576448 blocks super 1.2 level 4, 512k chunk, algorithm 5 [5/4] [UUU__]
      [>....................]  reshape =  1.8% (105660/5858816) finish=4.6min speed=20722K/sec
$ sudo swapoff /dev/md0
$ grep swap /etc/fstab
UUID=2863a135-946b-4876-8458-454cec3f620e none            swap    sw              0       0
$ sudo mkswap -L swap -U 2863a135-946b-4876-8458-454cec3f620e /dev/md0
$ sudo swapon -a

What I just did is tell MD to grow the array from 3 to 4 devices and to add the new device. After that, one can see that the reshape is taking place (it is rather fast because the partitions are small; the mdstat output above reports 5858816 1K blocks per device, about 5.6GiB). After that first operation the array is bigger, but the swap size is still the same. So I turned the swap off (“unmounted” it), recreated it on the full device and turned it back on (“remounted” it). I grepped for the swap in my `/etc/fstab` file to see how it was referenced; here it is by UUID. So when formatting I reused the same UUID and did not need to change my `/etc/fstab`.
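As far as I understand, the extra capacity only becomes visible once the reshape has finished, so it may be worth waiting for it before recreating the swap. mdadm has a `--wait` option for that (`sudo mdadm --wait /dev/md0`). Below is a minimal sketch of the same idea based on the mdstat text shown above; `reshape_progress` is a hypothetical helper name, and it only looks at the `reshape = N%` line.

```shell
# Print the reshape progress of an MD array, reading /proc/mdstat-style text
# from standard input. Exit status is 0 while a reshape is reported for the
# array, 1 otherwise. Hypothetical helper for illustration.
reshape_progress() {
    awk -v array="$1" '
        $2 == ":" { current = $1 }            # "md0 : active ..." starts an array block
        current == array && /reshape *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) { print $i; found = 1 }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Possible usage: poll until the reshape of md0 is over.
#   while reshape_progress md0 < /proc/mdstat; do sleep 5; done
```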

Converting a RAID1 to RAID10 array online (without copying the data)

Now for something a bit more complex. I want to migrate the array from RAID1 to RAID10 online. There is no direct path for that, so we need to go via RAID0. Note that RAID0 is very dangerous (it has no redundancy at all), so you should really back up as advised earlier.

Converting from RAID1 to RAID0 online

The current RAID1 array is called md1 and is composed of sdb2 and sdc2. I’m going to convert it to a RAID0. After the conversion, only one disk will belong to the array.

$ sudo mdadm --grow /dev/md1 --level=0 --backup-file=/home/backup-md0
$ cat /proc/mdstat
md1 : active raid0 sdc2[1]
      249728 blocks super 1.2 64k chunks
$ sudo mdadm --misc --detail /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Sun Feb  9 15:13:33 2014
     Raid Level : raid0
     Array Size : 249664 (243.85 MiB 255.66 MB)
   Raid Devices : 1
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Tue Jul 25 19:27:56 2017
          State : clean 
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 64K

           Name : jupiter:1  (local to host jupiter)
           UUID : b95b33c4:26ad8f39:950e870c:03a3e87c
         Events : 68

    Number   Major   Minor   RaidDevice State
       1       8       34        0      active sync   /dev/sdc2

I printed some extra information on the array to illustrate that it is still the same array but in RAID0 and with only 1 disk.

Converting from RAID0 to RAID10 online

$ sudo mdadm --grow /dev/md1 --level=10 --backup-file=/home/backup-md0 --raid-devices=4 --add /dev/sda2 /dev/sdb2 /dev/sdd2
mdadm: level of /dev/md1 changed to raid10
mdadm: added /dev/sda2
mdadm: added /dev/sdb2
mdadm: added /dev/sdd2
raid_disks for /dev/md1 set to 5
$ cat /proc/mdstat
md1 : active raid10 sdd2[4] sdb2[3](S) sda2[2](S) sdc2[1]
      249728 blocks super 1.2 2 near-copies [2/2] [UU]
$ sudo mdadm --misc --detail /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Sun Feb  9 15:13:33 2014
     Raid Level : raid10
     Array Size : 249664 (243.85 MiB 255.66 MB)
  Used Dev Size : 249728 (243.92 MiB 255.72 MB)
   Raid Devices : 2
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Tue Jul 25 19:29:10 2017
          State : clean 
 Active Devices : 2
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 2

         Layout : near=2
     Chunk Size : 64K

           Name : jupiter:1  (local to host jupiter)
           UUID : b95b33c4:26ad8f39:950e870c:03a3e87c
         Events : 91

    Number   Major   Minor   RaidDevice State
       1       8       34        0      active sync set-A   /dev/sdc2
       4       8       50        1      active sync set-B   /dev/sdd2

       2       8        2        -      spare   /dev/sda2
       3       8       18        -      spare   /dev/sdb2

As a result of the conversion, we are in RAID10 but with only 2 active devices and 2 spares. We need to tell MD to use the 2 spares as well, otherwise we just have a RAID1 with a different name.

$ sudo mdadm --grow /dev/md1 --raid-devices=4
$ cat /proc/mdstat
md1 : active raid10 sdd2[4] sdb2[3] sda2[2] sdc2[1]
      249728 blocks super 1.2 64K chunks 2 near-copies [4/4] [UUUU]
      [=============>.......]  reshape = 68.0% (170048/249728) finish=0.0min speed=28341K/sec
$ sudo mdadm --misc --detail /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Sun Feb  9 15:13:33 2014
     Raid Level : raid10
     Array Size : 499456 (487.83 MiB 511.44 MB)
  Used Dev Size : 249728 (243.92 MiB 255.72 MB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Tue Jul 25 19:29:59 2017
          State : clean, resyncing 
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=2
     Chunk Size : 64K

  Resync Status : 99% complete

           Name : jupiter:1  (local to host jupiter)
           UUID : b95b33c4:26ad8f39:950e870c:03a3e87c
         Events : 111

    Number   Major   Minor   RaidDevice State
       1       8       34        0      active sync set-A   /dev/sdc2
       4       8       50        1      active sync set-B   /dev/sdd2
       3       8       18        2      active sync set-A   /dev/sdb2
       2       8        2        3      active sync set-B   /dev/sda2

Once again, the reshape is very fast, but this is due to the small size of the array. What we can see here is that the array is now 512MB, but the file system on it still only uses 256MB. The next step is to increase the file system size.

Increasing file system to use full RAID10 array size online

This cannot be done online with all file systems, but I’ve tested it with XFS and ext4 and it works perfectly (the command below is for ext4; XFS has its own tool, xfs_growfs). I suspect other file systems support it too, but I have never tried it online. In all cases, as already advised, make a backup before continuing.

$ sudo resize2fs /dev/md1
resize2fs 1.42.13 (17-May-2015)
Filesystem at /dev/md1 is mounted on /boot; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 2
The filesystem on /dev/md1 is now 499456 (1k) blocks long.

$ df -Th /boot/
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/md1       ext4  469M  155M  303M  34% /boot

When changing the /boot array, do not forget GRUB

I already had a RAID array before, so the GRUB configuration is correct and does not need to be changed. But if you reshaped your array from something other than RAID1 (e.g. RAID5), then you should update GRUB because you may need different modules for the initial boot steps. On Ubuntu run `sudo update-grub`; on other platforms see `man grub-mkconfig` on how to do it (e.g. `sudo grub-mkconfig -o /boot/grub/grub.cfg`).

It is not enough to have the right Grub configuration. You need to make sure that the GRUB bootloader is installed on all HDDs.

$ sudo grub-install /dev/sdX  # Example: sudo grub-install /dev/sda
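Repeating that command for each disk can be wrapped in a small loop. This is a sketch under assumptions: `grub_install_all` and the `DRY_RUN` variable are hypothetical names, and the disk list must be adapted to your machine (mine has sda to sdd).

```shell
# Install the GRUB bootloader on every disk given as argument.
# Set DRY_RUN=1 to only print the commands instead of running them.
# Hypothetical helper for illustration.
grub_install_all() {
    for disk in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "grub-install $disk"
        else
            sudo grub-install "$disk" || return 1   # stop at the first failure
        fi
    done
}

# Possible usage:
#   DRY_RUN=1 grub_install_all /dev/sda /dev/sdb /dev/sdc /dev/sdd
#   grub_install_all /dev/sda /dev/sdb /dev/sdc /dev/sdd
```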

A Time Server in a Container – Part 1

GPL https://commons.wikimedia.org/wiki/File:Sablier-temps-icone-5376-128.png

To learn Docker in detail I decided to use it to run a local time server using ntpd from the ntp.org project.

I have used an incremental approach where I started with an easy setup and then increased the challenges either to improve the time server or to better understand Docker.

Getting Started

So why ntpd and not <put your favourite time server>?

I know ntpd from having configured it many times in the past 10 years. So I wanted to start with it to quickly get time synchronisation working.

Which platform?

I have a Raspberry Pi (abbreviated RPi from now on) which serves as DHCP server and local forwarding and caching of DNS queries for my LAN. It had early support for Docker back in October when I started my experiment which added a bit of spice to it.

Getting Docker on a Raspberry Pi

There are many ways to get Docker running on your RPi. You could get the Hypriot OS Linux distribution, which has everything set up nicely for running Docker containers. You can compile Docker on your platform of choice (which I had to do to squash a few early-adopter bugs). You can install a tarball containing the binaries for your platform. But if you are running Raspbian Jessie – like I was – you can today just add Docker’s own repository and install a binary version using apt-get. Make sure your kernel is recent (Docker requires at least 3.10, but a properly updated Raspbian should be running 4.4 at the time of writing).

You can follow Docker’s installation guide for Debian, but by default it will set you up with the x86_64 Docker repository. As hinted in the documentation, for other architectures you need to use the [arch=...] clause. In addition, Docker provides a specific variant of the package for Raspbian. So for Raspbian Jessie, use the following entry in your docker.list file:

deb [arch=armhf] https://apt.dockerproject.org/repo raspbian-jessie main

Continue to follow the Docker guide, including how to set up non-root access to a specific user.

Creating a Docker image for ntpd

Create a specific folder somewhere on your Raspberry Pi storage (e.g. mkdir -p ~/projects/docker/ntpd) and create a file Dockerfile.armhf (I use the extension .armhf so I can have distinct Dockerfiles for each platform I use) with the following content:

FROM armhf/ubuntu:16.04
RUN apt-get update \
    && apt-get install -y --no-install-recommends ntp \
    && apt-get clean -q \
    && rm -Rf /var/lib/apt/lists/*
ENTRYPOINT ["/usr/sbin/ntpd"]

Note: This file, as well as newer versions of it and instructions to build and run the container, is available in my GitHub ntp container project. In the rest of this blog post, I’m only going to detail how I approach running the container and solving problems.

The first line states that the base image for the container will be Ubuntu 16.04 (the specific variant for the RPi architecture). Lines 2 to 5 are commands executed on top of the base image: they update the package lists, install the latest version of ntpd with the fewest dependencies, and remove any cached or temporary files, which minimises the size of the image on disk. Finally, the last line is the command that Docker will execute when instructed to run the container. I have used ENTRYPOINT because it allows me – while experimenting – to change the list of parameters I send to ntpd when I create and run the container. This gives me flexibility when testing different parameters.

I picked Ubuntu as the base image because it has sane defaults for the ntpd configuration file. It uses the NTP Pool project and the configuration is secured by default. Note that other base images could also have worked and also have sane defaults. I could have used the Alpine Linux base image: it is really compact and lightweight and would have been perfect for a small platform like a Raspberry Pi, but it does not provide the ntpd package from the NTP project which I wanted to start with. It only supports OpenNTPD (which does not support leap seconds, so it was a no-go for me) and Chrony (which could be a good alternative, but as I mentioned before I wanted to experiment with Docker first, not learn yet another NTP application).

Let’s build the container image (I named the image “article/armhf/ntpd” and tagged it with the current date, but just name it like you want):

$ docker build -f Dockerfile.armhf -t article/armhf/ntpd:20170106.1 .

Running the NTP container

We are now going to spawn an instance of the container image in foreground to see what is going on and to notice any error:

$ docker run --rm -it article/armhf/ntpd:20170106.1 -n
 6 Jan 14:03:30 ntpd[1]: ntpd 4.2.8p4@1.3265-o Wed Oct  5 12:38:30 UTC 2016 (1): Starting
 6 Jan 14:03:30 ntpd[1]: Command line: /usr/sbin/ntpd -n
 6 Jan 14:03:30 ntpd[1]: Cannot set RLIMIT_MEMLOCK: Operation not permitted
 6 Jan 14:03:30 ntpd[1]: proto: precision = 1.198 usec (-20)
 6 Jan 14:03:30 ntpd[1]: Listen and drop on 1 v4wildcard 0.0.0.0:123
 6 Jan 14:03:30 ntpd[1]: Listen normally on 2 lo 127.0.0.1:123
 6 Jan 14:03:30 ntpd[1]: Listen normally on 3 eth0 172.17.0.2:123
 6 Jan 14:03:30 ntpd[1]: Listen normally on 4 lo [::1]:123
 6 Jan 14:03:30 ntpd[1]: Listening on routing socket on fd #21 for interface updates
 6 Jan 14:03:30 ntpd[1]: start_kern_loop: ntp_loopfilter.c line 1126: ntp_adjtime: Operation not permitted
 6 Jan 14:03:30 ntpd[1]: set_freq: ntp_loopfilter.c line 1089: ntp_adjtime: Operation not permitted
 6 Jan 14:03:31 ntpd[1]: Soliciting pool server 193.200.241.66
 6 Jan 14:03:32 ntpd[1]: Soliciting pool server 90.187.7.5
 6 Jan 14:03:32 ntpd[1]: adj_systime: Operation not permitted
 6 Jan 14:03:32 ntpd[1]: Soliciting pool server 129.70.132.37
 6 Jan 14:03:33 ntpd[1]: Soliciting pool server 85.25.210.112
 6 Jan 14:03:33 ntpd[1]: Soliciting pool server 31.25.153.77
 6 Jan 14:03:34 ntpd[1]: Soliciting pool server 178.63.9.212
 6 Jan 14:03:34 ntpd[1]: Soliciting pool server 193.22.253.13
^C 6 Jan 14:03:40 ntpd[1]: ntpd exiting on signal 2 (Interrupt)

We have a few errors (Operation not permitted) in the output above. One is about RLIMIT_MEMLOCK (raising the limit on the maximum locked-in-memory address space; ntpd uses it to keep its main process from being swapped out, which limits jitter) and the others are about ntp_adjtime and adj_systime (both are used by ntpd to interface with the kernel and adjust the system time).

By default ntpd runs as the root user, so it should have enough privileges for these operations. Docker does support running unprivileged containers, where the root user inside the container is mapped to a normal user on the host using user namespaces (see namespaces(7)). This is not the default Docker configuration however, so my root user inside the container is the root user outside the container. And even if Docker were configured to use user namespaces, they are not compiled into the Raspberry Pi Foundation kernel, so at the moment it is not possible to use that feature on a Raspberry Pi without some extra effort (I will detail this in a future article).

To implement basic privilege limitations for containers, Docker can use various security features of the Linux kernel to keep a container from accessing certain sensitive kernel calls. The most notable ones are Linux capabilities (since Docker 1.2), Linux SECCOMP filtering (since Docker 1.10) and Linux MAC (like SELinux or AppArmor). For SECCOMP, better use Docker 1.12+ as the previous default SECCOMP profiles conflicted with Docker’s management of Linux capabilities. In addition, the Raspbian kernel (version 4.4 as of writing) has no built-in support for SECCOMP filtering, so this functionality is not usable on a Raspberry Pi unless you compile your own kernel. Neither SELinux nor AppArmor is available on a Raspberry Pi without recompiling your own kernel and installing the user space tools. So Docker on a Raspberry Pi can only use Linux capabilities as a security feature.

By default Docker provides each container with a reasonable set of capabilities (see the Docker documentation on capabilities). If you check both documents (the Linux capabilities manual and the Docker runtime privileges documentation), you will find that our container is missing the CAP_SYS_RESOURCE and CAP_SYS_TIME capabilities. There are 2 ways to add them. Most online guides will tell you that when you run into “Operation not permitted” errors, you should just add the --privileged flag to the docker run command line and it will be fixed. That’s the first way, and it’s the wrong approach (sure it works, but it is like deactivating SELinux because you are not allowed to perform an operation). The other way is to add only the missing capabilities to the container, using the --cap-add flag. That’s what I’m going to show now:

$ docker run --rm -it --cap-add SYS_RESOURCE --cap-add SYS_TIME article/armhf/ntpd:20170106.1 -n
 7 Jan 11:19:24 ntpd[1]: ntpd 4.2.8p4@1.3265-o Wed Oct  5 12:38:30 UTC 2016 (1): Starting
 7 Jan 11:19:24 ntpd[1]: Command line: /usr/sbin/ntpd -n
 7 Jan 11:19:24 ntpd[1]: proto: precision = 1.823 usec (-19)
 7 Jan 11:19:24 ntpd[1]: Listen and drop on 0 v6wildcard [::]:123
 7 Jan 11:19:24 ntpd[1]: Listen and drop on 1 v4wildcard 0.0.0.0:123
 7 Jan 11:19:24 ntpd[1]: Listen normally on 2 lo 127.0.0.1:123
 7 Jan 11:19:24 ntpd[1]: Listen normally on 3 eth0 172.17.0.2:123
 7 Jan 11:19:24 ntpd[1]: Listen normally on 4 lo [::1]:123
 7 Jan 11:19:24 ntpd[1]: Listening on routing socket on fd #21 for interface updates
 7 Jan 11:19:25 ntpd[1]: Soliciting pool server 213.95.21.43
 7 Jan 11:19:26 ntpd[1]: Soliciting pool server 134.119.8.130
 7 Jan 11:19:26 ntpd[1]: Soliciting pool server 46.4.32.135
 7 Jan 11:19:27 ntpd[1]: Soliciting pool server 213.136.86.203
 7 Jan 11:19:27 ntpd[1]: Soliciting pool server 178.63.9.212
 7 Jan 11:19:27 ntpd[1]: Listen normally on 7 eth0 [fe80::42:acff:fe11:2%6]:123
 7 Jan 11:19:27 ntpd[1]: new interface(s) found: waking up resolver
 7 Jan 11:19:27 ntpd[1]: Soliciting pool server 46.165.212.205
 7 Jan 11:19:28 ntpd[1]: Soliciting pool server 109.239.58.247
 7 Jan 11:19:28 ntpd[1]: Soliciting pool server 131.188.3.221
 7 Jan 11:19:28 ntpd[1]: Soliciting pool server 78.46.189.152
 7 Jan 11:19:28 ntpd[1]: Soliciting pool server 195.50.171.101
^C  7 Jan 11:22:40 ntpd[1]: ntpd exiting on signal 2 (Interrupt)
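If you want to check which capabilities a process really has, the kernel exposes them as a hexadecimal bit mask on the CapEff line of /proc/<pid>/status. The following is a minimal sketch for decoding such a mask; `has_cap` is a hypothetical helper name, and the capability numbers (24 for CAP_SYS_RESOURCE, 25 for CAP_SYS_TIME) come from the kernel’s linux/capability.h header.

```shell
# Succeed when capability number $2 is set in the hexadecimal mask $1,
# as found on the CapEff line of /proc/<pid>/status.
# Hypothetical helper for illustration.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

CAP_SYS_RESOURCE=24   # numbers from <linux/capability.h>
CAP_SYS_TIME=25

# Possible usage, with a mask copied from `grep CapEff /proc/1/status`
# run inside the container:
#   has_cap 00000000a80425fb "$CAP_SYS_TIME" || echo "CAP_SYS_TIME is missing"
```

The mask in the usage comment is the one I would expect from Docker’s default capability set; treat it as an example, your value may differ.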

To make sure this is working, first verify that you do not have any time synchronisation service running: $ sudo systemctl stop systemd-timesyncd ntp.

Then change the system time by shifting it by 5 seconds: $ sudo date -s "5 seconds".

Check that your system clock is now off by 5 seconds:

$ ntpdate -q time1.google.com
server 216.239.35.0, stratum 2, offset -5.002284, delay 0.14117
18 Jan 11:27:55 ntpdate[5217]: step time server 216.239.35.0 offset -5.002284 sec

Start the container in the background this time: $ docker run --name ntpd --detach --restart always --cap-add SYS_RESOURCE --cap-add SYS_TIME article/armhf/ntpd:20170106.1 -g -n

Wait a few seconds and query the network time again using the ntpdate command above. The offset should now be well below 5 seconds and probably close to 0 seconds.
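That check can also be scripted. Here is a minimal sketch, assuming the `ntpdate -q` output format shown above (a line starting with “server” that contains “offset <value>,”); `offset_below` is a hypothetical helper name.

```shell
# Read `ntpdate -q` output on standard input and succeed when the absolute
# offset reported on the first "server" line is below the limit in seconds.
# Exit status 2 means no offset was found. Hypothetical helper.
offset_below() {
    awk -v limit="$1" '
        $1 == "server" {
            for (i = 1; i < NF; i++)
                if ($i == "offset") { off = $(i + 1) + 0; seen = 1; exit }
        }
        END {
            if (!seen) exit 2
            if (off < 0) off = -off
            exit (off < limit + 0) ? 0 : 1
        }
    '
}

# Possible usage:
#   ntpdate -q time1.google.com | offset_below 0.5 && echo "clock is in sync"
```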

You now have an NTP service running inside a container and synchronising your system clock using Internet time servers from the NTP Pool project. If you want to stop the experiment here and restore your system, you need to stop the container ($ docker stop ntpd), block it from restarting at next boot ($ docker update --restart=no ntpd) and perhaps reboot so that the default time synchronisation service is reactivated.

But if you want to keep experimenting or let the container do its job of time synchronisation, you should deactivate any other time synchronisation mechanism to avoid conflicts:

$ sudo timedatectl set-ntp false 
$ sudo systemctl disable ntp chronyd
$ sudo systemctl mask systemd-timesyncd
$ sudo systemctl stop systemd-timesyncd ntp chronyd

Foreword about Time and NTP on a Raspberry Pi

The Raspberry Pi (at the time of writing this applies to all models) has no real-time clock (RTC) module on its board. An RTC is a small oscillator (e.g. quartz, like in your electronic wristwatch) plus some electronics to keep track of time and a battery (or equivalent). RTCs help a system keep track of time when it is off and in the early phases of boot. On a standard desktop or laptop computer the motherboard has an RTC. Many oscillators are not particularly accurate (low quality), with unstable frequencies which can depend on external factors such as room temperature. It is possible to add an RTC module to the Raspberry Pi (I will have a detailed article on that soon), but without an RTC you need a network connection for the RPi to know the current time.

On Linux, the kernel manages 2 clocks: the hardware clock (which is based on the RTC) and the system clock (which is the clock used by the system to query/set the time; it ticks using a clocksource such as a CPU/SoC timer, kernel jiffies, etc.). On boot, the current time is read from the hardware clock and used to initialise the system clock. The system clock is then driven by the ticks from the selected clocksource, starting from the time read at boot from the hardware clock. Usually, on shutdown, many Linux distributions are configured to store the system clock in the hardware clock.

The Raspberry Pi may have no hardware clock, but it does have a clock source (this is the current clocksource on a Raspberry Pi 2, other models may differ):

[    0.000000] arm_arch_timer: Architected cp15 timer(s) running at 19.20MHz (phys).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x46d987e47, max_idle_ns: 440795202767 ns
[    0.000010] sched_clock: 56 bits at 19MHz, resolution 52ns, wraps every 4398046511078ns
[    0.000032] Switching to timer-based delay loop, resolution 52ns
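Besides the boot messages, the kernel also reports the clocksource in use through sysfs. A minimal sketch to read it follows; `show_clocksource` is a hypothetical helper name, and the optional argument only exists so the function can be pointed at another directory.

```shell
# Print the clocksource currently used by the kernel and the available ones.
# The optional argument overrides the sysfs directory that is read.
# Hypothetical helper for illustration.
show_clocksource() {
    dir=${1:-/sys/devices/system/clocksource/clocksource0}
    printf 'current:   %s\n' "$(cat "$dir/current_clocksource")"
    printf 'available: %s\n' "$(cat "$dir/available_clocksource")"
}
```

On my Raspberry Pi 2 I would expect the current clocksource to be arch_sys_counter, as in the boot messages above.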

So if the time is set on boot, the OS can keep track of the time even when disconnected, as long as it is up and running. Systemd 213 introduced a new service, systemd-timesyncd, which is an SNTP client implementation: it is able to query a network time server and set the OS system time based on the response. This service has an extra feature for systems without an RTC: it saves the system time on disk on shutdown. So when your Raspberry Pi reboots, it can use the stored time to initialise its system time while waiting for a more accurate time once the network is ready. Sure, during the early boot process the system time might be off by a couple of seconds, but it is better than nothing.

As for NTP, it adjusts the system time based on responses from network time servers or, when offline, based on the clock drift it has calculated for the current clock source. This means that if you run NTP, it is good to let it run for at least 24 hours so it can accurately measure the clock source drift and then compensate for it during network disconnection periods. In addition, NTP will regularly sync the system time back to the hardware clock to correct the RTC. In upcoming articles, we will see how we can add an RTC to our Raspberry Pi, how to overcome the challenges of allowing RTC access to NTP inside the container, and how to increase the clock accuracy. In addition, we will see how we can become an NTP time server for the local LAN.
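To get a feeling for what that drift means in practice: ntpd stores the measured frequency offset in its drift file in parts per million (PPM). A small worked conversion into seconds per day is sketched below; `ppm_to_seconds_per_day` is a hypothetical helper name, and the drift file location mentioned in the comment is what I would expect on Ubuntu, so check the driftfile line of your ntp.conf.

```shell
# Convert a frequency offset in parts per million (the unit of ntpd's drift
# file) into the seconds gained or lost per day if left uncorrected.
# 1 day = 86400 seconds, so drift = ppm * 86400 / 1000000.
# Hypothetical helper for illustration.
ppm_to_seconds_per_day() {
    awk -v ppm="$1" 'BEGIN { printf "%.3f\n", ppm * 86400 / 1000000 }'
}

# Possible usage (the path is an assumption, see your ntp.conf):
#   ppm_to_seconds_per_day "$(cat /var/lib/ntp/ntp.drift)"
```

A clock source that is off by 10 PPM would for instance drift by 0.864 seconds per day.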

What did we learn about Docker

First, we practiced the basics of building a container (the Dockerfile syntax and the docker build ... command), running a container in foreground or background mode (docker run ...) and controlling the running container (docker stop ... and docker update ...). I did not yet elaborate much on what these commands can do, but my intention is that we will discover them further as we progress with the experiment.

Second, we learned about some of Docker’s security measures (like Linux capabilities) and the limitations of the current Raspberry Pi platform (like no SECCOMP filtering, AppArmor or user namespaces), and we also learned how to extend a container’s permissions by adding new capabilities.

Next we will learn how to provide access to specific devices (such as an RTC), how to do simple monitoring (checking that the container is running, its resource usage and its logs) and how to increase its security (dropping unnecessary capabilities, using the other security measures). On this quest we will learn a lot about the Raspberry Pi as well: we will add an RTC module, we will compile our own kernels in order to add new security functions and improve the OS jitter, etc.

Setting Shared Folder Compression on Synology NAS (BTRFS)

If you have a Synology NAS that supports BTRFS (mostly the Intel-based NASes) and you decided to use BTRFS, there are a couple of shared folders automatically created for you (like “homes” or “video”), but they don’t have the “compression” option set. Trying to edit the shared folder in the administration GUI does not help: the check box is grayed out, meaning it cannot be changed.

BTRFS compression is quite “clever”. It has heuristics that evaluate whether a file is worth compressing, so it won’t try to compress the 1GB video of your toddlers playing together, which would be a waste of time given how little compression would be achieved. But even if BTRFS is “clever”, that does not mean you should consider using compression on a folder named video. Simply don’t do it.

For folders with mixed data like “homes” (which is the shared folder for all user home directories) you might have wished Synology had activated the compression. Or if you forgot to tick the check box when creating a shared folder, you might want to change it. Fortunately there is a way to change that. It is not guaranteed that it won’t break your NAS, especially if you execute the wrong command, but if you don’t mind the risk then follow on.

BTRFS allows you to change the option on a live system without trouble. However, existing data in the shared folder won’t be compressed after activating the option. To benefit from it you would need to copy the existing data again or defragment it using the compression option (-c, see man btrfs-filesystem), and depending on your amount of data this might take a while.

To do it, you will need to activate SSH remote connection (try to limit it to your local LAN and do not open it to the internet unless you know what you are doing). You will need to connect via SSH using the administrator account (admin by default, but you would be wise to change the default name). I trust you know how to activate SSH on your NAS box, if not I would recommend you don’t try to do the rest of this article, ask someone who might know it! From a Linux or macOS (OS X) system, just open a terminal and type:

$ ssh <admin>@<hostname>

(and replace admin by the correct user account and hostname by your NAS hostname or IP address)

On Windows, you could use PuTTY and achieve a similar result.

Once connected, you need to know your BTRFS volume path:

$ mount -t btrfs
/dev/mapper/vg1-volume_1 on /volume1 [...]

In the above example, it is /volume1. Now you should have a BTRFS subvolume (think of it as a BTRFS internal sub partition which Synology uses to define shared folders) called “homes” (or whatever other shared folder you would like to tweak):

$ sudo btrfs subvolume list /volume1
[...]
ID 259 gen 1688 top level 257 path homes
[...]
ID 264 gen 1686 top level 257 path video

So here we have made sure that the “homes” shared folder is located on /volume1/homes. Now let us check its properties:

$ sudo btrfs property get /volume1/homes
ro=false

Here we can confirm that compression is not set (note that compression was not set as a mount option, nor at the volume root). To activate it, you need to create the “compression” property; you can choose either zlib or lzo. The former compresses better but is slower, the latter is fast but has a much lower compression ratio. I personally chose lzo:

$ sudo btrfs property set /volume1/homes compression lzo

You can use again the previous command to get the properties for the volume and see if it was set. Now you can copy your files to the shared folder, and BTRFS will try to compress them if it thinks it makes sense.
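That verification can be scripted by filtering the `name=value` lines printed by btrfs property get. This is a minimal sketch; `btrfs_prop` is a hypothetical helper name, and the defragment command in the comment is only my reading of the -c option from man btrfs-filesystem, which I would test on unimportant data first.

```shell
# Extract one property from `btrfs property get` output read on standard
# input. Prints the value and succeeds if the property is present, fails
# otherwise. Hypothetical helper for illustration.
btrfs_prop() {
    awk -F= -v name="$1" '
        $1 == name { print substr($0, length(name) + 2); found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Possible usage:
#   sudo btrfs property get /volume1/homes | btrfs_prop compression
# To compress data that was already there (this may take a while):
#   sudo btrfs filesystem defragment -r -clzo /volume1/homes
```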

Picture credits: Picture is from the KDE project. The original material is licensed under GNU LGPLv3.

How to verify your Synology NAS hard disks

I upgraded the 2 HDDs in my Synology NAS to bigger ones. The swap and the rebuild of the RAID mirror were seamless. But I wanted to verify the health of the filesystems before growing the volumes. Here is how to do it.

Note: I am making the following assumptions: you know what you are doing, you activated SSH on your box and know how to connect as root, you know and understand how your NAS HDDs have been configured, you have a working backup of your HDD data, and you are not afraid of losing your data.

First find out what is the drive name of your volume(s) and also what is the filesystem type:

# mount
/dev/root on / type ext4 (rw,relatime,barrier=0,journal_checksum,data=ordered)
(...)
/dev/mapper/vol1-origin on /volume1 type ext4 (usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,synoacl)
/dev/mapper/vol2-origin on /volume2 type ext4 (usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,synoacl)

In my case, I have created 2 volumes on top of my mirror. The devices on which these volumes are stored are /dev/mapper/vol1-origin and /dev/mapper/vol2-origin, and both hold ext4 filesystems. But you probably do not have such a setup: you may only have one volume on top of your RAID array, and your device might simply be /dev/md[x].
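Reading that information out of the mount output can be sketched with two small helpers. Both names (`mount_source` and `storage_layer`) are hypothetical, and the classification is only a guess based on the device path: a /dev/mapper device is a device-mapper target, which on my NAS turned out to be LVM.

```shell
# Read `mount` output on standard input and print "<device> <fstype>" for
# the given mount point. Hypothetical helper for illustration.
mount_source() {
    awk -v mp="$1" '$2 == "on" && $3 == mp && $4 == "type" { print $1, $5 }'
}

# Guess the storage layer from a device path. Hypothetical helper.
storage_layer() {
    case "$1" in
        /dev/mapper/*) echo "device-mapper (probably LVM)" ;;
        /dev/md*)      echo "md" ;;
        *)             echo "unknown" ;;
    esac
}

# Possible usage:
#   mount | mount_source /volume1
```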

The fact that my devices were in /dev/mapper hinted that there might be an LVM layer somewhere. So I executed the following command (harmless):

# lvs
LV                    VG   Attr   LSize    Origin Snap%  Move Log Copy%  Convert
syno_vg_reserved_area vg1  -wi-a-   12.00M
volume_1              vg1  -wi-ao    1.00T
volume_2              vg1  -wi-ao    1.00T

So my 2 volumes are LVM logical volumes. Now that I have this information, I can do the following to verify the filesystems’ health. First of all, shutting down most services and unmounting the filesystems:

# syno_poweroff_task

Now if you did not have LVM but rather a /dev/md[x] device, you can simply do (if you have an ext2/ext3/ext4 filesystem only, and replace the ‘x’ by the correct number):

# e2fsck -pvf /dev/mdx

But if like me you have LVM, then you will need a few extra steps. The ‘syno_poweroff_task’ command has probably deactivated the LVM logical volumes; to be sure, check the “LV Status” given by the next command (harmless; only part of the output is shown here):

# lvdisplay
--- Logical volume ---
LV Name                /dev/vg1/volume_1
VG Name                vg1
LV UUID                <UUID>
LV Write Access        read/write
LV Status              NOT available
LV Size                1.00 TB

As you can see the logical volume is not available. We need to make it available so that the link to the logical volume is accessible:

# lvm lvchange -ay vg1/volume_1
# lvdisplay
--- Logical volume ---
LV Name                /dev/vg1/volume_1
VG Name                vg1
LV UUID                <UUID>
LV Write Access        read/write
LV Status              available
# open                 0
LV Size                1.00 TB
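If you prefer not to read the “LV Status” line by eye, the check can be scripted. This is a rough sketch only: lv_status is a hypothetical helper name, and it assumes lvdisplay prints “LV Name” and “LV Status” lines in the format shown above (other LVM versions may format this differently).

```shell
# Sketch: print the "LV Status" value of one logical volume.
# Reads `lvdisplay` output on stdin; $1 is the name exactly as printed
# on the "LV Name" line (e.g. /dev/vg1/volume_1).
# lv_status is a hypothetical helper name, not part of LVM.
lv_status() {
    awk -v lv="$1" '
        $1 == "LV" && $2 == "Name" { current = $3 }
        $1 == "LV" && $2 == "Status" && current == lv {
            sub(/^[ \t]*LV Status[ \t]+/, "")   # keep only the value
            print
        }
    '
}

# On the NAS itself, activate the volume only when it is not available yet:
#   if [ "$(lvdisplay | lv_status /dev/vg1/volume_1)" != "available" ]; then
#       lvm lvchange -ay vg1/volume_1
#   fi
```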

The status has now changed to “available”, so we can proceed with the filesystem verification:

# e2fsck -pvf /dev/vg1/volume_1
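The exit status of e2fsck is a sum of flags documented in its man page, so it is worth checking before remounting anything. Here is a small sketch that decodes it; the function name describe_e2fsck_status is my own choice for illustration.

```shell
# Sketch: translate an e2fsck exit status (a bitmask) into readable text.
# The flag values come from the e2fsck(8) man page; the function name
# describe_e2fsck_status is hypothetical.
describe_e2fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    for entry in \
        "1:errors corrected" \
        "2:errors corrected, system should be rebooted" \
        "4:errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:cancelled by user request" \
        "128:shared library error"
    do
        bit=${entry%%:*}     # number before the first colon
        text=${entry#*:}     # description after the first colon
        if [ $((status & bit)) -ne 0 ]; then
            echo "$text"
        fi
    done
    return 0
}

# On the NAS itself:
#   e2fsck -pvf /dev/vg1/volume_1; describe_e2fsck_status $?
```

Any status of 4 or above means the check did not fully succeed, and the filesystem should not be put back into service without a closer look.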

To finish this, you need to remount and restart all the stopped services. I do not know a specific Synology command to do that, so I simply rebooted the machine:

# shutdown -r now

Home Server – What do I want?

What service do I want to run on my Home Server?

I already have a NAS which provides the following services: File Sharing (Samba, AFP and NFS), Media Streaming Server (DLNA), VPN Server, Cloud Sync Repository. So I do not intend to have redundant services on my Home Server. What is left?

My Home Server could support:

  • Backup: Having a proper backup of all important files from the NAS and our laptop. Implementations: rdiff-backup, Box Backup, fwbackups*, duplicity*, rsnapshot or storeBackup.
  • (N)-IDS: As I have services open to the internet, I want to take some precautions and check that no exploit is being taken advantage of. I am not sure this is enough, but it is the least I can do. Implementations: AIDE or Suricata.
  • DNS cache/server: I am thinking of hosting my own DNS server to perform some caching and hopefully improve browsing performance a bit. I would need to benchmark this to make sure there is any gain, though, as I suspect my old router already does some caching. Implementation: dnsmasq.
  • DHCP server: My home router is a Netgear WG614 and its DHCP features are fairly limited, so having my home server address this issue is a nice idea (until we get a better router). It could even be tightly coupled with the DNS server (see the previous bullet point) so that one could use hostnames within the local network. Implementation: dnsmasq.
  • Syslog server
  • Maybe – ownCloud: maybe one day I would prefer to use an open source solution for Cloud Sync rather than the closed source one from my NAS vendor.

*: FreeBSD support is uncertain.

As one can see, I could use a Linux- or BSD-based OS, or a mixture. However, ZFS is so compelling that I am seriously considering going for FreeBSD+jails and basta cosi! February will be the month when I try to set up a FreeBSD server.

My Future Home Server – Part 2

I am experimenting with different OSes to find the right setup for my Home Server. I was interested in Fedora especially because there are several “Red Hat” technologies which I would like to use on my server, namely oVirt and virt-manager. Furthermore, it sports a recent Linux kernel (3.7 as of this writing), which could be beneficial if I choose Btrfs for the underlying file system.

However, after testing the upgrade path from Fedora 17 to Fedora 18, I am not so thrilled by the robustness of this OS. After painfully hitting 3 different blocking bugs, I managed to recover from the upgrade and have a nice Fedora 18 up and running. But this gave me little trust in the QA of the community. It seems that it is not the first time such problems have happened (see Fedora 11).

I am still willing to give Fedora a go. But as a precaution, I am going to experiment first with Ubuntu (with which I have had only one upgrade problem since 2006). I want to see the state of oVirt and virt-manager on this OS before I make any choice.

Or maybe I will forget entirely about Linux-based OSes and go for FreeBSD with several jails instead of using virtualisation. Though I would need to check the state of technologies like ownCloud, (n)IDS, etc. on this OS.

My Future Home Server – Part 1

I finally have my Home Server built: it has its first storage hard drive and I upgraded the memory to something decent. Time to install the operating system.

I have not yet fully decided which operating system to run on my Home Server. I would love ZFS as a file system for managing my storage, but I would still like to use Linux and not make the full switch to BSD. I decided to go for Fedora as the main OS, install BSD in a virtual machine and see how this setup performs.

I had tried Fedora 17 in a virtual machine for a few months and I liked it, although I prefer the Debian package manager over yum. But this is really based on my own feelings and not on technical grounds.

So let’s go and install Fedora 18 (just released) on my server.


ZFS on Linux

In my previous post, I stated that ZFS on Linux was not mature enough. The native ZFS port to Linux, although active, is still at the release candidate stage and requires significant work to install. As for the ZFS FUSE version, it is still at version 0.7 and has not been updated for a long time, but it is easy to install on Ubuntu as it is available in the Software Centre (the link only works if your system supports the ‘apt:’ scheme, as Ubuntu does).

I have tried and installed the latter, and although I cannot draw any conclusion from a stability/reliability point of view, I was able to successfully perform the same steps I had performed on FreeBSD using ZFS.

Btrfs – Linux answer to ZFS

Sadly, ZFS on Linux is not at the same maturity level as on FreeBSD (or even Solaris). There is a FUSE implementation, but it has been more than 16 months since anything happened there, and in my opinion it is not yet stable. As for a native ZFS port, only one implementation for Linux is still being developed, by the Lawrence Livermore National Laboratory, and it is still at the release candidate stage.

The state of ZFS on Linux is perhaps not too good today, but there is another file system under active development and with good support that could soon compete with ZFS: its name is btrfs (pronounced ‘butter-fs’). Btrfs is still experimental.

Yesterday, the root file system of one of my virtual machines running Oracle Linux 6.3 filled up. As it was configured with LVM this was not much trouble, but I wanted to try btrfs, so I decided to move /var to another partition using btrfs. I created a new hard disk in my VM and started it. Here is the rest of the story.

Warning: following these instructions might break your system. My advice is to create a virtual machine and experiment with it before doing so on a real system.


Securing ZFS data by mirroring them

This article is a follow-up to an earlier post about ZFS on FreeBSD. We have created a ZFS pool with one disk and put some data on it. Now we want to mirror the data to safeguard them from disk failure.

In my virtual machine I created a new disk of the same size as the previous ZFS-dedicated disk and fired up the machine.

Creating a dataset with 2 internal copies for each file

But before I added the second disk, I decided to create a dataset (of the file system type) inside the pool I created in the previous article. The dataset will be configured to replicate the data internally for safety. This is an entirely optional step which I did just to experiment with ZFS.

The reader should notice that my pool had only 1 drive, which means that each file in this dataset will be stored twice on the same drive. If the drive fails, everything is lost. It only helps if one copy of the file gets corrupted: ZFS will detect it and use the (hopefully) uncorrupted copy to restore the file.

# zfs create -o copies=2 laug/safe

Note about: mirror/striped pools and dataset copies

Dataset copies are in addition to any pool configuration such as mirroring or RAID-Z. In the case of a striped pool (which is what you get with the zpool add command), ZFS will try to use different disks in the pool for each copy, if it can! In the case of mirrors (which is what you get with the zpool attach command) or RAID-Z, in addition to the pool-level duplication of data, ZFS will try to keep the extra copies on different drives.

Preparing the second ZFS drive and adding it as a mirror to the existing pool

As the hard disk is exactly the same size (same disk space and number of sectors), I can reuse the commands from the previous article:

# gpart create -s gpt ada2
# gpart add -b 2048 -s 41932733 -t freebsd-zfs -l disk01 ada2
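To make the -b and -s values less opaque, here is the arithmetic behind them as a short sketch. It assumes 512-byte logical sectors, which the article does not state; adjust SECTOR_SIZE if your disk reports something else.

```shell
# Sketch: what the gpart start (-b) and size (-s) values mean in bytes.
# ASSUMPTION: 512-byte logical sectors (not stated in the article).
SECTOR_SIZE=512
START=2048        # value passed to -b (first sector of the partition)
COUNT=41932733    # value passed to -s (partition length in sectors)

offset_mib=$((START * SECTOR_SIZE / 1024 / 1024))
size_bytes=$((COUNT * SECTOR_SIZE))
last_sector=$((START + COUNT - 1))

echo "partition starts at ${offset_mib} MiB"
echo "partition size is ${size_bytes} bytes"
echo "last sector used is ${last_sector}"
```

Under that assumption the partition starts on a 1 MiB boundary, which keeps it aligned on disks with larger physical sectors.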

But now we are going to add the new disk to the existing pool in a mirror configuration. For this we use zpool attach:

# zpool attach laug ada1p1 ada2p1
# zpool status
  pool: laug
 state: ONLINE
 scan: resilvered 1.37M in 0h0m with 0 errors on Tue Jul 31 18:16:43 2012
config:

        NAME        STATE     READ WRITE CKSUM
        laug        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada1p1  ONLINE       0     0     0
            ada2p1  ONLINE       0     0     0

errors: No known data errors

As I don’t have much data on my pool, the resilvering was fast (see the scan message). In addition, one can see that the 2 disk partitions are now inside a mirror.
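For a quick scripted health check, the device table can be parsed instead of read by eye. This is a minimal sketch for simple layouts like the one above: all_online is a hypothetical helper name, and pools with spare, log or cache sections would need extra handling because those rows do not carry an ONLINE state.

```shell
# Sketch: succeed (exit 0) only if every row of the `zpool status` device
# table reports ONLINE. Reads `zpool status` output on stdin.
# all_online is a hypothetical helper name, not a ZFS command.
all_online() {
    awk '
        BEGIN { bad = 0 }
        $1 == "NAME" && $2 == "STATE" { in_table = 1; next }  # header row
        in_table && NF == 0           { in_table = 0 }         # blank line ends the table
        in_table && $2 != "ONLINE"    { bad = 1 }
        END { exit bad }
    '
}

# On the FreeBSD machine itself:
#   zpool status laug | all_online && echo "mirror is healthy"
```

ZFS also has a built-in shortcut, zpool status -x, which only reports pools that have problems.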

I really like ZFS: the command line interface is clean, and it is both easy to manage and powerful.