Lets start out with saying that ZFS on Linux (ZoL) is pretty awesome, snapshots, compression-on-the-fly, quota’s, … I love it. However ZoL also has issue’s, I even blogged about some in the past. (a full ZFS pool, failing to load ZFS module) Most cases where my own fault, but still, its very uncommon for a file system to be such a pain in the ass.


The last few days, I tried to find out why, my installation was so slow. Slow in this context is hard to define. Sometimes users complained about slow ls runs, while write speed was decent enough … read speed was also fine, perhaps NFS was at fault, perhaps disks, controller, … ?

Long story short, I looked for some tuning options, as the defaults most likely aren’t perfect. Before I take all the credit, I was masterly helped by senior systems engineer, ewwhite in this matter. So lets go over some “simple” tune options during initial creation.

The setup

  • Centos 7.3 (up-to-date, nightly)
  • 128GB RAM
  • Xeon E5-2620 v3
  • 3 * raidz2 with each 12 x 4Tb drives SAS 3 (HGST Ultrastar 7K6000)
  • 2 * Intel SSD DC P3600 Series PCI-E, 400GB (used for cache and in mirror logs, partitioned)
  • ZFS on Linux : v0.6.5.8

Bragging rights, right ? Sadly its for work… (or happy for my power bill)

Tuning during creation

Setting these value’s can be done on a live system, however most of them will only apply for new data or rewrites, so try to do them when creating a pool. Setting them to a pool is done :

# in case you want to apply to a specific subpool
zfs set parameter=value pool/subpool

# pool and all not locally set subpools, will receive this parameter
zfs set parameter=value pool

Sub-pools inherit from the pool above them, so setting these features will auto-magically inherit from the top level setting when you create sub-pools. (Eg. tank/users will inherit from tank unless defined otherwise)

  • xattr=sa, default xattr=on
    • What happens on default behavior is that for every xattr object a file in a hidden directory is created, so browsing or ls‘ing requires up to three disk seeks, on top of that it seems its not really well cached. The reason it was not changed to default behavior seems to be compatibility, which is a good reason if you work with other systems such as Solaris. But for Linux only environment its not relevant.
    • This option only works for newly written files, so if you only notice this tune option post-creation, you will need to remove and copy them in the pool again to take effect. Perhaps you can use zfs send and zfs receive. First set the xattr=sa then zfs send to an image, destroy the location, and zfs receive in the (old) location. (note : Not sure if this works, you could also use rsync here, which most likely will work.)
    • original implementation, I’m not alone award.
    • zfs set xattr=sa tank/users
      zfs send tank/users > tank/users_disk.img
      zfs destroy tank/users
      zfs receive tank/users < tank/users_disk.img
  • acltype=posixacl, default acltype=off
    • The reason behind this is fuzzy to me, but they parameters come up together regularly, so they are most likely related somehow. (need to research further)
  • compression=lz4, default compression=off
    • I did activate this option when I first started, basically the file system will try and compress all files, but when an application requests them, it will decompress them, without anyone noticing, of-course this creates a bit of CPU but any decent server should be able to do it. On a ~100 TB system :
      [[email protected] jbod1]# zfs get all huginn| grep compres
      huginn  compressratio         1.36x                  -
      huginn  compression           lz4                    local

      That’s 36 TB of extra space… note : allot of data is already compressed more heavy using gzip. This is a must have. Not only does it help you get more storage, writing uncompressed data is most likely taking longer then writing compressed data.

  • atime=off
    • atime is short for “last” access time, so every time a file is read a update has to flush to disk to reflect this access item. For most applications that’s totally irrelevant and a quit overhead.  I always set it to off. In cases where you wish to use access time, look to relatime.
  • relatime=off
    • relatime was introduced in Linux 2.6.20 kernel. It only updates the atime when the previous atime is older then mtime (modification time) or ctime (changed time) or when the atime is older then 24h (based on setting). I am assuming the zfs implementation is using this kernel feature. I don’t require relatime or atime, but this is the preferred option. Perhaps in the future we might see lazytime (linux kernel 4.X) which uses memory to store atime updates and write them during normal I/O.

And that’s it, next time lets look at the kernel module 🙂