It doesn't happen often that I need Windows tools, but this worked nicely, so I thought I'd share it with the future me who will have forgotten about this method.

On the domain controller, open a "Windows PowerShell" prompt.

Import the module :

Import-Module ActiveDirectory

Then create the list of "enabled" computers:

Get-ADComputer -Filter {enabled -eq $true} -Properties OperatingSystem, LastLogonDate | Select-Object Name, OperatingSystem, LastLogonDate

This will output a table with the computer name, operating system and last logon date.

To export it to CSV:

Get-ADComputer -Filter {enabled -eq $true} -Properties OperatingSystem, LastLogonDate | Select-Object Name, OperatingSystem, LastLogonDate | Export-Csv my_export.csv

This will create a file my_export.csv in CSV format. Enjoy!

A simple issue, but it can be tricky nevertheless! The storage daemon logged:

bareos-sd JobId 265: Warning: stored/mount.cc:270 Open device "FileStorage3" (/storage/block1) Volume "Full-0015" failed: ERR=stored/dev.cc:731 Could not open: /storage/block1/Full-0015, ERR=No such file or directory

While experimenting I had made the owner of /storage/block1 different from the Bareos user. This resulted in the storage daemon (bareos-sd) not being able to create new files. The fix was chowning the directory and restarting bareos-sd & bareos-dir:

chown bareos.bareos /storage/block1
systemctl restart bareos-sd
systemctl restart bareos-dir
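
To double-check the fix, you can verify the ownership and try creating a file as the bareos user; a quick sanity check (the path is the one from the error above):

ls -ld /storage/block1
sudo -u bareos touch /storage/block1/.write_test && rm /storage/block1/.write_test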

 

Perhaps surprisingly for newbies such as myself, Bareos will only run one backup job at a time, meaning that if you have 100 clients, there will be a serious delay between the first and the last job. So why is this happening?

Bareos is a fork of Bacula, which was originally developed for other types of media, such as tapes, DVD/CD and disks. These days you might think only disks remain, but that's incorrect; while DVD/CDs are legacy media, tapes are still very much in use. Here we use LTO tapes, although not (yet?) in combination with Bareos. Tapes however are not random access, and in a lot of situations only one tape can be loaded at a time.

Hence only one backup can run at a time, unless you have multiple devices. So disk storage users have to trick Bareos into running multiple backups at once to work around the whole tape issue.

More devices

Create more devices in the storage daemon; however, be sure to set Maximum Concurrent Jobs = 1 on each of them.

/etc/bareos/bareos-sd.d/device/FileStorage.conf

Device {
  Name = FileStorage
  Media Type = File
  #Archive Device = /var/lib/bareos/storage
  Archive Device = /storage/block1
  LabelMedia = yes;                   # lets Bareos label unlabeled media
  Random Access = yes;
  AutomaticMount = yes;               # when device opened, read it
  RemovableMedia = no;
  AlwaysOpen = no;
  Description = "File device. A connecting Director must have the same Name and MediaType."
 
}

Change this to :

Device {
  Name = FileStorage0
  Media Type = File
  #Archive Device = /var/lib/bareos/storage
  Archive Device = /storage/block1
  LabelMedia = yes;                   # lets Bareos label unlabeled media
  Random Access = yes;
  AutomaticMount = yes;               # when device opened, read it
  RemovableMedia = no;
  AlwaysOpen = no;
  Description = "File device. A connecting Director must have the same Name and MediaType."
  Maximum Concurrent Jobs = 1
}

Device {
  Name = FileStorage1
  Media Type = File
  #Archive Device = /var/lib/bareos/storage
  Archive Device = /storage/block1
  LabelMedia = yes;                   # lets Bareos label unlabeled media
  Random Access = yes;
  AutomaticMount = yes;               # when device opened, read it
  RemovableMedia = no;
  AlwaysOpen = no;
  Description = "File device. A connecting Director must have the same Name and MediaType."
  Maximum Concurrent Jobs = 1
}

Device {
  Name = FileStorage2
  Media Type = File
  #Archive Device = /var/lib/bareos/storage
  Archive Device = /storage/block1
  LabelMedia = yes;                   # lets Bareos label unlabeled media
  Random Access = yes;
  AutomaticMount = yes;               # when device opened, read it
  RemovableMedia = no;
  AlwaysOpen = no;
  Description = "File device. A connecting Director must have the same Name and MediaType."
  Maximum Concurrent Jobs = 1
}

Then let the director know about the new devices (FileStorage3 here is simply another device block defined the same way):

/etc/bareos/bareos-dir.d/storage/File.conf

Add the devices:

Storage {
  Name = File
  Address = bareos                # N.B. Use a fully qualified name here (do not use "localhost" here).
  Password = "Re/vZBBjYg1Nm65DycOKtjiw0+WH7Byp3VTeifzhIlSl"
  Device = FileStorage3
  Device = FileStorage0
  Device = FileStorage1
  Device = FileStorage2
  Media Type = File
  Maximum Concurrent Jobs = 10
}

And finish up by restarting the services:

systemctl restart bareos-fd
systemctl restart bareos-sd
systemctl restart bareos-dir
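
To confirm the storage daemon now presents all devices, you can ask for its status from bconsole; a small non-interactive sketch (the storage name File matches the director resource above):

echo "status storage=File" | bconsole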

Now concurrent jobs can run on these 4 devices 🙂

 


Bareos uses FileSets to decide what to back up and what not to back up. While the documentation is extensive, the approach is mostly to include everything and exclude parts (include all, exclude after). This is the way most people want to back up, but if you work around non-IT-minded people, you know that they will store EVERYTHING, EVERYWHERE, and mix downloadable data with self-created data. So the "include all, exclude after" method would make backups explode in size, with a lot of unnecessary data.

So my idea is to exclude everything and include the parts I know are relevant to back up. Inevitably this makes for a lot of exceptions to the rule, but with Bareos that is easier than including everything and finding a way to keep performance up (exclude all, include some).

I want to have multiple small jobs that finish "quickly" on the client. Using this approach I can do multiple backups per day of specific data without generating an extreme load on client machines. The "backup everything" jobs can then run at a lower frequency, since the frequently changed data is backed up already.

Since most of the work is in Microsoft Office applications, I started there. We do have a lot of scientific data floating around, but that data is backed up on result servers and doesn't belong in the client desktop backups. This is another reason to go with exclude everything, include what you need (although nobody ever got fired for making an extra backup…).

Creating a new FileSet

Let's start by creating a new FileSet; you can name it whatever you like, but it's best to pick something you can recognize.

nano /etc/bareos/bareos-dir.d/fileset/win_office.conf

A “template” would look like this :

FileSet {
  # required name
  Name = "win_office"

  # volume shadow copy service
  # this is windows specific
  Enable VSS = yes
  
  # include
  Include {

    # include from this directory
    # 
    File = "C:/Users"

    Options {
      # config
      Signature = MD5
      IgnoreCase = yes
      noatime = yes

      # Word, Excel, Powerpoint
      WildFile = "*.doc"
      WildFile = "*.docx"
      WildFile = "*.xls"
      WildFile = "*.xlsx"
      WildFile = "*.ppt"
      WildFile = "*.pptx"

      # open office
      WildFile = "*.odt"
      WildFile = "*.ods"
      WildFile = "*.odp"

      # pdf
      WildFile = "*.pdf"
    }	

    Options {
      # all files not in include
      RegExFile = ".*"
      Exclude = yes
    }
  }
}

Test the FileSet

Now before we deploy this FileSet, you can test it on a client to see exactly what gets backed up (a dry run). The easiest way is to create a JobDefs and a Job:

nano /etc/bareos/bareos-dir.d/jobdefs/BackupWindowsOffice.conf
JobDefs {
  # name (required)
  Name = "BackupWindowsOffice"
  
  # type can be backup/restore/verify
  Type = Backup
  
  # the default level bareos will try
  # can also be Full / Differential (changes since last full) / Incremental (changes since last backup)
  Level = Incremental
  
  # the default client, to be overwritten by the job.conf
  Client = bareos-fd
  
  # the fileset we just created
  FileSet = "win_office"
  
  # the schedule
  Schedule = "Nightly"
  
  # where to store it
  Storage = File
  
  # the message reporting
  Messages = Standard
  
  # the pool where to store it
  Pool = Incremental
  
  # lower values run earlier in the queue
  # so important jobs with priority=1 will run first
  Priority = 10
  
  # the bootstrap file keeps a "log" of all the backups, and gets rewritten every time a 
  # full backup is made, it can be used during recovery
  Write Bootstrap = "/var/lib/bareos/%c.bsr"
  
  # in case these values get overridden,
  # define which pool to use for each backup level;
  # note that the Full pool will be used at least once, since initially
  # no full backup exists
  Full Backup Pool = Full
  Differential Backup Pool = Differential
  Incremental Backup Pool = Incremental
}

Then the Job :

nano /etc/bareos/bareos-dir.d/job/fileset_test.conf
Job {
  # required
  Name = "svennd-office"
  
  # the default settings
  JobDefs = "BackupWindowsOffice"
  
  # overwrite the client here
  Client = "svennd"
}

Obviously we are not going to wait until the nightly schedule runs; open bconsole and run:

Enter a period to cancel a command.
*estimate job=svennd-office listing

This lists the files that would be backed up. The FileSet I used as a template is useful, but includes way too much, so you can change it, run a reload and test again using estimate:

2000 OK estimate files=7,741 bytes=141,521,221
You have messages.
*reload
reloaded
*estimate job=svennd-office listing

Until you have exactly what you need.
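
Since the listing can get long, it can be handier to feed the same estimate command to bconsole non-interactively and review the result in a file; a small sketch (the output path is arbitrary):

echo "estimate job=svennd-office listing" | bconsole > /tmp/estimate_office.txt
less /tmp/estimate_office.txt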

Current Windows FileSet

I have Windows 7 and Windows 10 users to back up. While generally that does not make much difference for Bareos, the Windows 7 users specifically have a funny directory structure:

  • C:/users/username is moved to D:/users/username, but C:/users still exists …
  • D:/users/username/documents is moved to D:/documents

Bareos will only issue a warning if File = "D:/Users" is not found, so you could use a single config file, but I dislike ignoring warnings, so I made separate Windows 7 and Windows 10 FileSets.

Below you will find my current FileSets; since I'm still playing around with these sets, they will most likely still change. Note that you don't have to generate a separate config file for every FileSet {}, so I think it's a good idea to combine similar configs.

All of these are found in:

/etc/bareos/bareos-dir.d/fileset/

win_images.conf : images on Windows 7 + 10; images compress poorly, so don't spend time compressing them.

# windows 7 & windows 10 images

FileSet {
  Name = "Win7_images"
  
  # volume shadow copy service
  Enable VSS = yes
  Include {
  
    # location
    File = "D:/Users"
  
  Options {
    # config
    Signature = MD5
    IgnoreCase = yes
    noatime = yes
    
    # images
    WildFile = "*.jpg"
    WildFile = "*.gif"
    WildFile = "*.tif"
    WildFile = "*.png"
  }	

  # exclude everything else
    Options {
    
    # all files not in include
    RegExFile = ".*"
    
    # default user profiles
    WildDir = "[C-D]:/Users/All Users/*"
    WildDir = "[C-D]:/Users/Default/*"
    
    # explicit don't backup
    WildDir = "[C-D]:/Users/*/AppData"
    WildDir = "[C-D]:/Users/*/Music"
    WildDir = "[C-D]:/Users/*/Videos"
    WildDir = "[C-D]:/Users/*/Searches"
    WildDir = "[C-D]:/Users/*/Saved Games"
    WildDir = "[C-D]:/Users/*/Links"
  
    # application specific
    WildDir = "[C-D]:/Users/*/MicrosoftEdgeBackups"
    WildDir = "[C-D]:/Users/*/Documents/R"
    WildDir = "*.svn/*"
    WildDir = "*.git/*"
    WildDir = "*.metadata/*"
    WildDir = "*cache*"
    WildDir = "*temp*"
    
    # share services
    WildDir = "*iCloudDrive*"
    WildDir = "*OneDrive*"
    WildDir = "*stack*"
    
    # windows specific
    WildDir = "*RECYCLE.BIN*"
    WildDir = "[C-D]:/System Volume Information"
    
    Exclude = yes
  }
  }
}

FileSet {
  Name = "Win10_images"
  
  # volume shadow copy service
  Enable VSS = yes
  Include {
  
  # location
    File = "C:/Users"
  
  Options {
    # config
    Signature = MD5
    IgnoreCase = yes
    noatime = yes
    
    # images
    WildFile = "*.jpg"
    WildFile = "*.gif"
    WildFile = "*.tif"
    WildFile = "*.png"
  }	

  # exclude everything else
    Options {
    
    # all files not in include
    RegExFile = ".*"
    
    # default user profiles
    WildDir = "[C-D]:/Users/All Users/*"
    WildDir = "[C-D]:/Users/Default/*"
    
    # explicit don't backup
    WildDir = "[C-D]:/Users/*/AppData"
    WildDir = "[C-D]:/Users/*/Music"
    WildDir = "[C-D]:/Users/*/Videos"
    WildDir = "[C-D]:/Users/*/Searches"
    WildDir = "[C-D]:/Users/*/Saved Games"
    WildDir = "[C-D]:/Users/*/Links"
  
    # application specific
    WildDir = "[C-D]:/Users/*/MicrosoftEdgeBackups"
    WildDir = "[C-D]:/Users/*/Documents/R"
    WildDir = "*.svn/*"
    WildDir = "*.git/*"
    WildDir = "*.metadata/*"
    WildDir = "*cache*"
    WildDir = "*temp*"
    
    # share services
    WildDir = "*iCloudDrive*"
    WildDir = "*OneDrive*"
    WildDir = "*stack*"
    
    # windows specific
    WildDir = "*RECYCLE.BIN*"
    WildDir = "[C-D]:/System Volume Information"
    
    Exclude = yes
  }
  }
}

win_office.conf

# all office files in users (c:/ and d:/)
# for win 7		= D
# for win 10 	= C 


FileSet {
  Name = "Win7_office"
  
  # volume shadow copy service
  Enable VSS = yes
  Include {
  
  # location
    File = "D:/Users"
    File = "D:/My Documents"
  
  Options {
    # config
    Signature = MD5
    compression = LZ4
    IgnoreCase = yes
    noatime = yes
    
    # Word
    WildFile = "*.doc"
    WildFile = "*.dot"
    WildFile = "*.docx"
    WildFile = "*.docm"

    # Excel
    WildFile = "*.xls"
    WildFile = "*.xlt"
    WildFile = "*.xlsx"
    WildFile = "*.xlsm"
    WildFile = "*.xltx"
    WildFile = "*.xltm"

    # Powerpoint
    WildFile = "*.ppt"
    WildFile = "*.pot"
    WildFile = "*.pps"
    WildFile = "*.pptx"
    WildFile = "*.pptm"
    WildFile = "*.ppsx"
    WildFile = "*.ppsm"
    WildFile = "*.sldx"

    # access
    WildFile = "*.accdb"
    WildFile = "*.mdb"
    WildFile = "*.accde"
    WildFile = "*.accdt"
    WildFile = "*.accdr"

    # publisher
    WildFile = "*.pub"

    # open office
    WildFile = "*.odt"
    WildFile = "*.ods"
    WildFile = "*.odp"

    # pdf
    WildFile = "*.pdf"
    
    # flat text / code
    WildFile = "*.xml"
    WildFile = "*.log"
    WildFile = "*.rtf"
    WildFile = "*.tex"
    WildFile = "*.sql"
    WildFile = "*.txt"
    WildFile = "*.tsv"
    WildFile = "*.csv"
    WildFile = "*.php"
    WildFile = "*.sh"
    WildFile = "*.py"
    WildFile = "*.r"
    WildFile = "*.rProj"
    WildFile = "*.js"
    WildFile = "*.html"
    WildFile = "*.css"
    WildFile = "*.htm"
  }	

  # exclude everything else
    Options {
    
    # all files not in include
    RegExFile = ".*"
    
    # default user profiles
    WildDir = "[C-D]:/Users/All Users/*"
    WildDir = "[C-D]:/Users/Default/*"
    
    # explicit don't backup
    WildDir = "[C-D]:/Users/*/AppData"
    WildDir = "[C-D]:/Users/*/Music"
    WildDir = "[C-D]:/Users/*/Videos"
    WildDir = "[C-D]:/Users/*/Searches"
    WildDir = "[C-D]:/Users/*/Saved Games"
    WildDir = "[C-D]:/Users/*/Favorites"
    WildDir = "[C-D]:/Users/*/Links"
  
    # application specific
    WildDir = "[C-D]:/Users/*/MicrosoftEdgeBackups"
    WildDir = "[C-D]:/Users/*/Documents/R"
    WildDir = "*iCloudDrive*"
    WildDir = "*.svn/*"
    WildDir = "*.git/*"
    WildDir = "*.metadata/*"
    WildDir = "*cache*"
    WildDir = "*temp*"
    WildDir = "*OneDrive*"
    WildDir = "*RECYCLE.BIN*"
    WildDir = "[C-D]:/System Volume Information"
    Exclude = yes
  }
   
  }
}

FileSet {
  Name = "Win10_office"
  
  # volume shadow copy service
  Enable VSS = yes
  Include {
  
  # location
    File = "C:/Users"
  
  Options {
    # config
    Signature = MD5
    compression = LZ4
    IgnoreCase = yes
    noatime = yes
    
    # Word
    WildFile = "*.doc"
    WildFile = "*.dot"
    WildFile = "*.docx"
    WildFile = "*.docm"

    # Excel
    WildFile = "*.xls"
    WildFile = "*.xlt"
    WildFile = "*.xlsx"
    WildFile = "*.xlsm"
    WildFile = "*.xltx"
    WildFile = "*.xltm"

    # Powerpoint
    WildFile = "*.ppt"
    WildFile = "*.pot"
    WildFile = "*.pps"
    WildFile = "*.pptx"
    WildFile = "*.pptm"
    WildFile = "*.ppsx"
    WildFile = "*.ppsm"
    WildFile = "*.sldx"

    # access
    WildFile = "*.accdb"
    WildFile = "*.mdb"
    WildFile = "*.accde"
    WildFile = "*.accdt"
    WildFile = "*.accdr"

    # publisher
    WildFile = "*.pub"

    # open office
    WildFile = "*.odt"
    WildFile = "*.ods"
    WildFile = "*.odp"

    # pdf
    WildFile = "*.pdf"
    
    # flat text / code
    WildFile = "*.xml"
    WildFile = "*.log"
    WildFile = "*.rtf"
    WildFile = "*.tex"
    WildFile = "*.sql"
    WildFile = "*.txt"
    WildFile = "*.tsv"
    WildFile = "*.csv"
    WildFile = "*.php"
    WildFile = "*.sh"
    WildFile = "*.py"
    WildFile = "*.r"
    WildFile = "*.rProj"
    WildFile = "*.js"
    WildFile = "*.html"
    WildFile = "*.css"
    WildFile = "*.htm"
  }	

  # exclude everything else
    Options {
    
    # all files not in include
    RegExFile = ".*"
    
    # default user profiles
    WildDir = "[C-D]:/Users/All Users/*"
    WildDir = "[C-D]:/Users/Default/*"
    
    # explicit don't backup
    WildDir = "[C-D]:/Users/*/AppData"
    WildDir = "[C-D]:/Users/*/Music"
    WildDir = "[C-D]:/Users/*/Videos"
    WildDir = "[C-D]:/Users/*/Searches"
    WildDir = "[C-D]:/Users/*/Saved Games"
    WildDir = "[C-D]:/Users/*/Favorites"
    WildDir = "[C-D]:/Users/*/Links"
  
    # application specific
    WildDir = "[C-D]:/Users/*/MicrosoftEdgeBackups"
    WildDir = "[C-D]:/Users/*/Documents/R"
    WildDir = "*iCloudDrive*"
    WildDir = "*.svn/*"
    WildDir = "*.git/*"
    WildDir = "*.metadata/*"
    WildDir = "*cache*"
    WildDir = "*temp*"
    WildDir = "*OneDrive*"
    WildDir = "*RECYCLE.BIN*"
    WildDir = "[C-D]:/System Volume Information"
    Exclude = yes
  }
   
  }
}
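
After editing these FileSet files, it doesn't hurt to let the director parse the configuration before reloading; as far as I know the -t flag only tests the config and exits:

bareos-dir -t && echo "config OK"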


By default, the device used by Bareos for data archival/storage is FileStorage, meaning data is stored in volume files on disk under:

/var/lib/bareos/storage

While this might be a good place for you, it's tricky that this location is on the root filesystem … (a full root and whatnot). You can change this in:

/etc/bareos/bareos-sd.d/device/FileStorage.conf

By changing the archive device :

Archive Device = /backup

Obviously you could create an extra device, and I'm only scratching the surface of the options here, but since your root filesystem might not be terabytes in size, changing this is pretty crucial.

After changing this, you need to restart the storage daemon (bareos-sd):

systemctl restart bareos-sd

You can also remove the volume files already in /var/lib/bareos/storage and delete the volumes from the catalog database:

bconsole
delete volume=Full-0001 yes

Don’t forget to physically remove them (rm -f)
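
For example (default path assumed; adjust the volume name to the one you deleted from the catalog):

rm -f /var/lib/bareos/storage/Full-0001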

When installing the Bareos client on CentOS 7, I could not start the service, getting the following "cryptic error":

[root@host ~]# systemctl restart bareos-fd
A dependency job for bareos-fd.service failed. See 'journalctl -xe' for details.
[root@host ~]# systemctl status bareos-fd
● bareos-fd.service - Bareos File Daemon service
   Loaded: loaded (/usr/lib/systemd/system/bareos-fd.service; enabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:bareos-fd(8)
Mar 20 18:06:11 svennd.be systemd[1]: Dependency failed for Bareos File Daemon service.
Mar 20 18:06:11 svennd.be systemd[1]: Job bareos-fd.service/start failed with result 'dependency'.
Mar 20 18:06:29 svennd.be systemd[1]: Dependency failed for Bareos File Daemon service.
Mar 20 18:06:29 svennd.be systemd[1]: Job bareos-fd.service/start failed with result 'dependency'.

This means the Bareos server will keep trying and eventually fail to back up this client. The "workaround" is found in a closed ticket. On the client, edit:

nano /usr/lib/systemd/system/bareos-fd.service

Comment out this line (prefix it with #):

#Requires=nss-lookup.target network.target remote-fs.target time-sync.target

Reload & restart the service; it should now start normally:

systemctl daemon-reload
systemctl start bareos-fd
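
Note that edits to the packaged unit file can be overwritten by a package update; an alternative sketch is a systemd drop-in that resets the Requires= list (an empty assignment clears it):

mkdir -p /etc/systemd/system/bareos-fd.service.d
printf '[Unit]\nRequires=\n' > /etc/systemd/system/bareos-fd.service.d/override.conf
systemctl daemon-reload
systemctl restart bareos-fd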

 

Create ZFS Raidz2 pool

20 March, 2019

This is a quick howto: I made a raidz2 pool on ZFS; it's very similar to creating a mirror.

Find out which disks you are giving to the pool:

root@host:~# fdisk -l /dev/sd* | grep Disk
Disk /dev/sda: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk /dev/sdb: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk /dev/sdc: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk /dev/sdd: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk /dev/sde: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk /dev/sdf: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk /dev/sdg: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk /dev/sdh: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk /dev/sdi: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk /dev/sdj: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk /dev/sdk: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk /dev/sdl: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors

I was lucky that the main system is on an NVMe drive (so not /dev/sd*), meaning basically all /dev/sd* disks can go into the pool.

Create the pool; the name is panda, and the "raid" level is raidz2, meaning 2 disks can fail before data loss occurs, similar to RAID6.

zpool create panda raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl

After that the pool is made :

root@host:~# zfs list
NAME    USED  AVAIL  REFER  MOUNTPOINT
panda   768K  80.4T   219K  /panda
root@host:~# zpool status
  pool: panda
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        panda       ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0
            sdi     ONLINE       0     0     0
            sdj     ONLINE       0     0     0
            sdk     ONLINE       0     0     0
            sdl     ONLINE       0     0     0

errors: No known data errors

Note: you can reference the disks by persistent names (e.g. /dev/disk/by-id) so they stay consistent across hardware changes, but plain sdX names never gave any issues for me personally …
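
If you do want stable names, the disks can be referenced through /dev/disk/by-id when creating the pool; a sketch (the IDs below are placeholders, use the ones listed on your system):

ls -l /dev/disk/by-id/ | grep -v part
# e.g.: zpool create panda raidz2 /dev/disk/by-id/ata-MODEL_SERIAL1 /dev/disk/by-id/ata-MODEL_SERIAL2 ...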

Don't forget to change the default settings to something you prefer:

zfs set xattr=sa panda
zfs set acltype=posixacl panda
zfs set compression=lz4 panda
zfs set atime=off panda
zfs set relatime=off panda

See the basic tuning tips for more info.
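
To verify the properties took effect (child datasets inherit them), something like:

zfs get compression,atime,relatime,xattr,acltype panda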

CellProfiler is not an easy tool to install; or perhaps I was clumsy on the first attempt (building from source), but I could not get it to work properly on a Linux machine. After another attempt using Miniconda, I managed to get it running. This is just documentation of how I got it working, in case I have to do it again. By no means am I an expert on the matter.

Dependencies

I don't know if these are all the dependencies, but at some point I had to install them:

yum install java-1.8.0-openjdk java-1.8.0-openjdk-devel bzip2 mariadb-devel libstdc++-devel gcc-c++ gtk2 ImageMagick ImageMagick-devel

Another issue is that the installation requires libjbig.so.0, which cannot be found; however jbigkit-libs provides libjbig.so.2.0, which can be symlinked and it will then work:

cd /usr/lib64/
ln -s libjbig.so.2.0 libjbig.so.0
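
To check that the loader now resolves the name, refresh the linker cache and look it up (a quick check):

ldconfig
ldconfig -p | grep libjbig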

Install Miniconda

Get the latest version for Linux 64-bit (or other), make it executable and run the installer. On CentOS, Python 2.7 is the default, and I believe it is required for CellProfiler.

wget https://repo.anaconda.com/miniconda/Miniconda2-latest-Linux-x86_64.sh
chmod 755 Miniconda2-latest-Linux-x86_64.sh
./Miniconda2-latest-Linux-x86_64.sh

Install CellProfiler

Finally we are ready to try the installation. Create a directory "cellprofiler" (the name is not important):

mkdir cellprofiler
cd cellprofiler
touch environment.yml

environment.yml should contain the following (edit it using nano, vi, vim, emacs, …):

# run: conda env create -f environment.yml
# run: conda env update -f environment.yml
# run: conda env remove -n cellprofiler
name: cellprofiler
# in order of priority: highest (top) to lowest (bottom)
channels:
  - anaconda
  - goodman # mysql-python for mac
  - bioconda
  - cyclus # java-jdk for windows
  - conda-forge # libxml2 for windows
  - BjornFJohansson # wxpython for linux
dependencies:
  - appdirs
  - boto3
  - cython
  - h5py
  - ipywidgets
  - java-jdk
  - joblib
  - jupyter
  - libtiff
  - libxml2
  - libxslt
  - lxml
  - packaging
  - pillow
  - pip
  - python=2
  - pyzmq=15.3.0
  - mahotas
  - matplotlib!=2.1.0,>2.0.0
  - mysqlclient
  - numpy
  - raven
  - requests
  - scikit-image>=0.13
  - scikit-learn
  - scipy
  - sphinx
  - tifffile
  - wxpython=3.0.2.0
  - pip:
    - cellh5
    - centrosome
    - inflect
    - prokaryote==2.4.0
    - javabridge==1.0.15
    - python-bioformats==1.4.0
    - git+https://github.com/CellProfiler/CellProfiler.git@v3.1.8


After this, create the environment using (this will take a while):

conda env create -f environment.yml

While debugging you can also use (to update an existing environment):

conda env update -f environment.yml

This kind of works, but generates two warnings that don't seem to impact the tool (though perhaps I haven't used the specific functions that depend on these):

cellprofiler 3.1.8 has requirement prokaryote==2.4.1, but you'll have prokaryote 2.4.0 which is incompatible.
cellprofiler 3.1.8 has requirement python-bioformats==1.5.2, but you'll have python-bioformats 1.4.0 which is incompatible.

Once that is finished, we can activate & run.

conda activate cellprofiler
cellprofiler

Since this is on a server, I also needed to allow X11 forwarding over SSH.
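
In practice that just meant connecting with X11 forwarding enabled before starting the GUI; assuming X11Forwarding is allowed in sshd_config and xauth is installed on the server (user and host below are placeholders):

ssh -X user@server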

So I was happily using sanoid, when someone made me aware of pyznap (thanks!). Ever since that comment, it was on my to-do list to check out pyznap, but if a system works, why change it? Especially such an important cog as automated snapshots.

The new version of sanoid breaks compatibility with older versions (at least at config level) and is not well documented at the moment; one has to dig through the pull requests to understand what is required to get it running. I know open-source projects sometimes go through large changes, and it's all run on love & joy, but it's a sorry state for a project with ~39 contributors. I also find it valuable when a project or tool is simple to set up and understand for people not looking at this daily. I might sound critical, and I still think sanoid is a wonderful tool, but personally I just need to get it up & running in 10 minutes and then move on. The feature list of sanoid and its companion syncoid keeps growing, and with it the complexity of finding out what is going wrong, so it was time to take pyznap for a run.

And pyznap is actually a shining gem: I got it up & running in 10 minutes. Nothing fancy, it just works out of the box. So here's how I set it up.

First I needed to install a Python 3.5+ version on CentOS 7. I won't go into detail, because it's basically part of every fresh install these days.

yum install yum-utils
yum groupinstall development
yum install https://centos7.iuscommunity.org/ius-release.rpm
yum install python36u python36u-pip

After that, install pyznap using pip; we need to specify the Python version, otherwise it will use 2.7 (still the default on CentOS 7):

pip3.6 install pyznap

This installs all dependencies; optionally, you can install pv and mbuffer to visualize the transfer speed when sending snapshots to backup locations.

yum install pv mbuffer

Now on to the setup/configuration. The tool will generate a default config and directory when you run its setup:

[root@host ~]# pyznap setup -p /etc/pyznap
Feb 21 16:06:30 INFO: Starting pyznap...
Feb 21 16:06:30 INFO: Initial setup...
Feb 21 16:06:30 INFO: Creating directory /etc/pyznap...
Feb 21 16:06:30 INFO: Creating sample config /etc/pyznap/pyznap.conf...
Feb 21 16:06:30 INFO: Finished successfully...

After the setup, it's time for configuration; all items are clearly documented. One remark: don't put a # (hash) at the end of a line to comment it; this will generate errors, as only lines starting with # (hash) are ignored.

# default settings parent
[data]
  # every $cron runtime (~15 minutes)
  frequent = 2
  hourly = 6
  daily = 3
  weekly = 1
  monthly = 0
  yearly = 0
  # take snapshots ?
  snap = no
  # clean snapshots ?
  clean = no
  
[data/brick1]
  snap = yes
  clean = yes
  dest = ssh:22:root@backupserver:backup/brick1
  dest_key = /root/.ssh/id_rsa_backup
  
[data/brick2]
  daily = 24
  snap = yes
  clean = yes

To give some context: I have one pool split up into multiple sub-filesystems (data is the parent and data/brick* are the actual locations for data). This means I can set up defaults on data, but I don't really want snapshots of it, as no data resides there; only in the bricks, so I override the defaults there (snap/clean).
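
Before wiring this into cron, it's easy to run pyznap once by hand and check that the snapshots appear; a sketch using my dataset names (yours will differ):

pyznap snap
zfs list -t snapshot -r data/brick1 | tail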

A special case is dest = ; that's a built-in backup system: just set up a password-less SSH login to the backup server and you can leverage this feature. One thing to remark is that the destination needs to be the ZFS dataset, not the actual physical location (a bit of trickery, because the location is /backup/brick1 but one needs to drop the leading / for ZFS). If you use this feature, it's perhaps worth downgrading the cryptography package, because the latest version generates an insane amount of warnings about the upcoming deprecation of some function calls. See the issue.

pip3.6 install cryptography==2.4.2

A way to clean up the backup location is also provided by pyznap; one can set up a remote cleanup job for the brick by adding:

# cleanup on backup
[ssh:22:root@backupserver:backup/brick1]
        frequent = 2
        hourly = 6
        daily = 3
        weekly = 1
        key = /root/.ssh/id_rsa
        clean = yes

The only thing left to do is to create a cron entry, which is the easiest:

*/15 * * * *   root    /usr/bin/pyznap snap | logger
0 * * * *   root    /usr/bin/pyznap send | logger

This takes snapshots every 15 minutes (frequent) and syncs to the backup location (dest =) once per hour; other snapshot levels are taken as needed. (Note: | logger sends the output to /var/log/messages via rsyslog; you could also log to a static file, as explained in the docs.)

Alternatively, on systemd systems like CentOS 7, you can leverage systemd timers instead of cron, but who wants to go into that mess?

And that's it, automated snapshots <3. Thanks yboetz for a small but useful tool 🙂

install

yum install centos-release-gluster
yum install glusterfs-server

Verify

[root@host ~]# glusterfs --version
glusterfs 3.12.9
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

start service

service glusterd start
service rpcbind start
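
Since these are systemd machines (note the Redirecting message in the output below), you probably also want the services to come back after a reboot; assuming the standard unit names:

systemctl enable glusterd
systemctl enable rpcbind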

probe node

gluster peer probe gluster3
gluster peer probe gluster2

[root@host ~]# service glusterd start
Redirecting to /bin/systemctl start glusterd.service
[root@host ~]# gluster peer probe gluster3
peer probe: success.
[root@host ~]# gluster peer probe gluster2
peer probe: success. Probe on localhost not needed

create /data, create volume

[root@host ~]# mkdir /data
[root@host ~]# gluster volume create triple gluster1:/data gluster2:/data gluster3:/data
volume create: triple: failed: The brick gluster3:/data is being created in the root partition. It is recommended that you don't use the system's root partition for storage backend. Or use 'force' at the end of the command if you want to override this behavior.
[root@host ~]# gluster volume create triple gluster1:/data gluster2:/data gluster3:/data force
volume create: triple: success: please start the volume to access data

verify

# gluster volume info

Volume Name: triple
Type: Distribute
Volume ID: 34767df2-24e3-438f-be5d-53edeaefef4f
Status: Created
Snapshot Count: 0
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: gluster1:/data
Brick2: gluster2:/data
Brick3: gluster3:/data
Options Reconfigured:
transport.address-family: inet
nfs.disable: on

start volume

[root@host ~]# gluster volume start triple
volume start: triple: success
[root@host ~]# gluster volume info

Volume Name: triple
Type: Distribute
Volume ID: 34767df2-24e3-438f-be5d-53edeaefef4f
Status: Started
Snapshot Count: 0
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: gluster1:/data
Brick2: gluster2:/data
Brick3: gluster3:/data
Options Reconfigured:
transport.address-family: inet
nfs.disable: on

check status

[root@host ~]# gluster volume status
Status of volume: triple
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster1:/data                        49152     0          Y       387
Brick gluster2:/data                        49152     0          Y       1297
Brick gluster3:/data                        49152     0          Y       1216

Task Status of Volume triple
------------------------------------------------------------------------------
There are no active volume tasks
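
With the volume started, it can be mounted from any client that has the GlusterFS fuse client installed; a sketch (gluster1 must resolve from the client, and the mountpoint is arbitrary):

yum install glusterfs-fuse
mkdir -p /mnt/triple
mount -t glusterfs gluster1:/triple /mnt/triple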