Source: Linode, DigitalOcean and VULTR Comparison

I’m currently having a lot of problems with DigitalOcean. I don’t know if my setup/configuration is bad or if DO is just crapping on my VPS…

time dd if=/dev/zero of=test bs=16k count=10000
10000+0 records in
10000+0 records out
163840000 bytes (164 MB) copied, 12.0611 s, 13.6 MB/s

real 0m12.093s
user 0m0.004s
sys 0m0.374s

This is not really giving a lot of hope…

// update

I contacted support and, after confirming these results, they offered to move the VPS to another hypervisor in the same region. Weirdly enough, they asked me to shut down the machine myself and then respond to the ticket. I know I could have used a floating IP, failover, … but they could have just shut it down, moved it and restarted it. Minimizing the downtime was not an option. That said, they were quick to help me out, see the timeline :

  • me> Created ticket @ 10:24
  • digitalocean> First response @ 10:58
  • My repeat test @ 11:25
  • digitalocean> second response @ 11:54
  • confirmation of move @ 12:08
  • digitalocean> move confirmed. @ 12:43

I had the database backed up and took a snapshot before confirming the move. Because I had to shut down the machine myself, the downtime was ~2 hours. The sad part is that the move itself only took 4 minutes 2 seconds, plus 8 seconds to boot up again. So a 5 minute “upgrade” took the site out for 2 hours.

Rerunning this dd now, the lowest value was 177 MB/s and the highest 646 MB/s.

time dd if=/dev/zero of=test bs=16k count=10000
10000+0 records in
10000+0 records out
163840000 bytes (164 MB) copied, 0.253633 s, 646 MB/s

real    0m0.256s
user    0m0.007s
sys     0m0.247s

That is more like a RAID5 SSD speed.
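A very fast result like this can partly come from the page cache, since dd reports as soon as the data has been handed to the kernel. Below is a minimal sketch, assuming GNU dd, that forces the data to disk before the speed is reported and repeats the run a few times. The function name, the target directory and the run count are my own assumptions, not part of the original test.

```shell
#!/usr/bin/env bash
# Sketch: repeat the dd write test, flushing to disk before dd reports.
# disk_write_test, its arguments and the file name are illustrative.
disk_write_test() {
  local bench_dir="${1:-.}" runs="${2:-3}" count="${3:-10000}"
  local target="$bench_dir/dd_test.$$"
  local i
  for ((i = 1; i <= runs; i++)); do
    # conv=fdatasync makes dd sync the file data before printing the speed;
    # dd writes its summary to stderr, so merge it and keep the last line
    dd if=/dev/zero of="$target" bs=16k count="$count" conv=fdatasync 2>&1 | tail -n 1
    rm -f "$target"
  done
}

# usage: disk_write_test /tmp 3 10000
```

Comparing the lowest and highest of a few runs, as done above, stays useful; the sync only removes the cache from the picture.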

I recently switched from Apache to Nginx and also to a new server. Yeey! I copied /etc/letsencrypt/ over from the first server to the second. Everything seemed to be fine. Sadly, nope! For some reason it doesn’t accept certificates made on another server. Fixing it is easy once you find it :

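# save your config file outside /etc/letsencrypt/ (the next step deletes that directory)
# note : /root/cli.ini is just an example location
cp /etc/letsencrypt/cli.ini /root/cli.ini

# remove all info on letsencrypt
rm -rf /etc/letsencrypt/

# put the config file back (and edit it if needed)
mkdir -p /etc/letsencrypt
cp /root/cli.ini /etc/letsencrypt/cli.ini
nano /etc/letsencrypt/cli.ini

# remake the certs
/opt/letsencrypt/letsencrypt-auto --config /etc/letsencrypt/cli.ini --debug certonly

# a server restart might be needed
service nginx restart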

Happy encrypting 🙂

With Centos 7.1 MySQL got replaced by MariaDB. That is great, but now I also want MariaDB on Centos 6.7, because. By default it’s not in the repos (no surprise there), so you need to add the repo yourself. Good thing MariaDB has a download page telling you how to.

Create /etc/yum.repos.d/MariaDB.repo :

# MariaDB 10.1 CentOS repository list - created 2016-01-07 08:22 UTC
# http://mariadb.org/mariadb/repositories/
[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.1/centos6-x86
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1

The page that generates this snippet also tells you where the detailed info can be found. If your Google query landed on that page first and, like me, you did not read it but just copy/pasted the repo, it won’t work 😀 Running yum clean all will save you.

After that you can just run :

yum install MariaDB-server MariaDB-client

Also

yum install MySQL-server MySQL-client

will resolve to MariaDB.
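The repo setup above can also be scripted. This is a sketch, assuming it runs as root on Centos 6; the function name write_mariadb_repo and its optional path argument are hypothetical, and the repo contents are the ones shown above.

```shell
#!/usr/bin/env bash
# Sketch: write the MariaDB repo file, then refresh the yum metadata.
# write_mariadb_repo and its optional path argument are illustrative.
write_mariadb_repo() {
  local repo_file="${1:-/etc/yum.repos.d/MariaDB.repo}"
  cat > "$repo_file" <<'EOF'
[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.1/centos6-x86
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1
EOF
}

# usage (as root):
#   write_mariadb_repo
#   yum clean all
#   yum install MariaDB-server MariaDB-client
```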

Have fun with Maria … db.

The interwebs are a bit unclear on this topic, but on a “clean” Centos 7.1 (tested on DigitalOcean) the PHP version is v5.4. That is a bit of a bugger, since from v5.5 on Zend Optimizer+ is included in the core of PHP, making PHP a lot faster: about 70% faster, if we can believe the benchmarks. I need APC not for the user cache but for the in-memory storage of compiled PHP bytecode, so either Zend Optimizer+ or APC would do fine. Since Zend Optimizer+ is included in the PHP core of newer versions, Centos will someday ship it (in the not so near future, I guess). So I would recommend using Zend Optimizer+, but feel free to ignore that advice, I am no expert. I will test both and decide then, though I doubt I will find much difference.

APC

If you need the user cache : yum install php-pecl-apcu

# Install APC
pecl install apc

#nano /etc/php.d/apc.ini
extension=apc.so
apc.enabled=1

restart httpd / php-fpm

You can validate that it is running, and see stats, by activating the apc.php file, which can be found here (source).

Zend Optimizer+

yum install php-pecl-zendopcache

restart httpd / php-fpm

The config file is located : /etc/php.d/opcache.ini

There is no “official” statistics script for Zend OPcache that I could find, but a user (Rasmus Lerdorf) has made something similar.

Download https://raw.githubusercontent.com/rlerdorf/opcache-status/master/opcache.php with wget and put it in some public directory to see the nice stats.
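Before looking at stats pages, a quick check from the shell can tell whether one of the two bytecode caches is loaded at all. A minimal sketch, assuming `php -m` lists one module per line and that the extensions report themselves as "apc" and "Zend OPcache"; the function name is hypothetical. Note that the CLI and httpd/php-fpm can load different ini files, so this is only a first hint.

```shell
#!/usr/bin/env bash
# Sketch: read a module list (as printed by `php -m`) on stdin and
# succeed if APC or Zend OPcache is in it. The name is illustrative.
has_bytecode_cache() {
  grep -qiE '^(apc|Zend OPcache)$'
}

# usage:
#   php -m | has_bytecode_cache && echo "bytecode cache loaded"
```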

FUBAR PHP v5.4

You can also ditch PHP v5.4 from the Centos repos and add another repo, such as the webtatic repo or remirepo.

This is a project I have been working on for some time. I manage a phpBB 3.1 forum that is pulling some traffic, which is great. The problem: the server is not keeping up. I have already found the server (VPS) dead a few times in the morning, showing OOM errors and the killing of MySQL or httpd. The result is a forum that is totally unreachable: bad for the site, bad for the Google results, bad for my rep, and even worse for my honor as ad interim sysadmin.

Solving this problem can be very easy: throw in a lot of cash and upgrade to a larger VPS package or a dedicated server. Before I take that expensive path, I want to be sure I have squeezed everything out of the current hardware and configuration. I “know” four webservers: Apache, Nginx, lighttpd and IIS. The last one is the Windows one and I want to stay far away from it (blind discrimination). Lighttpd I know from memory, but I haven’t looked it up, and I (probably wrongly) think it focuses on dropping less-used features and being a stable, simple webserver, hence the name; simple and stable is not what I am after here. Apache I have set up a few times for small websites; while I would not call myself experienced, I am certain that I could fix most common mistakes. About Nginx I have read stories that it could be set up to be several times faster than Apache, due to the way it works. However, most recommendations I found advise using Apache, because it’s simpler to set up and maintain. For the purpose of squeezing everything out of the setup I will go with Apache, since I’m a bit experienced with it, and with Nginx, since its big selling point is more speed at lower resource usage.

Current stack

Currently I am running vanilla Apache. The first thing to check when something is slow is what is making it slow, the so-called bottleneck. The stack is Centos 6.7, Apache 2.2, PHP 5.6, MySQL 5.1 and phpBB 3.1. Each of those can be updated to the latest version, though updates rarely speed up an application hugely unless they fix a known bug. Centos does not ship bleeding-edge software, which makes it a good bet for stability. They also patch the packages so that huge bugs get fixed regardless of the version. So while updating could make some difference, I doubt that is where I am going to find the golden egg. phpBB can’t be replaced in this case (at least that is not the preferred route).

Testing setup

There are some good tools online to test your application under load; sadly all the nice ones aren’t free. I don’t want to spend a few hundred bucks on testing, that is simply not an option. Ideally a load tester for phpBB should register users, post topics, read topics, edit topics, … Most users don’t reload the same page 100 times. Still, that is the only easy test I found. The tool Apache benchmark (ab) “simply” makes concurrent connections to the webserver. There are some alternatives, such as httperf, siege, jMeter, Tsung, gor, gatling, … All these tools are probably better than ab, but sadly take a lot of configuration and work to get started with. I am going to try at least a few of them, but for now ab has proven able to shoot the arrow straight into my server’s ‘Achilles heel’. Since I want to test multiple setups, I created a small bash script that runs a series of ab tests.

SERVER="http://server/"

# make sure the output directory exists
mkdir -p result

# ab_run requests concurrent_users url : run one ab test
function ab_run {
  # report
  echo "run $1 / $2"
  # ab -n requests -c concurrent_users url
  ab -n "$1" -c "$2" "$3" > "result/ab_$2_$1" 2>/dev/null

  # let it stabilize
  sleep 15
}

# requests are always ten times the number of concurrent users

ab_run 50 5 "$SERVER"
ab_run 100 10 "$SERVER"
ab_run 200 20 "$SERVER"
ab_run 300 30 "$SERVER"
ab_run 400 40 "$SERVER"
ab_run 500 50 "$SERVER"
echo "ab run completed"

The script writes its output to the result/ directory; I will reuse this output later. Since I ran this on a VPS with limited resources (2 CPU / 2 GB RAM), I had to scale down to get a load that would not crash the server to begin with. I don’t want to test long-term stability, so each run makes ten times as many requests as there are concurrent users. I let the server stabilize for 15 seconds between runs, to see some more spread in the load peaks. 15 seconds is not long enough to drop to zero load, so a bit of overlap is to be expected. This is a simple test that will put load on the server, so don’t run it in production (unless you want customers on the phone). Also note that there is a limit on the number of concurrent requests a client machine is able to produce, though these values don’t come even close to it (while 50 users sounds like little, it’s a good starting point). This is not the best method, but it is a decent starting point in my opinion.

Setups

I have tested 7 setups :

  • Test 1 was a ‘baseline’ test: I pulled a static file from a freshly installed LAMP stack (the forum was installed, just not accessed). This is useful to see whether the test itself is able to generate load and whether everything is working. It’s also a good way to check the difference between static and dynamic pages.
  • Test 2 was the default Apache setup pulling the index page of the phpBB forum; I believe this is representative enough of the other public pages of the forum. Keep-alive is off in the default Apache configuration here, so adding ab’s keep-alive flag won’t speed things up. In fact, keep-alive keeps the connection open until a timeout, but ab won’t request any other files (normal users would), so unless I shorten the timeout I would need to add a large amount of time to the stabilizing. As it stands, this is the worst-case setup.
  • Test 3 replaced phpBB’s file cache with memcache, set up with 1024 connections and 64 MB RAM. More RAM might be advisable in production, but for the ab test only one page is requested. In fact, lowering the memory (since this is only a one-page test) might give a more realistic result.
  • Test 4 was setup 3 with the APC module enabled for PHP, which keeps the PHP bytecode in memory. Most blogs/forums I found refer to APC as the best/easiest way to go. It was officially endorsed to be included in PHP 6, but the opinion has seemingly changed in favor of Zend OPcache … phpBB can also use APC as its cache, but I am not sure what is recommended here. APC will work no matter what phpBB decides, while memcache is only used when the application requests it.
  • Test 5 was setup 4 with MaxClients set to 20. The idea is that it’s better to queue clients than to have the forum go dark. (20 concurrent users over 24h would be 1.7 million requests, assuming each one makes a request per second; that is requests, not visitors!)
  • Test 6 : enough of Apache, let’s hit the Nginx park. A clean setup with php-fpm, taken as a baseline. memcache was kept with the same setup as test 3.
  • Test 7 : phpBB has an example in their repo of how to set up Nginx for phpBB. I took it over and checked if I had missed some low-hanging fruit. The most important addition was the explicit gzip configuration.
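The 1.7 million figure mentioned for test 5 is plain arithmetic. A small sketch, under the assumption that every one of the 20 slots finishes exactly one request per second (the function name and the second argument are my own):

```shell
#!/usr/bin/env bash
# Sketch: requests per day for a number of concurrent slots, assuming
# each slot needs a fixed whole number of seconds per request.
requests_per_day() {
  local workers="$1" seconds_per_request="$2"
  echo $(( workers * 24 * 60 * 60 / seconds_per_request ))
}

requests_per_day 20 1   # prints 1728000
```

With slower pages the number drops accordingly, for example `requests_per_day 20 2` for pages that take two seconds.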

Those setups form a basis for me to go into more specific setups. During the tests I ran this simple bash script to get values on load/memory.

while :
do
  free -m >> memory_data
  uptime >> cpu_load
  
  tail -n 1 cpu_load
  sleep 5
done

It will print out the load values so you can stop the test if the values get too high. (they did, sorry co-vps’rs)
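Stopping by hand works, but the same loop can also watch the load itself. A minimal sketch, assuming a Linux /proc/loadavg; the function names and the threshold argument are hypothetical and not part of the script I actually ran.

```shell
#!/usr/bin/env bash
# Sketch: succeed when the first number is greater than the second.
# awk does the comparison because bash cannot compare decimals.
load_exceeds() {
  awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 > max + 0) }'
}

# Print the 1 minute load every 5 seconds and return once it passes max.
watch_load() {
  local max="$1" load
  while :; do
    load=$(cut -d' ' -f1 /proc/loadavg)
    echo "load: $load"
    if load_exceeds "$load" "$max"; then
      echo "load above $max, time to stop the test" >&2
      return 1
    fi
    sleep 5
  done
}

# usage: watch_load 30 || pkill ab
```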

Pulling data from results

I put every result file in a directory load_($test_nr). Getting the data into something workable takes a bunch of hacks, but I share them for the future me.

# get cpu load 1 min
for i in {1..7}
do
   cat load_$i/cpu_load | cut -c45-49 > compile/cpu$i
done

# get memory usage
for i in {1..7}
do
   sed -n '2,${p;n;n}' load_$i/memory_data > compile/memory$i
done

# get failed requests
for i in {1..7}
do
  cat result_$i/ab_*_* | grep Failed | cut -c 25-40 > compile/failed$i
done
 
# total time
for i in {1..7}
do
  cat result_$i/ab_*_* | grep "Time taken for tests:" | cut -c 25-40 > compile/total_time$i
done

# requests per second
for i in {1..7}
do
  cat result_$i/ab_*_* | grep "Requests per second:" | cut -c 25-40 > compile/rps$i
done

# time per request across all
for i in {1..7}
do
  cat result_$i/ab_*_* | grep "Time per request" | grep "across all concurrent req" | cut -c 25-31 > compile/tpr$i
done
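The fixed column ranges in these hacks break as soon as a line shifts by one character. A sketch of a field-based alternative with awk, assuming the usual "Label:   value unit" layout of the ab report and the "load average: a, b, c" tail of uptime; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the first value after "<label>:" from ab output on stdin.
ab_field() {
  awk -F': *' -v label="$1" '$1 == label { split($2, v, " "); print v[1] }'
}

# Sketch: print the 1 minute load from uptime lines on stdin.
load1() {
  awk -F'load average: ' '{ split($2, a, ", "); print a[1] }'
}

# usage:
#   cat result_1/ab_*_* | ab_field "Requests per second" > compile/rps1
#   cat result_1/ab_*_* | grep "across all" | ab_field "Time per request" > compile/tpr1
#   load1 < load_1/cpu_load > compile/cpu1
```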

Results & Discussion

note : I am no expert in this. These tests have huge biases and are in no way close to how normal users would interact with your application/board. These values and results are not statistically sound. Don’t change stuff you haven’t tested.

note 2 : While the production server has been running Centos 6.7 I took Centos 7.1 to test run.

CPU

One of the ways to see how well or badly a server is doing is checking the CPU load, a value that tells you how much CPU power has been used over a period (1, 5 and 15 minutes). I plotted the load against the total run time. The total run time is count(values)*5s; since I had no easy way of putting that on the axis I did not bother, so there are no exact values there.

First off, these loads are crazy (this is the 1 minute load). Since this is a 2 CPU VPS, a maximum load of 2 should be the target (roughly two virtual CPUs maxed out). I left out the first test, as the maximum CPU load was 0,03. That was to be expected: requesting static files with no processing on the server side should not generate CPU load. The clean version (test 2) has a maximum load of 39,3 and ran the longest. Using memcache as forum cache (test 3) reduced the time required to finish, but the load is approximately the same (max test 2 : 39,8). The largest win time-wise is using APC (test 4). It speeds up PHP execution hugely and the CPUs are slightly less stressed (load maximum : 27,4), probably because the CPU doesn’t have to compile phpBB’s code to bytecode on every request. In real life I don’t think the effect will be this huge under those load conditions, because APC can’t predict; it can only keep the bytecode of the most used files.

Since the load was still ~20 fold what I can safely claim (VPS!), I needed to pull down the CPU usage. One way I found was to lower MaxClients: this queues users that hit the server over the limit and so makes sure the server remains stable. By default Apache comes with 256, which is clearly too much for dynamic PHP pages. I took an arbitrary value of 20 to see the effect on CPU. The result was astonishing: the CPU load stayed below 5,6. That is still about twice too high, but it’s a good indication of what I can do with this parameter. Lowering it even more would put too many visitors in the queue and result in a very slow experience; too large a value would result in too high a CPU load, and eventually some service would give out and break. Perhaps this parameter should be combined with keep-alive (more testing needed).

The results with Nginx were not significantly different from Apache: the load was 24,64 without gzip and 25,33 with it. That is a bit lower than Apache’s 27,4 (memcache and APC were active on both). This was somewhat expected: while Apache and Nginx handle requests very differently, the most powerful feature, server-side caching, was omitted on both. I also have close to no experience with Nginx, so purely on these results Nginx is not a clear winner. I am however not certain APC was active on php-fpm (v : 5.4.16); I will test this later. The reason for not using server-side caching is simple: the test hits a single PHP page that can be cached, but in a real-world scenario it can’t be cached for long. That being said, a microcache such as Nginx offers might be very useful: multiple hits on the same page could be cached for a short time, so 100 hits on a single page could pretty much be served from a 10 second cache. However, setting up something like that would be rather difficult without good knowledge of Nginx (I am working on it!). CPU-wise, some more research is needed to conclude anything. Most errors were memory-related (out of memory errors), so it is worth seeing what exactly happens with the memory during the tests.

Memory

Testing memory is not as easy. While there is a tool, free, that reads and parses memory usage, GNU/Linux memory is not as straightforward as “you still have memory” or “you don’t have any memory left”. So for these values I used the “available” memory, which the man page of free from procps-ng 3.3.10 describes as :

Estimation of how much memory is available for starting new applications, without swapping. Unlike the data provided by the cache or free fields, this field takes into account page cache and also that not all reclaimable memory slabs will be reclaimed due to items being in use
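Pulling that value out of the collected data is a one-liner. A sketch, assuming the free layout of procps-ng 3.3.10, where "available" is the last (7th) column of the Mem: line; older versions of free print different columns, so check the header first. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the "available" column of every Mem: line on stdin.
# Assumes the header: total used free shared buff/cache available
mem_available() {
  awk '/^Mem:/ { print $7 }'
}

# usage:
#   free -m | mem_available
#   mem_available < load_1/memory_data > compile/memory1
```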

ab results

The ab results give me a lot of data, but my fear has come true: the data is not that useful. I parsed requests per second, total run time and time per request. I will only show a graph for the last one, as they are all pretty similar.

It shows pretty much what I expected from the CPU usage: APC is the only factor that really gives the server a boost, bringing the response time closer to that of static serving.

Conclusion & Thoughts

There is no such thing as a bad experiment or bad results, just bad use of the given data. This test was set up to give a rough idea of where to go, as I have no experience with testing tools or with testing a webserver to begin with.

  1. ab is a simple and great tool to use. But aside from hammering the server, the application behind it is not really tested.
  2. APC, or any other tool that caches PHP bytecode, has a real effect on both load and speed. I am not surprised there were voices to include it by default, but since I tried this on Centos 7, I believe it was not yet included as of now.
  3. Memcache seems to have little to no effect. I was a bit surprised by that, but the test is the problem here: since phpBB itself caches on file, the gain from moving from file cache to memory cache is limited.
  4. While this test was for phpBB, the effect on phpBB has not really been tested and further tests are needed to say anything useful, except for the fact that APC is a real booster, at least in these tests.

While I keep on searching, feel free to comment or give advice!  All the data shown here can be accessed on google docs.


Intelligent content caching is one of the most effective ways to improve the experience for your site’s visitors. Caching, or temporarily storing content from previous requests, is part of the core content delivery strategy implemented within the HTTP

Source: Web Caching Basics: Terminology, HTTP Headers, and Caching Strategies | DigitalOcean

 

Very interesting read on web caching.

etckeeper setup on Centos 7

15 December, 2015

etckeeper is a package to keep track of changes in the /etc folder, where the configuration is supposed to be. So you get a track record of changes in configuration (a must-have), most definitely when you run yum-cron nightly. It’s also a great way to document why some things have been changed.

Like most cool tools, after doing it, you totally forget how you got it working. So here I share how I did it and plan how to use it. I picked git, as this is the default way, and git is hip these days.

On the server you want to keep under revision control

# like most things in life, package is in epel
yum install epel-release

# I choose you, git!
yum install etckeeper git

# go etc
cd /etc

# lets init
etckeeper init

# first commit
etckeeper commit "init our configuration server"

Remote

While strictly speaking not necessary, I like to have my configuration saved somewhere else. When git FUBARs or the server won’t boot, at least we can look at how the configuration was (or was not) changed.

On our target server : (I try to create a passwordless login, perhaps other methods are available)

# I want the configuration on a remote server (central in my case)
# note : security wise this might not be 100%

# create a key
ssh-keygen

# copy the key (-i expects the public key file, then the target)
ssh-copy-id -i ~/.ssh/id_rsa.pub etckeeper@sysadmin

# or alternative
cat .ssh/id_rsa.pub | ssh etckeeper@sysadmin 'cat >> /home/etckeeper/.ssh/authorized_keys'

on the remote server :

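# add the user and set a password for the first login
adduser etckeeper
passwd etckeeper

# create the directory for the repository and hand it to the new user
mkdir -p /opt/etckeeper
chown etckeeper: /opt/etckeeper

# create the bare git repository as that user
su etckeeper
git init --bare /opt/etckeeper/public.git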

and finally on the target server add the remote : (adapt as needed)

cd /etc
git remote add origin etckeeper@sysadmin:/opt/etckeeper/public.git

and change the configuration :

nano +43 /etc/etckeeper/etckeeper.conf

change

PUSH_REMOTE=""

to

PUSH_REMOTE="origin"
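If you have more than one server to set up, the same change can be made without opening an editor. A sketch with GNU sed; the function name and the optional path argument are my own assumptions, the file and the PUSH_REMOTE key are the ones shown above.

```shell
#!/usr/bin/env bash
# Sketch: set PUSH_REMOTE in etckeeper.conf to the given remote name.
# set_push_remote and its optional second argument are illustrative.
set_push_remote() {
  local remote="$1" conf="${2:-/etc/etckeeper/etckeeper.conf}"
  sed -i "s|^PUSH_REMOTE=.*|PUSH_REMOTE=\"$remote\"|" "$conf"
}

# usage (as root): set_push_remote origin
```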

Manually record changes

Changing something in /etc? It’s a good idea to tell your colleagues (or the future you) why.

etckeeper commit "I added this ip to /etc/hosts cause I'm too lazy to type an ip."

Auto changes to /etc

Defaults will catch those ! Yum, yum-cron are caught by a plugin. I am not sure about rpm, but etckeeper will autocommit all changes it finds!

What changed ?

Since we use git, most git commands work (git status, git log). So it’s as easy as : cd /etc && git log  or for short cd /etc && git log --pretty=oneline
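Most of the time the question is about one file, not the whole of /etc. A small sketch built on plain git commands; the function names and the optional repository argument are hypothetical, and I use cd in a subshell because the git on Centos 7 is too old for git -C.

```shell
#!/usr/bin/env bash
# Sketch: one line per commit that touched a file in the etckeeper repo.
etc_history() {
  local file="$1" repo="${2:-/etc}"
  (cd "$repo" && git log --pretty=oneline -- "$file")
}

# Sketch: show the most recent change to that file as a diff.
etc_last_change() {
  local file="$1" repo="${2:-/etc}"
  (cd "$repo" && git log -1 -p -- "$file")
}

# usage:
#   etc_history hosts
#   etc_last_change php.ini
```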

Pulling back changes 

I have not yet pulled back from the repo, but this should work :

etckeeper vcs checkout [HASH]

if you only need one file :

etckeeper vcs checkout [HASH] [FILE]


A little annoying :

[:error] [pid 8725] [client :51515] PHP Warning:  date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone 'UTC' for now, but please set date.timezone to select your timezone. in on line 29, referer:

PHP being a bit silly is it not ? Fix this crazy error : (Centos 7)

nano +878 /etc/php.ini

Change

;date.timezone =

to

date.timezone = Europe/Brussels

possible time zones

restart (reload might work) httpd to get it active

service httpd restart
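The line number in php.ini differs between versions, so matching on the setting itself is a bit more robust than nano +878. A sketch with GNU sed; the function name and the optional path argument are assumptions of mine.

```shell
#!/usr/bin/env bash
# Sketch: set (or uncomment) date.timezone in a php.ini file.
# set_php_timezone and its optional second argument are illustrative.
set_php_timezone() {
  local tz="$1" ini="${2:-/etc/php.ini}"
  # "|" as delimiter because time zone names contain a slash
  sed -i -E "s|^;?[[:space:]]*date\.timezone[[:space:]]*=.*|date.timezone = $tz|" "$ini"
}

# usage (as root):
#   set_php_timezone Europe/Brussels
#   service httpd restart
```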

Be gone from my log file evil error!

While most people (including me) would think that yum reinstall kernel kernel-headers kernel-firmware would be enough, it’s not! You need to select the one you want to reinstall by removing it and then installing it again. (since it’s the only package that has multiple versions installed, I guess)

For me this was :

# remove the evil
yum remove kernel-2.6.32-573.8.1.el6.x86_64 kernel-devel-2.6.32-573.8.1.el6.x86_64

# reinstall it again
# note : in my case I was oke with the "latest and newest" kernel
yum install kernel kernel-devel kernel-firmware

Happy kernel reinstalling! (I don’t think that is a thing)

zfs: disagrees about version

9 December, 2015

ZFSonLinux (ZOL) is a great project that creates a Linux kernel port of the ZFS filesystem. However, when the kernel updates, it always causes problems with the ZFS kernel module 🙁  I have not found a stable solution, only a very dirty “Windows-like method”. I will share it as a future reference for my colleagues and, primarily, myself.

Failed to load ZFS module stack.

In essence, when a new kernel is installed, it will “weak” link the ZFS modules. For some reason ZFS doesn’t like that and gets partially updated. Neither the new nor the old kernel will be able to load the ZFS data. For people who are now in full panic mode (like myself every time this happens) : your data is not lost.

# find the version of spl and zfs
dkms status

# remove both of them
dkms remove -m zfs -v 0.6.3 --all
dkms remove -m spl -v 0.6.3 --all

# install the headers for the new kernel
# ubuntu/debian
apt-get install linux-headers-$(uname -r)

# centos (dkms needs kernel-devel to build the modules)
yum install kernel-devel kernel-headers

# reinstall zfs
yum reinstall zfs

#add & build them again
dkms add -m spl -v 0.6.3
dkms add -m zfs -v 0.6.3
dkms install -m spl -v 0.6.3
dkms install -m zfs -v 0.6.3

# try loading in :
modprobe zfs

# if you can load zfs again now, you can skip this step
# I couldn't, so I had to reboot my machine. (I know it's crazy)
# find the data 
zpool import

# my poolname is tank
zpool import tank

And that is how I saved myself! (for now.)
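The version 0.6.3 is hardcoded in the commands above; it can also be read from dkms status. A sketch, assuming status lines that look like "zfs, 0.6.3, <kernel>, <arch>: installed" (newer dkms releases print "zfs/0.6.3, …", which is handled as well); the function name is hypothetical and I have only reasoned about these two layouts.

```shell
#!/usr/bin/env bash
# Sketch: print the version dkms reports for a module, reading the
# output of `dkms status` on stdin. The name is illustrative.
dkms_version() {
  awk -F', *' -v module="$1" '
    { sub("/", ", ") }               # turn "zfs/0.6.3" into "zfs, 0.6.3"
    $1 == module {
      version = $2
      sub(/:.*/, "", version)        # drop ": added" on short lines
      print version
      exit
    }'
}

# usage:
#   v=$(dkms status | dkms_version zfs)
#   dkms remove -m zfs -v "$v" --all
```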

Some notes :

  • Reinstalling doesn’t always work; sometimes you just need to remove zfs (yum remove zfs). After that it’s a good idea to clean up dkms manually. The command below is floating around on the web; it comes down to removing the modules from /lib/modules/$(kernel_version)/extra/. I removed them from all the kernels, as I only wanted to use the newest kernel anyway.
    find /lib/modules/$(uname -r)/extra -name "splat.ko" -or -name "zcommon.ko" -or -name "zpios.ko" -or -name "spl.ko" -or -name "zavl.ko" -or -name "zfs.ko" -or -name "znvpair.ko" -or -name "zunicode.ko" | xargs rm -f
    find /lib/modules/$(uname -r)/weak-updates -name "splat.ko" -or -name "zcommon.ko" -or -name "zpios.ko" -or -name "spl.ko" -or -name "zavl.ko" -or -name "zfs.ko" -or -name "znvpair.ko" -or -name "zunicode.ko" | xargs rm -f
  • Update 11/01/2016
    • The same problem happened today: a Centos 6.X server crashed due to a RAID controller blocking. This forced a reboot, which for some reason booted into a kernel that was not the latest installed one, so zfs was installed for the newer kernel and weak-linked to the “older” kernel. Rebooting is the thing you should try first in this case.
  • Update  14/12/2016